Static Reflection for Schema Types (FieldKey and FieldType)

Hi everyone,

I’m working on a proposal and a prototype to bring type-safe reflection to schema-defined types in Python. This aims to solve the “magic string” problem prevalent in libraries like SQLAlchemy, Pydantic, and various data frame tools, without incurring the implementation complexity caused by Python’s dynamic nature.

I would love to get the community’s feedback on the core concept before finalizing the PEP.

The Problem

In our project, we frequently write utility classes or functions that rely on string literals to reference object attributes. Without static reflection, we cannot validate these strings or infer the types they point to.

Consider a UserFieldGetter class designed to extract specific fields from a User. Today, we are forced to type this using str and Any, losing all type safety:

class UserFieldGetter:
    def __init__(self, field_name: str):
        # We cannot validate that field_name actually exists on User
        self._field_name = field_name

    def __call__(self, user: User) -> Any:
        # The return type is unknown (Any), leading to bugs downstream
        return getattr(user, self._field_name)

# No static error here; the typo only surfaces later, at call time, as an AttributeError
getter = UserFieldGetter("invalid_field")

Currently, we rely on a Mypy plugin to handle this.
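For context, the closest static approximation available today without a plugin is to hand-maintain a Literal union of field names plus overloads. This is only a sketch of the status quo; the mirror drifts silently whenever the class changes:

```python
from dataclasses import dataclass
from typing import Literal, overload

@dataclass
class User:
    name: str
    age: int

# Hand-maintained mirror of User's fields; drifts silently if User changes.
UserField = Literal["name", "age"]

@overload
def get_field(user: User, field: Literal["name"]) -> str: ...
@overload
def get_field(user: User, field: Literal["age"]) -> int: ...
def get_field(user: User, field: UserField):
    return getattr(user, field)

user = User(name="Alice", age=30)
print(get_field(user, "name"))  # Alice
print(get_field(user, "age"))   # 30
```

The proposal would make both the Literal union and the overloads unnecessary.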

The Proposal

I am proposing two new special forms for the typing module: FieldKey and FieldType.

These are restricted to “schema-defined” types:

  • Standard @dataclasses.dataclass
  • TypedDict and NamedTuple
  • Classes using @typing.dataclass_transform (e.g., Pydantic models, SQLAlchemy declarative bases)

By limiting the scope to these types, we can validly apply a “Closed World Assumption”—treating the set of fields as finite and known at analysis time—which is impossible for standard dynamic Python classes.

How it works

1. FieldKey[T]
Evaluates to a Literal union of all valid field names in T.

  • If T is a Union (A | B), FieldKey returns the intersection of fields (only keys present in all members).
  • It respects inheritance (MRO).

2. FieldType[T, K]
Evaluates to the type of the field named K on class T.

  • K must be a subtype of FieldKey[T].
  • If K is a union of literals, the result is the union of the field types.
  • It correctly handles Generics (e.g., extracting the type from Box[int] vs Box[str]).
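To make these rules concrete, here is a small runtime analogue (an illustration of the intended semantics only, not how a checker would implement them) built on dataclasses.fields and typing.get_type_hints:

```python
from dataclasses import dataclass, fields
from typing import get_type_hints

@dataclass
class A:
    id: int
    name: str

@dataclass
class B:
    id: int
    email: str

@dataclass
class C(A):  # inherits id and name from A
    active: bool

def field_names(cls) -> set[str]:
    # Runtime stand-in for FieldKey[cls]; dataclasses.fields() already
    # includes inherited fields, which mirrors the MRO rule.
    return {f.name for f in fields(cls)}

def field_type(cls, key: str) -> type:
    # Runtime stand-in for FieldType[cls, Literal[key]] (non-generic case).
    return get_type_hints(cls)[key]

# Union rule: FieldKey[A | B] keeps only keys present in every member.
print(field_names(A) & field_names(B))  # {'id'}
# MRO rule: C exposes its inherited fields too.
print(sorted(field_names(C)))           # ['active', 'id', 'name']
print(field_type(A, "name"))            # <class 'str'>
```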

Example Usage

Here is how the UserFieldGetter example looks with the proposal. The type checker now validates the field name and correctly infers the return type.

from dataclasses import dataclass
from typing import FieldKey, FieldType

@dataclass
class User:
    name: str
    age: int

class UserFieldGetter[K: FieldKey[User]]:
    def __init__(self, field_name: K):
        self._field_name = field_name

    def __call__(self, user: User) -> FieldType[User, K]:
        return getattr(user, self._field_name)

# Usage
name_getter = UserFieldGetter("name")
age_getter = UserFieldGetter("age")

user = User(name="Hello", age=1)

val1 = name_getter(user) # Inferred type: str
val2 = age_getter(user)  # Inferred type: int

# Static Error: "invalid" is not assignable to Literal['name', 'age']
invalid_getter = UserFieldGetter("invalid")

Why this approach?

Standard Python classes are open: attributes can be added at runtime, so FieldKey applied to an arbitrary class would effectively degrade to str.

By restricting this to Dataclasses and Schemas, we leverage the existing static guarantees of these structures. This allows libraries to define type-safe APIs for filtering, ordering, and serialization without requiring custom plugins for every tool.
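As a motivating sketch, consider a filtering helper (the name where is hypothetical and purely illustrative). Today the field check can only happen at runtime; under the proposal, field_name would be typed as K with K bound by FieldKey[User], and the typo below would become a static error:

```python
from dataclasses import dataclass, fields

@dataclass
class User:
    name: str
    age: int

def where(items: list[User], field_name: str, value):
    # Runtime-only validation; the proposal would move this check into
    # the type checker by constraining field_name with FieldKey[User].
    valid = {f.name for f in fields(User)}
    if field_name not in valid:
        raise AttributeError(f"{field_name!r} is not a field of User: {sorted(valid)}")
    return [item for item in items if getattr(item, field_name) == value]

users = [User("Alice", 30), User("Bob", 25)]
print(where(users, "age", 25))  # [User(name='Bob', age=25)]
# where(users, "aeg", 25)       # AttributeError: caught only at runtime today
```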

Prototype

To demonstrate the feasibility of this proposal, I have created a working prototype based on a fork of Pyright (disclaimer: it’s mainly generated by LLMs).

While this is not intended to be a reference implementation, it is now functional for experimentation. You can check out the branch to see FieldKey and FieldType inference in action:

Here is an example usage with the Pyright output:

from typing import TypedDict, TypeVar, FieldKey, FieldType, NamedTuple, Literal
from dataclasses import dataclass


class Foo(TypedDict):
    x: int
    y: str


def get_container_value[K: FieldKey[Foo]](container: Foo, key: K) -> FieldType[Foo, K]:
    return container[key]


reveal_type(get_container_value(Foo(x=1, y="hello"), "x"))
# information: Type of "get_container_value(Foo(x=1, y="hello"), "x")" is "int"
reveal_type(get_container_value(Foo(x=1, y="hello"), "y"))
# information: Type of "get_container_value(Foo(x=1, y="hello"), "y")" is "str"


@dataclass
class User:
    name: str
    age: int


class UserFieldGetter[K: FieldKey[User]]:
    def __init__(self, field_name: K):
        self._field_name = field_name

    def __call__(self, user: User) -> FieldType[User, K]:
        return getattr(user, self._field_name)


name_getter = UserFieldGetter("name")
age_getter = UserFieldGetter("age")

user = User(name="Hello", age=1)

reveal_type(name_getter(user))
# information: Type of "name_getter(user)" is "str"
reveal_type(age_getter(user))
# information: Type of "age_getter(user)" is "int"


invalid_getter = UserFieldGetter("invalid")
# error: Argument of type "Literal['invalid']" cannot be assigned to parameter "field_name" of type "K@UserFieldGetter" in function "__init__"
#     Type "Literal['invalid']" is not assignable to type "Literal['name', 'age']"
#       Type "Literal['invalid']" is not assignable to type "Literal['name', 'age']"
#         "Literal['invalid']" is not assignable to type "Literal['name']"
#         "Literal['invalid']" is not assignable to type "Literal['age']" (reportArgumentType)


class Result(NamedTuple):
    returncode: int
    reason: str | None


def get_reason(result: Result) -> FieldType[Result, Literal["reason"]]:
    return result.reason


reveal_type(get_reason(Result(0, "")))
# information: Type of "get_reason(Result(0, ""))" is "str | None"



@dataclass
class Box[T1, T2]:
    value1: T1
    value2: T2


def get_value1[T1, T2](box: Box[T1, T2]) -> FieldType[Box[T1, T2], Literal["value1"]]:
    return box.value1


reveal_type(get_value1(Box(value1=1, value2="hello")))
# information: Type of "get_value1(Box(value1=1, value2="hello"))" is "int"
reveal_type(get_value1(Box(value1="hello", value2=1)))
# information: Type of "get_value1(Box(value1="hello", value2=1))" is "str"

Feedback Requested

I am finalizing the PEP text now, but I wanted to gauge sentiment on:

  1. Naming: Are FieldKey and FieldType clear? Is FieldName better than FieldKey?
  2. Semantics: Does the intersection rule for Unions (FieldKey[A | B] only returns shared keys) align with your expectations for safety?
  3. Annotated Types: Should FieldType preserve Annotated metadata (e.g., FieldType[T, "age"] returns Annotated[int, Gt(0)]) or strip it down to the base type (int)?
  4. Properties: Should FieldKey include @property definitions? Currently, I have excluded them to align strictly with dataclasses.fields() and declared schema fields, but I know some serializers (like Pydantic’s computed_field) treat them as fields.
  5. Performance: Computing FieldKey for large Unions or complex hierarchies could be expensive. Are there specific constraints or lazy-evaluation strategies type checker maintainers would recommend?
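On question 4: the current behavior can be checked against dataclasses.fields(), which ignores properties entirely, so FieldKey as specified would do the same:

```python
from dataclasses import dataclass, fields

@dataclass
class User:
    name: str

    @property
    def display_name(self) -> str:
        return self.name.title()

# dataclasses.fields() reports only declared fields, not properties;
# FieldKey currently mirrors this behavior.
print([f.name for f in fields(User)])  # ['name']
print(User("alice").display_name)      # Alice
```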

Thank you!

8 Likes

Is that the reason why previous proposals have failed to gain traction? Even if type checkers can’t know all the fields of a normal class, they can know those defined with

class C:
    x: int

so might that not be enough?

That said, I’m fine with restricting this to dataclasses – and TypedDicts – for the start.

Previous proposals were mostly about TypedDict. Here are some of them:

It probably makes sense to have one PEP that provides this functionality for both dataclasses and TypedDicts.

2 Likes

This is certainly a sensible suggestion! But stating that the ‘closed world assumption’ can be applied to all TypedDicts and dataclasses is not entirely accurate.

For example, for TypedDicts with extra_items,

class Foo(TypedDict, extra_items=ReadOnly[float]):
    x: int
    y: str

FieldKey[Foo] “Evaluates to a Literal union of all valid field names in T” - but any valid string is a valid key in Foo. However, FieldType[Foo, K] could easily be determined from the extra_items type.

And even for TypedDicts without extra_items, from the typing spec: “By default, TypedDicts are open, meaning they may contain an unknown set of additional items.”

Now, FieldKey[T] could evaluate to a Literal union of all specified field names in T, or some similar wording.

Also note that, because of subclassing, any schema-defined type that is not decorated with @final could have extra items. But I think that’s a non-issue if you simply modify the wording to specify that FieldKey[T] covers the valid keys on objects of exactly type T, ignoring subtypes.

I think restricting this to schema types is essential for soundness. If we support normal classes, we risk introducing a lot of false positives (flagging valid dynamic code as errors). So, I’m inclined to stick to the current restriction to schema types.

Thanks for pointing this out!

I plan to restrict FieldKey to explicitly defined keys only. If we allow any string, we lose the static safety against typos. Users can simply use str if they need to access undeclared extra items.

Accordingly, I will refine the FieldType logic:

  • Specific Key: FieldType[T, Literal['known_key']] → The declared type.
  • Generic String: FieldType[T, str] → Resolves to the extra_items type (if defined) or Any (if open).

Example:

from typing import TypedDict
from typing import FieldKey, FieldType

class Data(TypedDict, extra_items=int):
    name: str

# 1. Strict access (FieldKey)
# Constrains K to explicit keys only: Literal['name']
def safe_get[K: FieldKey[Data]](d: Data, k: K) -> FieldType[Data, K]:
    return d[k]

d: Data = {"name": "Alice"}

safe_get(d, "name")  # Inferred type: str
safe_get(d, "naem")  # Error. (FieldKey strictly reflects the declared schema to catch typos)

# 2. Dynamic access (str)
# Allows arbitrary strings, resolving to extra_items type if defined
def dynamic_get[K: str](d: Data, k: K) -> FieldType[Data, K]:
    return d[k]

dynamic_get(d, "random") # Inferred type: int (from extra_items)
1 Like

Could classes with __match_args__ be supported?

I find myself reaching for something like this a lot, specifically because I usually want to wrap poorly specified or dynamically defined schemas in a regular dict so access isn’t strict at runtime. The runtime keys and types can then be tested against the schema (usually a TypedDict) for changes.

Soft failure is sometimes best, especially when dealing with APIs that change a lot. It means you can find errors using your type checker instead of relying on exceptions:

from collections.abc import Mapping
from typing import Any, TypedDict

class Schema(TypedDict):
    key1: str
    key2: int
    
class SchemaModel[S: Mapping[str, Any]]:
    def __init__(self, schema: S):
        # This is properly typed
        self._schema = schema
    
    def __getitem__(self, key: str) -> Any:
        return self._schema[key]

model = SchemaModel[Schema](get_api_response())

model['key1'] # Any
model._schema['key1'] # str

Allowing reflection on the schema type would mean you could elevate the typing more easily to a wrapper class:

class Schema(TypedDict):
    key1: str
    key2: int
    
class SchemaModel[S: Mapping[str, Any]]:
    def __init__(self, schema: S):
        # This is properly typed
        self._schema = schema
    
    def __getitem__(self, key: FieldName[S]) -> FieldType[S]:
        return self._schema[key]

model = SchemaModel[Schema](get_api_response())

model['key1'] # str
model._schema['key1'] # str

This could be an absolutely braindead way to manage something like this, so I’ll happily listen to any and all criticism of why this is a terrible idea or a misapplication of something like it. I just know I’ve used patterns like this before.

I think the __getitem__ has to be written more like this:

# ...
    def __getitem__[K: FieldName[S]](self, key: K) -> FieldType[S, K]:
        return self._schema[key]

otherwise it’s not clear what the connection between FieldName and FieldType is.

1 Like

I like the syntax here

But doesn’t this run into the issue of generic constraints? Making the key a function-level generic, then binding it via FieldName and FieldType, seems to avoid that.

# ...
    def __getitem__[K](self, key: FieldName[S, K]) -> FieldType[S, K]:
        return self._schema[key]

Here, K would just become a Literal union of all key names, and FieldType would behave much like indexing a TypedDict. Something like this could replace @overload, now that I think about it:

class Modes(TypedDict):
    rt: str
    rb: bytes

def read[Mode](*args: Any, mode: FieldName[Modes, Mode]) -> FieldType[Modes, Mode]:
    if mode == 'rt':
        return "Here's a string"
    if mode == 'rb':
        return b"Here's some bytes"
    else:
        raise ValueError(f'Unsupported mode: {mode}')

Not sure if this would make things more or less confusing. Since overloads make it clear exactly what each signature will give you.

Since FieldKey is treated as a special form, the type checker can resolve FieldKey[S] into a concrete Literal set once S is known.

I updated my Pyright prototype to verify this. While the implementation is still experimental, it demonstrates that this syntax is feasible. Here is the actual output from the branch:

from typing import TypedDict, FieldKey, FieldType, Any
from collections.abc import Mapping


class Schema(TypedDict):
    key1: str
    key2: int
    

class SchemaModel[S: Mapping[str, Any]]:
    def __init__(self, schema: S):
        # This is properly typed
        self._schema = schema
    
    def __getitem__[K: FieldKey[S]](self, key: K) -> FieldType[S, K]:
        return self._schema[key]


def get_api_response() -> Schema:
    return Schema(key1="hello", key2=1)


model = SchemaModel[Schema](get_api_response())


reveal_type(model['key1'])
# information: Type of "model['key1']" is "str"
reveal_type(model['key2'])
# information: Type of "model['key2']" is "int"
1 Like

In my experience, __match_args__ is often used selectively for only the most common positional arguments. Using it as a source for FieldKey would likely result in an incomplete schema reflection, causing valid attributes to be incorrectly flagged as errors. For this reason, I am currently inclined not to support __match_args__.
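To illustrate: __match_args__ is frequently a strict subset of a class’s real attributes, so deriving FieldKey from it would hide valid fields:

```python
class Point:
    # Only the names usable in positional match patterns are listed;
    # z is a real attribute but would be invisible to a
    # __match_args__-based FieldKey.
    __match_args__ = ("x", "y")

    def __init__(self, x: int, y: int, z: int = 0):
        self.x, self.y, self.z = x, y, z

p = Point(1, 2, 3)
print(Point.__match_args__)  # ('x', 'y')
print(p.z)                   # 3
```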

I’ve been longing for something like this for a long time! Returning to Python from TypeScript, Python’s type system often feels limiting, and having something like keyof is particularly painful.

I think the names you’ve chosen are fine. Given that the operator is supposed to work for classes and dicts (as I understand), the naming can’t be perfect because classes have fields (or attributes, not sure which term is more prevalent) with string names while dicts have keys which can be any hashable type.

I’m wondering whether it would be better to separate the two. So FieldName and FieldType for classes and KeyType and ValueType for dicts. This might make these operators more useful for other things. ValueType could be valid for all collections, KeyType for all mappings.

100%. This is very important for creating functions that can operate safely on types with shared keys. In case you need the union of all allowed keys, you can use FieldName[T1] | FieldName[T2].

I’m not sure what use preserving metadata would have. I would vote against this intuitively.

That’s a tricky one. When you define a property you usually want it to act just like any other field. But I feel it would be smarter to go for the more restrictive solution now and add new special forms like PropertyName and PropertyType later. You can use a union if you want to treat fields and properties the same way.

I can’t help you with the last one, unfortunately.

Thank you for the initiative!

1 Like

There is actually a proposal, Proposal: KeyType and ElementType for TypedDicts (also, Map), which served as a key reference for me during my investigation. Maybe we can defer the TypedDict specifics to that proposal and limit the scope of FieldKey and FieldType strictly to dataclass and dataclass_transform, calling them FieldName and FieldType instead.

Libraries like pydantic and hypothesis make heavy use of the Annotated type, and stripping that metadata out could cause issues with their annotation processing.

Let’s say you have a pydantic model that annotates a field as Annotated[int, Lt(255)], and you then use FieldType to extract that type. You now have just int, and any downstream validation loses the annotated context.
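This mirrors a distinction the runtime already makes: typing.get_type_hints strips Annotated metadata unless include_extras=True is passed. In this sketch the string "Lt(255)" is a placeholder for pydantic’s real annotated_types.Lt(255) object:

```python
from typing import Annotated, get_type_hints

class Model:
    # Placeholder metadata standing in for annotated_types.Lt(255).
    age: Annotated[int, "Lt(255)"]

# Stripping loses the metadata; preserving keeps it.
print(get_type_hints(Model)["age"])                       # <class 'int'>
print(get_type_hints(Model, include_extras=True)["age"])  # metadata preserved
```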

2 Likes

Do you have any more specific examples of why it would be good to separate these ideas?
It seems to me that remembering the exact spellings of all four names, and when each needs to be used, would just be cumbersome.

Technically they are different concepts (attributes are not keys), so I’m not against it. I’d just like to understand if there’s any practical reason.

1 Like

Would it maybe be better to just decouple it from the concepts of keys and attributes? Something like ItemName and ItemType so you don’t have the baggage of an existing concept and also don’t need 4 names?

Would you also be open to the idea of FieldType[T], or a similar concept like FieldValue[T] (in addition to FieldKey[T] and FieldType[T, K], not instead of them)? It would be the union of the value/member types, e.g. FieldValue[User] == str | int.

That’s definitely an approach to consider.

My main dilemma right now is whether to keep this PEP focused solely on dataclass and dataclass_transform, or to cast a wider net to cover similar patterns. I plan to reach out to the author and the participants of the discussion at Proposal: KeyType and ElementType for TypedDicts (also, Map) · python/typing · Discussion #1412 · GitHub for their input. Perhaps we could collaborate on a joint PEP, or I could defer the TypedDict-related aspects to their proposal.

2 Likes

Of course. Do you have a specific use case in mind where FieldValue[User] would be useful?

I’m the author of that, but I’m not really interested anymore in pursuing that proposal further. I don’t work much with TypedDicts anymore.

But I would expect that other people are interested in working on a PEP about this. The topic has sparked many discussions over the years.

3 Likes