This is the discussion to accompany the draft PEP 746.
Background
Pydantic is planning to introduce a new “pipeline” API that allows for more expressive and type safe definition of constraints, transformations and parsing:
from typing_extensions import Annotated
from pydantic import BaseModel, parse

class User(BaseModel):
    username: Annotated[str, parse(str).transform(str.lower)]
    birthday: Annotated[int, (parse(int) | parse(str).transform(str.strip).parse(int)).gt(0)]
We think this API can be a huge improvement over the current status quo.
However, while this API is type safe within its own boundary, it is not type safe across the entire type system, since metadata in PEP 593 Annotated currently has no way of interacting with the type system.
There are 4 things that I think would make this better, not only for Pydantic but also for other users of runtime type checking and even for some of the examples given in the original PEP 593 proposal.
I am going to list them in order, from what I think is clearest and easiest to implement to what is more ambitious or less certain. While some of these can probably be handled on their own, I want to present the whole picture in the hope that we can cover several in one go, or at least avoid shooting ourselves in the foot by making future work harder.
Type checking of the result type
This means giving metadata a mechanism to declare the types it can be applied to. For example, consider this case similar to the examples given in PEP 593:
Annotated[int, Int64()]
This is fine but there is nothing stopping someone from doing:
Annotated[str, Int64()]
Which clearly is wrong. Currently Int64 has no way to declare that it can be applied only to ints.
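To make the gap concrete, here is a minimal runtime sketch. The Int64 class below is an empty stand-in written for illustration (it is not a real library type); the point is only that Annotated accepts both aliases without complaint.

```python
from typing import Annotated, get_args


class Int64:
    """Illustrative stand-in for a metadata marker like the one in PEP 593."""


# Both aliases are constructed without error: Annotated does not relate
# the metadata to the type it is attached to.
good = Annotated[int, Int64()]
bad = Annotated[str, Int64()]

for alias in (good, bad):
    field_type, *metadata = get_args(alias)
    print(field_type.__name__, [type(m).__name__ for m in metadata])
```

A static type checker is equally silent today, since it treats the metadata as opaque.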
Similarly for Pydantic:
Annotated[int, parse(str).transform(str.strip)] # user forgot `.transform(int)` ?
This could also be used to give basic type checking of the applicability of refinement to types:
Annotated[int, Gt(0)]
This is the pattern currently implemented by Pydantic, msgspec and Hypothesis (Pydantic and Hypothesis use annotated-types for this; msgspec uses its own but equivalent metadata).
@Jelle proposed that we add a TypedMetadata[T] class to typing.py that would look something like:
class TypedMetadata[AnnotatedT]:
    pass
Then type checkers would special case AnnotatedT so that it must match the type of the field. Int64 could then be defined as:
class Int64(TypedMetadata[int]):
    pass
And type checkers would flag Annotated[str, Int64()] as invalid.
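As a rough illustration of the rule a type checker would apply, the following sketch emulates the check at runtime. TypedMetadata does not exist in typing today; it is defined locally here with the older Generic/TypeVar syntax, and the helpers declared_type and is_valid are hypothetical names made up for this example. The real proposal is a static check, so this is only an approximation for plain classes.

```python
from typing import Annotated, Generic, TypeVar, get_args, get_origin

AnnotatedT = TypeVar("AnnotatedT")


class TypedMetadata(Generic[AnnotatedT]):
    """Hypothetical base class from the proposal; not part of typing today."""


class Int64(TypedMetadata[int]):
    pass


def declared_type(meta: object):
    """Return the T from TypedMetadata[T] declared by the metadata's class, or None."""
    for klass in type(meta).__mro__:
        for base in getattr(klass, "__orig_bases__", ()):
            if get_origin(base) is TypedMetadata:
                return get_args(base)[0]
    return None


def is_valid(alias) -> bool:
    """Runtime approximation of the proposed static check (plain classes only)."""
    field_type, *metadata = get_args(alias)
    for meta in metadata:
        expected = declared_type(meta)
        if isinstance(expected, type) and isinstance(field_type, type):
            if not issubclass(field_type, expected):
                return False
    return True


print(is_valid(Annotated[int, Int64()]))
print(is_valid(Annotated[str, Int64()]))
```

Metadata that does not derive from TypedMetadata is ignored by this sketch, which mirrors the idea that existing untyped metadata would keep working unchanged.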
Pydantic could do something like:
class OutputMarker[T](TypedMetadata[T]): ...
def parse[T](func: Callable[..., T]) -> OutputMarker[T]: ...
And type checkers would catch the above case.
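To show how a chained builder can carry its output type, here is a toy sketch. It is not Pydantic's implementation: OutputMarker, its run method, and this parse function are simplified stand-ins, TypedMetadata is the hypothetical class from the proposal, and the pre-PEP 695 syntax is used so the example runs on older Python versions.

```python
from typing import Any, Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class TypedMetadata(Generic[T]):
    """Hypothetical base class from the proposal."""


class OutputMarker(TypedMetadata[T]):
    """Toy pipeline; T is the type the pipeline produces."""

    def __init__(self, func: Callable[[Any], T]) -> None:
        self._func = func

    def transform(self, func: Callable[[T], U]) -> "OutputMarker[U]":
        # Chaining replaces the output type T with the new step's return type U.
        return OutputMarker(lambda value: func(self._func(value)))

    def run(self, value: Any) -> T:
        return self._func(value)


def parse(func: Callable[..., T]) -> OutputMarker[T]:
    return OutputMarker(func)


# Statically OutputMarker[str]: would be rejected on an `int` field under the proposal.
strip_only = parse(str).transform(str.strip)
# Statically OutputMarker[int]: matches an `int` field.
strip_then_int = parse(str).transform(str.strip).transform(int)
```

Because the last step determines T, a type checker that understands TypedMetadata could compare T against the annotated field type without knowing anything else about the pipeline.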
Context for field type
In some cases it is useful to have static information about the type of the field. Consider:
from pydantic import transform
Annotated[str, transform(lambda x: x.lower())]
# ↑ the type of `x` is not known statically; it would be nice to default it to `str`
If we were sticking with the pre-PEP 695 type parameter syntax, I would say this could be implemented by adding AnnotatedT to typing.py directly (instead of TypedMetadata[AnnotatedT]), and type checkers would special case it to have the value of the real type in Annotated in the context of its metadata.
But I am not sure how to solve this in a world where we don’t declare type variables directly.
Type checking of input types
While the above proposals help with PEP 712-like use cases (which Pydantic and other similar libraries have) because they allow type checking the output, they do not allow declaring that a field typed as Annotated[int, parse(str).transform(str.strip).parse(int) | parse(int)] can accept a str or an int as its input.
These behaviors are library specific and may not be expressible in a sane way via the type system, which is what I think is a major problem with how things are proposed in the converters PEP (which also narrows the use case too much IMO).
The only way I can see to solve this would be to have another special cased class or type variable that marks the allowed input types. So then parse(str).parse(int) | parse(int) would return an object that is parameterized with AnnotatedT: int and FieldInputT: str | int or something like that. I’m not sure how feasible this is, and it seems like this might be more work to integrate into the type system. But the nice thing about a proposal like this is that it’s much more lax about the API implementation, both from a library / user perspective and from the type system’s, leaving a lot more room for innovation and future development.
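A toy sketch of what such a doubly parameterized object might look like follows. Everything here is an assumption made for illustration: the Pipeline class, its accepts attribute and run method are invented names, not Pydantic's API, and the semantics of parse (check the input type first, then convert) are a simplification.

```python
from typing import Any, Callable, Generic, Tuple, Type, TypeVar, Union

FieldInputT = TypeVar("FieldInputT")
AnnotatedT = TypeVar("AnnotatedT")
OtherInputT = TypeVar("OtherInputT")
OtherT = TypeVar("OtherT")
T = TypeVar("T")


class Pipeline(Generic[FieldInputT, AnnotatedT]):
    """Toy metadata object tracking both accepted input types and the output type."""

    def __init__(self, accepts: Tuple[type, ...], func: Callable[[Any], Any]) -> None:
        self.accepts = accepts
        self._func = func

    def parse(self, tp: Callable[[Any], OtherT]) -> "Pipeline[FieldInputT, OtherT]":
        # Same inputs, new output type.
        return Pipeline(self.accepts, lambda value: tp(self._func(value)))

    def __or__(
        self, other: "Pipeline[OtherInputT, AnnotatedT]"
    ) -> "Pipeline[Union[FieldInputT, OtherInputT], AnnotatedT]":
        # The union accepts whatever either branch accepts.
        def run(value: Any) -> Any:
            branch = self if isinstance(value, self.accepts) else other
            return branch.run(value)

        return Pipeline(self.accepts + other.accepts, run)

    def run(self, value: Any) -> Any:
        if not isinstance(value, self.accepts):
            raise TypeError(f"unsupported input type: {type(value).__name__}")
        return self._func(value)


def parse(tp: Type[T]) -> Pipeline[T, T]:
    # Accept instances of `tp` and pass them through unchanged.
    return Pipeline((tp,), tp)


# Statically Pipeline[Union[str, int], int] in this sketch.
field = parse(str).parse(int) | parse(int)
```

With this shape, a type checker would read AnnotatedT to validate the field's declared type and FieldInputT to validate what callers may pass in, without needing to understand how the library performs the conversion.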