PEP 746: TypedMetadata for type checking of PEP 593 Annotated

adriangb · May 21, 2024, 1:25am

This is the discussion to accompany the draft PEP 746.

Background

Pydantic is planning to introduce a new “pipeline” API that allows for more expressive and type safe definition of constraints, transformations and parsing:

from typing_extensions import Annotated

from pydantic import BaseModel, parse

class User(BaseModel):
    username: Annotated[str, parse(str).transform(str.lower)]
    birthday: Annotated[int, (parse(int) | parse(str).transform(str.strip).parse(int)).gt(0)]

We think this API can be a huge improvement over the current status quo.
However, while this API is type safe within its own boundary, it is not type safe across the entire type system, since metadata in PEP 593 Annotated currently has no way of interacting with the type system.

There are 4 things that I think would make this better, not only for Pydantic but also for other users of runtime type checking and even for some of the examples given in the original PEP 593 proposal.
I am going to list them in order, from what I think is clearest / easiest to implement to what is more ambitious / less certain. While some of these can probably be handled on their own, I want to present the whole picture in hopes that we can cover multiple in one go or avoid shooting ourselves in the foot by making future work harder.

Type checking of result type

This means giving metadata a mechanism to declare the types it can be applied to. For example, consider this case similar to the examples given in PEP 593:

Annotated[int, Int64()]

This is fine but there is nothing stopping someone from doing:

Annotated[str, Int64()]

Which clearly is wrong. Currently Int64 has no way to declare that it can be applied only to ints.
Similarly for Pydantic:

Annotated[int, parse(str).transform(str.strip)]  # user forgot `.transform(int)` ?

This could also be used to give basic type checking of the applicability of refinement to types:

Annotated[int, Gt(0)]

This is the pattern currently implemented by Pydantic, msgspec and Hypothesis (Pydantic and Hypothesis use annotated-types for this; msgspec uses its own but equivalent metadata).

@Jelle proposed that we add a TypedMetadata[T] class to typing.py that would look something like:

class TypedMetadata[AnnotatedT]:
    pass

Then type checkers would special case AnnotatedT so that it must match the type of the field. Int64 could then be defined as:

class Int64(TypedMetadata[int]):
    pass

And type checkers would flag Annotated[str, Int64()] as invalid.
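
The check described above is meant to happen statically, but a rough runtime analogue can make the intended rule concrete. The following is only a sketch under stated assumptions: `TypedMetadata` is defined locally as a stand-in (it does not exist in `typing`), it uses the pre-PEP 695 `Generic` spelling so it runs on older interpreters, and `declared_type` and `find_mismatches` are hypothetical helper names.

```python
from typing import (
    Annotated, Generic, List, Optional, TypeVar,
    get_args, get_origin, get_type_hints,
)

AnnotatedT = TypeVar("AnnotatedT")


class TypedMetadata(Generic[AnnotatedT]):
    """Local stand-in for the proposed typing.TypedMetadata (hypothetical)."""


class Int64(TypedMetadata[int]):
    pass


def declared_type(metadata: object) -> Optional[type]:
    """Return the T from TypedMetadata[T] declared by the metadata's class, if any."""
    for klass in type(metadata).__mro__:
        for base in klass.__dict__.get("__orig_bases__", ()):
            if get_origin(base) is TypedMetadata:
                return get_args(base)[0]
    return None


def find_mismatches(cls: type) -> List[str]:
    """List fields whose annotated type does not match their metadata's declared type."""
    problems = []
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is not Annotated:
            continue
        field_type, *metadata = get_args(hint)
        for item in metadata:
            expected = declared_type(item)
            # Only plain classes are compared; anything fancier is skipped in this sketch.
            if isinstance(expected, type) and not (
                isinstance(field_type, type) and issubclass(field_type, expected)
            ):
                problems.append(name)
    return problems


class Good:
    size: Annotated[int, Int64()]


class Bad:
    size: Annotated[str, Int64()]
```

Here `find_mismatches(Good)` reports nothing while `find_mismatches(Bad)` reports the `size` field, which is the same distinction a type checker would be expected to draw without running any code.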

Pydantic could do something like:

class OutputMarker[T](TypedMetadata[T]): ...

def parse[T](func: Callable[..., T]) -> OutputMarker[T]: ...

And type checkers would catch the above case.
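
To show how the output type could flow through a chain of calls, here is a minimal runnable sketch of that idea. It is not Pydantic's implementation: `TypedMetadata` is again a local stand-in, the `transform` method is an assumed shape for the pipeline step, and the classes use `Generic`/`TypeVar` rather than PEP 695 syntax so the code runs on Python 3.9+.

```python
from typing import Annotated, Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class TypedMetadata(Generic[T]):
    """Local stand-in for the proposed typing.TypedMetadata (hypothetical)."""


class OutputMarker(TypedMetadata[T]):
    """Metadata whose type parameter tracks the type the pipeline produces."""

    def __init__(self, func: Callable[..., T]) -> None:
        self.func = func

    def transform(self, func: Callable[[T], U]) -> "OutputMarker[U]":
        # Compose the new step after the existing one; the output type becomes U.
        previous = self.func
        return OutputMarker(lambda value: func(previous(value)))


def parse(func: Callable[..., T]) -> OutputMarker[T]:
    return OutputMarker(func)


# Inferred as OutputMarker[str]: under the proposal this would not match an int field.
stripped = parse(str).transform(str.strip)

# Inferred as OutputMarker[int]: this would match an int field.
as_int = parse(str).transform(str.strip).transform(int)

field: Annotated[int, as_int]
```

Because each step returns a marker parameterized by its own output type, the mistake in the earlier `Annotated[int, parse(str).transform(str.strip)]` example would surface as an ordinary type mismatch between `int` and `OutputMarker[str]`.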

Context for field type

It can be useful in cases to have information (statically) about the type of the field. Consider:

from pydantic import transform

Annotated[str, transform(lambda x: x.lower())]
                                ↑  this type is not known statically, it would be nice to default it to `str`

If we were sticking with the pre-PEP 695 type parameter syntax, I would say this could be implemented by adding AnnotatedT to typing.py directly (instead of TypedMetadata[AnnotatedT]), and type checkers would special case that to have the value of the real type in Annotated within the context of its metadata.
But I am not sure how to solve this in a world where we don’t declare type variables directly.

Type checking of input types

While the above proposals help with PEP 712-like use cases (which Pydantic and other similar libraries have) because they allow type checking the output, they do not allow declaring that a field typed as Annotated[int, parse(str).transform(str.strip).parse(int) | parse(int)] can accept a str or an int as its input.

These behaviors are library specific and may not be expressible in a sane way via the type system, which is what I think is a major problem with how things are proposed in the converters PEP (which also narrows the use case too much IMO).

The only way I can see to solve this would be to have another special cased class or type variable that marks the allowed input types. So then parse(str).parse(int) | parse(int) would return an object that is parameterized with AnnotatedT: int and FieldInputT: str | int or something like that. I’m not sure how feasible this is and it seems like this might be more work to integrate into the type system. But the nice thing about a proposal like this is that it’s much more lax about the API implementation, both from a library / user perspective and from the type system’s, leaving a lot more room for innovation and future development.
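
One way to picture a marker that carries both parameters is sketched below. Everything here is hypothetical and chosen only for illustration: `Step`, `accept` and `then` are not Pydantic APIs, `InT` plays the role of FieldInputT and `OutT` the role of AnnotatedT, and the runtime `accepts` tuple simply mirrors the static input type.

```python
from typing import Any, Callable, Generic, Tuple, Type, TypeVar, Union

InT = TypeVar("InT")
OtherInT = TypeVar("OtherInT")
OutT = TypeVar("OutT")
NewOutT = TypeVar("NewOutT")
T = TypeVar("T")


class Step(Generic[InT, OutT]):
    """Hypothetical marker parameterized by accepted input type and output type."""

    def __init__(self, accepts: Tuple[type, ...], func: Callable[[Any], OutT]) -> None:
        self.accepts = accepts  # runtime mirror of InT
        self.func = func

    def then(self, func: Callable[[OutT], NewOutT]) -> "Step[InT, NewOutT]":
        # Adding a step changes the output type but not the accepted input type.
        previous = self.func
        return Step(self.accepts, lambda value: func(previous(value)))

    def __or__(self, other: "Step[OtherInT, OutT]") -> "Step[Union[InT, OtherInT], OutT]":
        # A union of pipelines accepts either input type and yields the shared output type.
        def run(value: Any) -> OutT:
            branch = self if isinstance(value, self.accepts) else other
            return branch.func(value)

        return Step(self.accepts + other.accepts, run)

    def __call__(self, value: InT) -> OutT:
        if not isinstance(value, self.accepts):
            raise TypeError(f"unsupported input type: {type(value).__name__}")
        return self.func(value)


def accept(tp: Type[T]) -> "Step[T, T]":
    return Step((tp,), lambda value: value)


# Statically this would read as Step[Union[str, int], int].
int_field = accept(str).then(str.strip).then(int) | accept(int)
```

Calling `int_field(" 7 ")` or `int_field(7)` yields the integer 7, while `int_field(7.0)` raises `TypeError`. The open question from the paragraph above remains how a type checker would connect the `InT` parameter to the constructor or setter of the annotated field.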

mikeshardmind · May 21, 2024, 3:20am

If we start with the simplest of your examples, it seems that range validation is becoming common enough that people want to be able to have tooling do it for them. Other dynamic languages have this (e.g. Elixir guards, which are incorporated into various type systems), some implicitly, some explicitly. Taking your Int64 example…

Annotated[int, Int64()]

It would be better if in the type system, not just in annotated, we could express more refinement types, and therefore also get appropriate static checks.

(throwaway example of one possible syntax)

type Int64 = int, value where -9_223_372_036_854_775_808 <= value <= 9_223_372_036_854_775_807

(The elixir link up above is also probably a good place to look for what kind of comparisons to allow)

This plays nicely with type theory, with what can be statically analyzed, and with what people have use cases for at runtime, so it would be nice if this didn’t need to be stuck into a runtime-only construct.

It’s also worth noting that Python’s type system already has limited support for refinement types via Literal

I must admit that I find the example that does transformations on the values, rather than only validation of the types of values significantly less compelling, and this is not how I write typed code. Transformation is a separate step that is only valid after confirming you have an appropriate input.

So while I understand the desire to map input and output types here, I don’t see a sensible way for this to be a part of some pipeline with annotated within annotated. Conceptually, it seems like that pipeline should exist and accept a value and a type or annotated instance and return the transformed value, which I believe would be typable with just the pending typeform pep.

tmk · May 22, 2024, 1:15pm

Something like Annotated[int, Gt(0)] makes sense to me as augmented type information, but putting transformations into Annotated seems weird to me. And it also seems weird to me to make static type checkers pay attention to the second argument of Annotated, which was specifically intended to be ignored by static type checkers.

You mention PEP 712 as comparison but I don’t fully understand what specifically you don’t like about it. The syntax of PEP 712 at least seems nicer to me.

adriangb · May 22, 2024, 2:14pm

I agree that it would be nice if more of these things were built into the type system, and where it can, Pydantic uses existing type system constructs (e.g. it went from having its own runtime-only way of expressing literals to using and recommending typing.Literal), but how do we get there? I believe that one possible route is for the community to experiment with things like annotated-types and gradually integrate things into the type system.

For example, let’s say Annotated[int, Interval(0, 100)] becomes the de facto standard for expressing this at runtime (it’s the closest thing we have right now to a standard at least). Type checkers could then begin to support it and eventually a PEP could be written to land it in the language itself. Then at some point we decide Annotated[int, Interval(0, 100)] is too verbose so we invent special syntax for Annotated: type X = int, Interval(0, 100). Now that’s looking pretty close to your proposed syntax.

In summary, I agree that having dedicated language features for these sorts of things makes sense, but I also think those can be hard to get right, and it might be better to let community tools iterate towards a solution than to try to introduce new syntax to the language out of the gate. Hence why I’m asking for more minor changes that can enhance this iteration.

Regarding transformations in type annotations: I understand it may be a bit strange, but the reality is that it is often by far the easiest way to do some very useful and simple things. The line between obviously good and obviously bad may not be as black and white as it seems at first. I would argue that being able to parse a datetime or UUID object from a wire protocol like JSON that doesn’t have a native type is an extremely useful and sensible feature. But it requires a transformation, even if Pydantic or whatever library hides it from you. And not all such transformations are universal and expressible via a single type. You may have a payload that serializes an enumeration as integers, but you want to map it to a typing.Literal because employee_role="manager" is much better than employee_role=3. Often the “vanilla” way to handle this would be to parse into a non-transformed model, then make a separate model with the output of transformations and have a function that applies the transformations for you. That is a lot more verbose, more error prone, has worse locality of behavior and forces you to create a type hierarchy to share fields (if only we had a partial type), etc. I don’t recommend putting complex logic in functions inside of type hints, nor do I think people are going to start doing so, but something simple like Annotated[int, parse(int) | parse(str).strip().parse(int)] seems both useful and understandable to me.
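
For comparison, the “vanilla” two-model approach described above might look roughly like the following sketch. The model names, the role names other than "manager", and the integer codes other than 3 are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Literal

# Hypothetical role names; only "manager" = 3 comes from the example above.
Role = Literal["intern", "engineer", "manager"]
ROLE_BY_CODE: Dict[int, Role] = {1: "intern", 2: "engineer", 3: "manager"}


@dataclass
class WireEmployee:
    """Shape of the payload as it arrives on the wire."""
    name: str
    employee_role: int


@dataclass
class Employee:
    """Shape the application code actually wants to work with."""
    name: str
    employee_role: Role


def from_wire(raw: WireEmployee) -> Employee:
    # The transformation lives in a separate function, away from the field it affects.
    return Employee(name=raw.name, employee_role=ROLE_BY_CODE[raw.employee_role])
```

Every shared field (here `name`) has to be declared twice and copied by hand, which is the verbosity and loss of locality the paragraph above refers to.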

adriangb · May 22, 2024, 2:19pm

I don’t intend this thread to turn into a PEP 712 discussion but I did mention it so I will say that PEP 712 just would not have been useful for Pydantic. It’s too narrow of a use case and implementation. I believe the proposal here is more general and would satisfy not only Pydantic but also other existing and future use cases. That said I expect that explaining to a type checker that a parameter can have a different input type than the field type based on some function or parsing logic is hard in the general case, hence why I put it lower down on the list.

mikeshardmind · May 22, 2024, 2:31pm

The thing I don’t get here is that with Annotated, you can already create this type if you want. This doesn’t require a PEP or standard library inclusion to be useful unless you want it as part of the type system (which you have indicated you do: you want type checkers to check that the types match), but placing this in the type system seems to, as someone else already put it, violate the point of Annotated.

Getting this right shouldn’t be hard; there are other languages to look to for refinement types, and there are already examples across various libraries that are all reimplementing the same things in different ways. While your examples come from the Pydantic PoV, other similar libraries are also supporting a small but sensible set of constraints, such as msgspec’s support for comparison.

As for type theory, this one is actually extremely simple.

A refinement type is a subtype of the type from which it was refined. The subtyping relation between multiple refinement types is calculable from whether or not one type’s possible values are a strict subset of the other type’s.
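
For range refinements specifically, that subset test reduces to comparing bounds. The sketch below assumes a hypothetical `IntRange` class (not part of any proposal) and uses a non-strict subset, so that a range counts as a subtype of itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Hypothetical refinement of int: all values with low <= value <= high."""
    low: float = -math.inf
    high: float = math.inf

    def contains(self, value: int) -> bool:
        return self.low <= value <= self.high

    def is_subtype_of(self, other: "IntRange") -> bool:
        # Every value in self is also in other exactly when the bounds nest.
        return other.low <= self.low and self.high <= other.high


INT64 = IntRange(-2**63, 2**63 - 1)
UINT8 = IntRange(0, 255)
ANY_INT = IntRange()  # the unrefined type
```

With these definitions `UINT8.is_subtype_of(INT64)` holds, the reverse does not, and both are subtypes of `ANY_INT`, matching the rule that a refinement is a subtype of the type it refines.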

adriangb · May 22, 2024, 2:57pm

I don’t think this is true.

This PEP adds an Annotated type to the typing module to decorate existing types with context-specific metadata. Specifically, a type T can be annotated with metadata x via the typehint Annotated[T, x]. This metadata can be used for either static analysis or at runtime.

I think the PEP makes it clear that static typing may use the metadata. It doesn’t restrict how.

adriangb · May 22, 2024, 3:03pm

I agree that refinement types are the simplest form of this and that is why I put it at the top of the request. My proposal would also benefit msgspec (@jcristharif please check me on that). It seems that you agree with this but don’t think it’s worth taking the step of making the existing and widely used Annotated[int, Gt(0)] type checkable (even if the refinement isn’t enforced, that can always be a future step taken on a case by case basis) and instead we should be introducing refinement types, along with new syntax, into the language? Keep in mind that my proposal can be backported to Python 3.8, whereas refinement types wouldn’t benefit users for years to come. And they aren’t mutually exclusive.

mikeshardmind · May 22, 2024, 3:10pm

The entire motivation section of that PEP then goes on to point exclusively at cases outside the type system, using this as a surrogate for things yet to be included in the type system, and does not suggest that type checkers would be the consumer of the annotations. It even directly states (including in what has survived to the living specification) that tools may choose what annotations they consume. Viewed with proper context, this seems to me to preclude mandating that this be part of the type system.

I don’t think there’s anything preventing a backport via something as simple as:

typing_extensions.Refinement[int, "< 5"]

this might not be the ideal expression of this, but considering that I imagine refinement types would be limited in what they allow for valid constraints, not allowing arbitrary constraints, it’s certainly not the worst.

Maybe that’s even just a better form in general than the specific new syntax, but I don’t really like it being “stringly-typed”

adriangb · May 22, 2024, 3:14pm

It’s not clear to me if you think Annotated[int, Gt(0)] seems like a reasonable way to implement refinement types or not. I don’t see much advantage to Refinement[int, “…”]

mikeshardmind · May 22, 2024, 3:16pm

Refinement[int, Gte(0), Lt(512)] would be fine too if the allowed comparisons are all supported. It doesn’t actually need to be new syntax if you think new syntax would be the barrier here; I just don’t think that overloading Annotated into now also being something that type checkers must enforce is the right way forward on this.

mikeshardmind · May 22, 2024, 3:17pm

Well, if we look at the typeform pep that’s pending, you’d lose the refinements when processing annotated…

adriangb · May 22, 2024, 3:20pm

Could you quote the section of the living spec or PEP that says type checkers should ignore this metadata? From my view the PEP states that (1) type checkers may use the metadata and (2) any tool (static or runtime) should ignore annotations it doesn’t recognize.

@Jelle was the one that proposed that type checkers should be forced to analyze this metadata: PEP 746: TypedMetadata for type checking of PEP 593 Annotated by adriangb · Pull Request #3785 · python/peps · GitHub. Jelle correct me if I misunderstood.

And I’m not proposing we force type checkers to interpret any particular piece of metadata in there, right now. I think there may be a place for that but I’m not going to go into that debate here because it’s not what I’m asking for and IMO should only come after other things settle first.

adriangb · May 22, 2024, 3:22pm

That’s a good point; I’ll make a mental note to look at that PEP again. But I doubt that pointing out a hypothetical future use case is going to stop something useful today. I suspect we could always extend type form to support transmitting that information.

mikeshardmind · May 22, 2024, 3:26pm

I disagree on your interpretation of each of (1) and (2), but this comes down to interpretations and you’ve already gotten at the two places where the interpretation would matter.

Refinement types aren’t simply metadata; this is type information.
This is the part where I think the spec gives room for a type checker to decide to ignore this and be compliant.

I’m 100% certain that this would require further special casing rather than just be something where the refinement type is its own type.

The PEP’s current language wants to imply that any use of Annotated is only interested in the first parameter, and even my suggested amendment to it wouldn’t have a way to handle selecting out only some annotations to retain as something expressible in the type system.

mikeshardmind · May 22, 2024, 3:30pm

I don’t see any advantage for something which will exist in the type system to use Annotated here, and I do see disadvantages.

I would expect that with an actual refinement construct anything like this:

Annotated[Refinement[int, Gte(0)], Rename(camel_case=True)]

should still be fine, and while this looks more verbose inline, I expect real use would look more like

PositiveInt = Refinement[int, Gte(0)]

class Payload:
    snake_case_name: Annotated[PositiveInt, Rename(camel_case=True)]

adriangb · May 22, 2024, 6:51pm

I’d like to summarize your points to see if I understand them correctly and as a checkpoint for future readers.

Any sort of refinements (Gt, Pattern, etc.) would be better served by adding refinement types to the language via a construct like Refinement[int, Gte(0)] instead of trying to get type checkers to understand Annotated[int, Gte(0)], thus taking any steps towards making Annotated[int, Gte(0)] more ergonomic or usable are wasted effort.
The other use cases proposed (e.g. transform(str.strip)) should not be done via type annotations, not even as metadata in Annotated, and thus we shouldn’t consider any features that are meant to enhance that usage.

Does that sound right to you?

mikeshardmind · May 22, 2024, 7:15pm

Not quite. The rationale is wrong in both cases, and while I disagree with the second case, I don’t have a strong opposition to it; it’s the first case that I take stronger issue with, because of the problems I see with it…

It’s better to have refinement types as an actual type construct, and not simply as annotation metadata that is special-cased as something that type checkers should need to check, and which may require further special casing as it interacts with other issues regarding annotated. This will compose better with other typing features, including other ones that pydantic was looking for.

I have a feeling that there’s a better way to support this than putting it into annotated if you want this to be type-checked statically, but I’m more okay with this one being part of annotated. The refinements are more generally useful, whereas these typed transform functions /pipelines are rather narrow in the things that would use them (specifically, this limits the scope to libraries which use annotations at runtime and opt into processing these), so any issue if we find a better way or something where it does compose poorly is much more limited in scope.

adriangb · May 22, 2024, 7:23pm

I’m sorry, I’m having trouble understanding your points. Thank you for clarifying. I’ll focus on refinement types since that is what you seem to be most concerned with.

Would you be willing to back a concrete proposal for refinement types as you see them? I want to make sure we’re not choosing to hold out on improving an existing useful pattern that many, many Python users are using in favor of a possible future feature that has no strong backing. Can’t we make progress with what we have right now and then augment it with better support in the type system later if they are not mutually incompatible?

mikeshardmind · May 22, 2024, 7:31pm

Yes, including contributing time on specification language if that’s needed.

Very understandable. As far as I see it, I have no strong opposition to your second point here, and the first one is only in ensuring that we specify it in a way where it composes properly, not to change the actual thing being enabled.

Usually, I’d agree, but in a foreseeable case like this one, I’d rather not, out of consideration for both the people who will need to implement support for both of these in type checkers, as well as potential churn for library authors and devs.

As for foreseeable incompatibility:

TypeForm: Spelling for a type annotation object at runtime

FWIW, the proposed handling of TypeForm in combination with Annotated[...] is already defined to work that way, stripping out the metadata component:
count: int | str = -1
if ismatch(count, Annotated[int, ValueRange(1, float('inf'))]):
    assert_type(count, int)  # NOT: Annotated[int, ...]
else:
    assert_type(count, int | str)

There’s another PEP being drafted and revised as we discuss this one that would interact poorly.
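
The metadata stripping shown in the quoted TypeForm snippet has a runtime parallel that exists today: `typing.get_type_hints` drops Annotated metadata unless `include_extras=True` is passed. The sketch below illustrates only that existing behavior, not TypeForm itself; `ValueRange` is a local stand-in for the class used in the quote, and `Counter` is an invented example class.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class ValueRange:
    """Local stand-in for the ValueRange metadata used in the quoted snippet."""
    lo: float
    hi: float


class Counter:
    count: Annotated[int, ValueRange(1, float("inf"))]


# Default behavior: the metadata is stripped and only the underlying type remains.
plain = get_type_hints(Counter)

# Opting in keeps the Annotated wrapper together with its metadata.
full = get_type_hints(Counter, include_extras=True)
```

Here `plain["count"]` is just `int`, while `full["count"].__metadata__` still holds the `ValueRange` instance. Any refinement expressed only as metadata is therefore lost whenever a consumer takes the stripped form.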