Type checking Annotated metadata

PEP 593 introduced Annotated as a way to attach metadata to types that type checkers ignore. This has been a huge boon for runtime type checking libraries like pydantic since it lets us replace horrid hacks like foo: constr(pattern=r"[0-9]+") with Annotated[str, Pattern(…)].

One thing that feels missing is type checking between the metadata and the type it's attached to. For example, Annotated[int, Pattern(…)] makes no sense. The same applies to various examples in the PEP. Of course, libraries like Pydantic could try to check this at runtime, but then we'd essentially be reinventing type checkers. If we're going to recommend these APIs to users, it'd be great if they worked well with the tooling that already exists and that users already run.

Could we introduce some sort of type variable or other mechanism for metadata to declare that it must be attached to a certain type? In particular, an API like Annotated[int, accept(int) | accept(str).transform(str.strip).transform(int)] that verifies that the output of the transformations is an int would be an amazing developer experience.

Obviously this would have to be opt-in so as not to break any existing code.


I don’t think I understand what your proposal is, and I’m especially confused by this part

Isn’t the point of pydantic that it does data validation, which includes runtime type checking? To me it makes sense that pydantic would check if the annotation is valid for the type. I haven’t tested v2 but I assume this is how it works.

Could you add some examples of how your proposed feature would work, both of passing and failing type checking?

The point of Pydantic is to validate unknown data at runtime, not to do static type checking on the user's code at runtime. They're two different things. Even if Pydantic wanted to do that, it is not currently designed in a way in which it could, and the changes necessary would amount to building a type checker.

Sure, I’ll give some examples from a user’s perspective without being prescriptive about any sort of implementation.

Pydantic currently provides a thing called AfterValidator, which is metadata that goes into Annotated (docs). This is just one example of metadata that goes in Annotated, but like I said before, even the PEP has examples this could apply to. All it does at runtime is call the user-provided function with the value, after verifying that it is an instance of the type or coercing it to one. So in the case of Annotated[int, AfterValidator(lambda x: x * 2)], the lambda will never get called for the JSON data '"abc"'.

I’d expect that Annotated[str, AfterValidator(str.lower)] is valid but Annotated[int, AfterValidator(str.lower)] is not.
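To make the gap concrete, here's how this looks with Pydantic v2's API today (a minimal sketch; TypeAdapter is only used to trigger validation). Both aliases pass every current type checker; the second one fails only when validation actually runs:

from typing import Annotated
from pydantic import AfterValidator, TypeAdapter

# Fine: str.lower accepts a str, matching the annotated type.
Lowered = Annotated[str, AfterValidator(str.lower)]
assert TypeAdapter(Lowered).validate_python("ABC") == "abc"

# Nonsense: str.lower can never receive an int, yet no type checker
# complains today; it only blows up once validation runs.
Broken = Annotated[int, AfterValidator(str.lower)]
# TypeAdapter(Broken).validate_python(3)  # raises at runtime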

I don't have any strong opinions on how this is achieved, but I'll give one idea just as a starting point: the metadata can be an object that implements __parametrize__(tp: Type[T]) -> None, where the annotated type must be valid as T. Then AfterValidator could look something like:

from typing import Any, Callable, Generic, Type, TypeVar

T = TypeVar('T')

class AfterValidator(Generic[T]):
    def __init__(self, f: Callable[[T], Any]) -> None:
        ...

    def __parametrize__(self, tp: Type[T]) -> None:
        ...

__parametrize__ is never actually called at runtime, but type checkers would report an error if it is present on the metadata and would not be callable with the annotated type.

So if I understand you correctly, you want to formalize some sort of static-typing connection between the first argument and the rest of the arguments to Annotated? I don't think that should fall on Python's shoulders, as (to my limited understanding, I have never used Annotated) any behaviour is supported. You also run the risk of essentially duplicating your logic, once in the type system and once in the actual code that gets executed. I think this is where you'd want plugins for the type checkers rather than extending the language.

Yes

You can put arbitrary objects in Annotated but I don’t see why that means there can be no type checking. It still has to be valid Python.

I’m not sure what you mean by this. I am precisely asking for a solution that avoids anyone duplicating logic.

Yeah you can scratch the last point.

I don't think this is a bad idea (it sounds similar to contract-based programming in Ada, which I find interesting [and apparently it has been suggested before]), but I see some issues with adding this feature:

  • There is no consensus on what kinds of features should be checked for
  • There is no consensus on the design of the objects describing those checks
  • AFAICT there is little demand for this feature (for static typing)

I haven’t seen Annotated being used in the wild yet (only pydantic v2 examples using annotated-types). That leads me to believe that the use of Annotated is rare (unless the code I look at is an outlier) and the community needs time to figure out if this feature is common and useful enough to be included in the stdlib, and then what a good design for that feature might be. Until we know what we want I think it’s best to let the community experiment.


@Jelle @AlexWaygood have design-by-contract or similar ideas been brought up in the python-typing community before?

I will say that there may be little demand now but I anticipate there being a lot of demand soon. We added this feature to Pydantic v2 and so far we’ve seen nothing but positive feedback. Same for FastAPI and other libraries that are starting to use Annotated. These packages get millions of downloads, usage is going to ramp up very quickly. Once usage is more widespread and in larger projects people are going to start hitting the sorts of bugs that static type checkers would have prevented.

As far as there not being consensus on what the actual feature should be or how it should work, totally agree, that’s why I’m trying to get the ball rolling on a discussion so that we can start thinking about it even if it doesn’t happen immediately.

Here's one way this could be done: define a method called __annotate__ and allow subclassing Annotated:

from typing import Annotated

# hypothetical: Annotated can't actually be subclassed today
class MyAnnotated(Annotated):
    def __annotate__(self, typ: type, metadata: object, other_metadata: object):
        ...

__annotate__ here would define the type arguments allowed in MyAnnotated[…]. Only positional arguments would be allowed, and the existing Annotated behaves like the signature above: it takes at least two arguments, the first must be a type, and the rest are unrestricted. To define a custom Annotated that only allows str as the first type:

class Pattern(Annotated):
    def __annotate__(self, typ: type[str], pattern: str):
        ...

This would then be used like,

foo: Pattern[str, r"[0-9]+"] # Type checker would just treat this as str.

The first parameter of __annotate__ must always be annotated with type, but it can restrict the type by using type[X]. Other parameters can be constrained as needed. From the type checker's view, any Annotated subclass can be treated the same way as Annotated itself: only the first argument contributes to the type of foo. __annotate__ only adds additional constraints on the arguments for that specific subclass usage, and subclassing here is mainly just a way to specify the expected type signature of the Annotated arguments; a couple of examples are sketched below.
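For example, under the proposed rules a checker would behave like this (hypothetical; nothing enforces it today):

ok: Pattern[str, r"[0-9]+"]    # accepted: str satisfies type[str] and the pattern is a str
bad: Pattern[int, r"[0-9]+"]   # rejected: int is not assignable to type[str]
worse: Pattern[str, 123]       # rejected: 123 is not a str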

For Pattern specifically, something like that might work in the type system today using phantom types. With PEP 695 (the new generic syntax):

from typing import Literal

type Pattern[T: str, P: str] = T

foo: Pattern[str, Literal["[0-9]+"]]

The type of foo simplifies to just str since the type alias doesn't use P, but at runtime the full annotation is still available. This also type checks that the first argument to Pattern is str or a subtype and that the second argument is of type str. This does rely on Literal allowing str values; if you wanted to allow floats as an argument instead, it wouldn't work out. But suppose type vars allowed something similar to bound, except with values instead of types (bound_value), so that you could do (mixing old and new type var syntax):

from typing import TypeVar

T = TypeVar('T', bound=float)
f = TypeVar('f', bound_value=float)  # hypothetical: bound to values rather than types

type FloatConstraint[T, f] = T  # hypothetical: reuses the type vars declared above
FloatConstraint[float, 5.0]     # valid

This looks somewhat like dependent types, but if you forbid f (or any type variable using bound_value) from appearing on the right-hand side of a type alias, most of the complexity (and power) of dependent types goes away.
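Going back to the phantom-type version above: the claim that the full annotation is still available at runtime is easy to verify today (Python 3.12+; a minimal sketch):

from typing import Literal

type Pattern[T: str, P: str] = T

class Model:
    foo: Pattern[str, Literal["[0-9]+"]]

# The phantom parameter P is erased for type checkers but kept at runtime:
ann = Model.__annotations__["foo"]
print(ann)           # Pattern[str, Literal['[0-9]+']]
print(ann.__args__)  # (str, typing.Literal['[0-9]+'])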

I do have an internal library that's been making heavy use of Annotated and could make use of this. In my case I only have one common constrained Annotated, with a pretty simple signature: the second argument is always a str, and the third argument is optional and, if present, a specific metadata type. It would mostly catch typos where I accidentally put an extra comma in a long string. The __annotate__ approach, while I think it'd work, feels a bit more complex than needed, as generic arguments aren't really a function signature (keyword arguments don't exist). The other aspect is that I know Annotated is weird in the type system, as one of the only things that allows non-type arguments in annotations, and it tends to have edge cases for type checkers to deal with. So I'd guess that on the static typing side the simplest way to support this would be desirable, versus having more weird rules.


typing.Annotated makes me wish that the keyword-argument __getitem__ PEP (PEP 637) had been accepted. It would make use cases like this very smooth imo. Instead of requiring the individual to instantiate an object in the second-or-later arguments to Annotated, you'd just pass it as a keyword.

E.g. instead of Annotated[str, Pattern(…)]

It would be Annotated[str, pattern=...]

And let the consumer of that type hint determine what to do with pattern instead of instantiating the Pattern class directly.

Wouldn't that make it worse for type checking? The PEP was also not accepted, so it's probably worth limiting discussion of it to avoid derailing this thread further.


I was talking more about the runtime application that Pydantic and the like have been using it for, but you're correct, best not to bring up a rejected PEP.

I think this is a useful proposal, although we might want to let the community play with Annotated a little more to see if any other patterns crop up around it. I've personally used it in a couple of projects and was happy with it.

As for how to get there, maybe there’s something generic in typing to inherit from for classes that are meant to be used with Annotated?

from typing import Annotated, AnnotationItem  # AnnotationItem is hypothetical

class Pattern(AnnotationItem[str]):
    ...

class MyClass:
    a: Annotated[int, Pattern(...)]  # error
    b: Annotated[str, Pattern(...)]  # works

A snippet I discussed today in a call with a colleague. Kind of a prototype for what we think Pydantic could allow:

from typing import Annotated, Callable, Generic, Protocol, TypeVar
from typing import AnnotatedTypeVar  # hypothetical: doesn't exist in typing today

OutputType = AnnotatedTypeVar('OutputType')
T = TypeVar('T')
NextOutputType = TypeVar('NextOutputType')


class SupportsLen(Protocol):
    def __len__(self) -> int:
        ...


SupportsLenType = TypeVar('SupportsLenType', bound=SupportsLen)


class Validation(Generic[OutputType, T]):
    def __init__(self, input: type[T]) -> None:
        ...

    def transform(self, f: Callable[[T], NextOutputType]) -> 'Validation[OutputType, NextOutputType]':
        ...

    def check(self, f: Callable[[T], bool]) -> 'Validation[OutputType, T]':
        ...

    def check_len(
        self: 'Validation[OutputType, SupportsLenType]', min: int = 0, max: int | None = None
    ) -> 'Validation[OutputType, SupportsLenType]':
        ...


Annotated[
    int,
    Validation(int | str)
    .transform(str)
    .transform(lambda x: x + 'a')
    .check_len(10)
    .transform(str.strip)
    .check(lambda x: x.count('a') == 1)
    .transform(len),
]

This would be completely type safe if the feature request here were implemented.

At runtime AnnotatedTypeVar could just be an alias for TypeVar and it would be up to type checkers to parameterize it with the correct type.
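Concretely, the runtime side could be as small as this shim (AnnotatedTypeVar is the hypothetical name used in the snippet above):

from typing import TypeVar

# Behaves exactly like TypeVar at runtime; a type checker implementing the
# proposal would bind it to the first argument of the enclosing Annotated[...].
AnnotatedTypeVar = TypeVar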

This is partially inspired by TypeScript’s zod which is much more type safe than anything similar in Python.

The above example has somewhat messy type vars; here's an updated example:

from typing import TYPE_CHECKING, Annotated, Callable, Generic, Protocol, TypeVar

# make this AnnotatedTypeVar or similar to type check the output type
OutputType = TypeVar('OutputType')
T = TypeVar('T')
# could we also type check the input type? Similar to PEP 712
InputType = TypeVar('InputType')
NextOutputType = TypeVar('NextOutputType')


class SupportsLen(Protocol):
    def __len__(self) -> int:
        ...


SupportsLenType = TypeVar('SupportsLenType', bound=SupportsLen)


class Validation(Generic[InputType, OutputType]):
    if TYPE_CHECKING:
        def __new__(cls, input: type[InputType]) -> 'Validation[InputType, InputType]':
            ...

    def __init__(self, input: type[InputType]) -> None:
        ...

    def transform(self, f: Callable[[OutputType], NextOutputType]) -> 'Validation[InputType, NextOutputType]':
        ...

    def check(self, f: Callable[[OutputType], bool]) -> 'Validation[InputType, OutputType]':
        ...

Annotated[
    int,
    Validation(int | str)
    .transform(str)
    .transform(lambda x: x + 'a')
    .check(lambda x: len(x) < 10)
    .transform(str.strip)
    .check(lambda x: x.count('a') == 1)
    .transform(len)
]

I'll also mention that I realized there are some parallels with PEP 712, although I think this solves it in a universal manner that's not dataclass/field() specific and could have wider applicability. Thinking of those parallels did make me consider whether the input type could also be type checked, but that seems like a much larger jump because (1) it introduces a whole new concept of input and output types to the type system (the current proposal does no such thing and would be useful in cases beyond converter-style things, e.g. a simple Annotated[str, Pattern(...)]) and (2) it opens up a lot of questions, e.g. what should the type system do if that special type is used outside the context of Pydantic or another library that does something with the types at runtime.

@Jelle I’m curious if pyanalyze could be used at runtime to check this. Something like:

from typing_extensions import Annotated, get_args, get_origin
from pyanalyze import check_type_matches  # made up function
from pydantic import Validation  # or other library, this is just what was described above

LowercaseStr = Annotated[str, Validation(str).transform(str.lower)]

# somewhere inside Pydantic or other library using Annotated
origin = get_origin(LowercaseStr)
assert origin is Annotated
tp, *metadatas = get_args(LowercaseStr)
for metadata in metadatas:
    if isinstance(metadata, Validation):
        output_type = metadata.get_output_type()    # it should be possible to track
        check_type_matches(tp, output_type)

That’d be a really nice way to experiment with type checking these sorts of things without any changes to the language.

I see there’s pyanalyze.dump_value but that doesn’t give you the value at runtime.
Alternatively is there something that the library could do internally to force pyanalyze to error if the values can’t be assigned? That would require users to run pyanalyze but would also remove runtime costs which would be nice.

Sounds kind of like the is_compatible function I added recently: https://github.com/quora/pyanalyze/blob/master/pyanalyze/runtime.py. But it sounds like you want compatibility between two types, not a type and a value.

One approach could be for annotated_types’s BaseMetadata to declare what kind of types the metadata object works for. For example, annotated_types.Timezone could set base_type = datetime and then pyanalyze could easily be made to check that you can only do Annotated[T, Timezone(...)] if T is compatible with datetime.
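To illustrate, here's a minimal runtime sketch of that idea; base_type is hypothetical and not part of the actual annotated_types API:

from datetime import datetime
from typing import Annotated, get_args, get_origin

class Timezone:
    # hypothetical attribute declaring which annotated types this metadata supports
    base_type = datetime

def check_metadata(annotated: object) -> None:
    # Reject Annotated[T, meta] when T isn't compatible with meta.base_type.
    assert get_origin(annotated) is Annotated
    tp, *metadatas = get_args(annotated)
    for meta in metadatas:
        base = getattr(meta, 'base_type', object)
        if isinstance(tp, type) and not issubclass(tp, base):
            raise TypeError(f'{meta!r} does not apply to {tp!r}')

check_metadata(Annotated[datetime, Timezone()])  # fine
check_metadata(Annotated[int, Timezone()])       # raises TypeError

A static checker like pyanalyze could apply the same rule without the runtime cost.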