Dedicated syntax for `Annotated`

The only place I can find/recall referencing this perceived problem was in the (now stalled) https://discuss.python.org/t/pep-727-documentation-metadata-in-typing proposal. There the primary suggestion floating around was reusing the @ operator, but Annotated usage is only increasing, well beyond the purposes of that PEP, so I figured I’d at least start a dedicated thread for it, at a minimum as future reference if everyone else thinks this is an awful idea.

tl;dr

I think foo: [int, Doc("foo")] (as shorthand for foo: Annotated[int, Doc("foo")]) is least bad.

Followed by foo: int with Doc("foo") (which I initially preferred before writing this), although I’m personally open to whatever syntax would make this a thing.

Rationale

Use of Annotated today is annoying for a number of reasons, and it is increasingly used by declarative libraries like pydantic or dataclasses to associate data with type attributes.

The need to import Annotated

I think, as much as is reasonable, typing constructs would ideally be automatically in scope to remove the need to import them (like list vs List), or else have syntax which obviates their use (like | vs Union).

Line noise between variable and type

  • foo: Annotated[int, <something else>] versus
  • foo: int <something else>

The more complex the overall annotation, or whether it spans more than one line, the more visually annoying it becomes that the annotation forces the actual type away from the variable in question.

Length

foo: Annotated[int, foo] is just particularly long.

foo: int foo is literally the shortest it could be, but more realistically…
foo: int . foo where . is some sigil or…
foo: int word foo where word is some keyword.

Trying to minimize length seems worthwhile, but i think the interruption and noise around the type itself is more problematic than the length necessarily

Interrupting keyboard/typing flow

You’re making a pydantic model, you think oh yes, foo: list. but then you need a default, ah crap, move left, type Annotated[, move right, type , Field(default_factory=list)]

I would prefer to be able to complete the thought foo: list, and then optionally add annotations onto the end, also because that naturally fits with its purpose

Considered options

Sigils (as an infix operator)

foo: int @ Something()
# or
foo: int @ (
   Something(),
   VeryLonnnnnnng(),
)

Pros

  • Sigils approach the minimum bound of characters
  • Depending on whether the sigil is already an operator, it can possibly be backported to prior versions syntax-wise (whether or not it would be)

Cons

  • They’re less obvious/more arbitrary than a decent keyword
  • Seems more likely to devolve into bikeshedding

Keywords (as infix operator)

foo: int with Something()
# or
foo: int with (
   Something(),
   VeryLonnnnnnng(),
)

with seems like the only sane existing keyword, really. is and as don’t seem to convey the right meaning, and for, pass, and "and" are the only remaining three that are even in the ballpark.

I assume a new keyword is out of the question for this specific feature by itself.

Pros

  • A keyword seems more clear/natural

Cons

  • Slightly longer than sigils
  • The with keyword in particular would not be backwards compatible syntax-wise

Abbreviation

e.g. foo: [int, Something()]

Pros

  • short
  • backwards compatible syntax-wise
  • Most similar to what we have now

Cons

  • (imo) ideally it would be pure additional syntax after the variable/type in order to solve the line-noise point.
    • With that said, it’s 1 character, so it’s still much better

Chaining

For both of the infix operator styles above, I think including more than one Annotated item at a time would need to happen with commas and require parentheses in order to span lines. I had initially been thinking foo with x with b, but there’d be no sane way to line wrap it.

Thus foo: int with Something(), SomethingElse() would translate exactly to foo: Annotated[int, Something(), SomethingElse()]

It would ultimately be up to formatters, but I would anticipate a long multiline annotation to be slightly shorter and arguably look better

class Foo:
    variable: Annotated[
        int,
        Doc("Some long doc string that just keeps going"),
    ]
# vs
class Foo:
    variable: int with (
        Doc("Some long doc string that just keeps going")
    )

To me this notation is less readable than the status quo because two objects in a pair of square brackets do not carry any meaning resembling an annotation to me.

I like the previous proposal of overriding the @ operator better because it resembles its existing use as a decorator, semantically as a tag, for something, so when I see:

foo: int @uint32

I would be able to more intuitively read it as “declare foo as an int tagged with uint32”.

With that syntax I would expect foo: int @uint32 @Doc("foo"), but I don’t think the decorator reading composes well beyond a single item or when it spans multiple lines. I think it would really have to act more like an infix operator that could (with parentheses) enclose multiple annotated items.

I don’t necessarily disagree with you about the “abbreviation” syntax. I definitely don’t think it’s flawless, but I do think the biggest thing going for it is that it would be syntactically backwards compatible and could, I think, be shimmed to function in older Python versions.

Multiple @ tags read just fine to me, reminding me of multiple hashtags attached to a post on social media. A tag spanning multiple lines can always be achieved with parentheses. With a leading @ it’s always clear that a tag is to follow. With a leading [ it’s less clear what it’s meant to enclose.

An alternative discussed in the previous thread is to use the | operator to compose multiple annotations:

foo: int @ uint32 | Doc("foo")

which I find okay too.

I think the Union operator (|) would only confuse those ‘new’ to typing. Generally something like x: int @uint32, Doc("...") or just x: int @uint32 @Doc("....") would work better towards readability.

The @ ‘tagging’ syntax seems natural and intuitive, like adding one or more decorators to ‘tag’ a function.

But this resemblance to decorators could also be a source of confusion. Might a Python programmer unfamiliar with typing assume from

foo: int @uint32 = 3

that the annotation calls the uint32 function like a decorator (uint32(int), uint32(3), or some other incorrect construct)?

Generally, I’d not expect too much confusion, since there is no call expression (except for Doc(...), even though it could be made a special form, as type checkers discourage call expressions in hints).

Implementing this should also not be too much of an issue: __matmul__ and __rmatmul__ could be added to type. (Probably adding only the latter makes more sense, as nested Annotateds could be resolved easily.)
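
For reference, the two runtime facts this relies on can be checked today: nested Annotated forms already flatten into one, and type currently has no @ support at all. The snippet below only exercises existing typing behaviour; the string metadata values are placeholders.

```python
from typing import Annotated

# Nested Annotated forms are flattened by typing itself.
nested = Annotated[Annotated[int, "uint32"], "doc"]
flat = Annotated[int, "uint32", "doc"]
print(nested == flat)
print(nested.__metadata__)

# Today neither `type` nor `str` implements the @ operator,
# so this expression fails at runtime.
try:
    int @ "uint32"
    matmul_supported = True
except TypeError:
    matmul_supported = False
print(matmul_supported)
```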

Deferred annotations make using operators potentially problematic at runtime.

If a value is not yet defined it’s not possible to know what the operator should do. Annotations don’t necessarily have to be types so annotationlib refuses to guess and defers the whole expression. This makes it impossible to get the other details out of an annotation if any of the names are undefined.

Using something like Annotated on the other hand, makes the intention clear and you can retrieve the other information even if the type is not yet defined (on Main - there was a bug that prevented this and the backport of the fix to 3.14 hasn’t been merged yet).

from annotationlib import get_annotations, Format
from typing import Annotated
from typing_extensions import Doc
from pprint import pp

class Example:
    a: undefined @ Doc("useful info")
    b: Annotated[undefined, Doc("useful info")]
    
annotations = get_annotations(Example, format=Format.FORWARDREF)

pp(annotations)
{'a': ForwardRef('undefined @ __annotationlib_name_1__', is_class=True, owner=<class '__main__.Example'>),
 'b': typing.Annotated[ForwardRef('undefined', is_class=True, owner=<class '__main__.Example'>), Doc('useful info')]}

Here, even if the type is not yet defined it is possible to extract the other information from Annotated, but not from the syntax.
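
A loosely analogous effect can be seen on older Python versions with plain string forward references. This is not the 3.14 deferred-evaluation path shown above, only a sketch of the same asymmetry: Annotated keeps the metadata separate from an unresolved type, while an operator expression written as a string stays one opaque blob. The name Undefined and the metadata string are placeholders.

```python
from typing import Annotated, ForwardRef

# The type is an unresolved forward reference, but the metadata is a real object.
hint = Annotated["Undefined", "useful info"]
print(hint.__metadata__)
print(isinstance(hint.__origin__, ForwardRef))
print(hint.__origin__.__forward_arg__)

# An operator-style annotation that cannot be evaluated yet is just a string;
# the type and the metadata cannot be separated without evaluating the whole thing.
operator_style = "Undefined @ 'useful info'"
print(isinstance(operator_style, str))
```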


You might argue that this is the case only because @ hasn’t been defined as meaning Annotated yet, but this issue actually already exists for unions with the | syntax under 3.14.

from annotationlib import get_annotations, Format
from typing import Union
from pprint import pp

class Example:
    a: str | undefined
    b: Union[str, undefined]

pp(get_annotations(Example, format=Format.FORWARDREF))
{'a': ForwardRef('__annotationlib_name_1__ | undefined', is_class=True, owner=<class '__main__.Example'>),
 'b': str | ForwardRef('undefined', is_class=True, owner=<class '__main__.Example'>)}

Here you can find that str is a valid type for ‘b’ at runtime, but you can’t extract that information for ‘a’.

That wouldn’t work for parameters:

def function(
    param1: int with Something(), SomethingElse(),
    param2: str,
) -> None:
    pass

I believe you can see the obvious issue. So we’d have to make wrapping mandatory when there are multiple elements after with: foo: int with (Something(), SomethingElse()).
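
The ambiguity can be shown with syntax that already parses today. Since with is not valid inside an annotation, the sketch below drops it and uses bare names as a stand-in; the point is only that an unparenthesized comma in a parameter list always starts a new parameter.

```python
import ast

# Stand-in for `param1: int with Something(), SomethingElse()`:
# without parentheses, each comma ends the current parameter.
source = """
def function(
    param1: int, Something, SomethingElse,
    param2: str,
) -> None:
    pass
"""

func = ast.parse(source).body[0]
param_names = [arg.arg for arg in func.args.args]
print(param_names)  # ['param1', 'Something', 'SomethingElse', 'param2']
```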

EDIT: sorry that’s what you’ve shown in the last example. Might be worth fixing the part I quoted :slightly_smiling_face:

Such syntax wouldn’t allow expressing the following (or would at least make it very hard to read/parse):

foobar: Annotated[int, A(), B()] | Annotated[str, C(), D()]

foobar: int @ A() | B() | str @ C() | D()
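
To make the parsing problem concrete: under Python's existing precedence rules, @ binds tighter than |, so the second form groups differently from the Annotated version. The sketch below uses ast (with ast.unparse, Python 3.9+) to list the operands of the top-level | chain; A, B, C and D are just the placeholder names from the example above.

```python
import ast


def union_operands(node):
    """Collect the operands of a chain of `|` operators, left to right."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return union_operands(node.left) + union_operands(node.right)
    return [ast.unparse(node)]


expr = ast.parse("int @ A() | B() | str @ C() | D()", mode="eval").body
operands = union_operands(expr)
print(operands)  # ['int @ A()', 'B()', 'str @ C()', 'D()']
```

So B() and D() end up as union members next to int @ A() and str @ C(), rather than as additional metadata on int and str.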

Also, how well would this syntax behave with multiple metadata elements?

foo: int @SomeLongStuff(with_parameters=1) \
    @SomeEvenLongerStuff(with_more=0, parameters=1)

Should it require parentheses wrapping too?

foo: int (
    @SomeLongStuff(with_parameters=1),
    @SomeEvenLongerStuff(with_more=0, parameters=1),
)

Definitely looks like dangling decorators now (I could get used to it though) :smile:

Perhaps I misunderstood. I meant for long/multiline annotations you’d likely need foo: int @(foo, bar) as opposed to foo: int @foo @bar. Or at least it seems like it introduces the least amount of “additional” syntax (at least if you’re thinking of it as a decorator versus an operator).

This has become a fairly typical pattern. The vast majority of my encountered use of Annotated includes call expressions.

Yea this is the problem i had with a “decorator” syntax. It seems like it introduces a bunch of new syntactical things or forces the use of \ (which imo is a no-go). So I feel like long/multiple elements ought to look like:

foo: int @(
    SomeLongStuff(with_parameters=1),
    SomeEvenLongerStuff(with_more=0, parameters=1),
)

I mean something like:

foo: (
    int
    @foo
    @bar
)

I would definitely like a shorter syntax that doesn’t require an extra import.

I very much would also appreciate a shorter syntax, and the tagging idea seems incredibly intuitive to me! I think using @ for each of multiple tags is the better choice, as it keeps reminding me that each item in a long annotated group is a tag.

The idea of a @-tag may be generalized to other special forms that currently hinder readability with the more important, declared type of a name wrapped inside nested square brackets.

So instead of:

class Movie(TypedDict):
   title: Required[ReadOnly[str]]
   year: int

the fact that title is declared as a str can be made clearer and more prominently in the front by rewriting ReadOnly and Required as tags attached to str:

class Movie(TypedDict):
   title: str @ReadOnly @Required
   year: int

Implementation-wise, while regular annotation classes should inherit from a base with a __rmatmul__ that wraps the operands into Annotated objects, non-annotation special forms should inherit from a different base to allow the operands to be transformed back into the respective original special forms with the __rmatmul__ operation.

This is a nice idea for readability! Though, it would clash with the annotation tagging syntax originally discussed.

Would this work for special forms with arguments? E.g.,

class Film:
    title: str @Final # As proposed
    year: int @Annotated[Doc("Long doc string")] # So we can still have a nicer annotation syntax
    month: int @Union[str] # But is this acceptable?
    day: int @GenericAlias[str] # Not a special form, but how does the user know?

But if we sacrifice the possibility of using it for annotations, it does seem like a more readable alternative, especially as more special forms keep being added.

Yeah we should document that only 1-argument special forms are to be made @-tag-compatible, where it makes sense, which currently include Optional, ClassVar, Final, Required, NotRequired and ReadOnly. TypeIs and TypeGuard aren’t included because they aren’t applicable to names.

We should also document that, other than those 1-argument special forms, all other user classes must inherit from a specific base in order to support the @ tag, so that they are wrapped inside an Annotated object.

So that tags that consist of a mix of special forms and annotations:

class Movie(TypedDict):
   title: str @ReadOnly @Required @max_size(60) @Doc('Movie Title')
   year: int

will be transformed back into the current syntax of:

class Movie(TypedDict):
   title: Annotated[Required[ReadOnly[str]], max_size(60), Doc('Movie Title')]
   year: int

If requiring all annotation-bound user classes to inherit from a new base in order to be @-tag-compatible sounds like too much work on the developer’s part, an alternative approach is to implement type.__matmul__ and str.__matmul__, both currently undefined, to support this operation.
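
Built-in type cannot be patched from Python code, so the closest runnable approximation is a metaclass that plays the role type.__matmul__ would play. This is only a sketch of the idea: TaggableMeta and UInt are hypothetical names, and the string "uint32" is placeholder metadata.

```python
from typing import Annotated


class TaggableMeta(type):
    """Hypothetical metaclass standing in for a future type.__matmul__."""

    def __matmul__(cls, metadata):
        # `SomeClass @ metadata` wraps the class and the metadata in Annotated.
        return Annotated[cls, metadata]


class UInt(int, metaclass=TaggableMeta):
    pass


hint = UInt @ "uint32"
print(hint == Annotated[UInt, "uint32"])

# Classes without the metaclass (including builtins) are unaffected today.
try:
    int @ "uint32"
    builtin_supported = True
except TypeError:
    builtin_supported = False
print(builtin_supported)
```

With this approach the metadata objects themselves need no special base class, which is the trade-off the post describes.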

I like the readability and simplicity of just using one syntax for various constructs, but how would a reader parse which @-tags are annotations and which are type qualifiers?

Or does that not matter? (But that feels like a very large type system change!)