PEP 7xx: Dataclasses - Annotated support for field as metainformation

Following the suggestion in the discussion at: Dataclasses - make use of Annotated - #25 by mementum, that the proposal would need a PEP, please find a draft one at the link below.


The reference implementation has been updated to raise the ValueError exceptions specified in the draft PEP. These guard against duplicate default/default_factory values and against using Field both inside Annotated and as the default value.

Thanks for the PEP!

In general, I don’t find the motivation you’ve given for this change convincing, currently. It seems to me that it would be tricky to achieve the correct behaviour at runtime in all cases, and would require a lot of new special-casing from type checkers in order for it to work correctly. The maintenance costs would be significant, but this proposal would not give users the ability to do anything that they cannot currently already do. While it is true that there are many examples in Python where there is more than one way to do the same thing, it is nonetheless better, where possible, to stick to the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

I also think it’s worth noting that the current way of doing it is actually less verbose than your proposed way of doing it.

Specific points of feedback follow:

1. Nested Annotated types

In the typing documentation, it states (following the specification originally laid out in PEP-593):

Nested Annotated types are flattened. The order of the metadata elements starts with the innermost annotation:

assert Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] == Annotated[
    int, ValueRange(3, 10), ctype("char")
]

This snippet currently fails with NameError with your reference implementation:

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Annotated

@dataclass
class Foo:
    x: Annotated[Annotated[int, field(init=False, repr=False)], "not y"]

Traceback with your reference implementation:

Traceback (most recent call last):
  File "C:\Users\alexw\coding\cpython\test2.py", line 6, in <module>
    @dataclass
     ^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\dataclasses.py", line 1329, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\dataclasses.py", line 1319, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\dataclasses.py", line 1053, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\coding\cpython\Lib\dataclasses.py", line 805, in _get_field
    ann_a_type, *ann_args = eval(eval_str)
                            ^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
NameError: name 'Annotated' is not defined

Would nested Annotated annotations be supported under your proposal? If not, you should say so in your PEP. (Whether or not it’s supported, you should probably not be raising NameError either :slight_smile: )
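For reference, the flattening rule can be observed at runtime with the standard typing helpers. The sketch below is only an illustration: split_field_spec is a hypothetical helper, not part of dataclasses or of the reference implementation, and it assumes that a Field object may appear anywhere in the flattened metadata.

```python
from dataclasses import Field, field
from typing import Annotated, get_args, get_origin


def split_field_spec(annotation):
    """Hypothetical helper: return (base_type, Field or None).

    typing flattens nested Annotated types itself, so one get_args()
    call sees the metadata of every nesting level.
    """
    if get_origin(annotation) is not Annotated:
        return annotation, None
    base, *metadata = get_args(annotation)
    specs = [m for m in metadata if isinstance(m, Field)]
    if len(specs) > 1:
        # Assumption: more than one field specification is an error.
        raise ValueError("more than one field specification in Annotated")
    return base, (specs[0] if specs else None)


nested = Annotated[Annotated[int, field(init=False, repr=False)], "not y"]
base, spec = split_field_spec(nested)
print(base)                  # <class 'int'>
print(spec.init, spec.repr)  # False False
```

Because the helper works on evaluated annotation objects rather than on eval() of a string, it does not depend on which names happen to be in scope.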

2. Compatibility with type checkers

You state in your PEP draft:

This PEP introduces also no compabitility issues with type checkers:

  • The special cases introduced to handle the “field-default-value” syntax in dataclasses are not affected.
  • Type checking the “field-annotated” syntax is 100% standard and done as for Annotated annotations in non-dataclass classes.

This is not true. Type checkers have to implement a substantial amount of special-casing in order to understand dataclasses.field(). If this proposal were accepted, they would have to implement a separate suite of special-casing in order to parse the second argument to Annotated. You can see in this mypy-playground demo that mypy correctly understands the signature of Foo.__init__, but does not understand the signature of Bar.__init__:

from dataclasses import dataclass, field
from typing import Annotated

@dataclass
class Foo:
    x: int = field(init=False)
    
@dataclass
class Bar:
    x: Annotated[int, field(init=False)]

# no error (mypy understands the current way of creating init=False fields)
Foo()

# false-positive mypy error 'Missing positional argument "x" in call to "Bar"'
# (mypy would need to introduce new special-casing in order to understand your proposal)
Bar()

More broadly, the typing documentation (again, following the specification laid out in PEP 593) promises that static type checkers will always ignore any extra metadata given to Annotated. This PEP would mean that static type checkers would have to start looking at, and have to try to parse, the second argument to Annotated in the context of dataclasses. This would be a significant shift, which should be called out in your PEP.
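The mypy result for Foo and Bar matches what the standard library does at runtime today. A minimal check, assuming an unpatched CPython without the reference implementation applied:

```python
import inspect
from dataclasses import dataclass, field
from typing import Annotated


@dataclass
class Foo:
    x: int = field(init=False)


@dataclass
class Bar:
    # Unpatched dataclasses treats the Field object as opaque metadata,
    # so x is an ordinary required field here.
    x: Annotated[int, field(init=False)]


foo_params = list(inspect.signature(Foo).parameters)
bar_params = list(inspect.signature(Bar).parameters)
print(foo_params)  # []
print(bar_params)  # ['x']
```

Under the proposal, both the runtime and the type checkers would have to change so that Bar also ends up with an empty parameter list.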

3. Interaction with ClassVar

In gh-90669, it is proposed to allow Annotated to wrap ClassVar in dataclasses, e.g.:

@dataclass
class Foo:
    x: Annotated[ClassVar[int], "this is the x field metadata"]

That issue/PR has yet to be decided on. Would you allow this with your proposal? What would be the behaviour if somebody did something like this, which would seem to be invalid?

@dataclass
class Foo:
    x: Annotated[ClassVar[int], field(repr=False)]

By the way, this also fails with NameError with your reference implementation:

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Annotated, ClassVar

@dataclass
class Foo:
    x: Annotated[ClassVar[int], "foooo"]

4. Tests

Your reference implementation contains no tests, so it is hard to see which scenarios you have considered and which you haven’t.


Thank you very much Alex for the detailed and (really) fantastic feedback.

Even if the proposal is not convincing enough (today), I will take the time to address your comments, look at the other proposed PEPs that could have side effects, and work on the shortcomings in the specification and reference implementation.

Best regards


I’m in favor of this PEP and of making Annotated the recommended way to use Field in dataclasses.

When I first started with dataclasses I was very confused by this syntax: x: int = field(init=False). As a dataclasses beginner, you expect the default value to be on the right side of the =. Beginners struggle to understand that you configure a dataclass field through what looks like its default value. Furthermore, beginners are used to simple type hints like str, int, etc., and one of the rules they have learned is that the default value must match the type hint. As a beginner, it’s confusing to accept that the default value doesn’t match the type hint you are writing.

A second thing that is confusing for beginners is the difference between dataclasses.field and dataclasses.Field. The documentation says:

Field objects describe each defined field. These objects are created internally, and are returned by the fields() module-level method (see below). Users should never instantiate a Field object directly.

It’s strange that we must use dataclasses.field() as the default value but not dataclasses.Field(). When reading the module source code, we are greeted by this explanation:

# This function is used instead of exposing Field creation directly,
# so that a type checker can be told (via overloads) that this is a
# function whose type depends on its parameters.
def field(...

Which means we have made the dataclasses API worse, just because we didn’t have Annotated at the time and wanted to satisfy the type checkers with this strange default value.
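The relationship between the two names can be checked directly with the current standard library: field() is the public factory, and the object it returns is a Field instance, the same kind of object that fields() reports. A small illustration (Point is just an example class):

```python
from dataclasses import Field, dataclass, field, fields


@dataclass
class Point:
    # The Field object sits where a default value would normally go.
    x: int = field(default=3, repr=False)


spec = field(default=3, repr=False)
print(type(spec) is Field)    # True
print(Point().x)              # 3
print(repr(Point()))          # Point()
print(fields(Point)[0].repr)  # False
```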

@mementum I would recommend a few things:

  1. Add to the PEP that dataclasses.Field will be the new recommended way of providing field meta-information, since with Annotated, dataclasses.field is obsolete. An exception can be raised when dataclasses.field is used with Annotated to help users understand what is going on. The “old” way of using = field(...) should of course still be accepted for backward compatibility.
  2. Add to the PEP that an exception will be raised if the default parameter of Field is used together with Annotated. This is to respect the principle of “there is only one way to do things”. Otherwise, some users might write a: Annotated[int, Field(default=3, repr=False)] while others might write a: Annotated[int, Field(repr=False)] = 3. Since Field.__init__ is currently undocumented and considered private, we could even remove the default parameter from the signature.
  3. Reach out to the maintainers of FastAPI and Pydantic on the subject. FastAPI is actively pushing for the Annotated syntax; you can provide this link in the PEP: Query Parameters and String Validations - FastAPI. Those maintainers can help you shape this PEP so that it matches the behavior of their packages.
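The second recommendation could be enforced with a check along these lines. This is only a sketch under stated assumptions: resolve_default is a hypothetical helper, not part of the draft PEP, and it builds the Field object with field() because the Field constructor is currently undocumented.

```python
from dataclasses import MISSING, Field, field
from typing import Annotated, get_args, get_origin


def resolve_default(annotation, class_default=MISSING):
    """Hypothetical helper: pick the default for one field, rejecting a
    default given both in the Field metadata and after the '='."""
    specs = []
    if get_origin(annotation) is Annotated:
        specs = [m for m in get_args(annotation)[1:] if isinstance(m, Field)]
    spec_default = specs[0].default if specs else MISSING
    if spec_default is not MISSING and class_default is not MISSING:
        raise ValueError("default specified twice")
    return class_default if spec_default is MISSING else spec_default


# a: Annotated[int, <Field with repr=False>] = 3
print(resolve_default(Annotated[int, field(repr=False)], 3))          # 3
# a: Annotated[int, <Field with default=3, repr=False>]
print(resolve_default(Annotated[int, field(default=3, repr=False)]))  # 3
```

Whether the first spelling should be the only accepted one, as suggested above, is a design decision the sketch leaves open; it only rejects the case where both are given.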

Best of luck with this PEP!


I still don’t think I’m in favour of this draft PEP, but I find this a much more convincing rationale for making the change than the motivation that’s currently given in the document!


Should there be a section on how this works with dataclass_transform? Currently, a custom dataclass can specify “field specifiers”. These are currently functions that have field’s interface. If you’re changing this to Field, then how should the field specifiers be updated to conform?


I think I’m with Alex in still not being in favor, per se, but also loving that there is a nice proposal we can discuss. Very nice work @mementum .

One section in the proposal I think needs tweaking is “Backwards Compatibility”. Technically, people could be putting dataclasses.field specs into Annotated annotations today. I think it’s safe to say that the number of such cases is near zero (a GitHub Code Search link would be helpful as well). You should probably take a stance on why changing the semantics of such a case is OK (likely because the author’s intent might have been to encode the field info? maybe?).

Additionally, I don’t think “This PEP introduces also no compabitility issues with type checkers:” holds, per se. Type checkers don’t currently do any checking of Annotated metadata in the standard library. This would be a new addition. This section from the typing docs is relevant: “Using Annotated[T, x] as an annotation still allows for static typechecking of T, as type checkers will simply ignore the metadata x.”

I also didn’t see any section regarding if someone used both forms on a single declaration. Would that be an error/warning/noop?

Other than that, I think it looks OK.


@thejcannon I think you are indeed talking about a very important point: what can/should be done with Annotated? Can it be used by type checkers?

It is not clear from the docs, which say:

Metadata added using Annotated can be used by static analysis tools or at runtime.

and

Using Annotated[T, x] as an annotation still allows for static typechecking of T , as type checkers will simply ignore the metadata x .

Aren’t type checkers static analysis tools? We should clarify the documentation on this point, as it’s confusing.

This section of the PEP: PEP 593 – Flexible function and variable annotations | peps.python.org implies that Annotated can be used by type checkers.

Also relevant:
this “motivating example” from the Annotated PEP looks a lot like a dataclass.

So I do believe that the doc should be corrected with:

Using Annotated[T, x] as an annotation still allows for static typechecking of T , as a type checker will simply ignore the metadata x if it cannot interpret and use the metadata.
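The runtime side of the two quoted statements can be illustrated with typing.get_type_hints, which drops the metadata by default and keeps it only on request. The function scale and the "metres" string are made up for the example.

```python
from typing import Annotated, get_type_hints


def scale(x: Annotated[float, "metres"]) -> float:
    return x * 2


# A consumer that does not care about metadata sees only the plain type.
plain = get_type_hints(scale)
# A consumer that understands the metadata opts in explicitly.
extras = get_type_hints(scale, include_extras=True)

print(plain["x"])                # <class 'float'>
print(extras["x"].__metadata__)  # ('metres',)
```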

(Please create a GitHub issue if you find the docs for Annotated unclear! We should fix that :slight_smile: )

Using Annotated[T, x] as an annotation still allows for static typechecking of T , as type checkers will simply ignore the metadata x .

That’s not true because a field changes type checking behavior in many significant ways (as documented in PEP 557 and 681). For example, if field includes init=False, then the type checker will not include that field in the synthesized __init__ method. If a type checker “simply ignores the metadata”, many false positive errors will result during type checking.

This proposal would be the first time that metadata information in Annotated affects static type checking. Currently, static type checkers like pyright and mypy ignore the type arguments to Annotated other than the first one. This PEP would create a new special case where static type checkers would need to evaluate the additional arguments and determine which of them are applicable.

I also agree with Alex that the motivation in this PEP is weak. It does not address an existing problem or provide any new capabilities. As far as I can tell, it simply provides an alternative (redundant) way of doing something that already works — and does it in a more verbose manner.

This proposal would require work for all static type checker maintainers and potentially all libraries that leverage PEP 681 (dataclass_transform) since that decorator implies that the decorated class works like the stdlib dataclass. It would also require retraining developers who are used to the current mechanism.

I’m pretty negative on this proposal.


Regarding Annotated, I don’t think the PEP mandates that type checkers should ignore all values; it’s an extensible mechanism for adding additional data to the type. It just so happens that nothing currently defined to go there is something type checkers care about. Indeed, ignoring the annotation would produce type-check errors, but that’s roughly the same as the runtime consequence of not having code that understands an annotation. Right now @dataclass would “ignore” field annotations, so code written for this proposal and run on such an “incompatible” version would raise all manner of exceptions at runtime.

There have actually been a few opportunities to use Annotated this way. A couple of typing constructs could be imagined as type aliases to an Annotated with some metadata: ClassVar, Final, Required and NotRequired. They all have the same Annotated behaviour of “passing through” the subscripted type unchanged, but then have some additional special functionality. These do have restrictions on where the annotation is valid, the same as a hypothetical dataclasses.field would.


(I made an issue here: The description of `typing.Annotated` has contradictory statements · Issue #107284 · python/cpython · GitHub. Indeed, it seems that not everyone in this discussion understands the same thing from the docs.)


If this is intended as a replacement for the current method, I’m fairly negative on it, as it would mean that using core features of dataclasses requires importing a tool from typing. Currently this requirement is limited to declaring annotated class variables with ClassVar.

The draft PEP mentions trying to mitigate the performance impact of the typing import internally but by making this the preferred method of using dataclasses anyone using these features will have to take this performance hit anyway (or suppress the recommended type checker warnings and not make use of this new method).


This doesn’t enable anything that we couldn’t already do, but it adds a ton of extra work for type checkers. I’m not convinced that using Annotated has any benefit over the current approach. It would also be another way of doing the same thing that would need to be taught. IMO dataclasses should be made a keyword, so that users can define one without importing anything. Importing the typing module just to be able to define a dataclass is a no-go for me.
