Can we have a more truthful typeshed annotation for `dataclasses.field`?

Today the typeshed annotation for dataclasses.field lies about its actual runtime return type, in order to “help type checkers understand the magic that happens at runtime.” Rather than returning an instance of dataclasses.Field (as it does at runtime), the function is annotated as returning T, where T is the type of its default argument, or the return type of its default_factory argument.
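A quick runnable illustration of the mismatch (stdlib only; the variable name is mine):

```python
import dataclasses

# At runtime, dataclasses.field always returns a dataclasses.Field instance.
f = dataclasses.field(default=3, init=False)
print(type(f))                           # <class 'dataclasses.Field'>
print(isinstance(f, dataclasses.Field))  # True

# Typeshed, however, annotates this call as returning the type of the
# default (int here), so a type checker sees `f` as an int.
```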

I think this choice, which may have made sense at one point, does not compose well with the later introduction of dataclass_transform, and specifically its field_specifiers argument.

If a third-party dataclass_transform accidentally fails to list field_specifiers, but then uses dataclasses.field as a field specifier, this is a bug: the call to dataclasses.field won’t actually be treated as a field specifier. But this bug can pass silently in an example like this, due to the incorrect typeshed annotations:

from dataclasses import dataclass, field
from typing import dataclass_transform

@dataclass_transform()
def mydc[T](cls: type[T]) -> type[T]:
    return dataclass(cls)

@mydc
class Base:
    hidden: None = field(init=False)

Base()

In this case, since there are no field_specifiers listed, the call to field(init=False) should not be special-cased at all; it should just be treated like any other field RHS, as a default value.


This should emit a diagnostic like “dataclasses.Field instance is not assignable to None”, which would highlight the fact that it is not being treated as a field specifier.

Instead, because of the lie in typeshed, the wrong assignment passes silently (since there is no default or default_factory in the call, T resolves to Any). And use of Base will also likely not reveal the problem; the call Base() will still succeed, since even though hidden is not being treated as init=False, it is being treated as having a default.

This is not hypothetical: this exact bug exists in the flet library, and has gone unnoticed for exactly this reason.

Type checkers must already special-case all listed field specifiers (not only dataclasses.field), but only in the context of a class body in which they are a listed field specifier. This context-sensitive behavior cannot be accurately represented in typeshed. Given this, do type checkers actually gain any benefit from this lie in typeshed? Or does it only serve to mislead, when dataclasses.field is used outside of a context where it is a valid field specifier?
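The context-sensitivity can be illustrated with the stdlib itself (a small runnable sketch; the class and variable names are mine):

```python
from dataclasses import Field, dataclass, field

@dataclass
class C:
    # Inside a dataclass body, field(...) is treated as a field specifier:
    # the checker reads out init/default rather than typing the call itself.
    x: int = field(default=0)

# Outside that context, the very same call is an ordinary function call
# that returns a Field instance -- regardless of the annotated return type.
y = field(default=0)
print(isinstance(y, Field))  # True
```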

I’m particularly interested in feedback from type checker authors about whether this typeshed lie is necessary in some way for their type checker to work, and whether it would be difficult to adjust to its removal.

5 Likes

(I implemented dataclass and dataclass_transform in Pyrefly.) Pyrefly does rely on the typeshed annotation for dataclasses.field, but we have dataclasses special-cased sufficiently that I don’t think it would be difficult to adjust to a more truthful annotation.

This sounds like a reasonable change to make. My only question would be what this means for non-stdlib code like pydantic’s fields.py (pydantic/pydantic/fields.py at commit 46dea928844edfdbee5ca1f36cbc3b042e2a8abd on GitHub), which currently copies the way dataclasses.field is defined. Is the idea that we want to change the conventions for how field specifiers are typed, with libraries and type checkers updating accordingly? Or is it specifically only dataclasses.field that you’re proposing changing?

Thanks for the response!

I see. It looks like not only pyrefly but also pyright, mypy, and zuban all error on this example:

from typing import dataclass_transform

class FieldInfo: ...

def field(**kwargs) -> FieldInfo:
    return FieldInfo()

@dataclass_transform(field_specifiers=(field,))
def xform[T](cls: type[T]) -> type[T]:
    return cls

@xform
class C:
    name: int = field(init=False)  # mypy/pyright/pyrefly: invalid assignment of FieldInfo to int

So I guess I could have answered my own question with this example: it seems like mypy/pyright/pyrefly/zuban all currently do depend on this typeshed lie, and thus also effectively require that any third-party field specifier copy that lie, or else emit false positive diagnostics whenever it is used.

The change in behavior for type checkers here would be to stop validating the return type of in-context field specifiers against the annotated field type. That seems like a correct type checker change to me: if the RHS is a valid in-context field specifier, then we don’t actually expect it to return something matching the field type. (Type checkers might also want to do explicit validation of the type of default / default_factory, if they currently rely on the field specifier pretending to return the default type to provide validation. If they do, this already wouldn’t work with the documented example for field specifiers in the typing spec, which just shows returning Any.)

If type checkers make this change, that doesn’t break pydantic’s (or typeshed’s) current annotations; type checkers would just stop depending so much on the details of how field specifiers are annotated. So there would be no need for pydantic to change anything, but they would now have the option to update their annotations to be accurate to the runtime behavior, if they want to.

(This also means that type checkers can freely update their behavior anytime, and we can safely wait to update typeshed until all have done so.)

So this would remove the current (unspecified, undocumented, discovered only by trial and error) requirement that field specifier functions lie about their runtime return type, but it wouldn’t replace that with any new required convention: field specifiers would work fine however they are annotated. (But I suppose I’d recommend annotating accurately to the runtime behavior; that’s sort of the default assumption for annotations!)

1 Like

This is how I discovered it: I had to look at what typeshed had for field to figure out the ‘appropriate’ lie. I’d like to not have to do this any more, especially as some of my field specifiers are classes and I gave up trying to lie about those.

1 Like

Thanks for the clarification! I was operating under the assumption that the return type would still be validated in some way, but if it’s not validated at all, then I agree that there’s no need to specify a new convention.

Type checkers might also want to do explicit validation of the type of default / default_factory, if they currently rely on the field specifier pretending to return the default type to provide validation.

Yeah, I think we’d want to do this to replace the validation that we get through the current typeshed annotation.

1 Like