PEP 728: TypedDict with Typed Extra Items

02/17/2024 Update

Hi, I’m presenting PEP 728.

Now, instead of "__extra__", the proposal introduces the closed keyword argument on TypedDict and makes "__extra_items__" special only when closed=True is given.

To define a closed TypedDict where no extra keys are allowed:

class Movie(TypedDict, closed=True):
    name: str
    year: int

which is equivalent to:

class Movie(TypedDict, closed=True):
    name: str
    year: int
    __extra_items__: Never  # Because Never is a bottom type,
                            # no extra items can be added:
                            # there is no compatible value type.

To allow extra keys of a certain type:

class Movie(TypedDict, closed=True):
    name: str
    year: int
    __extra_items__: str
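TypedDict performs no checking at runtime, but the intended semantics can be emulated with a small hand-rolled validator: every declared key must be present and match its annotation, and every other key must match the extra-items type. (check_movie is purely illustrative and not part of the PEP's runtime API.)

```python
def check_movie(data: dict) -> bool:
    # Declared keys of the Movie TypedDict above, and the type
    # given for __extra_items__.
    declared = {"name": str, "year": int}
    extra_type = str
    # All declared (required) keys must be present.
    if not all(key in data for key in declared):
        return False
    # Declared keys must match their annotation; any other key
    # must match the extra-items type.
    return all(
        isinstance(value, declared.get(key, extra_type))
        for key, value in data.items()
    )

assert check_movie({"name": "Blade Runner", "year": 1982, "tagline": "..."})
assert not check_movie({"name": "Blade Runner", "year": 1982, "rating": 8})
```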

The proposal makes TypedDict more flexible. For example, it enables:

  • closed TypedDict definitions;
  • type compatibility with dict and Mapping;
  • precise type annotations for .items() and .values();
  • allowing extra keys when unpacking (**kwargs: Unpack[Movie]).

Any suggestions for the PEP are welcome!


Original

Hi, I’m presenting PEP 728.
It specifies a way to annotate the type of additional items on TypedDict using a reserved key "__extra__".

class Movie(TypedDict):
    name: str
    year: int
    __extra__: str

As a side effect, the proposal makes TypedDict more flexible. For example, it enables:

  • “closed”/“final” TypedDict definition;
  • type compatibility with dict and Mapping;
  • precise type annotation for .items() and .values().

Currently, an open issue is whether there is a better way than making "__extra__" a reserved key for TypedDict.

Any suggestions for the PEP are welcome!

  • I don’t think reserving __extra__ is a problem; I would already expect stuff like __class__ not to work (even if it does according to the current spec; I haven’t checked).
  • However, I am personally in favor of putting a keyword extra= next to the base class anyway, which IMO is cleaner:
class Movie(TypedDict, extra=bool):
    name: str

If at some point TypeForm is added to the spec (which IMO is unavoidable), type checkers will need to handle type-annotation-like syntax basically anywhere anyway, so I don’t think “it’s harder for type checkers” is a good counter-argument. The inheritance issue mentioned might still exist, although I don’t quite understand what the problematic part is.


Thank you for the feedback!

Re: the drawbacks of extra=, I think I should include a link to the reasoning for not favoring this syntax in a future revision.

There was a more recent discussion on the pros and cons of both approaches and I think it would be helpful to include this quote here:

Reserving any key is a problem because it potentially breaks existing usage and prevents anyone ever typing a protocol that uses that key in future. This isn’t like regular classes where they live only in Python; TypedDict is used to hint the output of language agnostic protocols like JSON.

One option that I don’t think has been suggested is explicitly specifying the key name to be used as a TypedDict parameter. This would be more verbose but would avoid any potential conflicts with, or breakage of, existing types. Combined with the rejected idea of specifying the type as a parameter (which could serve as a less verbose option when applicable), I think it solves all problems except the complaint about a potential future syntax that doesn’t exist yet, which I don’t think should be the primary concern of a PEP.

Why not allow both?[1] We already have some mappings that cannot be expressed using the class syntax[2], so at the very least the functional syntax should have a way to specify the extra keys that doesn’t reserve a magic key, so you can still use it as part of the structure of the dict in the rare cases where you have to.


  1. although the naming would have to be more consistent ↩︎

  2. i.e. any that use a keyword like class, as one of their keys ↩︎

I think breaking existing TypedDicts is a reason not to have a reserved key. I don’t know how to judge that risk.

Not sure this is a good idea or would even work, but looking at dataclasses.KW_ONLY, what about something like this?

class Movie(TypedDict):
    name: str
    year: int
    _: Extra[str]

I was thinking about something like that too, but it seems kind of bad to add a type marker that only works in one specific case, since it would need to be rejected everywhere else. Maybe a more pragmatic solution would be something along the lines of _ignore_ in Enum, where you supply something other than a type to change the behavior of the metaclass. If it contains a type, it counts as a regular key, so this should be fairly unambiguous, albeit maybe a bit clunky.
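For reference, Enum's precedent is spelled _ignore_: a plain (non-annotation) assignment that the metaclass consumes instead of turning into a member. A minimal sketch (the _scratch name is just an example):

```python
from enum import Enum

class Color(Enum):
    # Names listed in _ignore_ are dropped from the class body by the
    # Enum machinery instead of becoming members.
    _ignore_ = ["_scratch"]
    _scratch = "helper value, not a member"
    RED = 1
    GREEN = 2

# Only RED and GREEN become members; _scratch was consumed.
assert [m.name for m in Color] == ["RED", "GREEN"]
```

A TypedDict analogue would similarly use a non-annotation assignment to steer the metaclass without reserving an annotated key.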

Maybe something like this could work, where you specify a key transform that’s applied at the end, that way you can use the reserved key by transforming another key:

class Link(TypedDict):
    __config__ = {
        "key_transform": {
          "_class": "class",
          "config": "__config__",
          "extra": "__extra__",
        },
    }
    href: str
    _class: str
    config: str
    extra: str
    __extra__: str

or

class Link(TypedDict, key_transform={
    "_class": "class",
    "extra": "__extra__",
}):
    href: str
    _class: str
    extra: str
    __extra__: str

But it might be a bit much to ask type checkers to be able to interpret something like that.

Has the possibility of inheriting from dict[str, T] been considered, similar to how TypedDict inheritance was relaxed to allow defining TypedDicts with generic values? This would also pave the way for mapping intersections (mentioned in the PEP), with

class Foo(TypedDict, dict[str, int]):
    bar: str

being equivalent to e.g. a prospective

dict[{"bar": str}] & dict[str, int]

Alternatively, how about building on PEP 696, making TypedDict itself generic, with a default value of Never:

class TypedDict[V = Never]:
    ...

class Foo(TypedDict[int]):
    bar: str

Both of these do not suffer from being difficult to port to an inline TypedDict syntax; synthesised types would emerge naturally and the extra value type would be defined statically. I personally find the extra key approach unwieldy - something that dataclass-like library authors were often forced to resort to in the past for lack of serviceable alternatives - even if we were to cleverly work around name clashes with __config__ or something like it.

I thought about making TypedDict generic or inheriting from a base class, but I think that would lead one to expect all values should be constrained by that inheritance, not just “extra” ones?

I don’t think the Generic approach makes sense, even if you specify a default of object to ensure backwards compatibility. It would be much more difficult to introspect, unless you essentially manually define __class_getitem__ on TypedDict rather than lean on Generic to return a new type constructor. But at that point you might as well make it a class parameter; there’s no real difference between the two, especially in terms of forward references.

Agree, this is a key reason why it feels problematic to reserve a specific key.

Actually, the verbosity of this approach could be mitigated if we keep __extra__ as the default. The semantics would be:

  • If the class parameter extra_key is present, it must be a string naming the key that holds the type of extra items.
  • If the class parameter is not present, then the name of the key is __extra__.

This way, we keep the syntax in the current PEP in the normal case, and if a user wants to use the key __extra__, they can use extra_key. Examples:

class TD1(TypedDict):  # some_key is str, everything else is int
    some_key: str
    __extra__: int
x: TD1 = {"some_key": "x", "foo": 1}

class TD2(TypedDict, extra_key="_type_of_extra"):  # __extra__ is a str, everything else is int
    __extra__: str
    _type_of_extra: int
y: TD2 = {"__extra__": "x", "foo": 1}

This way, all types still appear in an annotation context, but users can still use any key they want.

If we go with this, we should also consider making the default be _ instead of __extra__ for brevity. I think this fits well with the meaning of _ in match blocks (a wildcard match):

class TD3(TypedDict):
    some_key: str
    _: int
z: TD3 = {"some_key": "x", "foo": 3}

I’m concerned that any default still risks breaking existing code.

Another possibility: make the reserved key so high-entropy that it cannot be a duplicate (e.g. include a GUID in it, defined by the spec), and then add a function to the standard library that returns that key. We could call it “extra_keys()” (or, my preference, “other_values()”).

class Foo(TypedDict):
  name: str
  other_values(): int

I think the bigger danger isn’t immediate breakage but a subtle bug: one of the required keys suddenly is no longer required[1], which wouldn’t immediately pop up as a bug unless you specifically wrote tests to catch it. On the consumer side of the type there wouldn’t really be a difference; it’s just that the provider can now forget to set the key and won’t get an error until the bug is discovered and fixed.

Immediate breakage actually is much better, since you could easily write something like pyupgrade to update all your TypedDict definitions.


  1. With Required/NotRequired you would at least see an error from the type checker, since it’s not allowed for the extra key ↩︎

It would break subclassing.

I’m confused, how would this work? You can’t use arbitrary expressions as an lvalue in Python. There’s a very strict subset of things you can do in an assignment statement, and AnnAssign is even more strict.

Bother. I confess I didn’t try to execute it.

I guess one other option[1] would be to go with the extra_key idea but leave the default as None, and then provide a convenience subclass in typing where it’s set to "_". But then the issue becomes finding a good name that’s still short, without introducing confusion about the difference between the two, so that it’s an obvious win compared to just writing class Foo(TypedDict, extra_key="_"):.


  1. if the potential for breaking existing code in subtle ways is in fact big enough of a concern ↩︎

I wonder how subclassing would work if the parent class uses extra_key. So we can spec it out a bit more:

  • The child class is not allowed to use extra_key
  • The child class should also be aware that __extra__ has been renamed, and if it needs to override the type of the renamed extra key, it will use the new name.

For B to be structurally compatible with A:

  • __extra__ or its renamed form should be treated equivalently
  • It doesn’t matter if B’s renamed key is a regular key in A, or A’s renamed key is a regular key in B. The same rules defined in PEP 728 apply.

(for clarity, the reveal_type behavior on NotRequired keys in the following examples is modified)

class TD1(TypedDict, extra_key="other_cow"):
    fish: int
    other_cow: str

class TD2(TypedDict):
    fish: int
    other_cow: str

td1: TD1 = {"fish": 10, "other_cow": "moo"}  # OK
td2: TD2 = {"fish": 10, "other_cow": "moo"}  # OK
reveal_type(td1["other_cow"])  # Revealed type is NotRequired[str]
reveal_type(td2["other_cow"])  # Revealed type is str
td1 = td2  # OK

We should also note that the PEP as-is doesn’t prevent you from doing things like

class Movie(TypedDict):
    __extra__: str

movie: Movie = {"name": "The Shining", "__extra__": "data"}
reveal_type(movie["__extra__"])  # Revealed type is "NotRequired[str]"
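A quick runtime check illustrates why this matters: before the PEP, __extra__ is just another annotated key, so existing code may already be using it as part of its data (a small runnable sketch):

```python
from typing import TypedDict, get_type_hints

class Movie(TypedDict):
    name: str
    __extra__: str

# At runtime today, "__extra__" appears in the type hints like any other
# key, which is exactly why reserving it can collide with real-world data.
assert "__extra__" in get_type_hints(Movie)

movie: Movie = {"name": "The Shining", "__extra__": "data"}
assert movie["__extra__"] == "data"
```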