PEP 728: TypedDict with Typed Extra Items

JakobStadler · March 24, 2024, 6:02pm

Not a fully thought-out idea, but what about a typing import like extra_items, where setting __getitem__=extra_items[str | bytes] in the functional form behaves as if you had set str | bytes as the extra items type and provides valid __getitem__ behavior at runtime?

Jelle · September 29, 2024, 3:02pm

This PEP has languished unresolved for a while and we need to get it finished. We need to figure out the spelling for the concept.

Current PEP proposal

# Must contain exactly one key
class Movie(TypedDict, closed=True):
    name: str

# May contain arbitrary extra keys of type `bool`
class Movie(TypedDict, closed=True):
    name: str
    __extra_items__: bool

# As above, but all the extra items are read-only
class Movie(TypedDict, closed=True):
    name: str
    __extra_items__: ReadOnly[bool]

# Contains a key `__extra_items__` of type bool
# (type checkers could warn about this)
class Movie(TypedDict):
    name: str
    __extra_items__: bool

Con: The __extra_items__ key becomes special. Easy to make a mistake and forget closed=True when using __extra_items__. Type checkers could warn about this, but then what if you actually want __extra_items__ as a key?

Shantanu’s proposal

(In a few posts above. I extended some edge cases with what seemed to me the intuitive behavior.)

# Must contain exactly one key
class Movie(TypedDict, closed=True):
    name: str

# May contain arbitrary extra keys of type `bool`
class Movie(TypedDict, closed=True):
    name: str
    def __getitem__(self, key: str) -> bool: ...

# As above, but keys are read-only
class Movie(TypedDict, closed=True):
    name: str
    def __getitem__(self, key: str) -> ReadOnly[bool]: ...

# Type checker error: Cannot use __getitem__ on a non-closed TypedDict
class Movie(TypedDict):
    name: str
    def __getitem__(self, key: str) -> bool: ...

# Contains a key "__getitem__" of type str and other keys of type bool
class TD(TypedDict, closed=True):
    __getitem__: str
    def __getitem__(self, key: str) -> bool: ...

Con: Creates a new, special-case concept that doesn’t have a lot of parallels elsewhere. Several odd edge cases: If __getitem__ is used as a key, annotations in the class body look like they conflict with the method name. It works at runtime but still looks odd. Similarly, returning ReadOnly[] from a method looks odd and lacks parallels elsewhere.
Con: The presence of the __getitem__ method would also affect how type checkers interpret other operations (e.g., __setitem__).
Observation: We’ll have to think about more edge cases. For example, what if the argument to __getitem__ is annotated as Literal["some", "strings"]? Or a subclass of str? Or an int?

Based on the above, I think I’d prefer to stick with the existing syntax and submit the PEP to the Typing Council. However, if someone is interested in championing Shantanu’s suggestion and resolving all the edge cases, we can still consider it too.

erictraut · September 29, 2024, 4:54pm

In the interest of completeness, let me enumerate a full list of options.

Option 1: Use __extra_items__ and closed=True as proposed in the current draft of the PEP.

class Movie(TypedDict, closed=True):
    name: str
    __extra_items__: ReadOnly[bool]

Option 2: Use a custom __getitem__ override to specify additional key values as suggested by Shantanu.

class Movie(TypedDict, closed=True):
    name: str
    def __getitem__(self, key: str) -> ReadOnly[bool]: ...

Option 3: For now, drop the idea of supporting arbitrary additional keys and introduce only closed=True. This would imply that no additional keys are present. It’s a subset of the current PEP. This covers most of the use cases that motivated this PEP.

class Movie(TypedDict, closed=True):
    name: str

Option 4: Specify the extra value type as a TypedDict keyword argument with a name like extra_values. If unspecified, extra_values would default to Never if the TypedDict is closed and object if it isn’t closed.

class Movie(TypedDict, closed=True, extra_values=ReadOnly[bool]):
    name: str

I previously pushed back on option 4 because I didn’t want to see us add more places in the Python type system where a value expression was treated like a type expression. However, since that time we’ve made good progress in clarifying the concept of a type expression and specifying where they can appear in the grammar, so I’m more comfortable with that proposal now. The other concern with option 4 was that it would be problematic for some future short-cut syntax for describing TypedDict types, but I think this objection applies to all four options.

Of these four options, I’m slightly negative on option 2 for the reasons that Jelle mentions above. The other three options (1, 3 and 4) all seem reasonable to me.

Option 3 is the most conservative because it tackles only part of the problem, but it leaves open the possibility of a future extension. If we think that this will cover the vast majority of use cases that prompted this PEP, maybe we should start here and defer adding the additional functionality.

Jelle · September 29, 2024, 5:36pm

One issue with Option 4 is that the type for the extra items would be eagerly evaluated even in 3.14. Therefore, if you were to include a forward reference in the type, you’d still have to quote the type.

I do like Option 4 the best conceptually; the idea is that you modify the TypedDict type, and an argument to the class constructor feels like the best place for that.
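
As a rough illustration of the forward-reference point: class keyword arguments are ordinary expressions, so they are evaluated when the class statement runs. The sketch below uses a hypothetical stand-in base class (TypedDictLike, not the real TypedDict) that merely accepts an extra_values keyword; the names Tree and Node are also made up for the example.

```python
class TypedDictLike:
    """Hypothetical stand-in that accepts an extra_values class keyword."""

    def __init_subclass__(cls, extra_values=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.extra_values = extra_values


# Unquoted forward reference: the keyword argument is evaluated eagerly,
# so this fails because Node does not exist yet.
try:
    class Tree(TypedDictLike, extra_values=Node):
        name: str
except NameError as exc:
    error_message = str(exc)

# Quoting the forward reference avoids the eager lookup.
class Tree(TypedDictLike, extra_values="Node"):
    name: str


class Node(TypedDictLike):
    label: str
```

A type checker would then have to treat the string passed as the keyword argument as a type expression.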

PIG208 · September 29, 2024, 7:15pm

I’m not sure it is worthwhile to add support only for closed=True. The early adopters [1] [2] of the experimental implementations, while a small sample, all seem to use __extra_keys__ already. This contradicts what I assumed in Supporting TypedDict(extra=type).

I would prefer going with Option 4 if using a type expression there is no longer the main concern. The issues with Option 1 seem rooted in the inelegance of making an otherwise regular key special. In contrast, the issues with Option 4 seem to be evolving in a positive direction (glad to see all the progress on the typing spec!).

Regarding the remaining issue with forward references, I believe the concern is that it adds a burden on type checkers, right? String literals need to be special-cased for extra_values, and this behavior (PEP 563) will not be dropped until the EOL of Python 3.13 in 2029-10.

I’m open to exploring Option 2, but at the moment Option 4 seems more practical.

mikeshardmind · September 30, 2024, 5:19pm

Of these, options 2 and 4 seem the most viable. Option 4 seems the most straightforward. Option 2 invites overloads on __getitem__ with literals, which I would rather not let users define at this time. I think it would be better for function composition in the long term if, at a later date when more important things are handled, type checkers appropriately synthesized this and a few other TypedDict methods, so that passing bound methods around as callbacks was visibly type-safe to type checkers.

rchen152 · October 1, 2024, 12:11am

Of the options Eric presented, I find Option 4 the nicest-looking, but I think we’d need to think more about what the various combinations of closed and extra_values mean, and which one(s) preserve the existing sort-of-closed-sort-of-not TypedDict behavior (which I find quite ugly, but I assume we need to keep it around for backwards compatibility).

With the __extra_items__ option, the expected behavior is pretty clear:

Definition	Behavior
closed=True, no `__extra_items__`	Fully closed TypedDict
closed=True, `__extra_items__` present	Closed TypedDict with typed extra items
closed=False	Existing TypedDict behavior

With extra_values, there are two options that feel sensible to me:

We could allow extra_values only when closed=True. Then Option 4 maps in an obvious way onto Option 1.
We could get rid of closed altogether, and have extra_values behave as follows:

class Movie(TypedDict, extra_values=Never):
  # Fully closed TypedDict
  ...

class Movie(TypedDict, extra_values=SomeType):
  # Closed TypedDict with typed extra items
  ...

class Movie(TypedDict):
  # Existing TypedDict behavior
  ...

PIG208 · October 2, 2024, 5:31am

Thanks! That would be reasonable. The closed idea was proposed earlier in this thread. Let’s recount some of the issues closed was intended to resolve.

It allows us to use the special __extra__ key on a TypedDict – that’s no longer an issue if we use extra_items/extra_values instead.

Discoverability was one of the motivations for closed too; that would be a non-issue for extra_items/extra_values.

Another previous concern that closed attempted to resolve was covering the simple use case with a simpler syntax. This still applies to extra_items/extra_values.

For use cases where no extra values need to be specified, closed=True might look simpler than extra_values=Never to some. We don’t have enough data to tell whether the “simple” case will be common enough to make closed=True more favorable, but I agree that combining closed and extra_values comes with its own flavor of mental overhead.

I will work on a revision later this week spec’ing Option 4 without closed, because most of the issues closed addressed aren’t there anymore.

JukkaL · October 2, 2024, 8:33pm

I find the extra_values= option my least favorite, especially if we believe that extra_values=Never is the most common case (which sounds likely to me).

First, it requires an extra import of Never from typing, so the common case looks like this:

from typing import TypedDict, Never

class Movie(TypedDict, extra_values=Never):
    ...

I think this is significantly more verbose and less clean-looking compared to closed=True:

from typing import TypedDict

class Movie(TypedDict, closed=True):
    ...

Second, closed=True aligns well with the existing total=False flag, which arguably improves consistency and makes this easier to learn and remember:

class Movie(TypedDict, total=False):
    ...

Third, the use of Never feels a bit too clever. Never is a fairly specialized concept, and I bet most typing users rarely use it. Also, I’d argue that in this use case it’s used in a somewhat unusual way, so looking up the current definition of Never in the docs might not clear things up. We might need to explain the use case in the documentation entry for Never, which feels less than ideal.

I think __extra_items__ would be okay, even if verbose, since this is the less common use case. I think I’d slightly prefer using _: <type> due to not needing to invent a new name, and similarity with the match statement.

Jelle · October 2, 2024, 8:38pm

I agree that closed=True is likely the most common use case and requiring the use of Never for it is a bit obscure.

What if we use the following behavior?

closed=True: no extra items allowed
extra_items=T: extra items allowed but must be of type T
closed=True, extra_items=T: type checker error: cannot combine closed=True with extra_items=
Neither: arbitrary extra keys may be present (current behavior)

Using _: <type> for the extra items would conflict with TypedDicts that use _ as a key (which feels a lot more likely than __extra_items__).
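
The four combinations of closed and extra_items listed above can be sketched as a small decision function. This is only an illustration of the proposed rules, not how any type checker implements them; the function name classify and the _UNSET sentinel are assumptions made for the example.

```python
_UNSET = object()  # hypothetical sentinel meaning "extra_items was not passed"


def classify(closed: bool = False, extra_items: object = _UNSET) -> str:
    """Describe how a TypedDict definition would be treated under the proposal."""
    if closed and extra_items is not _UNSET:
        # A type checker would report this combination as an error.
        raise TypeError("cannot combine closed=True with extra_items=")
    if closed:
        return "no extra items allowed"
    if extra_items is not _UNSET:
        return "extra items allowed, all of the given type"
    return "arbitrary extra keys may be present (current behavior)"
```

For example, classify(closed=True) describes the fully closed case, while classify(closed=True, extra_items=bool) raises.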

Daverball · October 2, 2024, 9:02pm

I feel that the impact of this potential conflict is overstated. It seems rather unlikely that you’d actually both allow extra keys and need there to be a _ key. And there is a workaround in the rare case that you actually do need it.

The workaround doesn’t seem any worse to me than being forced to use the functional syntax for keys you can’t spell with the class syntax like the far more commonly used class key. I’d rather have a clear and concise syntax that has a parallel in the match statement, than sacrifice ergonomics for some edge case.

That being said, I like your proposal as well. I prefer your proposal over the __extra_items__ key, but I would slightly prefer the _ key over the extra_items class parameter.

rchen152 · October 3, 2024, 1:30am

This seems pretty reasonable to me. I agree that closed=True is more readable than extra_items=Never.

As a side note, the more I look at it, the less I like the combination of closed=True with extra items, simply because it conflicts with my intuition of what a “closed” class should be. (Surely if a class is closed, all its attributes should be statically known?) So I like that closed and extra_items are mutually exclusive in this proposal.

jamestwebber · October 3, 2024, 1:35am

Another possibility there: if closed=True then you can’t add extra items, so their type is irrelevant. In that case, writing the combination could just be a linter rule (“this argument is pointless”) rather than an error.

JukkaL · October 3, 2024, 10:45am

That’s fair, though _ as a key is still probably quite rare. Your proposal seems pretty reasonable.

The issue with key conflicts got me thinking about a more general solution to work around key naming limitations. We already don’t support all possible keys when using the class-based syntax, and thus we probably can’t deprecate the functional syntax. What if we added a new way of specifying arbitrary string keys using the class-based syntax? Here’s one idea:

class Foo(TypedDict):
    name: str  # Regular item
    _: bool    # Type of extra items
    __items__ = {
        "_": int,   # Literal "_" as a key
        "class": str,  # Keyword as a key
        "tricky.name?": float,  # Arbitrary str key
    }

(The name __items__ is just the first thing that came to my mind – the specifics of the name aren’t important.)

This may go beyond the original goals of the PEP, but this would have some nice properties:

Arbitrary key names can be supported using the class-based syntax.
We can hopefully deprecate the functional TypedDict syntax.
We can support forward references in the extra item type without escaping.
All TypedDict items (including the extra ones) are defined using a similar syntax in the common case where __items__ is not needed.
We have the option of adding arbitrary magic key names such as _, since name conflicts can be worked around easily.

This would make the proposal a bit bigger, but on the other hand, we could deprecate the functional syntax, so arguably this would simplify the overall non-deprecated TypedDict functionality.

PIG208 · October 5, 2024, 6:40am

I like the __items__ concept and its potential uses such as replacing the functional syntax. But I feel that feature probably deserves a separate discussion thread and a PEP.

Making closed and extra_items incompatible as class parameters seems most viable right now.

JakobStadler · October 5, 2024, 7:45pm

Couldn’t __items__ be used for extra typing information too?

class Foo(TypedDict):
    name: str  # Regular item
    __items__ = {
        str: bool,  # like "fallback to dict[str, bool] for extra keys",
        "__items__": str,  # as key in dict
        "class": str,  # Keyword as a key
        "tricky.name?": float,  # Arbitrary str key
    }

PIG208 · October 19, 2024, 2:52am

The PEP has been updated to specify the closed and extra_items=T proposal.

closed works a bit like total as it only allows a literal True or False. The value of closed itself is not inherited, but it does implicitly set extra_items=Never. This makes it an error for one to subclass a closed=True TypedDict without explicitly setting closed=True again.

extra_items is quite similar to __extra_items__, except that it is not compatible with closed on the same TypedDict definition.

The revised proposal has an open issues section because I think there is some interest in the __items__ idea, or some other ones, but there are some concerns around them:

Quoting Jelle’s comment:

I feel this isn’t a strong argument; if this PEP is accepted, we’ll be stuck with its behavior for at least many years, so we need to make sure we get it right the first time.
While this proposal is nice because it also unlocks some other things that are currently awkward or impossible (keys that aren’t valid identifiers), it has a few disadvantages:
It's less apparent to a reader that _: bool makes the TypedDict special, relative to adding a class argument like extra_items=bool.
It's backwards incompatible with existing TypedDicts using the _: bool key. While such users have a way to get around the issue, it's still a problem for them if they upgrade Python (or typing-extensions).
The types don't appear in an annotation context, so their evaluation will not be deferred.

I agree that _: bool might not be apparent. Its similarity to match statements and brevity are appealing, but TypedDict lacks a case keyword that makes it stand out from regular keys.

I think reintroducing the closed class argument from the __extra_items__ proposal would help with the backwards compatibility issue, but that seems less nice when you can already write _ as a regular key with __items__.

erictraut · October 23, 2024, 12:58am

I’ve published pyright 1.1.386, which contains support for the revised draft of PEP 728.

You’ll need to set “enableExperimentalFeatures” to true in the pyright config.

Here’s an example in the pyright playground.

carljm · November 3, 2024, 5:02pm

I have a different concern about PEP 728. ^[1] I think that this section does not adequately consider the soundness reasons for the restrictions on index signatures in TypeScript that the PEP proposes to discard.

Unlike index signatures, the PEP proposes that the “extra items” type does not need to be assignable from the types of known keys. But this makes __setitem__ on a TypedDict unsound: ^[2]

class Movie(TypedDict, closed=True):
    name: str
    __extra_items__: bool

def set_movie_metadata(movie: Movie, key: str, value: bool):
    movie[key] = value

movie = Movie({"name": "Monty Python's Life of Brian"})

set_movie_metadata(movie, "name", False)  # no type error!

# oops! now movie["name"] is a bool, not a string

The problem here is that Literal["name"] is assignable to str, which means we can have a variable typed as str whose runtime value is actually "name". This means if we are setting a value for a key that is typed as an arbitrary string, we may be setting a value for an extra item, or we may be setting a value for a known item – a type checker has no way to know.

The PEP links this issue on the TypeScript issue tracker to suggest that this limitation was a mistake in TypeScript, and thus we should not copy it. But I think this is a mis-reading of that issue. There doesn’t appear to be serious consideration in that issue of simply lifting the limitation and allowing the unsoundness. The latest proposal instead is to use negation types to allow setting an “extra” key only when the key is of type str - Literal["name"] – that is, it’s a string that the type checker knows cannot be the string "name" (you could imagine that an if key != "name": narrowing check were added to set_movie_metadata above, to make it safe.) But without subtraction types in the Python type system, it isn’t possible for type checkers to enforce this.
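
The `if key != "name"` guard mentioned above can be sketched as a runtime check. This is a minimal illustration only: it uses a plain TypedDict without extra items, the helper name set_extra is hypothetical, and the check happens at runtime, so it is not something a type checker could verify without negation types.

```python
from typing import TypedDict


class Movie(TypedDict):
    name: str


# Keys declared on the TypedDict, read from its runtime introspection attributes.
KNOWN_KEYS = Movie.__required_keys__ | Movie.__optional_keys__


def set_extra(movie: dict[str, object], key: str, value: bool) -> None:
    """Set an extra item, refusing to overwrite a declared key."""
    if key in KNOWN_KEYS:
        raise KeyError(f"{key!r} is a declared key, not an extra item")
    movie[key] = value


movie: dict[str, object] = {"name": "Monty Python's Life of Brian"}
set_extra(movie, "is_comedy", True)  # fine: not a declared key
```

Calling set_extra(movie, "name", False) raises instead of silently replacing the string with a bool.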

I don’t feel comfortable with introducing this hole. I would prefer if the PEP did require, like TypeScript, that the type of __extra_items__ must be assignable from the types of all known keys.

EDIT: Oops, that restriction still doesn’t close the hole; if known keys can have a narrower type than extra-items, the hole demonstrated above still exists. The TypeScript restriction is only sufficient to make __getitem__ safe from this problem, not __setitem__. I’m not actually sure how the extra-items feature can be made safe at all, without intersection and subtraction types.

I saw some discussion suggesting that the most useful feature in the PEP is closed – perhaps it would be more advisable to add only that feature, for now?

^[1] Apologies if this has been previously discussed; prior discussions of the PEP are quite lengthy to read through in full; I’m relying here on the intended property of PEP discussions that outcomes of substantial discussion and concerns raised should be reflected in the PEP text!
^[2] Hat tip to @samwgoldman for bringing this issue to my attention.

Tinche · November 3, 2024, 6:16pm

Could the type checkers synthesize a number of overloads for __setitem__?
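
A rough sketch of what such synthesized overloads might look like, written out by hand on a dict subclass. The class name MovieLike and the helper set_flag are hypothetical, and this is not how any type checker models TypedDict today.

```python
from typing import Literal, overload


class MovieLike(dict):
    """Hand-written stand-in for a TypedDict with name: str and bool extra items."""

    @overload
    def __setitem__(self, key: Literal["name"], value: str) -> None: ...
    @overload
    def __setitem__(self, key: str, value: bool) -> None: ...

    def __setitem__(self, key, value):
        super().__setitem__(key, value)


def set_flag(movie: MovieLike, key: str, value: bool) -> None:
    # key is only known to be str here, so the second overload applies.
    movie[key] = value


movie = MovieLike(name="Monty Python's Life of Brian")
set_flag(movie, "name", False)  # still accepted under these overloads
```

Note that a key typed as plain str matches the second overload, so the set_movie_metadata example above would still be accepted with overloads of this shape.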