Yes, that’s what they will have to do regardless (at least in effect), but I don’t think it solves the problem, it just reframes it into a more general problem with overlapping overloads.
The overloads a type checker would have to synthesize for __getitem__ in the case where the extra items type is not assignable from a known-key type would violate the current type checker rules about return types from overlapping overloads.
The overloads a type checker would synthesize for __setitem__ would not violate any overload rules enforced by current type checkers, but would exploit the fact that the currently enforced rules are not sound with overlapping types in multi-argument overloads.
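To make this concrete, here is a sketch of the overloads in question. The `_MovieLike` class is my own illustration (a plain `dict` subclass standing in for a hypothetical PEP 728 TypedDict with `name: str` and an extra-items type of `int`), not anything proposed in the PEP:

```python
from typing import Literal, overload


# Stand-in for a TypedDict with `name: str` and extra_items=int.
class _MovieLike(dict):
    @overload
    def __getitem__(self, key: Literal["name"]) -> str: ...
    @overload
    def __getitem__(self, key: str) -> int: ...
    # The second overload fully overlaps the first (Literal["name"] is a
    # subtype of str) with an incompatible return type -- exactly what the
    # current overlapping-overload rules forbid.
    def __getitem__(self, key: str) -> "str | int":
        return super().__getitem__(key)

    @overload
    def __setitem__(self, key: Literal["name"], value: str) -> None: ...
    @overload
    def __setitem__(self, key: str, value: int) -> None: ...
    # These overloads pass today's checks, but are unsound: a call site where
    # the key is typed as plain str matches the second overload even when the
    # key is "name" at runtime, writing an int where a str is declared.
    def __setitem__(self, key: str, value: "str | int") -> None:
        super().__setitem__(key, value)


m = _MovieLike({"name": "Blade Runner"})
m["year"] = 1982  # an extra item of type int: fine at runtime
```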
It’s arguably already a problem that our overload rules are not sound, but it’s IMO more of a problem to specify a new type system feature in a way that requires type checkers to synthesize unsound overloads (and even break the existing overload checking rules that we do have.)
Would it be possible for type checkers to synthesize a special form ExtraKeys[TD] that excludes the known keys on TD? However, I wonder whether this works with inheritance, because subclasses can add new keys.
It’s possible to imagine adding support for a narrowly focused special subtraction type for this purpose. But the harder part of the problem is that for this type to be usable, there must be some way for a user to write code that narrows a string type to this ExtraKeys[TD] type. Without generalized subtraction types, this part would also have to be quite special cased. Something like if mykey not in TD.statically_known_keys(), which would do the right check at runtime and be understood by type checkers. And then it gets more tricky with inheritance…
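The runtime half of a hypothetical `TD.statically_known_keys()` check can already be approximated today, using the introspection attributes TypedDict classes expose. The `is_extra_key` helper below is my own illustration, not a proposed API; the hard part remains teaching a type checker to narrow on it, and keeping that correct under inheritance:

```python
from typing import TypedDict


class TD(TypedDict):
    a: int


def is_extra_key(key: str) -> bool:
    # The statically known keys of TD are available at runtime as frozensets.
    # A checker would additionally need to narrow `key` to the hypothetical
    # ExtraKeys[TD] type in the True branch -- that is the missing piece.
    known = TD.__required_keys__ | TD.__optional_keys__
    return key not in known
```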
I think one way to define the extra items feature soundly would be to require a read only type for extra-items (so __setitem__ is no longer a problem) and introduce the TypedScript restriction (extra items type must be assignable from all known-key types) which makes __getitem__ sound.
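The soundness condition can be illustrated with plain code (the `get_any_key` helper and the `object`-typed return are my own illustration, with `object` standing in for the extra-items type):

```python
from typing import TypedDict


class Movie(TypedDict):
    name: str
    year: int


def get_any_key(m: Movie, key: str) -> object:
    # Sound only because `object` is assignable from both str and int.
    # If the extra-items type were, say, bool, no single return type would
    # be correct for an unknown str key that might name "name" or "year".
    return m[key]  # type: ignore[literal-required]
```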
For context, pyright doesn’t synthesize any overloads for a TypedDict __setitem__ or __getitem__ method. From what I can tell, mypy does not either.
Pyright implements indexed TypedDict accesses and assignments using special logic. While I’d normally prefer to implement such a feature by synthesizing overloads, an alternative approach was required to handle advanced type narrowing scenarios like the one below. This approach also allows for better (more informative) error messages.
```python
class Movie(TypedDict):
    name: str
    year: NotRequired[int]

def func1(m: Movie):
    print(m["name"])  # OK
    print(m["year"])  # Type error (pyright); OK (mypy)

    if "year" in m:
        print(m["year"])  # OK; "year" is known to be present
    else:
        m["year"] = 1981
        print(m["year"])  # OK: "year" is known to be present
```
I’ll note that mypy doesn’t emit an error for the above condition, but this feature has been requested and heavily upvoted.
Pyright currently allows writing a value to an indexed TypedDict when the key value is not specifically known (e.g. if the key is a str or Any), whereas mypy emits an error in this case.
```python
def func2(m: Movie, key: str):
    m[key] = 1981  # OK (pyright); Type error (mypy)
```
Pyright’s current behavior here admittedly permits some unsoundness. I opted for this behavior because it allows for some common (safe) patterns that involve iterating over keys in a TypedDict, whereas mypy complains about these usages. I loosened the check after fielding complaints from numerous pyright users and seeing similar complaints in the mypy issue tracker. I could be convinced to reverse this decision and once again tighten up the check, especially if the spec were to mandate this stricter behavior.
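The unsoundness being conceded can be demonstrated concretely (this snippet is my own illustration of the failure mode, not from either tool's documentation):

```python
from typing import TypedDict


class Movie(TypedDict):
    name: str


def set_unknown_key(m: Movie, key: str) -> None:
    # With arbitrary str keys allowed, nothing stops `key` from being a
    # known key at runtime, writing a value of the wrong type.
    m[key] = 1981  # type: ignore[literal-required]  (this is mypy's error)


m: Movie = {"name": "Blade Runner"}
set_unknown_key(m, "name")
# m["name"] now holds an int, violating Movie's declared `name: str`.
```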
My meta-point is that I don’t think it’s necessary to place additional limitations on PEP 728 to make it type safe. A type checker does not need to implement its handling of indexed accesses and assignments by synthesizing overloaded __getitem__ and __setitem__ methods. There are ways to implement this in a type-safe manner using custom logic that confirms whether the key is a string literal that doesn’t map to any known keys. This is effectively a limited form of type subtraction, as Carl said.
Based on the arguments I’ve heard so far, I’m not in favor of restricting extra_items to a supertype of all known items. I’m also not in favor of limiting extra_items to be read only. I’m not convinced those limitations are desirable or necessary to guarantee type correctness.
Currently the PEP text does an excellent and thorough job of specifying various behaviors of a TypedDict with extra_items, in terms of inheritance and assignability. But it doesn’t currently contain any discussion of the behavior of __setitem__ or __getitem__ on a TypedDict with extra_items. Clearly assumptions can vary here; I assumed this meant it was implicitly proposed that arbitrary str keys would be allowed and assumed to refer to extra_items, which is not sound. I agree that if the intent is to preserve the current text in the TypedDict spec that “A key that is not a literal should generally be rejected”, then there is not a soundness problem; just a potential usability one.
I think the intended behavior here should be specified or at least the tradeoffs discussed in the PEP.
In particular, it makes it very easy to end up with a TypedDict which contains values for keys that are not of the annotated type for that key. This is highlighted in the current typing spec text for TypedDict as one of three things that type checkers should prioritize preventing.
It seems to me that the existing TypedDict specification already puts type checkers in a position where, in order to achieve usability, they have to choose between significant unsoundness (by “significant” here I mean “not some highly-contrived corner case, but rather very likely to lead to false negatives in real code that miss real typing problems”) or implementing quite-sophisticated type narrowing schemes for string literals.
I worry that allowing typed extra_items, by making it even more likely that people will want to index into TypedDicts with keys typed as str, will further increase this pressure, and extend the need for “sophisticated type narrowing” even further into type concepts that haven’t been specified or carefully examined, such as subtraction types.
I view mypy as having the more correct behavior here, but I think both type checkers are wrong.

At the first diverging behavior, pyright errors for a missing key, but exceptions are not part of the type system. If the key is present, the value has the correct type; if the access fails, the code that follows is unreachable.

With the `in` check, both are wrong to narrow based on the presence of something the type allows to be removed (though this shouldn’t be an error without the narrowing). As written, this is a time-of-check vs. time-of-use bug. The same applies after the possible assignment: both in that it shouldn’t narrow and shouldn’t error.

Actually synthesizing the correct overloads here would have led both type checkers to get this right.
I think this is quite debatable. We’ve chosen to make possibly-nonexistent keys a part of the type of a TypedDict (by adding NotRequired), and it’s perfectly reasonable to say that accessing a key which may not exist is a type error, just as it’s a type error to access a nonexistent attribute of an instance.
It is true that many narrowings done by both pyright and mypy are not sound in the sense that a mutation between check and use could render the narrowing incorrect. If we aim for only narrowings that are definitely sound in this way, basically only the types of local variables can be narrowed. It’s not clear that this leads to an overall more usable type system; it means the type checker will reject many bug-free programs. (I think I’ve mentioned before that Pyre tried at one time to be strict about this, and rolled it back because the overall effect on the codebase and the developer experience was not positive.)
I don’t think that tracks. The closest supported parallel in attribute access would be a property that returns SomeType | Never, and that’s something type checkers don’t error for. This isn’t equivalent to a union of two TypedDicts, one where the key is present and one where it is not; otherwise assignment to NotRequired values would never be valid and couldn’t work.
I don’t think it’s serving users as-is.
This is more idiomatically written:
```python
print(m.get("year", 1981))
```
and the special cased narrowing on __contains__ here is no more magical than:
```python
try:
    value = m["year"]  # type checkers shouldn't error here if we're special-casing dict knowledge
except KeyError:
    value = 1981
print(value)
```
which type checkers reject, despite it not actually being bug-prone. The try/except form is also the most idiomatic way of writing this when computing the default is expensive and therefore not appropriate for use with .get.
Never is just the empty type; T | Never == T for all types T. A property returning SomeType | Never is identical to a property returning SomeType, it’s not a representation of a maybe-defined attribute.
NotRequired is not part of the value type for a key; it’s part of the type of the TypedDict itself.
There is currently no parallel to NotRequired for attribute access, because we haven’t chosen to add a representation for “possibly-defined attribute” to the type system. There has been discussion of a way to represent this in the context of possibly-not-initialized or delayed-initialized attributes. If we did add a representation of maybe-undefined attributes, the main purpose of doing so would be so that type checkers could error if they are accessed when possibly not initialized.
I agree that m.get("year", 1981) is usually preferable to if "year" in m: ..., but either of those patterns will silence the error about a maybe-not-existent key, so the error itself doesn’t encourage one over the other. If there’s an argument here, it would be an argument for not narrowing on if "year" in m:, not an argument against the error.
I also agree that some useful exception-catching patterns don’t work nicely with type checking, because the type system doesn’t model exceptions. But this isn’t specific to dictionary key errors; you can pick almost any type error that type checkers complain about and write a similar example of “safe” try/except code that triggers that error and then immediately catches the raised exception.
The `__contains__` narrowing is special-cased though, right? Why do type checkers special-case the version that leads people to a concurrency issue rather than the one that doesn’t?
This isn’t fully true though. Type checkers do partially model control flow around exceptions, in many cases determining that a value is unbound or that code is unreachable. There’s even a type system construct for this with typing.assert_never. Asking that they understand what happens at runtime when a key-value pair is missing, when there’s a type system construct for a possibly missing key-value pair, isn’t a new idea. Pyright used to do this, but then removed it, leaving the objectively worse `key in dict` pattern.
I actually think the special case isn’t terrible here; the problem is the manner in which it is special-cased, because it leads to less correct programs (not just in type theory) if a type checker pushes a user toward `key in dict` for an expensive default rather than try/except.
Yes, agreed. This is different from modeling exception-catching as silencing a type error within the try block. Not to say that the latter isn’t possible, just that no type checkers so far have chosen to do it, that I know of. Though your comment below suggests maybe pyright used to do it, unless I misunderstand? I wasn’t aware of that; if that’s the case probably Eric would know why it was removed.
I don’t know what type system construct you are referring to here, or what precisely pyright used to do but then removed. Can you be a bit more explicit?
Given that neither the narrowing nor the lack of modeling exception-catching is specified or has been explicitly discussed in a PEP or a spec change that I know of, I think the only answer I can give here is “because the user demand vs the implementation complexity tradeoff seems to have worked out this way so far for the authors of existing type checkers.”
The type system construct for a possibly missing key-value pair is NotRequired. Mypy and pyright previously didn’t error on the correct try/except handling of this, but removed that behavior at some point.
As a general principle, I don’t think that a type checker should suppress errors because the code is located in a try block. Doing so can lead to false negatives because exceptions are not modeled in the type system. This becomes an issue only if you (incorrectly, IMO) use exceptions for normal (non-exceptional) code flow conditions.
Yeah, that’s what I was referring to in terms of prior support then removing it.
While I generally agree with you about exceptions, optional structural types in particular are problematic with concurrency, and .get or try/except are the only correct options. As Python’s dicts don’t have a .get_or_else(key, lazy_closure), implementing the correct behavior lazily with expensive defaults while satisfying a type checker pushes users in the wrong direction.
I agree with this concern. I agree that this comes from the PEP not discussing type narrowing in general. As you mentioned here:
While drafting the PEP, I was more inclined toward allowing indexing with string literals, even if extra_items is specified. I think we can’t express that with __setitem__ and __getitem__ overrides soundly at this time, and the special handling needed for this would be a tradeoff.
We should mention that supporting more sophisticated type narrowing with arbitrary keys is not a goal of the PEP. Adding that to a new section on type narrowing will help set the expectation for both type checkers and users.
(Sidenote: I think the guards that work with named NotRequired items should also work with extra_items, but that will be up to type checkers that already implement such checks)
Yes. Looking at Type narrowing — typing documentation, it should be beneficial to specify type narrowing behavior. This PEP doesn’t intend to extend or limit how type narrowing works.
There are a couple things that I find confusing in this PEP:
The motivation section states that a TypedDict can have extra items that are not visible through its type. The way I naively understand this sentence is that the following should be allowed:
```python
class TD(TypedDict):
    a: int

td: TD = {'a': 1, 'extra': True}
```
By reading this section of the spec, you better understand the assignability rules, so the PEP section should probably link to this part of the spec. Just to be sure I understand correctly: the fact that the above snippet raises an error is only because type checkers special-case TypedDict assignments? Because with normal classes, this works:
```python
class A: pass
class B(A): pass

a: A = B()
```
Alternatively, the PEP could emphasize the fact that closed typed dictionaries are only relevant in contexts where assignability of arguments to parameters is involved (as “simple” assignments (a: <typ> = value) seem to be special-cased, as I mention above), e.g.:
```python
class TD1(TypedDict):
    a: int

class TD2(TypedDict):
    a: int
    b: str

def func(a: TD1): pass

td2: TD2 = ...
func(td2)  # OK
```
It is stated that closed=True is equivalent to extra_items=Never. Does this mean that closed=False (the default) is equivalent to extra_items=object? This is currently not the case (playground):
```python
class TD(TypedDict):
    a: int

class TDExtra(TypedDict, extra_items=object):
    a: int

td1: TD = {'a': 1, 'extra': True}       # Error
td2: TDExtra = {'a': 1, 'extra': True}  # OK
```
One could think that @final can be used to mark a TypedDict as closed. It might be worth mentioning why this is not the case (does not work with the functional syntax, and as the spec example shows, you don’t need to explicitly create subclasses to have the assignability rules applied).
A question about inheritance. The PEP allows inheritance of a closed TypedDict.
```python
from typing_extensions import TypedDict

class HasX(TypedDict, closed=True):
    x: int

class HasXY(HasX):
    x: int
    y: int

has_xy: HasXY = {"x": 1, "y": 2}
has_x: HasX = has_xy  # allowed by pyright
```
Should this assignment be allowed?
If yes, then it’s not clear what closed does compared to a normal TypedDict – has_x is a HasX with keys other than x.
If no, then it can be unintuitive that subclass instances are not assignable to superclass instances. (But maybe that’s fine?)
I think you misread the PEP. closed=True is equivalent to extra_items=Never. If you then apply the rules for what you’re allowed to do in subclasses defined here it naturally follows that your example is disallowed[1].
And in fact if you try your example with pyright’s experimental support for PEP-728, then both your subclass and the assignment are marked as errors:
```python
from typing import TypedDict

class HasX(TypedDict, closed=True):
    x: int

class HasXY(HasX):  # error
    x: int
    y: int

has_xy: HasXY = {"x": 1, "y": 2}
has_x: HasX = has_xy  # error
```
The assignment and the subclass are only allowed with the old non-closed TypedDict.
Ah, sorry, I didn’t realize you have to enable an extra feature in Pyright. I was confused because it allowed closed=True as a kwarg (whereas it complains if you provide an unknown kwarg like foo=True)