Treat TypedDict as structured types w.r.t subclassing

ipriven · January 19, 2024, 5:13am

TypedDict already functions as a structural type w.r.t. assigning dicts:

from typing import TypedDict

class T1(TypedDict):
    foo: int

def f(a: T1) -> None:
    pass

f({'foo': 42})

and other TypedDicts:

class T2(TypedDict):
    foo: int

t1: T1 = T2(foo=42)

However, in one aspect (at least) mypy and pyright differ:

class T3(TypedDict):
    bar: int

u: T1 | T3
if "foo" in u:
    ...

pyright narrows u to T1, whereas mypy keeps u as T1 | T3 because T3 is not marked @final.
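A minimal runnable sketch of the pattern under discussion, assuming the T1 and T3 definitions above. The function name `describe` is hypothetical, and the comments about checker behaviour only restate what this post reports, not verified output from either tool.

```python
from typing import TypedDict, Union


class T1(TypedDict):
    foo: int


class T3(TypedDict):
    bar: int


def describe(u: Union[T1, T3]) -> str:
    if "foo" in u:
        # Reported behaviour: pyright treats u as T1 here, so the subscript
        # is accepted. A checker that keeps u as T1 | T3 would be expected
        # to complain that T3 has no key "foo".
        return f"foo={u['foo']}"
    # Same question in the negative branch: is u known to be T3 here?
    return f"bar={u['bar']}"
```

At runtime both branches work regardless of what the checker infers; the disagreement is purely about the static type of u inside each branch.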

I’d like to discuss amending the TypedDict spec to clarify that we shouldn’t consider class hierarchy when type-matching, since (a) it’s not compatible with treating it as a structural type, and (b) more importantly, it’s practical.

P.S. This policy was discussed in pyright issue #1899; additional rationale here.

erictraut · January 19, 2024, 5:51am

The type consistency rules for TypedDict are already clearly spelled out in the typing spec. Do you have something more in mind?

FWIW, I recently wrote TypedDict conformance tests based on the spec. As you can see from the latest conformance test results summary, both mypy and pyright fully pass the conformance tests when it comes to TypedDict type consistency rules.

You mentioned that the type narrowing behavior between pyright and mypy is different. Type narrowing behavior is not dictated by the typing spec (other than the behavior of TypeGuard), so it’s not surprising that you will see minor differences between type checkers here. This typically isn’t a problem because type narrowing doesn’t affect library interfaces or type stubs. Unless you’re using different type checkers on the same code base (which isn’t typical), the differences in type narrowing behavior shouldn’t be an issue.

ipriven · January 19, 2024, 6:24am

To be frank, I’m just following this suggestion. Do you think narrowing behavior just doesn’t belong in the typing spec?

erictraut · January 19, 2024, 6:36am

I’m interested to hear from @hauntsaninja on that question.

kkirsche · January 19, 2024, 8:01am

This typically isn’t a problem because type narrowing doesn’t affect library interfaces or type stubs. Unless you’re using different type checkers on the same code base (which isn’t typical), the differences in type narrowing behavior shouldn’t be an issue.

I would push back on the claim that it isn’t typical to see different type checkers used on the same code base. Each type checker has distinct strengths and weaknesses, as well as different levels of integration with tools such as IDEs. At least in my anecdotal experience at a large company, most code bases I encounter run against multiple type checkers to ensure consistent behavior regardless of the developer and their environment. This may be atypical of public code bases, but the private environments I have worked in have consistently used multiple type checkers.

hauntsaninja · January 19, 2024, 10:44pm

Thanks for opening the discussion — and apologies, I should have been more explicit about what I’d like to get out of this discussion.

I agree that the most important thing is standardising the semantics of annotated symbols. With that in mind, it’s not currently specified what it should mean for a TypedDict to be marked as @final. See also mypy issue #7981, “final TypedDict”.

If we decide that final TypedDict is not meaningful, I’d merge Ilya’s PR. If we decide it could mean something like “no extra keys” (either by deciding so or leaving it unspecified), then I’d worry more about the small soundness hole in Ilya’s PR, since users could have the expressivity to get the behaviour they desire more soundly. Ilya explains the soundness hole nicely here.

(If you’re missing background here, the discussion on the three linked issues is useful)

In that case, the secondary not-specification-related thing I’m interested in here is seeing what the community thinks of the soundness hole. Ilya pinged me several times on the PR asking for merge, and I usually don’t feel comfortable introducing unsoundness unless the community clearly desires it (and the relevant issues on mypy aren’t currently particularly popular).

To Eric’s general question, narrowing is definitely not nearly as important as standardising what annotations mean. There’s still some value in doing so, and in particular there’s value in consolidating thinking about soundness holes, but I don’t think of narrowing as a priority for the Typing Council.

guido · January 20, 2024, 5:00pm

So it would seem that @final applied to a structural type is just nonsense, right? It might prevent you from subclassing, sure, but it doesn’t prevent you from defining another class that’s completely equivalent, so type checking can’t take its presence into account.
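A small sketch of this point, assuming the spec's structural consistency rules allow a TypedDict with extra keys to be assigned where a TypedDict with fewer keys is expected. The names `Lookalike` and `has_foo` are hypothetical and used only for illustration.

```python
from typing import TypedDict, final


@final
class T3(TypedDict):
    bar: int


# Hypothetical class that does NOT subclass T3 but has every key T3 requires.
class Lookalike(TypedDict):
    bar: int
    foo: str


def has_foo(u: T3) -> bool:
    # If @final on T3 were taken to mean "no extra keys", a checker could
    # assume this is always False.
    return "foo" in u


# Assumed to be accepted by a checker because Lookalike is structurally
# compatible with T3; @final only forbids `class X(T3)`.
value: T3 = Lookalike(bar=1, foo="surprise")
```

Here `has_foo(value)` is true at runtime even though T3 is final and was never subclassed.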

In the light of this, the whole “in” narrowing business seems wrong too.

ipriven · January 21, 2024, 3:04am

I’m also inclined to think that @final is alien to structural typing. The things it allows a developer to express ultimately mislead developers’ intuition about what structural typing is and isn’t.

In the light of this, the whole “in” narrowing business seems wrong too.

Given

class T1(TypedDict):
  spam: int
class T2(TypedDict):
  ham: int

u: T1 | T2

then "spam" in u can either

(1) keep u: T1 | T2
(2) narrow u: T1
(3) expand u: (T1 | T2) & {"spam": object}

If we pick (1), the tradeoff is that users get no idiomatic way to discriminate, say, Response | Error. I don’t have stats (didn’t notice an awful number of mypy tickets) so this is very anecdotal. Personally I often rely on this form of narrowing in TypeScript (where “shapes” are much more common than classes).

If we pick (2), then it creates a hole:

class T3(T2):
  spam: str

if "spam" in u:
  print(u["spam"] + 42)  # could be TypeError if `spam` is `str`

If we pick (3), it seems sound but not very useful.
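The hole in option (2) can be reproduced at runtime. This is a sketch under the assumption that a checker narrows u to T1 inside the `in` branch; the helper names `add_42` and `demo` are hypothetical.

```python
from typing import TypedDict, Union


class T1(TypedDict):
    spam: int


class T2(TypedDict):
    ham: int


class T3(T2):
    spam: str  # same key as T1.spam, different value type


def add_42(u: Union[T1, T2]) -> int:
    if "spam" in u:
        # Under option (2), u is narrowed to T1, so u["spam"] is typed as int.
        return u["spam"] + 42
    return 0


def demo() -> str:
    # A T3 is a valid T2, so it may be passed where T1 | T2 is expected.
    u: T2 = T3(ham=1, spam="eggs")
    try:
        add_42(u)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "no error"
```

Calling `demo()` reaches the `str + int` addition and reports a TypeError, which is the failure the narrowed type would have hidden from the checker.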

Perhaps there could be a middle ground with a feature that’s a “field-level final”, e.g.

class T2(TypedDict):
  ham: str
  spam: Never

Right now it, alas, does nothing.

the relevant issues on mypy aren’t currently particularly popular

Yes, for some reason I imagined there were more users stumbling into this.

guido · January 21, 2024, 4:46am

This is your key use case. For the longest time my response to this has been that the preferred idiom is to use regular classes instead of TypedDict, and if your input is a bunch of JSON you should use some kind of schema-based library that turns a bunch of nested JSON dicts into a bunch of nested regular classes.

I still personally prefer that (except for quick prototypes or throwaway code, where the issue of idiomatically discriminating with support of the type checker is less of a concern), but I understand that there are good reasons why people prefer to use raw TypedDicts even in production code. For example, few things are faster than raw JSON, so the conversion to regular classes is likely to add some overhead that’s hard to gain back in other ways, even if manipulating regular class instances is faster than manipulating raw dicts. (Do note that starting with 3.11 we’re improving the performance of regular class instances relative to dicts.)

Perhaps your guess is right and this is just not something that people are doing a lot with TypedDicts.

Perhaps the most idiomatic way to discriminate between e.g. Response | Error is to use a type guard (PEP 647). You could write type guards

from typing import TypeGuard

def is_response(x: Response | Error) -> TypeGuard[Response]:
    return "resp" in x

def is_error(x: Response | Error) -> TypeGuard[Error]:
    return "err" in x

and then use those in the rest of your code. It’s idiomatic, and sound. And Eric is working on an extension of the concept so that in the else clause, where "resp" in x is false, the checker may conclude that x is not a Response.

While more verbose, this seems better to me than living with an unsound solution.

(PS: There seem to be some typos in your latest message. Case (2) probably should be u: T1, and later you print u.spam which should be u["spam"].)

ipriven · January 21, 2024, 5:25am

Dedicated type guards are indeed usable for discriminating between top-level concepts like “response” and “error”. I suppose when you discriminate often and among domain-specific models (e.g. BookReview having a book vs MovieReview having a movie), a parallel hierarchy of type guards can get unsightly. At that point, team members would challenge you on whether remembering to import and use is_book_review is that much better than a cast (or kindly ask you to throw out your fancy TypedDicts and bring back the dicts). Narrowing with a language feature you ought to use anyway (to avoid a runtime error) is zero-cost from that perspective.

p.s. thanks for noticing the typos - fixed!

guido · January 21, 2024, 6:40am

Yeah, the semantics just don’t cooperate.