Structural subtyping, EAFP, and exceptions which inform structure

mikeshardmind · May 23, 2024, 2:42pm

Up until recently, mypy and pyright both allowed try/except/KeyError use with partial typed dicts.

from typing import TypedDict
class A(TypedDict, total=False):
    test: str

def get_test(a: A) -> str:
    try:
        value = a['test'] # error here
    except KeyError:
        return 'ok'
    else:
        return f'received: {value}'

As of a recent change, pyright no longer allows this.

This has led to divergence in what is allowed between type checkers. I’d appreciate it if we could decide whether this should be allowed and place the decision into the specification, so that type checker behavior doesn’t diverge. I believe that type checkers need to understand exceptions used for control flow and how they interact with known types at runtime.

I believe this case should be allowed, as typed dicts are structural, and KeyErrors are a means of detecting a structural mismatch, but that other cases which fit the same pattern should similarly type check as users expect.

Jelle · May 23, 2024, 2:44pm

Are you proposing that type checkers should behave specially when inside an except KeyError, or that using subscripting to access a non-required field should not be an error in general?

If the former, should this apply just to catching KeyError, or should except Exception also silence the error?

mikeshardmind · May 23, 2024, 2:46pm

Within a try block, if a key is optional but its type is known when it exists (as is the case with total=False), the access should be assumed to have that type.

Within an exception block, no special behavior should exist (other exceptions could have gotten there)

Within an else block, the accessed type which was optional should be assumed to exist.

mikeshardmind · May 23, 2024, 2:53pm

The underlying logic here is that with total=False, the signature should approximate __getitem__(self, non_guaranteed_key) -> KnownType | Never

This is sound from a theory point of view as well; it isn’t just special-casing Python’s exception flow (which is part of why I would suggest no special behavior when an exception is caught).
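As a rough illustration of that signature, here is a minimal sketch. PartialA and describe are hypothetical names invented for this example, not anything a type checker defines. Because Never is the bottom type, `str | Never` simplifies to `str`, so the annotation below is just `str`: the lookup either produces a str or does not return at all.

```python
from typing import Literal


class PartialA:
    """Hypothetical stand-in for a total=False TypedDict with one key."""

    def __init__(self, **items: str) -> None:
        self._items = dict(items)

    def __getitem__(self, key: Literal["test"]) -> str:
        # Either returns a str or raises KeyError. The raising path has
        # type Never, and `str | Never` is equivalent to `str`.
        return self._items[key]


def describe(a: PartialA) -> str:
    try:
        value = a["test"]
    except KeyError:
        return "missing"
    # Only reachable when the lookup returned, so `value` is a str.
    return value.upper()
```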

pf_moore · May 23, 2024, 2:55pm

I assume the error you mean is:

c:\Work\Scratch\x.py
  c:\Work\Scratch\x.py:7:17 - error: Could not access item in TypedDict
    "test" is not a required key in "A", so access may result in runtime exception (reportTypedDictNotRequiredAccess)

That error makes no sense to me. Yes, “test” may result in a runtime exception, but I’m deliberately catching that exception, so of course I know that and don’t consider it an error.

If I’d spotted that, I would have reported it as a bug - what’s the logic that says it’s correct to flag it as an error?

mikeshardmind · May 23, 2024, 2:58pm

Sorry, a clarification on the above, but it’s beyond the Discourse edit limit for mailing list mode users: in an else block where KeyError is among the exceptions that would be caught, as well as in the absence of an else block where a caught KeyError makes the code below unreachable (via reraising or returning), the value should then be assumed to have the declared type. It is still not safe to assume the key continues to exist, though, only that the accessed value has that type if it was stored as a local.

erictraut · May 23, 2024, 3:04pm

If we’re going to consider mandating the suppression of certain type checks within a try block (or a with block, which can also suppress exceptions), can we come up with a more general principle? Why would this apply only to KeyError exceptions or only to TypedDict accesses? If you believe in the EAFP philosophy of using exception handling for regular (non-exceptional) control flow, then this should apply to any code that potentially raises exceptions, right?

mikeshardmind · May 23, 2024, 3:14pm

I can work through such a thing. I’m attempting to do so without introducing a reason for typed exceptions (awful…) or algebraic effects (neat, but far beyond the scope of this issue).

I think the general principle is that for types which have a structure that is only partially known, guarded access to that structure should be considered a safe pattern for verifying the structure. I know that’s relatively nebulous, but TypedDict is currently one of only two things I can see this applying to, the other is Iterators, but those already work properly with next and loop control flow incidentally. (the length of iterators is not statically known)

These things should only retain a known type in the “happy path” as that is both the point of EAFP, and we can’t know definitively what caused an exception statically without checked exceptions, algebraic effects, or something else along those lines.
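The iterator case mentioned above can be sketched as follows. The function name first_or_default is made up for illustration. The point is only that the value keeps its known type on the path where next() returned, and nothing is assumed in the handler.

```python
from collections.abc import Iterator


def first_or_default(items: Iterator[int], default: int) -> int:
    try:
        # next() either returns an int or raises StopIteration.
        head = next(items)
    except StopIteration:
        # Nothing is known about `head` here; it was never bound.
        return default
    # Only the happy path reaches this line, so `head` is an int.
    return head
```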

I think what people are likely to encounter and what is easy to have well-defined behavior for are perfectly overlapping and relatively narrow right now, but it could affect things more if other structural types which support a partial structure are ever added. The one example of this that some people want is partial protocols. While I’ve argued against these for other reasons, and think additional composition is the better tool rather than ambiguous partial inclusion, AttributeError would almost certainly be the appropriate caught exception for those if they are added, and should only apply to things that are explicitly listed as optional in the protocol.

I don’t quite believe in it, but it’s unfortunately both idiomatic and the most performant option. I think EAFP is easier to support with algebraic effects (but that’s a bigger dream than HKTs and not even on my radar for Python), and that we likely should keep it limited to things that the type system has a reason to understand already.

mikeshardmind · May 23, 2024, 3:27pm

Oh, and because I didn’t clarify this enough before (I’m sorry), the reason we can’t continue to assume the key exists after accessing it once is super important here. It’s valid to say that something like an LRU dict (as provided by lru-dict on PyPI) is conformant to a partial TypedDict, it’s possible for threading and shared access to have that key be popped, and so on. It’s only valid where we have checked it. Thankfully, most real-world uses are going to store a local.

We only assume validity in the path we can see, and this matches the behavior for Iterators with type checkers and StopIteration. The only difference here is that StopIteration is something we definitively view as a control flow exception, but EAFP is definitely using exceptions as control flow as well.
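A small sketch of why only the stored local is trustworthy. The evict function is a hypothetical stand-in for another thread or a cache eviction, and read_twice is likewise invented for this example.

```python
from typing import TypedDict


class A(TypedDict, total=False):
    test: str


def evict(d: A) -> None:
    # Stand-in for another thread or an LRU eviction removing the key.
    d.pop("test", None)


def read_twice(d: A) -> tuple[str, bool]:
    try:
        value = d["test"]  # checked access, stored as a local
    except KeyError:
        return ("", False)
    evict(d)
    # `value` is still a str, but the key itself may be gone by now.
    return (value, "test" in d)
```

After the eviction the local still holds the string, while a second `d["test"]` would raise again.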

antonagestam · May 23, 2024, 7:06pm

In the trivial case, where there is a single subscript operation in the try block, this seems to make sense, but does it equally make sense in an arbitrarily complex try-block?

(Asking curiously for discussion, I haven’t come up with any concrete counter example)

mikeshardmind · May 23, 2024, 7:26pm

I think it does for arbitrary try/except where the appropriate error (KeyError, in the case of TypedDict) is caught. For example:

non_total_dict: SomeTypedDict  
# type of value for key: "key" is int, but total=False

try:
    # [arbitrary expressions here]
    x = non_total_dict["key"]
    # [arbitrary expressions here]
except KeyError:
    # no assumption about x's type allowed
else:
    # x must be an int

and also:

try:
    # [arbitrary expressions here]
    x = non_total_dict["key"]
    # [arbitrary expressions here]
except KeyError:
    # [ some arbitrary statements, maybe logging, idk]
    raise  # or return

# x must be an int

Determining that x must not be unbound here would be an equivalent means of determining that x has a value, and that the type of that value must be int, if this provides a better way for type checkers to implement this.

For completeness, in a finally block, we also shouldn’t assume based on EAFP patterns.

I needed to give this more thought and double-check what we can know statically from context managers. I don’t think the behavior I gave for else holds for context managers which indicate that they suppress exceptions (via their return value), as statically we only know that if we continue executing from a context manager there either was an exception that the context manager could handle, or that there wasn’t an exception.

We can’t know statically which exceptions were handled or whether there was no exception at all, even if the intent was to handle a specific class of exception, so context managers that suppress some exceptions can’t be statically recognized as being used for this pattern.

If the examples above were modified for context manager use (including anything from open to contextlib.suppress(KeyError) and everything in between), x could be unbound.

For context managers which indicate that they don’t handle exceptions, they also wouldn’t be acting as a guard on partial structural access, and aren’t participating in EAFP-like patterns.
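To make the unbound case concrete, here is a minimal sketch using contextlib.suppress. Counts and read_key are hypothetical names. The inner try/except exists only to make the unbound state observable at runtime and is not a recommended pattern.

```python
import contextlib
from typing import Optional, TypedDict


class Counts(TypedDict, total=False):
    key: int


def read_key(d: Counts) -> Optional[int]:
    with contextlib.suppress(KeyError):
        x = d["key"]
    # Execution continues here whether or not a KeyError was suppressed,
    # so `x` may be unbound at this point.
    try:
        return x
    except UnboundLocalError:
        return None
```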

alicederyn · May 23, 2024, 8:04pm

This seems to be a theme, that dict lookups are the most common “unknown state” operations by far: the same objection is being raised for adding a get to Sequences, namely that it is just not common enough to be worth adding. Generally the idiom for sequences seems to be either LBYL (check the length!) or iteration.

One other case that I’ve come across is optional dicts, where it’s sometimes cleaner to treat it as a dict and catch the error if it was actually a None. I don’t know if that’s peculiar to our company though?

The only other case I can think of is accessing JSON where you don’t know the type for sure but you’re happy to EAFP and catch the TypeErrors, but I think JSON is normally typed as Any so this wouldn’t be relevant.

I hope we come down on the side of allowing idiomatic EAFP for dicts. Accessing nested dicts with get is quite ugly:

foo.get("bar", {}).get("baz", {})....

If those values can be None, it’s even worse:

...((foo.get("bar") or {}).get("baz") or {})....
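For comparison, a sketch of the two styles side by side on a nested partial TypedDict. Foo, Bar, and both function names are invented for this example, and the fallback value 0 is an arbitrary choice.

```python
from typing import TypedDict


class Bar(TypedDict, total=False):
    baz: int


class Foo(TypedDict, total=False):
    bar: Bar


def baz_with_get(foo: Foo) -> int:
    # Chained .get() calls, each needing its own default.
    return foo.get("bar", {}).get("baz", 0)


def baz_with_eafp(foo: Foo) -> int:
    # One guarded access covers every level of nesting.
    try:
        return foo["bar"]["baz"]
    except KeyError:
        return 0
```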

mdrissi · May 23, 2024, 9:20pm

My own experience generally leans towards checking first and then continuing, with the main awkwardness being

try:
  x: str
  y: str | int
  _ = x + y
except TypeError:
  ...

Why is this so special to dicts when many static type errors can similarly be try/excepted? For the nested dict case, I agree that ...((foo.get("bar") or {}).get("baz") or {})... has poor readability, although in practice what I do is

maybe_get_key(foo, "bar.baz.s")

where maybe_get_key is a small utility function. It’s pretty common to have such a utility, though.

My own codebase/projects would be mostly neutral to this as I generally don’t use except blocks that much. If we did make this special to dict I’d still prefer it be restricted to,

try:
  x["maybe_key"]
except KeyError:
  ...

and not a general broad exception catch. If you have an explicit except KeyError, I can see you are likely fine with a failed access, but a general Exception does not make me immediately assume most dict lookups are failure-prone here.

alicederyn · May 23, 2024, 9:35pm

How do you write a type signature for that that works well with TypedDicts?

mikeshardmind · May 23, 2024, 9:35pm

To be clear, I’m not advocating this for all dicts, only for typing.TypedDict, and even then only in the case where total=False or for keys marked with typing.NotRequired. The specialness here is that we have specifically encoded information into the type system that something has an expected type if it exists, but that we can’t be sure it exists.

I believe this is the only case narrow enough to be supported here. This is a structural type, and checking whether accessing those keys works is the only way to actually check the structure. key in dict is one such test (but I don’t think this one should narrow, as the key could be removed before you use it if the dict is shared between threads); dict.get(key) is another, which itself depends on whether a KeyError would be raised. try/except is idiomatic and, since 3.11, is the most performant option.

mikeshardmind · May 23, 2024, 9:42pm

This example shows why we need to keep it narrowly tailored and should be very careful. It’s possible that y is a subclass of int that implements __radd__ for strs, or that x is a subclass of str that implements __add__ for ints. The returned type in either of these cases can’t be known, it doesn’t have something to conform to from either x or y.

This succeeding doesn’t tell us anything other than that _ is bound, which isn’t useful type information, and it also isn’t an idiomatic pattern.
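The subclass scenario can be sketched like this. Offset and try_add are hypothetical names, and returning a list from __radd__ is deliberately odd, to show that a successful x + y says nothing about the result type.

```python
from typing import Union


class Offset(int):
    """Hypothetical int subclass that accepts being added to a str."""

    def __radd__(self, other: object) -> object:
        if isinstance(other, str):
            # str + Offset falls back to this method; the result type is
            # whatever the subclass decides, here a list.
            return [other, int(self)]
        return super().__radd__(other)


def try_add(x: str, y: Union[str, int]) -> object:
    try:
        return x + y
    except TypeError:
        return None
```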

mdrissi · May 23, 2024, 9:46pm

The return type is a vague one: either object, Any, or some union of the expected underlying dict contents for path-style lookups. I often work with JSON, so a common one I use for the return type is roughly JSON = str | int | dict[str, JSON] | list[JSON] | float | None | bool. If the caller has specific expectations of that key, an assert handles it.
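The thread does not show how maybe_get_key is implemented, so the following is only one guess at a path-style lookup with a JSON-like return type. The body, the dotted-path convention, and returning None on any failed step are all assumptions made for this sketch.

```python
from typing import Union

# Recursive alias in the spirit of the one quoted above. Forward references
# are written as strings so the assignment evaluates at runtime.
JSON = Union[str, int, float, bool, None, dict[str, "JSON"], list["JSON"]]


def maybe_get_key(data: JSON, path: str) -> JSON:
    """Hypothetical helper: walk dotted keys, return None if a step fails."""
    current = data
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current
```

A caller with a specific expectation would then narrow the result, for example with `assert isinstance(value, int)`.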

mikeshardmind · November 7, 2024, 12:15am

Adding a clarifying comment as to the intended scope, where previously I only mentioned that this should be narrowly scoped.

This should not silence arbitrary exceptions, only exceptions where we have a defined structural type with explicit support in the type system for an “allowed missing” part of the structure.

# d is a typed dict
try:
    v = d["key"]  # the value for the typed dict for "key" is NotRequired
except KeyError:  # LookupError should also catch this
    # v is unbound
else:
    # v is bound and of the type specified
finally:
    # v is possibly unbound

If the following happens instead:

# d is a typed dict
try:
    v = d["key"]  # the value for the typed dict for "key" is NotRequired
    something_else()
except KeyError:  # LookupError should also catch this
    # v is possibly unbound, something_else() could raise the exception
else:
    # v is bound and of the type specified
finally:
    # v is possibly unbound

The broad catching of BaseException should not trigger this suppression (keyboard interrupts, system exit). I don’t think Exception should either, but that’s more philosophical perspective and flexible.
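A runnable sketch of the second case, where the handler cannot tell which statement raised. D, something_else, and lookup are hypothetical names. The locals() check is used only to make the bound or unbound state visible at runtime.

```python
from typing import TypedDict


class D(TypedDict, total=False):
    key: int


def something_else() -> None:
    # Hypothetical helper that fails with its own, unrelated KeyError.
    raise KeyError("unrelated")


def lookup(d: D) -> str:
    try:
        v = d["key"]
        something_else()
    except LookupError:  # KeyError is a subclass of LookupError
        # Either statement may have raised, so `v` is possibly unbound.
        return "bound" if "v" in locals() else "unbound"
    else:
        return f"value: {v}"
```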

Liz · November 7, 2024, 12:35am

While your most recent clarification is nice, I like the “no assumption” part you had here: Structural subtyping, EAFP, and exceptions which inform structure - #11 by mikeshardmind for the exception block more.

Eneg · November 19, 2024, 2:48am

I think the nature of the problem is twofold:

does the assignment to x succeed?
what type does the right-hand side of x = ... evaluate to (in the happy path)?

try:
    x = ...
    # x is certainly bound (within the try)
except ...:
    # x can be unbound
else:
    # x is certainly bound
finally:
    # x can be unbound

# x can be unbound

Correct me if I’m wrong, but the exact exception being caught shouldn’t impact the type inference here, but rather code reachability. [1] In principle, this pattern could be used for anything, [2] as it becomes a matter of “does the RHS result in a statically known type?”

Note that I am not opposed to limiting the scope to TypedDicts, I just wanted to give the larger picture.

[1] A type checker with knowledge of exceptions being raised by the right-hand side could determine the else block to be unreachable.
[2] Attribute access on unknown objects, operations on X | None, etc.