Structural subtyping, EAFP, and exceptions which inform structure

Until recently, both mypy and pyright allowed using try/except KeyError with partial TypedDicts.

from typing import TypedDict
class A(TypedDict, total=False):
    test: str

def get_test(a: A) -> str:
    try:
        value = a['test'] # error here
    except KeyError:
        return 'ok'
    else:
        return f'received: {value}'

As of a recent change, pyright no longer allows this.

This has led to divergence in what type checkers allow, and I’d appreciate it if we could decide whether this should be allowed and put that decision into the specification, so that type checker behavior doesn’t diverge. I believe type checkers need to understand exceptions used for control flow and how they interact with types known at runtime.

I believe this case should be allowed, as TypedDicts are structural and KeyErrors are a means of detecting a structural mismatch, and that other cases which fit the same pattern should similarly type-check the way users expect.

Are you proposing that type checkers should behave specially when inside an except KeyError, or that using subscripting to access a non-required field should not be an error in general?

If the former, should this apply just to catching KeyError, or should except Exception also silence the error?

Within a try block, if a key is optional but its value type is known when it exists (as with total=False), the accessed value should be assumed to have that type.

Within an except block, no special behavior should apply (other exceptions could have led there).

Within an else block, the value accessed from the optional key should be assumed to exist with its declared type.

The underlying logic here is that, with total=False, the signature should approximate __getitem__(self, non_guaranteed_key) -> KnownType | Never, where Never stands for the access raising instead of returning.

This is also sound from a theoretical point of view: it isn’t just special-casing Python’s exception flow (which is part of why I would suggest no special behavior in the except block).
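The __getitem__ -> KnownType | Never approximation can be sketched as follows, assuming a hypothetical helper missing_key typed as NoReturn (the spelling of Never that predates Python 3.11). The raising branch contributes nothing to the result type, so str | Never collapses to str. This is an illustration of the idea, not how any checker implements it.

```python
from typing import NoReturn, TypedDict


class A(TypedDict, total=False):
    test: str


def missing_key(key: str) -> NoReturn:
    # Hypothetical helper: the "Never" half of KnownType | Never.
    raise KeyError(key)


def getitem_test(a: A) -> str:
    # Models __getitem__(self, non_guaranteed_key) -> str | Never.
    value = a.get("test")
    if value is None:
        missing_key("test")  # control never continues past this call
    # Only the str branch reaches this point, so value can be treated as str.
    return value
```

With this, getitem_test({"test": "x"}) returns "x", while getitem_test({}) raises KeyError, just like a plain subscript would.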

I assume the error you mean is:

c:\Work\Scratch\x.py
  c:\Work\Scratch\x.py:7:17 - error: Could not access item in TypedDict
    "test" is not a required key in "A", so access may result in runtime exception (reportTypedDictNotRequiredAccess)

That error makes no sense to me. Yes, “test” may result in a runtime exception, but I’m deliberately catching that exception, so of course I know that and don’t consider it an error.

If I’d spotted that, I would have reported it as a bug. What’s the logic that says it’s correct to flag it as an error?

Sorry, a clarification on the above, since it’s beyond the Discourse edit limit for mailing-list-mode users: in an else block of a try whose handlers catch KeyError, and also after a try without an else block where the caught KeyError makes the code below unreachable (by re-raising or returning), the accessed value should be assumed to have its declared type. It is still not safe to assume the key continues to exist; only the accessed value, if it was stored in a local, can be relied on.

If we’re going to consider mandating the suppression of certain type checks within a try block (or a with block, which can also suppress exceptions), can we come up with a more general principle? Why would this apply only to KeyError exceptions or only to TypedDict accesses? If you believe in the EAFP philosophy of using exception handling for regular (non-exceptional) control flow, then this should apply to any code that potentially raises exceptions, right?

I can work through such a principle. I’m attempting to do so without introducing a need for typed exceptions (awful…) or algebraic effects (neat, but far beyond the scope of this issue).

I think the general principle is that, for types whose structure is only partially known, guarded access to that structure should be considered a safe way of verifying the structure. I know that’s relatively nebulous, but TypedDict is currently one of only two things I can see this applying to. The other is iterators (whose length is not statically known), but those incidentally already work properly with next and loop control flow.

These things should retain a known type only on the “happy path”, as that is both the point of EAFP and all we can know: without checked exceptions, algebraic effects, or something along those lines, we can’t determine statically what caused an exception.
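For comparison, a minimal sketch of the iterator case (the function name is illustrative): next(it) is typed as the element type, the StopIteration branch produces no value to reason about, and only the else branch (the happy path) uses the element.

```python
from collections.abc import Iterator
from typing import Optional


def first_even(it: Iterator[int]) -> Optional[int]:
    while True:
        try:
            value = next(it)  # typed as int; StopIteration means "no more elements"
        except StopIteration:
            return None  # nothing was produced, so no assumption about value
        else:
            if value % 2 == 0:
                return value  # happy path: value is an int
```

For example, first_even(iter([1, 3, 4, 5])) returns 4, and an iterator with no even element returns None.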

I think what people are likely to encounter and what is easy to define behavior for right now overlap perfectly and are relatively narrow, but this could affect more things if other structural types that support a partial structure are ever added. One example some people want is partial protocols. I’ve argued against those for other reasons (I think additional composition is a better tool than ambiguous partial inclusion), but if they are added, AttributeError would almost certainly be the appropriate caught exception for them, and the rule should apply only to members explicitly listed as optional in the protocol.

I don’t quite believe in it, but it’s unfortunately both idiomatic and the most performant option. I think EAFP is easier to support with algebraic effects (but that’s a bigger dream than HKTs and not even on my radar for Python), and that we should probably keep this limited to things the type system already has a reason to understand.

Oh, and because I didn’t clarify this enough before (sorry): the reason we can’t keep assuming the key exists after it has been accessed once is super important here. It’s valid to say that something like an LRU dict (as provided by lru-dict on PyPI) conforms to a partial TypedDict; threading and shared access can cause the key to be popped, and so on. The assumption is only valid where we have checked it. Thankfully, most real-world uses will store the value in a local.
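A minimal runtime sketch of why only the stored local can be trusted. The on_access callback is a hypothetical stand-in for another thread or another holder of the same dict removing the key after the first successful access.

```python
from typing import Callable, TypedDict


class A(TypedDict, total=False):
    test: str


def read_twice(a: A, on_access: Callable[[A], object]) -> tuple[str, bool]:
    try:
        value = a["test"]  # the key existed at this moment
    except KeyError:
        return ("missing", False)
    on_access(a)  # hypothetical: shared access may pop the key here
    try:
        a["test"]  # a second lookup is not guaranteed to succeed
        still_present = True
    except KeyError:
        still_present = False
    return (value, still_present)  # the local value stays a valid str either way
```

If the callback pops "test", the function still returns the original value but reports that the key is gone, which is why a checker should narrow the local rather than the dict.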

We only assume validity on the path we can see, and this matches how type checkers treat iterators and StopIteration. The only difference is that StopIteration is something we definitively view as a control-flow exception, but EAFP also uses exceptions for control flow.

In the trivial case, where there is a single subscript operation in the try block, this seems to make sense, but does it equally make sense in an arbitrarily complex try-block?

(Asking curiously for discussion, I haven’t come up with any concrete counter example)

I think it does for an arbitrary try/except where the appropriate exception (KeyError, in the case of TypedDict) is caught. In any case such as:

from typing import TypedDict

class SomeTypedDict(TypedDict, total=False):
    key: int  # value type is int, but the key may be absent

def f(non_total_dict: SomeTypedDict) -> None:
    try:
        # [arbitrary expressions here]
        x = non_total_dict["key"]
        # [arbitrary expressions here]
    except KeyError:
        pass  # no assumption about x's type allowed
    else:
        print(x + 1)  # x must be an int

and also:

def g(non_total_dict: SomeTypedDict) -> int:
    try:
        # [arbitrary expressions here]
        x = non_total_dict["key"]
        # [arbitrary expressions here]
    except KeyError:
        # [some arbitrary statements, maybe logging]
        raise  # or return

    # x must be an int
    return x

Determining that x cannot be unbound at that point would be an equivalent way to establish that x has a value, and that its type must be int, if that is an easier way for type checkers to implement this.

For completeness, in a finally block, we also shouldn’t assume based on EAFP patterns.

I needed to give this more thought and double-check what we can know statically about context managers. I don’t think the behavior I gave for else holds for context managers that indicate (via the return type of __exit__) that they suppress exceptions: statically, we only know that if execution continues past such a context manager, either there was an exception the context manager could handle, or there was no exception.

Statically, we can’t know which exceptions were handled, or whether there was an exception at all. The intent of this pattern is to handle a specific class of exception, so context managers that suppress some exceptions can’t be statically known to be used for it.

If the examples above were modified to use a context manager (anything from open to contextlib.suppress(KeyError) and everything in between), x could be unbound.

Context managers that indicate they don’t suppress exceptions also wouldn’t be acting as a guard on partial structural access, so they aren’t participating in EAFP-like patterns either.
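A minimal sketch of the context-manager case using contextlib.suppress. Execution reaches the code after the with block whether or not the lookup raised, so x has to be treated as possibly unbound there. The try/except UnboundLocalError is only there to make the runtime behavior observable.

```python
from contextlib import suppress
from typing import Optional, TypedDict


class SomeTypedDict(TypedDict, total=False):
    key: int


def read_key(non_total_dict: SomeTypedDict) -> Optional[int]:
    with suppress(KeyError):
        x = non_total_dict["key"]
    # Statically we can't tell whether the KeyError was suppressed,
    # so a checker should consider x possibly unbound here.
    try:
        return x
    except UnboundLocalError:
        return None  # the lookup failed and suppress swallowed the KeyError
```

So read_key({"key": 3}) returns 3, but read_key({}) reaches the return with x unbound.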

This seems to be a recurring theme: dict lookups are by far the most common “unknown state” operations. The same objection is being raised against adding a get method to Sequences, namely that it isn’t common enough to be worth adding. The usual idiom for sequences seems to be either LBYL (check the length!) or iteration.

One other case I’ve come across is optional dicts, where it’s sometimes cleaner to treat the value as a dict and catch the TypeError if it was actually None. I don’t know whether that’s peculiar to our company, though.

The only other case I can think of is accessing JSON where you don’t know the type for sure but are happy to use EAFP and catch the TypeErrors; however, I think JSON is normally typed as Any, so this wouldn’t be relevant.

I hope we come down on the side of allowing idiomatic EAFP for dicts. Accessing nested dicts with get is quite ugly:

foo.get("bar", {}).get("baz", {})....

If those values can be None, it’s even worse:

...((foo.get("bar") or {}).get("baz") or {})....
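For contrast, a minimal sketch of the EAFP form for nested access, assuming intermediate values may be missing or None (the optional-dict case mentioned above): a missing key raises KeyError, and subscripting None raises TypeError. The function name and the path are illustrative.

```python
from typing import Any


def get_baz(foo: dict[str, Any]) -> Any:
    try:
        return foo["bar"]["baz"]
    except KeyError:
        return None  # some key along the path is missing
    except TypeError:
        return None  # an intermediate value was None (not subscriptable)
```

One design caveat: catching TypeError this broadly can also hide unrelated bugs inside the block, which is part of why the thread argues for keeping any special treatment narrow.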

My own experience generally leans toward checking first and continuing, with the main awkwardness being:

def f(x: str, y: str | int) -> None:
    try:
        _ = x + y
    except TypeError:
        ...

Why is this special to dicts, when many static type errors can similarly be handled with try/except? For the nested dict case, I agree that ...((foo.get("bar") or {}).get("baz") or {})... has poor readability, although in practice what I do is:

maybe_get_key(foo, "bar.baz.s")

where maybe_get_key is a small utility function. It’s pretty common to have such a utility, though.

My own codebases/projects would be mostly neutral to this, as I generally don’t use except blocks much. If we did make this special to dicts, I’d still prefer it be restricted to:

try:
  x["maybe_key"]
except KeyError:
  ...

and not extended to a broad except Exception. With an explicit except KeyError, I can see that you are likely fine with a failed access, but a general except Exception doesn’t make me assume that the dict lookups in the block are expected to fail.

How do you write a type signature for that that works well with TypedDicts?

To be clear, I’m not advocating this for all dicts, only for typing.TypedDict, and even then only where total=False or for keys marked with typing.NotRequired. The specialness here is that we have explicitly encoded in the type system that something has an expected type if it exists, but that we can’t be sure it exists.

I believe this is the only case narrow enough to be supported here. This is a structural type, and checking whether accessing those keys works is the only way to actually check the structure. key in dict is one such test (but I don’t think it should narrow, as the key could be removed before you use it if the dict is shared between threads); dict.get(key) is another, which handles the missing-key case itself instead of raising KeyError. try/except is idiomatic and, since 3.11, the most performant option.

This example shows why we need to keep this narrowly tailored and be very careful. It’s possible that y is a subclass of int that implements __radd__ for strs, or that x is a subclass of str that implements __add__ for ints. The resulting type in either case can’t be known; there is nothing in x’s or y’s declared type for it to conform to.

This succeeding doesn’t tell us anything other than that _ is bound, which isn’t useful type information, and it isn’t an idiomatic pattern either.
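A minimal sketch of the int-subclass case, using a hypothetical StrFriendlyInt. Because the right operand defines __radd__, str + StrFriendlyInt succeeds at runtime and returns whatever __radd__ chooses (a tuple here), which neither declared type predicts.

```python
class StrFriendlyInt(int):
    # Hypothetical int subclass that accepts being added to a str.
    def __radd__(self, other: object) -> object:  # type: ignore[override]
        if isinstance(other, str):
            return (other, int(self))  # an arbitrary, unpredictable result type
        return super().__radd__(other)


def combine(x: str, y: "str | int") -> object:
    # A checker is expected to flag str + int here, yet at runtime this can
    # succeed for int subclasses, with a result type it has no way to know.
    return x + y
```

With a plain int the same call raises TypeError, so a successful x + y carries no reliable type information.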

The return type is vague: object, Any, or some union of the expected dict contents for path-style lookups. I often work with JSON, so the one I commonly use is roughly JSON = str | int | dict[str, JSON] | list[JSON] | float | None | bool. If the caller has specific expectations of that key, it uses an assert.
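A minimal sketch of a maybe_get_key-style utility with that return type. The thread doesn’t show the actual implementation, so this version (dotted-path splitting, returning None on any failed step) is an assumption, as is spelling the recursive JSON alias with typing.Union so it runs on versions without the 3.12 type statement.

```python
from typing import Union

# Recursive JSON alias, roughly as described above (forward refs as strings).
JSON = Union[str, int, float, bool, None, dict[str, "JSON"], list["JSON"]]


def maybe_get_key(obj: JSON, path: str) -> JSON:
    """Hypothetical helper: follow a dotted path through nested dicts."""
    current: JSON = obj
    for part in path.split("."):
        if not isinstance(current, dict):
            return None  # hit a non-dict before the path ended
        try:
            current = current[part]
        except KeyError:
            return None  # missing key along the path
    return current
```

A caller with a specific expectation then narrows the result, for example with assert isinstance(value, str). Note the trade-off: returning None conflates a missing key with a JSON null, and all TypedDict key information is lost, which is exactly the limitation raised by the type-signature question above.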