Problems with TypeIs

erictraut · July 20, 2024, 9:56pm

As I mentioned in another related thread, it is possible to write an implementation of a TypeIs function that subverts the rules of the type system. There are not-so-subtle ways to get it wrong (such as unconditionally returning True), and there are more subtle ways. This thread contains some good examples of implementations that are subtly wrong from a soundness perspective.

I can speak from personal experience (having implemented the type guard functionality in pyright) that it’s difficult to get type narrowing logic correct in all cases. There are many conditions to consider, and the rules can be abstruse. Even in cases where the rules are well established and clear, there are tradeoffs between usability and soundness that must be considered.

From my perspective, the implementation of the TypeIs function in Mike’s clever example subverts the rules of the type system and therefore introduces unsoundness. Given the (carefully crafted) conditions of this example, one cannot determine whether a value of type A[int | str] can be safely narrowed to an A[int] simply by doing an isinstance check of a.i. This check could preserve soundness if one were to either: 1) mark A as @final, or 2) modify the value constraints for X so they are non-overlapping.

If your goal in writing this TypeIs function is not full soundness but rather some balance between soundness and usability, then it’s a fine implementation and will probably serve you well. This depends on your use case and goals. If you’re a stickler for type soundness, then this implementation is problematic, as Mike has demonstrated.

One can think of the TypeIs mechanism (and the TypeGuard mechanism that preceded it) as a way to extend the functionality of a type checker. It’s somewhat akin to mypy’s plugin mechanism but more constrained and standardized. Just as it’s possible to write a mypy plugin that subverts the rules of the type system, it’s possible to write a TypeIs function that does so. That doesn’t mean TypeIs should be thrown out. (Nor should mypy’s plugin mechanism.) TypeIs is important and useful because type checkers provide built-in support for only a limited number of expression forms. If your needs go beyond the built-in capabilities of your type checker, TypeIs provides a way to handle additional use cases. It is a tool that can be used (or abused) as you see fit.

Jelle wrote a guide that helps users avoid some common mistakes with TypeIs. This guide could be extended if we want to document additional considerations, but I’ll note that it already goes well beyond any guidance I’ve seen in the TypeScript world on this topic. (In fact, the example provided in the official TypeScript guide is demonstrably unsound. Edit: actually, it is sound because they’re making a copy of the array.) In the Python world, I feel pretty good about the guidance we have in place now.

mikeshardmind · July 20, 2024, 10:44pm

I’ve gone ahead and PR’d more examples based on the discussion here and what came from many users refining that example. While I’m not a fan of TypeIs, I think it’s going to be jarring the first time people run into issues with generic narrowing, which wasn’t previously covered, especially since most of the time, in relatively typical cases, it won’t be an issue, so I’d like to make sure there’s some kind of reference for why this case is less safe than things like isinstance or even narrowing not involving generics, as well as provide a couple brief examples of a little more work, or alternatives when soundness ends up needed.

tungol · July 22, 2024, 2:29am

I think this is about the first example in this thread that’s made sense to me. The type checker is unable to tell that A[int] & B[?] is not B[int].

There’s a discussion above about whether A[int] & B[int | str] is an uninhabited type, but re-framing I think we could also agrue whether the unsafety in this example occurs in the narrowing at TypeIs[A[int]] or whether it occurs in the narrowing at isinstance(x, B).

The crux of the issue appears to be here:

x: A[int]
if isinstance(x, B):
    reveal_type(x)

given that, it seems to me that it’s unsafe to narrow x to B[int]. I’d put the blame here rather than on the TypeIs function, which is fine until you try to use something outside of A.

This does look like a place mypy and pyright differ: mypy gives me B[Any] to pyright’s B[int]

Liz · July 22, 2024, 7:16am

The TypeIs function is missing the information it needs to have narrowed B, and did something outside of A by narrowing B.

Narrowing to B[Any] is incorrect from a point of theory here, and this really should go back to the fact that B was defined in such a way as to ensure that A[int] & B[?] would be B[int] and only B[int].

I don’t think there’s any discussion to have on this. B was defined so that All B[int] are A[int], All B[int | str] are A[int | str], all B[str] are A[str], and that there are no other options for B[?]

But it is that. The error is not this, but that we have incorrectly told the type checker that B[int | str] is A[int] with a TypeIs function that did not account for possible subclasses.

tungol · July 22, 2024, 9:34am

It seems to me that a couple things are true:

a function for TypeIs[A[int]] cannot meaningfully distinguish between A[int] and A[int | str] at runtime
It’s safe (in practice if not theory) to narrow B[int | str] to just A[int] as long as subsequent code sticks to just the interface of A
Doing so is a bit lossy: trying to go back to B afterwards is where we get in trouble.
in order to round-trip safely like this, some symmetry is required. Either the TypeIs function needs to recognize that it can’t can’t assume A[int] over A[int | str], or promoting back to the subclass needs to recognize that B[int] can’t be assumed given an object which looks like A[int].

Either seems reasonable on the face of it, and claiming that one is definitionally incorrect seems premature given that this bit of narrowing behavior isn’t part of python’s type specification (at the moment) and major implementations currently disagree. Like Eric said, it depends on your use case and goals.
You could avoid the issue by using a TypeIs[B[int]] function in place of isinstance(x, B), which also has the benefit on not relying on narrowing behavior that’s currently not standardized. That seems like it’s probably the safest option when dealing with generic types, at least for the time being.

mikeshardmind · July 22, 2024, 6:18pm

This isn’t really the case. a TypeIs function that narrows a generic type by it’s subscripted type can only narrow exact matches, not subclasses safely, as it cannot know if the subclass does new things the function doesn’t know about. Therefore it can narrow A, but not all subtypes of A without knowledge of them also encoded.

The examples were specifically designed to show that a python type checker can’t guarantee that subsequent code will never further narrow the generic class due to python type checking basically only happening at function boundaries + checking individual function bodies. This is therefore a moot point unless you mean to use it to say that TypeIs is never safe at all, which isn’t true either, TypeIs is safer than that.

I’m not sure where you’re coming from here that either of these are reasonable. Theory states that A[int] & B would be B[int], a major implementation disagreeing with that would just be a bug in that implementation, and there’s really no room to argue otherwise here. However, you can test various assignability and see that all type checkers agree that an instance of B[int | str] would not be assignable to A[int], and is not compatible with it, see Problems with TypeIs - #60 by mikeshardmind for comparing different ways a subclass can be defined and outcomes for what this means for the subtyping relationship and assignability.

We’ve discovered bugs in both pyright and mypy during the examination of this, mypy narrowing A[int] to B[Any] in the presence of isinstance(x, B) is something I’m fully confident in saying is just another one of those issues, the examples were chosen so that there is a solution, there is only 1 solution, and it doesn’t rely on Any at all.

mypy gets isinstance wrong in many cases

and also gets type variable constraints wrong in some cases

This is part of why I reached for pyright to use in the examples first, as pyright is actually passing the conformance test, and it’s a known issue that mypy has many deficiencies with features used to demonstrate here.

The only thing that’s required for theoretical soundness of this roundtripping to happen is that narrowing only happens when enough information is present to do so. This is even part of what was said in the PEP for Type is, that it should return True if and only if it’s that Type. The thing is, A function handing A[X] → A[Y] has no knowledge of B’s use of X, even when B is a subtype of A.

A TypeIs function cannot do this for narrowing generic classes across the generic without having knowledge of how that generic class is defined.

This is strongly related to why typing inference becomes undecidable in the presence of rank 2 and higher types, it’s all about narrowing on only the information you know you have and acknowledging that there are situations you may not have all of the relevant information, which leads me to:

Narrowing behavior being incorrect in mypy is a mypy issue, and I won’t be writing worse code to satisfy a misbehaving type checker. The subtyping relationship here is specified well enough that B[Any] being mypy’s result is just mypy failing to produce an accurate useful solution, and falling back to one that won’t error.

Liz · July 22, 2024, 7:56pm

Mypy doesn’t even accept the definition of B here, saying that each option for X is incompatible with X.

Why are we using mypy as a measuring stick here rather than discussing what is theoretically sound when mypy can’t even handle the definitions needed to have the discussion?

main.py:21: error: Incompatible types in assignment (expression has type "str", base class "A" defined the type as "X")  [assignment]
main.py:21: error: Incompatible types in assignment (expression has type "int", base class "A" defined the type as "X")  [assignment]
main.py:21: error: Incompatible types in assignment (expression has type "str | int", base class "A" defined the type as "X")  [assignment]