Problems with TypeIs

Sorry, it’s also my fault for not being clearer that I was deferring to the existing post about negation rather than rehashing it here, in my hesitation to rehash everything.

I showed it happens with protocols instead of Sequence too.

Actually, the PEP says that type checkers should error for invalid TypeIs functions too.

The PEP links the PEP 483/484 definition of consistency, which is subtyping-based. When I wrote that example, I made sure to start with a TypeIs function that was specified as safe to use and create an unsafe example using it.

Is it not a spec contradiction for the spec to say that it should follow the theory, but then allow things we knew were unsound? And isn’t it a specification problem that there are functions specified as safe that aren’t?

Can you be sure there aren’t other problems that we haven’t discovered yet? We’re arguing about one case that’s clearly wrong, and ignoring that the presence of such a case means any reasoning used to justify the soundness of the feature is flawed, so other conclusions from it are also suspect.

Circling back on this with information from the tangentially related thread: I’ve been working on figuring out when TypeIs is safe and when it isn’t, so that developers can have better guidance for writing a safe TypeIs function, and so that the parts of this that are detectable by type checkers can possibly be implemented by them, since it appears the feature will continue to exist…

The primary problems for narrowing the positive case safely involve incomplete information. While at first this looked like a mutability problem, and then a variance problem (likely because of where it was first spotted), it is both of these, but more broadly just an incomplete-information issue.

  • It’s not possible to narrow a generic type safely if you have a subtype of that generic type. You no longer have all of the relevant information and cannot know if you have checked all places the subtype uses a type variable.
  • It is possible to narrow from a generic type to another generic type without modifying the knowledge of the subscripted type.
  • It is possible to narrow from a non-generic type to a generic type, but with the above, you might not be able to fully narrow to something more specific.
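A minimal sketch of the second bullet (the function name is mine, not from the thread): a narrowing that moves between container kinds while leaving knowledge of the element type untouched, because the runtime check never inspects the elements.

```python
from collections.abc import Iterable, Sequence
from typing import TypeVar

try:
    from typing import TypeIs  # Python 3.13+
except ImportError:
    from typing_extensions import TypeIs

T = TypeVar("T")

# Narrowing one generic to another generic: the runtime check only
# inspects the container kind, so knowledge of T is passed through
# untouched rather than (unsafely) refined.
def is_sequence(x: Iterable[T]) -> TypeIs[Sequence[T]]:
    return isinstance(x, Sequence)
```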

Ensuring exhaustive checking of all places a type variable is relevant before applying narrowing to that type variable is not always possible; therefore, TypeIs functions like:

from collections.abc import Sequence
from typing import TypeIs

def sequence_narrower(s: Sequence) -> TypeIs[Sequence[int]]:
    return all(isinstance(x, int) for x in s)

cannot be safe, as there can be generic types that are subtypes of Sequence and use their type parameter in a way that is not visible from the interface of Sequence alone.
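As a hedged sketch of why (the Boxes class and its method names are invented for illustration): a Sequence subtype may use its type parameter in positions the Sequence interface never exposes, so scanning the current elements proves nothing about them.

```python
from collections.abc import Sequence
from typing import TypeVar

try:
    from typing import TypeIs  # Python 3.13+
except ImportError:
    from typing_extensions import TypeIs

T = TypeVar("T")

def sequence_narrower(s: Sequence) -> TypeIs[Sequence[int]]:
    return all(isinstance(x, int) for x in s)

class Boxes(Sequence[T]):
    """Hypothetical subtype: also uses T in an argument (contravariant) position."""

    def __init__(self, *items: T) -> None:
        self._items = list(items)

    def __getitem__(self, i):
        return self._items[i]

    def __len__(self) -> int:
        return len(self._items)

    def add(self, item: T) -> None:  # invisible through the Sequence interface
        self._items.append(item)

b: Boxes[object] = Boxes(1, 2, 3)
# The narrower sees only the current elements; if a checker then treated b
# as Boxes[int], it would wrongly permit int-only reasoning about b.add.
assert sequence_narrower(b)
b.add("not an int")  # perfectly legal for a Boxes[object]
assert not sequence_narrower(b)
```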

Meanwhile, functions like

# inspect.isawaitable
def isawaitable(object) -> TypeIs[Awaitable]:  ...

are fine: this definitively checks everything needed to know that the object is awaitable, and it does not apply any other narrowing to any possibly related generic.
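A quick runtime sanity check of the real function the stub above paraphrases (the elided body belongs to the stdlib; the coroutine here is mine):

```python
import inspect

async def coro() -> int:
    return 1

c = coro()
# isawaitable checks exactly the structure that makes something awaitable
# (coroutines, futures, objects with __await__) -- nothing element-wise,
# so there is no hidden generic state it could get wrong.
assert inspect.isawaitable(c)
assert not inspect.isawaitable(42)

c.close()  # discard without running, avoiding a "never awaited" warning
```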

I haven’t worked out all of the possible problems for the negative narrowing case yet, but the problems here follow directly from types that cannot be safely negated in the framework specified by TypeIs. The primary problem with just banning these types from TypeIs is something informally known as the gradual guarantee: the idea that you can take any well-typed piece of code, substitute Any into any annotation, and not add new type errors. The problem with not banning these types, however, is that Any (and other gradual types, and some other cases beyond these) does not have a meaningful and well-defined negation in theory.
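A small sketch of that tension (the function names are mine): in the negative branch the checker would need “Any minus str”, which has no well-defined meaning, so checkers generally leave the gradual type alone there.

```python
from typing import Any

try:
    from typing import TypeIs  # Python 3.13+
except ImportError:
    from typing_extensions import TypeIs

def is_str(x: object) -> TypeIs[str]:
    return isinstance(x, str)

def handle(x: Any) -> str:
    if is_str(x):
        return x  # positive branch: x narrowed to str
    # Negative branch: "Any minus str" is not well defined, so a checker
    # keeps x as Any here -- this line type-checks even though nothing
    # guarantees x actually has an upper() method.
    return x.upper()

# Runtime behavior is unaffected by the theory gap:
assert handle("abc") == "abc"
```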

1 Like

As @Liz touched on briefly over in Revisiting variance - #37 by Liz, I think at least some of the problems here (like many of the problems that came up in the intersection PEP work) boil down to incorrect over-simplification of intersection types. Because we don’t have a formal spec for intersections (and also because the concept can be confusing for users), current type checkers don’t necessarily have a well-fleshed out internal representation for intersection types, and tend to be eager to simplify them to non-intersections, even when it isn’t correct to do so. (There were more examples of this related to Any in my PyCon typing summit talk.)

If we have a generic class MySequence which is a subtype of Sequence, but also uses the typevar in contravariant position and is thus invariant, and you have a TypeIs function that narrows Sequence[object] to Sequence[int], applying that TypeIs function to an instance of MySequence[object] should result in the type MySequence[object] & Sequence[int]. The unsafety you might observe in such a case is a result of type checkers simplifying that intersection to MySequence[int], but this is an incorrect simplification. The correct behavior is to leave the intersection type unsimplified.

The theoretical reason it is incorrect is that MySequence is invariant in its type parameter; therefore there is no subtype relationship between MySequence[object] and Sequence[int], and therefore the intersection can’t be simplified.

The intuitive reason it matters in this particular case is that MySequence may have additional runtime state related to the generic type, and a runtime narrowing function meant to apply to Sequence can’t possibly know about or check the actual type of that state at runtime; it can only check the state that’s part of the Sequence type.

The result of leaving the intersection type unsimplified would be that methods present on both Sequence and MySequence that return T would end up returning object & int, which does simplify to int, whereas methods present only on MySequence would still be typed as operating on object.

So if we want to make TypeIs safe in this respect, there’s no need to change the general rules around variance and subtyping in super-disruptive ways, there’s only a need for more correct handling of intersection simplification.
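The proposed rule can be sketched abstractly (the helper and the hand-declared subtype relation are mine, not from any checker): reduce X & Y to one side only when a subtype relationship licenses it, and otherwise keep the intersection.

```python
# Minimal sketch of the intersection-reduction rule under discussion.
# Types are modeled as plain strings and subtyping as an explicit relation.
def simplify_intersection(x: str, y: str, subtypes: set[tuple[str, str]]):
    def is_subtype(a: str, b: str) -> bool:
        return a == b or (a, b) in subtypes

    if is_subtype(y, x):
        return y  # y is more precise; x adds nothing
    if is_subtype(x, y):
        return x
    return (x, "&", y)  # no relationship: leave irreducible

# Subtype facts from the MySequence example, declared nominally by hand.
facts = {
    ("MySequence[int]", "Sequence[int]"),
    ("MySequence[int]", "MySequence[object]"),
}

# MySequence[object] & Sequence[int]: neither side subsumes the other,
# so the intersection must stay unreduced.
assert simplify_intersection("MySequence[object]", "Sequence[int]", facts) == (
    "MySequence[object]", "&", "Sequence[int]",
)
```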

(Shared mutability still remains its own problem, but that applies to type narrowing more broadly.)

This is true for TypeIs, but not for higher-kinded types more generally. It would be good to get consensus on whether higher-kinded types are ever going to be fully supported. I’ve had some discussions with @mikeshardmind on this, and while I was against it at first, I now think fully ruling them out would be more beneficial to the type system: we could stop needing to consider them and start using things that only work on rank-1 types.

The problem with this is the possible indirection that allows assigning a MySequence to something that expects a Sequence, so that the visible intersection is Sequence[object] & Sequence[int] while the actual type is invariant. There are major issues around writing a safe TypeIs function: you can’t write it assuming only the input type, but must consider any possible subtype of that input type, with unknown attributes, properties, methods, and variance. I agree with:

It’s not even only a matter of variance; the narrowing can be unsafe entirely within covariance, as @ImogenBits showed in Revisiting variance - #43 by ImogenBits (after modifications to use a tuple where a list was used).

I think some of these cases are detectable and type checkers should warn about those, but not all of them will be, and some will need to be documented cases only. This means that this feature cannot be considered safe within the current type system, as it is possible to construct a TypeIs function for which it is impossible to determine statically whether the function and its uses are safe.

I think that it is possible to narrow a generic type safely if you have a subtype of that generic type, but only if the resulting intersection type is not simplified incorrectly. No example shown so far (including the one you linked) demonstrates otherwise.

1 Like

It’s not possible to have a correct simplification; @mikeshardmind’s examples in the other thread had indirection steps specifically to show this part:

from collections.abc import Sequence
from typing import TypeIs

def bad(s: Sequence) -> TypeIs[Sequence[int]]:
    return all(isinstance(x, int) for x in s)

def indirection(s: Sequence):
    if bad(s):
        ...  # Sequence & Sequence[int]


x: MySequence[int | str] = ...

indirection(x)

Also, I think that HKT’s should be left out of this discussion entirely, unless or until someone is motivated to start a new thread specifically outlining (with examples) the issues and possible solutions (or pointing to a place where that’s already been done clearly.) Otherwise it’s just not productive or useful.

1 Like

That example uses (and relies on) Any (as the implicit type parameter of generics used unparameterized), so it’s not at all useful in illuminating what is or isn’t sound in the type system; unsafety is to be expected with Any.

I’ll be writing a PR with examples of why certain patterns are unsafe to the typing docs, and why they can’t be made safe under the current type system’s rules, and will use examples that don’t rely on Any.

Narrowing an ABC or protocol on its subscripted type will always be unsafe.

The HKT discussion needs to happen at some point; variance in subtyping is still an issue if we want HKTs. But it shouldn’t be part of this discussion anymore: we have an easier framework of missing relevant information that is broader and less fraught with future unknowns.

If you change it to Sequence[object] the example still functions; you can lose information that’s important at function boundaries.

Where is the unsafety in that example?

Before I reply again, I’ll wait for @mikeshardmind 's PR, or for a complete example, without Any, that shows unsafe narrowing from TypeIs, given correct handling of intersections, and without requiring concurrent shared mutability. If there is such an example, I’m very interested in it. Otherwise I’m not sure what we’re actually discussing.

I think you’re presupposing something here that I would disagree with, and that all current type checkers in the conformance suite do behaviorally as well, though this isn’t specified other than TypeIs saying it should approximate isinstance for practical effects.

Given the types

from typing import Protocol, Iterator, Generic
from typing_extensions import TypeIs, TypeVar

X = TypeVar("X", int, float, int | float, default=int | float, covariant=True)

class A(Generic[X], Protocol):
    _objs: tuple[X, ...]
    def __init__(self, *objs: X):
        self._objs: tuple[X, ...] = objs
    def __iter__(self) -> Iterator[X]:
        yield from self._objs

class B(A, Generic[X]):
    def __init__(self, *objs: X, dtype: type[X] = int | float):
        super().__init__(*objs)
        self._dtype: type[X] = dtype
    @property
    def T(self) -> type[X]:
        return self._dtype

Then the intersection A[int] & B should simplify to be B[int]

As X is covariant in A and B, B is a subtype of A, and int is a subtype of int | float.

(This doesn’t currently type check in any type checker for other reasons involving union assignability to type[X], one could write a protocol here instead to remedy this, but union assignability to type is currently being addressed in another discussion)

The problem with this is of course that, if all of that holds, then we can chain together various evidence-based checks, paired with a single indirection that loses the fact that SomeB.T exists (because we are treating it as A for permissive use), and get the wrong result:

def is_Aint(s: A) -> TypeIs[A[int]]:
    if not s:  # while this shouldn't be needed, I'd rather not argue if that counts
        # as pathological when it's irrelevant 
        raise ValueError("Refusing to narrow empty data container")
    return all(isinstance(x, int) for x in s)

def div(n: A[X], d: int, dtype: type[X] | None = None) -> B[X]:
    if isinstance(n, B):
        if n.T in (float, float | int):
            return B(*(i / d for i in n), dtype=n.T)
        elif n.T is int:
            return B(*(i // d for i in n), dtype=n.T)
        else:
            raise TypeError(f"Unsupported type for typed container {n.T!r}")
    else:
        return B(*(i / d for i in n))

def something_that_only_handles_int(x: B[int]):
    if x.T is not int:
        raise TypeError
    # impl doesn't matter for this

def indirection(x: A):  # this isn't implicitly Any, see default above.
    if is_Aint(x):
        s = div(x, 2)
        something_that_only_handles_int(s)  # boom

def example():
    b = B(1, 2, 3)
    indirection(b)

Before this is written off as a random hypothetical: this example is modeled on numpy and how someone would handle needing types for numpy functions that accept array-like things as well, not just numpy types, without having to resort to a mypy plugin to fill in gaps.

__iter__ is used in place of the full appropriate datamodel methods; substitute in all additional necessary dunders yourself, and the problems still exist.

The chain of evidence-based checks can result in the wrong outcomes here because of the original narrowing on A without determining that we did not have a subtype of A that cares about more than what A does for type information.

It may or may not be possible to accidentally run into this issue; I’m inclined to think it is, given how many people are still insisting that it isn’t one. I think it’s very likely to be a repeated real-world issue where people have unexpected outcomes, given time for this to start being used more, and for it to start happening in places that aren’t isolated to one codebase where someone can go “that’s not safe, we do this”, and to start having types from different codebases involved.

Maybe it’s an acceptable issue given the tradeoffs and pragmatism various people value, but the sharp edges need documenting at a minimum.

3 Likes

I appreciate the last example as one I can actually follow.

I tweaked it a bit and tried to simplify, partly to better understand the steps involved and also to avoid constrained typevars, as I view them as a weird feature, and use bound typevars instead. I also simplified away typevar defaults and included explicit type arguments everywhere.

from typing import TypeIs, reveal_type

class A[T: str | bytes]:
    def __init__(self, x: T):
        self._x = x

    @property
    def x(self) -> T:
        return self._x


class B[T: str | bytes](A[T]):
    def __init__(self, x: T, y: T):
        super().__init__(x)
        self._y = y

    @property
    def y(self) -> T:
        return self._y

def narrow_a_str(a: A[str | bytes]) -> TypeIs[A[str]]:
    return isinstance(a.x, str)

def boom(b: B[str]):
    assert isinstance(b.y, str)

def bad(b: B[str | bytes]):
    if narrow_a_str(b):
        reveal_type(b) # Expectation: B[str | bytes] & A[str] -> B[str]
        boom(b)

problem_obj: B[str | bytes] = B("x", b"y")
bad(problem_obj)

narrow_a_str is a very simple TypeIs function and looks pretty normal. The main issue is that a subclass may introduce additional attributes that depend on the same type variable, so a reasonable TypeIs for A that checks attribute x is no longer reasonable and safe for B.
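For what it’s worth, the unsafety is observable at runtime too; here is the same example with annotations stripped so it runs on any recent Python (narrow_a_str stands in for the TypeIs function):

```python
# Untyped re-run of the example above: the check on A's interface passes
# while the attribute B adds violates the narrowed type.
class A:
    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return self._x

class B(A):
    def __init__(self, x, y):
        super().__init__(x)
        self._y = y

    @property
    def y(self):
        return self._y

def narrow_a_str(a) -> bool:  # stand-in for the TypeIs function
    return isinstance(a.x, str)

problem_obj = B("x", b"y")
assert narrow_a_str(problem_obj)           # the A-level check succeeds...
assert not isinstance(problem_obj.y, str)  # ...so boom's assert would fail
```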

edit: Also the bound is not important either. If you find and replace str | bytes with object everywhere the same issue exists.

Edit 2: The more I think on this example, the more my view moves toward the B[str | bytes] & A[str] step being irreducible, not B[str], and toward the safety of TypeIs for examples like this mainly depending on having an actual intersection spec. In particular, I think a type checker should report boom(b) as an error and say that B[str | bytes] & A[str] is incompatible with B[str].

A very similar question: what about list[str | int] & Sequence[str]? Would people expect that to reduce, or also be irreducible? The A/B example leans in the direction that, even in the covariant case, you can never collapse an intersection of two generic classes if the type arguments differ at all. What if I eliminated field y in B, as if B were just a NewType of A?

Edit 3: Re-reading Carl’s earlier post on irreducibility, it is very close; it’s just that

“ The theoretical reason it is incorrect is because MySequence is invariant in its type parameter,”

this is a little off. Invariance is unnecessary: even if both Sequence/MySequence are covariant, you still have issues.

3 Likes

B[str | bytes] & A[str] should reduce because B[str] is a valid subtype of B[str | bytes]

This is tricky: in the context of your simplified example, a type checker could see the bigger picture that it resulted from an unsafe narrowing, but that intersection itself is just B[str].

This is part of why my examples have had additional indirection steps. In real code there would be indirections, or you’d just check what you know and only use what you know. The indirections lead to assignments that avoid the type checker ever seeing that situation directly, but in total still result in the same narrowing, while precluding a type checker from being able to do something smarter, because in Python only function boundaries are ever guaranteed to be visible to type checkers (stubs, library code vs API typings, etc.).

Thanks for the example, that’s very helpful!

I don’t think so. @mdrissi is right (and also right that the wording in my previous post was a little misleading; invariance specifically isn’t necessary in that example either.) B here is B[float | int] (because of the typevar default). B is a subtype of A, but float | int is not a subtype of int, it’s a super-type. The typevar is covariant, not contravariant, so this means that B[float | int] is not a subtype of A[int], and therefore the intersection should not simplify, it should remain A[int] & B[float | int].

That doesn’t follow. X <: Y and X <: Z does not imply Y <: Z. It is true that B[str] is a subtype of B[str | bytes] (assuming covariance), and also true that B[str] is a subtype of A[str], but that does not make it true that B[str | bytes] is a subtype of A[str] (it is not.)
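The same point in runnable form, with nominal classes standing in for the generic types (the class names are mine):

```python
class Y: ...
class Z: ...
class X(Y, Z): ...  # X <: Y and X <: Z via multiple inheritance

# A common subtype does not relate the two supertypes to each other:
assert issubclass(X, Y) and issubclass(X, Z)
assert not issubclass(Y, Z) and not issubclass(Z, Y)
```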

So this example confirms for me that theoretically-correct handling of intersections is what’s necessary to resolve this particular class of unsafety.

That doesn’t mean everyone (in particular type checker authors) will necessarily be willing to commit to theoretically-correct handling of intersections. This would need to be hashed out as part of an intersections PEP (part of the difficulty with such a PEP is that it implies a specification of type narrowing, which has historically been left unspecified somewhat intentionally.)

Personally I think correct handling of intersections is not something that will cause lots of false positives (in fact in intersections with Any it eliminates what I would consider false positives in type checker behavior today.) So I would personally find it worth doing to improve correctness in subtle narrowing cases. But it can cause more complexity in type checker implementations, and more complexity in user-facing type error messages (more unreduced intersections will appear), and these are also reasons that some might not prefer it.

I think we’re missing something here for that to be the case; this isn’t an intersection between those at that point.

We go from

B[int | float] (construction)
to A[int | float] (indirection accepts A, this function can’t see it used to be B)
to A[int] (suspect TypeIs A[int | float] → A[int], can’t account for B.T, which A doesn’t know about)

At this point, the intersection is A[int | float] & A[int]

so that’s a miss; let’s see the next place it happens.

Inside the body of div (no knowledge of the typevar being passed in, only the constraints and default), it’s A[X]. While X has a default of int | float, div doesn’t know that it’s definitively int | float; it’s still valid for div to treat it as any valid option given evidence, which is what happens.

div confirms it’s B with isinstance, so we go from A[X] to B[X]; this is still valid, and there’s no info from div’s POV on the type of X.

Then, div constructs a new B with the same T as what was passed in for n’s T

This turns the A[X] that is actually a B[X] into a new B[X].

From div’s PoV this is fine, and the caller can’t see inside that function

Finally, back at indirection, we have s as B[int], but with s.T == int | float.

At no point here does an intersection of A[int] with B[int | float] appear; it’s directed in such a way that the type checker will never have all the pieces to pick up on that, and this is eerily close to how this would happen when fixing numpy’s types to not need a plugin.

1 Like

This is an incorrect narrowing. It is an incorrect intersection reduction to reduce this to B[X].

The isinstance check does not confirm anything about the type parameter of B (a TypeIs function might if it does the right runtime checks, but isinstance obviously cannot.) Thus an isinstance check against a generic type has to be considered a narrowing to B[object] or B[Any], or perhaps we can constrain it to B[int | float] here due to the typevar constraints. (The typevar default doesn’t apply here: this is not an annotation, it’s an expression of what isinstance can verify. The int | float in this case would come from unioning all the constraints on X, expressing that the isinstance gives us no information about the type parameter of B.) So we have A[X] & B[int | float], which cannot reduce, because we don’t know that int | float is a subtype of X here. (Same would be true if we used object or Any instead of int | float.)

This is of course something that I think all the existing type checkers get wrong, so it’s reasonable to assume it, and I agree that the existing type checkers are not safe in a case like this.

But the question here is “would correct handling of intersection reduction fix this,” and the answer remains yes. In essence, you are worried about “loss of information” from upcasts, and yes, that is inevitably a part of any upcast. But correct intersection reduction in narrowing always takes into account what it doesn’t know, and thus prevents that from causing unsafety.

This isn’t incorrect at all. div only knows that it’s A[X], and X here is a type variable that’s identical to the input type and shared between A and B. div can be written in another way such that X is never explicitly narrowed by div, and never needs to be; it’s the external observation with incorrect knowledge about the input type that causes a problem. The only thing that makes it look wrong is the incorrect narrowing that happened significantly before this, where a TypeIs function narrowed without being sure that it had all the relevant information. View every function here in isolation, like the type checker would: the only one that gets anything wrong is the TypeIs.

A correct TypeIs function prevents all of this.

def correct(s: A) -> TypeIs[A[int]]:
    if type(s) is not A:
        raise TypeError(
            "Unsafe to narrow a generic which might be using the type variable "
            "in more ways than this function knows about"
        )
    if not s:
        raise ValueError("Refusing to narrow empty data container")
    return all(isinstance(x, int) for x in s)
1 Like

This is a confusion based on assuming that two unrelated type variables with the same constraints must have the same value, which is not true. Narrowing A[X] with isinstance(..., B) does not correctly simplify to B[X], because we don’t know the exact value of X in this case, only its constraints, and nothing in the isinstance check verifies at runtime that the type parameter of the B instance we have at runtime matches the actual value of X in this function.

I think I’ve said what I have to say as clearly as I can, so I’m going to leave it here. The TypeIs function is fine as-is, if we reduce intersections in narrowing correctly.

Rewriting div to be a function that doesn’t do division, and assuming it’s called differently to support the new signature:

def div(input: A[X]) -> B[X]:
    if isinstance(input, B):
        return input  # This is quite literally A[X] & B[unknown]
    vals = input._objs
    types = {type(v) for v in vals}
    if types == {int, float}:
        dtype = int | float
    elif types == {int}:
        dtype = int
    else:
        dtype = float
    return B(*vals, dtype=dtype)

Same applies.

Given ? is the typevar above, where the only options are int, float, and int | float:
If something is A[int] & B[?], it’s B[int]; if something is A[float] & B[?], it’s B[float]; and if something is A[float | int] & B[?], it’s B[int | float] | B[int] | B[float].

int isn’t a subtype of float
float isn’t a subtype of int
and int | float isn’t a subtype of int or float, so those are the only valid options for each of the first two.
The last one can’t be narrowed further to int or float, because as you said, we don’t have the knowledge, but this isn’t even the failing case here.

We have to assume that the program knowledge we have is valid in the absence of Any; the only unsound thing that has happened is the TypeIs narrowing something it does not have the knowledge it needs to narrow.

1 Like