Problems with TypeIs

Liz · July 20, 2024, 2:58pm

For the benefit of others since I guess this is subtle, A[int] & B[?] where we have a program value and ? is constrained to int, float and int | float already, we only have the solution int for ? as any other solution would be an empty intersection, but we have a value, so it can’t be empty.

carljm · July 20, 2024, 3:26pm

Thanks, this is a clear formulation, and I think it gets to the heart of where we disagree.

I don’t agree that A[int] & B[int | float] must by definition be an empty intersection. It is possible to have a single runtime object that, when the API surface of the A type is used, behaves entirely as an A[int]. This is all that’s needed to say that that object is of type A[int]. And it’s precisely what a TypeIs function on A is capable of verifying. That same object, when B API surface is used, can behave as a B[float | int]. So A[int] & B[float | int] is a non-reducible intersection, which can’t be ruled out as definitively empty, and which leads to correct behavior if left unreduced.

mikeshardmind · July 20, 2024, 3:27pm

That ~~would~~ could (I dont know of any that treat it this way) work in a compiled language where this can be retained across function boundaries, but this can’t be retained across function boundaries in python, the lossyness here makes that view untenable

mikeshardmind · July 20, 2024, 3:37pm

I can’t think of a way to make this work in python the way you’re describing it without going against this:

It would create far more false positives to not reduce isinstance in a case like this than just flagging potentially suspect TypeIs functions, or even just documenting the current reality here.

I’m willing to accept that there’s probably a way to do this, but I think it’s actually a less useful formulation that makes assumptions that are going to be harder to prove, and result in worse inference.

I think for now documenting this as an unsafe construct and when it would be unsafe is very reasonable, it doesn’t necessarily need to be a type checker error, but users should be able to discover why something may not behave how they thought it would.

Liz · July 20, 2024, 4:24pm

I don’t think this is correct. B is a subtype of A. If the intent was that the new parts of B might have a different type, then instead of defining B as it was:

class B(A, Generic[X]):

it should need to be defined as

class B(A[Y], Generic[X, Y]):

We can even shore up this issue by changing the original definition of B to

class B(A[X], Generic[X]):

and then there’s no confusion, it refers to the same type variable. All of the rest of the example using this definition would then definitively have TypeIs still be the only place something went wrong and you’d no longer be able to seperate the two interfaces even with a construction that no type checker today supports.

mikeshardmind · July 20, 2024, 4:47pm

This brings the most succinct form that prevents the type checker from seeing it all in the same scope, and every scope being individually fine other than the TypeIs function to:

from typing_extensions import TypeVar, TypeIs
from typing import Generic

X = TypeVar("X", str, int, str | int, covariant=True, default=str | int)

class A(Generic[X]):
    def __init__(self, i: X, /):
        self._i: X = i

    @property
    def i(self) -> X:
        return self._i


class B(A[X], Generic[X]):
    def __init__(self, i: X, j: X, /):
        super().__init__(i)
        self._j: X = j

    @property
    def j(self) -> X:
        return self._j

def boom(x: A[int]) -> int:
    if isinstance(x, B):
        print(x, x.i, x.j)
        return x.i + x.j
    return x.i

def bad(x: A) -> TypeIs[A[int]]:
    return isinstance(x.i, int)

def indirection(x: A):
    if bad(x):
        boom(x)

def example():
    b: B[int | str] = B(1, "this")
    indirection(b)

Adding the detail that It’s the same type var in both parent and subclass closes the possibility that it’s just bad intersection narrowing that no type checker handles right now.

While this example is slightly more contrived, the one mirroring numpy’s ndarray structure is not, this one is just shorter to be easier to follow and reason about, the numpy-like one remains above to help show why this can come up in real code at some point, rather than be a hypothetical.

I’ll be trying to reduce my other examples for other issues in the PR to be similarly short, I’ll refrain from further followups on this till the PR is ready unless there’s still further detractions unaccounted for on this one.

ImogenBits · July 20, 2024, 5:20pm

Using a set theoretic framework here, I think that A[int] & B[int | str] (using the classes from the last example) should certainly not be narrowed to B[str]. The type variables “being the same” is irrelevant here (and arguably not technically true, but that is besides the point). The type A[int] is the set of values that are instances of A where the property i is some int. The type B[int | str] is the set of values that are instances of B where both properties i and j are either an int or a float. The intersection of these two types is the intersection of these sets. That is, the set of valuesthat are instances of A and of B where property i is an int and property j is an int or a float. Since A is a superclas of B this can be simplifies to values that are instances of B with the same restriction on the properties, but the superclass relation does not let us simplify these property restrictions.

Using just these classes and currently accepted Python syntax, there is no way to spell that type. But it is not B[int] since property j is either an int or a float. If we want to spell that type in Python we’d have to do something like this:

class ThisIntersection(Protocol):
    @property
    def i(self) -> int: ...

    @property
    def j(self) -> int | str: ...

But even then, we’re losing some amount of information since we do know that objects also are nominal instances of B and you cannot have a protocol be the subclass of a non-protocol.

Using other theoretical frameworks we would come to the same conclusion. For example, if we think of the intersection as the most general type that is a subtype of both A[int] and B[int | str], then we again arrive at the ThisIntersection protocol (with, again, the additional nominal instance information).

It’s prima facie reasonable to think that narrowing to a subtype would preserve type variable instantiations, but this is evidently not the case. Both in practical examples like in this thread, and from a theoretic point of view. It’s also unfortunate that this means that many intersection types cannot be properly spelled succinctly. I think the only real solution would be for type checkers to display the type as A[int] & B[int | str], internally do something similar to the construction of that intersection protocol (which, afaik, is already happening), and then type check based on that. I think we should add documentation explaining this behaviour since it clearly is nontrivial and unexpected and thus is going to lead to a lot of confusion and false bug reports.

mikeshardmind · July 20, 2024, 5:35pm

Those shouldn’t, but that’s not what’s happening in this example anymore.

What’s happening here is A[int] & B[?] where ? is unknown, but is constrained to int, str, int | str, and known to be non-empty. Additionally, in this example, A[int] & B[int | str] should actually be an uninhabited type.

From this, there’s only the solution ? = int

ImogenBits · July 20, 2024, 6:11pm

I apologize if I misunderstood the current point of contention in this thread. But I would also appreciate it if you could lend me some charitability and not immedietly dismiss the entire content of what I’m saying. I think it is fairly clear that regardless of the particular example we’re choosing in our replies, the underlying disagreement is the proper behaviour of intersections.

This difference leads you to think that A[int] & B[int | str] is an uninhabited type, while I think that it is inhabited (for example, by the object that you construct in your example). My reply above details why I think that and what type I think it actually is.

My explanation also lets us infer the correct reduction of A[int] & B[?]. It’s objects that are instances of B with a property i that is an int and a property j that has some unknown type. Due to the constraints of the type variable we know that this unknown type must be a subtype of int | str. Just like in my explanation above, we cannot spell this type easily, but it can be approximated as B[int | float] or a similar construction to the protocol in my reply.

mikeshardmind · July 20, 2024, 6:25pm

I’m sorry if the tone isn’t coming over well over text, it’s not meant to be dismissive, only an explanation that’s getting rather curt because I’m trying to avoid repeating things that have already been said, and people thinking it’s going in circles. It’s not my intent to be dismissive, but to find the clearest possible examples that show that even under ideal circumstances, the obvious answer of “it’s unsafe for a narrowing function to narrow something it doesn’t know exists” is the actual answer. (A) -> TypeIs[A[int]] is a function without knowledge of B. It should not be surprising that it cannot safely narrow B.

The latest example was explicitly modified such that

class B(A[X], Generic[X]): ...

This means all uses of B[X] are a subtype of A[X]

with this in mind, it’s not possible for A[int] & B[int | str] here to have any values at runtime, as B[int | str] is a subtype of A[int | str], and A[int | str] is not a subtype of A[int]

If you’d like, I could rewrite this further to make this more clear, and instead of relying on the subclass definitions, redefine them, but the most recent back and forth leading to that example was explicitly covering excluding that possibility.

ImogenBits · July 20, 2024, 7:14pm

I do not see how this follows at all. What does A[int | str] not being a subtype of A[int] have to do with this? I am not saying that the intersection should be A[int | str] but rather that it is a subtype of B[int | str] that is currently not expressible in Python.

This is entirely correct, but I think you actually mean something different here. What you’re saying is that e.g. B[int], B[float], B[list[str]], B[MyThingy] are each subtypes of A[int], A[float], A[list[str]], A[MyThingy] respectively. What I think you mean is that A[X] is the only supertype of B[X] that is of the form A[SomeTypeExpr]. This is false.

If I understand you correctly, you seem to think that if an object is a member of B[T] and of A[S] then T and S must be the same type since B is generic over X and defined to inherit from A[X]. But this is not the case. In particular, we can look at your example B(1, "this"). This object clearly is a member of the type B[int | float]. We can also use covariance to see that it is a member of all of int | float’s supertypes, i.e. it also is a member of B[int | float | list[str] or B[object]. Alternatively, we can use inheritance to see that it also is a member of A[int | float]. Combining these we then further get membership of e.g. A[int | float | list[str]] and A[object].

Our disagreement comes here: You assert that these are the only two inferences we can make to determine types that this object is a member of. I propose that there are additional types. One such type is A[int]. It is true that A[int | float] is not a subtype of A[int], but what that is saying is that not every object that is a member of A[int | float] also is a member of A[int] (A(1.23), for example). The important part is that we aren’t talking about whether A[int | float] is a subtype of A[int], but whether the object B(1, "this") is a member of A[int] and type membership does not work in a way that that would be a requirement. The object in question is a member of the class A and it also has a property which is an int. This is precisely what is needed for it to be a member of A[int]. Further, we can follow through this argument for any object of the form B(some_int, literally_anything). All of these objects witness that A[int] & B[int | SomeOtherType] are in fact inhabited.

I think these classes are perfectly fine examples, you don’t need to rewrite them. But what I would very much like is an actual explanation of why you think any of the reasoning in any one of my replies is wrong and not just an assertion that your understanding is correct. If you feel like you need to repeat something to do so, please do that so I can understand.

mikeshardmind · July 20, 2024, 7:40pm

The rules for subtyping and variance break down entirely if you don’t follow the rules here with regard to nominal types. There is an argument to be made that if A was only a Protocol that could not be instantiated at runtime, that this might just be a missing piece of expression, however, the definition of both A and B here are intentionally linked. These are concrete classes, and not merely descriptions of structural types that have contracts that are disparate and unrelated to each other.

Structural subtyping could theoretically follow other rules than it does.

If you want it to be possible for there to be something like an A[int] & B[int | str] it requires different definitions for B, and actually looks like A[int] & B[int, int | str], this was mentioned above.

This isn’t something where it’s ambiguous in this example, as defined in the most recent example B[X] must be a subtype of A[X], and we’ve gone about ensuring it’s also the same X by using the same X in the same scoping when referring to both.

Elizabeth King:

class B(A[X], Generic[X]):
and then there’s no confusion, it refers to the same type variable. All of the rest of the example using this definition would then definitively have TypeIs still be the only place something went wrong and you’d no longer be able to seperate the two interfaces even with a construction that no type checker today supports.

From this, yes. We have a few relevant pieces of information when it comes to constraint solving.

We’re looking at this function specifically here:

def boom(x: A[int]) -> int:
    if isinstance(x, B):
        print(x, x.i, x.j)
        return x.i + x.j
    return x.i

We know it’s A[int] (Well, the caller got this wrong, but the function can’t know that in it’s isolated viewpoint)
We know it’s B[?]
We know ? is covariant, as is the int in A[int]
B is a subtype of A
B[?] is a subtype of A[int]
? is constrained to int, str, and int | str
There exists a type here, as evidence-based runtime checks succeeded.

from this, yes, B[int] is the only option, as

? must be int
A[int] & B[int] is just B[int] due to the relationships between A and B

Us reaching an incorrect conclusion here is not this part of the code being wrong, it’s wrong to have a B(1, "this") be narrowed to A[int] and not A[int | str]`

From incorrect starting information, you can blow up any example, these examples exist to show that every other part is sound, therefore if we’re getting the wrong result here, clearly it’s unsound for a TypeIs function to narrow something it does not have enough information to narrow.

This is also backed by the language of the PEP:

For a function returning TypeIs[T] to be safe, it must return True if and only if the argument is compatible with type T, and False otherwise. If this condition is not met, the type checker may infer incorrect types.

The problem here is that we start with B[int | str]. it is incorrect to say this is compatible with A[int], but this TypeIs function does not check that we are only working with things we know.

mdrissi · July 20, 2024, 7:52pm

The other issue here is lack of precise spec on sub typing rules. I generally follow set based semantics for typing and sub typing rules. A is a subtype of B is every value of A is a value of B. Here I am excluding Any/gradual types. Under this definition B[int | float] & A[int] do not match same set of values as B[int] so that reduction is not possible. The notion of two type variables X being same is not defined constraint in spec for sub typing and would contradict how set based sub typing works. So I stick with the answer here is defining sub typing rules/intersection and reaching eventual conclusion that B[int | float] & A[int] can not reduce and is the final simplified form. That closes the TypeIs in safety as well.

The very fact that calling function wanting B[int] later would crash is also strong reason to see that adding constraint/intersecting of A[int] to type B[int | float] can not just reduce B[int] and underlying set of values of B[int] vs intersection are different sets.

Liz · July 20, 2024, 8:05pm

https://typing.readthedocs.io/en/latest/reference/generics.html#defining-subclasses-of-generic-classes

If

class B(A[X], Generic[X])
    attr: X

weren’t to bind the same X here, it would violate several well detailed parts of the typing specification, including the part where it says that this Generic is optional to even include here. There’s no way for that to be the case if the X is not the same X in all places.

Generic can be omitted from bases if there are other base classes that include type variables, such as Mapping[KT, VT] in the above example.

ImogenBits · July 20, 2024, 8:05pm

Let us look at the relevant parts of the typing spec:

A fully static type B is a subtype of another fully static type A if and only if the set of values represented by B is a subset of the set of values represented by A .

For a type such as str (or any other class), which describes the set of values whose __class__ is str or a direct or indirect subclass of it, subtyping corresponds directly to subclassing. A subclass MyStr of str is a subtype of str , because MyStr represents a subset of the values represented by str .

Although the means of specifying the set of values represented by the types differs, the fundamental concepts are the same for both nominal and structural types: a type represents a set of possible values and a subtype represents a subset of those values.

This firmly supports a set theoretic reading of subtyping. It also explicitly points out that this is true for both nominal and structural types. There is no further explanation of subtyping on the page about generics, so we must assume that they do not change this behaviour.

Given this, please explain why you think that B(1, "something") is not a member of the type A[int]. I can even give you a list of the required steps of reasoning and you can just tell us the number that you think is faulty:

The type A[int] represents the set of values whose __class__ is A or a direct or indirect subclass of it, and whose property i is an int.
The object B(1, "something") is an instance of class B.
Thus, its __class__ is B.
B is a direct subclass of A.
The property i of this object is an int.
Thus this object is an instance of type A[int].

You seem to think that because B[T] is a subtype of A[T], no member of B[T] can be a member of A[S] for some S that is a supertype of T. This is not the case. I fully understand that your and Elizabeth’s intention is to create a class where that would be the case, but that is not what you have done. If you disagree with this, please explain what part of the above reasoning is faulty and what part of the typing spec or generally accepted set theoretic typing ideas leads you to that conclusion. I don’t think that just repeating the “same type variable” definition is going to lead us to anything since it does not actually explain why you think that the object in question is not a member of A[int] & B[int | float], unless you make some assumptions that you are not explaining or arguing for.

Liz · July 20, 2024, 8:08pm

Because we’ve defined B that way. There’s nothing else that needs to be said here, this is the sub-typing relationship, as part of B’s definition, and if we wanted a different one, there’s other things we could write. Those other things aren’t what is written.

mdrissi · July 20, 2024, 8:09pm

It’s just not. That’s part of aspect of lack of formal spec. The main formal theories I am familiar with is set based typing where each type including generics correspond to set of values of structural form based on defining exact induction rules. Python uses latter so your idea it is same X is impossible there as set of values disagree. The induction rule approach requires more formalization then exists so we lack any rules to derive from today.

ImogenBits · July 20, 2024, 8:21pm

I’ve tried to explain my reasoning in great detail and asked you (or I guess Michael, but seeing as this is a public forum it’s really more of an open invitation) to explain specifically what you disagree with and/or what your reasoning is. Do you really have nothing more to go off than the absolute conviction that it must be axiomatically obvious that the definition of B must mean what you think it does? Surely the points I’ve laid out very clearly and with direct support of both the typing spec and a general set theoretic typing framework must give you some pause?

I’m sorry to be so blunt here, but yours and Michael’s understanding of this just is wrong. I’ve tried to explain this several times now in very cordial terms and every time you just assume that I must be completely wrong and not even worth actually engaging with. Frankly, this is incredibly frustrating. I enjoy working with Python and want to make its type system better. But that isn’t going to happen if discussions on how to do that just boil down to some people ignoring what is being presented to them and continually reasserting that they must be obviously correct.

Liz · July 20, 2024, 8:28pm

What do you think each of these statements means?

class Subclass(Baseclass):  ...
class Subclass(Baseclass[int]):  ...
class Subclass(Baseclass[T], Generic[T]): ...
class Subclass(Baseclass[int], Generic[T]): ...
class Subclass(Baseclass[T], Generic[T, V], ): ...

To me, each of these are very clearly distinct different definitions with very clear semantics and defined relationships, and no, I won’t elaborate on each one. I implore you to try them for yourself in a playground using list and collections.abc.Sequence and whatever else.

I’m asserting this to be true, because all evidence that exists to how code is written says that it is.
The way type checkers all treat existing code that looks like any of thse prior to TypeIs matches my understanding of these, including the typeshed’s use.

If you want to make the claim that all type checkers are wrong to do this, then I think that needs to be a different thread, but right now we’re looking at the problems with TypeIs with the system we currently have so that those issues can be documented usefully for users.

mikeshardmind · July 20, 2024, 9:02pm

Code sample in pyright playground

from typing_extensions import TypeVar
from typing import Generic, Container


X = TypeVar("X", str, int, str | int, covariant=True, default=str | int)
Y = TypeVar("Y")


class A(Container[X]):
    def __init__(self, *items: X, other: X):
        self._items: Container[X] = items
        self._other: X = other

    def __contains__(self, x: object, /) -> bool:
        return self._items.__contains__(x)

class B(Container[X], Generic[Y, X]):
    def __init__(self, *items: X, other: Y):
        self._items: Container[X] = items
        self._other: Y = other

    def __contains__(self, x: object, /) -> bool:
        return self._items.__contains__(x)


def takes_container(x: Container[int]):
    ...


def example():
    a: A[int | str]  = A(1, other="this")
    a2: Container[int] = a
    b: B[str, int] = B(1, other="this")
    b2: Container[int] = b
    takes_container(a)
    takes_container(b)

This matches both my intuition and what type checkers do, and what is specified about the behavior of type variables.

to be clear, I agree that class B(A[X]) means all B[X] are A[X], and same with class B(A[X], Generic[X]), for the same reason that class B(A) means that all B are A