Narrowing generic type bounds with isinstance

I have a lot of generic code that operates over many different “element” types. Some generic functions need to dispatch to different non-generic functions for each possible element type. This is tracked at runtime by having a “handler” type associated with each element type. Each function takes a handler as an argument and can either use the handler’s generic methods or consult the handler for information that narrows the element type.

I thought that I could handle this by using isinstance with the handler object so that a type checker could narrow the element type via an upper bound declared in the handler type. Specifically, it looks like this:

from __future__ import annotations

from typing import Self, Protocol, reveal_type

class ElementA(Protocol):
    def a_method(self) -> Self:
        return self

class ElementB(ElementA, Protocol):
    def b_method(self) -> Self:
        return self

class HandlerA[T: ElementA]:
    pass

class HandlerB[T: ElementB](HandlerA[T]):
    pass

def generic_func[T: ElementA](x: T, handler: HandlerA[T]) -> T:
    if isinstance(handler, HandlerB):
        reveal_type(handler) # HandlerB[Any/Unknown]
        reveal_type(x) # T
        return x.b_method() # type checkers complain here
    else:
        return x.a_method()

I realise that in general isinstance(obj, list[T]) etc. doesn’t work, but here I am using a plain isinstance check against the concrete HandlerB class, which is fine. The checkers narrow the type of handler from HandlerA[T] to HandlerB[Any], but this is not what I wanted.

I use the notation Foo[T: B] to mean the generic type Foo[T] with T bounded by B. What would be better is:

  • At least this could be HandlerB[T: ElementA], because the bound on T in HandlerB ensures that every realisation of HandlerB can match the bound on T in generic_func.
  • Better would be if the checker could narrow this to HandlerB[T: ElementB], which should also be valid since ElementB is a subtype of ElementA.
  • Ideally, having narrowed handler from HandlerA[T: ElementA] to HandlerB[T: ElementB], the checker would also realise that T is narrowed from T: ElementA to T: ElementB, and then narrow the type of x to ElementB.

This last point is the thing that I want. The way that this highly generic code is made type safe at runtime is by passing around the handler object, which quite literally records what the element type is. Functions may have more complex signatures like:

def f(x: list[T], y: list[T], h: Handler[T]) -> list[T]: ...

We don’t want to call isinstance on the elements of the lists x and y: that wouldn’t even work if the lists were empty, and the data structures can be much more complicated than this. Also, it does not make sense to dispatch by calling isinstance on the element objects, because the relationship between element types and handlers can be one-to-many: there can be different handler types that use the same element types.
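For example (a hypothetical sketch reusing ElementA and HandlerA from above; ConcreteElement, HandlerX1 and HandlerX2 are made-up names), two handler types can share one element type, so inspecting an element can never tell you which handler semantics apply:

class ConcreteElement(ElementA):
    pass

class HandlerX1[T: ElementA](HandlerA[T]):
    pass

class HandlerX2[T: ElementA](HandlerA[T]):
    pass

# Both handlers operate over ConcreteElement, so isinstance(x, ConcreteElement)
# succeeds for elements of either handler and cannot distinguish h1 from h2.
h1 = HandlerX1[ConcreteElement]()
h2 = HandlerX2[ConcreteElement]()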

The handler makes the types well defined at runtime, so ideally it would also be what makes them well defined in a static type checker. It doesn’t look like either mypy or pyright allows narrowing the bounds on the type parameter like this, though.

Is there some better way of handling this?

Is this something that checkers could reasonably be expected to handle?

Your specific example here should be sound, but current type checkers don’t keep a graph of all constraints and update the entire graph as bounds become known on each code path. Type checking would behave much better for users overall if they did, but requiring this now would force existing type checkers to be redesigned entirely.

It’s generally unsound to narrow arbitrary generics in Python’s type system currently. Fixing this would require many changes to how type checkers handle variance, some additional special casing around known things like implicit definitions from object that are allowed to be changed, and further special casing or breaking of collections.abc.

While your specific example isn’t subject to those issues, the fact that type checkers do not keep the full graph of code flow updated means it will be difficult to word any update to the specification that narrowly covers this case without introducing new unsoundness.

For now, you can work around this with either overloads or *args defined as a union of appropriate 2-tuples, but I acknowledge that neither of these is ideal.
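For instance, the overload route might look like this (a sketch reusing the ElementA/ElementB/HandlerA/HandlerB classes from the question; the implementation is deliberately untyped, which sidesteps the in-body narrowing problem at the cost of the body itself not being checked):

from typing import overload

@overload
def generic_func[T: ElementB](x: T, handler: HandlerB[T]) -> T: ...
@overload
def generic_func[T: ElementA](x: T, handler: HandlerA[T]) -> T: ...
def generic_func(x, handler):
    if isinstance(handler, HandlerB):
        # Callers that passed a HandlerB[T] matched the first overload,
        # so they already see T bounded by ElementB.
        return x.b_method()
    return x.a_method()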

I’m interested in what specifically prevents the first part of this, i.e. why the type of handler itself is refined to HandlerB[Any] rather than HandlerB[T]:

from typing import Self, Protocol, reveal_type

class ElementA(Protocol):
    def a_method(self) -> Self:
        return self

class ElementB(ElementA, Protocol):
    def b_method(self) -> Self:
        return self

class HandlerA[T: ElementA]: pass

class HandlerB1[T: ElementA](HandlerA[T]): pass

class HandlerB2[T: ElementB](HandlerA[T]): pass

def generic_func[T: ElementA](x: T, handler: HandlerA[T]):
    if isinstance(handler, HandlerB1):
        # _pyright: HandlerB1[T@generic_func]
        # mypy: HandlerB1[Any]
        reveal_type(handler)
    elif isinstance(handler, HandlerB2):
        # _pyright: HandlerB2[Unknown]
        # mypy: HandlerB2[Any]
        reveal_type(handler)

At least pyright seems to take the type parameter bound into account: it refines handler to HandlerB1[T] when the bound in HandlerB1 matches the bound in generic_func. For HandlerB2, though, it turns the type argument into Unknown (its implicit Any) even though that bound is compatible: ElementB is a subtype of ElementA.

I would think that at least it should be possible for both mypy and pyright to give HandlerB1[T] and HandlerB2[T] here for handler rather than using Any/Unknown. Is there some way that that could be unsound?

I don’t see why analysing the whole program flow is necessary here, although I can certainly imagine that narrowing T itself might require significant changes for type checkers to support. At least the pyright output T@generic_func suggests that it models the type parameter as inherently scoped to the function, and so may not even have a way to represent a locally narrowed type for a type parameter.

This kind of behaviour isn’t completely defined in the typing spec, so some of it comes down not to whether type checkers could do something but to whether they want to invest the effort into actually implementing it. The relevant part of the spec says that they should take the info from constructor calls into account and otherwise use Any. What isinstance(handler, HandlerB1) does is tell the type checker that handler has type HandlerA[T] and also type HandlerB1. Since HandlerB1 is a subtype of HandlerA, we can be sure that the resulting type is HandlerB1[Something] for some type Something. The problem is that HandlerB1 doesn’t carry info about what its T is, so the type checker needs to do some kind of inference.

In general, the question is what Parent[A] intersected with Child[UnknownType] should narrow to. As we can see in the HandlerB2 example, that intersection can’t always be Child[A] since the bound on the Child’s type var might be tighter than the parent’s. But we also can’t just use the bounding type since the only thing we know about A is that it is some subtype of that bounding type.

It seems that mypy decided to always go with the safe and direct route of using Any. I haven’t done a lot of testing around this, but I’d guess that pyright first tries to see if Child[A] would be legal, and falls back to Child[Any] if it wouldn’t. I’m not sure what the most technically correct general answer would be. It seems that the intersection of the Child’s bound and the parent’s specified type should be safe, but there might be some issues with weird combinations of variances and other inferences that would also have to happen. But even then, the resulting type in your example would not be HandlerB2[ElementB] but HandlerB2[T & ElementB]. These are different in that we know that T & ElementB is some subtype of ElementB, but we don’t know which one. It also doesn’t imply that x is an instance of ElementB, since T might be some supertype. For example, consider this code:

class ConcreteElementA(ElementA):
    pass

class ElementC(ElementB):
    def c_method(self) -> Self:
        return self

def new_scope(elem: ElementA, handler: HandlerA[ElementA]) -> None:
    generic_func(elem, handler)

my_elem = ConcreteElementA()
my_handler = HandlerB2[ElementC]()

new_scope(my_elem, my_handler)

It uses new_scope just to make sure that elem and handler are considered to have exactly the types we want, regardless of any additional type inference that is happening. We can see that generic_func is called with an ElementA (that isn’t an ElementC) and a HandlerB2[ElementC]. That means that the type var T of generic_func must be instantiated with ElementA. But it also means that in the second conditional branch, we cannot infer T to be ElementB or ElementC, even though handler has type HandlerB2.

Maybe I’m modelling this wrong somehow in the type system. In context, your call to new_scope should be an error because the element type does not match the handler type. While much of this code is generic, when it gets down to the concrete types each concrete handler type is associated with exactly one concrete element type. That is why it is safe to check the type of the handler and then assume that the type of the elements is known.

Ah, in that case I’m assuming that HandlerA[T]'s type var is invariant. Probably because it actually looks something like this?

class HandlerA[T: ElementA]:
    def do_something(self, element: T) -> None: ...
    def do_other_thing(self) -> T: ...

In that case the type checker can indeed safely make some more inferences. But we still don’t have HandlerB2[ElementB] in the second branch; we have HandlerB2[T], and we know that T is a subtype of ElementB. In practice this distinction doesn’t matter in most cases, but it does make a difference if you e.g. have a function that takes HandlerB2[ElementB]: you wouldn’t be able to call it with a value of type HandlerB2[T].
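To make that concrete, here is a sketch (takes_exact and forwards are hypothetical names; HandlerB2 and the element types are as above, with T invariant):

def takes_exact(handler: HandlerB2[ElementB]) -> None: ...

def forwards[T: ElementB](handler: HandlerB2[T]) -> None:
    # With T invariant, HandlerB2[T] is assignable to HandlerB2[ElementB]
    # only when T is exactly ElementB; here T could be a strict subtype,
    # so type checkers reject this call.
    takes_exact(handler)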

But regardless of which variance we use, we still won’t see optimal behaviour from the type checkers. This stuff is outside the current scope of the typing spec, and too niche and specific to each possible example to be widely implemented.

Yes, I think so.

Here is a more explicit demonstration of the sort of thing I am trying to do although this is also just an example:

from __future__ import annotations
from typing import Self, Protocol
from fractions import Fraction

class RingElement(Protocol):
    def __add__(self, other: Self, /) -> Self: ...
    def __mul__(self, other: Self, /) -> Self: ...

class FieldElement(RingElement, Protocol):
    def __truediv__(self, other: Self, /) -> Self: ...

class Ring[E: RingElement]:
    def one(self) -> E: ...
    def add(self, a: E, b: E) -> E: ...

class Field[E: FieldElement](Ring[E]): ...

class IntegerRing(Ring[int]):
    def one(self) -> int: return 1
    def add(self, a: int, b: int) -> int: return a + b

class RationalField(Field[Fraction]): ...

ZZ = IntegerRing()
QQ = RationalField()

def generic_func[E: RingElement](a: E, b: E, ring: Ring[E]) -> E:
    if isinstance(ring, Field):
        return a / b # type checker complains here
    else:
        return a + b
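For reference, the intended dispatch at the call sites would be (a sketch; the argument values are arbitrary):

n = generic_func(1, 2, ZZ)                      # int: ZZ is not a Field, so a + b
q = generic_func(Fraction(1), Fraction(2), QQ)  # Fraction: QQ is a Field, so a / b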