Allowing specializing a bound, covariant `TypeVar` with a `Protocol` as a limited form of intersection

Let’s consider this motivating example (Edit: Originally when writing this post I had forgotten Protocol in CanNeg’s definition, so I’ve fixed it.):

from typing import Generic, Protocol, TypeVar, final, reveal_type

type Arg = int | str

class CanNeg[Out](Protocol):
    def __neg__(self, /) -> Out: ...


_T_co = TypeVar("_T_co", bound=Arg, covariant=True)

@final
class Foo(Generic[_T_co]):
    def __init__(self, value: _T_co, /) -> None: ...
    @property
    def value(self, /) -> _T_co: ...

    def __neg__[U: Arg](self: Foo[CanNeg[U]], /) -> Foo[U]:  # error
        return Foo(-self.value)  # pyright does not understand that self.value: CanNeg[U]


x: Foo[int] = Foo(42)
reveal_type(-x.value)  # int
reveal_type(-x)        # Foo[int]

(pyright playground)

The definition of Foo.__neg__ here is wrong, because Foo[CanNeg[U]] is ill-formed: the statement "for all types X, X <: CanNeg[U] implies X <: Arg” is false.

If we had intersections between a concrete type and a protocol, one would write Foo[CanNeg[U] & Arg].

Note: I know one could’ve just used self: Foo[int], but that’s not the point. I have a codebase where I need to annotate something where Arg is a way more complex union of very different types (from external libraries), each one with a complex __neg__ definition that does not always return Self. I’d like to annotate Foo.__neg__ without needing to worry about those return types and without needing to repeat a plethora of overloads over and over again.

Relaxing the Arg bound is probably the best option as of now, but it’s suboptimal and would allow writing invalid types, such as Foo[None] (in this example).


I think it would be interesting to consider giving the same “intersection” meaning to the pattern in this example. More formally:

Let A[+T: B] be a type, generic over a covariant type variable T with bound B, and let P be a protocol.

Then, the expression A[P] represents the union of all types A[X] with X <: P and X <: B.

Therefore:

  • if X is a type, A[X] <: A[P] iff X <: B and X <: P;
  • if Y is a type, Y <: A[P] iff there exists an X <: B such that Y <: A[X] and X <: P.

Then, Foo[CanNeg[U]] in the example would essentially mean Foo[CanNeg[U] & Arg] and the code would type check as intended. Importantly, this would work without needing to introduce any syntax change, and would be backwards-compatible - it just gives meaning to currently invalid annotations.

It’s worth noting that no bound is equivalent to a bound of object (per the spec, see object’s usage as the top type and TypeVar upper bound definition): in this case (B = object), I believe the semantics I’ve just described are equivalent to the status quo.

Regarding forwards-compatibility with user-expressible intersections, I think this should not be much of a problem, although intersections would include this behaviour making this proposal obsolete. I still think it’s a nice shorthand, intended for long classes with a lot of functions where one might not want to repeat the & Arg bound over and over again. I know, explicit is better than implicit, but too explicit might sometimes harm readibility (and maintainability), especially in big, complex codebases.


I’d be curious to get a sense of how difficult would it be to implement this behaviour in type checkers. In case the implementation happens to be too challenging, I guess it’s possible to restrict this behaviour to the following special cases:

  • A is a concrete type (meaning, not a Protocol),
  • B is a concrete type (although I don’t expect B being a Protocol to be much of a problem).

Edit: It’s worth noting that pyright does seem to partially understand the proposed pattern correctly, see the playground link above.

Here, I’ve always taken A as covariant in T because if T was invariant, the type A[P] would imply X <: P <: X in the above, which I don’t think is very useful if X is intended to be a concrete type (for example, if B is a concrete type).

Disclaimer: This post is just intended to get a sense of whether or not it might make sense to introduce something like this in the type system.
I’ve searched this forum for similar proposals and found none, but if I’ve missed something, please let me know.
I also don’t think I’d have the capacity (nor the experience), as of now, to write a PEP or a similarly formal proposal.

2 Likes

I think this makes a lot of sense. This would also solve another common category of type-unsafe situations where Any is passed as type argument to a type parameter with an upper bound. For example in NumPy there’s the numpy.datetime64[ItemT: int | date] scalar type, where dt.item() -> ItemT for dt: datetime64[ItemT]. The date here refers to the stdlib datetime.date.

It’s not always possible to statically determine the actual ItemT type though, in which case Any is used. For some dt_any: datetime64[Any], type-checkers will accept clearly statements that are clearly type unsafe, such as value: str = dt_any.item(). Because, after all, Any is assignable to anything, including str. The problem is that this completely bypasses the upper bound of ItemT: int | date.

If we extend this proposal by lifting the “P is a protocol” restriction, then it would also solve this issue: datetime64[Any] would be intepreted as datetime64[Any & (int | date)], so that value: str = dt_any.item() would be rejected.


If the intersection types complicate things here, it’s also possible to rephrase this as constraint-based typing, i.e. constraining Any (and only Any) based on the bounds of the type variables it touches, effectively limiting its possible materializations from above.


BTW, I really like your type-theoretical writing style, I wish there was more of that here.

3 Likes

That’s right, now that I think about it I’ve also had the same problem in the past, although I don’t really recall for which project. Actually, this is a great example of an use case when T is invariant/contravariant (or, I guess, not necessarily covariant).

I agree, I guess my strongest point here is that this feature is intended to be a special case of intersections, while being easier to implement in the meantime.

I agree this might be also useful in the general case, but I think in that case we’d have to think more about the UX. For examples, if P and B were both @disjoint_base concrete types (take e.g. B = int, P = str), P & B would be Never, which means that A[P] would silently be A[Never] (which might or might not be equivalent to Never, depending on A’s definition). A user could expect an error here, although I think a warning or some other lint with a message like “note: this type is implicitly Never because no instance of P is assignable to B” would be sufficient.

That’s why I think the simplest path, for now, would be to restrict P to be a Protocol (or, yes, Any).

1 Like

Protocol can also be used as an abc base class, i.e. it can be both structural or nominal. So I don’t see how restricting this to just protocols would simplify things.