Clarifying the typing spec regarding polymorphic protocols

The typing spec is currently unclear when it comes to the difference between the following two protocol definitions:

from typing import Protocol

class Proto1[T](Protocol):
    def __call__(self, arg: T, /) -> T: ...

class Proto2(Protocol):
    def __call__[T](self, arg: T, /) -> T: ...

The semantics of Proto1 are already well documented. However, the semantics of Proto2 are not. Hence my question: should such a definition be allowed, and if so, how should it be interpreted?

In my opinion, a natural interpretation for Proto2 is to describe a polymorphic function of one argument. Technically, allowing such an interpretation introduces some form of rank-2 polymorphism into the language.

For example:

from typing import Protocol

class PolyIdentity(Protocol):
    def __call__[T](self, arg: T, /) -> T: ...

def apply_twice(f: PolyIdentity, /) -> tuple[int, str]:
    return f(1), f("a")

def id[T](x: T) -> T:
    return x

def incr(x: int) -> int:
    return x + 1

apply_twice(id)    # should be accepted
apply_twice(incr)  # should error

Version without PEP 695 features, to test with mypy:

from typing import Protocol, TypeVar

T = TypeVar("T")

class PolyIdentity(Protocol):
    def __call__(self, arg: T, /) -> T: ...

def apply_twice(f: PolyIdentity, /) -> tuple[int, str]:
    return f(1), f("a")

def id(x: T) -> T:
    return x

def incr(x: int) -> int:
    return x + 1

apply_twice(id)
apply_twice(incr) # mypy error

Here, the id function implements the PolyIdentity protocol but the incr function does not, since it only accepts integer inputs.

As of today, mypy 1.11.1 is acting as I would expect on the snippet above, rejecting the second call to apply_twice but not the first one. Pyright 1.1.375 accepts both calls.
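
To make the risk concrete, here is roughly what would happen at runtime if the second call were allowed, reusing the definitions above (a sketch, not actual checker output):

apply_twice(incr)
# f(1) evaluates incr(1) == 2, which is fine, but f("a") evaluates
# incr("a"), which executes "a" + 1 and raises:
# TypeError: can only concatenate str (not "int") to str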

It would be good for the typing spec to decide on the correct behaviour here and document it.

Note: This post was inspired by a pyright issue.


Allowing incr to be assigned to PolyIdentity is, on pyright’s side, at least inconsistent with the other type-checking rules it applies when checking uses of the protocol, since it lets through a pretty obvious bug leading to a runtime error, as you’ve shown in your example. I think mypy’s behavior is correct.

I’d lean toward either:

  1. PolyIdentity supports id but not incr.
  2. PolyIdentity itself is flagged as unsupported, and you can’t combine method-level generics with protocols.

Option 1 would be to specify that rank-2 types are supported; option 2 is to forbid them entirely. While my preference as a user is option 1, I also feel option 2 is likely to have a lot of harder examples/cases. It’s rare for a type system to support rank-2+ types. Some languages, like Haskell, do support them fully, but I am not sure how they would fit, complexity-wise, with the current type system and other features. It would not surprise me if rank 2 combined with a few other Python type features were undecidable. In the simple lambda calculus, rank 2 is decidable, while rank 3 is already undecidable (although that is only for inference with no annotations; checking remains possible).

Also, if we go with option 1, how about this:

from typing import Callable, TypeVar

T = TypeVar("T")

def make_f() -> Callable[[T], T]:
    ...

apply_twice(make_f())  # Should this be accepted?

What does T refer to here? Is it solved when make_f is called, or is that deferred until later? We currently lack a way to distinguish those two cases.
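
To spell out the two readings, here is a sketch reusing apply_twice from above (the comments describe possible checker behaviours, not what any current checker does):

from typing import Callable, TypeVar

T = TypeVar("T")

def make_f() -> Callable[[T], T]:
    return lambda x: x

f = make_f()

# Reading 1: T is bound to make_f, so it is solved at the call to make_f
# and f is monomorphic; at most one of the calls below can be valid.
# Reading 2: T is bound to the Callable in the return type, so f is a
# polymorphic callable and both calls below are fine.
f(1)
f("a")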

My understanding of rank-2 polymorphism is that it makes type inference difficult but does not complicate type checking much. Concretely, the hard problem is inferring the most general type for all of a function’s arguments when some of them may be higher-rank functions, since the standard Hindley-Milner algorithm is no longer sufficient in that case. This is why OCaml does not allow higher-rank functions directly but does allow universally quantified record fields, which provide the same capability minus inference.

But contrary to languages such as Haskell or OCaml, Python type checkers almost never attempt to infer argument types, which would be well past undecidable anyway due to other features such as subtyping and overloading. So rank-2 polymorphism may not be that big of a deal to support in Python (and might already be implemented in mypy for the most part).

Also, I am not sure that it is realistic to disallow the combination of method-level generics and protocols entirely. Due to Python’s object-oriented nature, the need for them arises pretty naturally. For example, if you want to add a map function to the Sequence protocol:

from collections.abc import Callable
from typing import Protocol

class Seq[T](Protocol):
    ...
    def map[U](self, f: Callable[[T], U]) -> "Seq[U]": ...

Finally, regarding this example from @mdrissi:

from typing import Callable, TypeVar

T = TypeVar("T")

def make_f() -> Callable[[T], T]:
    ...

apply_twice(make_f())  # Should this be accepted?

This is another example where the typing spec should be clearer. Mypy currently accepts it. Also, Pyright’s documentation says the following:

When a TypeVar or ParamSpec appears within parameter or return type annotations for a function and it is not already bound to an outer scope, it is normally bound to the function. As an exception to this rule, if the TypeVar or ParamSpec appears only within the return type annotation of the function and only within a single Callable in the return type, it is bound to that Callable rather than the function. This allows a function to return a generic Callable.
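
For illustration, here is roughly how that rule reads in practice (a sketch; the function names are hypothetical):

from typing import Callable, TypeVar

T = TypeVar("T")

# T appears only within a single Callable in the return type, so under
# the quoted rule it is bound to that Callable: the function returns a
# polymorphic callable.
def returns_poly() -> Callable[[T], T]: ...

# T also appears in a parameter annotation, so it is bound to the
# function as usual: T is solved at each call to wrap, and the returned
# callable is monomorphic.
def wrap(f: Callable[[T], T]) -> Callable[[T], T]: ...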

According to this rule, the type variable in the example above should be rescoped to the callable, and thus the whole callable should represent a polymorphic function. In my opinion, though, more clarification is needed regarding this rule and its motivation, which seems somewhat ad hoc in its current form. Also, it does not seem to play particularly well with PEP 695, which would have us write the following (unless new syntax is introduced for explicitly scoping T to the return type):

def make_f[T]() -> Callable[[T], T]: ...

The syntax above is possibly misleading since it suggests that T is scoped to the make_f function, when in reality it is scoped to the callable. Maybe @erictraut has some thoughts on this and why the current rescoping rule exists.

The rescoping rule you’re quoting here from the pyright documentation isn’t currently part of the typing spec. I implemented it this way in pyright to mirror mypy’s (undocumented) behavior. There are numerous libraries and stubs that rely on this behavior, so deviating from it was problematic. I’m not confident that pyright’s behavior is 100% consistent with mypy’s in this case, but it’s close. Whatever behavior we decide is correct here, I’d like to see it properly specified in the typing spec.

When we were working on the proposal for PEP 695, I explored ways to scope type variables to callables, but ultimately decided not to tackle that issue because the PEP was already ambitious and risked not being approved. This topic was also discussed in conversations related to PEP 677, which was ultimately rejected by the steering council.


I think we can describe the difference here pretty clearly by stating when a type variable should be solved, based on its scoping.

For generic types, type variables need to be compatible (with bounds and constraints, for example), and may (but need not) be specialized at assignment.

For methods of a type that are parameterized with type variables not scoped to the type, type variable compatibility may still be a consideration (such as for LSP), but those type variables should not be specialized except at use, and compatibility should require that the method is compatible with the appropriate range of types based on variance.
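
A sketch of that distinction, reusing the style of the protocols above (the names Box and use are hypothetical):

from typing import Protocol

class Box[T](Protocol):            # type variable scoped to the type
    def get(self) -> T: ...

class Poly(Protocol):              # type variable scoped to the method
    def __call__[T](self, arg: T, /) -> T: ...

def use(b: Box[int], p: Poly) -> None:
    # Box's T was specialized (to int) when the annotation Box[int]
    # was written.
    x: int = b.get()
    # Poly's T is solved independently at each use site.
    y: int = p(1)
    z: str = p("a")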

@mikeshardmind I agree with you in spirit, but type variable scoping is currently a major gray area of the typing spec. To avoid derailing the conversation, I opened a separate topic about it.

I do think we should avoid rescoping the type variable to the return type when using PEP 695 syntax. PEP 695 makes the scope much clearer. Coupling this behavior to the new syntax also reduces backwards-compatibility issues.
