Support Function With Multiple Generic Parameters Same Type

I would like to be able to type a function that uses the same type variable for multiple parameters, such that the type checker only verifies that the later arguments are compatible with the first one.

Proposal

I propose adding a bind_first flag to TypeVar:

T = TypeVar('T', bind_first=True)

def eq(x: T, y: T) -> bool: ...

eq(10, 11) # passes
eq(10, "str") # fails

This would be equivalent typing-wise to capturing the first argument in a generic class and then using that in a method, like:

T = TypeVar('T')

class eq(Generic[T]):
    def __init__(self, x: T) -> None: ...

    def to(self, other: T) -> bool: ...

eq(10).to(11)
eq(10).to("str")

Motivation

There are multiple existing discussions asking for this feature, including one in the Mypy issue tracker from 2019 as well as another on the python/typing discussion board from 2022.

Personally, I maintain a Python library that supports optimizing domain-specific languages, called egglog.

In multiple places in the public API (eq, set, rewrite, ne, union, birewrite), I use the workaround of creating a generic class and using a method on it to support having a function with multiple arguments that need to be the same exact class.

I have received feedback multiple times that eq(x).to(y) is less ergonomic and intuitive than eq(x, y) (example). For this use case, you might suggest operator overloading, i.e. using something like __eq__ instead. In my case, however, I want to preserve users' ability to define their own classes with custom equality, and not use __eq__ for the meta-level eq.

One option would be to support both forms eq(x).to(y) (type safe) and eq(x, y) (non type safe, but more ergonomic), but it would be unfortunate to have to force users to pick either safety or ergonomics.

Alternatives

In the issues linked above, there are a few other suggestions for how to resolve this. One is very similar to this proposal, but suggests a same_value flag. With that flag, however, there are still some open questions about unions and subtypes. By binding to the first value passed in, exactly as if we had written a generic class, we get somewhat clearer semantics, since we can rely on the existing semantics of the class version.

There was also a workaround involving having each sub-class you want to distinguish inherit from a different parameterized class. This works, and I would be happy using this if the only sub-classes I needed were defined in my library itself. However, in my case users can define their own sub-classes and those need to be differentiated as well, and having each have this dummy instantiated class they inherit from seems too involved.

Feedback

I would love to hear any thoughts from folks who have thought about this more, to understand if adding this kind of flag would present some unforeseen issues or if there could be a workaround with the existing typing mechanisms.


Isn’t that how generic functions are supposed to work already?

Have any of the type checkers implemented the new generics syntax yet?

No, that is not how they work. They instead infer a type that fits all usages. If you have a function and you pass one int and one str, it is valid for a type checker to infer that T is Union[int, str]. It is also valid for a type checker to infer that T is object.

As of right now, the exact type that T is inferred to be is undefined, and different type checkers may choose different answers.

Even with this post's idea of inferring from the first usage only, some ambiguous cases remain. If I call the function with the argument 3, is it int or Literal[3]?
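To illustrate the point about inference (a small sketch of my own, not from the thread): the same call site is accepted at runtime no matter how a checker decides to solve T, because nothing is enforced at runtime.

```python
from typing import TypeVar

T = TypeVar('T')

def both(x: T, y: T) -> list[T]:
    # A checker is free to solve T as int | str, or even as object;
    # at runtime nothing is checked either way.
    return [x, y]

mixed = both(1, "a")  # accepted at runtime, whatever T is solved to
```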

Edit: For this topic the old vs new type syntax makes no difference.


Ah right, I see. That's a very helpful explanation, thank you.

Thanks for the post. I want to make sure I understand the motivation, so I’ll attempt to reiterate the problem statement. You said that you want to support cases where multiple arguments need to evaluate to “the same exact class”. I presume you mean that these values need to be instances of the same class. Do instances of subclasses work as well? I presume so, since a subclass (by virtue of the Liskov Substitution Principle) should be substitutable for its parent class.

What you’re trying to do is impose an additional constraint on the constraint solver. That’s reasonable, but I see a number of issues with your formulation.

First, the notion of “first” is not well defined. Is it based on left-to-right argument order? Runtime evaluation order of arguments (which can differ in certain cases from left-to-right syntactic order)? Parameter order? What if keyword arguments and parameters are involved, in which case order is irrelevant? What if the parameter type references the type variable multiple times, as in def foo(x: dict[T, T]): ...? Or def foo(x: list[T] | set[T]): ...? What does “first” mean in this case? Unions are intended to be order-independent, so type checkers generally normalize the order of subtypes in a union. It’s therefore not clear what “first” means for list[T] | set[T].

Despite these challenges, we might be able to define “first” in unambiguous terms, but it’s not as straightforward as one might expect.

Second, it’s not clear to me what you mean by “equivalent typing-wise”. Do you really mean equivalent types? If so, do you consider the expressions 10 and 11 to have equivalent types? Pyright evaluates the types of these two expressions as Literal[10] and Literal[11], which are not equivalent. The rules for type evaluation of expressions are not spelled out in the typing spec, and different type checkers will evaluate expression types differently.

(I’ll also note here that constraint solving behaviors are not currently defined in the typing spec, and there are many cases where mypy and pyright differ in how they solve constraints.)

Type equivalency is a very high bar. For example, the type list[Any] is not equivalent to list[int]. Maybe you don’t really mean “equivalent types”. Perhaps you instead mean that the first argument should establish an implied upper bound constraint (which would be consistent with my point about the LSP and subtypes). All other values that bind to the type variable would then need to be assignable to the implied upper bound type. If the first argument is 3.0, would you want to accept 1 for the second argument? Or would you force the developer to specify the second argument as 1.0? If the first argument evaluates to list[Any], would you expect a type error if the second argument evaluated to list[int]? An “upper bound” formulation seems closer to what you probably want, but this would lead to a bunch of unexpected order-dependent behaviors in common circumstances. Consider the following:

eq(3.0, 1) # OK
eq(1, 3.0) # Error

eq(x=3.0, y=1) # OK
eq(y=1, x=3.0) # Error

Going back to the original problem statement, we can see that the proposed solution (even if it didn’t have the issues I’ve demonstrated above) fails to solve the problem as stated. The following code would type check without errors. There’s currently nothing in the type system that says that a type needs to be “a single class”. Unions are supported, as are structural types.

from collections.abc import Hashable

def func1(v: int | str):
    eq(v, "") # OK

func1(1) # OK

def func2(x: Hashable, y: Hashable):
    eq(x, y) # OK

func2(1, "") # OK

def func3(x: object, y: object):
    eq(x, y) # OK

func3(1, "") # OK

I’ll also note that you’re using an old (pre-PEP 695) technique for defining type variables in your sample code. It’s not clear how one would designate a “bind first” type parameter using the modern syntax. We could expand the syntax if this feature were sufficiently compelling, but it’s something that would need to be addressed.

I’m curious whether you know of any other programming languages that have a “bind first” construct like the one you’re proposing. I don’t know of any, and I’d be surprised if we could find such an example. It might be worth doing a survey of other programming languages to see if any of them have addressed this use case. If so, how?

Most programming languages provide a way to specify constraints through explicit specialization. Python already supports this for generic classes and generic type aliases, but it doesn’t support it for generic functions due to a limitation in the grammar. Draft PEP 718 proposes to add this functionality. This would allow you to use a type expression to add an explicit constraint for a generic function:

eq[int](0, 1) # Succeeds
eq[int](1, 2.0) # Fails
eq[int](2.0, 1) # Fails
eq[float](2.0, 1) # Succeeds

However, you may find this to be less ergonomic, so I can see why this might not feel like a great solution for your use case.

The solution you’ve adopted in your library — using a generic class and a method on it — strikes me as a good solution. The syntax may take a little getting used to, but it solves the problem nicely and guarantees type safety.
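For what it's worth, the generic-class form also gives you explicit specialization at runtime today. Here is a minimal runnable stand-in (my own sketch, not egglog's actual implementation):

```python
from typing import Generic, TypeVar

T = TypeVar('T')

class eq(Generic[T]):
    """Minimal stand-in: capture the first value, then compare."""

    def __init__(self, x: T) -> None:
        self.x = x

    def to(self, other: T) -> bool:
        return self.x == other

# Explicit specialization is already legal for generic classes:
assert eq[int](0).to(1) is False
assert eq[float](2.0).to(2.0) is True
```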


One variation that I think could be specified and implemented more precisely: Add a type qualifier that can wrap a TypeVar and means “do not use this usage of the TypeVar to add a constraint to the constraint solver”. Let’s call it NoConstraint.

It would behave like this:

def f[T](x: T, y: NoConstraint[T]): ...

def caller(i: int, o: object):
    f(i, 1)  # OK, T is solved to int, 1 is assignable to int
    f(i, o)  # error, T is solved to int, object is not assignable to int
    f(o, i)  # OK, T is solved to object

I think this would do what the OP wants and doesn’t rely on notions like the “first” usage.

I haven’t tried to actually implement it, though, and I don’t know if this is useful enough to add to the type system.


Another, potentially more feasible-to-implement, alternative could be adding a different restriction option to TypeVar [1], namely "has to be assigned a concrete runtime class", i.e. something that can be returned by type(obj). This would mean:

  • Literal[...] types aren’t valid, they are never returned by type()
  • int | str is not a valid inference (meaning func1 given by @erictraut above isn’t valid)
  • func2 isn’t valid since it tries to pass Hashable to eq, which is just a protocol (although an exceptionally weird one).
  • However, object would still be a valid inference. Maybe it would be possible either to tell type checkers to always reject object when this option is used, or to require that one of the two arguments has the exact type being inferred? The latter seems impossible to actually enforce.

So I don’t think this option is viable, but maybe it brings others to more helpful ideas.


  1. ignoring the new-syntax issue for now ↩︎

Maybe it would help if you explain more concretely why you need the types to “match”. As people have said, it’s not clear what matching means. But it’s also not clear that it’s what you need?

It seems like you want the types to “match” because you want to ensure that x == y is defined. Perhaps, that’s the thing you want to assert somehow?

Thus, you want something like:

def eq[T: Any, U: SupportsEq[T]](x: T, y: U) -> bool:
    ...
    x == y
    ...

where SupportsEq somehow enforces the constraint that x == y is meaningful. But in Python that’s meaningful for nearly all types.
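One way SupportsEq might be spelled is as a generic protocol (hypothetical sketch; the name and shape are my own). It also illustrates why this is nearly useless here: object defines __eq__ accepting any object, so essentially every type matches.

```python
from typing import Protocol, TypeVar, runtime_checkable

T_contra = TypeVar('T_contra', contravariant=True)

@runtime_checkable
class SupportsEq(Protocol[T_contra]):
    # Hypothetical protocol: types whose __eq__ accepts a T_contra operand.
    def __eq__(self, other: T_contra) -> bool: ...

# Because object defines __eq__, essentially every value satisfies it:
assert isinstance(1, SupportsEq)
assert isinstance("a", SupportsEq)
```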

TypeScript added exactly this feature fairly recently: NoInfer<T>.

So there’s some prior art for that one, at least.


Unless I’m mistaken, this is the TypeScript equivalent of what OP is trying to do:

declare function eq<T, U extends T>(x: T, y: U): boolean;

eq(10, 11);     // fine
eq(10, 'str');  // error

In words: Use two type variables, the second extending the first.

The corresponding Python code would be:

def eq[T, U: T](x: T, y: U) -> bool:
    ...

…or:

T = TypeVar('T')
U = TypeVar('U', bound=T)

def eq(x: T, y: U) -> bool:
    ...

These are already valid at runtime, so “only” the typing specs would need to change to allow this. Better than adding new constructs/parameters, no?
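A quick check confirms the old-style spelling constructs and runs without error at runtime (what checkers should do with the bound is the open question):

```python
from typing import TypeVar

T = TypeVar('T')
U = TypeVar('U', bound=T)  # the runtime accepts a TypeVar as a bound

def eq(x: T, y: U) -> bool:
    return x == y

assert eq(10, 11) is False
assert eq(10, 10) is True
```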

That’s interesting, I wonder how that interacts with TypeScript’s NoInfer feature noted in the post above yours.

Intuitively, I would think that with your code, if a type checker sees a call to eq(10, "str"), it would be valid to solve both T and U to object. TypeScript doesn’t allow that (playground link), so it must have some extra type inference rules here. A proposal to add this feature to the Python type system would have to spell out what those rules are.


Thank you all for your responses! I wanted to first refine a bit my particular use case and then go into the different ways to resolve it based on the suggestions here.

Motivation

Let me give an example from the docs on the sort of things I want it to check:

from egglog import *

class Num(Expr):
    def __init__(self, value: i64Like) -> None: ...

    @classmethod
    def var(cls, name: StringLike) -> Num:  ...

    def __add__(self, other: Num) -> Num: ...

    def __mul__(self, other: Num) -> Num: ...

egraph = EGraph()

@egraph.register
def _num_rule(a: Num, b: Num, c: Num, i: i64, j: i64):
    yield rewrite(a + b).to(b + a)
    yield rewrite(a * (b + c)).to((a * b) + (a * c))
    yield rewrite(Num(i) + Num(j)).to(Num(i + j))
    yield rewrite(Num(i) * Num(j)).to(Num(i * j))

Here, I am using the rewrite(...).to(...) syntax so that we can verify that the LHS and RHS are of the “same type”. For example, a + b can be rewritten to b + a (because they are both of type Num), but Num(i) + Num(j) could not be rewritten to i + j, because the LHS is of type Num and the RHS is of type i64.

Yes, I mean that they are instances of the same class. In my case, sub-classing and unions won’t come up, so I would be fine with any way they are handled.

I agree that this is ambiguous with the current proposal. In my case, I would only need it to work called with positional arguments and as the top level parameter, not nested within a list.

Thank you for clarifying this. I do mean that subsequent arguments would be assignable to the inferred type of the first argument, not that they are equivalent. In my particular case equivalence would be fine, since sub-classing and literals don’t come up; but if we think of it as all the cases that work with the eq(10).to(11) syntax, then assignability is what I mean.

Basically, if it type checks in the current form (the generic class called with one arg, then its method called with another arg), then it should type check with both args passed in. Any behavior that replicates that would be fine.

That’s fine if those all pass. I am not trying to statically guarantee that it’s called with a statically resolved single class. I just want it to raise an error if I pass in two different single classes.

Yes, I can see that it solves the problem. I wouldn’t bring it up except that I have had multiple users complain about it and find it confusing. It’s unfortunate that, to make the library work for those who do use static typing, I have to adopt a syntax that’s more verbose for everyone.

Yeah, in this case I am not trying to catch type errors in the body of eq but in its downstream usage.

Solutions

I didn’t realize that TypeScript had implemented something very similar with NoInfer!

I think that’s a better solution for the problem, and it has the advantage of already being implemented there, so it feels a bit battle-tested. They already went through all the trouble of refining its behavior, so maybe we can re-use some of that (see the implementing PR).

The TypeScript blog post introducing NoInfer mentions this solution:

One way people currently deal with this is to add a separate type parameter that’s bounded by the existing type parameter.
[…]
This works, but is a little bit awkward because D probably won’t be used anywhere else in the signature for createStreetLight. While not bad in this case, using a type parameter only once in a signature is often a code smell.

That’s why TypeScript 5.4 introduces a new NoInfer<T> utility type. Surrounding a type in NoInfer<...> gives a signal to TypeScript not to dig in and match against the inner types to find candidates for type inference.

I would use this “awkward” solution if it were already supported, but I would hesitate to advocate for implementing that more general syntax for this small use case unless we were already set on allowing typevar bounds to depend on other typevars. I would worry that the complexity of implementing it would be a bit higher.

Do you mind showing an example of downstream usage that should be caught by type checkers? Because a == b is fine for most Python types due to the default implementation accepting object.

In the Matrix example, I have this rewrite, which uses a number of eq conditions:

    rewrite(kron(A, B) @ kron(C, D)).to(
        kron(A @ C, B @ D),
        # Only when the dimensions match
        eq(A.ncols()).to(C.nrows()),
        eq(B.ncols()).to(D.nrows()),
    ),

If instead I had made a mistake and tried to make the rewrite conditional on whether the cols of A are equal to B itself, this should give a type error (one is of type Dim, the other of type Matrix):

    rewrite(kron(A, B) @ kron(C, D)).to(
        kron(A @ C, B @ D),
        # Only when the dimensions match
        eq(A.ncols()).to(C.nrows()),
        eq(B.ncols()).to(D), # type error
    ),

As you note, if I had used the builtin __eq__ this would be easy to type, but I don’t in this case because I want to preserve __eq__ for user-defined code, i.e. as an operation which might return a custom Boolean type, rather than using it for the meta-level eq.
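For completeness, here is why a plain == comparison would sail through type checking: object.__eq__ accepts any operand, so comparing unrelated classes is never a static error (illustrative stand-ins of my own, not egglog’s actual Dim and Matrix):

```python
class Dim:
    """Stand-in for a dimension expression."""

class Matrix:
    """Stand-in for a matrix expression."""

# object.__eq__ accepts any operand, so no checker flags this comparison,
# and at runtime it simply falls back to identity comparison:
assert (Dim() == Matrix()) is False
```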