Union of invariant types

Both mypy and pyright reject the second case in the following code:

class A: pass
class B: pass

# This is fine:

def g1[T](x: T) -> T:
    return x

def f1(x: A | B) -> A | B:
    return g1(x)

# Type checkers don't like this:

def g2[T](x: list[T]) -> list[T]:
    return x

def f2(x: list[A] | list[B]) -> list[A] | list[B]:
    return g2(x)

Is it unreasonable to expect them to see that if a function turns list[T] into list[T] then it can turn list[A] | list[B] into list[A] | list[B]?


f2 can be decomposed into four concrete signatures:

  • list[A] -> list[A]
  • list[B] -> list[B]
  • list[A] -> list[B]
  • list[B] -> list[A]

The first two are compatible with g2. But for the last two, there is no T (besides Any) that could make g2 compatible.
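
As a small illustration of this (a hypothetical demo function, reusing g2 and the classes from the first post): each union member is fine on its own, but no single T covers the union.

def demo(a: list[A], b: list[B], ab: list[A] | list[B]) -> None:
    g2(a)   # OK: T solves to A
    g2(b)   # OK: T solves to B
    g2(ab)  # rejected: no single T (besides Any) makes list[A] | list[B] fit list[T]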

If you write your g2 signature differently, type checkers will accept your program:

def g2[T](x: T) -> T:
    return x

def f2(x: list[A] | list[B]) -> list[A] | list[B]:
    return g2(x)

In practice, your g2 is probably doing something a bit more interesting, yeah?

I think this is only relevant if we want to assign the function f2 to something. The type checkers are complaining about calling g2 in the body of f2. For an individual call of f2, the input will be one or the other member of the union (or both), and it is only required that the output be one or the other member of the union.

I should have included the error messages before:

$ mypy t.py
t.py:18: error: Cannot infer type argument 1 of "g2"  [misc]
Found 1 error in 1 file (checked 1 source file)
$ pyright t.py

~/t.py
  ~/t.py:18:15 - error: Argument of type "list[A] | list[B]" cannot be assigned to parameter "x" of type "list[T@g2]" in function "g2"
    Type "list[A] | list[B]" is not assignable to type "list[A]"
      "list[B]" is not assignable to "list[A]"
        Type parameter "_T@list" is invariant, but "B" is not the same as "A"
        Consider switching from "list" to "Sequence" which is covariant (reportArgumentType)
1 error, 0 warnings, 0 informations 

Yes, the actual code involves generics and other things. This example with sets is a bit closer:

class E:
    pass

def convert(x: E | int) -> E:
    return E()

def set2list[T](s: set[T]) -> list[T]:
    return list(s)

def f(x: list[E] | list[int]) -> list[E]:
    return [convert(xi) for xi in x]

def g(x: set[E] | set[int]) -> list[E]:
    s = set2list(x) # <-- complains here
    return f(s)

The type checkers are complaining about the set2list function taking a union of sets.

Thanks for the example!

If it helps, here’s how I get intuition for the error behavior. In the call to set2list(x), the type checker is trying to answer the question “What type can we use for T in set2list that makes this program type check?”

Because set is invariant w.r.t. T, it turns out that there is no answer. Suppose we said T = E | int, so the signature becomes set2list(s: set[E | int]) → list[E | int]. But in that case, we can't pass set[E] | set[int] as the argument for s, because neither set[E] nor set[int] is assignable to set[E | int].
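
Here is a minimal sketch of that invariance point (hypothetical helper, reusing E from the example above):

def takes_union_set(s: set[E | int]) -> None: ...

def demo(se: set[E], si: set[int]) -> None:
    takes_union_set(se)  # rejected: set is invariant, so set[E] is not a set[E | int]
    takes_union_set(si)  # rejected for the same reason
    # (if set[E] were accepted here, takes_union_set could do s.add(0) and corrupt a set[E])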

In general, the solution is to make things more generic. In this case, we can see that set2list has a narrower-than-necessary type: list(s) would accept any Iterable, but we are constraining the argument to be a set. So we can make things more general:

from collections.abc import Iterable

def iter2list[T](s: Iterable[T]) -> list[T]:
    return list(s)

Unfortunately, this is not quite enough, due to how mypy's constraint solver works. When we use our new iter2list and mypy tries to find a solution for T, instead of finding E | int it finds object. Pyright finds the more precise E | int solution instead.
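
A quick way to see the difference (a hedged sketch reusing iter2list and the classes above; the revealed types are the ones just described):

from typing import reveal_type

def g_iter(x: set[E] | set[int]) -> None:
    s = iter2list(x)
    reveal_type(s)  # mypy: list[object]; pyright: list[E | int]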

That seems to be what the type checkers are doing, but it is also possible to do this separately for each element of the union. Then, if set[A] -> list[A] and set[B] -> list[B], we can say that set[A] | set[B] -> list[A] | list[B].

In this context the types are always supposed to be the concrete stated types. It doesn't matter what features of that type the function uses, because the expectation is always that we have the concrete type: if it should be a list then it should not be a tuple, and so on, regardless of whether the function only uses __len__ or __iter__.

The return types of g2 and f2 are a bit of a red herring, which is what got me confused earlier, because even without the return types, type checkers behave in the same way:

from typing import Any

class A: pass
class B: pass

def g2[T](x: list[T]) -> Any:
    return x

def f2(x: list[A] | list[B]) -> Any:
    return g2(x)  # rejected

So I believe that this question boils down to whether list[A] | list[B] should be assignable to list[T] or not.

Let's say that it should be assignable. Then an obvious question is what T would be in that case. It cannot be A | B, because list[A | B] is not consistent with list[A] | list[B]. So it must be something that isn't expressible in our type system (unless you include Any). Perhaps a disjoint union could work?
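
To see why list[A | B] is not consistent with list[A] | list[B], here is a minimal sketch (hypothetical helper):

def add_b(lst: list[A | B]) -> None:
    lst.append(B())  # perfectly fine for a list[A | B]

def demo_union(x: list[A] | list[B]) -> None:
    add_b(x)  # rejected: if x is a list[A], add_b would put a B into it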


It seems type checkers could handle this case (without loss of safety) by performing a version of the argument expansion described in the typing documentation for overloads, and checking the variants separately. In this case, that would mean T gets solved twice, once to A and once to B, resulting in return types of list[A] and list[B], which then get unioned together again. Implementing this would surely be complex, and maybe there are further problems I didn't think of.
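
As a hedged sketch of what that expansion would be equivalent to (hypothetical name, not how any checker is implemented today): checking g2 against the union argument would behave like checking this pair of overloads, where a union-typed call is already accepted via overload argument expansion:

from typing import Any, overload

class A: pass
class B: pass

@overload
def g2_expanded(x: list[A]) -> list[A]: ...
@overload
def g2_expanded(x: list[B]) -> list[B]: ...
def g2_expanded(x: Any) -> Any:
    return x

def demo(ab: list[A] | list[B]) -> None:
    r = g2_expanded(ab)  # accepted: the union argument is expanded, giving list[A] | list[B]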


So list[T] → list[T] | list[T] -?> list[T1] | list[T2], which can then be unified with list[A] | list[B], giving that T1: A and T2: B. But then, how do you get back from having independent T1 and T2 to the T? Or do we just keep it “non-deterministic”, and not bother with that last step?

Why do you need to get back to “the” single T?

If S1 -> T1 and S2 -> T2 then S1 | S2 -> T1 | T2.

Because it might be used with types like TypeIs, which don't support unions; e.g. TypeIs[A] | TypeIs[B] is invalid, if I remember correctly.

You can use constrained type variables for this use case:

class A: pass
class B: pass

def g1[T](x: T) -> T:
    return x

def f1[T](x: T) -> T:
    return g1(x)


def g2[T](x: list[T]) -> list[T]:
    return x


def f2[T: (A, B)](x: list[T]) -> list[T]:
    return g2(x)

I know at least a few people don’t like constrained type variables, but they are genuinely useful and concisely express certain cases where you care about specific discrete concrete types.

Constrained type variables are useful, although I always want to avoid introducing an additional type variable if possible. One reason for that is that you cannot have type variables on type variables, so there is a limited budget of type variables that must be used sparingly.

Moving the goal posts again: in my motivating case the types are already generic, so a constrained type variable does not work:

def set2list[T](s: set[T]) -> list[T]:
    return list(s)

# set[E] | set[int] cannot assign to set[T]:

def func1[E](x: set[E] | set[int]) -> list[E] | list[int]:
    return set2list(x)

# TypeVar bound cannot be generic:

def func2[E, T: (E, int)](x: set[T]) -> list[T]:
    return set2list(x)

The reason this is disallowed is that it requires (or rather, creates) higher-kinded types as an expressible type. This isn't specified, is specifically prevented, and in my own personal estimation, is unlikely to ever be usefully specified unless we can change the way the specification works to not special-case the type system itself for "convenience" or "pragmatism". I should note that this does not mean type checkers need to error for these things.[1]

In this particular case, E isn't itself constrained, nor used outside of the constraint you are placing it in, so this can be simplified out entirely by just using T there without (E, int) as the constraint.
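
For instance (a minimal sketch of that simplification), func2 reduces to:

def func2[T](x: set[T]) -> list[T]:
    return set2list(x)

which at that point is just set2list again, so the constrained form wasn't buying anything in that reduced example.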

I imagine that in your actual case that simplification isn't obviously available, so if you could provide a slightly more complete representative example (if it's open source or otherwise sharable, I'd be willing to look into the actual code directly here), there might be another expression that doesn't run afoul of the lack of HKTs (there might also not be; there is valid code that isn't typable right now).


  1. Off topic explanation: I’m not saying type checkers have to error on these cases, but to reasonably implement HKTs, I believe a change needs to happen such that type checkers track the actual known types and what rules have been violated, then they can apply pragmatism to the rules violated, rather than to the type system itself, when deciding whether or not to surface an error. ↩︎

The exact line of code I was looking at before starting this thread is line 660 here:

Specifically, pyright objects to _dup(f) although mypy does not (I think mypy implicitly uses Any in dmp). The dmp[T] type is a list-of-lists-of-…-of-T explained in a previous thread, and the MPZ type in another.

Actually I had misremembered which types are generic. The constrained type variable approach can be used for this particular function because int and MPZ are both concrete types (unusual in this code).

There are definitely other cases where a constrained type variable does not work though, because they look more like this, which is just my second example from above but with E being a type variable rather than a class:

from typing import Protocol

class F:
    pass

class Domain[E: F](Protocol):
    def convert(self, x: E | int) -> E:
        ...

def convert_list[E: F](x: list[E] | list[int], K: Domain[E]) -> list[E]:
    return [K.convert(xi) for xi in x]

def set2list[T](x: set[T]) -> list[T]:
    return list(x)

def convert_set[E: F](x: set[E] | set[int], K: Domain[E]) -> list[E]:
    y = set2list(x) # <-- complains here
    return convert_list(y, K)

I think my original question in this thread was answered by Jelle above. Yes, it could be possible in principle for type checkers to distribute a union of matching types over a generically typed function, although no type checker currently implements that. If they did, this code would check fine without modification.

There are many ways to work around that though. Since convert_set and dmp_normal are intended to convert an input that is not yet in the valid form for the type, perhaps in this one place the annotation could change from set to the covariant Set (or Iterable, etc.) as suggested above. Everywhere else, types like list[E] are supposed to be the exact stated types and e.g. Sequence is not appropriate, but maybe here it is okay.
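
For example, one hedged sketch of that change (hypothetical name, not the actual SymPy code; Domain and F are from the example above): widen just this entry point to the covariant Iterable and do the conversion directly, so no single invariant type parameter has to cover the union.

from collections.abc import Iterable

def convert_iter[E: F](x: Iterable[E] | Iterable[int], K: Domain[E]) -> list[E]:
    return [K.convert(xi) for xi in x]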

Any suggestions about the annotations in the code shown are certainly welcome though. This is the first part of SymPy to have relatively complete type annotations and it has taken a while to figure out how to write them. The main weakness I see is the recursively defined dmp type:

from typing import TypeAlias, TypeVar

_T = TypeVar("_T")
dup: TypeAlias = "list[_T]"
dmp: TypeAlias = "list[dmp[_T]]"

It seems like pyright understands this type but I’m not convinced that mypy does:

from typing import reveal_type

type dmp[T] = list[dmp[T]]

def dmp_func[T,S](f: dmp[T], g: dmp[S]):
    reveal_type(f)
    reveal_type(f[0])
    reveal_type(f[0][0])
    f = g # error
$ mypy t.py
t.py:6: note: Revealed type is "builtins.list[...]"
t.py:7: note: Revealed type is "builtins.list[...]"
t.py:8: note: Revealed type is "builtins.list[...]"
Success: no issues found in 1 source file

$ pyright t.py
t.py:6:17 - information: Type of "f" is "list[dmp]"
t.py:7:17 - information: Type of "f[0]" is "dmp[T@dmp]"
t.py:8:17 - information: Type of "f[0][0]" is "dmp[T@dmp]"
t.py:9:9 - error: Type "dmp[S@dmp_func]" is not assignable to declared type "dmp[T@dmp_func]"
    "builtins.list" is not assignable to "builtins.list"
      Type parameter "_T@list" is invariant, but "dmp" is not the same as "dmp"

After spending more time with this, I think the issue here is that we are too strict about what is allowed in a type variable constraint, as the equivalent overload in this case is obviously not an issue:

from typing import Protocol, overload, Any


class F:
    pass


class Domain[E: F](Protocol):
    def convert(self, x: E | int) -> E:
        ...


def set2list[T](x: set[T]) -> list[T]:
    return list(x)


@overload
def convert_list[E: F](x: list[E], K: Domain[E]) -> list[E]:  ...
@overload
def convert_list[E: F](x: list[int], K: Domain[E]) -> list[E]:  ...
def convert_list[E: F](x: list[Any], K: Domain[E]) -> list[E]:
    return [K.convert(xi) for xi in x]


@overload
def convert_set[E: F](x: set[E], K: Domain[E]) -> list[E]:  ...
@overload
def convert_set[E: F](x: set[int], K: Domain[E]) -> list[E]:  ...
def convert_set[E: F](x: set[Any], K: Domain[E]) -> list[E]:
    y = set2list(x)
    return convert_list(y, K)

This is significantly less elegant, because the spec closes off avenues to HKT support more broadly than just leaving HKTs up to type checkers to error on when encountered until they are specified.


I’ve been wondering why the spec forbids generic typevar bounds. Do you know why?

Separately, I have seen generic typevar bounds referred to as higher-kinded types fairly often in this forum, but I understand these as two different things. Maybe the terms are used in a different way in this community?

For what it’s worth, I see generic typevar bounds as a perfectly reasonable feature to support. I’ve been planning to implement this in Pyrefly as a proof of concept.


They essentially allow the expression of HKTs in some (but not all) uses. It may be worth relaxing that section so that they can be generic in the cases that aren’t HKT equivalents with the new type parameter syntax. I don’t think this was worth considering prior to the new syntax, as placing the error where it applies was much more difficult.

I forget the exact way you have to use them to end up equivalent and don't really want to re-work it out right now; it had to do with class-scoped type variables, and it was very contrived.


That would be a great feature!

I believe that basedmypy has support for generic type parameter bounds, but that fork will no longer be maintained. I think it would be a great feature to have, but it's probably something that we should discuss in a separate thread.