When a static type checker evaluates a call, it needs to determine whether the supplied arguments are compatible with the target’s signature. Type checkers do this as a two-step process. First, they map arguments to parameters. Second, they evaluate whether the type of each argument is assignable to the type of its corresponding parameter.
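To illustrate the two steps with a minimal, hypothetical example (the function `greet` below is not from the typing spec), the mapping happens first, and assignability is checked only once each argument has a parameter to land on:

```python
def greet(name: str, excited: bool = False) -> str: ...

# Step 1: map arguments to parameters.
#   "Alice" maps to `name`; the keyword argument maps to `excited`.
# Step 2: check that each argument's type is assignable to the
#   corresponding parameter's declared type (str to str, bool to bool).
greet("Alice", excited=True)  # OK

# Step 1 succeeds (42 maps to `name`), but step 2 fails because
# int is not assignable to str.
greet(42)  # error
```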
The argument-to-parameter mapping is complicated by the fact that an argument can map to multiple parameters (e.g. in the case of unpacking), and parameters can map to multiple arguments (e.g. in the case of variadic parameters). Furthermore, these mappings sometimes cannot be statically determined in an unambiguous manner. This occurs when an unpack operator (`*` or `**`) is used in an argument expression and applied to a type that unpacks to an indeterminate number of objects (e.g. an `Iterable[T]` or `Mapping[str, T]`).
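As a hypothetical illustration (the function `two_args` below is not part of the example that follows), contrast unpacking a fixed-length tuple, whose contribution to the argument list is statically known, with unpacking a list, whose length is not:

```python
def two_args(x: int, y: int) -> None: ...

def caller(pair: tuple[int, int], many: list[int]) -> None:
    # Unambiguous: the tuple has exactly two elements, so the mapping
    # x <- pair[0], y <- pair[1] is fully determined.
    two_args(*pair)

    # Ambiguous: the list could unpack to zero, two, or ten values, so
    # the argument-to-parameter mapping cannot be statically determined.
    two_args(*many)
```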
Consider the following example. Which of these calls should be permitted, and which should be considered errors?
```python
def func(x) -> None: ...

def test(a: list):
    func(*a)
    func(1, *a)
    func(*[1, *a])
    func(*(1, *a))
    func(*a, 1)
    func(*[*a, 1])
    func(*(*a, 1))
    func(*a)
    func(x=1, *a)
    func(*a, x=1)
```
The typing spec is silent about how type checkers should handle these ambiguous cases. Not surprisingly, we see divergent behaviors between type checkers (pyright and mypy).
Why is it important for us to standardize this behavior? We have been attempting to standardize the behaviors for overload call evaluation here, but this necessarily depends on the behaviors for simple (non-overloaded) call evaluation. Without consistency in the non-overloaded case, we will not achieve consistency in the overloaded case.
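As a sketch of that dependency (the overloaded function `f` below is hypothetical), note that which overload is selected, or whether the call is an error at all, hinges on how many positional arguments a type checker assumes `*a` supplies:

```python
from typing import overload

@overload
def f(x: int) -> int: ...
@overload
def f(x: int, y: int) -> str: ...
def f(*args: int) -> int | str:
    return args[0] if len(args) == 1 else str(args)

def caller(a: list[int]) -> None:
    # Under the rules for non-overloaded calls, does `*a` satisfy a
    # one-parameter signature, a two-parameter signature, both, or
    # neither? The answer determines which overload (if any) matches,
    # so divergence in the simple case propagates to overload resolution.
    f(*a)
```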
The current behaviors of pyright and mypy not only diverge from each other; mypy's behavior is arguably inconsistent with itself, and pyright's is likewise internally inconsistent because it attempts to (but does not entirely) match mypy's behaviors. In other words, I don't think we want to codify the current behavior of either of these type checkers.
Here’s a more complete code sample that demonstrates some of the important differences.
```python
def positional[T1, T2, T3](x: T1, y: T2, z: T3) -> tuple[T1, T2, T3]: ...

def test_positional(p1: list[int], p2: list[str]):
    x1 = positional(*p1)  # OK
    reveal_type(x1)  # tuple[int, int, int]

    x2 = positional(*p1, *p2)  # OK
    reveal_type(x2)  # tuple[int, int, int]

    x3 = positional("", *p1, *p2)  # OK
    reveal_type(x3)  # tuple[str, int, int]

    x4 = positional(*p1, *p2, z="")  # Mypy: Error, Pyright: OK
    reveal_type(x4)  # tuple[int, int, str]

    x5 = positional(*p1, "")  # Error
    reveal_type(x5)

    x6 = positional(*p1, "", *p2)  # Error
    reveal_type(x6)


def keyword[T1, T2, T3](*, x: T1, y: T2, z: T3) -> tuple[T1, T2, T3]: ...

def test_keyword(p1: dict[str, int], p2: dict[str, str]):
    x1 = keyword(**p1)  # OK
    reveal_type(x1)  # tuple[int, int, int]

    x2 = keyword(**p1, **p2)  # OK
    reveal_type(x2)  # Mypy: tuple[object, object, object], Pyright: tuple[str, str, str]

    x3 = keyword(x=1.0, **p1, **p2)  # OK
    reveal_type(x3)  # Mypy: tuple[float, object, object], Pyright: tuple[float, str, str]

    x4 = keyword(**p1, **p2, x=1.0)  # OK
    reveal_type(x4)  # Mypy: tuple[float, object, object], Pyright: tuple[float, str, str]
```
In situations like this, I find it’s useful to first agree on a principle, then define a set of behaviors that are consistent with that principle. There are a couple of straightforward principles that we could adopt here:
- Allow any set of arguments that could potentially succeed at runtime. In other words, if there is at least one potential argument-to-parameter mapping that will succeed at runtime, a type checker will assume that mapping without generating an error. This minimizes false positives, permitting common use cases to type check without error, but it’s at the expense of some false negatives. This principle is likely preferred by casual users of type checkers (i.e. the vast majority of Python developers) but will probably not satisfy those who prefer strict type checking (a small but vocal minority). If we were to adopt this principle, then all of the above examples would type check without error.
- Disallow any set of arguments that could potentially fail at runtime during the argument-to-parameter matching process. This eliminates all false negatives, but it results in false positive errors for common usage patterns. Casual users of type checkers will likely find this to be onerous, but it would completely close a current “hole” in type checking. If we were to adopt this principle, then all of the above examples would result in type checking errors.
I can make good arguments for either of these principles. I’m interested in hearing what others think.
We could also devise a more complex and nuanced set of rules that attempts to balance false positives and false negatives. I’m struggling to come up with a good principle that achieves this goal. I’m also not sure if this would satisfy anyone, but sometimes that’s the sign of a good compromise.
Normally, we can sidestep debates about false positives vs false negatives by giving users control over the “strictness” level. However, in this case I don’t think we want this to be under user control. This would make life difficult for library authors who need to make assumptions about how their overloaded functions will be interpreted by static type checkers. In other words, I don’t think we want call evaluation behaviors for overloads to depend on user configuration.