Standardizing argument unpacking behaviors for calls

When a static type checker evaluates a call, it needs to determine whether the supplied arguments are compatible with the target’s signature. Type checkers do this as a two-step process. First, they map arguments to parameters. Second, they evaluate whether the type of each argument is assignable to the type of its corresponding parameter.

The argument-to-parameter mapping is complicated by the fact that an argument can map to multiple parameters (e.g. in the case of unpacking), and parameters can map to multiple arguments (e.g. in the case of variadic parameters). Furthermore, these mappings sometimes cannot be statically determined in an unambiguous manner. This occurs when an unpack operator (* or **) is used in an argument expression and applied to a type that unpacks to an indeterminate number of objects (e.g. an Iterable[T] or Mapping[str, T]).
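For instance, both directions of the mapping, and the indeterminate case, already show up in ordinary calls (a small illustration of my own):

def f(x: int, y: int, *rest: int) -> None: ...

def demo(pair: tuple[int, int], extras: list[int]) -> None:
    f(*pair)          # one unpacked argument maps to the two parameters x and y
    f(1, 2, 3, 4, 5)  # the single variadic parameter *rest maps to three arguments
    f(1, 2, *extras)  # *extras unpacks to an indeterminate number of objects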

Consider the following example. Which of these calls should be permitted, and which should be considered errors?

def func(x) -> None: ...

def test(a: list):
    func(*a)
    func(1, *a)
    func(*[1, *a])
    func(*(1, *a))
    func(*a, 1)
    func(*[*a, 1])
    func(*(*a, 1))
    func(x=1, *a)
    func(*a, x=1)

The typing spec is silent about how type checkers should handle these ambiguous cases. Not surprisingly, we see divergent behaviors between type checkers (pyright and mypy).

Why is it important for us to standardize this behavior? We have been attempting to standardize the behaviors for overload call evaluation here, but this necessarily depends on the behaviors for simple (non-overloaded) call evaluation. Without consistency in the non-overloaded case, we will not achieve consistency in the overloaded case.

The current behaviors observed in pyright and mypy not only diverge between type checkers, but mypy’s behavior is arguably inconsistent with itself. Pyright’s behavior is likewise inconsistent with itself because it attempts to (but does not entirely) match mypy’s behaviors. In other words, I don’t think we want to codify the current behavior of either of these type checkers.

Here’s a more complete code sample that demonstrates some of the important differences.

def positional[T1, T2, T3](x: T1, y: T2, z: T3) -> tuple[T1, T2, T3]: ...

def test_positional(p1: list[int], p2: list[str]):
    x1 = positional(*p1)  # OK
    reveal_type(x1)  # tuple[int, int, int]

    x2 = positional(*p1, *p2)  # OK
    reveal_type(x2)  # tuple[int, int, int]

    x3 = positional("", *p1, *p2)  # OK
    reveal_type(x3)  # tuple[str, int, int]

    x4 = positional(*p1, *p2, z="")  # Mypy: Error, Pyright: OK
    reveal_type(x4)  # tuple[int, int, str]

    x5 = positional(*p1, "")  # Error
    reveal_type(x5)

    x6 = positional(*p1, "", *p2)  # Error
    reveal_type(x6)

def keyword[T1, T2, T3](*, x: T1, y: T2, z: T3) -> tuple[T1, T2, T3]: ...

def test_keyword(p1: dict[str, int], p2: dict[str, str]):
    x1 = keyword(**p1)  # OK
    reveal_type(x1)  # tuple[int, int, int]

    x2 = keyword(**p1, **p2)  # OK
    reveal_type(x2)  # Mypy: tuple[object, object, object], Pyright: tuple[str, str, str]

    x3 = keyword(x=1.0, **p1, **p2)  # OK
    reveal_type(x3)  # Mypy: tuple[float, object, object], Pyright: tuple[float, str, str]

    x4 = keyword(**p1, **p2, x=1.0)  # OK
    reveal_type(x4)  # Mypy: tuple[float, object, object], Pyright: tuple[float, str, str]

In situations like this, I find it’s useful to first agree on a principle, then define a set of behaviors that are consistent with that principle. There are a couple of straightforward principles that we could adopt here:

  1. Allow any set of arguments that could potentially succeed at runtime. In other words, if there is at least one potential argument-to-parameter mapping that will succeed at runtime, a type checker will assume that mapping without generating an error. This minimizes false positives, permitting common use cases to type check without error, but it’s at the expense of some false negatives. This principle is likely preferred by casual users of type checkers (i.e. the vast majority of Python developers) but will probably not satisfy those who prefer strict type checking (a small but vocal minority). If we were to adopt this principle, then all of the above examples would type check without error.
  2. Disallow any set of arguments that could potentially fail at runtime during the argument-to-parameter matching process. This eliminates all false negatives, but it results in false positive errors for common usage patterns. Casual users of type checkers will likely find this to be onerous, but it would completely close a current “hole” in type checking. If we were to adopt this principle, then all of the above examples would result in type checking errors.
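To make the contrast concrete, here is how these two principles treat the simple example from above (my own annotations):

def func(x) -> None: ...

def test(a: list):
    func(*a)     # principle 1: OK (succeeds when len(a) == 1); principle 2: error (a may be empty or too long)
    func(1, *a)  # principle 1: OK (succeeds when a is empty); principle 2: error (a may be non-empty)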

I can make good arguments for either of these principles. I’m interested in hearing what others think.

We could also devise a more complex and nuanced set of rules that attempts to balance false positives and false negatives. I’m struggling to come up with a good principle that achieves this goal. I’m also not sure if this would satisfy anyone, but sometimes that’s the sign of a good compromise. 🙂

Normally, we can sidestep debates about false positives vs false negatives by giving users control over the “strictness” level. However, in this case I don’t think we want this to be under user control. This would make life difficult for library authors who need to make assumptions about how their overloaded functions will be interpreted by static type checkers. In other words, I don’t think we want call evaluation behaviors for overloads to depend on user configuration.

3 Likes

One potential, more complex rule: “allow any set of arguments that could potentially succeed at runtime EXCEPT if the only potential argument-to-parameter mappings involve unpacking empty arguments”

This is a defensible rule, since the false negatives are unnatural. At least at one point mypy intentionally issued some errors based on this principle.

I think it would be hard to stomach the cost of eliminating false negatives for everyone.

For folks who care about false negatives, I think a type checker could still maybe have a set of stricter diagnostics that avoids making library code ambiguous? E.g. the type checker always uses the same logic to determine overloads and solve type variables, but additionally emits a call-site error any time you unpack something that (a) isn’t a fixed-length tuple, and (b) does not match a variadic arg in every overload.
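Something like the following sketch (hypothetical overloads, not from any real library) is what I have in mind; the call would still resolve normally, but an opt-in strict diagnostic could flag the last line:

from collections.abc import Sequence
from typing import overload

@overload
def f(x: int) -> int: ...
@overload
def f(x: int, y: int) -> tuple[int, int]: ...
def f(x: int, y: int | None = None) -> int | tuple[int, int]:
    return x if y is None else (x, y)

def caller(fixed: tuple[int, int], seq: Sequence[int]) -> None:
    f(*fixed)  # fine: a fixed-length tuple, so the arg-to-param mapping is unambiguous
    f(*seq)    # opt-in strict diagnostic: not a fixed-length tuple, and no overload
               # has a variadic *args parameter for it to match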

2 Likes

Emitting an error for unpacking non-fixed-length tuples would be very noisy and filled with false positives, and it would encourage people to change annotations that currently give some accurate info to Any instead.

Consider SQL libraries (not full ORMs). Returned rows are typically tuples of unknown static length, and are often typed as tuple[Any, ...] or tuple[ValueTypes, ...] rather than as just Any.

It should not be an error to unpack one of these rows into a function call; that’s part of these tuples being gradual in their length.
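A sketch of the pattern I mean, with a hypothetical Protocol standing in for a DB-API cursor typed the way these libraries often type their rows:

from typing import Any, Protocol

class Cursor(Protocol):
    def fetchall(self) -> list[tuple[Any, ...]]: ...

def make_user(user_id: int, name: str, email: str) -> dict[str, object]:
    return {"id": user_id, "name": name, "email": email}

def load_users(cur: Cursor) -> list[dict[str, object]]:
    # The real row shape is enforced by the SELECT statement, not by the static
    # type; unpacking each row into the call should not be an error.
    return [make_user(*row) for row in cur.fetchall()]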

1 Like

I think what I said in “What are the subtyping rules for tuple[T, ...]? - #51 by mikeshardmind”, when we were previously looking into the subtyping rules of gradual-length tuples, covers how I feel about unpacking unknown-length tuples, and I don’t think anything I’ve seen since changes that.

As a litmus test for any proposed change, which also needs to handle the simpler non-overload case, I would present this example: it relies heavily on iterator semantics and struct pack/unpack, and is representative of efficient binary protocol handling in pure Python.

import struct
from itertools import chain
from collections.abc import Iterator
from typing import NamedTuple

# ---------------------------------------------------
# Common:
# up to 112 byte payload
# 1-byte: version | up to 111 bytes: versioned data
# ---------------------------------------------------
# Version 1:
# 1-byte: version = x01
# 6 length-prefixed arrays of ids
#   length prefix as 1 byte, elements 8 bytes each
#   representing a 64-bit int
# ---------------------------------------------------

class NoUserFeedback(Exception):
    pass

class V1TooManyIDs(Exception):
    pass

class V1ToggleWithAddRemove(Exception):
    pass

class V1MultipleToggle(Exception):
    pass
        
class V1NonActionableRule(Exception):
    pass

class V1AddRemoveOverlap(Exception):
    pass



class DataV1(NamedTuple):
    add: frozenset[int]
    remove: frozenset[int]
    toggle: frozenset[int]
    require_any: frozenset[int]
    require_all: frozenset[int]
    require_none: frozenset[int]


def validate_datav1(data: DataV1) -> None:
    """
    Checks that
    - there are 13 or fewer ids encoded
    - that toggle is not provided with either add or remove
    - that toggle contains no more than 1 id
    - that at least one of toggle, add, or remove is provided
    - that ids provided in add are not also provided in remove.
    """

    if sum(map(len, data)) > 13:
        raise V1TooManyIDs

    if tog := data.toggle:
        if data.add or data.remove:
            raise V1ToggleWithAddRemove
        if len(tog) > 1:
            raise V1MultipleToggle
    else:
        if not (data.add or data.remove):
            raise V1NonActionableRule
        if data.add & data.remove:
            raise V1AddRemoveOverlap



def pack_rules(data: DataV1, /) -> bytes:
    validate_datav1(data)
    struct_fmt = "!bb%dQb%dQb%dQb%dQb%dQb%dQ" % tuple(map(len, data))
    to_pack = chain.from_iterable((len(lst), *lst) for lst in data)
    return struct.pack(struct_fmt, 1, *to_pack)  # needs changing if new version


def _v1_struct_unpacker(raw: bytes, /) -> Iterator[frozenset[int]]:
    """
    Calling contract is that you have checked the version in advance
    """
    offset: int = 1
    for _ in range(6):
        (len_,) = struct.unpack_from("!b", raw, offset)
        yield frozenset(struct.unpack_from("!%dQ" % len_, raw, offset + 1))
        offset += 8 * len_ + 1


def _get_data_version(b: bytes, /) -> int:
    (r,) = struct.unpack_from("!b", b, 0)
    assert isinstance(r, int)
    return r


def unpack_rules(raw: bytes, /) -> DataV1:
    try:
        version = _get_data_version(raw)
    except struct.error:
        raise NoUserFeedback from None

    if version != 1:
        raise NoUserFeedback

    try:
        data = DataV1(*_v1_struct_unpacker(raw))
        validate_datav1(data)
    except (NoUserFeedback, struct.error):
        raise NoUserFeedback from None
    except Exception:
        raise NoUserFeedback from None

    return data

The code sample currently passes in both the pyright playground and mypy-play (strict).

I consider myself part of this minority, but I think unknown-length tuples shouldn’t error for things related to their length. They are gradual in that component of their type information and highly useful. In most code that uses them, the developer either treats them as if they were Sequence[T] rather than tuple[T, ...], or has another means of knowing the type is fine, such as SQL rows whose shape is enforced by the database, or the struct and binary-protocol example I’ve presented above.

1 Like

I’d be in favor of adopting principle (1) (“Allow any set of arguments that could potentially succeed at runtime.”). I don’t see how a type checker could error on something as common as:

def func(x) -> None: ...
def test(a: list):
    func(*a)

without being very noisy with false positives.

Unpacking a potentially empty argument is a totally reasonable thing to do IMO, so I’m iffy on introducing a rule about that.

2 Likes

I think the most straightforward principle is: when the length isn’t known statically, because the code is unpacking something with no length guarantee (a list, a Sequence, an Iterator, or a tuple of unknown length), treat the length as gradual. Only emit errors when it’s impossible for any length (0 to infinite) to be correct.

I would expect the following to error statically:

def foo(x: int, y: str, z: bytes):
    ...

a: list[str] = ...

foo(*a)

because there is no length of a for which this call works: every element of a is a str, which is never assignable to the int parameter x.

I would also expect

def foo(x: int, y: str, z: bytes):
    ...

a: list[str | int | bytes] = ...

foo(*a)

to error because the union isn’t assignable to each parameter type, so again, all lengths fail.

I would expect this to work:

a: list[str] = ...
b: tuple[int, str, str, int] = (1, *a, 2,)

I would expect the general rules for unpacking as part of a call expression to follow from these.
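For example, continuing with the same a: list[str], at a call site I would expect:

def bar(x: int, y: str, z: str) -> None: ...

bar(1, *a)   # OK: there is a length of a (two) for which this succeeds
bar("", *a)  # error: no length of a makes "" acceptable for the int parameter x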

Great, sounds like folks are on board with prioritising minimising false positives.

It’s a reasonable thing to do; what is less reasonable is if all potential argument-to-parameter mappings require a specific unpacked argument to be empty. In other words:

a: list[int] = ...
b: list[int] = ...

def func1(): ...
func1(*a)   # error: only succeeds if a is empty

def func2(x): ...
func2(*a)
func2(1, *a)  # error: only succeeds if a is empty
func2(*a, 1)  # error: only succeeds if a is empty
func2(*a, *b)

def func3(x=...): ...
func3(*a)

Whether the complexity is worth it I’m less sure, but I think this rule would have few to zero false positives and would provide useful true positives for certain kinds of refactoring errors. Mypy’s current behaviour is not consistent, but IIRC at least at one point it chose to issue some errors based on this rule.

1 Like

I’m skeptical of a special case rule to disallow empty unpacking. It would introduce a place for false positives both with gradual types and with attempting to remove gradual typing in favor of more concrete types, and I don’t think a human author is likely to write this kind of code unintentionally.

I expect it to interact poorly with certain kinds of callback patterns, unions of functions, function composition, and probably other places where forwarding on built-up args and kwargs is expected and, in some cases, the forwarded arguments might be statically known to be empty. I also expect more places to error over time, because this would be a spot where adding more information leads to a false positive (i.e. annotating a union of possible tuple shapes instead of a single gradual tuple, even when only 0-, 1-, or 2-tuples are accepted, and using them uniformly via unpacking), creating a situation where adding information gives users a worse outcome.
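As a sketch of the union-of-callables case I’m thinking of (hypothetical names):

from collections.abc import Callable

def dispatch(
    handler: Callable[[], None] | Callable[[int, int], None],
    args: tuple[int, ...] = (),
) -> None:
    # For the zero-argument member of the union, this call can only succeed
    # when args is empty, so an "empty unpack is an error" rule could flag
    # forwarding code like this even when callers pass the right shape.
    handler(*args)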

1 Like

Balancing optimistic (principle 1) and conservative (principle 2) behaviors in a type checker for Python is certainly important. I think if we just choose between these principles, then optimistic behavior will often win out.

In this case, I think another criterion is worth considering: simplicity of the specification and implementation, which I believe leads to predictability for users.

For example, for positional args, I think a simple specification would treat arg-param matching as a two-fingered walk, where an unpacked arg of indeterminate length consumes all remaining params. This implies rejecting func(*a, 1), as Mypy does. To achieve the Pyright behavior, I believe we would need to consume params only until we reach a type error, meaning implementations would need to employ backtracking for calls.
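A minimal sketch of what I mean by the two-fingered walk (illustrative only: positional parameters, no defaults or keywords):

from dataclasses import dataclass

@dataclass
class Arg:
    unpacked: bool  # True for *expr where the unpacked length is indeterminate

def match_positional(args: list[Arg], num_params: int) -> bool:
    """One finger walks the arguments, the other walks the parameters."""
    p = 0
    for arg in args:
        if arg.unpacked:
            # An unpack of indeterminate length greedily consumes every
            # remaining positional parameter; there is no backtracking.
            p = num_params
        elif p < num_params:
            p += 1
        else:
            return False  # a later positional arg has no parameter left, e.g. func(*a, 1)
    return True  # leftover parameters would need defaults; ignored in this sketch

# With a single parameter: func(*a) is accepted, func(*a, 1) is rejected,
# matching the Mypy-style behavior described above.
assert match_positional([Arg(unpacked=True)], 1)
assert not match_positional([Arg(unpacked=True), Arg(unpacked=False)], 1)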

It is stated that there’s a problem with letting the user choose the strictness, but I don’t understand it.
Could someone give an example that shows the problem?

The problem is that how arguments are matched may affect overload selection. If we let the user choose between different argument matching behavior for unpacked args, then a different overload may be picked in some overload definitions depending on the strictness chosen.

1 Like

That’s the understanding that I got from what was already said, but I’m failing to come up with any examples where it would be a problem.

Out of curiosity, would it be possible to have mypy or pyright try out the suggestion with and without the “unpacking empty” rule on an abundance of popular open source repos (and see how many false positives it generates)?

(in relation to choosing the overload strictness when running the typecheck)

Because overloads are defined by the projects publishing the APIs, and those projects need to be able to type check their public API without having to make assumptions about static checker config settings in projects consuming the API.

It isn’t a setting like mypy’s strict mode, which only affects code being directly type checked within the current project - it’s one which would affect how the public APIs of other projects would be processed.

3 Likes

I think that the proposed overload spec already contains a special rule that avoids any problem with inconsistent overload resolution if type checkers offer optional strict unpacking. The special rule is this: if a call would match two or more overloads, and the call unpacks a variable-length sequence, and in one or more of the matched overloads the unpacked variadic matches a variadic parameter, all other overloads should be eliminated from consideration.

Doesn’t this rule mean that overloads should resolve the same, whether strict or lax handling of variable-length sequence unpackings is used? With lax handling, multiple overloads might match, but the ones that would be preferred by this special rule would be the same ones that would be the only matching ones if strict handling were used.
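Here’s a sketch (hypothetical overloads) of the kind of case I have in mind:

from collections.abc import Sequence
from typing import Any, overload

@overload
def f(x: int, y: int, /) -> tuple[int, int]: ...
@overload
def f(*args: int) -> tuple[int, ...]: ...
def f(*args: Any) -> Any:
    return tuple(args)

def caller(seq: Sequence[int]) -> None:
    # Lax handling: *seq could match either overload, but the special rule
    # eliminates the first overload because the second is the one where the
    # unpacked sequence matches a variadic *args parameter.  Strict handling:
    # only the second overload matches in the first place.  Either way the
    # call resolves to the second overload, with type tuple[int, ...].
    f(*seq)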

It seems to me that we have an odd inconsistency between the compromise behavior we settled on for assignability of variadic tuple types, and the behavior of those same types in unpacking (pyright and mypy fully agree on this example):

from typing import Any

def test(list_of_int: list[int], variadic_tuple_of_int: tuple[int, ...], variadic_tuple_of_any: tuple[Any, ...]):
    # Unpacking to tuple[int, int]
    t: tuple[int, int] = tuple(list_of_int)      # ERROR
    u: tuple[int, int] = variadic_tuple_of_int   # ERROR
    v: tuple[int, int] = variadic_tuple_of_any

    # Unpacking to two variables
    a, b = list_of_int
    c, d = variadic_tuple_of_int
    e, f = variadic_tuple_of_any

    # Unpacking to a function that takes two integers
    takes_two_ints(*list_of_int)
    takes_two_ints(*variadic_tuple_of_int)
    takes_two_ints(*variadic_tuple_of_any)

    # avoid unused-variable errors in pyright
    return (t, u, v, a, b, c, d, e, f)


def takes_two_ints(x: int, y: int):
    pass

I find it hard to come up with a rationale for why the assignments to t and u should error in this function, if the unpacking assignments to a, b and c, d and the first two calls to takes_two_ints do not.

If we believe the lax handling is necessary in the latter two cases, then it seems to me we should remove the awkward distinction we currently make between tuple[Any, ...] and tuple[int, ...], and just specify variadic tuples as always gradual in their length.

If we think it is necessary that assigning tuple[int, ...] to tuple[int, int] should be an error, then I find it hard to see why it shouldn’t also be an error to unpack it to a fixed number of targets in a call or an assignment.

One of my goals in starting this thread was to explore whether we could eliminate the special rule from the overload proposal. If we were to adopt strict handling of variable-length sequence unpackings, we’d get the desired overload behavior without any special rule. It sounds like the consensus is that lax handling of unpacking is preferred, which requires us to maintain the special rule for overloads.

Yes, this inconsistency bothers me too.

Maybe as a next step, I should do a quick-and-dirty experiment to implement strict unpacking rules and see what impact that has on existing typed code bases.

2 Likes

Neither mypy nor pyright has an issue with unpacking Sequence[T] without knowing the length either. I think the only option that resolves the inconsistency here is to treat variadic tuples as always gradual with respect to length.

The other option creates a situation where either tuple[T, ...] shouldn’t be assignable to Sequence[T], or we have to disallow unpacking sequences too.
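A sketch of that tension, reusing takes_two_ints from the earlier example:

from collections.abc import Sequence

def takes_two_ints(x: int, y: int) -> None: ...

def demo(vt: tuple[int, ...]) -> None:
    seq: Sequence[int] = vt  # fine today: tuple[int, ...] is assignable to Sequence[int]
    takes_two_ints(*seq)     # fine today: unpacking a Sequence is not flagged
    takes_two_ints(*vt)      # so flagging only this line could be sidestepped by
                             # going through the Sequence assignment above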

4 Likes