What are the subtyping rules for tuple[T, ...]?

PEP 484 introduced a way to express “arbitrary-length homogeneous tuples”, but it didn’t specify the subtyping rules for these types.

Arbitrary-length homogeneous tuples can be expressed using one type and ellipsis, for example Tuple[int, ...].

There are two logical ways to treat such tuples from a type compatibility standpoint:

  1. As a gradual type. With this interpretation, tuple[T, ...] is compatible with a tuple of any length that contains homogeneous elements of type T, and the reverse is true as well. With this interpretation, tuple[T] is compatible with tuple[T, ...] and the converse is true as well.
  2. As a union. With this interpretation, tuple[T, ...] is shorthand for tuple[()] | tuple[T] | tuple[T, T] | .... With this interpretation, tuple[T] is compatible with (and a subtype of) tuple[T, ...], but the converse is not true.

Mypy implements interpretation 2. Pyright previously used interpretation 1, but about a year ago I changed pyright to use interpretation 2 for compatibility with mypy (which was the reference implementation for PEP 484 and therefore presumably the authority on the topic).

Pyre and pytype appear to use interpretation 1.

Code sample in pyright playground
Code sample in mypy playground

def func1(p1: tuple[str, ...]):
    t1: tuple[()] = p1  # Type error under interpretation 2
    t2: tuple[str] = p1  # Type error under interpretation 2
    t3: tuple[str, ...] = p1  # OK

    p1 = ()  # OK
    p1 = ("",)  # OK
    p1 = ("", "")  # OK

My personal preference is interpretation 2, but I’d like to hear what others think.

Regardless, I’d like to achieve consensus and formalize the rule in the typing spec.

2 Likes

I agree that it would be good to clarify the desired behavior here. I also weakly prefer interpretation 2, because it is safer. Under interpretation 1, a program that never uses Any could still be unsafe.

Would interpretation 2 also mean that tuple[Any, ...] (and therefore plain tuple) is not compatible with tuple[int, str]? I think that’s the consistent answer, but it may be surprising.

Do you have a sense of how commonly this comes up in real code? As a minor data point, I implemented the change in pyanalyze, which was using interpretation 1, and ran the patched pyanalyze on our internal codebase of several million lines. It found no new type errors.

2 Likes

I hadn’t thought about tuple[Any, ...] being treated specially. It appears that mypy does treat this as a special case. I have mixed feelings about it and would like to hear what others think.

Pyright currently does not treat tuple[Any, ...] differently, so it generates an error in these cases, whereas mypy does not.

def func(x: tuple[Any, ...]):
    v1: tuple[()] = x
    v2: tuple[Any] = x
    v3: tuple[int] = x

Do you have a sense of how commonly this comes up in real code?

When I made the change in pyright about a year ago, I needed to make a handful of changes to my team’s half-million-line code base, but the change was easy and straightforward. Since then, I don’t remember receiving any bug reports or complaints from pyright users about this change.

Interestingly, mypy doesn’t appear to special-case other variants that use Any such as tuple[Any, *tuple[Any, ...]].

def func(x: tuple[Any, *tuple[Any, ...]]):
    v2: tuple[Any] = x # Mypy emits error here
    v3: tuple[int] = x # Mypy emits error here

I suspect this wasn’t considered when unpacked tuples were added in PEP 646.

This is the problem with carving out special cases in the typing standard. They tend to cause problems with composability when new features are added.

I have a slight preference for option 1, as a gradual type, but that preference is rooted in the fact that people can annotate with Sequence[T] instead to avoid all of the issues of it being a gradual type. The lack of special casing with this option allows any case where static analysis knows the length of the tuple for reasons external to the type system to do more with that information. This can be powerful when it comes to analyzing zip. It also works better with most low-level uses of SQL libraries (as well as various other things like struct.unpack): the rows can then be transparently passed into something like msgspec or attrs classes, which do real runtime validation, without first needing to narrow the union or special-case that unpacking into a parameter list would be fine, because it will error if wrong.

I don’t particularly like how this interacts with the overall goal I have of making the correct typing more obvious, but I think this is something that could be a configurable informational message in type checkers and something that could be easily fixable automatically (like with ruff’s --fix) if the project wants to opt into never using it as a gradual type.

Option 2 opens up some interesting questions that I think complicate things in other ways (not necessarily bad ways). For instance, as a union, should type checkers narrow the union when the length is explicitly checked against a literal? Static analysis can already detect issues with unpacking when the length is expressed.

I’m not sure I follow here. Gradual types are not inherently unsafe, and treating this as the union is only safer if the length is narrowed or the user goes out of their way to say “yes, I know this could be an arbitrary-length thing, but it isn’t”. There are examples of struct.unpack that show how this creates more hassle for something known to be safe (struct.unpack is quite strict, but not statically knowable within the type system, since it uses a stringly-typed API for the sequence of types to unpack). Part of my preference here is that I don’t think things the type system can’t express should create more work for non-typing experts.

Consider this program:

def f(arg: tuple[int, int]) -> int:
    return arg[0] + arg[1]

def g(arg: tuple[int, ...]) -> int:
    return f(arg)

g((1,))

This throws an error at runtime, but under interpretation 1, type checkers won’t catch the issue.

I agree that interpretation 2 becomes more usable if type checkers also support type narrowing with assert len(arg) == 2.
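A rough sketch of that pattern, assuming a checker that narrows `tuple[int, ...]` on a `len()` comparison against a literal (which some checkers now support):

```python
def f(arg: tuple[int, int]) -> int:
    return arg[0] + arg[1]

def g(arg: tuple[int, ...]) -> int:
    # Under interpretation 2, calling f(arg) directly is an error; after
    # the length check, arg is narrowed to tuple[int, int] and the call
    # is accepted.
    if len(arg) == 2:
        return f(arg)
    raise ValueError("expected exactly two elements")
```

With this, `g((1,))` fails loudly with a `ValueError` instead of an `IndexError` deep inside `f`.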

2 Likes

+1 for option 2, especially since due to a lack of a frozen list folks routinely use tuple[T, ...] for hashable/frozen sequences (and there can be reasons for not reaching for Sequence[T], like library support). So safer rules there would make a difference.

Speaking from personal experience, homogeneous and heterogeneous tuples are used in very different contexts. I almost never know the exact length of a homogeneous tuple, and I almost exclusively iterate over them.
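For instance, a typical homogeneous-tuple consumer only iterates (a minimal, hypothetical sketch):

```python
def total(xs: tuple[int, ...]) -> int:
    # Homogeneous tuples are usually iterated over, never indexed at a
    # statically known position, so their exact length doesn't matter.
    return sum(xs)
```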

1 Like

I prefer interpretation 2, on the principle that we should favour more sound things when not costly.

In my experience, mypy’s stricter interpretation has not resulted in many user reports, and mypy didn’t even do narrowing based on length checks up until extremely recently.

I do think the interactions with PEP 646 need some thinking through, e.g. mypy currently complains about the following because x could be empty tuple:

def takes_at_least_one(x: tuple[int, *tuple[int, ...]]): ...

def takes_var(x: tuple[int, ...]):
    takes_at_least_one(x)

2 Likes

I would also prefer option 2. Even though I can see some merit in option 1, it’s less intuitive, would probably cause some confusion, and would lead to some easily overlooked mistakes.

I am not sure how I feel about consistency with tuple[Any, ...]. I think I prefer the way pyright currently handles this, i.e. no special casing: if you want to cast to a fixed-length tuple, you should be forced to perform the narrowing via the length check.

I don’t view this as an issue. Errors can be raised anywhere, we don’t (and never should) have checked exceptions, this isn’t a type safety issue.

The place where 1 is significantly better:

row: tuple[str, str] = sql_conn.execute(
    "SELECT name, phone from users WHERE user_id = ?", 
    (user_id,),
).fetch_one()

If fetch_one returns tuple[Any, ...] and it behaves as a gradual type, the annotation here just works for something the type system can’t know about.

Yes, this leaves more correctness up to the user, but it also doesn’t create extra work and boilerplate in places the type system can’t handle.

Similarly,

result: tuple[int, int, int, int] = struct.unpack("!4B", buffer)

There are plenty of cases in real-world code where not claiming to know more than we actually do and deferring to the programmer’s annotations is preferable.
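To make the struct.unpack case concrete (a runnable sketch; the format string, not the type system, determines the length of the result):

```python
import struct

# struct.unpack's declared return type is tuple[Any, ...]; the actual
# length (4 here) is fixed by the "!4B" format string, which the type
# system can't inspect.
buffer = struct.pack("!4B", 1, 2, 3, 4)
result = struct.unpack("!4B", buffer)
assert result == (1, 2, 3, 4)
```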

Here’s another argument in favor of option 2. I’d prefer to limit the gradual type forms in the type system. Each of them requires special casing in both the typing spec and in type checker implementations. Gradual types are also more difficult for users of the type system to understand, and they don’t compose as well with other features because they break the normal rules of set theory. (Think about the lengthy, unresolved discussion we had about Any and intersections several months ago.)

Currently, there are two gradual type primitives clearly defined in the type system: Any and .... (The latter is the equivalent of Any when used with a Callable or a ParamSpec.) Adopting option 1 would add a third gradual type to the type system. I’d prefer not to do that unless there’s a really compelling reason to do so.

I would say that the lack of extra boilerplate in the above examples, which definitely come up in real-world code for users everywhere, is a gigantic reason to do so. Python is gradually typed. We shouldn’t be shying away from getting the benefits of gradual typing in places the type system doesn’t cover, and we should let other things which do understand those modules analyze whether the types provided there are correct.

The equivalent to those examples today in mypy or Pyright has additional runtime costs, and both of those examples are in places where the types are strongly enforced by something else already. Both of those often also come up in performance-sensitive paths in code (db access, deserialization).

I would much rather us need to take the time to detail all of the behaviors of it here, with a group of experts, than make more work and boilerplate for non-typing expert users because we avoided that.

3 Likes

While I agree that there are some situations where this ability would be nice to have, there are far more situations where it would be a liability. While Any is still less type-safe than the gradual-typing version of tuple[Any, ...], I am not convinced that it comes up often enough to be worth the extra complication.

Old APIs can return Any in cases like these where it would be annoying to deal with. New APIs can use a generic component in between, i.e. something like a Query[RowT] which is returned by conn.execute and then you can rely on the programmer to tell it what the row type looks like and when it is unspecified it just stays as Any.

That being said I have found myself sometimes wishing for a permissive union type i.e. AnyOf for things like JSON which would also behave like the first option here when applied to tuple[T, ...], so maybe additional more complex gradual types should be left for a future PEP. I don’t think they should be the default for any of the builtin types.

The extra complication, as you put it, has to exist either in the type system or in user code. Placing it in the type system is the only thing I can see as the correct call until the type system has a way to accurately express this.

This harkens back to the same ideas I brought up in A more useful and less divisive future for typing?. I see no reason that the type system should enforce something more than it actually has the means to express at the cost of ergonomics. The line should be “check what can be expressed, make it ergonomic to defer to the programmer when the type system cannot express something, work on making the type system able to express more things ergonomically”

The problem with framing this as “what should the subtyping rules be” is that the question itself limits its audience to typing experts and doesn’t actually consider the ergonomics of use for non-typing experts.

And since this is proposing formalizing it to one or the other, this should probably explicitly require feedback from each of the type checkers on their rationale for the current state.

Why should they return less type information than can be correctly expressed? The decision to do so makes this less safe without an actual reason. In the case of treating a tuple as a gradual type, it could be narrowed in multiple ways: tuple[T, ...] for a non-Any T can become tuple[T] with just a length check against a literal.

Saying we should have more gradual types, but not for the builtin types even when appropriate, really is just an arbitrary decision about where to place the gradualness rather than placing it where it actually exists.


Additionally, the PEP itself actually does indicate, in multiple ways if not directly, that tuples should be treated as a gradual type.

This rule also applies to Tuple , in annotation context it is equivalent to Tuple[Any, ...] and, in turn, to tuple . As well, a bare Callable in an annotation is equivalent to Callable[..., Any] and, in turn, to collections.abc.Callable

def foo(*args: str, **kwds: int): ...


In the body of function foo , the type of variable args is deduced as Tuple[str, ...]

and in acknowledging that tuple is a special construct

Type hints may be built-in classes (including those defined in standard library or third-party extension modules), abstract base classes, types available in the types module, and user-defined classes (including those defined in the standard library or third-party modules).
[…]
In addition to the above, the following special constructs defined below may be used: None , Any , Union , Tuple , Callable

All indications in the PEP itself are that the authors were aware that tuple needed special behavior, and (possibly) that the underlying theory also suggested this.

I certainly understand where the desire comes from, especially in the case where you say tuple without subscripting it and would expect that you can pass this anywhere that accepts a tuple, just like you would be able to if you did the same thing with a list. So in that sense it would certainly be a win for ergonomics, but it would also be a loss in expressiveness and intuitiveness.

I just don’t think it comes up often enough that you would always want it to behave in this way, especially when you are explicit and write tuple[Any, ...] rather than tuple. Maybe this could be expressed as tuple[...] vs. tuple[Any, ...] to match how it works in Callable. tuple could then be equivalent to tuple[...] instead of tuple[Any, ...].

I think with the introduction of TypeVarTuple you can also express the gradually typed version as tuple[*Ts] and leave it unbound in the function that returns it. The type checker should then complain that it can’t determine the type of Ts and you can go ahead and set it to what it actually is. Passing this directly into a function that expects a certain shape should then also work without having to ignore a type error.

That being said, I can live with the current status quo in mypy, i.e. tuple/tuple[Any, ...] is special-cased and gradual, while any other tuple is not.

Would it actually, though? Going back to the original examples I gave, let’s look at this with a non-expert lens for both the user code and the library code.

Here it is with .fetch_one() -> tuple[Any, ...] as a gradual type

row: tuple[str, str] = sql_conn.execute(
    "SELECT name, phone from users WHERE user_id = ?", 
    (user_id,),
).fetch_one()

Here’s the user code without it being a gradual type:

row = sql_conn.execute(
    "SELECT name, phone from users WHERE user_id = ?", 
    (user_id,),
).fetch_one()
row = typing.cast(tuple[str, str], row)

On the SQL library side, this would be:

def fetch_one(self: SomeCursorClass) -> tuple[Any, ...]: ...

vs

preserving gradual behavior:

def fetch_one(self: SomeCursorClass) -> Any: ...

Is the use of Any here obvious to a non-expert that this provides better ergonomics than tuple[Any, ...] without it being gradual?

or, forcing the user to pass in more info

def fetch_one(self: SomeCursorClass, *types: *Ts) -> tuple[*Ts]: ...

This changes the existing API to satisfy typing, and essentially adds boilerplate for something already strictly typed by a real source of truth (a database), do we really want all APIs to need churn to satisfy typing arbitrarily? This was a major complaint against typing.

I don’t think the real-world cases for this, which do exist, are more intuitive or more expressive. Placing the type info in the annotation, and that being enough, is perfectly expressive.

To me it is more intuitive that tuple[Any, ...] is incompatible with tuple[Any], and I think for any typing novice this would be the case as well. It’s only once you consider ergonomics and encounter examples like the ones you already mentioned that you think about making it behave the other way, because it is annoying having to ignore type errors.

I think in the case of tuple without specifying the type arguments I would tend to agree with you that the gradual case makes slightly more sense, because it essentially means “I haven’t thought about it yet” or “I don’t/can’t know”, but tuple[Any, ...] to me means it has an indeterminate number of elements and unless you check the size you’re not allowed to assign to a fixed size tuple.

I think whichever interpretation we choose, we also probably want to be able to express the other interpretation, which one we want depends on the use-case.

I can’t really get behind Sequence[Any] as a replacement for option 2, since it is not accepted by tuple[Any, ...] without an isinstance(x, tuple). Sometimes you care that it is a tuple and not just a generic Sequence even in the case of an unknown length.
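A minimal sketch of that situation (the function names here are hypothetical):

```python
from collections.abc import Sequence
from typing import Any

def wants_tuple(t: tuple[Any, ...]) -> int:
    # Needs a real tuple, e.g. to use it as a hashable dict key.
    return len(t)

def from_sequence(seq: Sequence[Any]) -> int:
    # Sequence[Any] is not assignable to tuple[Any, ...]; narrow with
    # isinstance, or copy the elements into a new tuple.
    if isinstance(seq, tuple):
        return wants_tuple(seq)
    return wants_tuple(tuple(seq))
```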

If you change your last example to:

def fetch_one(self: SomeCursorClass) -> tuple[*Ts]: ...

You would be able to do your desired thing of declaring the type of the tuple on LHS. The only loss in ergonomics in that case would be that you would get an error if you didn’t specify it, which you would have to ignore (if you didn’t want to specify).

To support my prior claim that the intentional use of Any for ergonomics is not intuitive: we have people who think Any is bad practice (see Python 4 should have mandatory static typing), and mypy’s settings lean towards teaching people that by specifically having settings to disallow Any in specific contexts.

1 Like

As someone who’s used database interfaces a lot, and who uses typing sparingly (I like the benefits of type annotations, but I don’t want to have to think too hard about complex typing questions), the need for a cast here is a problem for me.

Generally, I refuse to use casts, as they have a runtime performance cost[1] to provide information that’s only used statically, and they feel like I’m patching over something that I’m doing wrong[2] (and hence they reduce my confidence in the correctness of my code). They also hide the type information in a less obvious place (making it harder for a casual type user like me to spot that the variable is typed).
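For what it’s worth, typing.cast is an identity function at runtime, so the cost is just the extra call on every row (a small sketch with made-up data):

```python
from typing import cast

row = ("Alice", "555-0100")
# typing.cast returns its second argument unchanged; the runtime cost
# is only the function call itself, paid on every fetched row.
typed_row = cast(tuple[str, str], row)
assert typed_row is row
```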

So for me, the “gradual type” usage reads far more naturally, and expresses what I want to say in a way that follows the normal pattern I expect for annotated variables. In particular, I find it jarring that adding the tuple[str, str] annotation to the variable’s declaration would be rejected as incorrect.

So I’m a strong +1 for the “gradual typing” behaviour, based on this example.

The argument that this involves special casing tuple does bother me. I find special-case rules in typing hard to understand (discussions about “stuff that works for dataclasses because they are special cased” is another case that always ends up confusing me). But I’m not sure what an example would be of a different type that wasn’t special cased, so I don’t have a clear intuition of how I’d feel about such a case.


  1. albeit a small one ↩︎

  2. maybe that feeling comes from experience with C casts? ↩︎

4 Likes

This doesn’t actually work; it is considered an invalid use of a TypeVarTuple (see playground link), and even if this restriction were loosened, it would not be an obvious solution at all.