What are the subtyping rules for tuple[T, ...]?

mikeshardmind · November 30, 2023, 5:08pm

The length of a collection isn’t part of the type system for any type other than tuples (TypedDicts kind of, but there are some issues with considering that, since you can subclass from them). It feels like an odd case because it is one, both in the actual behavior and in how we type it. Some newer languages have even explicitly decided not to have tuples because of these issues.

I said above that I think there are potential alternatives that could work here but require type system additions. One such addition could be an explicit keyword argument in tuple’s subscript, e.g. [gradual=True] (defaulting to True). However, this requires extending the type system in a direction that has already been rejected before, so the case for keyword arguments in generics would need to be made again, in a way that persuades the SC and shows the clear ergonomic benefits.

Another would be keeping tuple (and tuple[Any, ...]) as gradual types, and using tuple[*Ts: ConcreteType] without a corresponding use of the TypeVarTuple to denote non-gradual use. This might end up overloading TypeVarTuple behavior a bit, but I think it would be the most natural solution, providing something more satisfying to every party.

Yeah, agreed here. It would be nice if some of the imports needed for typing had a lower startup cost as well. I’m trying to find what I think the right balance is between typing what I want as strictly as I want, and working with others who may not be familiar with this or may not have the same need for strictness. You said you largely use tuple[T, ...] as an immutable version of list, so Sequence[T] sounds about right, but it would be kind of nice to have a distinct way of spelling that intent in the builtins.
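A minimal sketch of the Sequence[T] alternative mentioned above (the function name total is made up for illustration): a parameter annotated this way accepts tuples, lists, and other sequences of any length, and the Sequence ABC exposes no mutating methods.

```python
from collections.abc import Sequence


def total(values: Sequence[int]) -> int:
    # Sequence[int] describes read-only, indexable access with no length
    # guarantee; it has no append/extend, so it behaves like an "immutable list".
    return sum(values)


print(total((1, 2, 3)))  # a tuple -> 6
print(total([4, 5]))     # a list -> 9
print(total(range(3)))   # a range is a Sequence too -> 3
```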

sirosen · December 3, 2023, 8:21pm

I agree with the subthread and points above that this all seems hard for maintainers of libraries to learn and figure out, but I don’t think that argues for a less strict type system.

I’ve run into cases as a maintainer where I have to weigh annotating a function as returning Any vs. some stricter type. The type could in fact be Any, but usually it’s something narrower.
If I annotate “accurately”, the type system gets less information in the common case. If I annotate “pragmatically”, callers might need to cast or ignore.

That is, strict rules for tuple[T, ...] do not introduce this problem, nor do I think they make it significantly more common or worse.

I would love to (as a curious person) look at how SQLAlchemy has implemented its new interfaces. It all seems extremely sophisticated, and I bet there are good lessons to take away from that implementation.

dmoisset · December 4, 2023, 10:43pm

I think some of this discussion is going in circles because two slightly different things are being treated as the same, mostly because our notation (in this case, the one provided by the type system) is not expressive enough to describe the problem. So let me give it a try.

I’ll have to introduce some new notation on the way. This IS NOT A PROPOSAL to add that notation as python syntax, it’s just an aid for the discussion.

(Long Contextual Intro starts)

Python allows types to be quite specific (e.g. list[int]), gradual (e.g. Any), or partially specified with some graduality (e.g. dict[str, Any]).

We normally use Any for values whose type we developers can figure out, but which is too tough to describe to the type checker. I don’t see it as a “sin” or something to avoid; it’s just saying “we’re using Python beyond what the type system allows us to describe”, which can happen because Python is very flexible.

Some people are worried about missing some errors with this approach, and while that happens, it’s not the key issue. After all, every type checker happily accepts:

x: list[int] = [1, 2, 3]
a, b = x  # ValueError at runtime, type checkers accept this

So if we allow this and no one has complained, what makes tuples different? Currently, tuple[int, ...] doesn’t behave that differently from a list:

x: tuple[int, ...] = (1, 2, 3)
a, b = x  # ValueError at runtime, type checkers accept this. Same as before but list -> tuple
a, b, c = x  # Perfectly fine at runtime, type checkers accept this (we like when this happens)

What is completely unique about tuples compared with other containers is that we can specify a length, and when we do, the issue in this thread arises. If the type checker allows a = b where a has a fixed length and b has an “arbitrary” length, then code using a further down the road may be mistyped, because its declared type doesn’t match reality (a.k.a. runtime types). Furthermore, we expect such issues to happen only with gradual types, but this problem can easily happen with tuple[int, int] vs. tuple[int, ...], where Any is not involved at all, and that is what makes this “graduality” unexpected.
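A small illustration of the tuple[int, int] vs. tuple[int, ...] point (all function names are hypothetical): no Any is involved, yet a lenient assignment rule lets a one-element tuple reach code that expects two.

```python
def parse_ints(text: str) -> tuple[int, ...]:
    # Honest return type: "some number of ints", not exactly two.
    return tuple(int(part) for part in text.split(","))


def add_pair(pair: tuple[int, int]) -> int:
    # Code written against the fixed-length type indexes freely.
    return pair[0] + pair[1]


def demo(text: str) -> str:
    try:
        # A checker that lets tuple[int, ...] flow into tuple[int, int]
        # accepts this call; a strict checker reports it.
        return str(add_pair(parse_ints(text)))
    except IndexError as exc:
        return f"IndexError: {exc}"


print(demo("3,4"))  # 7
print(demo("3"))    # IndexError: tuple index out of range
```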

(Long Contextual Intro ends)

My interpretation of the problem is that the current tuple type allows us to be gradual on its elements, but not on a very important (and unique) attribute of tuples: its length. If we had a (silly, not a real proposal) way to specify the type of the length, the problem would be crystal clear.

# This is some example of a notation allowing to specify type of tuple lengths:
t1: tuple[str, str, length:int] = "foo", "bar"
t2: tuple[str, str, bool, length:Literal[3]] = "foo", "bar", True
# Most of the current tuple with fixed elements declaration should have a meaning similar
# to t2 rather than t1

t3: tuple[str, ..., length: int] = "foo", "bar", "baz"
# The declaration of t3 is probably similar to the current interpretation of tuple[str, ...]

What does this give us? Well, as I said in the Long Contextual Intro, we use Any mostly for “I know the type even if the type checker doesn’t”. But when we look at length, there are actually two very different scenarios:

(A) The length is really arbitrary, and the code should handle any possibility. Example: my_list: list[Any]; x = tuple(my_list)
(B) The length is not arbitrary, it’s known by the developer but not the type checker. Example: row = db.fetch_row("SELECT a, b FROM table")
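Case B can be reproduced with the standard library’s sqlite3 module, used here as a stand-in for the hypothetical db.fetch_row above. As far as I know, typeshed annotates Cursor.fetchone() as returning Any, so the checker learns nothing about the row’s length even though the query makes it obvious to the developer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (a INTEGER, b TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'x')")

# The SQL text fixes the row shape at two columns, but nothing in the
# signature of fetchone() can express that to a type checker.
row = conn.execute("SELECT a, b FROM items").fetchone()
a, b = row
print(a, b)  # 1 x
conn.close()
```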

If we can talk about the type of the tuple length, case A has type tuple[Any, ..., length: int], but case B should have type tuple[Any, ..., length: Any]. That would make the following clear:

my_list: list[Any] = generate_some_list()

t1: tuple[int, str, length: Literal[2]] 
t1 = tuple(my_list) # Wrong! Literal[2] can't accept int

t2: tuple[int, str, length: Literal[2]] 
t2 = db.fetch_row("SELECT a, b FROM table") # Fine! Literal[2] can accept Any

t3: tuple[object, ..., length: int] = t2 # Fine! int can accept Literal[2]

Note that the length component essentially works as just another covariant type argument. And this gives us the rules we want with no special magic.
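To check that the length really does behave like an ordinary covariant type argument, here is a hedged sketch using a made-up generic class LenTuple whose second parameter plays the role of the length. Nothing checks the length at runtime; the point is only how a type checker would relate the annotations.

```python
from typing import Any, Generic, Literal, TypeVar

T_co = TypeVar("T_co", covariant=True)  # element type
L_co = TypeVar("L_co", covariant=True)  # notional "type of the length"


class LenTuple(Generic[T_co, L_co]):
    """Hypothetical stand-in for tuple[T, ..., length: L]; not a real typing feature."""

    def __init__(self, *items: T_co) -> None:
        self._items = items

    def __len__(self) -> int:
        return len(self._items)


pair: LenTuple[int, Literal[2]] = LenTuple(1, 2)

# Both parameters are covariant, so a checker accepts widening the element
# type (int -> object) and the length type (Literal[2] -> int).
widened: LenTuple[object, int] = pair

# A length typed as Any is compatible with Literal[2], mirroring case B.
from_db: LenTuple[Any, Any] = LenTuple(1, "a")
row: LenTuple[int, Literal[2]] = from_db

# The reverse of the widening (int -> Literal[2]) would be rejected, like case A.
```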

So this ends my analysis; hopefully it helps focus the discussion. I don’t have a solution, and I’m hoping the smart minds here can come up with something acceptable.

The key issue would be being able to describe what I called tuple[T, ..., length: Any], which is not available now. I wouldn’t ask developers to give the length type argument explicitly, but I wouldn’t mind if it appeared in the spec (that would make the semantics clear no matter which scenario we support). But it would make sense for tuple to mean tuple[Any, ..., length: Any], because an unqualified generic usually means “every generic arg is Any”, and that would include the length. That would distinguish that type from tuple[Any, ..., length: int] (which is today’s semantics for tuple[Any, ...] and most likely what we want for backwards compatibility). However, that doesn’t give us a way to write tuple[Foo, ..., length: Any], which is probably useful.

I hope this opens some new directions in this discussion.

ajoino · December 4, 2023, 11:21pm

Very interesting read @dmoisset , thanks!

One possible way to implement the length argument behaviour you describe could be to add a typing.ConstrainedTuple[L: Literal[int], T, *Ts](tuple): ... (probably got the PEP 695 syntax and variadic arguments wrong, please correct me if so) which type checkers know to treat specially. Here L would be your length argument.

stroxler · December 5, 2023, 5:14pm

Thanks @dmoisset, that was really helpful.

For what it’s worth, Pyre’s current behavior of treating the length of tuple[T, ...] as a gradual type is probably not where the Pyre team really wants to be; we (and our users) are generally pushing for better type safety.

We align with @erictraut and @Jelle on this even though our current implementation does not.

I think from our perspective, using a cast would be the preferred option. I’m not sure I can relate to @pf_moore’s concern that casts reduce confidence in types; to me, casts are intended for exactly the scenario where we are assuring the type checker that something it can’t verify is okay.
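A minimal sketch of the cast approach for the struct.unpack case discussed in this thread: the format string guarantees the shape, and the cast records that guarantee for the type checker. It does no checking of its own at runtime.

```python
import struct
from typing import cast

buffer = struct.pack("!2H", 1, 2)

# The "!2H" format guarantees exactly two unsigned shorts, which the
# tuple[Any, ...] return annotation cannot express, so we vouch for it.
pair = cast(tuple[int, int], struct.unpack("!2H", buffer))
print(pair)  # (1, 2)
```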

I do think @Daverball’s idea of using tuple[...] could be helpful if we really need gradual length types. If most of the cases where we want tuple to be generic in the length also involve Any as the type, then this could work well. It’s also pretty consistent with the use of bare ... in Callable[..., T], in which case the bare ... means “gradual over both types and number of arguments (and names, but that’s irrelevant here)”.
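For reference, this is the existing Callable[..., T] behavior the comparison relies on (call_with and add are illustrative names). Since tuple[...] is only a proposal at this point, the sketch shows just the Callable side.

```python
from collections.abc import Callable


def call_with(f: Callable[..., int], *args: object) -> int:
    # Callable[..., int]: the return type is fixed, while the number,
    # types (and names) of the parameters are left gradual.
    return f(*args)


def add(a: int, b: int) -> int:
    return a + b


print(call_with(len, "abc"))  # 3
print(call_with(add, 4, 5))   # 9
```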

On a separate note, @pf_moore mentions that runtime overhead is a concern with casts in performance-critical code. Depending on how often this comes up (maybe more often than you’d think, because some very dynamic libraries like SQLAlchemy live near the bottom of the stack), we might be able to explore what it would take to add a zero-overhead cast to Python.

In principle I think we could erase the cast from compiled bytecode, which (I think?) is exactly what the C/C++ compilers would do.
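As a quick way to see the overhead being discussed: today typing.cast is an ordinary function, so a cast in a hot path costs a global lookup, the construction of the type expression, and a call. The sketch below (as_pair is a hypothetical helper) inspects the compiled code object to confirm the lookup is there.

```python
from typing import cast


def as_pair(value: object) -> tuple[int, int]:
    # Both the alias tuple[int, int] and the call to cast() are evaluated
    # every time this runs; cast() itself just returns value unchanged.
    return cast(tuple[int, int], value)


print("cast" in as_pair.__code__.co_names)  # True: a global lookup of cast
print(as_pair((1, 2)))                      # (1, 2)
```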

pf_moore · December 5, 2023, 7:51pm

I think my dislike of casts (in this context) comes from the name - thinking of them as like C casts. In C, a cast is saying “treat this as an int, and ignore the risks”, and is typically used to force the compiler to let you do something that you technically shouldn’t. So avoiding casts as much as possible is very much the right thing to do.

In Python, though, a cast is more like telling the type checker “it’s alright, I know you can’t be sure this is an int, but believe me, I’ve checked and it is”. So you’re not asking to do something that’s not allowed, just telling the compiler something it couldn’t work out for itself. Maybe a better name would have been assume - as in assume(T, expression), but I guess it’s too late to change now.

But this is off-topic for this thread. If anyone wants to discuss the nuances of casts, I suggest we split it off into a separate thread. Otherwise, let’s just leave it at this.

guido · December 6, 2023, 11:13am

I think of it the other way around. C casts are often conversions (e.g. (int)3.14), while in Python a cast may well be a lie or a mistake. So I very much find casts in Python smelly.
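To make the contrast concrete: a Python cast never converts anything, so if the claim is wrong, the value simply keeps its real type at runtime.

```python
from typing import cast

value = cast(int, "3.14")  # the type checker now treats value as an int

# No conversion happened: at runtime the object is still the original string.
print(type(value).__name__)  # str

# An actual conversion, closer to what C's (int)3.14 does:
print(int(3.14))             # 3
```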

erictraut · December 6, 2023, 8:06pm

Thanks everyone for your input. It sounds like most of you (including all of the representatives of the major type checkers) prefer option 2. This is also the behavior that most typed Python code bases today assume, since it’s the behavior that mypy and pyright implement today.

Before I formally write up this proposal for consideration by the full Typing Council, we need to pin down the meaning of tuple[Any, ...] and tuple (with no type arguments). Here are three options:

2a: tuple[Any, ...] follows the same rules as tuple[T, ...]. It implies a union of tuple[()] | tuple[Any] | tuple[Any, Any] | .... tuple is a synonym for tuple[Any, ...].

2b: Unlike the general case of tuple[T, ...], tuple[Any, ...] is considered a gradual type. It is bidirectionally type compatible with any tuple regardless of length. tuple is a synonym for tuple[Any, ...].

2c: We introduce a new form tuple[...], a gradual type that is bidirectionally type compatible with any tuple of any length. The type tuple[Any, ...] is treated as described in option 2a. tuple is a synonym of tuple[...].

Currently, mypy implements option 2b and pyright implements option 2a.

Option 2b would probably be the least disruptive, but I dislike the inconsistency it implies. Inconsistencies like this almost always create problems when composing typing features. For example, how would this be interpreted: tuple[int, *tuple[Any, ...], int]?

Option 2a is consistent but would be potentially disruptive for mypy users. It also leaves the type system without a way to spell “any tuple regardless of its length”.

Option 2c is the most flexible, but it’s also the most disruptive in the short term.

Thoughts?

Jelle · December 6, 2023, 8:16pm

I like option 2a because it’s most consistent. However, it gives me pause that mypy currently implements 2b. struct.unpack (which was brought up by @mikeshardmind earlier) is annotated in typeshed as returning tuple[Any, ...], so changing mypy’s behavior would break mypy users who are assigning the result of a struct.unpack call to a variable typed as a fixed-size tuple. It would be interesting to experiment with changing mypy’s behavior and seeing the mypy-primer output. (I won’t be able to do that myself, but if anyone is interested, feel free to open a draft PR against mypy.)

I don’t like option 2c because this situation doesn’t feel common enough to justify a new addition to the type system.

Daverball · December 6, 2023, 9:49pm

I like the flexibility of option 2c, and it should keep the people who want gradual-length tuples happy, since a tuple that is gradual in length but specific in element type should not come up very often, and when it does, it can be narrowed with assert len(x) == n instead of being spelled out in the annotation.
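A sketch of the narrowing approach described here (read_header is a hypothetical helper): rather than annotating a gradual-length tuple, the code asserts the length it already knows. Unlike a cast, the assert is checked at runtime (unless Python runs with -O).

```python
import struct


def read_header(buffer: bytes) -> tuple[int, int]:
    values = struct.unpack("!2H", buffer)  # typed tuple[Any, ...] in typeshed
    # Checkers that support len()-based tuple narrowing (pyright, for example)
    # treat values as a two-element tuple after this assertion.
    assert len(values) == 2
    return values
```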

It also preserves the current behavior for a bare tuple in mypy, which I think is the main reason this exception in the type system was introduced in the first place, because the expectation is that if you do not specify what kind of tuple it is, the LHS should be able to treat it as any more specific tuple it wants to, just like with any other generic.

I think there’s also an option 2d where we don’t allow writing tuple[...], but a bare tuple still means gradual in both length and element type. This would probably require that tuple no longer generates an unbound generic error by default.[1]

This would allow us the flexibility to defer coming up with a syntax to express tuples that are gradual in their length[2] to a future PEP. That being said, the only other case I can think of where a gradual-length tuple might be helpful is when we compose it with other gradual types, such as a potential future AnyOf, because then you might want to rewrite struct.unpack as -> tuple[AnyOf[int, float, bool, bytes], ...] without giving up on gradual length.

[1] It could be split into its own error code that has to be manually enabled.
[2] i.e. a more complete solution

mikeshardmind · December 7, 2023, 10:45am

We could use an existing valid form (arguably without meaning currently) for this: tuple[*Ts] without a corresponding use of the TypeVarTuple. It marks the tuple as having an indeterminate length. You wouldn’t be correlating what goes in with what goes out, but all the normal checks that apply to TypeVarTuples that haven’t been bound to a type would still be run, independently, when checking the body of the function and the return.

My preference hasn’t changed towards 2a at all, discussing it further has actually solidified my thoughts on why it shouldn’t be. (Consider that Sequence[T] does not impose assumptions about length that prevent indexing or unpacking to a specific number of elements)

2b and 2c are each more useful than 2a at the boundary between typable and untypable code. Many projects have run into issues at this boundary, whether because the code predates Python’s type system and needs to remain compatible for users, or because it truly is too dynamic to be expressed in Python’s type system. I don’t want to be telling people to rewrite the world to fit an ever-evolving view of typing, breaking their users in the process, or having to tell them their code can’t be typed. (A recent case of historical code that the type system just doesn’t try to understand right now came up with Stripe’s Python library.) I agree with something @Daverball presented as part of another alternative and think it applies to 2b: if we keep the gradual nature, it should apply to any composition of tuple[GradualType, ...], not just Any.

“A tuple of unknown (ed: partially unknown, len >= 2) length whose first and last members are known to be ints”

I don’t think this case presents anything novel compared with tuple[int, *Ts: Any, int] (which may be another point in favor of TypeVarTuple being how users spell this out).

gwerbin · December 21, 2023, 1:16am

Another place where you find heterogeneously-typed tuples of statically-unknown size is in the “data frame” libraries Pandas and Polars.

As a concrete example, pandas.DataFrame.itertuples dynamically constructs a namedtuple and returns an iterator of those dynamically-constructed namedtuples:

github.com/pandas-dev/pandas/blob/a671b5a8bf5dd13fb19f0e88edc679bc9e15c673/pandas/core/frame.py#L1528-L1537

if name is not None:
    # https://github.com/python/mypy/issues/9046
    # error: namedtuple() expects a string literal as the first argument
    itertuple = collections.namedtuple(  # type: ignore[misc]
        name, fields, rename=True
    )
    return map(itertuple._make, zip(*arrays))

# fallback to regular tuples
return zip(*arrays)

Currently Pandas annotates the result as tuple[Any, ...].

I stumbled into this issue because I, as the programmer, happen to know exactly what to expect from that named tuple, so I wrote something like this:

from __future__ import annotations  # WidgetRow only exists for the type checker

import typing
import pandas as pd

if typing.TYPE_CHECKING:
    class WidgetRow(typing.NamedTuple):
        size: float
        color: str
        quantity: int

widgets = pd.read_parquet("widgets.parquet")

widget: WidgetRow
for widget in widgets.itertuples(index=False):  # index=False so the fields match WidgetRow
    typing.assert_type(widget.quantity, int)

And I was surprised to find that Mypy didn’t like it:

error: Incompatible types in assignment (expression has type "tuple[Any, ...]", variable has type "WidgetRow")  [assignment]
Found 1 error in 1 file (checked 1 source file)

So consider this a vote from a “regular user” that Python ought to provide some way to express this concept in type annotations, whatever that might be.

guido · December 22, 2023, 2:57am

FWIW, this is not because of mypy’s rules around tuple[Any, ...] and fixed-size known-type tuples. If we declare widget as a tuple[float, str, int] it works fine.

It must be related to the special handling of NamedTuple (which I don’t think has been fully specified).
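Here is a pandas-free sketch of the tuple[float, str, int] annotation mentioned above, with a dynamically created namedtuple standing in for the itertuples() rows (Row and rows are made up for illustration). A namedtuple built this way has untyped fields, so checkers should accept assigning it to the plain fixed-length tuple annotation.

```python
import collections

# Stand-in for the class that itertuples() builds at runtime; the field
# names are assumptions chosen to match the WidgetRow example above.
Row = collections.namedtuple("Row", ["size", "color", "quantity"])
rows = [Row(1.5, "red", 3), Row(0.5, "blue", 7)]

widget: tuple[float, str, int]
for widget in rows:
    size, color, quantity = widget
    print(color, quantity)
```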

guido · December 23, 2023, 5:10pm

FWIW, it looks like we’ve painted ourselves into the corner of extreme consistency. This is a common issue in Python in general (and in other 30+ year old languages :-), and we need to just break a tie. As I quoted in PEP 8, in a slightly different context, “A foolish consistency is the hobgoblin of little minds” (it’s by Emerson, I’ve since learned).

I think some variant of 2c is called for. We can give tuple[Any, ...] the same behavior as tuple[T, ...] (i.e., 2a – it implies a union of tuple[()] | tuple[Any] | tuple[Any, Any] | ...). And we can introduce a new form that is bidirectionally type compatible (not a formal term) with tuples of any shape.

Here’s my proposal for that new form: let’s use plain tuple.

This deviates from the general rule that for a generic class C[T], using plain C without subscript is equivalent to C[Any]. So be it. tuple is not just a standard generic class.

As to backwards compatibility, I think this is as good as it gets. At runtime it is fine in any Python version. Older type checkers (or newer ones that haven’t implemented this new feature yet) will interpret it as tuple[Any, ...], which in most contexts is fine. There is only one type of situation where the behavior differs:

def f(p: tuple):
    t: tuple[int] = p

Before this feature is introduced (and in type checkers that don’t support it), this is treated as

def f(p: tuple[Any, ...]):
    t: tuple[int] = p

and the second line is an error.

Users who want the old interpretation can just change their code to use tuple[Any, ...]. Users who want the new interpretation may have to live with errors when they use a type checker that doesn’t support the new behavior yet. But that’s the same with all other proposals for a specific notation for this case.

IMO the advantage of this variant is that (I suspect) there is much code already that uses plain tuple, mostly to satisfy typing requirements by relatively unsophisticated users. They have a way to improve their typing; if they leave things as is, they get an ever so slightly weaker (wider) type, which probably isn’t a big deal to them. (And, per Jelle’s assertion, this isn’t a common situation.)

guido · December 23, 2023, 5:18pm

PS. There seems to be an interesting exception for tuple unpacking:

def f1(p: tuple[str, ...]):
    t2: tuple[str, str] = p  # Error in mypy and pyright
    a: str
    b: str
    a, b = p  # No error in either

This seems out of scope for this discussion though.

erictraut · December 23, 2023, 5:37pm

I’m strongly opposed to using a bare tuple for this purpose. If there’s a need for a type form that provides bidirectional type compatibility with all tuples (I’m not yet convinced there is), then we need a way to spell it such that it’s not confused with the situation where a user has unintentionally forgotten to add type arguments to tuple. Pyright’s reportMissingTypeArguments warns users if they omit a type argument because it’s easy to do so. If we were to adopt this proposal, tuple (and only tuple) would need to be exempted from this rule, and there would be no way to warn users that they forgot to add type arguments. Please, let’s not do that!

My vote is in favor of specifying that tuple[Any, ...] follows the general rule for other tuple[T, ...] forms (option 2a). If we later determine that a gradual form is needed, then we could introduce tuple[...] (option 2c). We can hold off on that decision for now.

One reason I’m not convinced that there’s a need for a gradual form is that pyright has interpreted tuple[Any, ...] (and the bare tuple) as a non-gradual form for almost two years now, and no one has reported an issue with it or requested that it be changed. That leads me to believe that it’s not that important and we should stop worrying about it unless/until we see actual evidence that it’s needed.

Another reason I’m not convinced that a gradual form is needed is that most (all?) of the examples that were provided above could be handled by Sequence[Any].

guido · December 23, 2023, 6:18pm

I was trying to combine Mike H’s strong antipathy to 2a and my own preference for a syntax that doesn’t look advanced or cause problems with older type checkers.

Contrast your claim “pyright hasn’t had any complaints in two years” to his: “There have been many projects that have run into issues at this boundary […]. I don’t want to be telling people to rewrite the world to fit an ever-evolving view of typing, breaking their users in the process.”

Maybe pyright has a different kind of user base (presumably mostly VS Code users, which seems to be aimed at less sophisticated users?) than the one Mike H is observing (proprietary libraries)?

As I said, we’re going to have to make some folks unhappy. I guess it will come down to a vote in the typing council.

PS. For my own reference: 2a/b/c are described here.

erictraut · December 23, 2023, 7:23pm

For completeness, let me summarize the options that have been discussed. I’ll call Guido’s latest proposal 2d.

2a: tuple[Any, ...] follows the same rules as tuple[T, ...] . It implies a union of tuple[()] | tuple[Any] | tuple[Any, Any] | ... . tuple is a synonym for tuple[Any, ...] .

2b: Unlike the general case of tuple[T, ...] , tuple[Any, ...] is considered a gradual type. It is bidirectionally type compatible with any tuple regardless of length. tuple is a synonym for tuple[Any, ...] .

2c: We introduce a new form tuple[...] , a gradual type that is bidirectionally type compatible with any tuple of any length. The type tuple[Any, ...] is treated as described in option 2a. tuple is a synonym of tuple[...] .

2d: tuple[Any, ...] follows the same rules as tuple[T, ...]. The “bare” form of tuple (with no type arguments) is a gradual type that is bidirectionally type compatible with any tuple of any length.

2a is consistent with pyright’s current implementation.

2b is consistent with mypy’s current implementation.

2c is consistent with the use of ... in Callable. I’ll note that tuple[...] is evaluated without error by older versions of Python. This option provides partial backward compatibility for mypy users (those who are relying on the “bare” tuple to be a gradual type).

2c and 2d are very similar except that 2c provides an unambiguous way to spell the bidirectional form.

Of these four options, I could get behind 2a, 2b, or 2c although I think 2a and 2c are the strongest. I remain strongly opposed to 2d.
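On the point that tuple[...] already evaluates without error at runtime: a quick check (Python 3.9+, where the builtin tuple became subscriptable) shows that the expression builds an ordinary types.GenericAlias.

```python
alias = tuple[...]

# Subscripting the builtin succeeds and just records Ellipsis as the argument;
# whether a type checker accepts this form is a separate question.
print(alias.__origin__ is tuple)  # True
print(alias.__args__)             # (Ellipsis,)
```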

guido · December 23, 2023, 8:25pm

Thanks for that summary!

IIRC Mike H (?) also proposed some kind of alternate form, maybe tuple[*Ts] – my objection to that is that it smells of variadic type variables, i.e. advanced stuff, which I think isn’t appropriate here.

The advantage of 2a is mostly theoretical – it is most consistent with tuple[T, ...] (the kind of consistency that, in theory, might matter for implementations), while leaving the door open for 2c, 2d, 2e, …

The advantage for 2b is that it conforms with current mypy, solves the problem for good, and addresses the real issue of what to do for the return type of struct.unpack(). In typeshed that is currently tuple[Any, ...], which means that mypy users get the advantage of the extra leniency, but pyright users don’t. There are a number of other popular examples, e.g. database fetch_one() style APIs and pandas iterators. (In both examples you have to squint a little, because real database APIs are more complicated – hence struct.unpack as the canonical example, because it really does guarantee that it returns a tuple.)

I foresee real trouble agreeing on a choice between 2c, 2d, etc. In that case, rather than giving up mypy’s behavior for struct.unpack() etc., this sways me towards 2b – let’s solve the problem now, rather than kicking the can down the (already very busy) road.

In terms of complexity of implementation, since we’re talking about a single special edge case, tuple[Any, ...], which already feels special because Any is special, I feel this should not be insurmountable (but then again, I no longer co-maintain a type checker).

Another way of thinking about it: are there really users who are going to have a serious problem when they no longer get an error here, and who need to be able to distinguish between the two cases (gradual tuple vs. tuple of variable length and item type)?

def f(a: tuple[Any, ...]):
    t: tuple[int, int] = a

Personally, my expectations around this case are vague, and in my mind I don’t try to draw from my experience with tuple[T, ...] to reason through the example.

MegaIng · December 23, 2023, 10:47pm

I might have missed some part of the discussion, but wouldn’t tuple[*Ts] make a lot of sense? Then the signature of unpack could look like this:

def unpack[*Ts](format: str, buffer: bytes) -> tuple[*Ts]:
    ...

and then if/when we get subscripting for function objects, a usage in isolation could be correctly written as unpack[int,int,int,int]("!4B", buffer), which for example could easily be verified by extra tooling/linters based on the arguments.

And as long as we don’t have subscript support for functions (or if people don’t want to deal with the performance cost), the type checkers can infer the variables. IMO, this wouldn’t be hard to explain or look too confusing.