As someone else involved in that lengthy discussion, I'd prefer if we just handle the special cases that are needed to make this behave well for end users. The Coq program shared in that discussion showed it is possible to handle `tuple` as a gradual type, and included the necessary subtyping rules.
I see, that's a shame. I haven't played around with `TypeVarTuple` enough yet to know all the edge cases, so I assumed it would work like any other unbound type var.
Edit: It looks like it does work the way I expected it to in mypy, so this looks like another area where the behavior is under-specified.
I still think we would want to be able to express both “this is a tuple” and “this is a variable length tuple” regardless of whether we specify the element type or not.
I would be happy with defaulting to option 1 in the case of `tuple` without type parameters, with the promise of a future extension to the syntax to allow expressing this explicitly. I would be sad if `tuple[int, ...]` would implicitly convert to `tuple[int, int]`. I could somewhat live with `tuple[Any, ...]` converting to `tuple[int, int]`, but I would be worried about the mistakes that could hide.
For what it's worth, I would actually like it if no type vars had restrictions like this and they incorporated the call site as well, but I don't know that this would be an intuitive way of handling it. I'm leaning towards it not being intuitive, since type vars (let alone type var tuples) continue to be something I have to explain to people I mentor in a professional setting.
I think I could live with it not being gradual if the special casing were only on `tuple`, `tuple[Any, ...]`, and a hypothetical future `tuple[AnyOf[*Ts], ...]` (and any other gradual type as the first argument to a homogeneous, indeterminate-length tuple). But I'm not sure special-casing it here instead is a significant improvement, and I think it makes things more complex to reason about or teach.
With it being gradual, we can just explain that indeterminate-length tuples are treated as compatible with any tuple of a specified length whose element type is compatible.
Additionally, anywhere the length matters should produce an error on misuse, so this shouldn't cause subtle issues of types mixing unexpectedly; it should error out early and detectably rather than hide errors in most cases. (Basically the only case where it wouldn't is naive indexing into a tuple that is larger than expected, and I don't see that being likely to happen; but if anyone has a counterexample motivated by real-world code, I'd love to see it.)
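To make the "errors out early" point concrete, here's a minimal runtime sketch (`head_pair` is an illustrative name, not anything from the discussion): if a value typed as `tuple[int, ...]` turns out to have the wrong length, unpacking it into a fixed number of names fails immediately rather than letting wrong-length data flow onward.

```python
def head_pair(t: tuple[int, ...]) -> tuple[int, int]:
    # Unpacking enforces the expected length at runtime:
    # a length mismatch raises ValueError on this line,
    # instead of silently mixing values downstream.
    a, b = t
    return (a, b)

print(head_pair((1, 2)))   # lengths match, works fine
try:
    head_pair((1, 2, 3))   # too long: fails fast
except ValueError as exc:
    print("caught:", exc)
```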
There are technically other ways to resolve some of this, but I think they require future additions to the type system, and I would rather be more strict with the tuple behavior only with a better alternative in hand, not before.
I know I'm more knowledgeable than most Python programmers about type systems, but I don't think `tuple[*Ts]` as a return type that implicitly matches an annotation at the call site is absurd to expect people to learn. If this is viable, and the type checkers that don't currently allow it are okay with making a change to allow it, I'm fine with `tuple[T, ...]` being an infinitely sized union of tuples containing only `T`, as long as it is also narrowed when a length is checked.
Did anyone contact someone from pyre or pytype about this? Both of them treat this as gradual and would be affected by a change specifying that it isn't.
Typing novice perspective here
Without looking things up, I can't even clearly express in words what `tuple[Any, ...]` and `tuple[Any]` mean. So my first comment is that I'd never use either of these myself; I'm almost entirely concerned with how their use by more typing-aware library authors would affect my usage of APIs. Hence my interest in the `fetch_row` example above. If the consensus is that `fetch_row` should always be annotated (by the library author) in a way that supports "gradual typing" (i.e., no cast), then I largely don't care about the distinction. But if the reality is that library authors will tell me to "just add a cast", then I do care.
Now excuse me while I go and look up the meanings of `tuple[Any, ...]` and `tuple[Any]`…[1]
(The following is my train of thought as I try to work out what the technical details of this discussion are.)
OK, so `tuple[Any, ...]` means "a tuple of arbitrary length, where each element is of type `Any`". Is that right? And `tuple[Any]` is a 1-element tuple where the element has type `Any`? So every `tuple[Any]` is clearly also a `tuple[Any, ...]`, surely? Am I misunderstanding what "incompatible" means? Or is the question the other way round? Because `tuple[Any, ...]` allows 3-element tuples, so it's (again, clearly?) not compatible with `tuple[Any]`, which must be 1-element.
But what does any of this have to do with whether the return value of a function annotated as returning `tuple[Any, ...]` can be assigned to something declared as `tuple[str, str]`?
I guess it's correct that in principle assigning the former to the latter is a type error. But in practice, this is a case where Python is dynamic, and the inputs to the function guarantee that the output is `tuple[str, str]`. And I want to express that by annotating the target variable (see above for why I don't think casts are a good UX here). Typing the return of `fetch_row` as `Any` would allow me to do that, but it would fail to catch obvious errors like not expecting a tuple at all (a common mistake for me is to fetch a single column and forget to unpack the resulting 1-tuple).
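As a concrete illustration of that 1-tuple mistake, here's a small sketch using the stdlib `sqlite3` module as a stand-in for the `fetch_row` API under discussion: `fetchone()` returns a 1-tuple, not the bare count, so forgetting to unpack gives you a tuple where you expected an int.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [("a",), ("b",)])

# The easy mistake: fetchone() returns a 1-tuple, not an int.
row = conn.execute("SELECT count(*) FROM employees").fetchone()
print(type(row))   # it's a tuple, not an int

# Correct: unpack the single column.
(emp_count,) = row
print(emp_count)
```

A return type of `tuple[Any, ...]` lets a checker flag uses of `row` as if it were an int, whereas a plain `Any` return would not.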
So I now understand why the term "gradual" is being used here: the library author can't state the full type of `fetch_row`, but they can state important features of that type (it's a tuple), and insisting on an "all or nothing" answer (full type or `Any`) is suboptimal for both producer and consumer.
But now I'm wondering: what's wrong with a return annotation of just `tuple` (with no generic parameter)?
At this point, I’m way deeper in the weeds than I want to be. My goal is to use (or write) a simple library function. At least it would be simple in the absence of type annotations. So all of this effort trying to understand the problem is some sort of “cost of entry” to be able to use typing without fear of hitting “looks simple but it isn’t” issues once I’m already committed.
(Stream of consciousness narrative ends here)
So my “typing novice” perspective is:
- The compatibility question makes my head hurt, so I prefer to focus on practical effects.
- As a consumer, I expect `fetch_row` to work with a declared type for the `row` value, and without a cast.
- As a library author, the typing documentation is far too obscure for me to work out how to annotate `fetch_row` so that it behaves as described in (1). But as a library author I do want to provide the behaviour in (1) for my users, so for now I'll be annoyed at the state of the typing documentation. If the docs improve, or I learn more, I may instead be annoyed that typing isn't able to express my desired behaviour in a straightforward way, because I then have to make a trade-off between two forms of harm to the user experience for consumers of my library (no typing, or requiring casts).
- That "cost of entry" point I made puts me off typing in general…
I'll note here that I have no idea where I should go to find definitive reference documentation, so I ended up finding the link to PEP 484 from the initial post in this thread and stumbling around from there. Also, the `tuple[Any, ...]` syntax is basically impossible to grep for. ↩︎
`tuple` is specified as being identical to `tuple[Any, ...]` (and this one is concretely specified, no ambiguity), so your intuition to reach for it was fine. I continued using `tuple[Any, ...]` to closely mirror `tuple[T, ...]`, to distinguish when the inner type was also gradual.
You're striking at the heart of why this being gradual can be seen as providing a level of unsafety. With it being gradual, it is treated as seamlessly compatible in both directions, under the reasoning that "the type system does not have enough information to specify one side of this; assume the programmer is correct and that anything incorrect will be caught with a test".
Edit: technically, pytype actually uses more of the available information to determine some errors with this despite the gradualness, because it checks for total consistent use.
Cool. My point is still that as a novice I don’t want to learn this sort of stuff[1].
As you’ve said, theory and ergonomics are somewhat in conflict here. And IMO, typing is now becoming mainstream enough that ergonomics needs a higher priority. We’ve hit this in packaging - a lot of our users simply don’t care about the technicalities, they just want to get the job done. It’s the old “practicality beats purity” problem, essentially.
as a nerd, it’s fascinating to me, of course, but not when it stops me getting on with writing my code! ↩︎
That cast is unsafe though, and it should bother you. If you change your projection to be just `name` and don't change the cast, you'll have lost correctness and can't have confidence in anything below. One of the main points of using a type checker is to refactor fearlessly. The fact of the matter is that this interface cannot really be made type safe (unless you write a type checker that can analyze SQL and connect to that particular database to get the data types), so you're just fooling yourself here.
It is incorrect though. The typechecker cannot guarantee it. Even if we change the rules to silence the error it’d still be unsafe. It’d just be false confidence.
In my opinion it’d be a great shame if we ruin the confidence granted by tight type checking because of examples which cannot work with the type system anyway. Some APIs will never be able to work with typing and that’s OK.
The cast isn't unsafe. The types are guaranteed by proper DB use. The fact that the type system isn't aware of it, and can't be the guarantor of it, does not make it any less safe. The same applies to the gradual-type use.
It was shown above how this works just fine with gradual typing.
Yeah it’s unsafe in the eyes of the type system, and it’s unsafe generally because the types in the database might not be strings. It’s a “trust me bro” situation on multiple levels.
And the type system doesn't know, and the programmer does. It should trust the programmer in that situation, given that Python is gradually typed. If you want fully static typing, go use a fully statically typed language, or don't use libraries that aren't fully typed without any gradual types; don't make the process of needing to interact with things that can't be fully typed worse for everyone else.
Can the type system know the types from the database statically? No
Can other tools do some form of analysis? Yes, you even said as much yourself with:
and there are tools like this already in existence. So maybe, given that this isn't something the type system should know, we should use gradual typing to improve ergonomics and let other tools validate this. My database schemas don't change frequently; the last time I modified the type of a column was literally over 5 years ago. Adding new columns? Yeah, that's more recent. This kind of error can be caught by other analyses and other testing, and does not need to be the type system's responsibility.
Outside of that, there's the struct example from before as well. Please tell me how this one is unsafe, and not just strictly a deficiency of the type system:

```python
result: tuple[int, int, int, int] = struct.unpack("!4B", buffer)
```
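For reference, a self-contained, runnable version of that `struct.unpack` pattern (the buffer contents here are just sample data):

```python
import struct

buffer = bytes([1, 2, 3, 4])

# "!4B" = network byte order, four unsigned bytes -> a 4-tuple of ints,
# so the fixed-length annotation matches what unpack produces here.
result: tuple[int, int, int, int] = struct.unpack("!4B", buffer)
print(result)  # (1, 2, 3, 4)
```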
No, but there are ways that are still better than blind trust. You can model your table in code and have the library compare it against the state of the actual database before it lets you do queries. But let’s not go off on a tangent.
That one is safe, but it'd work just as well if the return type of `unpack` was `Any` instead of `tuple[Any, ...]`. And the same goes for @pf_moore's sqlite example. So if something cannot be handled by the current type system, there already is a good escape hatch here. In fact, having those return `Any` would be a better signal to developers that this code needs tests and cannot be refactored by relying mostly on the type checker.
Look, the crux of my argument is this: if we make the type system weaker by introducing obvious unsoundness for the sake of ergonomics, it ultimately makes it less useful for everyone. And going by the discussion about changing `TypeGuard`, we're not going to be able to walk this back later.
So you're recommending that the return type of `fetch_row` should be `Any`, because no other return value can be made type safe?
But then the type checker won't catch the error I noted earlier, forgetting to unpack a 1-element tuple return:

```python
emp_count = sql_conn.execute("SELECT count(*) FROM employees")
```
Or as a final option, can we have a variant of `cast` that:

- Isn't characterised as "unsafe" (which dissuades inexperienced users from using it).
- Doesn't have a runtime cost (which dissuades users concerned about performance from using it).
- Can be used in the declaration of the variable, so it is as easy to find as all other type declarations.
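For comparison, the existing `typing.cast` already meets the runtime-cost point (it simply returns its argument unchanged); it's the other two points it falls short on. A minimal sketch, with `fetch_row` stubbed out as a hypothetical stand-in for the database API:

```python
from typing import Any, cast

def fetch_row(query: str) -> tuple[Any, ...]:
    """Hypothetical stand-in for the database API under discussion."""
    return ("Alice", "alice@example.com")

# cast() has no runtime cost: it returns its argument as-is.
# But it sits on the right-hand side rather than in the declaration,
# and its documentation reads as "unsafe, use sparingly".
row = cast(tuple[str, str], fetch_row("SELECT name, email FROM users"))
print(row)
```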
I repeat - the problem here is one of practicality versus purity. I completely accept that the type checker can’t do a perfect job in this case. But I don’t want perfection. I want the type checker to catch as many errors as it can, and let me handle the others myself.
Is there really no option between no fear and existential terror? I want the type checker to make me feel less worried about refactoring, but I’m not asking it to remove all my fear - just to make things a bit easier for me. And I’m happy to take some of the burden in the sense of accepting that some of my annotations will take the form of “trust me, bro” - to use your phrase.
Let me put it another way:

```python
row: tuple[str, str] = db_conn.fetch_row("...")
```

isn't completely safe without knowing the content of "…" (and the semantics of SQL), but it does state my intent that "…" will return two strings. The type checker can't know that, but that's fine. The type checker can know that `fetch_row` returns a tuple, and confirm that much. But the fact that there are 2 `str` values is on me.
What I care about is:
- That the declared type is checked to ensure it’s a valid possibility (it’s a tuple).
- That subsequent code is checked based on the declared type of the variable.
If I need to add a type checker directive[1]:

```python
# typing: trustmebro
row: tuple[str, str] = db_conn.fetch_row("...")
```

then I'm fine with that. It helps me know where I've made that assumption.
And maybe `# type: ignore` does that, but I sort of assume it would also prevent the type annotation of `row` from being recognised (my point (2)). So at a minimum, it needs to be better explained if it does work.
Anyway, we may never agree on this. And if so, that’s fine, I was only trying to give my perspective as a “typing novice”. I’ll add one last point, and leave it at that - this sort of purist viewpoint is what contributes, at least for me, to the general feeling of alienation I get when trying to make use of typing. I feel that my “just wanting to get the job done” approach is seen as inappropriate, and unwelcome when I try to offer feedback.
As a script writer, I’m fine with that - I can simply not use type checkers. But as a library author, I’m left feeling that I have to say “no” to my users when they want my APIs to be typed. And that feels uncomfortable to me.
Anyway, I hope this perspective is useful, even if it might not be welcome (or ultimately make any difference). I really appreciate all the work going into making typing as useful as it is, even if it might not be suitable for me.
Side note: I wish directives preceded the affected line, rather than being added to the end, but that ship has sailed. ↩︎
I’ll have a longer reply ready later, but I want to say this first.
When I feel strongly about something I have a certain style of arguing that may come across as argumentative. But if I made anyone in this thread (or this entire forum) feel alienated or their opinion erased, I sincerely apologize. That was 100% not my intention, and I'll do better. And going beyond that, hearing from folks who are self-described typing novices is very important and valuable. This point is much more important than whatever the typechecking folks decide to do with `tuple`.
OK. So you're saying that the return value of `fetch_row` and `unpack` should be typed as `Any`. Fair enough. But if that's the case (typing novice perspective here again), the way `Any` is typically described gives far too strong an impression that it should be avoided at all costs if you want to write correctly-typed code. I know I thought it was considered wrong to explicitly use `Any`.
Also, there's no way that I can see to express the type "`Any`, but it's not valid to assign it to a variable that isn't typed as a tuple". Which is what we're trying to express here, isn't it? If that's the case, maybe "that's not a common enough need to be worth supporting" is the answer, in which case let's be up front and say that. Or if it is worth supporting, aren't we back where we started?
Thank you for saying so. My style also tends to come across as argumentative, so I’m as much to blame as anyone if that’s happening here. I don’t actually have a problem with your style, or anything you said, but I appreciate the comment.
My “alienation” comment was really only meant to say that whenever anything non-trivial comes up in typing, it manages to become complex so fast that I’m left with the feeling of “this simply isn’t for me”[1]. The barrier for entry just seems very high. So I back off, until the next time I’m tricked into thinking some concept “sounds really useful and powerful”. And the cycle continues…
In a practical “writing my code” sense. In a theoretical, computer science nerd, sense it fascinates me, as I said ↩︎
Tuples of an indeterminate length are inherently a gradual type, both theoretically and in how they actually behave in the type system. You can't index into `tuple[T, ...]` safely, yet we allow that, so why should any other length concern be treated otherwise? pytype can detect issues and misuse here based on total consistency, so clearly these are also still statically analyzable.
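A quick runtime illustration of the indexing point (`first` is an illustrative name): type checkers accept `t[0]` for any `tuple[int, ...]`, but nothing guarantees the element exists.

```python
def first(t: tuple[int, ...]) -> int:
    # Type checkers accept this for any tuple[int, ...],
    # but the empty tuple satisfies that type too.
    return t[0]

print(first((10, 20)))   # fine
try:
    first(())            # passes the type checker, fails at runtime
except IndexError as exc:
    print("caught:", exc)
```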
As for "just use `Any`": I could just as easily shift the onus onto those wanting more strictness by saying they already have `Sequence[T]`. The boundary between typed and untyped code is extremely messy. I'd significantly prefer pragmatic use at this boundary that favors the non-typing users and non-experts. Giving every user the extra check that the value actually is a tuple, as in the SQL row example, is more type information in a way that is easy for users. Those who want strictness have an option as well, one that doesn't degrade the ability to partially type this ergonomically.
I guess this might be the core of the disagreement. I don't know the exact definition of a "gradual type", so I'm going by intuition here. I routinely use `tuple[T, ...]` as a kind of frozen list, not as a gradual version of a heterogeneous tuple. So if you use `tuple[int, ...]` as an immutable version of `list[int]`, then there's nothing gradual about the type, right?
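The "frozen list" usage in question, as a sketch: every element has the same known type, only the length varies, and mutation is rejected at runtime.

```python
xs: tuple[int, ...] = (1, 2, 3)   # homogeneous, any length
ys: tuple[int, ...] = ()          # the empty tuple fits too

total = sum(xs)                   # iteration is always safe

try:
    xs[0] = 5  # type: ignore[index]  # tuples are immutable
except TypeError as exc:
    print("caught:", exc)
print(total)
```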
Anyway, some good points there. I see where you’re coming from so now I just very slightly disagree with your argument
I suppose if we went with option #1, well-typed code should always use `Sequence[T]` instead of `tuple[T, ...]` for the sequence use case, which is a little unfortunate since tuple is right there in builtins. I accept your argument about the simple case being oriented towards beginners, so maybe making typing users have to import `Sequence` is OK.
Tangentially:

Nor can you index safely into any other collection, right? That's an interesting point I hadn't considered. I guess tightly typed code shouldn't use indexing at all, just iteration (except on typed dicts, I suppose).