Why can't we ...?

Jelle · November 1, 2024, 2:41am

There are a number of ideas around making the type system more usable that come up repeatedly and that make sense at first glance, but that run into some problems on further consideration. For example, writing tuple types as (int, str), writing literals as just 1 instead of Literal[1], or writing TypedDict types as {"key": type}.

I wrote up a piece explaining some of those ideas and the problems with them:

https://jellezijlstra.github.io/why-cant-we

If you have anything to add (such as another idea that could go on the list, or a problem I haven’t covered), please let me know! And if you think any of these ideas are promising enough that they’re worth considering despite the problems noted in this document, please open a new thread to discuss them and be prepared to write a PEP.

monk-time · November 1, 2024, 12:34pm

Thank you for this fantastic overview!

There’s something that kept swirling in my mind as I was reading the document, but I must first apologize if any of this will sound ignorant as I’m very far from being able to call myself a confident user of typing, let alone an expert.

A lot of the problems listed (e.g. stuff around the | operator) seem to arise from a design decision made quite a long time ago that Python should avoid creating a separate mini-language for typing annotations. This blocks any obvious parser-based solutions to many problems outlined in the document, like changing how the | operator is parsed in an annotation.

It seems to me this restriction was reasonable at the time but by now it has become double-edged: while typing annotations technically are 100% valid Python, many of the workarounds for that restriction, such as using subscripts or being unable to use bare/dict/tuple/func literals, have made typing feel like a separate mini-language anyway. And paradoxically, splitting into a separate mini-language would make annotations feel much more like Python (which is what all the suggestions in the document are trying to achieve) even though technically they would cease being such. Could it be that it’s time to review and reassess that decision?

On a separate note, there was one spot in the document (‘Presence in subscripts’ for ‘Tuples as tuple types’) that I wish was elaborated on a bit further as the reasoning is unclear to me. The paragraph talks about the current syntax but doesn’t explain why X[a, b] and X[(a, b)] being identical causes any issues for the proposed syntax. My first (admittedly uneducated) reaction to that was that both tuple[a, b] and tuple[(a, b)] would be identical to (a, b) while tuple[tuple[a, b]] would be equal to ((a, b), ), so from the text I couldn’t understand what makes any of this a roadblock.

Jelle · November 1, 2024, 2:19pm

I talk about this subject a bit in the “Themes” section at the bottom. If you (or anyone reading this) has a workable proposal for how to allow more flexibility in annotations, feel free to start a discussion about it.

I posted “Clarify problem with tuple literals” (Pull Request #15 on JelleZijlstra/JelleZijlstra.github.io) to clarify the problem.

The examples you gave are a bit hard to think about because they use tuple[], and under this proposal there should never be a need to write tuple[]. It may be more helpful to think of an arbitrary type that is generic over a TypeVarTuple, class X[*Ts]: pass. X[a, b] and X[tuple[a, b]] are currently separate types. Under the proposal, the second would most obviously be written as X[(a, b)], but that’s the same in the AST as X[a, b].
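The AST collision can be checked directly. This is a minimal sketch using only the standard library ast module; X, a, and b are just the placeholder names from the paragraph above:

```python
import ast

# Parse both spellings as expressions. ast.dump() leaves out line and column
# information by default, so only the tree structure is compared.
bare = ast.dump(ast.parse("X[a, b]", mode="eval"))
parenthesized = ast.dump(ast.parse("X[(a, b)]", mode="eval"))

# The parentheses leave no trace in the tree: both subscripts hold one Tuple node.
same_tree = bare == parenthesized
print(same_tree)
```

Because the two trees are identical, nothing that runs after parsing (the compiler, the interpreter, or a type checker working from the AST) can tell which spelling the author used.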

monk-time · November 1, 2024, 3:15pm

Ah, I think it finally clicked, thanks! So the problem is not with the new syntax acting as a replacement of tuple[] per se in simple annotations like def f(x: int) -> (int, int), but with how it would work when used inside of other generic types where the parser currently treats parentheses as meaningless. Yeah, that sounds like another place that could be solved by (and maybe only by) branching off the Python parser into some kind of PythonType mini-language.

drhagen · November 1, 2024, 3:56pm

This is a great summary of typing syntax limitations.

I am ambivalent on whether or not it would be great if (int, str) were legal syntax for tuple[int, str], but X[a, b] meaning the same thing as X[(a, b)] is a huge wart in Python that should be iced off irrespective of typing needing it.

jamestwebber · November 1, 2024, 4:05pm

That’d be a massively breaking change though, so I don’t think it’ll happen.

ImogenBits · November 2, 2024, 1:27am

Something that I’ve been thinking about whenever this comes up, which you might also have already hinted at in the last paragraph, is that most of these problems stem from runtime objects like (a, b) not being compatible with their use as types. But what people are really looking for isn’t actually creating a tuple object in a type annotation, but to use it as a way to spell the type of tuples with a and b elements. So if you somehow made the syntax (a, b) create the tuple[a, b] type object when the programmer intends to write a type rather than a value, most of these issues would be solved.
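To make the mismatch concrete, here is a small sketch of what the tuple spelling produces at runtime today (Python 3.9 or later is assumed for tuple[int, str]; the function f is only a placeholder):

```python
def f() -> (int, str):  # accepted by the interpreter, but the annotation is a value
    ...

ret = f.__annotations__["return"]

# The annotation is an ordinary tuple object holding two classes...
is_plain_tuple = type(ret) is tuple

# ...which is not the same object as the generic alias that means
# "a tuple of an int and a str", although the alias has the same members.
alias = tuple[int, str]
print(is_plain_tuple, ret == alias, alias.__args__ == ret)
```

The pieces are all there, but the runtime object has no idea it was meant as a type, which is the gap described above.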

Of course, there isn’t an actual way to determine the programmer intent when parsing code, but I think you can get reasonably close. The vast majority of types are written in function/class annotations or type statements, particularly once the old Something: TypeAlias = OtherThing syntax has been phased out. So the code being inside a __annotate__ or TypeAliasType.__value__ function should cover most cases where people want to write a type.

Would it be possible to modify the way these functions are generated to create different code when encountering specific constructs? Most of the wanted features could be achieved by changing the code generation for literal expressions. Things like typed dict inheritance via unpacking would need something even more involved. I haven’t looked into their actual implementation, so this might be unfeasible. But since we’re already doing a special transform of the annotation ASTs into the function it doesn’t feel too far fetched to have it also modify code generation.

But even if that is possible it might not actually be a good idea. You’d have to keep the old syntax around since there are use cases for creating types in expression contexts. I’d guess that it’d also be pretty contentious to have the same syntax lead to entirely different objects being created depending on where it is. It also only partially gets around the reason for rejecting PEP 677, it doesn’t directly add new syntax just for typing but it does create new typing-only semantics for it.

willingc · November 2, 2024, 1:50am

Nice write-up. I want to encourage you to consider turning the post into an informational PEP or a history of typing doc.

drhagen · November 2, 2024, 11:01am

Not massively so because it would only be a breaking change at the declaration site (i.e. def __getitem__) and, even then, only for implementations expecting more than one argument. I’ve written maybe a half dozen classes that match that description in my career, and all of those were matrix/tensor classes where I would have loved this feature.

jamestwebber · November 2, 2024, 3:30pm

It would break some ungodly amount of numpy code, and anything that imitates that interface (that’s a lot of stuff). That’s a big breakage.

MegaIng · November 2, 2024, 4:47pm

I think it might be possible to introduce a new __[sg]etitem_ex__ with a different signature that falls back to the normal version for the foreseeable future (or forever), similar to how iter falls back to __getitem__.
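For reference, a sketch of both halves of that idea. The first part shows the existing iter fallback. The second part is purely hypothetical: __getitem_ex__ does not exist in Python, and the subscript helper and Matrix class are made up here to show what a fallback dispatch could look like.

```python
class Squares:
    """Defines only __getitem__, with no __iter__."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)
        return index * index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
squares = list(Squares())


def subscript(obj, *indices):
    """Hypothetical dispatch: prefer __getitem_ex__, else the classic protocol."""
    extended = getattr(type(obj), "__getitem_ex__", None)
    if extended is not None:
        return extended(obj, *indices)
    # Classic protocol: one index is passed as-is, several are packed into a tuple.
    key = indices[0] if len(indices) == 1 else indices
    return type(obj).__getitem__(obj, key)


class Matrix:
    def __init__(self, rows):
        self.rows = rows

    def __getitem_ex__(self, row, col):  # hypothetical multi-argument hook
        return self.rows[row][col]


print(squares, subscript(Matrix([[1, 2], [3, 4]]), 1, 0), subscript([10, 20, 30], 1))
```

Under such a scheme, classes that never define the new method would keep their current behaviour, which is the point of the fallback.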

ajoino · November 2, 2024, 5:15pm

I don’t understand how that is a huge wart, could you give some examples of when it’s bad?

drhagen · November 3, 2024, 11:07am

Only if NumPy didn’t update its definition of __getitem__ to the new syntax before Python made the from __future__ directive automatic.

I was thinking about a from __future__ directive, but this would work too. The lifecycle is not substantially different, though. Either way, you only have to update the declaration site. At some point, we’ll want to get rid of the old way, and that will be a breaking change on any library that did not update and was expecting multiple arguments.

drhagen · November 3, 2024, 11:44am

The main problem is that multiple arguments to [] violates type stability.

Let’s use NumPy as an example. As everyone knows, in NumPy, you make arrays:

import numpy as np
x = np.array([[1,2,3],[4,5,6],[7,8,9]])

In NumPy, you can index each dimension with a sequence (e.g. [1,2] or (1,2) or np.array([1,2])), which selects along a dimension.

x[[1,2], [1,2]]  # array([5, 9])
x[(1,2), (1,2)]  # array([5, 9])
x[np.array([1,2]),np.array([1,2])]  # array([5, 9])

You can pass a scalar, a sequence, or a slice to any number of dimensions in NumPy and that dimension will drop, select, or slice, respectively. As you might have guessed, there is exactly one case where this is not true: if you pass in a sequence to only the first dimension and that sequence happens to be a tuple. In that case, you get completely unrelated behavior.

x[[1,2]]  # array([[4, 5, 6], [7, 8, 9]])
x[(1,2)]  # np.int64(6)
x[np.array([1,2])]  # array([[4, 5, 6], [7, 8, 9]])

NumPy does this because it can’t tell the difference between x[(1,2)] and x[1,2], so it guesses that the latter was intended. This is documented in a big warning. NumPy would not do this if the Python syntax didn’t force them to do it. That is the most prominent consequence of this wart.
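The underlying ambiguity can be reproduced without NumPy. A minimal sketch, where Probe is a made-up class that simply returns whatever key Python hands to __getitem__:

```python
class Probe:
    """Returns the key exactly as __getitem__ receives it."""

    def __getitem__(self, key):
        return key


p = Probe()

from_commas = p[1, 2]    # two comma-separated indices
from_tuple = p[(1, 2)]   # one explicit tuple
from_list = p[[1, 2]]    # one list

# The first two arrive as the same tuple; only the list stays distinguishable.
print(from_commas == from_tuple, type(from_commas).__name__, type(from_list).__name__)
```

Any class that wants multi-dimensional indexing therefore has to decide what a lone tuple means, since it cannot see which form the caller wrote.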

jamestwebber · November 3, 2024, 4:36pm

Sure, it’s possible to change the behavior over the course of many, many years. That’s what is required for a big breaking change, which is how I described it originally. No matter how much notice there is, people end up caught by surprise, and code that used to work stops working [1]. It’s an unavoidable unfortunate fact of life for a language this big and old.

Maybe it actually is worth doing, and I can certainly see the appeal, but I personally don’t find it compelling enough to want to embark on the 6+ year process.

[1] One relevant example: it’s currently possible to create a tuple elsewhere and then use it as an index, e.g. [my_array[t] for t in zip(some_x, some_y)]. Maybe the new version would need to support my_array[*t] for this case.
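The same pattern is not limited to arrays. As a small illustration with a plain dict (heights, xs, and ys are names made up for this sketch), tuples built elsewhere are routinely used as subscripts, and heights[1, 1] only works because the commas already form a tuple:

```python
# A dict keyed by (x, y) coordinate tuples.
heights = {(0, 0): 1.5, (0, 1): 2.0, (1, 1): 3.25}

xs = [0, 0, 1]
ys = [0, 1, 1]

# Tuples produced by zip() are used directly as keys.
looked_up = [heights[t] for t in zip(xs, ys)]

# Writing the indices with commas builds the same tuple key.
direct = heights[1, 1]

print(looked_up, direct)
```

Any change to how comma-separated subscripts are passed would need to keep both of these lookups working, or offer a migration path for them.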

zhangyx · November 6, 2024, 10:36pm

deleted (well, pretending to be deleted)

Throwing in one more why can't:

# Instead of this
x: list[str] | set[str]
# Why not this?
x: (list | set)[str]

i.e.

Union[A, B][T] = Union[A[T], B[T]]

Seems like a low-hanging fruit.
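For comparison, a sketch of the closest spelling that already works today, assuming Python 3.9 or later: the type parameter has to appear explicitly in every member of the union, and the resulting alias can then be subscripted. ListOrSet is a name made up for this example.

```python
from typing import TypeVar, Union

T = TypeVar("T")

# A generic alias: T is written out in each member of the union.
ListOrSet = Union[list[T], set[T]]

# Subscripting the alias substitutes T in every member.
specialized = ListOrSet[str]
spelled_out = Union[list[str], set[str]]

print(specialized == spelled_out)
```

The idea above would amount to distributing the subscript over members that do not mention a type variable at all, which is a different operation from this substitution.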

MegaIng · November 6, 2024, 11:54pm

Is this a proposal that has been discussed multiple times before? Do you have any links? This is the first time I am seeing it. If you don’t have previous discussions to link to, open a new thread; this is not the correct place to discuss novel ideas.

zhangyx · November 7, 2024, 12:09am

Deleted

Well I think it should either end up as a proposal or in the pool of "why not"s.

I am not sure if there exists any technical challenge that makes it infeasible. That’s why I posted it here.

You’re correct. I’ve moved it into a new thread.

MegaIng · November 7, 2024, 12:13am

Sure, there are some technical challenges. But again, this thread is the wrong thread to discuss it, so please open up a new one if you want this to actually be discussed.

ntessore · November 7, 2024, 9:29am

Thanks, this is a great resource. If I may offer a suggestion: a table of contents at the top could help readers discover that there is a “Themes” section at the end.