Allow zero arguments for subscript syntax

Nineteendo · May 11, 2024, 3:36pm

OK, a bit of a rabbithole:

expression_list        ::= expression ("," expression)* [","]
expression             ::= conditional_expression | lambda_expr
conditional_expression ::= or_test ["if" or_test "else" expression]
or_test                ::= and_test | or_test "or" and_test
and_test               ::= not_test | and_test "and" not_test
not_test               ::= comparison | "not" not_test
comparison             ::= or_expr (comp_operator or_expr)*
or_expr                ::= xor_expr | or_expr "|" xor_expr
xor_expr               ::= and_expr | xor_expr "^" and_expr
and_expr               ::= shift_expr | and_expr "&" shift_expr
shift_expr             ::= a_expr | shift_expr ("<<" | ">>") a_expr
a_expr                 ::= m_expr | a_expr "+" m_expr | a_expr "-" m_expr
m_expr                 ::= u_expr | m_expr "*" u_expr | m_expr "@" m_expr |
                           m_expr "//" u_expr | m_expr "/" u_expr |
                           m_expr "%" u_expr
u_expr                 ::= power | "-" u_expr | "+" u_expr | "~" u_expr
power                  ::= (await_expr | primary) ["**" u_expr]
primary                ::= atom | attributeref | subscription | slicing | call

pf_moore · May 11, 2024, 3:44pm

No, we don’t. It’s a subscription expression which has its own syntax and semantics.

Part of that semantics is that a special method is called, with the components of the expression being passed to that special method in a way that’s defined by the language.

You’ve completely misunderstood here. The expression list in a subscript expression is collected into a tuple to pass to the special method, but there’s no duplication of code.
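To make the point about the tuple concrete, here is a minimal sketch. KeyRecorder is a hypothetical class made up for this example; it just returns whatever key the subscription machinery hands to __getitem__.

```python
class KeyRecorder:
    """Hypothetical helper: returns the key it was subscripted with."""

    def __getitem__(self, key):
        return key


rec = KeyRecorder()

single = rec[1]        # one expression: passed through unchanged
pair = rec[1, 2]       # expression list: collected into the tuple (1, 2)
wrapped = rec[(1, 2)]  # explicit tuple: the same key as rec[1, 2]
empty = rec[()]        # empty tuple: still exactly one argument

print(single, pair, wrapped, empty)
```

In every case __getitem__ receives exactly one key; the commas only decide whether that key is a tuple.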

No. The special function is defined as requiring one mandatory argument. You want to call it with no arguments in at least one case, namely the __class_getitem__ of tuple. So the definition is now one optional argument. Basically every existing implementation of __getitem__ and __class_getitem__ would suddenly be broken. And books and training materials would need to be changed, etc, etc.

Well, your implementation of Foo.__class_getitem__ is invalid according to the language spec. And presumably type checkers would therefore reject it. If the signature was changed to allow one optional key as an argument, def __class_getitem__(cls, key), which is valid now, would become invalid (because the key argument is not optional).

You cannot break user code to that extent, without a very good justification. So far the only justification I’ve seen is “it’s confusing that you can’t omit the parentheses in tuple[()]”.

Also, you were unhappy about a generic SyntaxError (and suggested that the message needs improving). Why are you OK with a generic TypeError? Surely the same logic applies there?

Rosuav · May 11, 2024, 3:47pm

Please can you stop going back and making substantive edits to your posts? It makes the thread VERY hard to follow.

Nineteendo · May 11, 2024, 3:49pm

I accidentally posted it, you can’t delete messages here.

Rosuav · May 11, 2024, 3:52pm

You can post a followup though.

Nineteendo · May 11, 2024, 5:29pm

I see what you mean now, thanks for your patience with me.

I’m talking about this: typing — Support for type hints — Python 3.12.3 documentation
From the way they’re presented, it looks like they accept any number of arguments like foo.
It’s not obvious that they actually work with tuples.

tuple[(int, int, int)] == tuple[int, int, int]
tuple[(int, int)]      == tuple[int, int]
tuple[(int,)]          == tuple[int]  # Single argument is put in a tuple
tuple[()]               # tuple[]

foo = lambda *x: x
foo((int, int, int)) != foo(int, int, int)
foo((int, int))      != foo(int, int)
foo((int,))          != foo(int)
foo(())              != foo()

Nineteendo · May 11, 2024, 6:05pm

Could we do a poll? Even if it’s not sensible for every type, it could still be useful. Feel free to vote for syntax error, just note that it needs >=50% of the votes. Otherwise it has an advantage over the other options. See it as sentinel or no sentinel.

jamestwebber · May 11, 2024, 6:15pm

Polls are not useful unless there is consensus about what the question should be. Right now it’s not clear that there’s even a problem to solve.

Nineteendo · May 11, 2024, 6:19pm

As @pf_moore pointed out, modifying the signature of __getitem__ is too disruptive.
The only reasonable question is if it could be called with a sentinel value.
If you don’t think this is useful, vote for “Syntax Error”, otherwise vote for the sentinel value.

mdickinson · May 11, 2024, 6:19pm

I’m not sure I follow this logic. My understanding of @Nineteendo’s suggestion is that given a[], the general machinery would call type(a).__getitem__(a) (or a.__class_getitem__() if a is a class). Existing Python-coded __getitem__ methods would then raise a TypeError with a message along the lines of "__getitem__() missing 1 required positional argument: 'key'", which doesn’t seem unreasonable. Types could opt in to supporting the [] syntax on a type-by-type basis if it was useful for them; existing __getitem__ methods wouldn’t be required to change.
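A rough sketch of that behaviour, under stated assumptions: a[] is a SyntaxError today, so the hypothetical helper unary_subscript below stands in for what the general machinery might do. Strict and OptIn are made-up classes, and using None as the "no key" default is just one possible opt-in choice.

```python
class Strict:
    """Hypothetical type with an ordinary one-argument __getitem__."""

    def __getitem__(self, key):
        return key


class OptIn:
    """Hypothetical type that opts in by making the key optional."""

    def __getitem__(self, key=None):
        return "no key" if key is None else key


def unary_subscript(obj):
    # Stand-in for a[] under the suggestion: call __getitem__ with no key.
    return type(obj).__getitem__(obj)


message = None
try:
    unary_subscript(Strict())
except TypeError as exc:
    # Existing implementations fail with the usual missing-argument error.
    message = str(exc)

opted_in = unary_subscript(OptIn())
print(message)
print(opted_in)
```

The existing class is untouched and simply reports a missing argument, while the opt-in class handles the no-key case itself.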

More generally, I don’t think there’s any rule that says that any given dunder method should always have the same signature across all of its implementations, and though I’ll grant it’s not common, there’s at least one precedent for allowing different signatures: if you’re writing a custom class and implementing support for the pow built-in, you’re free to spell your __pow__ method definition as either def __pow__(self, other): ... or def __pow__(self, other, modulo): ..., depending on whether you want support for the 3-argument variant of pow or not.
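For illustration, a small sketch of the two __pow__ spellings. PlainPower and ModularPower are hypothetical wrappers around int, and the modulo=None default is an assumption added here so that the ** operator keeps working on the three-argument version.

```python
class PlainPower:
    """Hypothetical wrapper supporting only two-argument pow()."""

    def __init__(self, value):
        self.value = value

    def __pow__(self, other):
        return PlainPower(self.value ** other)


class ModularPower:
    """Hypothetical wrapper that also supports three-argument pow()."""

    def __init__(self, value):
        self.value = value

    def __pow__(self, other, modulo=None):
        # The built-in pow() on ints accepts None as "no modulus".
        return ModularPower(pow(self.value, other, modulo))


squared = PlainPower(3) ** 2              # calls __pow__(2)
reduced = pow(ModularPower(2), 10, 1000)  # calls __pow__(10, 1000)

try:
    pow(PlainPower(2), 10, 1000)          # this __pow__ takes no modulus
    three_arg_supported = True
except TypeError:
    three_arg_supported = False

print(squared.value, reduced.value, three_arg_supported)
```

Both classes are valid; the one without a modulus parameter just fails with a TypeError when the three-argument form is used.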

It’s a bit messier for C extensions and builtins written in C, since calling the existing mp_subscript slot with a key of NULL would likely just segfault in at least some cases, so you’d probably need a new mp_unary_subscript slot to support the [] syntax (and again, if that slot didn’t exist, the general machinery would raise TypeError). But it seems feasible.

Please don’t take this as endorsement of the suggestion, and I don’t want to understate the amount of work that would be involved (which is huge, likely involving new bytecodes, and possibly an extra slot to allow C extensions that want to opt in to support unary subscript, documentation, third party tools, and much much more), but I think if we really wanted to do this, it could be done in a way that doesn’t break existing __getitem__ implementations.

jamestwebber · May 11, 2024, 6:21pm

Okay, I’ll vote (until you reset the poll yet again). But this poll is completely useless.

pf_moore · May 11, 2024, 6:26pm

I have my issues with the way Python’s typing syntax is implemented, but the constraints the developers had to work with in order to force the type definition mini-language into Python’s syntax are almost certainly a large part of why things are the way they are.

Specifically, “generic type” syntax is something that exists in many languages these days, and the GenericType[param1, param2] syntax (often with <...> brackets rather than [...], admittedly) is familiar to many people. So implementing it using Python’s subscripting mechanism is an obvious approach. Yes, it looks like multiple parameters while actually being a tuple, but that’s a relatively minor inconvenience to get a syntax that users will be familiar with.

Having said that, I’m pretty sure most languages with generic types don’t allow 0-arg generics (after all, what would it even mean? Without args, it’s not actually generic…). So not allowing tuple[] is perfectly reasonable in that context.

And that brings us back to the crucial question - what is the actual use case for this proposal? I can’t think of any reason to define a function argument as having to be a 0-element tuple - there is only one such value, so such an argument is pointless. The examples from your search don’t help - I only looked at a few, but the types involved look quite contrived - as far as I can see, they seem to be along the lines of “pass a value of my sequence-like type, or () if you don’t have anything to pass”. For that, tuple[()] is no harder to understand than tuple[], and in fact I’d argue that a much better replacement would be Literal[()].

pf_moore · May 11, 2024, 6:31pm

Agreed. I was being a bit too free with the term “break”. In my defense, mostly because I didn’t want my post to be any longer than it already was - I struggle to be concise at the best of times.

Nineteendo · May 11, 2024, 7:20pm

Thanks, I couldn’t come up with an example on my own, so I thought it didn’t exist.

That’s fine, at least it allows each class to define the most sensible default. Which gives this a better chance of being accepted. Therefore I vote against calling it with a sentinel value. Which makes that poll pointless. ~great~

Do other languages with generics define a tuple in the same way Python does? Like Tuple<int, int>?

No, but in that case it’s used in conjunction with Union or Optional and in return annotations.

I have also used it like this:

__slots__: ClassVar[tuple[()]] = ()

Nineteendo · May 11, 2024, 7:24pm

I would love to help with that, but I’m simply not qualified to do that. My C knowledge is very limited, and the C-API of Python even more.

As for people that would support this: I’m looking at codegolf…

NeilGirdhar · May 11, 2024, 7:39pm

A few things:

The Python docs are extensive and far from concise, so I can see why it would be hard to find things. However, you can find what you’re looking for here.

My personal guess is that if we were designing things from scratch today, we would have made it so that subscription calls __getitem__ with one parameter for each comma-separated element.

Then, you would have what you want since:

x[] would call x.__getitem__(),
it would also fix the main objection to your desire to have tuples being used to indicate tuple type annotations since subscripting by a tuple (x[(a, b)] calls x.__getitem__((a, b))) would be distinguishable from subscripting by multiple items (x[a, b] calls x.__getitem__(a, b)), and
the implementation of __getitem__ would generally be simpler since it would start with len(args) rather than isinstance(key, tuple) and then checking the length.
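A sketch of the difference in implementation style, assuming a made-up Grid class. The method _lookup is a hypothetical stand-in for a per-argument protocol (Python does not call __getitem__ this way); __getitem__ shows the normalisation step that today's single-key protocol needs first.

```python
class Grid:
    """Hypothetical 2-D container used only for illustration."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        # Today's protocol: one key, which may or may not be a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        return self._lookup(*key)

    def _lookup(self, *args):
        # Hypothetical per-argument style: dispatch on len(args) directly.
        if len(args) == 0:
            return self.rows
        if len(args) == 1:
            return self.rows[args[0]]
        if len(args) == 2:
            row, col = args
            return self.rows[row][col]
        raise TypeError(f"expected at most 2 indices, got {len(args)}")


grid = Grid([[1, 2], [3, 4]])
print(grid[1])     # one index
print(grid[1, 0])  # two indices
print(grid[()])    # today's spelling of "no indices"
```

Note that with the current protocol a container whose elements are themselves looked up by tuple could not tell grid[(1, 0)] from grid[1, 0]; both arrive as the same key.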

As an idealist, that’s the Python that might be worth working for, but the present-day cost is extremely high, so my guess is that you’d need more than this to warrant such a change.

Nineteendo · May 11, 2024, 7:54pm

Thanks, but @pf_moore already linked to that. I mean that typing doesn’t explain this (because it would confuse people even more). I don’t see tuple[...] as a subscript, I see it simply as a type hint.

I also thought about simply treating a[(b, c)] as a[(b, c),], but that’s not possible either as people already use the redundant parentheses.

I don’t think there’s a real downside of allowing X[] for classes that define it. But the main problem is that someone must pour the time and effort into implementing this. And sadly I won’t be able to do this in the near future.

MegaIng · May 11, 2024, 8:50pm

I would go a step further and guess that even keyword arguments would be supported à la PEP 637, essentially meaning we have two call syntaxes. Sadly there isn’t even really a backwards compatible way to add a new dunder because of a[x] vs a[x,], which would be indistinguishable with the new dunder, meaning we would have to keep both syntaxes forever (or we add a really awkward way for the new syntax to notice the difference).
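A short sketch of why a[x] vs a[x,] is the sticking point. Probe is a hypothetical class; its __call__ stands in for how a call-like dunder would see its arguments.

```python
class Probe:
    """Hypothetical class exposing both protocols for comparison."""

    def __getitem__(self, key):
        return key

    def __call__(self, *args):
        return args


p = Probe()

sub_plain = p[1]      # key is the int 1
sub_comma = p[1,]     # key is the tuple (1,)
call_plain = p(1)     # args is (1,)
call_comma = p(1,)    # args is (1,) as well: the trailing comma is ignored

print(sub_plain, sub_comma, call_plain, call_comma)
```

Subscription keeps the two spellings apart, while call-style argument passing collapses them, so a call-like dunder alone would lose information that existing code relies on.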

But anyway, I would support a proposal to fully add call-like syntax to subscripting, probably via a new dunder, but only allowing a[] seems too much effort for very little benefit. We also don’t allow a = as a statement assigning the empty tuple.

Nineteendo · May 11, 2024, 8:54pm

We could use a{...} for an alternative call. (It even looks a bit like parentheses).
I wouldn’t use this to replace the current type annotations, though.

Daverball · May 11, 2024, 8:56pm

I think you are operating under a fundamental misunderstanding. We already have a weird special case built into Python’s literal tuple syntax, and it’s exactly the one you are having trouble reconciling with the subscript syntax: ().

With any other literal tuple, the parentheses are not part of the tuple literal and are in many cases completely optional; what makes it a tuple is the ,. The parentheses are only necessary to resolve precedence issues when nesting literal tuples in other expressions.

So I don’t think tuple[()] is any more weird than () itself. The only reason we have (), is because we can’t spell an empty tuple with ,.
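A quick sketch of that rule in plain Python (the variable names are just for this example):

```python
pair = 1, 2           # the comma makes the tuple; no parentheses needed
grouped = (1)         # parentheses alone only group: this is the int 1
one_item = 1,         # a trailing comma gives a one-element tuple
empty = ()            # the special case: an empty tuple has no comma to write
nested = (1, 2), 3    # parentheses needed only to nest one tuple in another

print(pair, grouped, one_item, empty, nested)
```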

Specifically the single element case you can spell a different way that is actually equivalent to your parenthesized example, i.e. passes a tuple to __getitem__

tuple[(int,)] == tuple[int,]

The type system is pragmatic here and allows you to omit the , for single parameter generics, so it more closely mirrors how generics look in other languages, but fundamentally __getitem__ is different from __call__, you get exactly what the expression you pass in evaluates to, there’s no special parsing like with function arguments.

I understand that singularities in a language can be irritating, but complete and total consistency is really not important enough to value it over all other design considerations, it’s enough if it’s mostly consistent and the few special cases are well documented.