Negative zero (-0) as a slice index

:thinking: a[End + 1:] would be the same as a[len(a) + 1:]? Sounds good.

The only problem I see is with typing. A static checker won’t be able to tell, by type alone, whether the new slice values are supported.

Hmm, maybe it’s possible to silently upgrade old Python code? slice.indices could be changed, but End - 0 would need to be less than 0. And then it’s just a negative 0. Not to mention any old C code…
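For reference, here is a small sketch of what slice.indices does today; it only shows current behaviour, not the proposed change. The length 5 is an arbitrary example value.

```python
# slice.indices(length) resolves start/stop/step against a concrete length.
print(slice(-2, None).indices(5))   # (3, 5, 1)

# -0 is just the int 0, so it resolves to the start, not the end.
print(slice(-0, None).indices(5))   # (0, 5, 1)
print(-0 == 0)                      # True
```

Any new "offset from the end" value would have to survive this resolution step without collapsing into a plain 0.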

Maybe an additional type would help, but that’s just ugly. And what to do with ABCs? Idk. The complexity just skyrockets when I think about it.

My prototype is meant as an illustration of the logic of what can be implemented in C, with which we will need just a LOAD_GLOBAL bytecode to load the sentinel.
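The prototype itself isn’t reproduced in this thread. Purely as a hypothetical sketch of how such a sentinel could behave in pure Python: the names _FromEnd, End, and tail below are invented for illustration and are not an existing API or the poster’s code.

```python
class _FromEnd:
    """Hypothetical marker meaning 'len(seq) + offset'; not an existing Python API."""

    def __init__(self, offset=0):
        self.offset = offset

    def __add__(self, n):
        return _FromEnd(self.offset + n)

    def __sub__(self, n):
        return _FromEnd(self.offset - n)

    def resolve(self, length):
        # Clamp into [0, length], as slicing does for out-of-range bounds.
        return max(0, min(length, length + self.offset))


End = _FromEnd()


def tail(seq, start):
    """Return seq[start:], where start may be an int or a _FromEnd marker."""
    if isinstance(start, _FromEnd):
        start = start.resolve(len(seq))
    return seq[start:]


a = [10, 20, 30, 40]
print(tail(a, End - 2))  # [30, 40]
print(tail(a, End - 0))  # []
print(tail(a, End + 1))  # []
```

A real implementation would need the resolution to happen inside slicing itself rather than in a helper function like tail.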

IMO this is an important point. Cost is essentially the argument in the original post:

That “quite a few bytecodes” is three: LOAD_GLOBAL (len), LOAD_FAST, and CALL. The cost of the actual call, to calculate the length, is needed anyway - and calculating the length of a sequence in Python is typically pretty cheap, as the length is maintained in the object (it’s not like C strings, where a length calculation is O(n)).
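If you want to check the bytecode yourself, something like this works. The function last_two is a made-up example, and opcode names and counts differ between CPython versions, so I’m not showing output.

```python
import dis


def last_two(tokens):
    # Hypothetical example function using the len()-based spelling.
    return tokens[len(tokens) - 2:]


# Print each instruction; look for the global load of len, the local
# load of tokens, and the call instruction.
for instr in dis.get_instructions(last_two):
    print(instr.opname, instr.argrepr)
```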

Do you have any actual timings that indicate that the cost of len(tokens) is really the important limit on performance in real-world code, or is this purely a micro-optimisation? In which case, a language change seems pretty extravagant for such a small benefit… Furthermore, the JIT might well be able to detect that len refers to the builtin, and eliminate the overhead with essentially no language change needed. That would probably be a better approach to take.
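A rough way to get such timings, as a sketch only: the list size, n, and the repeat count below are arbitrary assumptions, and a micro-benchmark like this says nothing about real-world code.

```python
import timeit

tokens = list(range(1000))
n = 10
env = {"tokens": tokens, "n": n}

# Time the len()-based spelling against plain negative indexing.
t_len = timeit.timeit("tokens[len(tokens) - n:]", globals=env, number=200_000)
t_neg = timeit.timeit("tokens[-n:]", globals=env, number=200_000)
print(f"len-based: {t_len:.4f}s  negative index: {t_neg:.4f}s")
```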

The other point that has been made is that it’s harder to ask for the last n items with a[-n : ].

This has the edge case that when n == 0, it returns all of the sequence, not an empty sequence.

With an END, that would be a[END - n : ].
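To make the edge case concrete, here is a quick comparison using only today’s syntax (the list is an arbitrary example):

```python
a = [1, 2, 3, 4, 5]

for n in (2, 1, 0):
    print(n, a[-n:], a[len(a) - n:])
# 2 [4, 5] [4, 5]
# 1 [5] [5]
# 0 [1, 2, 3, 4, 5] []
```

The two spellings agree until n reaches 0, where a[-0:] is just a[0:].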

But it could be a[len(a) - n :], right?

If the value is in a variable, yes.

But how about f(x)[-n : ]?

You need to store the value in a variable in order to get its length and get the last n items correctly.

True. This does feel like it’s getting into niche territory, though–I can’t say I encounter a use case for such a thing[1] all that often.


  1. “I need the last n elements from a sequence of length m, and I don’t know either value ahead of time” ↩︎

I have run into problems with this edge case and I have also seen situations where an index variable unexpectedly had the wrong sign and negative indexing was not helpful.

The situation that I have never encountered is having an integer variable n that may or may not be negative and where x[n] or x[:n] etc actually does something useful. I may not know ahead of time what len(x) or n are but I always know whether I want to index from the start or the end and would use explicit syntax for reverse indexing if it existed.

I suppose the way to do that now would be

(a := f(x))[len(a) - n:]

God is that ugly.

f(x)[<n:] or similar would be much clearer.

I know I’ve run across this periodically, and found it an annoyance, though I can’t immediately bring a specific case to mind. I suspect I’d find uses for the syntax.

C# has a similar syntax, except that it uses ^ instead of < (and also .. instead of :).

I think ergonomics is more of an issue here than performance. The trap of a[-n] when n=0 is probably the one I have fallen into most often…

That one rarely, if ever, does “forward indexing” at the same time as “reverse indexing” is a good observation. What if reversed(lst) allowed transparent indexing and assignment?
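As a hypothetical sketch of what that could look like: ReversedView below is an invented class, not what reversed() returns today, and it only handles non-negative integer indices.

```python
class ReversedView:
    """Hypothetical mutable view of a list in reverse order."""

    def __init__(self, seq):
        self._seq = seq

    def __len__(self):
        return len(self._seq)

    def _real(self, i):
        # Map a view index to the underlying index; no negative indices here.
        if not 0 <= i < len(self._seq):
            raise IndexError("reversed view index out of range")
        return len(self._seq) - 1 - i

    def __getitem__(self, i):
        return self._seq[self._real(i)]

    def __setitem__(self, i, value):
        self._seq[self._real(i)] = value


lst = ["a", "b", "c"]
view = ReversedView(lst)
print(view[0])   # c
view[0] = "z"
print(lst)       # ['a', 'b', 'z']
```

With a view like this, index 0 unambiguously means the last item, so there is no sign to get wrong.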

Why does it need to be on one line?

names = f(x)
last_names = names[len(names) - n:]

is perfectly understandable.
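If this comes up often, the two lines can also be wrapped in a small helper. This is just a sketch; last_n is a made-up name, and the clamp for n larger than the sequence is my own addition.

```python
def last_n(seq, n):
    """Return the last n items of seq; an empty slice when n == 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Clamp at 0 so that n > len(seq) returns everything instead of
    # wrapping around to a negative start index.
    return seq[max(len(seq) - n, 0):]


print(last_n("a b c d".split(), 2))  # ['c', 'd']
print(last_n("a b c d".split(), 0))  # []
print(last_n("a b c d".split(), 9))  # ['a', 'b', 'c', 'd']
```

This also covers the f(x)[...] case, since the function call result is bound to the parameter seq.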

100%, this suggests that the slicing is not ideal. (We can dream of better systems that will never make it into Python.)

Just use a[-n or None:].

That doesn’t work. You’re confusing it with a[:-n or None].

Ah, right. But a[-n or sys.maxsize:] should work.
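A quick check of the three spellings with n = 0 (the list is an arbitrary example):

```python
import sys

a = [1, 2, 3, 4, 5]
n = 0

print(a[-n or None:])         # [1, 2, 3, 4, 5]  start=None means "from the beginning"
print(a[:-n or None])         # [1, 2, 3, 4, 5]  stop=None means "to the end" (drop last 0)
print(a[-n or sys.maxsize:])  # []               start past the end gives the last 0 items

n = 2
print(a[-n or sys.maxsize:])  # [4, 5]
```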

Yeah the performance argument in my OP is rather silly in retrospect.

As others have pointed out, the goal I should have articulated is really to make it easier to express a slice with an index starting from the end where the offset may be zero.

Yeah that is certainly nicer than @Stefan2’s a[2**62:], though the proposal as discussed has now evolved to include a generalization of a slice index with a calculated offset from the end with the possibility of the offset being 0, in which case sys.maxsize doesn’t help.

I will update my OP accordingly in a moment.

Perhaps obvious, but just in case: if you update the OP, just do it as a separate “update” section, but leave the original intact! Otherwise it will be very hard to follow the discussion.

As James noted, currently -0 == 0, so this would need to change for negative zero to work as proposed. If the proposal is about introducing -0 as a sentinel, using descriptive sentinels like First and Last would be more user-friendly and backward-compatible.

On one hand, negative zero exists as float but not as int, but indices must be int…
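A short illustration of that difference, using only the standard library:

```python
import math
import operator

print(math.copysign(1, -0.0))  # -1.0: the sign of a float zero is observable
print(-0.0 == 0.0)             # True
print(repr(-0))                # 0: the int has no sign to preserve

try:
    operator.index(-0.0)       # indices must support __index__; floats do not
except TypeError as exc:
    print("TypeError:", exc)
```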

Also, we already have “None” to mean exactly that: depending on the sign of the stride/step, Python understands that “None” as start or stop refers to the end. It’s certainly much less cumbersome than 2**64 or other proposals I saw in the comments…

I’m not sure what you’re getting at. seq[None:] is equivalent to seq[0:] which doesn’t help with the use-case described above.
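To spell that out with an arbitrary example list:

```python
seq = [1, 2, 3, 4, 5]

print(seq[None:] == seq[0:])   # True
print(seq[:None] == seq)       # True: as a stop, None means "to the end"
print(seq[None::-1])           # [5, 4, 3, 2, 1]: with a negative step, start=None is the last item

n = 0
print(seq[-n:])                # [1, 2, 3, 4, 5], not the empty "last 0 items"
```

None only picks the default bound for the given step; it cannot express "n items back from the end" when n may be 0.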
