Negative zero (-0) as a slice index

blhsing · March 5, 2025, 2:00am

While using a negative number as a slice index from the end of a sequence is quite a stroke of genius, it hits its limit when one tries to represent a slice beyond the end of a sequence.

In the Stack Overflow question “Python multiple assignment: want to (tokens.append(),tokens.append(), remaining) = remaining.partition(blank)”, the OP asks whether there is a clean way to extend the list tokens with the first two items of the tuple returned by remaining.partition while assigning the rest back to remaining. One of the answers makes good use of a slice beyond the end of tokens:

*tokens[len(tokens):], remaining = remaining.partition(' ')
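
As a quick check of what that one-liner does, the sketch below runs it on sample data (the tokens and remaining values are made up for illustration). The starred target collects the first two items of the 3-tuple returned by str.partition and assigns them to an empty slice at the end of tokens, which extends the list in place.

```python
tokens = ['x']
remaining = 'a b c'

# partition(' ') returns ('a', ' ', 'b c'). The starred slice target
# receives ['a', ' '] and the last item goes back into `remaining`.
*tokens[len(tokens):], remaining = remaining.partition(' ')

print(tokens)     # ['x', 'a', ' ']
print(remaining)  # b c
```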

But the call to len(tokens) (which involves quite a few bytecodes) makes me wonder if we can simply support -0 as a slice index to fully generalize the idea of using a negative number as a slice index from the end of a sequence, so the above can be rewritten as:

*tokens[-0:], remaining = remaining.partition(' ')

This is technically possible because struct _PyLongValue already supports a negative 0 through the lower 2 bits of lv_tag, so by allowing -0 to result in an object different from (but still equal to) 0 we can support this generalization of a negative slice index.

The downside to this idea is that -0 is 0 will no longer be true, but since using the is operator with an integer literal already produces a SyntaxWarning: "is" with 'int' literal. Did you mean "=="?, no serious project should be affected by this change.

Thoughts?

facundo · March 5, 2025, 2:25am

Saying “there shouldn’t be any serious project affected by this change” about a language as popular around the world as Python implies that you will be breaking a lot of code.

In other words, if you want to pursue your proposal, you need to answer: is this change valuable enough to pay the cost of breaking a lot of code?

jamestwebber · March 5, 2025, 2:25am

Isn’t this backwards incompatible? One can imagine that existing code might be relying on seq[-0:] being equivalent to seq[0:].
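
A minimal illustration of the current behavior that such code could be relying on (the sample list is arbitrary): integer negation of 0 is just 0, so seq[-0:] is the whole sequence, not the empty tail.

```python
seq = [1, 2, 3]
zero = -0  # integer negation of 0 is simply 0

print(zero == 0)            # True
print(seq[-0:] == seq[0:])  # True: both are the whole list
print(seq[-0:])             # [1, 2, 3]
print(seq[len(seq):])       # []: the empty tail the proposal wants -0 to mean
```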

blhsing · March 5, 2025, 2:27am

Yeah, I just thought of that. Scratch the idea then. Thanks for the feedback!

Stefan2 · March 5, 2025, 9:47pm

Use 2**64, that’s just one LOAD_CONST

Edit: Better use 2**62 (or 5**27, which is larger and still as fast and as short):

134 ns  a[len(a):] = 1,
277 ns  a[2**64:] = 1,
 91 ns  a[2**62:] = 1,
 91 ns  a[2**20:] = 1,

134 ns  a[len(a):] = 1,
284 ns  a[2**64:] = 1,
 95 ns  a[2**62:] = 1,
 84 ns  a[2**20:] = 1,

131 ns  a[len(a):] = 1,
276 ns  a[2**64:] = 1,
 93 ns  a[2**62:] = 1,
 89 ns  a[2**20:] = 1,

from timeit import repeat

for _ in range(3):
    for c in [
        'a[len(a):] = 1,',
        'a[2**64:] = 1,',
        'a[2**62:] = 1,',
        'a[2**20:] = 1,',
    ]:
        t = min(repeat(c, 'a=[]'))
        print(f'{t*1e3:3.0f} ns ', c)
    print()
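
The trick works because slice bounds past the end of a list are clamped to its length. A small sketch of that behavior, separate from the timing script above (the sample list is arbitrary, and sys.maxsize is shown only as another large constant):

```python
import sys

a = [1, 2]
a[2**62:] = 3,        # start is clamped to len(a), so this appends
a[sys.maxsize:] = 4,  # same effect with the largest Py_ssize_t value
a[2**64:] = 5,        # values beyond sys.maxsize are clamped as well

print(a)          # [1, 2, 3, 4, 5]
print(a[2**62:])  # []
print(slice(2**62, None).indices(len(a)))  # (5, 5, 1)
```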

ntessore · March 6, 2025, 4:56pm

It could be any other sentinel value, such as collections.END or similar.

blhsing · March 7, 2025, 2:48am

Cool idea. It makes the code a bit unreadable and confusing though, and I imagine 2**62 would be slow on a 32-bit platform.

blhsing · March 7, 2025, 2:56am

Good idea too. Having to import a library just to use a sentinel for a core syntax feels a tiny little bit inconvenient though.

I’m wondering if we can use ... as a sentinel instead. Feels intuitive enough to read like “beyond the end” to me:

a[...:] = 1,

MRAB · March 7, 2025, 3:34am

Instead of a sentinel that stands only for the end, what about one that supported offsetting too?

In other words, you could add and subtract an int:

a[END - 2 : ]

It would be converted to an actual index when the length was known.

If it supported, say, @, then len(a) @ END == len(a).

blhsing · March 7, 2025, 3:47am

Matthew Barnett:

Instead of a sentinel that stands only for the end, what about one that supported offsetting too?

In other words, you could add and subtract an int:
a[END - 2 : ]
It would be converted to an actual index when the length was known.

Yeah I was just thinking about this too. Indices are often calculated rather than fixed numbers, and currently we have to write a[len(a) - offset:] instead of a[-offset:] whenever there’s the possibility that offset can be zero. And if a is really a complex expression, we have to store it as a variable before we can use len(a) as an index of a. This feature does make a new sentinel much more useful and compelling.
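
To make the zero-offset pitfall concrete, here is a small sketch comparing the two spellings (the helper names tail_negative and tail_explicit are made up for this example):

```python
a = [1, 2, 3, 4, 5]

def tail_negative(seq, offset):
    # Relies on negative indexing; -0 is 0, so offset == 0 selects everything.
    return seq[-offset:]

def tail_explicit(seq, offset):
    # Computes the start from the length, so offset == 0 selects nothing.
    return seq[len(seq) - offset:]

print(tail_negative(a, 2))  # [4, 5]
print(tail_explicit(a, 2))  # [4, 5]
print(tail_negative(a, 0))  # [1, 2, 3, 4, 5]
print(tail_explicit(a, 0))  # []
```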

Not sure what you are trying to say here but I know that the Python community prefers to avoid repurposing symbols in general.

gcewing · March 7, 2025, 3:59am

If we’re making negative zeroes exist, we’d need to think about what other properties they should have. Should the following be possible, for example:

def overlay_ending_at(seq, end_index, value):
    seq[end_index - len(value) : end_index] = value

things = [1, 2, 3, 4, 5, 6, 7]
other_things = [17, 42, 88]
overlay_ending_at(things, -0, other_things)
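
For reference, this is what the function does today, where -0 is plain 0: the slice becomes seq[-3:0], whose stop lies before its start, so the values are inserted at the start position instead of overlaying the tail. Passing len(things) explicitly gives the result a negative zero would presumably be expected to produce.

```python
def overlay_ending_at(seq, end_index, value):
    seq[end_index - len(value) : end_index] = value

other_things = [17, 42, 88]

# Current behavior with -0: equivalent to things[-3:0] = other_things,
# which inserts at index 4 without replacing anything.
things = [1, 2, 3, 4, 5, 6, 7]
overlay_ending_at(things, -0, other_things)
print(things)  # [1, 2, 3, 4, 17, 42, 88, 5, 6, 7]

# With the end spelled out as len(things), the last three items are replaced.
things = [1, 2, 3, 4, 5, 6, 7]
overlay_ending_at(things, len(things), other_things)
print(things)  # [1, 2, 3, 4, 17, 42, 88]
```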

jamestwebber · March 7, 2025, 5:42am

I think it’s important not to conflate the benefits of two different things here: there’s sugar for “the end of a sequence” and then there’s how that gets implemented. A sentinel probably doesn’t save any of the work from happening, but it might make it more succinct to type. Whether the brevity is worth the complexity is debatable, and might benefit from a prototype.

blhsing · March 7, 2025, 6:30am

I don’t think the benefits are necessarily of two different things, but more of a generalization of an offset from the end of a sequence where the offset can be zero.

Implementation-wise, the new sentinel does not need to be a singleton but an object that can save the state of an offset. Something minimal like:

class OffsetFromEnd:
    def __init__(self, value):
        self.value = value

    def __sub__(self, offset):
        return OffsetFromEnd(self.value - offset)

END = OffsetFromEnd(0)

class NewList(list):
    def _normalize_index(self, index):
        # TODO: same treatment for index.stop
        if isinstance(index, slice) and isinstance(index.start, OffsetFromEnd):
            index = slice(index.start.value + len(self), index.stop, index.step)
        return index

    def __getitem__(self, index):
        return super().__getitem__(self._normalize_index(index))

    def __setitem__(self, index, value):
        super().__setitem__(self._normalize_index(index), value)

a = NewList([1, 2])
a[END - 1:] = 3, 4
print(a) # [1, 3, 4]
a[END:] = 5, 6
print(a) # [1, 3, 4, 5, 6]
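
One possible way to fill in the TODO is sketched below. It is a standalone variant, not the code above: the class name EndAwareList, the __add__ method, the __delitem__ override, and the clamping of the resolved position at 0 are all assumptions added for illustration. The clamp keeps an offset larger than the length from turning into an ordinary negative index.

```python
class OffsetFromEnd:
    """Hypothetical marker for a position counted back from the end."""

    def __init__(self, value=0):
        self.value = value  # offset relative to len(seq), normally <= 0

    def __sub__(self, offset):
        return OffsetFromEnd(self.value - offset)

    def __add__(self, offset):
        return OffsetFromEnd(self.value + offset)


END = OffsetFromEnd(0)


class EndAwareList(list):
    def _resolve(self, bound):
        # Turn an OffsetFromEnd into a concrete index; pass anything else through.
        if isinstance(bound, OffsetFromEnd):
            return max(0, len(self) + bound.value)
        return bound

    def _normalize(self, index):
        # Only slices are handled; a bare a[END - 1] is not supported here.
        if isinstance(index, slice):
            return slice(self._resolve(index.start),
                         self._resolve(index.stop),
                         index.step)
        return index

    def __getitem__(self, index):
        return super().__getitem__(self._normalize(index))

    def __setitem__(self, index, value):
        super().__setitem__(self._normalize(index), value)

    def __delitem__(self, index):
        super().__delitem__(self._normalize(index))


b = EndAwareList([1, 2, 3, 4, 5])
print(b[END - 2:END])      # [4, 5]
print(b[END - 3:END - 1])  # [3, 4]
b[END:] = [6]
print(b)                   # [1, 2, 3, 4, 5, 6]
print(b[END - 10:])        # [1, 2, 3, 4, 5, 6]
del b[END - 1:]
print(b)                   # [1, 2, 3, 4, 5]
```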

xitop · March 7, 2025, 7:58am

Exactly. We can reach the zero from both sides, I guess that’s how the negative zero idea originated.

Perhaps a new option to an explicit slice() call could help:

mylist[ slice(start, None, from_end=True) ]

(I didn’t think much about the option name.)

blhsing · March 7, 2025, 8:22am

Indeed that’s how the idea originated. I guess if negative zero had been part of the slice design since day 1 there wouldn’t have been much if any downside. Too bad that ship has sailed long long ago.

That would technically work and does support mylist being a complex expression too, but it looks rather verbose and still involves a few too many bytecodes. I think a sentinel is still a better option.

EDIT: Also, it would be unclear if from_end applies to start or stop or both, so you’d need two options, start_from_end and stop_from_end, just to be clear.

Rosuav · March 7, 2025, 8:58am

Negative zero, per se, would have to be part of much more than just slices; there’s no integer representation for negative zero, and a floating-point slice index causes more trouble than it’s worth.

Pike actually takes a slightly different approach. You can still use negative slice indices, but if you want to absolutely, definitely specify that you are counting from the rear, you can use [<1..] (Pike uses .. where Python uses : - the meaning is the same). This means that [<0..] is entirely meaningful, and functions as “zero spaces back from the end”, equivalent to a negative zero.

This would be a larger change for Python, as it would require actual syntax, but it may be worth exploring.

blhsing · March 7, 2025, 9:10am

CPython’s PyLongObject does already support a negative zero as pointed out in my OP, but the ship has sailed anyway so let’s not go there anymore.

Yeah a new syntax like a[<0:] would be very cool but the relatively small gain in convenience might not be deemed worthy of the effort required for a syntax change by the devs. A sentinel may still be the path of least resistance.

petersuter · March 7, 2025, 10:57am

You could define something like:

def last(n):
    return slice(-n, None) if n>0 else slice(0)

to use:

>>> a = [1,2,3,4,5]
>>> a[last(0)]   # instead of a[<0:]
[]
>>> a[last(2)]   # instead of a[<2:]
[4, 5]

and avoid requiring new syntax.

blhsing · March 7, 2025, 11:12am

That works only for __getitem__. The problem with list.__setitem__ is that there is currently no slice that can express a start beyond the end without knowing in advance the length of the sequence, so no amount of wrapper over slice will help.
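
A short sketch of the difference: with the helper above, last(0) is slice(0), which reads as an empty list but, when assigned to, inserts at the front. The last_clamped variant is hypothetical and borrows the large-start trick from earlier in the thread, relying on the start being clamped to the length, at the cost of a magic constant.

```python
import sys

def last(n):
    return slice(-n, None) if n > 0 else slice(0)

def last_clamped(n):
    # Hypothetical variant: a huge start is clamped to len(seq).
    return slice(-n, None) if n > 0 else slice(sys.maxsize, None)

a = [1, 2, 3]
print(a[last(0)])  # []
a[last(0)] = [9]   # same as a[:0] = [9]
print(a)           # [9, 1, 2, 3]

b = [1, 2, 3]
print(b[last_clamped(0)])  # []
b[last_clamped(0)] = [9]   # start clamps to len(b), so this appends
print(b)                   # [1, 2, 3, 9]
```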

jamestwebber · March 7, 2025, 4:12pm

My point was mostly that the proposed solutions are not going to avoid any bytecode (in fact they probably add more), but they do make the written code shorter and possibly easier to understand.

I was making that point because in your OP you seemed to dismiss using [len(sequence):] because of the overhead of calling len. But we’re not removing any overhead, as far as I can tell?
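
One rough way to look at the call-site side of this is to count the instructions the compiler emits for each spelling. The sketch below only compiles the expressions, so the hypothetical END name never has to exist, and the exact counts vary between CPython versions. It says nothing about the work done inside the container: a sentinel would still need the length looked up when the slice is resolved.

```python
import dis

def count_instructions(expr):
    # Compile without executing, then count the emitted bytecode instructions.
    code = compile(expr, '<expr>', 'eval')
    return sum(1 for _ in dis.get_instructions(code))

for expr in ('a[len(a):]', 'a[2**62:]', 'a[END:]'):
    print(f'{count_instructions(expr):2d}  {expr}')
```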