While using a negative number as a slice index from the end of a sequence is quite a stroke of genius, it hits its limit when one tries to represent a slice beyond the end of a sequence.
But the call to len(tokens) (which involves quite a few bytecodes) makes me wonder if we can simply support -0 as a slice index to fully generalize the idea of using a negative number as a slice index from the end of a sequence, so the above can be rewritten as:
This is technically possible because struct _PyLongValue already supports a negative 0 through the lower 2 bits of lv_tag, so by allowing -0 to result in an object different from (but still equal to) 0 we can support this generalization of a negative slice index.
The downside to this idea is that -0 is 0 will no longer be true, but since using the is operator on an integer literal currently produces a SyntaxWarning: "is" with 'int' literal. Did you mean "=="?, there shouldn’t be any serious project affected by this change.
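For reference, here is a small sketch of how things behave today, using a made-up tokens list: -0 is indistinguishable from 0, so a -0 start means "from the beginning", and the empty slice at the end has to be spelled with len().

```python
tokens = ["a", "b", "c"]

# Today -0 evaluates to plain 0, so this slice starts at the beginning.
assert -0 == 0
whole = tokens[-0:]

# The empty slice at the end currently needs the length spelled out.
tail = tokens[len(tokens):]

# Assigning to that empty slice extends the list in place.
tokens[len(tokens):] = ["d", "e"]
print(whole, tail, tokens)
```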
The phrase “there shouldn’t be any serious project affected by this change”, said about a language as widely used as Python, implies that you will be breaking a lot of code.
In other words, if you want to pursue your proposal, you need to answer: is this change valuable enough to justify the cost of breaking a lot of code?
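from timeit import repeat

for _ in range(3):
    for c in [
        'a[len(a):] = 1,',
        'a[2**64:] = 1,',
        'a[2**62:] = 1,',
        'a[2**20:] = 1,',
    ]:
        # repeat() runs each statement 1,000,000 times per round,
        # so total seconds * 1e3 is nanoseconds per execution.
        t = min(repeat(c, 'a=[]'))
        print(f'{t*1e3:3.0f} ns ', c)
    print()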
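As a quick check on the statements timed above: a slice start past the end is clamped to the length, so each of them appends to the list. A minimal sketch:

```python
a = [0]
a[len(a):] = 1,   # start equals the current length
a[2**20:] = 2,    # start far past the end, clamped to len(a)
a[2**62:] = 3,
a[2**64:] = 4,    # larger than sys.maxsize on 64-bit builds, still clamped
print(a)  # [0, 1, 2, 3, 4]
```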
Yeah I was just thinking about this too. Indices are often calculated rather than fixed numbers, and currently we have to write a[len(a) - offset:] instead of a[-offset:] whenever there’s the possibility that offset can be zero. And if a is really a complex expression, we have to store it as a variable before we can use len(a) as an index of a. This feature does make a new sentinel much more useful and compelling.
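To make the pitfall concrete, here is a small sketch. The helper name last_items is made up for illustration, and it assumes 0 <= offset <= len(seq).

```python
def last_items(seq, offset):
    """Return the last `offset` items; assumes 0 <= offset <= len(seq)."""
    return seq[len(seq) - offset:]

a = [1, 2, 3]
offset = 0

naive = a[-offset:]            # -0 == 0, so this is the whole list
safe = last_items(a, offset)   # the empty tail, as intended
print(naive, safe)  # [1, 2, 3] []
```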
Not sure what you are trying to say here but I know that the Python community prefers to avoid repurposing symbols in general.
If we’re making negative zeroes exist, we’d need to think about what other properties they should have. Should the following be possible, for example:
I think it’s important not to conflate the benefits of two different things here: there’s sugar for “the end of a sequence” and then there’s how that gets implemented. A sentinel probably doesn’t save any of the work from happening, but it might make it more succinct to type. Whether the brevity is worth the complexity is debatable, and might benefit from a prototype.
I don’t think the benefits are necessarily of two different things, but more of a generalization of an offset from the end of a sequence where the offset can be zero.
Implementation-wise, the new sentinel need not be a singleton; it can be an object that stores an offset. Something minimal like:
class OffsetFromEnd:
    def __init__(self, value):
        self.value = value

    def __sub__(self, offset):
        return OffsetFromEnd(self.value - offset)

END = OffsetFromEnd(0)

class NewList(list):
    def _normalize_index(self, index):
        # TODO: same treatment for index.stop
        if isinstance(index, slice) and isinstance(index.start, OffsetFromEnd):
            index = slice(index.start.value + len(self), index.stop, index.step)
        return index

    def __getitem__(self, index):
        return super().__getitem__(self._normalize_index(index))

    def __setitem__(self, index, value):
        super().__setitem__(self._normalize_index(index), value)

a = NewList([1, 2])
a[END - 1:] = 3, 4
print(a) # [1, 3, 4]
a[END:] = 5, 6
print(a) # [1, 3, 4, 5, 6]
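The TODO for index.stop could be filled in along these lines. This is only a sketch: the resolve helper is a made-up name, and clamping at 0 (so that END - 10 on a short list does not wrap around to a negative index) is an assumption of mine, not part of the prototype above.

```python
class OffsetFromEnd:
    def __init__(self, value):
        self.value = value

    def __sub__(self, offset):
        return OffsetFromEnd(self.value - offset)

END = OffsetFromEnd(0)

def resolve(bound, length):
    # Hypothetical helper: turn an OffsetFromEnd into a plain index.
    # Clamping at 0 is an assumption made for this sketch.
    if isinstance(bound, OffsetFromEnd):
        return max(bound.value + length, 0)
    return bound

class NewList(list):
    def _normalize_index(self, index):
        if isinstance(index, slice):
            n = len(self)
            return slice(resolve(index.start, n), resolve(index.stop, n), index.step)
        return index

    def __getitem__(self, index):
        return super().__getitem__(self._normalize_index(index))

    def __setitem__(self, index, value):
        super().__setitem__(self._normalize_index(index), value)

b = NewList([1, 2, 3, 4])
middle = b[END - 3:END - 1]   # items at positions 1 and 2
b[END - 2:END] = ()           # drop the last two items
print(middle, b)  # [2, 3] [1, 2]
```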
Indeed that’s how the idea originated. I guess if negative zero had been part of the slice design since day 1 there wouldn’t have been much if any downside. Too bad that ship has sailed long long ago.
That would technically work and does support mylist being a complex expression too, but it looks rather verbose and still involves a few too many bytecodes. I think a sentinel is still a better option.
EDIT: Also, it would be unclear if from_end applies to start or stop or both, so you’d need two options, start_from_end and stop_from_end, just to be clear.
Negative zero, per se, would have to be part of much more than just slices; there’s no integer representation for negative zero, and a floating-point slice index causes more trouble than it’s worth.
Pike actually takes a slightly different approach. You can still use negative slice indices, but if you want to absolutely, definitely specify that you are counting from the rear, you can use [<1..] (Pike uses .. where Python uses : - the meaning is the same). This means that [<0..] is entirely meaningful, and functions as “zero spaces back from the end”, equivalent to a negative zero.
This would be a larger change for Python, as it would require actual syntax, but it may be worth exploring.
CPython’s PyLongObject does already support a negative zero as pointed out in my OP, but the ship has sailed anyway so let’s not go there anymore.
Yeah a new syntax like a[<0:] would be very cool but the relatively small gain in convenience might not be deemed worthy of the effort required for a syntax change by the devs. A sentinel may still be the path of least resistance.
That works only for __getitem__. The problem with list.__setitem__ is that there is currently no slice that can express a start beyond the end without knowing in advance the length of the sequence, so no amount of wrapper over slice will help.
My point was mostly that the proposed solutions are not going to avoid any bytecode (in fact they probably add more), but they do make the written code shorter and possibly easier to understand.
I was making that point because in your OP you seemed to dismiss using [len(sequence):] because of the overhead of calling len. But we’re not removing any overhead, as far as I can tell?
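One way to put numbers on the bytecode question is to list the instructions with dis. The sketch below compares the len-based slice with a constant-start slice; opcode names and counts vary between CPython versions, so no output is shown, and instruction count is only a rough proxy for run time.

```python
import dis

def instruction_names(expr):
    # Compile a single expression and list its opcode names.
    code = compile(expr, "<expr>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

with_len = instruction_names("a[len(a):]")
with_const = instruction_names("a[0:]")

print(len(with_len), with_len)
print(len(with_const), with_const)
```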