PEP 204 - Range Literals: Getting closure

Fantastic to see this being discussed. As it happens, I have been considering the general idea myself off and on for the last few years, and last year I did a little work on a drop-in replacement for range to add a bunch of extra functionality, such as representing infinite and semi-infinite ranges.

My question is: why do they need to be separate? Maybe there’s some advanced Numpy thing I’m neglecting, but I feel like all the differences are reconcilable.

  • range has to have definite endpoints - why? There’s nothing conceptually wrong with the idea of a semi-infinite range (granted, it would be hard to define iteration order for infinite ranges, and hard to specify e.g. “the set of all even numbers” vs. “the set of all odd numbers”). Nothing prevents starting at a start point and increasing by step indefinitely; and nothing prevents an O(1) calculation of whether a given integer would be in that sequence.
  • range doesn’t have an indices method - why not? 'foobar'[range(3)] makes about as much sense to me as 'foobar'[slice(3)] does.
  • slice isn’t iterable - why? It would be useful for a lot of user-defined types to be able to define __getitem__ on a slice by, say, using a list comprehension to iterate over the indices that are conceptually in the slice (after normalizing endpoints). I assume it’s because the start and stop can be arbitrary objects, but just how much do we gain that way? Integers and None seem to be by far the common case, even with Numpy. The other fancy Numpy trick is masking, but that uses a completely different type for subscripting.
  • The reprs work differently - seems pretty trivial.
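
As a rough illustration of the slice bullet above, here is a minimal sketch of how a user-defined type can already iterate over the indices a slice refers to, by normalizing the endpoints with slice.indices() first. The Squares class is hypothetical and exists only for this example; the last line shows the arithmetic membership test on a plain range.

```python
class Squares:
    """Hypothetical sequence-like type whose i-th element is i * i."""

    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() resolves None and negative endpoints against the
            # length, so the result can be fed straight to range().
            return [i * i for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return index * index


squares = Squares(10)
first = squares[2:5]
last = squares[-3:]
backwards = squares[::-4]

# Membership in a range is computed arithmetically, not by iterating.
big_even = 10**12 in range(0, 10**18, 2)
```

Note that the normalization step needs len(self), which is the piece a bare slice does not carry around.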

Of course, I suppose I am proposing breaking changes, and I get that the idea of 4.x is still roundly rejected. But I think the breaks are in pretty obscure cases, and perfect semver is a pipe dream anyway.

My original idea was: unify these types, and continue using the : syntax - including surrounding square brackets for clarity (and to avoid any precedence issues). With the ability to represent semi-infinite ranges, this neatly replaces both itertools.count, and enumerate:

# old
for i, element in enumerate(sequence, start):
    ...
    # I never liked how the `enumerate` arguments are in the opposite order of the unpacked values.

# new
for i, element in zip([start:], sequence):
    ...

Better yet, this would automatically allow adding a step value, which would be nice to have, e.g. in game engines:

# draw the map, tile by tile, at a specific location on screen
for x, row in zip([x_offset::dx], level):
    for y, cell in zip([y_offset::dy], row):
        render(cell, x, y)
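
For comparison, the same loop can be written today with itertools.count, which already accepts a start and a step. This is only a sketch: tile_positions and the sample level are made-up names, and the positions are collected in a list instead of being passed to a real render function.

```python
from itertools import count


def tile_positions(level, x_offset, y_offset, dx, dy):
    """Pair every cell with its coordinates, mirroring the loop above."""
    placed = []
    for x, row in zip(count(x_offset, dx), level):
        for y, cell in zip(count(y_offset, dy), row):
            placed.append((cell, x, y))
    return placed


level = ["ab", "cd"]  # hypothetical two-by-two map
positions = tile_positions(level, 100, 50, 16, 8)
```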

This avenue also seems promising, if distinctions do need to be made. Conceptually, slicing the set of all integers ought to give the corresponding range, yes? So we just need a builtin object to represent that.

This seems pretty wild. Dunder methods aren’t supposed to be called directly, so I assume you have in mind that the range (or slice, depending on other decisions) literal syntax would wrap a call to __range__? That leaves the question of choosing which class to check for the method (I imagine it would use the NotImplemented protocol, but three ways… oof). The benefit seems questionable, anyway; string ranges in particular seem problematic (what will you do when there’s more than a single character in the endpoints? What if the endpoints surround the surrogate-pair Unicode range? What if the stop is left unspecified?).

One crucial-yet-subtle difference is the behaviour of negative numbers. It doesn’t really make sense to talk about the range from 1 to -1 (without a corresponding negative step), but it’s perfectly reasonable in a slice:

>>> "Test string"[1:-1]
'est strin'

And this is also why you can’t simply iterate over a slice - not without the underlying object (or at the very least, its length).
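
A small sketch of that dependency: the same slice object resolves to different concrete indices depending on the length it is applied to, which slice.indices() makes visible.

```python
s = slice(1, -1)

# The same slice resolves to different index ranges for different lengths.
long_indices = range(*s.indices(len("Test string")))  # length 11
short_indices = range(*s.indices(len("Test")))         # length 4

chars = "".join("Test string"[i] for i in long_indices)
```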

It may be reasonable to make a unified slice-range hybrid, but to do that, we’d need a syntactic way to indicate “slice from the end” rather than the current arithmetic way. So here’s a quick question for everyone: Have you ever used a variable in a slice, such that the variable might be positive and might be negative? Imagine if the slice were defined with a nonnegative number, but an optional special character to indicate “slice from the end”. Something like this:

> "Test string"[1..<1];
(1) Result: "est strin"

(That’s Pike syntax, which uses .. where Python uses :.)

Maybe this would be suitable? It would allow negative numbers to be perfectly valid in a range/slice (rice??) object, but they would stop it from being meaningful in a subscript; and it would allow from-the-end slicing notation, but it would stop it from being iterable. Obviously this is a complete non-starter if people are doing something like this:

end_position = -1 if some_cond else 10
substr = "Test string"[1:end_position]

so I’m hoping to hear from anyone who’s ever done that.
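
One related wrinkle with the arithmetic convention, sketched below: when the count from the end is a variable, zero silently changes meaning, because -0 is just 0. The drop_last helper is hypothetical and only stands in for what an explicit “from the end” marker would express.

```python
text = "Test string"

# With the arithmetic convention, "drop the last n characters" breaks at n == 0,
# because -0 is just 0 and the slice becomes text[:0].
arithmetic = [text[:-n] for n in (2, 1, 0)]


def drop_last(seq, n):
    """Hypothetical helper: count n from the end explicitly, n >= 0."""
    return seq[:len(seq) - n]


explicit = [drop_last(text, n) for n in (2, 1, 0)]
```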

Unfortunately not. Slicing the set of whole numbers (that is, non-negative integers) would work that way, and if we permitted open-ended ranges, that would already basically function:

# W = range(0, ∞) # okay this part doesn't work yet
W = range(0, 1<<1000) # pretend that numbers stop after a while
assert W[3:5] == range(3, 5)
assert W[5:100:10] == range(5, 100, 10)

(Incidentally, I just checked, and range(5, 100, 10) == range(5, 105, 10), so equality is defined as yielding the equivalent sequence of numbers, even if the start/stop/step aren’t all equal. TIL.)
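
A quick sketch of that equality behaviour, using only the built-in range: comparison and hashing follow the sequence of numbers, while the start/stop/step attributes keep whatever was passed in.

```python
a = range(5, 100, 10)
b = range(5, 105, 10)

same_sequence = (a == b)             # compares the numbers produced
same_hash = (hash(a) == hash(b))     # equal objects must hash equally
different_stop = (a.stop != b.stop)  # the attributes still differ
empty_equal = (range(0) == range(2, 1))  # all empty ranges compare equal
sliced = range(0, 1000)[5:100:10]    # slicing a range yields a range
```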

But the negative numbers problem will continue to plague this kind of notion. It’s not an easy thing to fix.

Here (as well as in PEP 204) we seem to be attacking three problems as one:

  1. A shorthand for range(), such as e.g. [1:5] or 1..5.
  2. Extending the shorthand to compound expressions which are not expressible as a range, such as e.g. [1, 3:6, 7].
  3. Adding range features, such as e.g. character ranges.

We shouldn’t conflate these questions. For example, “itertools.chain does that” isn’t an answer to the example in (2) because the question was precisely how to write itertools.chain more compactly.
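
For reference, this is roughly what the compound example in (2) has to be spelled as today, either with itertools.chain or with iterable unpacking in a list display. It is shown only to make the comparison concrete, not as an answer to the compactness question.

```python
from itertools import chain

# [1, 3:6, 7] in the proposed shorthand would mean 1, then 3..5, then 7.
compound = list(chain([1], range(3, 6), [7]))
unpacked = [1, *range(3, 6), 7]
```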

That would be a good compromise if we were to differentiate slice objects from range objects. Square brackets are confusing because that’s where slices live currently. Very interestingly, slices right now cannot be surrounded by parentheses, which makes parentheses a great candidate.

my_list[1:2, 3:4, 5:6]   # Middle one 3:4 is loosely placed
my_list[(1:2)]  # SyntaxError

The Python grammar specification says a subscript is a comma-separated list of slices, making a slice just the colon-separated group.

slices:
    | slice !',' 
    | ','.(slice | starred_expression)+ [','] 

slice:
    | [expression] ':' [expression] [':' [expression] ] 
    | named_expression 

So range syntax could work the same way as generator expressions: it must be surrounded by parentheses, but the parentheses of a function call are enough if the range is the only argument. The only difference between generator expressions and list comprehensions is the type of brackets surrounding them, and we are fine with that.

for x in (1:10):
   print(x)

my_list = list(1:10)
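
The generator-expression rule this proposal borrows can be checked in current Python. The sketch below uses compile() to test whether each spelling is accepted; the compiles helper is a made-up name for this example.

```python
def compiles(source):
    """Return True if source is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# A generator expression may borrow the call's parentheses only when it is
# the sole argument; otherwise it needs its own pair.
sole_argument = compiles("sum(i * i for i in range(4))")
extra_argument_bare = compiles("sum(i * i for i in range(4), 0)")
extra_argument_wrapped = compiles("sum((i * i for i in range(4)), 0)")
```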

C# uses ^ where Pike uses <, i.e. "Test string"[1..^1].

I wonder if this use case is the target of the python ranges package?

That package’s Range class implements interval logic, i.e. it is not iterable, whereas the built-in range is.

That looks unrelated, but it does resemble something else I’ve had in mind to implement for quite some time.

Python does let you create such a range and I’d be satisfied if iterating over such a slice simply didn’t yield any elements. The indices method returns a “normalized” 3-tuple of start/stop/step values. In the design I had in mind, it would return another instance of the same unified class.

The more I think about it, though, the more it makes sense to have separate classes. A slice conceptually seems to represent abstract information for pulling out indices, not the indices themselves.

I was getting confused with some details of the design I was implementing locally. My “ranges” simply don’t treat negative indices in the standard way; it wouldn’t make sense, anyway, since they support infinite and semi-infinite intervals. Without the reverse-indexing semantics, slicing a “range of all integers” conceptually does give the range results I want, but it takes a bit of work to implement.

For my own purposes, I think I will stick with a separate library class and continue treating slice as an implementation detail. Z[a:b:c] (or ℤ[a:b:c], which normalizes to the same) is nice enough syntax already.

How? Let’s say I have Z as a representation of all integers. What is Z[10:20]? That is to say - what’s the tenth integer? If you ask a mathematician to organize the integers such that they can be enumerated, you’ll probably get this sequence:

0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7…

By that definition, Z[10] would be -5 and Z[10:20] would be [-5, 6, -6, 7, -7, 8, -8, 9, -9, 10]. I’m not entirely certain, but I don’t think that’s what most people would expect range(10, 20) to be :) But if you DON’T number the integers in this sort of way (interleaving positive and negative), how do you define the Nth integer? How do you slice such that you can find negative numbers?

That’s why it’s only reasonable to slice non-negative integers (or whole numbers) in this sort of way.
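
The interleaved enumeration above can be sketched as a generator, with itertools.islice standing in for the Z[10:20] slice. The function name is an assumption made for this example.

```python
from itertools import count, islice


def interleaved_integers():
    """Yield 0, 1, -1, 2, -2, 3, -3, ... without end."""
    yield 0
    for n in count(1):
        yield n
        yield -n


tenth = next(islice(interleaved_integers(), 10, None))
ten_to_twenty = list(islice(interleaved_integers(), 10, 20))
```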

In my design it’s straightforward: 0 is the 0th integer, 1 is the 1st integer, … and -1 is the -1th integer etc.

But, yes, I can’t have an iteration compatible with the index/length protocol that way and also have it iterate all elements. (Not that Z has a valid len anyway!) Having to special-case iteration for Z vs. finite and semi-infinite ranges - in order for it to be completely iterable at all - was already bothering me, in fact. But reading this I’ve realized that I can just, instead of making the class directly iterable, offer multiple named methods returning iterators (like what dict does, but without a default). People who want the [-5, 6, -6, 7, -7, 8, -8, 9, -9, 10] result can use itertools.islice or something.
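
A minimal sketch of the indexing rule described above, where the i-th integer is simply i and negative indices are not counted from the end. AllIntegers is a hypothetical stand-in written for this example, not the actual library class, and it only handles slices with both endpoints given.

```python
class AllIntegers:
    """Hypothetical stand-in for Z: index i maps to the integer i itself."""

    def __getitem__(self, index):
        if isinstance(index, slice):
            if index.start is None or index.stop is None:
                # Semi-infinite results are out of scope for this sketch.
                raise NotImplementedError("only finite slices are handled here")
            step = 1 if index.step is None else index.step
            return range(index.start, index.stop, step)
        return index  # no reverse-indexing: Z[-1] is -1


Z = AllIntegers()
finite = Z[10:20]
crossing_zero = Z[-1:5]
stepped = Z[5:100:10]
```

Under this rule Z[-1:5] is range(-1, 5), which is exactly where it departs from the usual sequence convention that a negative index counts from the end.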

Yeah, that still wouldn’t make sense with the idea that slicing this infinite list would be equivalent to a range object. You can’t just slice the -1th element to the 5th element, for instance.

I want to keep this conversation going, so I made some rudimentary changes to the Python grammar to allow parenthesized, colon-separated expressions to produce range objects.

>>> for x in (1:4):
...    print(x)
...
1
2
3

>>> list(1:10)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> from dis import dis
>>> dis('(1:10:2)')
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (1)
              4 LOAD_CONST               1 (10)
              6 LOAD_CONST               2 (2)
              8 BUILD_RANGE              3
             10 RETURN_VALUE

The only changes were adding the BUILD_RANGE opcode, which calls a custom PyRange_New that right now only deals with integers.

Source code of changes: Range Expressions by jbvsmo · Pull Request #1 · jbvsmo/cpython · GitHub (Branch: GitHub - jbvsmo/cpython at rangexpr_v0)

My idea here is to create a playground for people wanting to experiment with other syntax, although I’ve grown to like this option.

I believe the greatest benefit of range expressions isn’t the shorthand syntax for ease of typing, but the possibilities this opens up to greatly optimize loops during the bytecode generation step, removing a call to a global function and at least storing range objects as constants. This would be aligned with the current push for performance we’ve seen starting with Python 3.11.
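
To illustrate why the compiler cannot do this for range() today, the sketch below compiles a snippet once and runs it in two namespaces. The name range is looked up at run time, so it can be rebound, and the compiled code has to keep the lookup and the call. The shadowing lambda is purely hypothetical.

```python
source = "result = list(range(3))"
code = compile(source, "<example>", "exec")

# The compiled code only records the *name* "range", not the builtin itself.
looks_up_by_name = "range" in code.co_names

default_namespace = {}
exec(code, default_namespace)

# Rebinding the name changes what the very same code object does.
shadowed_namespace = {"range": lambda n: ["shadowed"] * n}
exec(code, shadowed_namespace)
```

A dedicated syntax would not go through a rebindable name, which is what would make constant folding possible.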

I can use a range I get from range() as a key in a dictionary. How will that be affected by range literals?

Nothing should change in that regard. Range objects should continue to be hashable.

Why this drive to type fewer and fewer characters to express ideas in code?

I find range() far easier to read than (a:b:c). And it’s easy to search for.

One of the things I value about python is that there are few magic characters to understand.

I have worked with people who know the domain I’m working on but not Python, and it’s possible to pair up and fix domain bugs in my Python because it makes sense.
It’s not hard for a non-Python coder to understand the algorithms.

As is often said, “code is read far more often than it is written”.

So a range literal inside [ ] should be valid as an index and not have some other special meaning, since a range object can be an index.

It’s currently the case that d[(0, 5)] is the same as d[0, 5] , but d[(0 : 5)] would be d[range(0, 5)], so not the same as d[0 : 5].

On the other hand, if range(0, 5) could be written as [0 : 5], there might be less chance of confusing d[[0 : 5]] with d[0 : 5], except that it looks more like a list, and lists aren’t hashable…

Although, maybe that’s a good thing, as it’s more likely to raise an exception and point to a bug.
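
The distinction can be made visible today with a class that simply returns whatever key __getitem__ receives. ShowKey is a made-up helper for this sketch; the (0 : 5) and [0 : 5] spellings themselves are proposals, so range(0, 5) is written out in full.

```python
class ShowKey:
    """Hypothetical helper: subscripting returns the key object itself."""

    def __getitem__(self, key):
        return key


d = ShowKey()

tuple_key = d[0, 5]            # same object shape as d[(0, 5)]
slice_key = d[0:5]             # a slice object
range_key = d[range(0, 5)]     # what d[(0 : 5)] would mean under the proposal

# A list display used as a dict key fails immediately, as noted above.
try:
    {[0, 5]: "value"}
    list_key_ok = True
except TypeError:
    list_key_ok = False
```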

Range objects are not and cannot be used as indexes. (Unless you have a class with a custom __getitem__ method.)

>>> [10, 20, 30, 40][range(0, 3)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not range

Ranges are not slices, and this proposal to make ranges look even more like slices will surely cause a lot more confusion.
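
For completeness, here is a sketch of the custom __getitem__ escape hatch mentioned in the parenthetical. RangeIndexable is a hypothetical list subclass written for this example; nothing like it exists in the standard library.

```python
class RangeIndexable(list):
    """Hypothetical list subclass that also accepts a range as an index."""

    def __getitem__(self, index):
        if isinstance(index, range):
            # Look up each position the range produces, one at a time.
            return [list.__getitem__(self, i) for i in index]
        return list.__getitem__(self, index)


values = RangeIndexable([10, 20, 30, 40])
by_range = values[range(0, 3)]
by_slice = values[0:3]
by_int = values[-1]
```

Here the range and the slice happen to select the same elements, but only because both endpoints are non-negative and within bounds.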

Sets and tuples are very different concepts with almost no overlap other than that they are both some sort of container. The only difference between declaring a set and a tuple is the brackets surrounding them, like

s = {1, 1, 2, 3, 3, 3}
t = (1, 1, 2, 3, 3, 3)

The set has 3 elements while the tuple has 6 and they look almost identical, don’t you agree?

Same with function calls and array indexing:

a = foo[1, 2]
b = foo(1, 2)

Very different concepts. Similar execution.

So why would

a = foo[1:2]
b = (1:2)

be confusing? They don’t even appear in the same context. And if you put a range inside square brackets, it would still require parentheses, which are currently disallowed for slices (SyntaxError).

All this considered, I am not saying we must follow this syntax. I would like to see other ideas. See my previous post here where I implemented the BUILD_RANGE opcode. You can just edit line 807 of the Grammar/python.gram file to any syntax you think is nice and give it a try, because I did the heavy lifting already.

Ranges are hashable, so:

>>> d = {range(0, 5): 'Some value'}
>>> d
{range(0, 5): 'Some value'}
>>> d[range(0, 5)]
'Some value'