Feature proposal: mixing 0-based and 1-based indices with `a[i]` and `a{i}`

ilotoki0804 · November 13, 2024, 10:16am

If you can’t give up on 1-based indexing, put dummy data on the first element and use it as if it were 1-based indexing. It’s not full 1-based indexing, but it has been recognized as useful in many mathematical applications that require it.
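
A minimal sketch of that placeholder trick, assuming a hypothetical list of weights numbered from 1 in some formula (the names `weights` and `n` are illustrative only):

```python
# Hypothetical data: weights w_1..w_4 from a formula that numbers items from 1.
weights = [None, 0.5, 1.5, 2.0, 4.0]  # index 0 is a placeholder, never read

n = len(weights) - 1  # number of real items (the placeholder is not counted)

# Loops can now follow the textbook "for i = 1..n" form directly.
total = sum(weights[i] for i in range(1, n + 1))

first = weights[1]  # w_1
last = weights[n]   # w_n
```

The cost is that `len(weights)` is off by one and plain iteration over the list also visits the placeholder, so this only suits code that always goes through explicit indices.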

Alex-Wasowicz · November 13, 2024, 1:05pm

That’s not what I was focusing on when I said 1-based indexing is worse. Python should have only one way of doing something. Currently, for any (non-zero) k-based indexing, the obvious way is a[i-k]. To say that syntax for 1-based indexing is needed is to say that 1-based indexing is not only common but normal. (If being common is the argument, then a[i-1] is perfectly OK.)

Sorry if I sounded harsh. I just don’t think Python should implicitly recommend 1-based indexing. I don’t think anyone should.

ayhanfuat · November 13, 2024, 3:06pm

He has a blog post on this topic: Why Python uses 0-based indexing

tim.one · November 13, 2024, 6:44pm

Great! His reasoning there essentially echoes Dijkstra’s, op. cit., 1982 note, intimately tied with how slicing (aka range()) works.

However, I think the reasoning “half-open slicing implies 0-based indexing” isn’t really compelling. The Icon language combines half-open slicing with 1-based indexing, and it works fine. In one way, it’s arguably a bit more elegant. In Python, if you want to explicitly construct a slice spanning an entire sequence s, it’s

s[0 : len(s)]

but that ugliness is hidden by the shortcuts s[0:] and even s[:]. In Icon, it’s

s[1 : 0]

instead. That is, 0 in a slice is viewed as being to the left of the first element in Python, but to the right of the last element in Icon. Negative indices work the same way in both languages, but again Icon’s

s[1] is the first element from the left, and s[-1] the first from the right

is a bit more elegant on its own than Python’s

s[0] is the first element from the left, and s[-1] the first from the right

Either works fine in practice (I used Icon extensively before Python existed).
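
To make the comparison concrete, here is a rough Python emulation of the Icon position convention as described above. It is only a sketch: the helper names `icon_to_offset` and `icon_slice` are made up for illustration, and putting the two positions in order before slicing is an assumption of the sketch, not something taken from the thread.

```python
def icon_to_offset(pos: int, n: int) -> int:
    """Map an Icon-style position to a Python 0-based slice offset.

    Positive positions count the gaps between elements from the left,
    starting at 1. Zero and negative positions count gaps from the right,
    with 0 being the gap to the right of the last element.
    """
    offset = pos - 1 if pos > 0 else n + pos
    if not 0 <= offset <= n:
        raise IndexError(f"position {pos} out of range for length {n}")
    return offset


def icon_slice(s, i: int, j: int):
    """Return the elements lying between Icon-style positions i and j."""
    n = len(s)
    # Assumption of this sketch: the two positions may come in either order.
    lo, hi = sorted((icon_to_offset(i, n), icon_to_offset(j, n)))
    return s[lo:hi]


s = "abcde"
whole = icon_slice(s, 1, 0)    # same elements as Python's s[0:len(s)]
inner = icon_slice(s, 2, -1)   # same elements as Python's s[1:-1]
```

The only place the two conventions disagree about a given integer is 0 and the positive positions, which is exactly where the subtle differences mentioned below come from.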

But, no, having both in a single language isn’t attractive to me. The differences (like the meaning of 0 in a slice) are subtle and easy to trip over.

willingc · November 15, 2024, 6:40am

Agreed. If folks wish to lean into 1-indexing, Julia is a reasonable language option and in some ways similar to Python in terms of learning curve.

olivier-ploton · November 16, 2024, 8:43am

The sentence fragment you cite is out of context. I meant “Converting 1-based indices makes the implementation artificially harder for the programmer”.

So, if I understand well, you mean “Converting 1-based indices makes the implementation artificially harder for the language”, and the expression “our codebases” in your reply designates the source code of a Python implementation (e.g. cpython).

I suggest you have a look at my sample implementation (here). I did not count precisely, but (excluding unit tests) only 100 or 200 lines are modified relative to standard CPython, and no Python code in the stdlib is modified at all. Of course, a serious implementation would be harder, but it would not require deep source modifications.

If the expression “our codebases” in your reply also designates the codebase you maintain as a Python programmer, I suggest you compile my sample implementation and run your code with it. Normally there should be no change.

olivier-ploton · November 16, 2024, 9:02am

Thanks for the note. Very interesting. You’re preaching to the choir: I find 0-based indexing and semi-open slices very nice. The spirit of my proposal is to consider 0-based and 1-based indexing as complementary, not as opposed.

olivier-ploton · November 16, 2024, 9:34am

Exactly. And a{2:0} also looks confusing to me, so it is of no interest, and it’s far better to use a[1:-1]. The point is: we are not obliged to use (always a[...] and never a{...}) or (always a{...} and never a[...]). We can mix them. We are not accustomed to doing so, but it can be fruitful. Example:

plo@buxin:~/GIT/cpython$ ./python 
Python 3.14.0a1+ (heads/index01:05aceb5634, Nov 11 2024, 17:16:26) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = ["first", "second", "third", "fourth", "fifth"]
>>> # I need the 4th item. Just ask for it.
>>> a{4}
'fourth'
>>> # I need all but 1 item on the left and 1 item on the right
>>> # The standard Python notation is the best, just use it.
>>> a[1:-1]
['second', 'third', 'fourth']
>>> # I need the 2nd item from the right. Just ask for it.
>>> a[-2]
'fourth'
>>> # I need to modify the 3rd item. Just do it.
>>> a{3} = "3rd"
>>> a
['first', 'second', '3rd', 'fourth', 'fifth']
>>>

olivier-ploton · November 16, 2024, 10:35am

A nice feature of the Icon convention you describe is that requesting all elements but the first n and the last p is very easy: s[n:-p] (as in Python, but works also for p=0 if I understand well). There is no simple equivalent with the s{...} notation I propose, but this is fine: just use the standard Python expression s[n:-p] even if you think of s as 1-based.
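
For reference, this is how the p=0 case behaves in today's Python: because -0 is just 0, s[n:-p] silently yields an empty slice. A small sketch of a workaround follows; the helper name `trim` is made up for illustration.

```python
def trim(s, n: int, p: int):
    """Drop the first n and the last p elements of s (n and p non-negative)."""
    # len(s) - p stays correct when p == 0, unlike the literal -p.
    stop = max(len(s) - p, 0)
    return s[n:stop]


s = ["first", "second", "third", "fourth", "fifth"]

p = 0
naive = s[1:-p]          # s[1:0], which is empty
fixed = trim(s, 1, p)    # everything but the first element
both = trim(s, 1, 1)     # same as s[1:-1]
```

A common inline spelling of the same idea is `s[n:-p or None]`, which falls back to an open-ended slice when p is 0.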

But, no, having both in a single language isn’t attractive to me. The differences (like the meaning of 0 in a slice) are subtle and easy to trip over.

We think of dual-mode indexing as toxic because it is usually implemented on a type-by-type basis. By contrast, the spirit of the proposal is to bind the indexing mode to the notation instead of the type. I think it opens perspectives worth considering.

By the way, I do not know of any “mainstream” language that supports both 0-based and 1-based indexing and where the indexing base is explicitly expressed by the operator, like the a[i] vs a{i} I propose, instead of being bound to the type or class of the indexed object a. Does anybody have examples of such languages?

olivier-ploton · November 16, 2024, 12:07pm

I’m quite astonished to see a Python developer advising me to quit Python for another language. The tremendous success of Python is due to ergonomics rather than speed, isn’t it? Julia’s popularity seems to be growing very rapidly in the scientific computing community. It’s symptomatic that such a recent and successful language has chosen 1-based indexing. It reveals the need for 1-based indexing support across a whole community.

I’m a novice in Julia (say, 1 week of training) but I plan to practice it seriously (in my research speciality, Operations Research / scheduling, I’m involved in a research project combining Machine Learning and optimization, and more and more of my colleagues are using it).

As I have said many times, I do not personally adhere to 1-based indexing. AFAIK, Julia developers recognize that 0-based indexing is a must, and they accept it (through offset arrays, so on a type-by-type basis, which is a slippery slope in my view because I find it toxic).

From the replies I got to my proposal, I have the impression that, excluding (toxic) type-based indexing, programmers only consider two universes as possible: all the world is 0-based or all the world is 1-based. I think my proposal would reconcile the two universes and enhance the (already fantastic) ergonomics of Python.

olivier-ploton · November 16, 2024, 12:17pm

Sorry if I sounded harsh.

No problem.

I agree with you about dates. 1-based years/centuries/millennia are awkward (with year 2000 not being the start of the 3rd millennium!). The trouble is, we are bound by these conventions. Same for indices. When we are bound by math conventions, i.e. 1-based indexing, support from the language would be appreciated.

pf_moore · November 16, 2024, 2:03pm

It’s hardly advising you to quit Python. But if you like 1-based indexing, Julia has it and Python doesn’t. And honestly, it’s not something that’s going to get added to Python, no matter how much you argue for it - there simply isn’t anything like enough benefit to justify the cost. So if 1-based indexing matters to you, switching languages is your only realistic option (assuming none of the ways people have suggested for doing this in Python as it stands appeal to you).

I’m baffled as to why you’re even proposing this feature if you don’t even intend to use it yourself.

Not at all. But languages have to make choices - one of the things that made Python as popular as it is, is precisely the fact that there aren’t a plethora of ways of doing the same thing. So having multiple ways of indexing is contrary to the design philosophy that made Python a success.

olivier-ploton · November 16, 2024, 3:20pm

You’re right, it seems paradoxical. Said very shortly, I prefer 0-based indexing, but I often have to program in contexts where the pressure for 1-based indexing is very high.

Sorry, I lack time at the moment. I plan to send a more thoroughly argued reply soon (in a few days).

jamestwebber · November 16, 2024, 5:24pm

I was responding to your point that converting 1-based indices from an algorithm or mathematical formula makes implementation a little harder–I agree, but this happens at the point of implementing a concept into code, and then it’s done.

What you’re proposing is to introduce the potential for that mismatch in between every package. Any given package might decide to use 0- or 1-based indexing in its interface, and now I have to check each of my dependencies for this, and do the conversion in my own code when necessary. It’s vastly worse for the Python ecosystem.

I’ll say that I already have to deal with both of these problems (I have to work with file formats that use 1-based closed ranges, for mysterious reasons). I still wouldn’t want this feature, it would just make things worse.

sirosen · November 16, 2024, 5:35pm

I think you will not find any – certainly I don’t know of any.

If I had to guess why, I would assume that most language designers agree with the sentiments expressed in this thread by several respondents that mixing indexing styles is a bad idea.

IMO there’s an extremely strong argument against this, which comes in two parts:

it is likely to confuse beginners
it offers insufficient new expressive power to seasoned practitioners

You will have to overcome that argument if you really think this proposal can succeed. I don’t see a clear case being made here.

Mostly, from what I’ve read, you’re working off of the fact that some domains have problems which are traditionally expressed with 1-based indexing. To which I say… “Yes, and?” You need much more than that to make a case for language syntax. Python already has a syntax for 1-based indexing: L[n-1]. I have always found that syntax sufficient.

My blunt recommendation is to drop this idea. I think it’s a bad one. (Trust me, I’ve had lots of bad ideas of my own!) But if you decide to persist, my recommendation is to try to formulate a strong argument as to why n - 1 indexing isn’t a satisfactory solution.

tim.one · November 16, 2024, 9:05pm

It seems to me that people have been considering it, but just don’t see benefits commensurate with the costs and opportunities for confusion, and that is leaving aside unaddressed complications. For example, how sequence.index(elt) should behave was raised as an issue, but there’s no clear answer. It doesn’t end with that. There’s no clear answer about what to do in any context returning an index. For example, what should bisect.bisect() do? The many components of a regexp match object exposing target-string indices?
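
As a quick illustration of how many standard APIs hand back 0-based positions today, here is a small sketch using only the stdlib (the sample data is made up):

```python
import bisect
import re

letters = ["a", "b", "c", "d"]

# list.index returns a 0-based position.
pos = letters.index("c")             # 2

# bisect.bisect returns a 0-based insertion point (to the right of equals).
ins = bisect.bisect(letters, "b")    # 2

# Match objects expose 0-based, half-open offsets into the target string.
m = re.search(r"c+", "abccd")
span = m.span()                      # (2, 4)

# Each result plugs straight back into ordinary subscripting and slicing.
item = letters[pos]                  # "c"
matched = "abccd"[span[0]:span[1]]   # "cc"
```

Every one of these values would need an adjustment, or a second variant of the API, before it could be fed to a 1-based subscript.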

I’d be happy with 1-based indexing too, but actively don’t want a choice in the language. The benefits of 1-based indexing as an option are just too minor to justify much of anything. Like others here, on the rare occasions I want it, I stick a (intended never to be referenced) None at the start of a list or tuple. I’ve never wanted it for other sequence types (string, array.array, memory-mapped file, …).

The heapq.py example left me cold, because over the decades I may have changed every line of the implementation at least once, and never had the slightest problem with adjusting “textbook algorithms” to Python’s 0-based indexing. Because the job it’s doing is so straightforward I didn’t look at textbooks at all (although did study analyses of variants in Knuth’s exercises).

There is one argument in favor of 1-based indexing that hasn’t been made yet: string.find(sub). That returns -1 if sub isn’t found, and that’s a bug magnet. That is, the result is truthy if and only if sub is not a prefix of the string. In a 1-based world, it would return 0 instead, and then the result would be truthy if and only if sub in string, and that’s what people “intuitively expect”. That’s a small wart we live with.
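
A short sketch of that bug magnet, with made-up sample text and a deliberately wrong helper named `buggy_contains` for illustration:

```python
text = "spam and eggs"

at_start = text.find("spam")   # 0: found at the very start, but falsy
later = text.find("eggs")      # 9: found, truthy
missing = text.find("ham")     # -1: not found, yet truthy


def buggy_contains(s: str, sub: str) -> bool:
    """Wrong on purpose: treats the result of find() as a truth value."""
    return bool(s.find(sub))


def contains(s: str, sub: str) -> bool:
    """Correct spelling: compare against -1 (or simply use `sub in s`)."""
    return s.find(sub) != -1
```

Here `buggy_contains(text, "spam")` is False and `buggy_contains(text, "ham")` is True, the exact opposite of what the caller wants in both cases.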

diekhans · November 16, 2024, 10:33pm

Please don’t do this. Python already has too much syntax, and this reuse of {} just makes it harder to learn, pushing it towards the Perl mess.

The use case shows a lack of good data abstraction. I work with one-based and zero-based data all the time. The one-based data gets converted to zero-based on input and back to one-based on output.
If I worked in a one-based language, the abstraction would convert zero-based to one-based internally.
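
A minimal sketch of converting at the boundary, assuming a hypothetical text format that writes 1-based closed ranges as "start-end" (the format and both function names are invented for illustration):

```python
def parse_one_based_closed(text: str) -> range:
    """Parse 'start-end' (1-based, both ends included) into a 0-based range."""
    start, end = (int(part) for part in text.split("-"))
    # A 1-based closed [start, end] covers the same items as 0-based [start-1, end).
    return range(start - 1, end)


def format_one_based_closed(r: range) -> str:
    """Write a 0-based half-open range back out in the 1-based closed format."""
    return f"{r.start + 1}-{r.stop}"


items = ["first", "second", "third", "fourth", "fifth"]

r = parse_one_based_closed("3-5")        # input boundary: convert once
selected = [items[i] for i in r]         # internal code stays 0-based
round_trip = format_one_based_closed(r)  # output boundary: convert back
```

Everything between the two conversion functions sees only one convention.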

It is dangerous not to normalize the data internally to a single convention. I often deal with integrated one-based and zero-based data. If both conventions are used internally, it creates an awful mess.

If one really wants to do one-based indexing, one can create a derived list type. Just don’t put it in the language.
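
One possible shape for such a derived type, as a rough sketch only: the class name `OneBasedList` is made up, and rejecting slices and index 0 is a simplifying choice of this sketch rather than anything proposed in the thread.

```python
class OneBasedList(list):
    """List whose integer subscripts count from 1 (illustrative sketch)."""

    @staticmethod
    def _shift(index):
        if not isinstance(index, int):
            # Slices are left out on purpose to keep the sketch small.
            raise TypeError("this sketch supports integer subscripts only")
        if index == 0:
            raise IndexError("0 is not a valid 1-based index")
        # Positive indices shift down by one; negative ones already
        # count from the right in the same way.
        return index - 1 if index > 0 else index

    def __getitem__(self, index):
        return super().__getitem__(self._shift(index))

    def __setitem__(self, index, value):
        super().__setitem__(self._shift(index), value)


a = OneBasedList(["first", "second", "third", "fourth", "fifth"])
fourth = a[4]      # "fourth"
last = a[-1]       # "fifth"
a[3] = "3rd"       # modifies the third item
```

Inherited methods are untouched, so `a.index("first")` still returns 0, which is the kind of inconsistency raised earlier in the thread about index-returning APIs.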

Melendowski · November 17, 2024, 3:05am

If you want an example from another language that supports multiple kinds of indexing syntax, look no further than MATLAB, which supports (), {} and . as valid methods of indexing depending on the container type: arrays, cell arrays, strings, struct arrays, tables.

It’s a horrible design choice that leads to confusion and inconsistencies.

Even as of a few months ago there are questions like this being asked on their forums.

https://www.mathworks.com/matlabcentral/answers/2152700-is-there-a-meaningful-guideline-on-when-to-use-parentheses-vs-square-brackets-vs-curly-brackets-in-w

https://www.mathworks.com/help/matlab/ref/subsref.html

dg-pb · November 17, 2024, 9:35am

The way I see it, 99% of all the reasons for “There should be one-- and preferably only one --obvious way to do it.” apply to this case.

dg-pb · November 17, 2024, 2:04pm

The issue with custom syntax that I see is that there are many variations and making up new syntax just for one of them doesn’t seem right.

While I appreciate that someone might have a reason to like it, the proposed variation, where a{:0} == a[:-1], is semantically ambiguous at best.

Having said that, I can think of one possibility that I wouldn’t mind too much.

Custom indexing can already be done in pure Python, e.g.:

class indexing:
    obj = None
    def __init__(self, transform):
        self.transform = transform
    def __call__(self, obj):
        self.obj = obj
        return self
    def __getitem__(self, idx):
        return self.obj[self.transform(idx)]

i1 = indexing(lambda idx: idx - 1)

lst = [0, 1, 2]
i1(lst)[1]    # 0

However, there are 2 issues with it:
a) not very convenient
b) slow

To overcome these, something similar could be implemented in CPython. E.g.:

default_index_transforms = {'i1': lambda idx: idx-1}
index_transforms_stack = []

class index_transform:
    def __init__(self, transform):
        if isinstance(transform, str):
            transform = default_index_transforms[transform]
        self.transform = transform
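    def __enter__(self):
        # Push this manager's transform; it applies until the block exits.
        index_transforms_stack.append(self.transform)
    def __exit__(self, exc_type, exc_value, traceback):
        # Restore whichever transform (if any) was active before.
        index_transforms_stack.pop()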

lst = [0, 1, 2]
print(lst[1])           # 1
with index_transform('i1'):
    print(lst[1])       # 0
    with index_transform(lambda idx: idx - 2):
        print(lst[1])   # 2
    print(lst[1])       # 0
print(lst[1])           # 1

So if index_transforms_stack is empty, it would use current indexing, and if it is not, then it would use the last transform.

This way:
a) If this is not used, there would be no changes to syntax / behaviour
b) Some common transforms can be implemented efficiently in CPython
c) Users can play with custom transforms by implementing them in pure python (and propose for addition to CPython or write extension to improve performance)

Of course there are issues:
a) limiting scope, to prevent a case where importing a package appends a transform to the stack but does not remove it

Maybe it could just clear the stack after exiting the current scope and only be allowed inside a function. After all, if this is mostly useful for translating algorithms to code, the scope is limited:

@index_transform('i1')
def bubble_sort(lst):
    # code

Although the above is just my initial thoughts and is far from a PoC, I would guess there exists a path along these lines that could solve this without introducing new syntax.