Feature proposal: mixing 0-based and 1-based indices with `a[i]` and `a{i}`

Hi everyone.

This is a proposal for a new notation enabling to mix 0-based indexing (usual Python and Computer-Science conventions) and 1-based indexing (usual mathematical conventions), at any time, for every indexable object, and without any nasty interference. Just write a[i] as usual for 0-based indexing and a{i} for 1-based indexing.

The hope is to eventually derive a PEP from this proposal. So, comments and advices are welcome.

I wrote a proof-of-concept implementation, available on GitHub, forked repository cpython, branch index01. I’m quite new to git, so please tell me if my git repo is in an inconsistent state.

I am really convinced of these 2 ideas, which I will now develop:

  • we need 1-based indices, and a{i} is the right notation for them
  • we need closed slices (slices including both bounds)

a{i}: the right notation

Usually, languages have a privileged convention, e.g. 0-based indexing for Python, C, Java, Ocaml, and 1-based indexing for Julia. This makes indexing easy when the conventions of the problem match the conventions of the language, but awkward when they don’t.

I don’t think that indexing on a type per type (or class per class) basis, as done for standard types in Pascal or Ada, is convenient. We end up with code duplication or with something like
array[first_index + offset], a complicated version of 0-based indexing.

Writing a{i} along with a[i] seems far better to me:

  • It is very easy to implement: for integer indices, a{i} becomes a[i-1], and for slices with integer bounds, a{i:j:k} becomes a[i-1:j-1:k]. No deep modification of the language and libraries. No incompatibility with existing Python code.

  • There is a very good balance between the notations a[i] and a{i}. The choice of the programmer is not biased by ease of writing or so on, it is only guided by the convention which conceptually best suits the problem to solve: either CS or math, either 0-based or 1-based indices.

  • a{i} is easy to remember as the mathematical convention: everyone is already accustomed to writing an index as a_{i} or M_{i,j} in latex.

  • The a{i} notation is Pythonic (as far as I understand this word): if, for any reason, you have to manipulate 1-based indices, “there should be one-- and preferably only one --obvious way to do it”, this obvious way is to write a{i}.

a{i}: use cases

Prototyping, converting theory into practice

In my humble opinion, Python is a wonderful language for prototyping, because it enables the programmer to smoothly convert a theoretical algorithm and its mathematical formulas into a (perhaps suboptimal but) working piece of code.

In this smooth conversion process, a trap remains, because there are two incompatible conventions to access elements of a list: the usual conventions of Computer-Science languages, i.e. 0-based indexing, and the usual mathematical conventions, i.e. 1-based indexing.

The vast majority of theoretical CS books or papers use 1-based indexing. Have a look at the Python heap queue: The authors explicitly state: “The API below differs from textbook heap algorithms […] We use zero-based indexing. This makes the relationship between the index for a node and the indices for its children slightly less obvious, but is more suitable since Python uses zero-based indexing.” This means that the authors had to convert the textbook algorithm from 1-based indices to 0-based indices, a simple but non-trivial and actually error-prone operation.

It seems very simple to convert by hand a 1-based index i into a 0-based index: just subtract 1, so write a[i-1] instead of a[i]. But it is tedious and error-prone, especially since a[i-1] is not intuitively understood as “the i-th element, 1-based” but as “the element preceding the i-th”,
which is confusing.

Interactive use

Python is great for interactive use, especially inside notebooks (I teach maths and CS, so I often make live demos with sympy for my students). The trouble is, in any array (text, spreadsheet, matrix, line or column vector), the universal convention is to count lines and columns from 1,
so one has to translate 1-based indexing into 0-based Python indexing. It’s a real pain in an interactive use requiring an immediate, on-the-fly, translation.

Consider, for example, a list a with, say, 10 items, and iterate this test: pickup 2 random 1-based indices and type-in on the fly Python code to swap the i-th and j-th element. For the 4-th and 8-th elements, I bet you’ll find it more convenient and more reliable to type a{4}, a{8} = a{8}, a{4} than a[3], a[7] = a[7], a[3] especially if you say out loud “Let us swap the 4-th and the 8-th element” while you are typing or just before.

Negative indices

Notice that Python already uses indices starting from 1, implicitly and without letting the programmer choose: when you write a[-1], a[-2], a[-3], … in general a[-i], you count indices from right to left, starting at index i=1. This introduces a harmful asymmetry between the left-to-right and the right-to-left cases (even if, arguably, the right-to-left case is less important).

With the a{-i} notation, you can write a{-0}, a{-1}, a{-2}, … and the asymmetry disappears. So, the programmer can choose:

  • a[i], 0-based indices from left to right
  • a{i}, 1-based indices from left to right
  • a{-i}, 0-based indices from right to left
  • a[-i], 1-based indices from right to left

Semi-open and closed slices

A standard, 0-based, Python slice is semi-open, because its lower bound is included but its upper bound is excluded. With 1-based indexing, semi-open slices are fine, but we also need closed slices, in which both lower and upper bounds are included.

Closed ranges have been much discussed, and quite recently in this post: PEP 204 - Range Literals: Getting closure [of the debate] (messages #8 and around #30, by @Rosuav, @ntessore, @steven.daprano, @jbo and others). Notice that I’m not asking to reopen PEP 204. I’m only asking for closed slices. Slices are not ranges, but IMO ranges are very easy to derive from slices. As illustrated by the amount of discussions about it, there is a need for closed ranges, so there is a need for closed slices.

I provisionally denote a closed slice with a double colon after the upper bound, i.e. by writing :j:: or i:j:: or i:j::k. This is currently the notation accepted in the proof-of-concept implementation. If I dared to introduce an incompatibility with current Python, I would suggest to simplify :j:: into :j: and i:j:: into i:j:, because in practice :j: and i:j: are superseded by :j and i:j. As usual, any advice or comment about it is welcome.

Use cases for closed slices

  • Math conventions widely use closed integer intervals, i.e. slices without an explicit step, especially from 1 to n inclusive, i.e. 1:n::

  • With negative indices, closed slices enable to reach the end of a list without having to cope with any annoying particular case. For example, a[:-n] usually selects all but the n last elements of a. But this expression does not work with n=0, and we have to write a[:-n or None] to take this particular case into account. With a closed slice we can write a{:-n::} and it works smoothly.

  • With closed slices, we have an opportunity to avoid a confusion between included and excluded upper bound in ranges and loops. Consider for example the sum of inverse squares of the n first positive integers, i.e. from 1 to n inclusive. In Python it is sum(1/i**2 for i in range(1, n+1)). Python avoids the awful trap of C and Java by which 1/(i*i) computes the integer, Euclidean division, i.e. 0 unless i=1. But the upper bound n+1 is a trap, it is very easy to forget +1.

    Suppose now we can write range[slice] for range(start, stop, step). Just use a closed slice and write sum(1/i**2 for i in range[1:n::]). The indication “n included” becomes explicit and obvious. Similarly, we could write random.randrange[i:j] for random.randrange(i, j) and random.randrange[i:j::] for random.randint(i, j).

    I know there are issues because range is not primarily a function but a class. In my mind, this is not the point here. Just rename range e.g. by adding an underscore and write range_[slice] if needed. Or pick-up a completely different name, as seq for sequence. If you wish, have a look at DEMOS/closedslices.py on the GIT repository.

How to compute a closed slice

Unless I make a mistake, here is how a 0-based closed slice start:stop::step should be transformed into an equivalent standard semi-open Python slice. We must be careful, because “up to -1 included” does not mean “up to 0 excluded”.

  • step=None is equivalent to step=1

  • For a positive step and an integer stop, replace stop=-1 included with stop=None, otherwise replace stop included with stop+1 excluded.

  • For a negative step and an integer stop, replace stop=0 included with stop=None, otherwise replace stop included with stop-1 excluded.

As this transformation only works with a 0-based slice, a 1-based closed slice i:j::k has to transformed into the corresponding 0-based closed slice i-1:j-1::k before being processed into a semi-open slice.

Other questions

  • How to mix 0-based indices and 1-based in the same subscript ?

    Often, a[slice1, slice2] is not equivalent to a[slice1][slice2]. So, a[slice1]{slice2} means a[0-based slice1][1-based slice2], but how to denote a[0-based slice1, 1-based slice2] ?

    I suggest writing a[slice1].{slice2}, i.e. the dot between brackets and braces merges both subscripts into a single subscript. Notice that it does not interfere with the object.attribute notation. This suggestion is implemented as part of the proof-of-concept implementation, but any comment or alternate notation is welcome.

  • Should 1-based slicing be configurable depending on the indexed object (e.g. applicable to non-integer data) ? To my mind, yes.

    This suggestion is implemented as part of the proof-of-concept implementation: if a class defines __xkey__(self, key, braced: bool, closed:bool), this method is used to translate extended keys (indices or slices) into standard keys before calling the __getitem__ / __setitem__ / __delitem__ methods.

  • What should we do with methods involving indices ?

    For example, should list.index have a 1-based counterpart list.index1 or an option like list.index(..., base=1) ?

Strong -1 from me. The existence of multiple indexing schemes across languages is enough of a pain without adding the possibility of combining them within a language.

31 Likes

I think this specific proposal should use a new system and gets a -2

20 Likes

Can’t you define a function to subtract 1?

def _(i): # or idx
    return i-1

a[_(i)]

You could also write a function to add 1 to the upper bound of a slice.
Or a function to make the index negative and make 0 sys.maxsize.
You could even create a subclass of list which has the desired semantics.

You don’t need syntax for this, and the names of the functions could be self explanatory.

1 Like

when you write a[~0], a[~1], a[~2], … in general a[~i], you count indices from right to left, starting at index i=0

2 Likes

I’m also -1 on this. It’s fine, it’s a convention that was chosen. This will cause WAY more confusion than it reduces.

This is a cool idea (in an academic sense, I would never use it). Would be cool to see that implementation. Then if someone, like the OP, really wants 1-indexing they can use this class.

2 Likes

I enjoy figuring out ways to accommodate weird desires in Python, but this seems like an even worse idea than 1-indexing syntax.

def get_first(lst):
    return lst[0] if lst else None

A 1-indexing list could never be passed to a function like this, or anything that uses indexing and expects lists to behave like lists. And if code is written to expect 1-indexed lists, you can’t pass them normal Python lists. You would have entirely separate silos of function/list-type compatibility, and one of those lists would be a subclass of the other…

I would hope anybody trying to use such a class would immediately run into errors and abandon the attempt, but if things somehow worked for a while, they’d be racking up technical debt.

1 Like

Setting with 1-based indices is a bit tricker (can the call return a property on self with a getter and setter defined on it?), but just to read from a sequence with 1-based indices, it’s trivial to make a list into a callable:

class OneIndexCallableList(list):
    def __call__(self, index):
        return self[index-1]


cl = OneIndexCallableList(range(1,9))

for i in cl:
    assert i == cl(i) == cl[i-1]
2 Likes

I should note that 1-based indexing is generally not intuitive in computer science, particularly when considering aspects like memory addressing and the binary system. While it aligns with mathematical conventions and is featured in certain math-centric programming languages, it would be inconsistent with Python’s current 0-based indexing and could introduce unnecessary confusion.

1 Like

Zen counter: There should be one—and preferably only one—obvious way to [access an item].

6 Likes

You could do something like

  • The class is a normal list
  • But it has an extra attribute called o or one that shifts the list into 1-indexing mode. So
mylist = ZeroOrOneList(['a', 'b', 'c'])
mylist[0]
# 1
mylist.o[1]
# 1

So [...] gives you zero indexing, .o[...] gives you one indexing. I think such a class could be written and achieve what the OP is asking for, just replacing the OPs syntax of {...} with .o[...].

5 Likes

A long time ago someone thought it’s a good idea to count current year in “Anno Domini” (starting at 1). Then there was a need to name years before the first year (relative, so starting at -1). Now a lot of people think there exist a year 0.

Indexing from 0 is consistent, while indexing from 1 is missleading. Indexing from 0 is always relative, while indexing from 1 is always asymetric, because it works only in one way, so the other is forced to be wierd. Big -1 from me

6 Likes

The potential for a confusion with mixing two types of indices is really great and it was already said quite clearly. Anyway, use-cases for 1-based indexing remain.

Last time I was using it was when MONTH[1] to MONTH[12] represented 'January' to 'December'. Without any doubts is MONTH[m] much better than MONTH[m-1]. However, all it took was MONTH = [None, 'January', .... 'December']. That makes me wonder how many of the use-cases can (or cannot) be solved simply by ignoring the very first item at position zero - or perhaps by making access to it an error.

3 Likes

Very nice notation, the notation I propose, a{-i}, is equivalent in ergonomy but admittedly not better than a[~i]. I remember I have tried to exploit it to have 1-based indices inside brackets. Rationale:

The well-known two’s complement formula: n + ~n = -1, applied to n = -i, gives -i + ~-i = -1, i.e. ~-i = i - 1. So, we can write:

  • a[i], 0-based indices from left to right
  • a[~i], 0-based indices from right to left
  • a[-i], 1-based indices from right to left
  • a[~-i], 1-based indices from left to right

The main use case for indices is from left to right. We could take a[~-i] as a Python idiom to indicate 1-based indices (no confusion with previous element as in a[i-1]). The trouble is, ~- is very easily confused with -~, and -~i = i + 1. So, unfortunately, the a[~-i] notation is very error-prone, and thus unsuitable.

Sorry, I lack time to reply to each post separately, especially since many posts converge towards the same ideas. So, I pick-up one of them and I reply to it for all the others.

What is confusing with my proposal ?

I can’t see how we could be confused by 1-based indexing. Is it a problem with the notation itself ? No one is confused by (x, y, z) vs [x, y, z] vs {x, y, z} i.e. tuples vs lists vs sets. Why should we be confused by a(i) vs a[i] vs a{i} i.e. function call vs [0-based] indexing vs 1-based indexing ?

Can there be any nasty interference between 0-based and 1-based indexing ? Please tell me if you find a concrete example, because I can’t see how. The semantics of 0-based indexing is totally preserved by my proposal, and 1-based indexing is a pure addition: the programmer is free to use it or not, there is neither obligation nor prohibition. In particular, the indexing base is not related to an object or a class, 1-based indexing is available exactly when 0-based indexing is available.

I am not willing to use 1-based indices by pleasure but by necessity.

Obviously, I am implicitly accused of preferring 1-based indexing to 0-based indexing. That’s wrong. I’ve always used 0-based indexing (mainly with C, ML and Python) and I’m most often very happy with it. I share this idea expressed in many posts: 0-based index arithmetic is convenient (and so are semi-open slices).

So, why a proposal for 1-based indices ? Because, when you have to implement a math or theoretical CS algorithm, it is surely expressed using 1-based indices. This convention, while arguably not the best, is several hundred years old and is omnipresent. Converting 1-based indices makes the implementation artificially harder. This use case interests the whole Scientific Python community, IMO it would be a pity to deprive ourselves of 1-based indices.

Thanks fo reading.
Olivier.

I just had a look at Wikipedia’s articles about insertion sort and quicksort, and both use 0-based indices. Is that because they’re not theoretical?

Yeah, that’s why no one wants to introduce this headache into our codebases!

2 Likes

FYI, I don’t recall ever discussing this with Guido, but I always assumed he was following Edsger Dijkstra’s brief analysis in this 1982 note. Which is really (in the context of Python) about what range(lo, hi) should mean. Curiously enough, that indexing “should start” with 0 is a consequence of his view of what range() should mean.

I think Antony Jay is right when he states: “In corporate religions as in others, the heretic must be cast out not because of the probability that he is wrong but because of the possibility that he is right.”

Then again, acerbic Dutchmen and their oracles are way out of style now :wink:.

4 Likes

So a{2:} is equivalent to a{2:0} ?

So a[1:-1] will be equivalent to a{2:0} ? IMO the 0 feels really confusing.

I also use 1-based indexing in high-level abstractions. For simple enums, I would simply ignore index == 0, or for more complex algorithms, I might create a subclass for convenience.

I don’t see this as an issue with the language itself; Python provides all the necessary utilities to implement 1-based indexing or any other indexing system, depending on your use case. For example, when working with time references, you could use a 2001-based indexing system for the current millennium.

class OneBasedList(list):
    def __getitem__(self, index):
        # Adjust for 1-based indexing
        if isinstance(index, int):
            if index < 1 or index > len(self):
                raise IndexError("Index out of range")

            return super().__getitem__(index - 1)

        # Allow slicing with 1-based indexing
        elif isinstance(index, slice):
            start = index.start - 1 if index.start is not None else None
            stop = index.stop if index.stop is None else index.stop - 1

            return OneBasedList(
                super().__getitem__(slice(start, stop, index.step)))

        else:
            raise TypeError("Index must be an integer or a slice")

    def __setitem__(self, index, value):
        # Adjust for 1-based indexing
        if index < 1 or index > len(self):
            raise IndexError("Index out of range")

        super().__setitem__(index - 1, value)


# Example usage
one_based_list = OneBasedList([1, 2, 3, 4, 5])
print(one_based_list[1])  # Outputs 1
print(one_based_list[3])  # Outputs 3

# Update a value using 1-based indexing
one_based_list[2] = 25
print(one_based_list)  # Outputs [1, 25, 3, 4, 5]

# Slice with 1-based indexing
print(one_based_list[2:4])  # Outputs [25, 3]

# Using enumerate with start=1 for 1-based indexing
for index, value in enumerate(one_based_list, start=1):
    print(f"Index {index}: Value {value}")
5 Likes