A command to return the maximum index of a list or array - the better len

PLSeuJ · June 12, 2022, 12:46pm

Hello,

short form: Let’s have something called like lix() that is defined as len()-1, please!
def lix(obj, /): return len(obj) - 1

Motivation: When I started coding in Python a few years ago, I did not understand why len() returns one more than the maximum valid index. Yes, it makes sense from a linguistic point of view, but when coding I was far more often interested in the maximum index of, e.g., MyList. After a few hours of coding, however, I was okay with using len()-1 in those cases. I guess most people get used to this very quickly.
But yesterday I watched “The Worst Programming Language Ever” (Mark Rendle, NDC Oslo 2021) on YouTube, and somewhere in his talk he points out that this inconvenience exists in any programming language, despite the fact that in most cases one wants to know the maximum index of a list or array.
So now that I know I am not alone in this and recall my first thoughts about it, I asked myself: why not improve on this? I will definitely define lix() for myself in my code, and I don’t know a good reason why this would not be an improvement. It is a very lightweight but neat feature, and any newcomer will certainly find it useful too.

Why the name lix()? I thought that combining len, “index” and maybe also “maximum” into lix() would make a memorable command that is easy to spell. However, I would certainly love to see an even better name for this! After all, it is about convenience.

Disclaimer: I am absolutely no expert in Python and don’t know how big a deal it is to add something to the standard library. However, (programming) languages change over time, and this simple change seemed obvious to me when I started coding, until I got used to the status quo.
I also don’t want to say Python is a bad programming language at all. I actually think quite the opposite is true, and I chose Python as my entry into programming because of its very good reputation.

Happy to know what your thoughts are

merwok · June 12, 2022, 4:35pm

Adding a function like lix is not needed because Python sequences support indexing from the end, so the maximum valid index is simply -1.

_builtin.self · June 20, 2022, 12:27pm

What are the use-cases of lix() that can’t be solved by negative indexing? Isn’t my_list[lix(my_list)] the same as my_list[-1] but longer?
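To make the comparison concrete, here is a minimal sketch that uses the lix() definition from the opening post; my_list is just sample data. It also shows one edge case worth keeping in mind: for an empty sequence lix() returns -1, which is not a usable "last index".

```python
def lix(obj, /):
    """Proposed helper from the opening post: the last valid index."""
    return len(obj) - 1


my_list = ["a", "b", "c"]  # sample data

# Both expressions select the last item.
last_via_lix = my_list[lix(my_list)]
last_via_negative = my_list[-1]

# For an empty sequence, lix() returns -1 even though there is no
# valid index at all; [][-1] would raise IndexError.
empty_lix = lix([])
```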

mlgtechuser · July 2, 2022, 10:57am

This may highlight a missed opportunity to define len() as max_index() versus its current definition of count().

In pseudocode…
len() returns the equivalent of ~.count(*) where ‘*’ is a wildcard. Therefore, we could have defined list.count() with no argument as a “count all members” default behavior (and still could). This would free up len() to return a directly indexable result. (Of course, len() is used not only for lists but also for strings, tuples, sets, dictionaries, and arrays, so such a migration would have to be carefully planned and executed.)

Redefining len() would break a lot of code, but count() as a built-in function isn’t defined as of Python 3.10, so it could be put in place as a replacement for len(). After 20 years or so, perhaps len() could be redefined to return the index value without breaking much, perhaps not any, legacy code. At that point users should know not to try to run the old code on anything later than Python 4.9 or something.

Yes, even the modern programmer quickly realizes that a common bug is that your pointer is either one step past your target or one step short of it. Maybe we can evolve to a higher state…

CAM-Gerlach · July 2, 2022, 3:10pm

Taking this post at face value, I’m a little confused. Supposing for a moment that we were going to define a whole new builtin for this, why wouldn’t we simply define a max_index() builtin and leave len() alone? That would solve the posed problem now instead of in 20 years, not require changing the behavior of an existing builtin and breaking existing code, avoid two redundant builtins that do the same thing, and not confuse existing or future users about why len does not return the length of their collection. This would seem to provide a superset of the benefit for a subset of the cost. Have I misunderstood something?

steven.daprano · July 3, 2022, 12:18am

Precision in naming functions is an important skill: functions should be named by what they do and what they mean, not by something kinda sorta nearly but not quite the same.

“Length” implies a contiguous distance, while “count” implies that the items you are counting could be separated from each other:

# Best viewed using a monospaced typeface.
Length:  |<------------- 15 ------------>|
Data:    [ 5 2 4 0 2 9 8 4 9 1 6 3 7 4 8 ]
Count 4:      (1)       (2)         (3)

If you want to count how many of an item is in a data structure, you should call that function count, not “len”, or “length”, or “width”, or “height”, or “size”, or “magnitude”, or any other near-synonym.

It is important to distinguish between len in Python, which is (for builtin classes at least) a near-instantaneous operation that returns the length of the data structure without counting, and an operation which has to inspect each and every item in that data structure and count only the appropriate ones.
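As a small illustration of the difference, here is a sketch using the same data as the diagram above. The variable names are only for this example.

```python
data = [5, 2, 4, 0, 2, 9, 8, 4, 9, 1, 6, 3, 7, 4, 8]

# Length of the whole list: no scan over the items is needed.
length = len(data)      # 15

# Number of items equal to 4: every item has to be compared.
fours = data.count(4)   # 3
```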

count is already a part of the sequence ABC, so all sequence types should have a count method. (If they don’t, that should be reported as a bug.) So in the builtins, tuples, lists, strings, bytes, bytearrays all have a count method.
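A rough sketch of what "part of the sequence ABC" means in practice: a class that derives from collections.abc.Sequence and supplies only __len__ and __getitem__ gets count (and index) as mixin methods. The Digits class is a hypothetical example type, not something from the standard library.

```python
from collections.abc import Sequence


class Digits(Sequence):
    """Hypothetical sequence: the decimal digits of a non-negative int."""

    def __init__(self, number):
        self._digits = [int(ch) for ch in str(number)]

    def __len__(self):
        return len(self._digits)

    def __getitem__(self, index):
        return self._digits[index]


d = Digits(2022)

# Neither method is written in the class above; both come from Sequence.
twos = d.count(2)        # 3
first_zero = d.index(0)  # 1
```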

Dicts and sets don’t support a count method, because it would be pointless: each key or element is either in the dict or set, or it isn’t. It cannot be present more than once.

If you want a fancier count, that can be performed on any iterable:

# Count the number of multiples of 3 or 7.
n = sum( (i % 3 == 0) or (i % 7 == 0) for i in numbers )

# Count the standard English vowels in a word.
n = sum(c.casefold() in set('aeiou') for c in word)
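For anyone who wants to try these two lines, here is a self-contained run with made-up sample values for numbers and word. It works because True counts as 1 and False as 0 when summed.

```python
numbers = range(1, 22)  # sample data: 1 through 21

# Multiples of 3 or 7 in that range: 3 6 7 9 12 14 15 18 21
multiples = sum((i % 3 == 0) or (i % 7 == 0) for i in numbers)  # 9

word = "Education"  # sample word

# casefold() makes the test case-insensitive, so "E" is counted too.
vowels = sum(c.casefold() in set("aeiou") for c in word)  # 5
```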

(Fun fact: in English, the rules for what counts as a vowel are actually much more complicated than just “A E I O U”. “Y” is often a vowel; occasionally, even W and R can be vowels!)

Edit: fixed a couple of small typos. Re-inserted a line deleted by Discuss.

steven.daprano · July 3, 2022, 8:00am

Oh I am getting so sick of obscure Discuss bugs in their email interface. Nearly ready to bail out of here.

In my previous post, I included a code block (indented by four spaces) with four lines. The email I received back from Discuss deleted the third line, making the example nonsense as a consequence.

WTF. Now I have to go onto the website to fix it.

The first line of the block was the comment # Best viewed using a monospaced typeface.

The third line (deleted by Discuss) was Data: [ 5 2 4 0 2 9 8 4 9 1 6 3 7 4 8 ].

mlgtechuser · July 4, 2022, 9:52am

Perhaps only that the idea was hypothetical.

Isn’t this what the OP did? Except he called it lix().

I think he was referring to calling up the maximum valid index as a value, not as the pointer to the last item. Perhaps he could provide a use case. I put a little thought into coming up with one but none of them illustrated anything compelling.

vbrozik · July 4, 2022, 12:39pm

I do not think adding another function/method/property would be useful. Anyway, there is a use case for the last index value from a recent discussion. It is the in-place removal of items from the list a_list based on a condition:

a_last_index = len(a_list) - 1
for reversed_index, item in enumerate(a_list[::-1]):
    if item[-1].startswith('X1:'):
        del a_list[a_last_index - reversed_index]
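One possible variant, shown here only as a sketch with made-up sample rows, walks the indexes themselves from the last one down to 0, so no last-index arithmetic is needed. Deleting an item only shifts the items after it, and those have already been visited.

```python
# Sample data: each item is a list whose last element is a string.
a_list = [
    ["row1", "X1: drop me"],
    ["row2", "Y2: keep me"],
    ["row3", "X1: drop me too"],
]

# reversed(range(n)) yields n-1, n-2, ..., 0.
for index in reversed(range(len(a_list))):
    if a_list[index][-1].startswith("X1:"):
        del a_list[index]
```

After the loop, only the "row2" item is left in a_list.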

The original post, which has some context (unrelated to this discussion):

mlgtechuser · July 4, 2022, 12:45pm

I think we’re missing use cases for a function that produces native indexes rather than having to apply an offset of (-1) to get index values. So here’s one:

for idx in range(len(column_locations)-1):
    new_table.append(row[column_locations[idx]:column_locations[idx+1]].rstrip())

Context: These lines segment the values in a column-aligned dataset (example pasted below) into a list using each column start value up to the next column start (and strip the spaces that align the columns). column_start and column_stop are adjacent pairs in the list column_locations that contains the start positions.

Not only is it tedious to remember, but a subsequent reader also has to stop and deduce why the offset was applied, so readability is improved by a self-evident case of indexing. Intent can be shown by the choice of iteration variable name–using idx or indx does help (‘index’, though, is ~~a keyword~~ used in method names and therefore may or may not be a good choice, depending on your aversion to ambiguity–mine is very high). I didn’t include i because it could also mean item.

Header1  Head2  Header3  Head4
123      456    789      012
345      678    901      234
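To make the slicing concrete, here is a minimal sketch run against the sample lines. The values in column_locations are assumptions read off the sample header, with the line width appended as a final stop so that the last column is included. The outer loop over rows and the name table are hypothetical additions for this example.

```python
rows = [
    "Header1  Head2  Header3  Head4",
    "123      456    789      012",
    "345      678    901      234",
]

# Assumed column start positions, plus the line width as a final stop.
column_locations = [0, 9, 16, 25, 30]

table = []
for row in rows:
    new_table = []
    # One slice per adjacent (start, stop) pair, hence len() - 1 slices.
    for idx in range(len(column_locations) - 1):
        start = column_locations[idx]
        stop = column_locations[idx + 1]
        new_table.append(row[start:stop].rstrip())
    table.append(new_table)
```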

Keys are unique but Dictionary values can be duplicated. Someone might want to iterate the dictionary with count() to find such duplicate values in the key:value pairs. Without a ~.count() method, it takes a few steps. As you point out, counting instances of keys is pointless; that means that a dict.count() could sensibly ignore the keys and address values only.

vbrozik · July 4, 2022, 1:09pm

Note that in this case the natural meaning of len(column_locations)-1 is not the last index of column_locations but the length of column_locations minus one.

range(len(column_locations)) generates all indexes of column_locations.
range(len(column_locations) - 1) generates indexes of column_locations except the last one (like indexes of a list one element shorter).
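A quick check of the two ranges, using assumed sample positions:

```python
column_locations = [0, 9, 16, 25]  # sample start positions

# Every valid index of the list.
all_indexes = list(range(len(column_locations)))           # [0, 1, 2, 3]

# Every index that still has a neighbour at idx + 1.
all_but_last = list(range(len(column_locations) - 1))      # [0, 1, 2]
```

With the second range, column_locations[idx + 1] never goes past the end of the list.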

Here are alternative implementations without need for len() and accessing column_locations items through the [] operator.

for index_start, index_stop in zip(column_locations, column_locations[1:]):
    new_table.append(row[index_start:index_stop].rstrip())

As a bonus this allows us to make it into an iterator accepting column_locations as iterator too:

from itertools import tee, islice

column_locations_iter0, column_locations_iter1 = tee(column_locations)
new_table_iter = (
    row[index_start:index_stop].rstrip()
    for index_start, index_stop
    in zip(column_locations_iter0, islice(column_locations_iter1, 1, None)))

Note: This shows just the possibilities. I have no idea if the iterator form would be beneficial for your case.
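A small sketch of where the tee/islice form can matter: if the column boundaries come from a one-shot iterator such as a generator, slicing with [1:] is not available, but tee still works. The generator find_column_starts and the sample row are hypothetical.

```python
from itertools import islice, tee


def find_column_starts():
    """Hypothetical one-shot source of column boundaries."""
    yield from (0, 9, 16, 25, 30)


row = "123      456    789      012"  # sample row

# tee() gives two independent iterators over the same source;
# islice(..., 1, None) skips the first item of the second one.
iter0, iter1 = tee(find_column_starts())
cells = [
    row[start:stop].rstrip()
    for start, stop in zip(iter0, islice(iter1, 1, None))
]
```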

steven.daprano · July 4, 2022, 2:03pm

What do you mean by “native indexes”?

As for a hypothetical dict.count method, all other dict methods operate on keys, not values. (Or in some cases, both of them together.):

membership testing checks for the key;
iteration iterates over the keys;
subscripting dict[key] operates on keys;
the get and pop methods take a key as argument;

etc. The reason is that only operations on keys are efficient, and dicts are all about efficiency.

It would be odd to have one dict method out of so many that operates on values instead of keys.
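The key-oriented behaviour listed above is easy to check with a small sample dict:

```python
mydict = {"a": 1, "b": 2}  # sample data

key_found = "a" in mydict     # True: membership tests the keys
value_found = 1 in mydict     # False: 1 is a value, not a key
iterated = list(mydict)       # ['a', 'b']: iteration yields the keys
looked_up = mydict.get("b")   # 2: get() takes a key
```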

If you want to operate on the values, you should extract them into a list, or at least an iterable, and work on them:


sum(item == target for item in mydict.values())

list(mydict.values()).count(target)

But really, that’s quite rare. Even list.count is rare. I don’t think I’ve ever used it in real code. It’s more something that seems like it should be useful, but when you try to use it, there’s usually a better way.
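For the specific case of finding duplicated values, one possible approach (a sketch, assuming the values are hashable) is collections.Counter, which tallies all values in a single pass. The sample dict is made up.

```python
from collections import Counter

mydict = {"a": 1, "b": 2, "c": 1, "d": 3, "e": 2}  # sample data

# Tally how often each value occurs across the whole dict.
value_counts = Counter(mydict.values())

# Keep only the values that occur more than once.
duplicates = {value: n for value, n in value_counts.items() if n > 1}
```

Here duplicates ends up as {1: 2, 2: 2}.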

vbrozik · July 4, 2022, 2:22pm

I noticed this only later. index is neither a keyword nor a builtin. I use it all the time.

$ python3
Python 3.10.4 (main, Apr  2 2022, 09:04:19) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keyword
>>> 'index' in (keyword.kwlist + keyword.softkwlist + dir(__builtins__))
False

mlgtechuser · July 4, 2022, 4:01pm

An integer series corresponding to the length of the list that starts at 0, as the index does.

It would be odd to have one dict method out of so many that operates on values instead of keys.

Good point! This would not be a case of “Foolish consistency”.

mlgtechuser · July 4, 2022, 4:09pm

This is what I was referring to:

list.index(x[, start[, end]])
array.index(x[, start[, stop]])

If not “keyword” then what term applies?

I don’t understand. Are you saying that they aren’t numerically equivalent?

Quercus · July 4, 2022, 4:35pm

In each of those two examples, index is the name of a method.

mlgtechuser · July 4, 2022, 4:41pm

That it is, but is the name of a method not a keyword?

vbrozik · July 4, 2022, 4:47pm

To extend the answer by Quercus: methods (and other class/object attributes) are accessible only within the class/object namespace. So there is no conflict of the names.

Answer to your latest post: Identifiers like names of methods and other variables are not keywords.

By natural meaning I meant the first logical meaning of the expression by taking “the shortest path” of reasoning. They are numerically equivalent. In the expressions below I tried to demonstrate my shortest path of reasoning about the meaning of the expression.

BTW, I noticed that the standard library contains itertools.pairwise() (new in Python 3.10), which makes the iterator code much shorter and easier to understand:

from itertools import pairwise

new_table_iter = (
    row[index_start:index_stop].rstrip()
    for index_start, index_stop in pairwise(column_locations))

Quercus · July 4, 2022, 4:49pm

Referring to it as a keyword would imply that it is reserved for a particular purpose, and is therefore unavailable as the name of an object, for example, a function or variable.
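A small sketch of the distinction, using the standard keyword module. A real keyword such as for cannot be assigned to, while index is an ordinary identifier that can be a variable name and a method name at the same time.

```python
import keyword

is_index_keyword = keyword.iskeyword("index")  # False
is_for_keyword = keyword.iskeyword("for")      # True

# A variable named index, assigned from the list method of the same name.
index = [10, 20, 30].index(20)                 # 1

# The method is still reachable; it lives in the list's own namespace.
other = [10, 20, 30].index(30)                 # 2
```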

mlgtechuser · July 4, 2022, 4:54pm

True. I was (mis)using the term a bit too broadly to mean a word already strongly associated with a particular use. I suppose it’s a matter of preference whether to use common method names as variable names. Something to consider…