Add `index: bool` keyword to `min`, `max`, `sorted`

Hmm. I see, and I agree on the points about order. But in the cases where a min index is useful, I don’t think these sorts of issues are relevant. So the main concern boils down to footguns?

Also, all the concerns about min, max, sorted with index apply equally to enumerate. On the flip side, the performance and usability benefits (when not used on fringe data structures) would be tremendous if index were included.

Hard to say. You’re asking about a highly theoretical situation here; it doesn’t really make sense to ask about the position within a non-sequence iterable, and for a sequence, there are other solutions available (the two-pass search, for example).


Yes, it does make more sense for sequences. But at the same time, I can easily see it being useful for iterators.

Don’t get me wrong, I am not trying to be headstrong here. I just want to understand the main concern against this feature. I will try to summarize the discussion thus far.

Argument: For arbitrary iterables, iteration cannot be guaranteed to be of stable order.

I (and everyone here, for that matter) say that in such cases asking for an index does not make sense, and that is precisely why I feel people won’t use it in those situations. It can be a potential footgun for those who are inexperienced in programming in general, but that is true for enumerate as well, and for many other concepts (e.g. least astonishment regarding mutability). This would simply have to be a learning opportunity.
Side note: numpy.argmin works for sets; sure, it first converts them into an array. I argue it doesn’t make sense there either, but numpy allows it because this is the user’s responsibility.

Argument: There are other ways available.

Sure. And it is not as if I will stop using Python for the lack of this feature; the alternatives are serviceable if push comes to shove. But with the other solutions, either you need two passes, which is slow compared to a single pass in C and doesn’t work for iterators, or you loop at Python speed (which, again, could be its own topic of discussion). My point is: whenever you iterate over something, there is a “natural order”, and I just want min, max, sorted to keep track of it; that is the principle on which enumerate works. Does enumerate bring much to the table? Would zip(count(), iterable) not be enough? It just makes things very clean and intuitive. And the performance benefit of a min index would be higher than that of enumerate.
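
To make the comparison concrete, here is a minimal sketch of the workarounds I mean (using a toy list for illustration):

import operator as opr

a = [10, 2, 12, 1, 14, 0]

# Two passes: find the min, then search for its position.
# Each pass runs at C speed, but this needs a sequence, not an iterator.
i = a.index(min(a))                               # i == 5

# Single pass over enumerate: works for any iterable,
# but pays for a key-function call and a tuple per item.
i, _ = min(enumerate(a), key=opr.itemgetter(1))   # i == 5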

Argument: Only makes sense for Sequence.

*Mostly* makes sense for sequences. I can imagine a scenario where you have an iterator from which you only need, say, the argmin, which you will then use to index into other sequences. Getting the index from min lets us save space. Although an index keyword that works only for Sequences, or some sequence method that returns an index based on a key func, would be useful enough, I think baking this into min would be more general, and IMO not unreasonable.

If the argument is that the performance penalty of the workarounds is insignificant if your chosen language is Python, then I don’t have any further points to make.

Again, I am not trying to push the feature; I just want a justification (a strong word?) for why I should use the other (comparatively) slower workarounds, or rely on third-party libraries, when an inherent order is available at the time of performing the main action (say, finding the min) and is essentially free.


Do you really feel this is a niche use case? On this being bloat, I have to fall back on enumerate here: I think enumerate is a universally loved feature even though it is simply zip(count(start), iterable).
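
To be explicit about the equivalence I mean:

from itertools import count

list(enumerate("abc", start=0))   # [(0, 'a'), (1, 'b'), (2, 'c')]
list(zip(count(0), "abc"))        # the same pairs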

OT: Although I will say that if there were special syntax for capturing the index of an iterable in a for loop, e.g. for |i|, item in ..., that would be even better, because I constantly forget to add enumerate after writing for i, item in iterable.

For me personally, at least: I rarely need the index of the min/max outside of a situation where I’m already using numpy, and if I do, the performance of the pure-Python solution is totally fine.

The existence of enumerate[1] and other convenience functions isn’t a great justification for adding something else. enumerate was added in 2.3; Python is in a different place now and decisions are made based on that.

It’s not a totally out-there idea, but I just don’t think it’s worth it. But I’m just someone on the internet, so maybe others will disagree.


  1. which I use far more


I think it would really help this proposal if you could find many examples in real code (for example, the Python or numpy codebases). enumerate is beloved because it’s extremely useful (in my code, more often than once per thousand lines). I couldn’t find any uses of wanting the index in my code.

To me, the other arguments against your proposal aren’t that convincing.


True, but they both do a very similar thing. Their performance is very close.

So the proposal is regarding efficiency, but I have not seen any benchmarks yet. How much can be gained?

import operator as opr

a = [10, 2, 12, 1, 14] * 20_000 + [0]
%timeit min(a)                                  # 1.5 ms
%timeit a.index(min(a))                         # 2.4 ms
%timeit opr.indexOf(a, min(a))                  # 2.6 ms

So the improvement is at most about 40%.

When I write code I often unify dict and list inputs in the following manner:

def foo(arg):
    if isinstance(arg, dict):
        arg = arg.items()       # (key, value) pairs
    else:
        arg = enumerate(arg)    # (index, value) pairs

So that both inputs follow the same structure, i.e. pairs of (key/index, value). I don’t think there is a need to break this and make exceptions, but rather to build on it:

My proposal would be 2 changes:
a) Function calls are expensive in Python, so calling min/max/sorted on the structures above is very inefficient.

import operator as opr
from collections import deque
consume = deque(maxlen=0).extend
a = [10, 2, 12, 1, 14] * 20_000 + [0]
%timeit consume(enumerate(a))                     # 2.5 ms
%timeit min(enumerate(a), key=opr.itemgetter(1))  # 7.1 ms

This could be optimised by detecting an opr.itemgetter key in the C code and, instead of calling it every time, doing a straightforward item access based on its internal index whenever the itemgetter holds a single index.

Alternatively, we could allow min(key: int) to serve as the same signal.

b) Detect whether the input object is wrapped in enumerate and, if the itemgetter from a) points to index 1, do the counting internally while iterating only over the object wrapped by enumerate.

These two changes should theoretically bring it close to optimal performance (that of plain min).
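
In pure-Python terms, the fast path would compute something like the sketch below (just a model of the semantics, with a made-up function name; the actual change would live in the C implementation of min/max):

def argmin_fast_path(iterable):
    # Model of what min(enumerate(iterable), key=opr.itemgetter(1)) returns,
    # written as one counting loop; the real gain would come from running
    # this loop in C without per-item tuples or key-function calls.
    it = iter(iterable)
    best = next(it)               # empty input is not handled in this sketch
    best_i = 0
    for i, value in enumerate(it, 1):
        if value < best:
            best_i, best = i, value
    return best_i, best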

“a)” (itemgetter) would also improve performance in other cases where the structure is already in the right format for the problem, such as dict.items().
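
For example, finding the key of the smallest value already has exactly that shape:

import operator as opr

prices = {"apples": 3, "pears": 1, "plums": 2}
min(prices.items(), key=opr.itemgetter(1))    # ('pears', 1)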

argsort would only benefit from “a)”, because the full integer sequence needs to be generated regardless, and I don’t think there is much that can be done here. Although the user could pre-define ranges for critical problems so that the integers don’t need to be generated via range / enumerate / itertools.count every time.
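
For reference, the pure-Python argsort idiom I have in mind is:

a = [10, 2, 12, 1, 14, 0]

# indices that would sort a; the index sequence is materialised either way
order = sorted(range(len(a)), key=a.__getitem__)   # [5, 3, 1, 0, 2, 4]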

A quick use-case search: how much existing code would be improved by “a)” if it were done for itemgetter?

CPython

  • rg 'sorted\(.*itemgetter.*\)' | wc -l - 13
  • sort - 1
  • min - 0
  • max - 0

Github

  • /sorted\(.*itemgetter.*\)/ Language:Python - 78.8k files
  • sort - 24.6K files
  • min - 4.7K files
  • max - 11.7K files

And cases that use both “a)” and “b)”, which would result in a 70% runtime decrease:

Github:

  • /min\(.*enumerate.*itemgetter.*\)/ Language:Python - 1.2K files
  • max - 1.7K files

But I think these numbers are very conservative, as the total number of cases would be much higher if people used this approach consistently. There are a lot of missed cases where:
a) people pre-defined the enumerate beforehand,
b) used a lambda, or
c) used alternative methods to find argmin/argmax.

If this is important enough for someone who is willing to make the effort, this would be a no-API-change optimization, given that the core devs approve, of course.

I haven’t seen the suggestion come up in the thread: how does a “trailing enumeration” perform relative to the other options?

That is, code like: minitem, minidx = min((item, idx) for idx, item in enumerate(iterable))

A similar one was already suggested by @Stefan2.

It is actually the worst. I guess it is because it has to compare tuples. Here is a full list of timings (with some extra reference points):

import collections as coll
import itertools as itl
import operator as opr
consume = coll.deque(maxlen=0).extend
a = [10, 2, 12, 1, 14] * 20_000 + [0]
IG0 = opr.itemgetter(0)
IG1 = opr.itemgetter(1)
%timeit min(a)                                  #  1.5 ms
%timeit consume(enumerate(a))                   #  2.5 ms
%timeit a.index(min(a))                         #  2.4 ms
%timeit opr.indexOf(a, min(a))                  #  2.6 ms
%timeit min(range(len(a)), key=a.__getitem__)   #  5.0 ms
%timeit consume(enumerate(a))                   #  2.5 ms
%timeit min(enumerate(a), key=IG1)              #  7.1 ms
%timeit min(zip(a, itl.count()), key=IG0)       #  7.4 ms
%timeit min(zip(a, itl.count()))[1]             #  8.5 ms
%timeit min((v, i) for i, v in enumerate(a))    # 13.0 ms

I think improving min(iterable_of_tuples, key=itemgetter) would be great, and this should be the idiomatic way to calculate argmin/argmax/argsort. This would be more general than min(index: bool).
I think people expect one pass to be faster than two passes. The most idiomatic way should support that expectation.

I also like the idea of key: int as a shorthand for itemgetter.
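
To spell out what that shorthand would mean (the key=1 form is only the proposal; it is not valid today):

import operator as opr

pairs = [(0, 10), (1, 2), (2, 12)]

min(pairs, key=opr.itemgetter(1))   # (1, 2) -- works today
# min(pairs, key=1)                 # proposed shorthand with the same meaning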

Could the new JIT help in these cases?

I don’t think so.

The key for performance would be to unwrap the enumerate object.

I have just tested it, and it would literally make argmin performance the same as min:

Before:

$PYMAIN -m timeit -s 'import operator as opr; a = list(range(5)) * 20_000 + [0]' 'min(enumerate(a), key=opr.itemgetter(1))'
50 loops, best of 5: 7.95 msec per loop

After:

$PYMAIN -m timeit -s 'a = list(range(5)) * 20_000 + [0]' 'min(enumerate(a), key=1)'
200 loops, best of 5: 1.36 msec per loop
$PYMAIN -m timeit -s 'a = list(range(5)) * 20_000 + [0]' 'min(a)'
200 loops, best of 5: 1.34 msec per loop

Allowing an int key is a prerequisite for this, otherwise there is no way for min to know what to do. On its own it offers some performance gain as well:

$PYMAIN -m timeit -s 'import operator as opr; a = list(enumerate(list(range(5)) * 20_000 + [0]))' 'min(a, key=opr.itemgetter(1))'
100 loops, best of 5: 2.56 msec per loop

$PYMAIN -m timeit -s 'a = list(enumerate(list(range(5)) * 20_000 + [0]))' 'min(a, key=1)'
200 loops, best of 5: 1.74 msec per loop

Finally, this would end my quest for efficient item counting (see Challenge: Quest for counting iterator length at C speed):

$PYMAIN -m timeit -s 'a = list(range(5)) * 20_000 + [0]' 'max(enumerate(a), key=0)[0] + 1'
500 loops, best of 5: 445 usec per loop

FYI: implementing argmax in Python - Stack Overflow

Also, I think all the speed in the world doesn’t really matter if this rarely comes up.


I opened an improvement suggestion on GitHub: Optimization of argmin/argmax/argsort · Issue #122863 · python/cpython · GitHub

I don’t get the impression from this thread that this is at all worth doing. Special-casing on the specific usage is very rarely going to help.

To clarify what I meant when I posted above, I wasn’t saying “it’d be useful to add lots of special cases to the interpreter”.

I was referring to the broader efforts and explorations at speeding up the interpreter generally, via strategies like the adaptive interpreter, tagged pointers, (someday) the JIT, etc. Those projects can speed up everything without needing to optimize every single use case.


Agreeing with James: special-casing is going in the wrong direction in terms of performance, because it typically slows down the other cases (there’s an extra check in the way beforehand).

Better performance in pure-Python is nice, but if you really want more speed, I would just use Cython or write a C extension.
