Add `index: bool` keyword to `min`, `max`, `sorted`

sayandipdutta · August 7, 2024, 8:34pm

It is often useful to calculate the minimum index, maximum index, or sorted indices of an iterable, similar to argmin, argmax, argsort provided by numpy like modules. The current solutions (AFAIK) are quite slow if compared to what it would be if it were builtin. I doubt I will need to convince anyone about the usefulness of argmin, argmax, argsort, surely everyone here has needed such code, which is why I feel this must have been discussed before? If yes, I have not been able to find any reference to such discussion(s).

Current Solutions

Here are the current solutions worth considering:

min(range(len(finite_iterable)), key=finite_iterable.__getitem__)
# iterates twice, only works with sequences, fast
min(enumerate(finite_iterable),  key=itemgetter(1))[0] 
# significantly slower than above, which is slower than `min(finite_iterable)`
finite_iterable.index(min(finite_iterable))
# iterates twice, only valid for sequences

Proposal

The proposal is to add a new keyword-only argument index: bool which, if True, would return something like a IndexedResult/ComparisonResult object, with roughly the following signature:

@dataclass(frozen=True, slots=True)
class Indexed[T, I: (int, list[int])]:
    value: T
    index: I

@overload
def min[T: Comparable, V](
    iterable: Iterable[T], *, default: V, key, index: Literal[True]
) -> Indexed[T | V, int]: ...

@overload
def min[T: Comparable, *Ts=Comparable](
    arg1: T, arg2: T, *args: *Ts, key, index: Literal[True]
) -> Indexed[T, int]: ...

@overload
def sorted[T: Comparable](
    iterable: Iterable[T], *, key, reverse, index: Literal[True]
) -> Indexed[list[T], list[int]]: ...

Even though I haven’t been able to find reference to past discussion(s) on this topic, I am convinced this must have come up. If yes, I will search harder.

jamestwebber · August 7, 2024, 9:07pm

Sayandip Dutta:

min(range(len(finite_iterable)), key=finite_iterable.__getitem__)
# only works with sequences, fast
min(enumerate(finite_iterable),  key=itemgetter(1))[0] 
# significantly slower than above, which is slower than `min(finite_iterable)`

If the iterable is not a sequence, the ordering of the elements is not necessarily defined. It might be defined (as in dict) or it might be reliable due to implementation details, but it seems like it might be introducing a footgun if min etc might assume the input has a reliable order.

In any case, here’s another couple ways to accomplish this, both were faster than your examples on my machine:

# if you have a sequence
finite_iterable.index(min(finite_iterable))
# if you don't but you _really_ want the index for some reason
min_it = min(finite_iterable)
next(i for i,it in enumerate(finite_iterable) if it == min_it)

bschubert · August 7, 2024, 9:19pm

Related past discussion about adding argmin/argmax methods to list:

https://mail.python.org/archives/list/python-ideas@python.org/thread/NDEJQUABSNJSK76UDKHFTFW7YIFXJN5B/#BIT3CJZTC3NM4U2ALCS3JALH5KUPYVBJ

Stefan2 · August 8, 2024, 1:22am

Try operator.indexOf(finite_iterable, min(finite_iterable)).

Wombat · August 8, 2024, 1:51am

>>> def argmin(seq):
...     return min(range(len(seq)), key=seq.__getitem__)
...
>>> argmin([10, 2, 12, 1, 14, 5])
3

Stefan2 · August 8, 2024, 1:54am

What about that? That’s already in the OP.

Stefan2 · August 8, 2024, 1:56am

Is the first solution “quite slow” or is it “fast”? You say both. For argsort, I suspect it really is fast and doubt the parameter would make it much faster.

Stefan2 · August 8, 2024, 2:07am

Although the genexp could use it is min_it instead of it == min_it. Could be faster, if == is expensive.

Stefan2 · August 8, 2024, 2:12am

One more: min(zip(finite_iterable, itertools.count()))[1])

sayandipdutta · August 8, 2024, 3:14am

I guess you mean things like set? So when you say unreliable ordering, what exactly do you mean? Sure, you can’t expect a set to have elements in the same order as you defined. But once defined, whatever the internal order is, remains unchanged, unless the set is modified. In which case even plain min would be unreliable.
Am I misunderstanding something?

I have this one in my post. Yes, I see how the other one would be faster.

sayandipdutta · August 8, 2024, 3:32am

It is fast compared to other two. Could be faster if len is known in advance. I expect your indexOf solution to be fast as well. But doesn’t work for non-sequence iterable.

As far as I am concerned, for unordered sequences, min should work like this:

s = random_set
l = list(s)

assert min(s, index=True) == min(l, index=True)

If for some other reason, it is still not acceptable, then simply require the iterable to be hashable as well. But I don’t see why that would be required.

Stefan2 · August 8, 2024, 3:45am

Hmm? What makes you think it doesn’t?

Rosuav · August 8, 2024, 3:56am

Is that guaranteed anywhere? It always was for dictionaries, but not for sets.

jamestwebber · August 8, 2024, 4:03am

Via duck typing, lots of objects can act as an iterable, by implementing the necessary interface. They aren’t required to support indexing, and they aren’t required to return in a consistent order. Consider for instance a collection of asynchronous listeners–perhaps iterating over them returns based on time order, as they receive the signals they are listening for.

Using index wouldn’t break in that instance, but it might lead to confusing bugs if the user wasn’t careful. Basically, I’m just saying that when functions support generic iterables, they shouldn’t have options that only work for some iterables.

Given that this particular option is trivial to do in Python anyway, I think that confusing behavior isn’t worth it.

sayandipdutta · August 8, 2024, 4:04am

IndexOf calls ‘sequence.index’ under the hood doesn’t it? And surely, for iterators, the first min will consume it.

Maybe I am wrong, but the guarantee for dicts is it maintains the order oit was created in. And set is not guaranteed. But I expect once data is stored in a particular order, it doesn’t change for readonly operations. Is this not true, or guaranteed? Is there any possible way this could not ne true?

This essentially defeats the purpose. If the notion of index is troublesome for set like things, better call it position.

jamestwebber · August 8, 2024, 4:06am

One more thought–in general, I think it’d be more interesting to understand why the pure Python version isn’t fast enough, and fix that problem. Adding special options to speed up particular niche use-cases seems like a recipe for bloat.

sayandipdutta · August 8, 2024, 4:09am

I see your points, regarding asynchronous iterables. Although I guess the user already has a footgun in their hand if they don’t know what they are using, without need min with index. But point taken.

And yes, if the other options for non-sequence iterables were competitive enough, I would be glad to use them.

The problem in general seems like either there are two iterations involved, or the loop is run in python and not c.

Stefan2 · August 8, 2024, 4:12am

No, it doesn’t.

sayandipdutta · August 8, 2024, 4:20am

Sorry, I misspoke. I meant to say, I understand it requires a sequence as first argument. But I stand corrected. It does support non-sequence iterable. However, it would only work if min value is known in advance without consuming the iterable.

Rosuav · August 8, 2024, 4:23am

The guarantee currently is that it maintains insertion order. The old guarantee was that, if you called keys() and values() without mutating the dictionary in between, they would be in the same order - which effectively mandates that you can call keys() and keys() and they’ll be in the same order.

It all depends on how the data is stored internally. When the iteration order isn’t guaranteed, it’s allowed to change at any time, for any reason. For example, a Python implementation could theoretically use a splay tree for its set implementation; and if the iteration order is “start at the root, breadth-first search”, your effective iteration order will broadly correspond to how recently each item was accessed.