Currently, min() and max() return only the minimum/maximum value, or in cases where the input is a mapping (like a dictionary), only the key. There are many situations where both the value and the associated key or object are needed. This leads to redundant code where users have to call the function twice or perform extra lookups.
Introduce a return_both keyword argument to min() and max(). When set to True, the function returns a tuple containing both the key (or element) and the value used for comparison.
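A hypothetical sketch of what that could look like (return_both does not exist; the two-step version below is what you have to write today):

prices = {"apple": 3, "banana": 1, "cherry": 2}

# Hypothetical, not implemented: the proposed keyword would return the element
# together with the value its key function produced.
# cheapest, price = min(prices, key=prices.get, return_both=True)   # ('banana', 1)

# Today this takes two steps (or a second lookup):
cheapest = min(prices, key=prices.get)
price = prices[cheapest]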
min returns the minimum of an iterable. By default a dictionary iterates over its keys, and that’s why it picks the min of the keys, not the values. If you want something else, pass in a different iterable like dict.items(), and then use the flexible key argument.
If you use min or max on the items() of a dictionary, you get the key-value pair.
You can also use the key parameter in min or max to compare the values as opposed to the keys:
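For example, with a throwaway dict:

data = {"a": 3, "b": 1, "c": 2}

min(data)                                  # 'a' – smallest key
min(data.items())                          # ('a', 3) – smallest (key, value) pair, compared by key first
min(data.items(), key=lambda kv: kv[1])    # ('b', 1) – pair with the smallest value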
Thanks everyone! I thought I couldn’t be the first person with this problem, considering how general it is. Sorry for the noise.
Thanks for the replies, it’s appreciated!
Edit: Maybe this snippet with a short explanation can be added somewhere in the docstring? What would be a fitting place, if any (considering it’s generalizable to min, max, sorted, and probably more)?
Not really. This allows min_value to be assigned a value other than data[min_key] (however hard you would have to try to mess that up). Calling min on data.items() does not, as there is no opportunity to index data independently.
data.items() returns tuples of true key/value pairs. min_value = data[min_key] relies on the programmer to use the correct variable as the index.
Suppose I decided to rename min_key to smallest_key. Using min(data.items(), ...), I only have to make that change in one place; with min_key = ...; min_value = data[min_key], there are two places to make that change.
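For concreteness (throwaway data, just to show where the edits would land):

data = {"a": 3, "b": 1, "c": 2}

# Two-step version: min_key appears in two places that must stay in sync.
min_key = min(data, key=data.get)
min_value = data[min_key]

# items() version: one expression, nothing to keep in sync.
min_key, min_value = min(data.items(), key=lambda kv: kv[1])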
One way in which this approach is better is that it avoids the performance overhead of creating new objects, whereas data.items() has to create a new tuple for each entry in the dict.
I think it only creates a new tuple object for each entry if the values are strictly decreasing, otherwise it reuses the tuple. But filling the tuple and running the lambda function still costs.
Sorry, but I don’t get what you’re saying here. dict.items creates tuples unconditionally for all dict entries, which min iterates over unconditionally.
The slow bytecode execution of a lambda function is indeed also what slows down the latter code.
The gap closes significantly if it is switched to the C-implemented operator.itemgetter instead:
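Roughly like this, with a stand-in four-element dict:

from operator import itemgetter
from timeit import timeit

data = {'a': 3.1, 'b': 1.2, 'c': 2.3, 'd': 4.4}

def with_lambda():
    return min(data.items(), key=lambda kv: kv[1])

def with_itemgetter():
    return min(data.items(), key=itemgetter(1))

print(timeit(with_lambda, number=100_000))      # Python-level key function
print(timeit(with_itemgetter, number=100_000))  # C-implemented key function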
Oh and we’re talking about per-element costs, so better measure with more than just four elements. For example:
from random import random
from timeit import timeit

data = {random(): random() for _ in range(10000)}
print(timeit(stefan2, globals=globals(), number=1000))   # stefan2 and brass75 are the solutions from the earlier posts
print(timeit(brass75, globals=globals(), number=1000))
Good point. That has to be caused by the overhead of hashing every key when calling dict.get for each dict entry. With dict.items it’s a direct dump of the entries, with no hashing needed.
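A rough sketch that isolates just that difference:

from operator import itemgetter
from random import random
from timeit import timeit

data = {random(): random() for _ in range(10000)}

# min(data, key=data.get) hashes every key again to look up its value...
print(timeit(lambda: min(data, key=data.get), number=100))
# ...while min over data.items() walks the entries directly, with no hashing involved.
print(timeit(lambda: min(data.items(), key=itemgetter(1)), number=100))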
One more thing about “stefan2”: I just walrussed the existing solution. Here’s something actually Stefan-ish:
from operator import indexOf
from itertools import islice

def stefan2():
    values = data.values()
    value = min(values)                    # minimum value first
    index = indexOf(values, value)         # its position in the values view
    key = next(islice(data, index, None))  # key at the same position in the keys view
    return key, value
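Quick sanity check with a tiny throwaway dict (the keys and values views iterate in the same order, which is what makes the indexOf/islice pairing work):

data = {'a': 3, 'b': 1, 'c': 2}
print(stefan2())   # ('b', 1)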
Seems a bit faster on average than the itemgetter solution on my test data (but the indexOf search can make it slower when the values aren’t just floats but objects that check equality more slowly).
That’s quite clever indeed. Whether it’s faster than the itemgetter solution depends on the index where the minimum value is found. If it’s near the start it gets to save a lot of time by short-circuiting. On average your solution is indeed faster when the minimum value occurs in the middle.
Relying on the index of the value to retrieve the key means that it is highly thread-unsafe though, unlike dict.items, which guarantees the correctness of every key-value mapping it yields.
Yes, middle is also what I did when I said “on average”. More elaborate now, with 20 different positions all over the range:
291 ± 1 μs selfmade
371 ± 1 μs value_then_key
486 ± 7 μs items_itemgetter
777 ± 9 μs key_then_value
994 ± 9 μs items_lambda
Python: 3.12.2 (main, Jun 12 2024, 09:13:57) [GCC 14.1.1 20240522]
(selfmade is new, doesn’t use the min function)
Code
from timeit import timeit
from operator import itemgetter, indexOf
from itertools import islice
from random import random, shuffle
from statistics import mean, stdev
import sys
def key_then_value():
    value = data[key := min(data, key=data.get)]
    return key, value

def items_lambda():
    key, value = min(data.items(), key=lambda x: x[1])
    return key, value

def items_itemgetter():
    key, value = min(data.items(), key=itemgetter(1))
    return key, value

def value_then_key():
    values = data.values()
    value = min(values)
    index = indexOf(values, value)
    key = next(islice(data, index, None))
    return key, value

def selfmade():
    # Hand-rolled loop over the items, not using min at all.
    it = iter(data.items())
    key, value = next(it)
    for k, v in it:
        if v < value:
            key = k
            value = v
    return key, value

funcs = [key_then_value, items_lambda, items_itemgetter, value_then_key, selfmade]

data = {random(): random() for _ in range(10000)}
for f in funcs:
    print(f())

# 20 keys spread evenly over the dict; each in turn gets the minimum value.
keys = list(data)[250::500]

def run():
    for key in keys:
        value = data[key]
        data[key] = -1.0   # temporarily make this key hold the minimum
        f()
        data[key] = value  # restore the original value

times = {f: [] for f in funcs}

def stats(f):
    # Mean ± stdev of the five fastest per-call times, in microseconds.
    ts = [t * 1e6 for t in sorted(times[f])[:5]]
    return f'{mean(ts):6.0f} ± {stdev(ts):2.0f} μs '

for _ in range(100):
    shuffle(funcs)
    for f in funcs:
        t = timeit(run, globals=globals(), number=1)
        times[f].append(t / len(keys))

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

print('\nPython:', sys.version)