Currently, min() and max() return only the minimum/maximum value, or in cases where the input is a mapping (like a dictionary), only the key. There are many situations where both the value and the associated key or object are needed. This leads to redundant code where users have to call the function twice or perform extra lookups.
Introduce a return_both keyword argument to min() and max(). When set to True, the function returns a tuple containing both the key (or element) and the value used for comparison.
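A hypothetical sketch of what that could look like (return_both does not exist; the two-step version below is what you have to write today):

prices = {"apple": 3, "banana": 1, "cherry": 2}

# Hypothetical, not implemented: the proposed keyword would return the element
# together with the value its key function produced.
# cheapest, price = min(prices, key=prices.get, return_both=True)   # ('banana', 1)

# Today this takes two steps (or a second lookup):
cheapest = min(prices, key=prices.get)
price = prices[cheapest]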
min returns the minimum of an iterable. By default a dictionary iterates over its keys, and that’s why it picks the min of the keys, not the values. If you want something else, pass in a different iterable like dict.items(), and then use the flexible key argument.
If you use min or max on the items() of a dictionary, you get the key-value pair.
You can also use the key parameter in min or max to compare the values as opposed to the keys:
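For example, with a throwaway dict:

data = {"a": 3, "b": 1, "c": 2}

min(data)                                  # 'a' – smallest key
min(data.items())                          # ('a', 3) – smallest (key, value) pair, compared by key first
min(data.items(), key=lambda kv: kv[1])    # ('b', 1) – pair with the smallest value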
Thanks everyone! I thought I couldn’t be the first person with this problem, considering how general it is. Sorry for the noise.
Thanks for the replies, it’s appreciated!
Edit: Maybe this snippet with a short explanation can be added somewhere in the docstring? What would be a fitting place, if any (considering it’s generalizable to min, max, sorted, and probably more)?
Not really. This allows min_value to be assigned a value other than data[min_key] (however hard you would have to try to mess that up). Calling min on data.items() does not, as there is no opportunity to index data independently.
data.items() returns tuples of true key/value pairs. min_value = data[min_key] relies on the programmer to use the correct variable as the index.
Suppose I decided to rename min_key to smallest_key. Using min(data.items(), ...), I only have to make that change in one place; with min_key = ...; min_value = data[min_key], there are two places to make that change.
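For concreteness (throwaway data, just to show where the edits would land):

data = {"a": 3, "b": 1, "c": 2}

# Two-step version: min_key appears in two places that must stay in sync.
min_key = min(data, key=data.get)
min_value = data[min_key]

# items() version: one expression, nothing to keep in sync.
min_key, min_value = min(data.items(), key=lambda kv: kv[1])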
One way in which this approach is better is that it avoids the performance overhead of creating new objects, whereas data.items() has to create a new tuple for each entry in the dict.
I think it only creates a new tuple object for each entry if the values are strictly decreasing, otherwise it reuses the tuple. But filling the tuple and running the lambda function still costs.
Sorry, but I don’t get what you’re saying here. dict.items creates tuples unconditionally for all dict entries, which min iterates over unconditionally.
The slow bytecode execution of a lambda function is indeed also what slows down the latter code.
The gap closes significantly if it is switched to the C-implemented operator.itemgetter instead:
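Roughly like this, with a stand-in four-element dict:

from operator import itemgetter
from timeit import timeit

data = {'a': 3.1, 'b': 1.2, 'c': 2.3, 'd': 4.4}

def with_lambda():
    return min(data.items(), key=lambda kv: kv[1])

def with_itemgetter():
    return min(data.items(), key=itemgetter(1))

print(timeit(with_lambda, number=100_000))      # Python-level key function
print(timeit(with_itemgetter, number=100_000))  # C-implemented key function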
Oh and we’re talking about per-element costs, so better measure with more than just four elements. For example:
from random import random
from timeit import timeit

data = {random(): random() for _ in range(10000)}
print(timeit(stefan2, globals=globals(), number=1000))   # stefan2 and brass75 are the solutions from the earlier posts
print(timeit(brass75, globals=globals(), number=1000))
Good point. That has to be caused by the overhead of hashing every key when calling dict.get for each dict entry. With dict.items it’s a direct dump of the entries, with no hashing needed.
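A rough sketch that isolates just that difference:

from operator import itemgetter
from random import random
from timeit import timeit

data = {random(): random() for _ in range(10000)}

# min(data, key=data.get) hashes every key again to look up its value...
print(timeit(lambda: min(data, key=data.get), number=100))
# ...while min over data.items() walks the entries directly, with no hashing involved.
print(timeit(lambda: min(data.items(), key=itemgetter(1)), number=100))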
One more thing about “stefan2”: I just walrussed the existing solution. Here’s something actually Stefan-ish:
from operator import indexOf
from itertools import islice

def stefan2():
    values = data.values()
    value = min(values)                    # minimum value first
    index = indexOf(values, value)         # its position in the values view
    key = next(islice(data, index, None))  # key at the same position in the keys view
    return key, value
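Quick sanity check with a tiny throwaway dict (the keys and values views iterate in the same order, which is what makes the indexOf/islice pairing work):

data = {'a': 3, 'b': 1, 'c': 2}
print(stefan2())   # ('b', 1)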
Seems a bit faster on average than the itemgetter solution on my test data (but the indexOf search can make it slower when the values aren’t just floats but objects that check equality more slowly).
That’s quite clever indeed. Whether it’s faster than the itemgetter solution depends on the index where the minimum value is found. If it’s near the start it gets to save a lot of time by short-circuiting. On average your solution is indeed faster when the minimum value occurs in the middle.
Relying on the index of the value to retrieve the key means that it is highly thread-unsafe though, unlike dict.items, which guarantees the correctness of every key-value mapping it yields.
Yes, middle is also what I did when I said “on average”. More elaborate now, with 20 different positions all over the range:
291 ± 1 μs selfmade
371 ± 1 μs value_then_key
486 ± 7 μs items_itemgetter
777 ± 9 μs key_then_value
994 ± 9 μs items_lambda
Python: 3.12.2 (main, Jun 12 2024, 09:13:57) [GCC 14.1.1 20240522]
(selfmade is new, doesn’t use the min function)
Code
from timeit import timeit
from operator import itemgetter, indexOf
from itertools import islice
from random import random, shuffle
from statistics import mean, stdev
import sys
def key_then_value():
    value = data[key := min(data, key=data.get)]
    return key, value

def items_lambda():
    key, value = min(data.items(), key=lambda x: x[1])
    return key, value

def items_itemgetter():
    key, value = min(data.items(), key=itemgetter(1))
    return key, value

def value_then_key():
    values = data.values()
    value = min(values)
    index = indexOf(values, value)
    key = next(islice(data, index, None))
    return key, value

def selfmade():
    # Hand-rolled loop over the items, not using min at all.
    it = iter(data.items())
    key, value = next(it)
    for k, v in it:
        if v < value:
            key = k
            value = v
    return key, value

funcs = [key_then_value, items_lambda, items_itemgetter, value_then_key, selfmade]

data = {random(): random() for _ in range(10000)}
for f in funcs:
    print(f())

# 20 keys spread evenly over the dict; each in turn gets the minimum value.
keys = list(data)[250::500]

def run():
    for key in keys:
        value = data[key]
        data[key] = -1.0   # temporarily make this key hold the minimum
        f()
        data[key] = value  # restore the original value

times = {f: [] for f in funcs}

def stats(f):
    # Mean ± stdev of the five fastest per-call times, in microseconds.
    ts = [t * 1e6 for t in sorted(times[f])[:5]]
    return f'{mean(ts):6.0f} ± {stdev(ts):2.0f} μs '

for _ in range(100):
    shuffle(funcs)
    for f in funcs:
        t = timeit(run, globals=globals(), number=1)
        times[f].append(t / len(keys))

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

print('\nPython:', sys.version)