# Minmax function alongside min and max

Ruud van der Ham suggested:

```python
if key is None:
    key = lambda x: x
```

Using an identity function `lambda x: x` is needlessly slow. At least
that’s what I found when I benchmarked it a few years ago, and I don’t
expect that the situation has changed enough for me to bother
benchmarking it again.
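A quick way to see the overhead for yourself, sketched with `timeit` (the absolute numbers are machine-dependent; only the ratio matters):

```python
# Compare min() with and without an identity key function.
# Calling a Python-level lambda per element adds function-call
# overhead on top of the C-level comparison.
import timeit

data = list(range(1000))

t_plain = timeit.timeit(lambda: min(data), number=200)
t_ident = timeit.timeit(lambda: min(data, key=lambda x: x), number=200)

print(f"plain min:    {t_plain:.4f}s")
print(f"identity key: {t_ident:.4f}s")
```

On CPython the identity-key version is reliably several times slower, which is why skipping the key entirely when it is `None` pays off.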

I have to step away from the keyboard now, but when I return I’ll post
the version of minmax I’ve been using.

Steven, see my solution above.

I think we can just return `default` as-is; it should be a tuple of the default minimum and maximum.

```python
_sentinel = object()

def minmax(*args, key=None, default=_sentinel):
    args_len = len(args)

    if args_len == 0:
        if default is _sentinel:
            raise ValueError("minmax expected 1 or more argument(s), got 0")
        return default
    elif args_len == 1:
        seq = args[0]
    else:
        seq = args

    it = iter(seq)

    vmax = vmin = next(it, _sentinel)
    if vmax is _sentinel:
        if default is _sentinel:
            raise ValueError("minmax arg is an empty sequence")
        return default

    if key is None:
        for val in it:
            if val > vmax:
                vmax = val
            elif val < vmin:
                vmin = val
        return vmin, vmax

    fmax = fmin = key(vmax)

    for val in it:
        fval = key(val)

        if fval > fmax:
            fmax = fval
            vmax = val
        elif fval < fmin:
            fmin = fval
            vmin = val

    return vmin, vmax
```

Then we can say:

```python
minimum, maximum = minmax(seq, default=(math.inf, -math.inf))
```

EDIT Fixed an error in the code.

…naaaaaaaaaaaaaa! I don’t like it

Anyway, why do you need that strange `_sentinel` instead of a simple `None`? It’s good for the `next()`, but I don’t understand why it’s also used for `default`.

PS: `max(default=5)` raises an error, it does not return the default value.
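For reference, here is how the built-in behaves: `default` only kicks in when the iterable is empty, while calling `max()` with no positional arguments at all is an error regardless of `default`.

```python
# Behaviour of built-in max() with the `default` keyword (Python 3.4+).
assert max([], default=5) == 5      # empty iterable -> default is returned
assert max([3, 7], default=5) == 7  # non-empty iterable -> default ignored

try:
    max(default=5)  # no iterable at all -> TypeError, not the default
except TypeError as exc:
    print("TypeError:", exc)
```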

Because one might want `None` as the default. This is the same behaviour as `min` and `max`.

Thank you for pointing that out. I have fixed it.

Serhiy Storchaka wrote:

“I would want to see several examples. I have not found any use of this
pattern in the stdlib.”

Do you accept that people sometimes want the minimum or the maximum of
a data set? If your answer is “Yes”, then it shouldn’t be hard to
believe that sometimes people might want the minimum AND the maximum.

Essentially any time you want to know both the min and max, using
separate function calls is wasteful.

(But not as wasteful as one solution I’ve seen, which is to sort the
data then return the first and last item.)

Calculating the min and max in a single pass, instead of two passes, may
be especially useful whenever:

• the cost of comparisons is high: a single pass can calculate both
values with approximately 25% fewer comparisons;

• the cost of iteration is significant (making two passes is twice as
expensive as making one);

• or the first pass exhausts the iterator, leaving nothing for the
second pass.

Some concrete use cases:

• the range of a data set
• the midrange
• figuring out line lengths for wrapping
• using a dict as a sparse array
• dealing with time intervals
• plus additional examples of working code found by Ava.
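For instance, the range and midrange both fall out of a single min/max computation. A minimal sketch (using two built-in calls for brevity; a `minmax()` helper would compute both in one pass over the data):

```python
# Range and midrange of a data set from its minimum and maximum.
data = [4.5, 2.0, 9.25, 7.0, 3.5]

lo, hi = min(data), max(data)
spread = hi - lo          # the range of the data
midrange = (lo + hi) / 2  # midpoint between the extremes

print(spread, midrange)   # 7.25 5.625
```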

In addition, minmax is provided by the C++ Boost library:

https://www.boost.org/doc/libs/1_51_0/libs/algorithm/minmax/

Eight years ago, Raymond Hettinger posted a recipe:

http://code.activestate.com/recipes/577916-fast-minmax-function/

LINQ has `Aggregate`, which can be used to calculate the minimum and
maximum:

although I’m unsure whether that works in a single pass or two passes. I
believe it is a single pass, but I may be mistaken.

Here are examples of people asking how to do this using Spark and numpy:


Here is my implementation using @rhettinger 's algorithm, which uses on average 1.5 comparisons per item. This version supports the keyword parameters `key` and `default`.

```python
import itertools

_sentinel = object()

def minmax(*args, key=None, default=_sentinel):
    args_len = len(args)

    if args_len == 0:
        if default is _sentinel:
            raise ValueError("minmax expected 1 or more argument(s), got 0")
        return default
    elif args_len == 1:
        seq = args[0]
    else:
        seq = args

    it = iter(seq)

    vmax = vmin = next(it, _sentinel)
    if vmax is _sentinel:
        if default is _sentinel:
            raise ValueError("minmax arg is an empty sequence")
        return default

    if key is None:
        for val0, val1 in itertools.zip_longest(it, it, fillvalue=vmin):
            if val0 > val1:
                val0, val1 = val1, val0
            if val0 < vmin:
                vmin = val0
            if val1 > vmax:
                vmax = val1
        return vmin, vmax

    fmax = fmin = key(vmax)

    for val0, val1 in itertools.zip_longest(it, it, fillvalue=vmin):
        fval0 = key(val0)
        fval1 = key(val1)
        if fval0 > fval1:
            val0, val1 = val1, val0
            fval0, fval1 = fval1, fval0
        if fval0 < fmin:
            fmin = fval0
            vmin = val0
        if fval1 > fmax:
            fmax = fval1
            vmax = val1

    return vmin, vmax
```
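The core idea behind the ~1.5 comparisons per item can be shown without `itertools`: order each consecutive pair with one comparison, then check only the smaller against the running minimum and the larger against the running maximum, i.e. 3 comparisons per 2 items. A self-contained sketch (the helper name `minmax_pairs` is mine; no `key`/`default` handling):

```python
def minmax_pairs(iterable):
    """Single-pass min and max using ~1.5 comparisons per item.

    Sketch of the pairwise technique only: no key= or default= support.
    """
    it = iter(iterable)
    try:
        vmin = vmax = next(it)
    except StopIteration:
        raise ValueError("minmax_pairs arg is an empty sequence") from None

    for lo in it:
        hi = next(it, lo)  # odd tail: pair the last item with itself
        if lo > hi:        # 1 comparison orders the pair...
            lo, hi = hi, lo
        if lo < vmin:      # ...then only the smaller is checked vs vmin
            vmin = lo
        if hi > vmax:      # ...and only the larger vs vmax
            vmax = hi
    return vmin, vmax

print(minmax_pairs([3, 1, 4, 1, 5]))  # (1, 5)
```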

Thank you Ava! It seems your search skills are better than mine. Indeed, there are a few use cases for `minmax()` in the stdlib and in third-party code. Not many, but they exist.

This one is not related because `min()` and `max()` are called with different arguments.

Some of these examples are not related either, but some are.

We do not add every function that might be used by some people. The question is how common the problem is. Ava found 4 cases in the stdlib. Most of your arguments do not apply to these cases, because in all of them we iterate over lists/tuples of strings/numbers. Although it might speed up two cases (`commonpath()` and `commonprefix()`) by a few percent. Are 4 cases enough to add a new function to `itertools`? I leave that to Raymond.

If I can suggest, I think a good place is `cmath`.

`cmath` is for functions operating on, or returning, complex numbers. So it wouldn’t be a good fit for a `minmax` function.

There was also similar proposal with some discussion on an efficient implementation :

https://mail.python.org/pipermail/python-ideas/2010-October/008221.html

Ah, excuse me. I hadn’t read the docs and wrongly assumed that `cmath` was a C-accelerated version of `math`.

Well, in that case I suggest `math`.

PS: just out of curiosity, why a separate module for complex numbers? Couldn’t `math` simply use function overloading and duck typing?

Well, I know that Raymond Hettinger is an excellent programmer… but I don’t understand the usefulness of `zip_longest` at all:

1. it zips the same iterator, so the lengths are equal and plain `zip` would be sufficient
2. even with `zip`, you’re zipping the same iterator… so `val0 is val1`, and it’s impossible that `val0 > val1`
3. since `val0 is val1`, if `val0 < vmin` then it’s impossible that `val1 > vmax`, so the second check should be an `elif`, to speed up the code
4. if you change all of this, you simply get back to the previous version of the code

Am I missing something?

PS: furthermore, my hope is that this function will eventually be translated into C for CPython, so the use of `itertools` is problematic.

The most important difference is in the handling of inputs of type `float` (or `int`). The vast majority of Python code does not expect to work with complex numbers, and so for example for most users it makes sense for `math.sqrt(-2.0)` to raise `ValueError`. Users who work in a domain where complex numbers make sense can use `cmath.sqrt` instead, which will return an appropriate complex output for `cmath.sqrt(-2.0)`.

(Yes, I know that the `**` operator already returns complex outputs for non-complex inputs in some cases. IMO, that was a (minor) design mistake, but we’re getting off-topic for this thread.)

Yes!
The two zipped positions are not the same! Each output pair pulls twice from the same iterator, so the first element of the pair is always one step ahead of the second. Therefore `zip_longest` returns consecutive pairs in sequence. Clever trick, but it’s not magic…
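A small demo makes the pairing visible (values chosen arbitrarily):

```python
# zip_longest(it, it) pairs CONSECUTIVE items of a single iterator:
# each output tuple pulls twice from the same iterator, so the two
# elements of a pair are different items, not duplicates.
from itertools import zip_longest

it = iter([10, 20, 30, 40, 50])
first = next(it)  # consumed separately, as in the minmax code above
pairs = list(zip_longest(it, it, fillvalue=first))
print(pairs)  # [(20, 30), (40, 50)]
```

The `fillvalue` only matters when an odd number of items remains, in which case the last pair is padded.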

If this is ever implemented in CPython as Python code, then `itertools` is no problem at all, since it is part of the standard library. If it becomes a C function, it will be completely different anyway.

Being in Python’s stdlib does not mean that a C API for `itertools` exists… I don’t know. Anyway, I benchmarked my solution against Hettinger’s, and Hettinger’s solution, though brilliant, is slower:

```
marco@buzz:~$ python3.9 -m pyperf timeit "minmax(range(100000))" --rigorous --affinity 0 -s 'MY_CODE'
Mean +- std dev: 5.15 ms +- 0.16 ms
marco@buzz:~$
marco@buzz:~$ python3.9 -m pyperf timeit "minmax(range(100000))" --rigorous --affinity 0 -s 'HETTINGER_CODE'
Mean +- std dev: 6.35 ms +- 0.23 ms
```

and I don’t understand how zipping two iterators and doing a triple comparison can be faster than simply iterating over one object and doing one or at most two comparisons…

A little modification: `max()` and `min()` suffer from a subtle bug: if the first element is a NaN, NaN is returned.

Why? Because the first element is taken as the initial value for the max and min. Then the function checks whether the second element is greater or less, and if so, replaces the max or min value with it, and so on.

The problem is that every comparison with NaN returns `False`. So a NaN in the first position will never compare greater or less than the other numbers in the sequence, and will never be replaced.
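This is easy to reproduce with the built-in `max()`:

```python
# When NaN is the first element it becomes the running maximum and,
# since every comparison against NaN is False, it is never replaced.
import math

nan = float("nan")
print(max([nan, 1.0, 2.0]))   # nan  (NaN leads, so it "wins")
print(max([1.0, nan, 2.0]))   # 2.0  (NaN in the middle is skipped)
```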

This is the code that fixes this bug:

```python
_sentinel = object()

def minmax(*args, key=None):
    args_len = len(args)

    if args_len == 0:
        fname = minmax.__name__
        raise ValueError(f"{fname}() expected 1 argument, got 0")
    elif args_len == 1:
        seq = args[0]
    else:
        seq = args

    it = iter(seq)

    vmax = vmin = next(it, _sentinel)

    if vmax is _sentinel:
        fname = minmax.__name__
        raise ValueError(f"{fname}() arg is an empty sequence")

    first_comparable = False

    if key is None:
        for val in it:
            if val > vmax:
                vmax = val
                first_comparable = True
            elif not first_comparable and not val <= vmax:
                # not a comparable object, like NaN
                vmax = vmin = val
                continue
            elif val < vmin:
                vmin = val
                first_comparable = True
    else:
        fmax = fmin = key(vmax)

        for val in it:
            fval = key(val)

            if fval > fmax:
                fmax = fval
                vmax = val
                first_comparable = True
            elif not first_comparable and not fval <= fmax:
                fmax = fmin = fval
                vmax = vmin = val
                continue
            elif fval < fmin:
                fmin = fval
                vmin = val
                first_comparable = True

    return (vmin, vmax)
```

This slows down the function, but only a little (5.57 ms ± 0.24 ms)

I hope this simple fix will be applied also to `min()` and `max()`.

I wouldn’t consider this to be a bug, or at least not one that should be fixed by altering the implementation of `min` or `max`. If the elements given to `max` or `min` aren’t totally ordered, then there isn’t a meaningful result to be given; consider this a case of undefined behaviour. `sort` and `sorted` have the same issue.
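A quick illustration of the same issue with `sorted` (the exact output depends on the input order, but whenever a NaN appears early the result is not a proper ascending sequence):

```python
# sorted() shows the same undefined behaviour: since every comparison
# with NaN is False, a leading NaN can leave elements out of order.
import math

nan = float("nan")
result = sorted([3.0, nan, 1.0, 2.0])
print(result)  # not a proper ascending order
```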


`numpy` does not suffer from this bug. And sorry, it is a bug and IMHO it should be fixed. And the fix, as you can see, is very simple.

That’s not true. This happens only when the first elements have comparison operators that always return `False`. If the NaN is in the middle of the iterator, there’s no problem. And if this is not a bug… well, it’s an old, well-known one.