# Minmax function alongside min and max

Ah, excuse me; I never read the docs and wrongly assumed it was a C-accelerated version of `math`.

Well, so I suggest `math`

PS: just out of curiosity, why a separate package for complex numbers? Shouldn't `math` simply use function overloading and duck typing?

Well, I know that Raymond Hettinger is an excellent programmer… but I don't understand the usefulness of `zip_longest` here at all:

1. it zips the same iterator, so both sides have the same length and plain `zip` would be sufficient
2. even with `zip`, you're zipping the same iterator… so `val0 is val1`, and it's impossible that `val0 > val1`
3. since `val0 is val1`, if `val0 < vmin` it's impossible that `val1 > vmax`, so the second test should be an `elif`, to speed up the code
4. if you change all of this, you simply get back to your previous version of the code

Am I missing something?

PS: furthermore, my hope is this function will be translated in CPython, so the use of itertools is problematic.

The most important difference is in the handling of inputs of type `float` (or `int`). The vast majority of Python code does not expect to work with complex numbers, and so for example for most users it makes sense for `math.sqrt(-2.0)` to raise `ValueError`. Users who work in a domain where complex numbers make sense can use `cmath.sqrt` instead, which will return an appropriate complex output for `cmath.sqrt(-2.0)`.
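A quick illustration of that split between the two modules:

``````python
import math
import cmath

# math.sqrt rejects negative inputs rather than silently going complex...
try:
    math.sqrt(-2.0)
except ValueError as exc:
    print("math.sqrt(-2.0) ->", exc)

# ...while cmath.sqrt returns an appropriate complex result.
print("cmath.sqrt(-2.0) ->", cmath.sqrt(-2.0))
``````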

(Yes, I know that the `**` operator already returns complex outputs for non-complex inputs in some cases. IMO, that was a (minor) design mistake, but we’re getting off-topic for this thread.)

Yes!
The two zipped values are not the same! `zip_longest(it, it)` pulls two consecutive elements from the same iterator for each tuple, so the function yields the sequence in successive pairs. Clever trick, but it's not magic…
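A quick demonstration of that pairing behaviour (the input values are just illustrative):

``````python
from itertools import zip_longest

# Zipping an iterator with itself consumes two elements per tuple, so the
# result is the input chopped into consecutive pairs, not duplicated values.
it = iter([5, 3, 8, 1])
pairs = list(zip_longest(it, it, fillvalue=None))
print(pairs)  # [(5, 3), (8, 1)]
``````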

If this is ever implemented in CPython as Python code, then itertools is no problem at all, since it is part of the standard library. If it becomes a C function, it will be completely different anyway.

Being part of Python's stdlib does not mean that a C API for `itertools` exists… I don't know. Anyway, I benchmarked my solution against Hettinger's, and Hettinger's solution, while brilliant, is slower:

``````
marco@buzz:~$ python3.9 -m pyperf timeit "minmax(range(100000))" --rigorous --affinity 0 -s 'MY_CODE'
Mean +- std dev: 5.15 ms +- 0.16 ms
marco@buzz:~$
marco@buzz:~$ python3.9 -m pyperf timeit "minmax(range(100000))" --rigorous --affinity 0 -s 'HETTINGER_CODE'
Mean +- std dev: 6.35 ms +- 0.23 ms
``````

and I don't understand how zipping two iterators and doing a triple comparison could be faster than simply iterating over one object and doing one, or at most two, comparisons…

A small addendum: `max()` and `min()` suffer from a subtle bug: if the first element is a NaN, NaN is returned.

Why? Because the first element is taken as the initial value for the max and min. Then the functions check whether the next element is greater or less, and if so they replace the running max or min with it, and so on.

The problem is that every ordering comparison involving NaN returns `False`. So a NaN in first position will never compare greater or less than the other numbers in the sequence, and will never be replaced.
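That claim about NaN comparisons is easy to verify:

``````python
nan = float("nan")

# Every ordering (and equality) comparison involving NaN evaluates to False,
# so a NaN seeded as the running min/max is never replaced by later elements.
print(nan > 1.0)   # False
print(nan < 1.0)   # False
print(nan == nan)  # False
``````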

This is the code that fixes this bug:

``````
_sentinel = object()

def minmax(*args, key=None):
    args_len = len(args)

    if args_len == 0:
        fname = minmax.__name__
        raise ValueError(f"{fname}() expected 1 argument, got 0")
    elif args_len == 1:
        seq = args[0]
    else:
        seq = args

    it = iter(seq)

    vmax = vmin = next(it, _sentinel)

    if vmax is _sentinel:
        fname = minmax.__name__
        raise ValueError(f"{fname}() arg is an empty sequence")

    first_comparable = False

    if key is None:
        for val in it:
            if val > vmax:
                vmax = val
                first_comparable = True
            elif not first_comparable and not val <= vmax:
                # non-comparable object, like NaN
                vmax = vmin = val
                continue
            elif val < vmin:
                vmin = val
                first_comparable = True
    else:
        fmax = fmin = key(vmax)

        for val in it:
            fval = key(val)

            if fval > fmax:
                fmax = fval
                vmax = val
                first_comparable = True
            elif not first_comparable and not fval <= fmax:
                fmax = fmin = fval
                vmax = vmin = val
                continue
            elif fval < fmin:
                fmin = fval
                vmin = val
                first_comparable = True

    return (vmin, vmax)
``````

This slows the function down, but only a little (5.57 ms ± 0.24 ms).

I hope this simple fix will be applied also to `min()` and `max()`.

I wouldn’t consider this to be a bug, or at least not one that should be fixed by altering the implementation of `min` or `max`. If the elements given to `max` or `min` aren’t totally ordered, then there isn’t a meaningful result to be given; consider this a case of undefined behaviour. `sort` and `sorted` have the same issue.


`numpy` does not suffer from this bug. And sorry, it is a bug and IMHO it should be fixed. And the fix, as you can read above, is very simple.

That's not true. This happens only if the first elements' comparison operators always return `False`. If the NaN is in the middle of the iterable, there's no problem. How can this not be a bug… and it's a long-known one at that.
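The position dependence is easy to show with the built-in `max`:

``````python
import math

nan = float("nan")

# A NaN in the middle is effectively skipped: the running max is a real
# number, and nan > 1 is False, so the NaN never wins a comparison.
print(max([1, nan, 3]))  # 3

# A NaN in front becomes the initial running max and is never replaced.
print(max([nan, 1, 3]))  # nan
``````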

In the general case of some arbitrary values without total ordering, I
would agree, but NANs are somewhat special.

The 2008 revision to IEEE-754 specifies that min and max should ignore
NANs:

``````
min(NAN, x) = min(x, NAN) = x
``````

and similarly for max. Only in the case that both arguments are NANs
should a NAN be returned.

One might also argue for min/max functions that propagate NANs,
returning NAN if any of the arguments are NAN.

Python’s status quo, where the result depends on which argument is a
NAN, matches neither of these behaviours, and is surely not desired by
anyone.

I think that it is easy to fix: just skip over any leading NANs, then
proceed as normal from that point on. In pseudo-code:

``````
def minNum(values):
    while values[0] is a NAN:
        del values[0]
    if not values:
        return NAN
    return min(values)
``````

Making it efficient and robust is left as an exercise for the reader.
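A runnable sketch of that idea (the function name and the iterator-based shape are my own choices, not part of the pseudo-code above):

``````python
def minmax_skip_leading_nans(iterable):
    # Sketch only: skip over any leading NaNs, then run the usual
    # single-pass min/max scan. A value is NaN-like iff x <= x is False.
    it = iter(iterable)
    for first in it:
        if first <= first:
            break
    else:
        raise ValueError("no comparable values in input")
    vmin = vmax = first
    for val in it:
        if val > vmax:
            vmax = val
        elif val < vmin:
            vmin = val
    return vmin, vmax

print(minmax_skip_leading_nans([float("nan"), 3.0, 1.0, 2.0]))  # (1.0, 3.0)
``````

Non-leading NaNs are silently ignored here, since they never win a comparison against the running values.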

IEEE 754 isn’t super helpful here. In the current version (the 2019 text), it doesn’t define anything called either `min` or `max`. It defines `minimum` and `maximum`, which propagate NaNs, and `minimumNumber` and `maximumNumber`, which treat NaNs like missing data. There’s nothing in IEEE 754 which provides guidance on which of those operations Python’s `min` and `max` should map to.

Independent of that, I don’t think it’s worth complicating the currently completely generic `min` and `max` (or `sorted`, or `list.sort`, which suffer from exactly the same issues) implementations with NaN-awareness, and I’m not even sure it’s possible to reliably check for NaNs across the variety of types that might present them (NumPy floats, `decimal.Decimal`, Python floats, gmpy2 NaNs, etc.).

Possibly we need a documentation update pointing out that `min`, `max`, `sorted` and `list.sort` only make sense in the presence of a total ordering, and that results are undefined otherwise.

@mdickinson Can you please look at my code? The fix is really simple and works not only for NaNs, but for every object that always returns `False` in comparisons:

``````
first_comparable = False

if key is None:
    for val in it:
        if val > vmax:
            vmax = val
            first_comparable = True
        elif not first_comparable and not val <= vmax:
            # non-comparable object, like NaN
            vmax = vmin = val
            continue
        elif val < vmin:
            vmin = val
            first_comparable = True
``````

@Marco_Sulla
Here is a version that handles NaN as expected (as demonstrated in your code), but without needing to test `first_comparable` for each value.
This version also supports `key` and `default`:

``````
_sentinel = object()

def minmax(*args, key=None, default=_sentinel):
    args_len = len(args)

    if args_len == 0:
        if default is _sentinel:
            raise ValueError("minmax expected 1 or more argument(s), got 0")
        return default
    elif args_len == 1:
        seq = args[0]
    else:
        seq = args

    it = iter(seq)

    vmax = vmin = v0 = next(it, _sentinel)
    if vmax is _sentinel:
        if default is _sentinel:
            raise ValueError("minmax arg is an empty sequence")
        return default

    if key is None:
        while not vmax <= vmax:  # this is essentially a test for NaN
            vmax = vmin = next(it, _sentinel)
            if vmax is _sentinel:
                return v0, v0
        for val in it:
            if val > vmax:
                vmax = val
            elif val < vmin:
                vmin = val
        return vmin, vmax

    fmax = fmin = key(vmax)

    while not fmax <= fmax:  # this is essentially a test for NaN
        vmax = vmin = next(it, _sentinel)
        if vmax is _sentinel:
            return v0, v0
        fmax = fmin = key(vmax)

    for val in it:
        fval = key(val)

        if fval > fmax:
            fmax = fval
            vmax = val
        elif fval < fmin:
            fmin = fval
            vmin = val

    return vmin, vmax
``````

I remember some discussion about adding one-pass statistics methods to the statistics module. Perhaps an API that allows computing combinations of min/max/mean etc. in a single pass would also work for the case here. It would make sense to define nan-handling there.
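One possible shape for such a single-pass API (entirely hypothetical; the function name, return format, and skip-NaN policy are mine, not the statistics module's):

``````python
import math

def one_pass_stats(iterable):
    # Hypothetical sketch: compute min, max and mean in a single pass,
    # skipping NaN values (one of the policies discussed in this thread).
    n = 0
    total = 0.0
    vmin = math.inf
    vmax = -math.inf
    for x in iterable:
        if x != x:  # NaN is the only float for which x != x
            continue
        n += 1
        total += x
        if x < vmin:
            vmin = x
        if x > vmax:
            vmax = x
    if n == 0:
        raise ValueError("no non-NaN values")
    return {"min": vmin, "max": vmax, "mean": total / n}

print(one_pass_stats([3.0, float("nan"), 1.0, 2.0]))
``````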

Marco, I think you are missing Mark’s point that it doesn’t matter how
“simple” the patch is (assuming it works: I see no test suite to
demonstrate that it does what you say it does). Mark’s argument is that
we should not “fix” this by changing the behaviour.

min and max currently make no promises about what they will do when
given NANs. Since they don’t guarantee what the result will be
when passed a NAN, the current result might be surprising but it’s not
buggy.

We have no consensus on what the correct behaviour should be. Should
they:

• return NAN?
• raise an exception?
• treat it as missing data and ignore it?
• something else?

If we make guarantees about NANs, do we have to also make guarantees
about other unordered objects?

How about objects which always return True for ordering comparisons?

Although I don’t completely agree with Mark, I am sympathetic to his
argument. As the author of the statistics module, I intentionally didn’t
make any promises what result you would get if you passed a NAN as
a data point.

Changing the behaviour of min or max will probably need a PEP.

@steven.daprano: Ok, let me show you an example:

``````
>>> max((float("nan"), 0))
nan
>>> max((0, float("nan")))
0
``````

It's the same sequence; the only difference is the position of its items. This is a bug. Period.

@ruud Mh, I don't know if your change will always work. Essentially you're testing equality, since `vmax` can't be less than itself. But there are objects that do not support `__eq__` in the usual way, like `numpy.ndarray`.

The only reliable test, IMHO, is to check that `>` and `<=` both return `False`.

Not true. This is the precise definition of “undefined behaviour”.


It’s not poop… It’s “made by the dog”

Seriously, you can't have undefined behaviour just because the position of the elements in the iterable changes.

You can have undefined behaviour only if all the objects in the iterable behave undefinedly with the comparison operators.

Otherwise, I maintain that IMHO this is a… bug.

No, that’s not how it works. We can just say that the output of any operation with NaN is undefined and get away with it. CPython is the reference implementation and we make the rules. You could argue that it’s a stupid rule, but that’s it.

This is not the case: if the NaN is not the first object, the operation is not undefined; it gives you the correct answer. If the operation gave undefined behaviour with NaN, or any similar object, at any position, I would agree with you. But that is not the case.

The fact is: what does the coder expect? As far as I know, Python emphasizes simplicity and readability, and that code should do things "as a human expects". This is not the case here.