IEEE 754 isn’t super helpful here. In the current version (the 2019 text), it doesn’t define anything called either min or max. It defines minimum and maximum, which propagate NaNs, and minimumNumber and maximumNumber, which treat NaNs like missing data. There’s nothing in IEEE 754 which provides guidance on which of those operations Python’s min and max should map to.
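For reference, here is a rough Python sketch of the difference between those two pairs of operations (ignoring the signed-zero and signalling-NaN details the standard also specifies):

```python
from math import isnan

def ieee_minimum(x, y):
    # IEEE 754-2019 "minimum": any NaN operand propagates.
    if isnan(x) or isnan(y):
        return float("nan")
    return x if x < y else y

def ieee_minimum_number(x, y):
    # IEEE 754-2019 "minimumNumber": a NaN operand is treated as missing data.
    if isnan(x):
        return y
    if isnan(y):
        return x
    return x if x < y else y
```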
Independent of that, I don’t think it’s worth complicating the currently completely generic min and max (or sorted, or list.sort, which suffer from exactly the same issues) implementations with NaN-awareness, and I’m not even sure it’s possible to reliably check for NaNs across the variety of types that might present them (NumPy floats, decimal.Decimal, Python floats, gmpy2 NaNs, etc.).
Possibly we need a documentation update pointing out that min, max, sorted and list.sort only make sense in the presence of a total ordering, and that results are undefined otherwise.
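For concreteness, this is the kind of thing that can happen today (the exact result depends on where the NaN lands during the sort):

```python
from math import nan

sorted([3.0, nan, 1.0, 2.0])
# e.g. [3.0, nan, 1.0, 2.0] -- not actually sorted, because every
# comparison involving nan answers False and sort assumes a total ordering
```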
@mdickinson Can you please take a look at my code? The fix is really simple and works not only for NaNs, but for every object that always returns False in comparisons:
```python
first_comparable = False

if key is None:
    for val in it:
        if val > vmax:
            vmax = val
            first_comparable = True
        elif not first_comparable and not val <= vmax:
            # not comparable object, like NaN
            vmax = vmin = val
            continue
        elif val < vmin:
            vmin = val
            first_comparable = True
```
@Marco_Sulla
Here is a version that handles NaN as expected (and demonstrated in your code), but without the need to test first_comparable for each value.
Also, this version supports key and default:
```python
_sentinel = object()

def minmax(*args, key=None, default=_sentinel):
    args_len = len(args)
    if args_len == 0:
        if default is _sentinel:
            raise ValueError("minmax expected 1 or more argument(s), got 0")
        return default
    elif args_len == 1:
        seq = args[0]
    else:
        seq = args

    it = iter(seq)
    vmax = vmin = v0 = next(it, _sentinel)
    if vmax is _sentinel:
        if default is _sentinel:
            raise ValueError("minmax arg is an empty sequence")
        return default

    if key is None:
        while not vmax <= vmax:  # this is essentially a test for nan
            vmax = vmin = next(it, _sentinel)
            if vmax is _sentinel:
                return v0, v0
        for val in it:
            if val > vmax:
                vmax = val
            elif val < vmin:
                vmin = val
        return vmin, vmax

    fmax = fmin = key(vmax)
    while not fmax <= fmax:  # this is essentially a test for nan
        vmax = vmin = next(it, _sentinel)
        if vmax is _sentinel:
            return v0, v0
        fmax = fmin = key(vmax)
    for val in it:
        fval = key(val)
        if fval > fmax:
            fmax = fval
            vmax = val
        elif fval < fmin:
            fmin = fval
            vmin = val
    return vmin, vmax
```
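For example, with the version above:

```python
from math import inf, nan

minmax(3, 1, 2)                      # (1, 3)
minmax([nan, 2.0, 1.0, 3.0])         # (1.0, 3.0): leading NaNs are skipped
minmax([], default=(inf, -inf))      # (inf, -inf)
minmax(["bb", "a", "ccc"], key=len)  # ('a', 'ccc')
```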
I remember some discussion about adding one-pass statistics methods to the statistics module. Perhaps an API that allows computing combinations of min/max/mean etc. in a single pass would also work for the case here. It would make sense to define nan-handling there.
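Something along these lines, say; the class name, API and NaN policy below are only illustrative assumptions, not an existing statistics-module interface:

```python
from math import isnan

class RunningStats:
    """Single-pass accumulator for min/max/mean (illustrative sketch only)."""

    def __init__(self, ignore_nan=True):
        self.ignore_nan = ignore_nan  # assumed policy: treat NaN as missing data
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def add(self, x):
        if self.ignore_nan and isnan(x):
            return
        self.count += 1
        self.total += x
        if self.minimum is None or x < self.minimum:
            self.minimum = x
        if self.maximum is None or x > self.maximum:
            self.maximum = x

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")
```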
Marco, I think you are missing Mark’s point that it doesn’t matter how “simple” the patch is (assuming it works: I see no test suite to demonstrate that it does what you say it does). Mark’s argument is that we should not “fix” this by changing the behaviour.

min and max currently make no promises about what they will do when given NANs. Since they don’t guarantee what the result will be when passed a NAN, the current result might be surprising but it’s not buggy.
We have no consensus on what the correct behaviour should be. Should they:

- return NAN?
- raise an exception?
- treat it as missing data and ignore it?
- something else?

If we make guarantees about NANs, do we have to also make guarantees about other unordered objects? How about objects which always return True for ordering comparisons? (A toy example of the latter follows below.)
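Here is a toy example of such an object; what min() and max() return for inputs containing it depends on operand order and reflected comparisons, so there is no obviously “right” answer:

```python
class AlwaysGreater:
    """Toy class whose ordering comparisons always answer True."""

    def __lt__(self, other): return True
    def __le__(self, other): return True
    def __gt__(self, other): return True
    def __ge__(self, other): return True
```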
Although I don’t completely agree with Mark, I am sympathetic to his argument. As the author of the statistics module, I intentionally didn’t make any promises about what result you would get if you passed a NAN as a data point.

Changing the behaviour of min or max will probably need a PEP.
@ruud Mh, I don’t know if your change will always work. Essentially you’re testing equality, since vmax can’t be less than itself. But there are objects that do not support __eq__, like numpy.ndarray.

The only reliable test, IMHO, is to check that > and <= both return False.
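A minimal sketch of that check (the helper name is mine):

```python
def incomparable(x, y):
    # True when neither ">" nor "<=" holds, e.g. for NaN against a number.
    return not (x > y) and not (x <= y)

print(incomparable(float("nan"), 1.0))  # True
print(incomparable(2.0, 1.0))           # False
```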
No, that’s not how it works. We can just say that the output of any operation with NaN is undefined and get away with it. CPython is the reference implementation and we make the rules. You could argue that it’s a stupid rule, but that’s it.
This is not the case: if the NaN is not the first object, the operation is not undefined, it gives you the correct answer. If the operation gave you undefined behaviour with a NaN, or any other object like it, at any position, I would agree with you. But this is not the case.
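For concreteness, this is the asymmetry in current CPython:

```python
from math import nan

max([nan, 1.0, 2.0])  # nan -- the leading NaN is never replaced,
                      # because every comparison against it is False
max([1.0, nan, 2.0])  # 2.0 -- a later NaN never wins a comparison
```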
The fact is: what does the coder expect? As far as I know, Python emphasizes simplicity and readability, and that code should do things “like a human expects”. This is not the case here.

Furthermore, if Python publicly admitted in its documentation that max() and min() have “undefined behavior” only when the first element of the iterable does not have well-defined comparison operators, I think developers all over the world would laugh.
About sorted() and list.sort() and any other function like them, I think the current behavior is correct. Indeed, the result is completely undefined wherever the NaN sits; it’s not only a problem when it’s at the start.

Maybe Python could implement total ordering by default for numbers? The problem is that total ordering requires that NaNs carry a payload. I suppose that Python does not store this information, so it could be a real problem to implement, and I don’t know how useful it is in the real world.

IMHO min() and max() should ignore unorderable elements, and sorted() and list.sort() should raise a warning.
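For comparison, the “ignore” behaviour is already easy to opt into explicitly today:

```python
from math import isnan, nan

data = [nan, 3.0, 1.0, 2.0]
max(x for x in data if not isnan(x))  # 3.0 -- NaNs filtered out up front
```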
For sort and sorted, there’s some relevant recent discussion on the tracker (see issue 36095). The root cause of these issues really has nothing to do with sort, sorted, min or max; rather, it stems from the way comparisons on NaNs work. Changing those comparisons to implement a total order (or at least a total preorder), or to raise on comparisons involving NaNs, seems reasonable on the face of it. But that would involve significant breakage of existing code. It’s yet another of those situations where the answer to the question “Should we have done this differently?” is “Possibly, yes.”. But that’s not helpful, because the question actually facing us is “Should we change it now?”, and that’s a very different proposition.
@mdickinson I don’t think this change will break anything:

- raising a warning lets the code run; the coder is simply informed of the problem. “Errors should never pass silently.”
- ignoring leading NaNs in max and min could instead fix some code. As I already said, the coders who encountered the problem have probably already removed the NaNs from the iterable, or used numpy. For those who never fixed it, it’s quite improbable that the code after max and min really expects a NaN, so Python would probably fix that code silently.
Anyway, there’s an alternative.
Python could implement total ordering only partially:
```
+NaN > number  == True
number > -NaN  == True
+NaN >= +NaN   == False
+NaN <= +NaN   == False
-NaN >= -NaN   == False
-NaN <= -NaN   == False
```
Pro: sorting will no longer be undefined if a NaN is present.
Con: max will return +NaN if one is present, at any position. min could return -NaN, but those are very rare.
In these cases, I think max and min could raise a warning.
Anyway, in this case, this is a real change in Python and should be handled by a PEP, which I have no time to open.
@mdickinson: anyway, if you want my opinion, for what it’s worth, my check is simpler and works for any object that does not support ordering, not only for NaNs.
And IMHO it should be the default for max, min, sorted, list.sort, heapq, bisect, and all the rest because, as I said, it’s quite improbable that this change will break something; on the contrary, it will silently fix a lot of old code.
If someone wants total ordering, a math.total_ordering function could be implemented and passed as the key for sorting.
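For Python floats, such a key could be sketched on top of the bit pattern; total_order_key is a hypothetical name, and this handles only float, not other numeric types:

```python
import struct

def total_order_key(x: float) -> int:
    # Map the float's IEEE 754 bit pattern to an integer whose natural order
    # matches totalOrder: -NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < +NaN.
    u = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (~u & 0xFFFFFFFFFFFFFFFF) if u >> 63 else u | (1 << 63)

sorted([float("nan"), 1.0, float("-inf"), -0.0, 0.0], key=total_order_key)
# [-inf, -0.0, 0.0, 1.0, nan]
```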
@Marco_Sulla not only would this break the IEEE 754 NaN specification, this entire topic has nothing to do with adding minmax to Python. I respectfully ask you to open another thread if you want to continue the discussion of how NaNs should be handled in Python.