NaN breaks min, max and sorting functions. A solution

Marco_Sulla · December 15, 2019, 6:27pm

##############################################################

TL;DR: we are discussing about the problem that NaNs in max, min and sorting function causes to Python code. Examples:

>>> b = [float("nan"), 9, 0 ,1, 2]
>>> max(b)
nan
>>> min(b)
nan
>>> a = [5, float("nan"), 9, 0 ,1, 2]
>>> sorted(a)
[5, nan, 0, 1, 2, 9]

Generally, this problem is caused by any unorderable object, ie an object that returns always False with > and < operators.

My proposal is that min and max should ignore unorderable objects, and sorting functions should emit a warning if an unorderable object is in the iterable. This should not break any old code.

##############################################################

@mdickinson @oscarbenjamin @steven.daprano @pf_moore @tiran

I’m forking this discussion from the minmax topic. The discussion started from this post.

@avayert Replying to this post, I want to say that my proposal does not break IEEE 754 at all.

Indeed, IEEE 754 says that NaNs are unorderable. That means that they are not greater or lower of any other number, NaNs included. This means that min and max should never return them.

So, on the contrary, is Python that actually breaks IEEE 754, since, if the iterable starts with an unorderable object, like NaN, they returns NaN:

>>> b = [float("nan"), 9, 0 ,1, 2]
>>> max(b)
nan
>>> min(b)
nan

For sorting, currently, if a NaN is in the middle of a iterable, it remains in that position. For example:

>>> a = [5, float("nan"), 9, 0 ,1, 2]
>>> sorted(a)
[5, nan, 0, 1, 2, 9]

This means that 5 < nan, nan < 0 and 5 < 0. All wrong. So again, is Python that does not adhere to IEEE 754.

IMHO, to completely adhere to IEEE 754:

min and max should ignore unorderable objects
sorting functions should emit a warning if an unorderable object is in the iterable
a function math.total_ordering should be implemented, so it can be passed as key to sorting functions

This will not break any existing code.

Marco_Sulla · December 15, 2019, 8:20pm

Optionally, sorting functions could have also an unorderable=None parameter.

If unorderable is None, the result will be the current one.
If unorderable == "start", they will be moved at the start of the result.
If unorderable == "end", they will be moved at the end.
If unorderable == "remove", they will be not added to the result.

Mathematically, only "remove" is consistent with IEEE 754.

Paddy3118 · December 15, 2019, 10:29pm

Obviously not as evident as you presume, from the reactions to ypur example.

Given the “two different placements of NaN” example, I would think its the fault of trying to find the minimum of results containing NaN’s. I would be at fault for not accounting for NaN’s. Ignoring them, or any other action could depend on circumstance - wh’s to say that an arbitrary resolution is th one needed?

It might be better still to raise an exception unless an option is given to state how NaN’s should be treated - at least they won’t silently be ignored.

Marco_Sulla · December 15, 2019, 10:34pm

@Paddy3118 I quote myself:

IEEE 754 says that NaNs are unorderable. That means that they are not greater or lower of any other number, NaNs included. This means that min and max should never return them.

There’s nothing more to say… it’s a bug. This is math, not jurisprudence.

Math is math. Even if 1000 persons says 2+2=3, 2+2 is 4.

Paddy3118 · December 15, 2019, 10:51pm

It doesn’t mean that min and max should return any other value either.

It might be argued from your quote of IEEE754 that trying to order the un-orderable should rightly raise an exception in Python.

Amittedly thinking yourself right is a good starting point, but persuade others rather than harangue them for not seeing what is self evident to you. Maybe think of why others don’t think as you do? In forums such as this it is rarely becaue all the audience haven’t read your posts.

Marco_Sulla · December 15, 2019, 11:16pm

Are you serious? 0____o

min and max should not return any other value. They should return the minimum and the maximum. The only way they must return NaN is if the iterable contains only NaNs.

Can you read my posts with more attention, please? I’m proposing to raise a warning, not an exception.

Paddy, about sorting we can argue. But not for min and max. I can’t discuss about math, ok? I have not to persuade no one, the math is clear. Or do you think I have to persuade a flat earther that the earth is (more or less) a sphere?

Sorry if you feel my posts as “harangue”, but my intention is not to be aggressive to anyone. On the contrary, this discussion makes me laugh a lot

steven.daprano · December 15, 2019, 11:39pm

Marco said: “There’s nothing more to say… it’s a bug. This is math,
not jurisprudence.”

I’m glad you mentioned maths, since maths has lots and lots and lots to
say about values that don’t define a total order. Here are just
two examples. Marco, what are the maximums of:

(100+1j) and (1+100j) # complex numbers

⎡ 100 1   ⎤ and   ⎡ 1   100 ⎤  # matrices
⎣ 1   100 ⎦       ⎣ 100 1   ⎦

Marco: “Even if 1000 persons says 2+2=3, 2+2 is 4.”

Don’t change the subject. We’re not talking about addition in ℤ, we’re
talking about ordering comparisons of arbitrary objects in Python.

Marco_Sulla · December 15, 2019, 11:43pm

Also IEEE 754. And it says that NaNs are not orderable. Point.

About complex numbers, they are OT, since they are really not comparable in Python. They raises a good, old exception.

Or are you suggesting to put a try-except to skip also the objects that raises an error? Sorry but never. In that case min, max and sorting function will fail with an exception, so you’re already and immediately informed there’s something that does not work in your iterable.

This is really the most funny discussion I ever made…

steven.daprano · December 16, 2019, 12:19am

What are the mimumum and maximum of values that don’t define a total

order?

Marco, if you don’t understand the question, you should stop replying to

this thread now and go do some research until you understand the

question and can describe the differences between total order, partial

order, weak order, semiorder, cyclic order, greatest element, maximal

element, etc. To say nothing of oddball values that don’t fit into any

of these neat mathematical families.

And if you do understand the question, you will know that the answer is

“the minimum and the maximum may not have a single, well-defined value”.

When NANs are included, floats no longer make up a set with a total

order. That’s the end of the story.

Marco_Sulla · December 16, 2019, 12:40am

This is the end of the story:

The IEEE 754-2008 standard defines the functions minNum and maxNum giving the minimum and maximum of two inputs, respectively. These functions do not give a NAN output if one of the inputs is NAN and the other is not a NAN.1A forthcoming revision of the IEEE 754 standard defines two additional functions, named minimum and maximum, thatdo the same but with propagation of NAN inputs.

So we have simply to choose if apply minNum and maxNum logical or minimum and maximum:

If there’s a NaN, ignore it
If there’s a NaN, return it

PS: total_order is a function apart in IEEE 754.

@Paddy3118 And I am the guy that harangue?

steven.daprano · December 16, 2019, 1:01am

min and max don’t just operate on IEEE-754 values, they operate on
arbitrary objects that can do anything.

Marco, I’m done with this thread. Write a PEP, or move on.

Marco_Sulla · December 16, 2019, 1:08am

I perfectly know this. Can you please give me an example of object, that is not NaN, that returns False on every comparison operation instead of raising an exception, please?

Furthermore, if I want to write a PEP, it’s my choice. I don’t get orders from the first guy on a forum.

And there’s no PEP to write, this is a bug. I have not to propose any enhancement.

But if you want to write a PEP, feel free to do it, if you want

Marco_Sulla · December 16, 2019, 1:57am

Anyway, since the world is vast, I checked a niche programming language: Java

In Java, sorting will put NaNs at the end of the iterable, and max returns NaN always:

You know, I’m both a Python and a Java programmer, and I know a little of other languages. But I started with Python and I love it. And I was very angry when Java, C and C++ programmers make fun of me, calling me a “pyscripter”.

Well, I’m starting to seriously change my mind…

aeros · December 16, 2019, 10:07am

While it can be useful to gather evidence from the behavior of other languages, Python is definitely not Java. The behavior of another popular language with very different philosophies does not provide a sufficient argument.

aeros · December 16, 2019, 10:11am

If a rather obscure behavior that would almost never appear in any realistic production code affects how you fundamentally feel about Python, that’s rather unfortunate.

But, please leave that out of the discussion. It’s not particularly constructive.

Rhodri · December 16, 2019, 12:05pm

It really isn’t.

That’s nice for them. In Python, we have min and max functions that are under no obligation to work like minNum and maxNum. In practical terms, either you care about NaNs or you don’t. If you don’t, filtering them out is trivial. If you do, an exception makes as much sense as anything else.

It’s not maths. It’s Computational Mathematics, which is a whole different ballgame. We didn’t even touch it in the Maths part of my degree, and we held it at arms length and whimpered in the CompSci part.

Paddy3118 · December 16, 2019, 1:09pm

Ooh, ooh, please sir!
Non transitive dice in Python.

Any two die are comparable, but the normal meaning of minimum and maximum for the four die doesn’t apply.

Marco_Sulla · December 16, 2019, 8:10pm

Ok, so Python should not adhere to the IEEE 754 standard, should not take as example the more used language in the world, and it does not even follow practical common sense, that is a max or min function should not return different solution depending on the position of their parameters, and that a NaN in a iterable should not break at all the sorting.

Can I ask so what common accepted standard should use Python?

mjpieters · December 16, 2019, 9:00pm

While they don’t always return False on comparisons, sets are also not orderable, and also don’t raise exceptions. Comparisons between sets are meant to test for sub- and supersets, e.g.:

a = {42, 17}
{42} <= a  # True, a is a superset of {42}
{42, 18} <= a  # False, the extra element, 18, is missing from a
{11, 81} <= a  # False, no elements in common
{42, 17} < a   # False, the set on the left is not a subset, it is equal.
set() < set()  # False, the empty set is not a subset of the empty set

Rich comparisons in Python are entirely open to not even return booleans.

E.g.

numpy arrays broadcast comparisons and return a boolean array, not a single boolean. These test as True for every comparison as they are non-empty containers.
SQLAlchemy column objects (used to model SQL objects) return query filter objects, not booleans. These objects are probably going to test as True for every comparison (depends on their __bool__ implementation, I’d say).

etc.

We can trivially create our own class that returns False on every comparison, instead of raising an exception:

class Uncomparable:
    def __lt__(self, _): return False
    __gt__ = __le__ = __ge__ = __eq__ = __ne__ = __lt__

min() and max() use rich comparisons to pick one element from the inputs. If the inputs are not orderable, then the behaviour is undefined. The same applies to sorted(), list.sort(), bisect and heapq, and probably several other locations in the stdlib.

Bugs go in the issue tracker. You already found #36095, where Tim Peters has pointed out that what you are in fact doing is propose an enhancement, namely to have special handling for unorderable objects (an impossible task, as all the examples above implement the rich comparison API and there is no facility, whatsoever, to detect that they are not orderable). Such an change would definitely be PEP material.

Marco_Sulla · December 16, 2019, 10:02pm

Ok, let me rephrase the question: an objects that is not an NaN, that is not comparable and people in the real world will want to sort

Ok, seriously: the problem is with NaNs. Sets, numpy arrays, sqlalchemy columns, buggy custom objects… I’m quite sure no one wrote an iterable with this objects to sort them.

Yes, we can require that < should return True or False. This will remove the problem with numpy arrays and sqlalchemy columns. Yes, we can do my little trick, and buggy custom objects will be not a problem. About set, I think they can’t be sorted in any way, since their elements are not ordered. Anyway, who cares about sorting sets? Apart the set that contains all the sets…

I can admit I don’t care about any other unorderable object but the NaN.

PS: thank you for your constructive and civil response (as always). I was really tired about captious thinkings.