“It’s always something.”
Something else is going on with floats: NaNs make testing next to impossible, because they never compare “less than”, “greater than”, or “equal” to anything else - a NaN doesn’t even compare equal to itself.
What Python’s list.sort()
does with a NaN is more than less accidental. The same is true of just about any sorting algorithm that doesn’t make a special case of NaNs.
For example, in CPython, sorting this list leaves it unchanged:
[2000.0, math.nan, -2000.0]
Why? Because asking “nan < 2000?” and “-2000 < nan?” both return False
, so there’s no reason to believe it’s not already sorted.
So, in the case of float types, my test code doesn’t fill the buffer with random bits. If it did, about 1 in 2048 would be a NaN for the “d” type, and about 1 in 256 for the “f” type;
Instead it generates each value uniformly at random with absolute value less than 5e5.
No NaNs then, and all sorting algorithms should produce the same order.
But that has a different consequence: because of the relatively narrow range, there’s much “less than random” variation in the exponent bits. In an MSB radix sort that matters, because the exponent bits are at the most significant end, and so the first pass or two of the radix sort are correspondingly less effective at splitting the input into a large number of (smaller) buckets. Which means it’s harder to invoke the list.sort()
code, leaving more of the work for the (slower!) radix sort to do.
So I repaired that: filled the buffer with random bits even for float types. Then a pass over the result replaces those that turned out to be NaNs with fresh random bits, as often as needed to get a non-NaN. A typical “accept/reject” method.
As a result, with much better variation in the high-order bits, the float types benefit much more from the approach. For example, there’s a 40% “d” speedup already at 10 million elements.
However, the original is more realistic for real-life cases: very “large” or “small” magnitude floats are unusual. Most apps live in a relatively narrow dynamic range suitable for the domain, far from the edges of what’s representable. Small variation in the exponent bits is “a feature” of real life.
In a game that can’t be won, the important thing is to enjoy playing anyway