Making NaN a singleton

CharlieZhao95 · April 7, 2022, 6:57am

Before reading the CPython source, I had a very subjective view that NaN should be a singleton. The reality is different. Here is a simple example:

math.nan is math.nan  # True
float('nan') is float('nan')  # False

The same thing happens when the parameter is 'inf'.

It seems to me that this result is reasonable from a module design point of view. math.nan is an attribute of the math module, which is initialized only once, when import math is first executed, so we always get the same object from the math module.

But each call to float('nan') creates a new object, because CPython does not cache float objects the way it caches small integers.

I’m not sure whether it is necessary to change the behavior of the float() function. To make the two consistent, we might need a cache to hold those special objects.
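For anyone who wants to reproduce this, here is a minimal sketch of the comparison. The variable names are mine, and the results describe CPython only: the small-integer line relies on CPython's cache of small ints, which is an implementation detail, so other interpreters may behave differently.

```python
import math

# math.nan is a module attribute, so every lookup returns the same object.
same_attr = math.nan is math.nan

# Each float('nan') call builds a new float object (as observed on CPython).
a = float('nan')
b = float('nan')
same_call = a is b

# Small ints, by contrast, are cached by CPython (an implementation detail).
x = int('7')
y = int('7')
same_small_int = x is y

print(same_attr, same_call, same_small_int)
```

Both a and b are NaNs by value; only their identity differs.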

Thanks for any insights!

pf_moore · April 7, 2022, 8:24am

There are many NaN values, not a single one (lots of bit patterns can be used to represent different NaNs, as NaN values have a “payload”).
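To make the payload point concrete, here is a rough sketch that builds two NaNs from raw bit patterns with the struct module. The helper names bits_to_float and float_to_bits are made up for this example, and it assumes the platform uses IEEE-754 doubles and preserves quiet-NaN payloads when copying them, which is typical but not guaranteed.

```python
import math
import struct

def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def float_to_bits(value: float) -> int:
    """Reinterpret an IEEE-754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

# Two quiet NaNs that differ only in their payload bits.
nan_a = bits_to_float(0x7FF8000000000001)
nan_b = bits_to_float(0x7FF8000000000002)

print(math.isnan(nan_a), math.isnan(nan_b))
print(hex(float_to_bits(nan_a)), hex(float_to_bits(nan_b)))
```

Both values are NaN as far as math.isnan() is concerned, even though their bit patterns are not the same.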

I don’t know how float('nan') is calculated, so sure, maybe the specific value it returns could be cached, but it’s hard to see what value there would be. If you’re using object identity to compare NaNs, you’re probably wrong anyway.

CharlieZhao95 · April 7, 2022, 10:45am

Thanks for your reply! I totally agree with you.

Not long ago, we even improved the documentation to suggest that users call math.isnan() instead of using is or == when they want to check whether a number is a NaN.
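A short sketch of why == does not work for this check and math.isnan() does. The variable name value is just for illustration.

```python
import math

value = 0 * float('inf')  # an operation whose result is NaN

print(value == value)      # False: NaN compares unequal to everything, itself included
print(value == math.nan)   # False, for the same reason
print(math.isnan(value))   # True: tests the value, not the object
```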

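We can look at this problem from both the C side and the Python side.

In C, we can get NaNs with different payloads from the function nan(const char *arg); how the argument is interpreted is implementation-defined.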

NaN in C

#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>

int main(void) {
    double f1 = nan("1");
    uint64_t f1n; memcpy(&f1n, &f1, sizeof f1);
    printf("nan(\"1\")   = %f (%" PRIx64 ")\n", f1, f1n);

    double f2 = nan("2");
    uint64_t f2n; memcpy(&f2n, &f2, sizeof f2);
    printf("nan(\"2\")   = %f (%" PRIx64 ")\n", f2, f2n);

    double f3 = -nan("");
    uint64_t f3n; memcpy(&f3n, &f3, sizeof f3);
    printf("-nan(\"\") = %f (%" PRIx64 ")\n", f3, f3n);

    double f4 = nan("");
    uint64_t f4n; memcpy(&f4n, &f4, sizeof f4);
    printf("nan(\"\")   = %f (%" PRIx64 ")\n", f4, f4n);
    return 0;
}

/**
 * Output:
 * nan("1")   = nan (7ff8000000000001)
 * nan("2")   = nan (7ff8000000000002)
 * -nan("") = -nan (fff8000000000000)
 * nan("")   = nan (7ff8000000000000)
 */

But in Python there is no such direct freedom: the usual ways to get a NaN are float('nan'), math.nan, or certain operations (such as 0 * float('inf')). Since float('nan') returns a Python object, we should look at its underlying implementation. Here is some code showing where the value of NaN comes from in CPython.

NaN in Python

// float('nan') will call this function to get the value of float object.
double
_Py_parse_inf_or_nan(const char *p, char **endptr)
{
    ...
    else if (case_insensitive_match(s, "nan")) {
        s += 3;
        retval = negate ? -Py_NAN : Py_NAN;
    }
    ...
    return retval;
}

// Py_NAN: Value that evaluates to a quiet Not-a-Number (NaN).
#if !defined(Py_NAN)
#  if _Py__has_builtin(__builtin_nan)
     // Built-in implementation of the ISO C99 function nan(): quiet NaN.
#    define Py_NAN (__builtin_nan(""))
#  else
     // Use C99 NAN constant: quiet Not-A-Number.
     // NAN is a float, Py_NAN is a double: cast to double.
#    define Py_NAN ((double)NAN)
#  endif
#endif

As we can see, the value behind float('nan') comes from the macro Py_NAN, which is a fixed value. So making NaN a singleton in CPython would be feasible.

I may have missed something; please correct me if so.

pf_moore · April 7, 2022, 10:58am

I fail to see any benefit, though. How would having a NaN singleton be of benefit to Python users? The memory savings would be trivial, and it is flat-out wrong to be checking for nans using object identity, so we shouldn’t do anything that encourages that. What problem are you actually trying to solve here?

steven.daprano · April 7, 2022, 12:20pm

There are 9007199254740990 possible NANs in 64-bit IEEE-754 floats.
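For anyone wondering where that number comes from, here is the arithmetic as a small sketch. It assumes the standard binary64 layout of 1 sign bit, 11 exponent bits and 52 fraction bits; the names are only for illustration.

```python
# Assumed layout: IEEE-754 binary64 has 1 sign bit, 11 exponent bits, 52 fraction bits.
FRACTION_BITS = 52

# A NaN has all exponent bits set and a nonzero fraction;
# an all-zero fraction with that exponent is an infinity instead.
nonzero_fractions = 2**FRACTION_BITS - 1

# The sign bit is free, so every fraction pattern appears twice.
nan_count = 2 * nonzero_fractions

print(nan_count)  # 9007199254740990
```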

If your floats are coming from an external data source, or a function written in C, Fortran, Java, Rust etc., then you might receive any one of those 9e15 NANs. Using is to test for a NAN is never correct. Even if we cached float('nan') so that this was always true:

math.nan is float('nan')

it would still be wrong to test for NANs using is.
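A rough sketch of the external-data case, using struct to stand in for bytes read from a file or returned by a C extension. The data here is made up for illustration, and the counts shown are what CPython gives.

```python
import math
import struct

# Pretend these 16 bytes came from a file or a C extension:
# the doubles 1.5 and a NaN, little-endian. (Made-up data.)
raw = struct.pack("<2d", 1.5, float('nan'))
values = struct.unpack("<2d", raw)

# An identity test misses the NaN; a value test finds it.
found_by_identity = [v for v in values if v is math.nan]
found_by_isnan = [v for v in values if math.isnan(v)]

print(len(found_by_identity), len(found_by_isnan))  # 0 1
```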

CPython currently doesn’t even cache 0.0, so what benefit is there in caching NANs, which are likely to be much rarer? (Other interpreters such as Jython, IronPython, PyPy, etc. are free to cache as many, or as few, floats as they want.)

In any case, caching of ints and floats is an implementation detail. You should never rely on ints, or floats, to be cached. Doing that means you are no longer writing platform-independent Python code, but tying your code to a specific version of a specific interpreter on a specific OS.

So it seems to me that:

there is no good reason for the interpreter to cache floats (or it would already be doing it);
even if there was, it is wrong for users to rely on that cache;
and even if there was a float cache, testing for NANs with is is always wrong.

CharlieZhao95 · April 8, 2022, 3:00am

I am in no way recommending that users compare NaNs by object identity.

The initial idea was just to keep the behavior of float('nan') and math.nan consistent, to avoid confusing anyone who notices the difference. Based on that, I wondered whether CPython could use a cache to store those special float objects, such as nan, inf, 0.0, etc. Hence this discussion.

And now I have expert answers here.

Thanks all!