Gevent breaks floating point subnormals

Some popular libraries, including gevent, disable subnormals in floating-point arithmetic; merely importing them apparently disables subnormals for the entire Python process.

I can see from the bug report that this is Bad: many floating-point algorithms rely on subnormals and may fail to converge without them. But I’m not quite sure just how bad it is.

I presume it’s not serious enough to deal with in the interpreter, and that we can just let the various libraries get around to fixing it in their own time. Yes?

Or maybe the interpreter should do something about this – but what? Is it worse for the interpreter to mess with CPU flags than for libraries to do it?

In the meantime, are these guaranteed to correctly check for the existence of subnormals?

assert math.nextafter(0.0, 1.0) != 0.0  # Python 3.9 and higher.
assert sys.float_info.min/2 != 0.0  # older versions
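For what it’s worth, here’s a sketch that rolls both checks into one helper (the function name is mine, not an established API), assuming an IEEE 754 binary64 float type:

```python
import math
import sys

def subnormals_work() -> bool:
    """Hypothetical helper: True if gradual underflow is in effect,
    i.e. going below the smallest positive normal float yields a
    nonzero subnormal rather than being flushed to zero."""
    if sys.version_info >= (3, 9):
        return math.nextafter(0.0, 1.0) != 0.0
    return sys.float_info.min / 2 != 0.0

print(subnormals_work())  # True unless something has set FTZ/DAZ
```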

Thanks in advance.

Thanks also to Brendan Dolan-Gavitt for analysing the issue.

@tim.one @storchaka @mdickinson


Very nice find; I wasn’t aware of this. It’s a beautiful example of a violation of the rule known to some as DNMWGSTDNBTY (“Do Not Mess With Global State That Does Not Belong To You”). Though technically, in this case the state is thread-local, not global.

No, I don’t think the CPython interpreter should change anything here (and it’s not clear there’s anything we could do about this even if we wanted to), though a PyPI package that allowed inspecting and modifying the MXCSR flags on Intel x64 would be a nice-to-have.

There’s no part of the interpreter that I’m aware of that critically depends on the DAZ and FTZ flags being set in a particular way. Note, though, that the str<->float conversions in CPython do critically depend on the processor being set to the correct precision: i.e., 53-bit precision rather than 64-bit precision, and we have to make a special effort to test for that and (if necessary) temporarily change the floating-point environment on every str<->float conversion. But the dtoa.c code that underlies those conversions deliberately avoids subnormals in intermediate calculations, so should be safe from anything really bad happening as a result of setting DAZ and/or FTZ.
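As a quick sanity check of that claim (my own test, not anything from dtoa.c’s test suite), the smallest positive subnormal round-trips cleanly through CPython’s str<->float conversions, assuming the default FPU state with no FTZ/DAZ set:

```python
# Round-trip the smallest positive subnormal (2**-1074) through
# CPython's str<->float machinery; assumes the default FPU state,
# i.e. gevent's fast-math side effects are NOT active.
s = repr(5e-324)
print(s)                  # 5e-324
assert float(s) == 5e-324 == 2 ** -1074
```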

Those should work, though for a quick-and-dirty check I’d just type 5e-324 at a prompt and see what’s printed:

Python 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 5e-324
5e-324
>>> import gevent
>>> 5e-324
0.0

I note that Python on macOS/Intel doesn’t seem to have the same issue, presumably because Clang handles these particular optimization flags differently from GCC.

Python 3.10.7 (main, Sep 10 2022, 07:37:51) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 5e-324
5e-324
>>> import gevent
>>> 5e-324
5e-324

Just as a clarification, these flags don’t affect the existence of subnormals: the subnormals are still there. The gevent import only affects how operations on subnormals behave: DAZ tells the processor to treat any incoming subnormals to an operation as though they’re zeros, while FTZ says that an operation that would otherwise have produced a subnormal result should give a zero instead.

As a fun experiment, below, after importing gevent we can still create a subnormal x (the smallest positive subnormal, with value 2**-1074), via the struct module:

Python 3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gevent  # trigger the fast-math side-effects
>>> import struct
>>> x = struct.unpack('<d', struct.pack('<Q', 1))[0]

We haven’t done any floating-point operations in the course of creating x, so it’s an actual subnormal, and not 0.0. But because FTZ and DAZ affect floating-point operations, it behaves like 0.0 in almost anything we do to it:

>>> x > 0.0
False
>>> x
0.0
>>> x * 2**53
0.0
>>> x.hex()
'0x0.0p+0'

But reversing the struct operations, we find that x is not in fact zero:

>>> struct.unpack('<Q', struct.pack('<d', x))[0]
1

Addendum: here’s another way to see that x is not zero: we’ll apply str to it, but move it to a separate process first. Note that we have to be careful to set the multiprocessing context to spawn, else the child process will pick up the FPU state changes already present in the main process.

>>> import multiprocessing, concurrent.futures
>>> executor = concurrent.futures.ProcessPoolExecutor(mp_context=multiprocessing.get_context('spawn'))
>>> str(x)
'0.0'
>>> executor.submit(str, x).result()
'5e-324'

The IEEE 32-bit float format has a rather small dynamic range, and the “gradual underflow” subnormals support can be important there. But the CPython core has no support for doing arithmetic in that format (although extensions may, like numpy).

The IEEE 64-bit float format is CPython’s float type on almost all platforms, and has much larger dynamic range. I expect that debugged apps and libraries almost never get near the subnormal range when calculating in that format.
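To put rough numbers on that dynamic range (my own illustration): CPython exposes the binary64 boundaries directly, and the subnormal range only begins below roughly 2.2e-308:

```python
import math
import sys

# Smallest positive *normal* binary64 value (2**-1022):
print(sys.float_info.min)   # 2.2250738585072014e-308
# Smallest positive *subnormal* (2**-1074), via math.ulp (3.9+):
print(math.ulp(0.0))        # 5e-324
assert sys.float_info.min == 2.0 ** -1022
assert math.ulp(0.0) == 2.0 ** -1074
```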

Unlike Mark, I wouldn’t rely on that one. While I don’t know of any today, over the decades a number of platforms have had no support for subnormals at all, particularly in the 64-bit world. Indeed, I worked for a HW manufacturer (the now-defunct Kendall Square Research) whose FPU only supported 64-bit floats, and automatically & unconditionally flushed subnormals to zeroes. On that platform, subnormals weren’t part of the numerical universe, and the bit patterns for IEEE subnormals were just redundant ways of spelling 0 with the same sign bit.

So, on that platform, it would have made most sense for nextafter(0.0, 1.0) to return the smallest positive non-zero normal, since that was in fact the next representable value in the direction of 1.0, in the universe of numeric values the HW supported.

In contrast, the other check,

assert sys.float_info.min/2 != 0.0

much more directly tests for “gradual underflow” support, which is the purpose of subnormals.
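A minimal illustration of the sys.float_info.min/2 check (assuming the default FPU state, with no FTZ/DAZ flags set):

```python
import sys

# Halving the smallest positive normal double should land in the
# subnormal range; under FTZ it would be flushed to 0.0 instead.
half_min = sys.float_info.min / 2
assert half_min != 0.0                       # gradual underflow works
assert 0.0 < half_min < sys.float_info.min   # strictly subnormal
print(half_min)
```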

As to what Python can/should do about it, “nothing” is my answer too. Python didn’t create the potential problems, and has no more legitimate business messing with a thread’s FPU state than these libraries do.