Harmonizing the JSON serialization of non-finite float values with browsers and other languages

Previous discussion

This is an adapted version of a GitHub issue. Mark Dickinson suggested I create a thread here since GitHub is not the primary place of discussion.
For links to previous discussions and a corresponding PR, please refer to this issue. I can’t attach more than two links since I’m a new user.

Pitch

Currently, Python’s default JSON serializer encodes values like NaN as-is, with the explanation being that many JS-based JSON libraries also do this, and that the corresponding parsers can handle such non-conforming input.
In reality, most major browsers do not support this type of encoding, and even Node.js (v14.16.0) converts such values to null.
The keyword argument allow_nan makes the serializer throw when encountering non-finite values when set to False, but I’d argue it is paramount to ensure compatibility with modern browsers, instead of just stopping execution. Changing the default behavior is of course not needed or possible at this point.
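
To make the interoperability problem concrete, here is a small sketch that emulates a strict, spec-following parser in Python using the parse_constant hook of json.loads. The names reject_constant and strict_loads are made up for this illustration; they only approximate how a browser's parser treats such input.

```python
import json


def reject_constant(name):
    # json.loads calls this hook for the tokens 'NaN', 'Infinity' and '-Infinity'.
    # A strict parser would refuse them, so we raise instead of returning a float.
    raise ValueError(f"{name} is not valid JSON")


def strict_loads(text):
    # Hypothetical helper: parse JSON while rejecting the non-standard constants.
    return json.loads(text, parse_constant=reject_constant)


# The default serializer emits the bare token NaN ...
payload = json.dumps([1.2, float("nan")])

# ... which the lenient default parser accepts, but the strict one rejects.
try:
    strict_loads(payload)
    strict_ok = True
except ValueError:
    strict_ok = False
```

With the default settings the same payload round-trips fine inside Python, which is why the problem tends to show up only once the data reaches a different consumer.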

When implementing this feature, there are two main decisions to make.

Firstly, it has to be decided if allow_nan should be extended to take more datatypes like strings and callables, or if we should create a separate argument for this functionality.
Re-purposing allow_nan would make the control over such behavior centralized, however the name is very limiting.
It doesn’t say anything about other non-finite values, and without looking at the docs, one would think it only takes bool values.

Secondly, it has to be decided how far we want to take this feature.
Do we want to have pre-defined cases like as_is, throw, and to_null, or do we want to allow the user to pass their own callable? The latter is implemented by the linked PR. Having both options is also a possibility.
There has also been a discuss thread that suggested adding a general override for all built-in types, though it seems like the sentiment is against making the standard JSON serializer more complex than it already is.
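
As a rough illustration of the two options above, here is a sketch of a wrapper that accepts either a pre-defined mode name or a user callable. Everything in it is hypothetical: the function names dumps_nonfinite and _replace_nonfinite, the nonfinite parameter, and the mode strings are placeholders taken from this post, not an existing API and not what the linked PR does. It works by pre-processing the object tree rather than by changing the encoder, and it does not look at dict keys.

```python
import json
import math


def _replace_nonfinite(obj, handler):
    """Return a copy of obj with every non-finite float value passed through handler."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return handler(obj)
    if isinstance(obj, dict):
        # Only values are rewritten; keys are left untouched in this sketch.
        return {k: _replace_nonfinite(v, handler) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_replace_nonfinite(v, handler) for v in obj]
    return obj


def dumps_nonfinite(obj, nonfinite="as_is", **kwargs):
    """Hypothetical wrapper around json.dumps; 'nonfinite' is a mode name or a callable."""
    if callable(nonfinite):
        # User-supplied callable decides what replaces each non-finite float.
        return json.dumps(_replace_nonfinite(obj, nonfinite), **kwargs)
    if nonfinite == "as_is":
        return json.dumps(obj, allow_nan=True, **kwargs)
    if nonfinite == "throw":
        return json.dumps(obj, allow_nan=False, **kwargs)
    if nonfinite == "to_null":
        return json.dumps(_replace_nonfinite(obj, lambda _: None), **kwargs)
    raise ValueError(f"unknown nonfinite mode: {nonfinite!r}")
```

For example, dumps_nonfinite(data, "to_null") mirrors what JSON.stringify does, while dumps_nonfinite(data, str) would emit the values as strings. Supporting both forms in one argument is the "having both options" variant mentioned above.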

Overall, each combination of decisions has its advantages and drawbacks. Since I wasn’t a part of such discussions before, I don’t have a preference.
All I want is to see this feature get implemented, and I can create a PR once consensus is reached.

As I understand it, the desire here is to have an option for Python to behave like JavaScript (specifically, ECMAScript’s JSON.stringify) does with respect to infinities and nans. For all the discussion of custom callables and other exciting directions that we could take things, this seems to be the concrete need that people have, primarily for interoperability purposes.

How about this for a concrete proposal, along the lines of the simplest-thing-that-could-possibly-work: If the allow_nan flag (which currently usually takes a value of either False or True in real-world uses) to json.dumps is given as None, then infinities and nans are converted to null in the JSON string output.

Everything else wouldn’t change: the default for allow_nan would remain True, and if it were some falsy value other than None, an exception would be raised.

In code, here’s the current behaviour:

>>> import json
>>> nan, inf = float("nan"), float("inf")
>>> json.dumps([1.2, nan, 5.6, inf])
'[1.2, NaN, 5.6, Infinity]'
>>> json.dumps([1.2, nan, 5.6, inf], allow_nan=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant

The proposed new behaviour would be to allow:

>>> json.dumps([1.2, nan, 5.6, inf], allow_nan=None)
'[1.2, null, 5.6, null]'

The only fly in the ointment: technically this is a backwards incompatible change, since None is falsy, so the current code raises if you give allow_nan=None.

Thoughts?

How about this for a concrete proposal, along the lines of the
simplest-thing-that-could-possibly-work: If the allow_nan flag (which
currently usually takes a value of either False or True in
real-world uses) to json.dumps is given as None, then infinities
and nans are converted to null in the JSON string output.

This is simple and minimal and evocative and doesn’t preclude more
exciting modes in the future, and as such I like it.

The proposed new behaviour would be to allow:

>>> json.dumps([1.2, nan, 5.6, inf], allow_nan=None)
'[1.2, null, 5.6, null]'

The only fly in the ointment: technically this is a backwards incompatible change, since None is falsy, so the current code raises if you give allow_nan=None.

I’ve no problem with that provided the docs contain the usual “Changed
in 3.x: allow_nan=None means to present null instead of raising” kind
of remark, as is common with a lot of new features.

Cheers,
Cameron Simpson cs@cskk.id.au

I wouldn’t be surprised if someone out there was using allow_nan=None to mean allow_nan=False. We shouldn’t change the meaning silently.
Solutions I can think of:

  • Adding a new argument, like nonfinite=None. Initially it would only take None but it could be extended in the future, if necessary.

  • Using a specific value that’s very unlikely to be used currently, like allow_nan='null'. To allow future extensions, start emitting a DeprecationWarning when a non-bool is passed to allow_nan.

    This would technically be a backwards incompatible change, but it could get a Steering Council (SC) exception.

Agreed. I find it hard to imagine that many people are deliberately writing dumps(some_input, allow_nan=None), but cases of dumps(some_input, allow_nan=allow_nan) where the allow_nan value happens to be None (perhaps because it’s been determined by some argument parsing framework like argparse or click) seem plausible.
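
Here is a minimal sketch of how that can happen by accident. The command-line option and the function name dump_with_cli_flag are invented for the example; the point is only that an option with default=None is forwarded unchanged, so on interpreters without the proposed change the call behaves like allow_nan=False.

```python
import argparse
import json


def dump_with_cli_flag(argv, data):
    """Hypothetical script body: forward an optional CLI flag straight to json.dumps."""
    parser = argparse.ArgumentParser()
    # default=None means "the user did not say"; the flag itself sets True.
    parser.add_argument("--allow-nan", action="store_true", default=None)
    args = parser.parse_args(argv)
    # args.allow_nan is None when the flag is absent, and None is falsy,
    # so the current encoder raises ValueError for non-finite floats here.
    return json.dumps(data, allow_nan=args.allow_nan)
```

Code like this currently relies on None meaning "raise", so giving None a new meaning would change its behaviour without any visible edit to the script.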

I like the second option here: special value 'null' for the new behaviour, combined with a DeprecationWarning for things that aren’t 'null' and aren’t a bool.
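
A sketch of what that second option could look like from the caller's side, written as a wrapper so it runs on today's json module. The names dumps_with_null_mode and _nonfinite_to_none are hypothetical, the warning text is made up, and a real implementation would live inside the encoder rather than pre-processing the data.

```python
import json
import math
import warnings


def _nonfinite_to_none(obj):
    """Return a copy of obj with NaN and infinities replaced by None (values only)."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: _nonfinite_to_none(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_nonfinite_to_none(v) for v in obj]
    return obj


def dumps_with_null_mode(obj, allow_nan=True, **kwargs):
    """Hypothetical wrapper: allow_nan may be True, False, or the string 'null'."""
    if isinstance(allow_nan, str) and allow_nan == "null":
        # New behaviour: emit null for non-finite floats, like JSON.stringify.
        return json.dumps(_nonfinite_to_none(obj), allow_nan=False, **kwargs)
    if not isinstance(allow_nan, bool):
        # Anything else that is not a bool keeps its old truthiness-based meaning
        # for now, but the caller is told that this will stop being supported.
        warnings.warn(
            "passing a non-bool value other than 'null' as allow_nan is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return json.dumps(obj, allow_nan=bool(allow_nan), **kwargs)
```

With this shape, allow_nan=None still raises on non-finite values as it does today, only with a warning attached, which leaves room to assign it a different meaning after a deprecation period.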