The latest bugfix releases of all supported Python versions break printing and parsing of large integers, e.g.:
>>> import math
>>> math.factorial(1559)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Exceeds the limit (4300) for integer string conversion
This follows from issues in Sage and SymPy:
https://trac.sagemath.org/ticket/34506
As of Python versions 3.10.7, 3.9.14, 3.8.14 and 3.7.14 (all pushed yesterday), this change applies to all supported versions of Python:
https://mail.python.org/archives/list/python-dev@python.org/message/B25APD6FF27NJWKTEGAFRUDNSVVAFIHQ/
The CPython issue where this change was discussed is here:
The issue and the release notes refer to this CVE:
However, when I go to that page I see no useful information at all. The only thing I can establish is that the CVE is “reserved”.
Problems with the change were pointed out in the issue, but the discussion was then shut down and it was suggested that we “redirect further discussion to discuss.python.org”, so that’s what I’m doing here.
To be clear, this is a significant backwards compatibility break. The fact that int(string) and str(integer) work with arbitrarily large integers has been a clearly intended and documented feature of Python for many years. I can’t see any way to reconcile this change, or the process around it, with the statements in the Python backwards compatibility policy (PEP 387):
The only possibly applicable part I can find is this:
- The steering council may grant exceptions to this policy. In particular, they may shorten the required deprecation period for a feature. Exceptions are only granted for extreme situations such as dangerously broken or insecure features or features no one could reasonably be depending on (e.g., support for completely obsolete platforms).
Presumably here it is the word “insecure” that justifies a potential SC exception. The OP of the issue suggests that this supposed vulnerability has been known about for over two years, though. During that time many releases of Python were made and nothing was done to address it. Now, when a potential fix arrives, how is it so urgent that it should be backported to all release branches and released within days?
From a process perspective I really don’t understand how the decisions were made that led to breaking changes being committed, dissenting views ignored, and the changes then pushed out two days later to every version of Python simultaneously, all to fix an issue that apparently was not urgent for the previous two years.
There are also technical problems with the way this has been done. Consider it from the perspective of SymPy: every call to str or int is now a potential bug that might fail for large integers, and there are lots of such calls:
$ git grep 'int(' | wc -l
10003
What exactly can SymPy replace those calls to int or str with?
The documentation says that the limit can be configured with an environment variable, a command line flag or a setter function, but all of these have the effect of setting a global variable somewhere. For Sage that is probably a manageable fix, because Sage is mostly an application. SymPy, on the other hand, is mostly a library. A library should not generally alter global state like this, so it isn’t reasonable to have import sympy call sys.set_int_max_str_digits, because the library needs to cooperate with other libraries and with the application itself. Even worse, the premise of this change is that calling sys.set_int_max_str_digits(0) reopens the vulnerability described in the CVE, so any library that does so should presumably be considered a vulnerability in itself. The docs also say that max_str_digits is set on a per-interpreter basis, but is it thread-safe? What happens if I want to change the limit so that I can call str and then change it back again afterwards?
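To illustrate the change-and-restore dance that last question implies, here is a minimal sketch. The name int_limit_disabled is mine, but sys.get_int_max_str_digits and sys.set_int_max_str_digits are the actual setters added in these releases:

import sys
from contextlib import contextmanager

@contextmanager
def int_limit_disabled():
    # Save the interpreter-wide limit, disable it (0 means "no limit"),
    # and restore it on exit. Because this mutates per-interpreter state
    # it is NOT thread-safe: any other thread converting concurrently
    # sees whatever value happens to be set at that moment.
    old = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(0)
    try:
        yield
    finally:
        sys.set_int_max_str_digits(old)

with int_limit_disabled():
    s = str(10**10000)  # works inside the block; other threads race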
There should at minimum be alternative functions that can be used in place of int and str for applications that do want to work with large integers. Those alternative functions should just work, and should not depend in any way on any global flags, so that they can serve as drop-in replacements for the previous functionality that has existed for many years. This is a basic consideration with a compatibility break: what is the replacement code that should be used downstream to achieve the precise equivalent of the previous behaviour?
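As far as I can tell, the closest thing to such a drop-in replacement that exists today still has to toggle the interpreter-wide setting, which rather proves the point. A hypothetical sketch (int_unlimited is a name I’m inventing here):

import sys

def int_unlimited(s):
    # Hypothetical drop-in for int(s); a str_unlimited would look identical.
    # It cannot avoid mutating interpreter-wide state, so other threads
    # converting concurrently see the limit disabled while this runs.
    old = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(0)  # 0 disables the limit
    try:
        return int(s)
    finally:
        sys.set_int_max_str_digits(old)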
Of course some things can’t be fixed by providing alternative functions, and a clear example of that is int.__str__, which might be called indirectly by any number of other functions. There is no way for SymPy etc. to work around the fact that large integers simply can’t be printed any more:
>>> print(10**10000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Exceeds the limit (4300) for integer string conversion
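As far as I can tell from the documentation, the only conversions that remain unlimited are those through power-of-two bases, which is no help at all for decimal output:

n = 10**10000
print(hex(n)[:16], '...')  # power-of-two bases are exempt, so this works
try:
    print(n)  # decimal output hits the default limit of 4300 digits
except ValueError as exc:
    print(exc)  # Exceeds the limit (4300) for integer string conversion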
On the other hand, though, why am I even talking about making alternative functions to do what the previous functions already did? The alternative functions should be the new functions, like safe_int and safe_str, that are not susceptible to these problems. The existing functions int and str should just be left as they were, and as they had been for many years.
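To make that concrete, here is roughly what I would imagine a hypothetical opt-in safe_str to look like, in a world where str itself stayed unlimited. It bounds the output size cheaply via bit_length() before paying for the quadratic conversion (the name and API are invented here):

def safe_str(n, maxdigits=4300):
    # An int with b bits has at most b//3 + 1 decimal digits
    # (log10(2) is about 0.301), so this check is cheap and errs on
    # the side of rejecting values slightly under the threshold.
    if n.bit_length() // 3 + 1 > maxdigits:
        raise ValueError(f"integer exceeds {maxdigits} digits")
    return str(n)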
I find it hard to see this as a real security fix. If code is doing int(gigabyte_long_untrusted_string) then isn’t it obvious that that might be slow? Why are massive untrusted strings being fed into something like int, which clearly does nontrivial processing (check the length first?)? Couldn’t there just be an option like int(string, maxdigits=100)? Isn’t this just something to be fixed in parsing libraries?
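The “check the length first” idea really is a one-line guard that any parsing layer could apply for itself, e.g. (a hypothetical helper; the maxdigits keyword on int() does not exist):

def parse_int(s, maxdigits=100):
    # Reject oversized input before int() does any quadratic work.
    digits = s.strip().lstrip('+-')
    if len(digits) > maxdigits:
        raise ValueError(f"integer literal longer than {maxdigits} digits")
    return int(s)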
I can think of many other ways to address the potential security concerns, but a global flag that breaks integer/string conversions would never have made it onto my shortlist. There is one very simple way to address the security concern without breaking integer/string conversions: make the limit optional and disabled by default, so that the applications that actually want it can enable it.
Please reconsider this change and do not allow it to become established as the status quo.