Sure, and until you demonstrate otherwise, that function remains `int`: either by setting `sys.set_int_max_str_digits(0)` once at the start of your application, or with some future keyword argument like `int(string, maxdigits=0)`, or with a context manager.
The simplest, low-impact solution to this issue for application writers will surely be to set the environment variable `PYTHONINTMAXSTRDIGITS=0`, or to call `sys.set_int_max_str_digits(0)` once at the start of their application.
I suppose that there will still be some overhead: internally, the `int()` function will contain a test that it didn’t have before, but I doubt that it will cause a significant performance regression. In any case, the onus is on somebody to demonstrate that performance regression before we consider acting on it.
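Anyone wanting to demonstrate (or debunk) such a regression can start with a micro-benchmark along these lines; the absolute numbers are machine-dependent and purely illustrative:

```python
import timeit

# Time a million short-string conversions, where the new length
# check would be nothing but overhead if it were significant.
short = timeit.timeit('int("123456789")', number=1_000_000)
print(f"1e6 short conversions: {short:.3f}s")

# And a thousand conversions just under the default 4300-digit cap.
long_ = timeit.timeit('int(big)', setup='big = "9" * 4000', number=1_000)
print(f"1e3 4000-digit conversions: {long_:.3f}s")
```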
Before deciding that we need a new builtin that duplicates the old `int` (i.e. one which performs string to int conversions but with no length check), it will be necessary to demonstrate that the new `int` with the length check has such a severe performance regression that we cannot live with it.
As far as library authors go, things are a little murkier.
Regardless of how we spell the old `int()` with no length check (setting the global variable, using a context manager, using a new parameter to `int()`, or using a hypothetical new builtin), libraries should not do that, as it may reintroduce the DoS vulnerability into applications that have not taken steps to mitigate it with their own length checks.
Since Python doesn’t have any concept of tainted and untainted strings, libraries in general must assume that all strings are potentially from an untrusted source, and always leave it up to the caller (the application) to set the string max digits.
That means that even if we introduce a new builtin that ignores the length check, libraries should never use it.
Instead, the pattern should be:

- the application sets the global, or uses a context manager:

      with int_check(maxdigits=0):
          result = library.function(string)  # calls int(string)

- the application explicitly passes a parameter for the library to use:

      obj = library.SomeClass(arg, maxdigits=0)
      result = obj.method(string)  # calls int(string, maxdigits=self.maxdigits)
By the way, it is now nearly seven weeks since this vulnerability was made public. There are surely hundreds, or thousands, of web servers around the world still running older, unpatched versions of Python. Do we have any examples of concrete exploits in the wild for this threat yet? A threat so urgent that the security team sat on it for over two years before rushing out a breaking change without community consultation or discussion.
There’s large, and then there’s LARGE. Outside of integer-maths applications like Sage, nearly all applications will consider a 100-digit number to be inconceivably “large”, and a thousand-digit number to invariably be some sort of data corruption, let alone the default setting of 4300. The total number of subatomic particles in the entire universe can be written out using just 80 digits; if we include photons (light), we need just 90 digits.
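To put those magnitudes side by side (the particle figures are the usual order-of-magnitude estimates, not precise counts):

```python
# Any count up to ~1e80 (the usual estimate for subatomic particles
# in the observable universe) fits in 80 digits:
print(len(str(10 ** 80 - 1)))   # 80

# Including photons pushes the estimate to ~1e90: 90 digits.
print(len(str(10 ** 90 - 1)))   # 90

# Both are a tiny fraction of the default 4300-digit limit.
```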
It is only maths geeks like you, and me, that worry about being able to handle million digit numbers.