Int/str conversions broken in latest Python bugfix releases

Those definitely seem like a good starting point.

In hindsight, perhaps the process we should have used here would be to write a PEP? It would not have been published until after the patches, unless the SC required it, but at least we would have written up a standard form document that would likely have included short descriptions of the rejected alternatives.

8 Likes

Actually, it can be! I don’t run server farms, so I don’t have natural sympathy for mitigating “DoS attacks”. Nevertheless I respect that it’s an important application area for Python.

Contrarily, I expect that people running server farms lack “natural” sympathy for application areas they don’t work in.

In the terminal I usually use for running interactive Python, a decimal string that fills a full screen is about 10**6500. How much of a screen an int output fills gives a very helpful visual & visceral instant sense for the number’s magnitude. “Oh - the last but one output filled a half a screen, but the latest one only about a quarter - the number of bits in the error has been cut in half!”. No mental arithmetic needed. Same underlying reason a graph or histogram can yield effortless insights.

I wouldn’t say that’s a major use case, but it is a use case. The output base is irrelevant to this, but decimal is what we get by default so it’s what I usually settle for. Doesn’t really matter to me if it takes a few seconds for the output to get generated either.

And I can certainly live with the new scheme - I probably won’t even bother to turn it off until it first bites me (at about 2/3rds of a screen’s worth of output :wink:).

2 Likes

That would have been better. The most important thing missed off of Neil’s list above from my perspective is about the process used here. The way that this change was carried out makes me question what I can trust from Python itself for compatibility in future if longstanding fundamental features of the core are not necessarily stable even across patch releases. The precedent set actually concerns me more than the change itself and that was my main motivation for starting this thread rather than just accepting the change and moving on.

While I understand the desire to fix this in private I don’t think that there was a real need for that in this case. Having the discussion in the open would probably have resulted in a better fix. For example the provided compatibility mechanism (a global flag) is actually not a very useful mechanism for those who are adversely affected: it makes sense from the security perspective of wanting to limit things but not really from the perspective of someone making software that actually uses the affected features. In an open PEP discussion these issues would have been thrashed out (presumably at interminable length :wink: ).

Some of the comments above indicate that the compatibility policy generally prohibits adding new APIs in a patch release and that this affected the thinking behind the change (e.g. why is there no large_int function provided as an alternative to int). That idea can’t stand up though when breaking compatibility which is much more significant than adding new APIs. If breakage is going to happen in a patch release then anything that mitigates that (including new APIs) should also be on the table.

I’d like to see PEP 387 updated or replaced to give better clarity about how to handle things in the situation where the SC does grant an exception. Basic things like providing a like for like replacement for an affected function should be considered in all cases such as this.

11 Likes

Here are some other questions that I think could be good to add to the FAQ

  • Why is str(integer) being limited? The CVE only mentions int(text).
  • Why is this an opt-out feature instead of an opt-in?
  • Why do big integers in the source code cause syntax error?
  • Is the Decimal module affected by this change?
9 Likes

You likely won’t see this, sorry - the SC override is the massive escape hatch to all other process. This is intentional and by design, and will remain that way unless we get a steering council that chooses to tie their own hands.

At the very least, repeating the point on this thread is not going to achieve anything - report it on GitHub - python/steering-council: Communications from the Steering Council and they’ll discuss it at one of their meetings.

2 Likes

Personally, even at this “late date” I think a PEP would be better than a FAQ. If nothing else, it provides a standard location where people can find it and a standard reference for future release notes.

7 Likes

There’s a section in the docs that can be referenced: https://docs.python.org/3/library/stdtypes.html#integer-string-conversion-length-limitation

(It can probably also be improved from some of the discussion that’s taken place here.)

Apparently we should also try and get the CVE entries updated, though I understand that’s been pretty painful. Another reason we want to take control of that process ourselves rather than relying on “donated” CVE IDs.

But either of those ought to be a sufficient reference point - the coming FAQ is really just a thread summary for newcomers here. A PEP now would just be documentation, which the documentation ought to cover.

3 Likes

Speaking about those docs, maybe this isn’t the right place to discuss this, but the sentence “Even the best known algorithms for base 10 have sub-quadratic complexity” makes no sense. Something having sub-quadratic time complexity is a good thing, and not a bad thing. Sub-quadratic includes everything below quadratic. For example linear algorithms are sub-quadratic.

3 Likes

I believe that was supposed to say super-linear. The good algorithm is n*log(n)^2.

My attempt at a FAQ has been created as a github issue. I thought that’s a decent way to do it so we can revise.

3 Likes

Then you’d have to update the docs for all versions which have been modified. (Maybe that’s fine…)

Sorry, I had that the wrong way round. For int to str conversion the check is O(1) and can be seen here:

That doesn’t actually show the number of digits, though, does it?

I think the doc is just wrong, the number of digits is not shown there (anymore?). But I can’t test it.

I believe you’re right - the quoted doc is obsolete. Here under a fresh build of CPython’s main branch:

>>> i = int('2' * 4300)
>>> len(str(i))
4300
>>> i_squared = i*i
>>> len(str(i_squared))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Exceeds the limit (4300) for integer string conversion

The number of digits is not shown now.

2 Likes

It’s been like that forever, and essentially remains so even if you supply your own memory-allocation functions (GMP will ignore any “error-value” return they may deliver).

Hi Tim! This is really not at all the point of this thread, but since it’s come up now: I recently learned how some projects that don’t want to crash on memory allocation issues (or that have a GC and want to move big integers) use GMP. They do the output size computation and allocation themselves and then afterwards call GMP’s low level API. Here’s the Racket code for example. This has lots of disadvantages obviously, since a lot of the GMP logic has to now reside in the language implementation.

2 Likes

Additional suggestion for the FAQ:

  • What is the recommended way for libraries to adapt their code, if they require the old behavior and don’t wish to alter global state?

Example: perhaps int(Decimal(string))?

Which leads to another question (though not a frequently asked one :slight_smile: ): Will Decimal <-> int conversions be limited in a future point-release for Python 3.7 – 3.10?

3 Likes

First one is not needed, as Python does not use so-called semver (and is older than it!)

2 Likes

While there are no guarantees, I doubt they’ll become limited. int <-> string are ubiquitous in web apps & libraries of all kinds, and across all the programming languages used for web programming. For examples, loading numbers from JSON, and converting an HTTP Content-Length header, are supported by all common language environments.

Decimal is much more a Python thing. There’s no universally deployed web infrastructure relying on it or on its concepts. If some popular web app is developed in Python that uses Decimal and can be provoked into problem Decimal <-> int cases, it should suffice to ask the authors of that app to change their code to check for them explicitly. At this point, it’s a real stretch to imagine we’d ever see even two such apps :stuck_out_tongue_winking_eye:.

That’s intractable for str <-> int, though, because those show up across some millions of lines of code across some thousands of apps and libraries. Nobody will ever find them all, and new cases would be introduced faster than old ones could be hardened.

3 Likes

Whilst thats a solid answer, i develop web apps, with Python / Django, and we use decimal for calculations for accuarcy and quantization, however, generally display to the user as an integer or string, which is more universally native, particularly in javascript. At the moment, i would have no concerns for what we do, but i might consider raising this with the team on the basis that python could potentially break reasonable code without advance notice, so its could it be worth notifying people of changes, and point them to the documentation which explains why the change has been made and also offer solutions to common use cases that would be affected, and also where someone can get some guidance should the comkon use cases not suffice. I can think of a couple of ways to defeat this, for example, what if you did the same test, but quantized the decimal to Decimal("1") and then casting it to an int() Im not suggesting you do the leg work and test it, but im curious if that would make a difference as it wasnt illustrated in in you test example you showed me earlier on in this post.

3 Likes

Once again: please keep this discussion on the specific topic of int/str conversion speed, and the implementation of that. That includes questions you’d like answered in a potential FAQ; such questions should be about int/str and should not include further commentary.

(This is not a reference to the post above me, that’s just the linear nature of these threads.)

At this point we have a maintainer writing an FAQ, multiple statements from maintainers on the reasoning and scope of the change, and acknowledgement that further work is happening on a faster implementation. If users continue to go off topic, I will be closing the thread as it seems to have served its purpose in communicating the issue and spurring further development.

6 Likes