Categories of issues when upgrading Python from 3.9 to 3.10

At the language summit, Mark Shannon and Guido expressed that they wished upgrading Python caused no breakages. (The topic was the C API, but I think the sentiment was more general.) I mentioned at the time that we tried to keep track of issues we ran into while upgrading Python from 3.9 to 3.10 within Google (my coworker @yilei did most of the work involved). During a few stolen moments over the last week I’ve tried to compile a list of categories of the breakages. I didn’t quite get all the way through the list of internal issues, but I think I looked at 75% of them. I’m also sure that lots of problems were fixed without filing issues. Still, I think this is a fairly representative list.

This does not include fixes to most (open-source) third-party packages, because the fix for pretty much all of those was just “update the package”: our copies were usually just too out of date. (Keeping third-party packages up to date is an ongoing and unrelated struggle.)

It’s also worth noting that we run tests with C assertions enabled, which is fairly uncommon outside of Google (by default, assertions are enabled only in a --with-pydebug build, which most people don’t test with). This exposed quite a few bugs in C extensions over the years.

The list of categories below is distilled from ~100 issues that I looked at, but a lot of them caused multiple (sometimes many) test failures. Some only broke tests (like anything involving mocks), but most broke real code. I don’t think most of the breakages are unreasonable, although we could perhaps have avoided some of them if we’d approached things more carefully. I’m posting this list to provide some insight into why, in real-world upgrades across large amounts of code, things still break.

Changes to things that are obviously internal:

  • Code objects created from manually transformed bytecode/lnotab needing updates to the new formats.
  • Tests mocking things in the _bootlocale module (in order to emulate environments with specific locale settings) failing, because _bootlocale was removed.
  • Tests checking refcounts (to detect leaks) where the refcount behaviour changed.

Documented deprecations:

  • Uses of PyArg_ParseTuple without PY_SSIZE_T_CLEAN.
  • Uses of the asyncio loop argument that was removed.
  • Uses of the collections.abc ABCs from the old collections names.
  • Uses of the removed parser and symbol modules.
  • Incompatibilities involving importlib’s migration from find_module to find_spec.
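
To illustrate the collections.abc item above, here is a minimal sketch of the import that works across Python 3 versions. The `is_mapping` helper is hypothetical and exists only for the example; the old aliases such as `collections.Mapping` are the ones that stopped working in 3.10.

```python
# Import the ABCs from collections.abc rather than from the top-level
# collections namespace, whose aliases were removed in Python 3.10.
from collections.abc import Mapping, Sequence


def is_mapping(obj):
    """Hypothetical helper: report whether obj behaves like a mapping."""
    return isinstance(obj, Mapping)


print(is_mapping({"a": 1}))    # a dict is a Mapping
print(is_mapping([1, 2, 3]))   # a list is a Sequence, not a Mapping
```

Code that must also run on very old interpreters sometimes wraps the import in try/except, but for anything 3.3 or newer the plain `collections.abc` import should be enough.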

Obviously bugs in user code:

  • Uses of C API functions without the GIL held (detected by new assertions)
  • Uses of the C API without initialising the interpreter.
  • C functions not checking for errors correctly (and triggering asserts when API functions are called with exceptions set).
  • unittest.mock: misspelled assertion methods such as assrt_has_calls (and other methods) that are now errors instead of silently passing.
  • Awaitables being passed to the wrong asyncio loop (tests accidentally or ignorantly reusing awaitables between runs that set up/tear down loops)
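
A small sketch of the unittest.mock item, assuming it refers to the stricter attribute check that 3.10 added for misspelled assertion prefixes such as `assrt`. The helper name `misspelled_assert_is_rejected` is made up for the example.

```python
from unittest import mock

m = mock.Mock()
m.do_work(1)


def misspelled_assert_is_rejected(mock_obj):
    """Return True if a misspelled assertion raises AttributeError."""
    try:
        # Typo: "assrt" instead of "assert". The expected call is also
        # wrong (2 instead of 1), so a real assertion would have failed.
        mock_obj.assrt_has_calls([mock.call.do_work(2)])
    except AttributeError:
        return True    # newer Pythons reject the misspelled attribute
    return False       # older Pythons create a child mock and "pass"


print(misspelled_assert_is_rejected(m))

# The correctly spelled method checks the recorded calls for real.
m.assert_has_calls([mock.call.do_work(1)])
```

On interpreters that accept the typo, the test "passes" without checking anything, which is why these show up as new failures after the upgrade.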

Sometimes-hard-to-avoid issues in tests:

  • Golden output tests with floats.
  • Golden output tests for argparse’s --help which included the repr of built-in types.
  • Internal changes to modules affecting how/when mocks of builtins or internals are called.

Bugs in Python:

Changes in Python:

  • New file length limitation causing OverflowError: line number table is too long while importing the module. (The modules were generated code.)
  • PyLong_AsLong() and other C APIs no longer accepting floats like ‘0.0’.
  • Complex mocking/wrapping/meta-programming broken by staticmethod/classmethod changes (bpo-43682).
  • typing.Optional[] and typing.Union[] changing from having an empty __name__/_name to ‘Optional’ and ‘Union’ broke tests that tried to verify type annotations.
  • Changes to class annotations in dataclasses broke meta-programming that tried to generate class annotations.
  • frame->f_lasti meaning change (from byte to instruction offset) broke code interacting with it.
  • Tests mocking module_under_test.__builtins__[‘open’] instead of builtins.open broke.
  • New attribute on code objects broke custom caching/hashing/reuse of code objects.
  • Hash values of NaN changing broke code inadvertently assuming different NaNs hashed the same.

At least for these 3, in our (Meta) effort to upgrade our monorepo to 3.10, we’ve come across hundreds (if not thousands) of instances of each of these.
This is true for both internal code and the thousands of open-source third-party libraries we use. While for third-party, “upgrade the package” would have helped, this is not an easy thing to do in a monorepo.

Some other issues we have seen (non-exhaustive):

  • urllib.parse silently removing “invalid characters”
  • urllib.parse changing behavior when parsing URLs of the form “foo:443” (“foo” being the scheme vs “foo” being part of the netloc or something of that sort)
  • tools and libraries that assume that the first 3 characters of the version string (e.g. “3.1” for 3.10) are all they need to care about
  • changes in the signature of traceback.format_exception (etype became positional-only)
  • change in the behavior of frozen dataclasses inheriting from non-frozen dataclasses (or the other way around) when the base class is abstract
  • Namedtuple multiple inheritance (never really worked, became a TypeError at some point)
  • Transition of asyncio APIs for Task to asyncio module (e.g. all_tasks)
  • decode/encode string methods removal
  • something something mock specs? (my memory is fuzzy)
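
The version-string item in the list above is easy to reproduce. The sketch below contrasts the naive slice with a comparison on parsed integers; both function names are hypothetical and chosen only for this illustration.

```python
import sys


def short_version_naive(version_string):
    """The fragile approach: keep only the first three characters."""
    return version_string[:3]


def major_minor(version_string):
    """Parse the first two dot-separated fields as integers."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


print(short_version_naive("3.10.4"))   # truncates the minor version
print(major_minor("3.10.4"))           # keeps both digits of the minor

# Tuples of integers compare numerically, so 3.9 sorts before 3.10.
print(major_minor("3.9.7") < major_minor("3.10.4"))

# For the running interpreter, sys.version_info avoids parsing entirely.
print(sys.version_info >= (3, 10))
```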

Many of those have been deprecated for a loooong time, so it’s mostly on us, but it doesn’t change the reality that we still get bit.

By decode/encode I think @itamaro meant base64 encodestring/decodestring
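
If that is the case, the replacement is mechanical: `base64.encodebytes` and `base64.decodebytes` are the current names for the removed `encodestring` and `decodestring`. A minimal round trip, with an arbitrary payload:

```python
import base64

payload = b"hello, world"

# encodebytes/decodebytes replace the removed encodestring/decodestring.
encoded = base64.encodebytes(payload)
decoded = base64.decodebytes(encoded)

print(encoded)
print(decoded == payload)
```

Like the old functions, `encodebytes` appends a trailing newline and wraps long output, which matters for code that compares the encoded bytes exactly.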

Also I have found that regenerating .c files from .pyx means lots of third-party projects are compatible again with 3.10. That sometimes involves pulling said .pyx files from their GitHub repos, because updating third-party packages means API changes, which means a whole new set of changes that I would love to be out of scope of a simple Python version upgrade.

Yeah, I think the fact that Python version upgrades automatically mean third-party package upgrades makes them much harder.

‘golden output’ as a synonym for ‘correct output’ is new to me. I am familiar with ‘gold-standard’ procedures, which are assumed to give ‘correct output’ and used to test new procedures, typically easier or cheaper or faster (and possibly expected to be tolerably less accurate). Did we intentionally change float representations? For error messages, testing for the presence of a key word is usually considered good enough, though it will not guard against careful sabotage or accidents that garble messages.

ah I remembered another common issue - there was a change that made paths on code objects always absolute. this broke a bunch of tests we had with “golden output”, but more severely - it broke some components of our sampling profiler that made some assumptions about what the paths looked like.

Looking ahead to the 3.12 release, we can anticipate that it will be common to encounter the new SyntaxWarning for invalid escape sequences.

ISTM that most people don’t remember all the valid escape sequences for regular strings and regexes so they mostly have followed the rule: when in doubt, add an escape.

The other place I’m encountering the warnings is in large docstrings which contain plain text including DOS-style path names or in ASCII art. The fix is usually to make it a raw string, but it is an irritant.
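
A rough sketch of the raw-string fix described above. The helper `escape_warnings` is hypothetical; it compiles a snippet of source text and reports which warning categories were raised. The category is DeprecationWarning on older versions and SyntaxWarning from 3.12, so the example only prints the names rather than assuming one.

```python
import warnings


def escape_warnings(source):
    """Compile source and return the names of any warning categories."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


# Source text with a DOS-style path in a regular string literal:
# "\d" and "\p" are not valid escape sequences.
plain = r'path = "C:\docs\python"'

# The same source text with the literal marked as a raw string.
raw = r'path = r"C:\docs\python"'

print(escape_warnings(plain))   # one or more warnings
print(escape_warnings(raw))     # no warnings
```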

The removal of distutils could also trigger a number of failures to install dependencies. Not sure how bad that is going to be.

Yup, I’ve encountered multiple cases of backslashed paths in docstrings as past developers had tried to explain some oddity of behavior on Windows, as I’ve been trying to squash warnings. “Irritant” is one word for it … :slight_smile:
