At the language summit, Mark Shannon and Guido expressed that they wished upgrading Python caused no breakages. (The topic was the C API, but I think the sentiment was more general.) I mentioned at the time that we tried to keep track of issues we ran into while upgrading Python from 3.9 to 3.10 within Google (my coworker @yilei did most of the work involved). During a few stolen moments over the last week I’ve tried to compile a list of categories of the breakages. I didn’t quite get all the way through the list of internal issues, but I think I looked at 75% of them. I’m also sure that lots of problems were fixed without filing issues. Still, I think this is a fairly representative list.
This does not include fixes to most (open-source) third-party packages, because the fix for pretty much all of those was just “update the package”; the versions we had were usually just too out of date. (Keeping third-party packages up to date is an ongoing and unrelated struggle.)
It’s also worth noting that we run tests with C assertions enabled, which is fairly uncommon outside of Google (by default, assertions are enabled only in a `--with-pydebug` build, which most people don’t test with). This exposed quite a few bugs in C extensions over the years.
The list of categories below is distilled from ~100 issues that I looked at, but a lot of them caused multiple (sometimes many) test failures. Some only broke tests (like anything involving mocks), but most broke real code. I don’t think most of the breakages are unreasonable, although we could perhaps have avoided some of them if we’d approached things more carefully. I’m posting this list to provide some insight into why, in real-world upgrades across large amounts of code, things still break.
Changes to things that are obviously internal:
- Code objects created from manually transformed bytecode/lnotab needing updates to the new formats.
- Tests mocking things in the `_bootlocale` module (in order to emulate environments with specific locale settings) failing, because `_bootlocale` was removed.
- Tests checking refcounts (to detect leaks) where the refcount behaviour changed.
Documented deprecations:
- Uses of `PyArg_ParseTuple` without `PY_SSIZE_T_CLEAN`.
- Uses of the `asyncio` `loop` argument that was removed.
- Uses of the `collections.abc` ABCs from the old `collections` names.
- Uses of the removed `parser` and `symbol` modules.
- Incompatibilities involving `importlib`’s migration from `find_module` to `find_spec`.
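As an illustration of the `collections.abc` item, here is a minimal sketch of the kind of change involved. The `describe` helper is hypothetical and only exists to show the import in use. The old aliases on `collections` (such as `collections.Mapping`) were deprecated for a long time and are gone in 3.10.

```python
import collections
import collections.abc
import sys

# The ABCs live in collections.abc. Importing them from there works on
# every supported Python 3 version.
from collections.abc import Mapping


def describe(obj):
    """Hypothetical helper: classify an object using the collections.abc ABCs."""
    if isinstance(obj, Mapping):
        return "mapping"
    if isinstance(obj, collections.abc.Sequence):
        return "sequence"
    return "other"


# The old alias is still reachable on 3.9 and earlier (with a
# DeprecationWarning) and is absent on 3.10 and later.
old_alias_present = hasattr(collections, "Mapping")

print(describe({"a": 1}), describe([1, 2]), describe(3))
print("collections.Mapping present:", old_alias_present)
```

Code that still spelled it `collections.Mapping` fails with an `AttributeError` on 3.10, which is why this category shows up even though the deprecation was documented.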
Obviously bugs in user code:
- Uses of C API functions without the GIL held (detected by new assertions)
- Uses of the C API without initialising the interpreter.
- C functions not checking for errors correctly (and triggering asserts when API functions are called with exceptions set).
- `unittest.mock` `assrt_has_calls` and other misspelled assert methods that are now errors instead of silently passing.
- Awaitables being passed to the wrong `asyncio` loop (tests accidentally or ignorantly reusing awaitables between runs that set up/tear down loops).
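To make the `unittest.mock` item concrete, here is a small sketch of how a misspelled assertion can hide a broken test. The function name `misspelled_assert_raises` and the mocked method `do_work` are made up for the example. On 3.9, `assrt_has_calls` is just another auto-created mock attribute, so the call succeeds and checks nothing; on 3.10 and later, accessing it raises `AttributeError`.

```python
import sys
from unittest import mock


def misspelled_assert_raises():
    """Return True if using a misspelled assert method raises AttributeError."""
    m = mock.Mock()
    m.do_work(1)
    try:
        # Typo for assert_has_calls. The expected call is deliberately
        # wrong, so a real assertion would fail here.
        m.assrt_has_calls([mock.call.do_work(2)])
    except AttributeError:
        return True
    # Reached only when the typo was silently accepted as a mock attribute.
    return False


print("Python", sys.version_info[:2], "rejects the typo:", misspelled_assert_raises())
```

Tests like this "passed" before the upgrade, so the failure after the upgrade is the first time the assertion is actually evaluated.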
Sometimes-hard-to-avoid issues in tests:
- Golden output tests with floats.
- Golden output tests for `argparse`’s `--help` which included the repr of built-in types.
- Internal changes to modules affecting how/when mocks of builtins or internals are called.
Bugs in Python:
- C assertion error from the runtime while expecting a SyntaxError (python/cpython issue #100050).
- Assertion errors/stale cache in typeobject.c when tp_version_tag is not cleared (python/cpython issue #99293).
Changes in Python:
- New file length limitation causing `OverflowError: line number table is too long` while importing the module. (The modules were generated code.)
- `PyLong_AsLong()` and other C APIs no longer accepting floats like `0.0`.
- Complex mocking/wrapping/meta-programming broken by `staticmethod`/`classmethod` changes (bpo-43682).
- `typing.Optional[]` and `typing.Union[]` changing from having an empty `__name__`/`_name` to `'Optional'` and `'Union'` broke tests that tried to verify type annotations.
- Changes to class annotations in dataclasses broke meta-programming that tried to generate class annotations.
- `frame->f_lasti` meaning change (from byte to instruction offset) broke code interacting with it.
- Tests mocking `module_under_test.__builtins__['open']` instead of `builtins.open` broke.
- New attribute on code objects broke custom caching/hashing/reuse of code objects.
- Hash values of `NaN` changing broke code inadvertently assuming different `NaN`s hashed the same.
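The `NaN` item is easy to reproduce in a few lines. This is only a sketch of the behaviour change, not the code that actually broke: up to 3.9 every float `NaN` hashed to the same value, while from 3.10 the hash is derived from the object's identity, so two separate `NaN` objects normally hash differently.

```python
import sys

# Keep both objects alive at the same time so they are guaranteed to be
# two distinct float objects.
a = float("nan")
b = float("nan")

# True on 3.9 and earlier (both hash to 0); on 3.10+ the hash depends on
# the object's identity, so distinct NaN objects get different hashes.
same_hash = hash(a) == hash(b)

# A single NaN object still hashes consistently with itself.
stable_for_one_object = hash(a) == hash(a)

print("Python", sys.version_info[:2], "distinct NaNs share a hash:", same_hash)
```

Anything that used the hash of a value containing `NaN` as a cache key or fingerprint would have relied on the old behaviour without noticing.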