Significant increase in .pyc sizes in Python 3.11

Hello,

One of our Gentoo users noticed that CPython 3.11.3, as installed on Gentoo, is significantly larger than CPython 3.10.11. We’ve determined that this is primarily because the .pyc files became much larger.

To confirm this, I’ve measured:

$ qlist -e dev-lang/python:3.10 | grep '.py$' | xargs du -ab  | awk '{ sum += $1 } END {print sum }'
29822961
$ qlist -e dev-lang/python:3.11 | grep '.py$' | xargs du -ab  | awk '{ sum += $1 } END {print sum }'
30864264
$ qlist -e dev-lang/python:3.10 | grep '.opt-2.pyc$' | xargs du -ab  | awk '{ sum += $1 } END {print sum }'
22449729
$ qlist -e dev-lang/python:3.11 | grep '.opt-2.pyc$' | xargs du -ab  | awk '{ sum += $1 } END {print sum }'
46039037

So while the .py files distributed with CPython grew by roughly 3.5%, the .opt-2.pyc files grew by 105%!
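
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes both percentages from the byte totals pasted above. The helper name `growth_percent` is made up for illustration.

```python
# Byte totals copied from the qlist/du measurements above.
py_310, py_311 = 29_822_961, 30_864_264
pyc_310, pyc_311 = 22_449_729, 46_039_037


def growth_percent(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


py_growth = growth_percent(py_310, py_311)
pyc_growth = growth_percent(pyc_310, pyc_311)
print(f".py: {py_growth:.1f}%  .opt-2.pyc: {pyc_growth:.1f}%")
```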

When poking around, I’ve been pointed to PEP 657. However, it estimates an increase of 22%, which is roughly a fifth of what we’re observing.

Is this change expected, or are we perhaps hitting some bug?

Have you made sure of what is being measured? For example, could it be that more of the files have been bytecode-compiled with optimizations (to produce .opt-2.pyc) than before?

I’ve just done that and the numbers suggest that all new .pyc files correspond to new .py files:

$ qlist -e dev-lang/python:3.10 | grep '.py$' | wc -l
1714
$ qlist -e dev-lang/python:3.11 | grep '.py$' | wc -l
1742
$ qlist -e dev-lang/python:3.10 | grep '.opt-2.pyc$' | wc -l
1685
$ qlist -e dev-lang/python:3.11 | grep '.opt-2.pyc$' | wc -l
1712

I have also looked at individual file sizes, and these show the issue more clearly:

$ qlist -e dev-lang/python:3.10 | grep '.opt-2.pyc$' | xargs du --apparent -b | sort -nr | head -n 10
473503  /usr/lib/python3.10/pydoc_data/__pycache__/topics.cpython-310.opt-2.pyc
239694  /usr/lib/python3.10/test/__pycache__/test_typing.cpython-310.opt-2.pyc
230367  /usr/lib/python3.10/test/__pycache__/test_descr.cpython-310.opt-2.pyc
211754  /usr/lib/python3.10/test/__pycache__/test_socket.cpython-310.opt-2.pyc
195871  /usr/lib/python3.10/test/test_email/__pycache__/test_email.cpython-310.opt-2.pyc
188780  /usr/lib/python3.10/test/__pycache__/datetimetester.cpython-310.opt-2.pyc
184531  /usr/lib/python3.10/test/__pycache__/test_inspect.cpython-310.opt-2.pyc
168452  /usr/lib/python3.10/test/__pycache__/_test_multiprocessing.cpython-310.opt-2.pyc
164277  /usr/lib/python3.10/test/__pycache__/test_argparse.cpython-310.opt-2.pyc
157918  /usr/lib/python3.10/test/__pycache__/test_io.cpython-310.opt-2.pyc
$ qlist -e dev-lang/python:3.11 | grep '.opt-2.pyc$' | xargs du --apparent -b | sort -nr | head -n 10
734405  /usr/lib/python3.11/test/__pycache__/test_typing.cpython-311.opt-2.pyc
469286  /usr/lib/python3.11/pydoc_data/__pycache__/topics.cpython-311.opt-2.pyc
468125  /usr/lib/python3.11/test/__pycache__/test_socket.cpython-311.opt-2.pyc
457927  /usr/lib/python3.11/test/__pycache__/test_descr.cpython-311.opt-2.pyc
456857  /usr/lib/python3.11/test/__pycache__/datetimetester.cpython-311.opt-2.pyc
384725  /usr/lib/python3.11/test/__pycache__/_test_multiprocessing.cpython-311.opt-2.pyc
367678  /usr/lib/python3.11/test/__pycache__/test_enum.cpython-311.opt-2.pyc
364673  /usr/lib/python3.11/test/__pycache__/test_inspect.cpython-311.opt-2.pyc
360672  /usr/lib/python3.11/test/__pycache__/test_decimal.cpython-311.opt-2.pyc
353031  /usr/lib/python3.11/test/test_email/__pycache__/test_email.cpython-311.opt-2.pyc

test_typing.py grew from 180K to 280K but the .pycs (each) grew from 235K to ~720K. test_socket.py grew from 247K to 250K but .pycs grew from ~210K to ~460K.

It’s possibly due to “PEP 657: Fine-grained error locations in tracebacks”. The location tables take extra space in the compiled .pyc file. There are a number of other bytecode changes in 3.11.x, but I suspect PEP 657 makes the biggest size change. I’m guessing, though; I haven’t confirmed that.

Yes, that was OP’s initial lead; but the size increases seem to be more than what OP expected. The PEP claims

As an illustrative example to gauge the impact of this change, we have calculated that including the start and end offsets will increase the size of the standard library’s pyc files by 22% (6MB) from 28.4MB to 34.7MB.

but OP is seeing them increase by more than 100%.

My guess: the PEP calculation is based on .pyc generation at normal optimization level. At full optimizations (.opt-2.pyc which I assume corresponds to the -OO setting), docstrings would be removed, but apparently the error-location tracking info is not - so the same amount of new info is being attached to a smaller base.
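
This guess is easy to check on a small scale by compiling the same source at each optimization level and comparing the marshalled sizes. A minimal sketch, assuming that the body of a .pyc file is the marshalled code object after a short header; `SOURCE` and `marshalled_size` are hypothetical names used only for this example.

```python
import marshal

# Hypothetical sample module with a docstring and an assert.
SOURCE = '''
def greet(name):
    """Return a greeting. This docstring is dropped at optimization level 2."""
    assert name, "asserts are dropped at level 1 and above"
    return "Hello, " + name
'''


def marshalled_size(source, optimize):
    """Size in bytes of the marshalled code object at a given optimization level."""
    code = compile(source, "<example>", "exec", optimize=optimize)
    return len(marshal.dumps(code))


sizes = {level: marshalled_size(SOURCE, level) for level in (0, 1, 2)}
print(sizes)
```

Comparing the three numbers across 3.10 and 3.11 shows whether the relative growth really depends on the optimization level.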

Honestly, now that I think about it, that sounds like an oversight to me. People who select full optimization for bytecode are, I would imagine, fully expecting to lose debugging info for the sake of saving space. The setting should probably cause that information to be removed, assuming the overall design of the new feature makes that possible.

Actually, I see roughly the same growth for all optimization levels (the differences between them are minute at best). I chose .opt-2.pyc for the paste arbitrarily, because I wanted all the results to be for the same optimization level.

Also, since I’m getting confused at this point: are we two (i.e. the user originally reporting the problem and myself) the only people experiencing larger .pyc sizes? Is the problem perhaps specific to Gentoo somehow?
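
For readers on other distributions who want to compare, the qlist pipelines above can be approximated without Gentoo tooling. This is only a sketch under assumptions: it walks the directory reported by `sysconfig.get_path("stdlib")`, which on many installs also contains site-packages, so the totals may include third-party files. The function name `total_size` is made up for illustration.

```python
import sysconfig
from pathlib import Path


def total_size(root, suffix):
    """Sum the apparent size in bytes of all files under root ending in suffix."""
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*" + suffix)
        if p.is_file()
    )


if __name__ == "__main__":
    # Assumption: the interpreter running this script is the one being measured.
    stdlib = sysconfig.get_path("stdlib")
    for suffix in (".py", ".opt-2.pyc"):
        print(suffix, total_size(stdlib, suffix))
```

Run it once with each interpreter (for example python3.10 and python3.11) and compare the totals.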

This is a known issue, and I believe the situation should be at least somewhat improved in Python 3.12: see python/cpython issue #99554 on GitHub, “`.pyc` files are larger than they need to be”. It is due to PEP 659 and PEP 657 (note that one PEP compounds the impact of the other, which is why there’s such a significant increase in size). It seems unlikely to me that there’s a Gentoo-specific problem here.
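
To get a feel for the PEP 659 side, one can estimate how many bytes of `co_code` are not accounted for by the instructions that `dis` shows by default. This is a rough sketch under two assumptions that are my understanding rather than anything confirmed in this thread: each instruction unit is two bytes, and `dis.get_instructions` leaves out inline cache entries by default on 3.11 and later. On 3.10 the result should simply be 0. The names `iter_code_objects` and `hidden_bytes` are made up for illustration.

```python
import dis
import types


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def hidden_bytes(source):
    """Bytes of co_code beyond the instructions dis lists by default."""
    top = compile(source, "<example>", "exec")
    total = 0
    for code in iter_code_objects(top):
        visible = sum(1 for _ in dis.get_instructions(code))
        # Assumption: every visible instruction occupies exactly 2 bytes,
        # so the remainder is taken up by inline cache entries.
        total += len(code.co_code) - 2 * visible
    return total


if __name__ == "__main__":
    print(hidden_bytes("x = a.b + c\n"))
```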

Cc. @brandtbucher, who has been working on this.