One of the Gentoo users has noticed that CPython 3.11.3 as installed on Gentoo is significantly larger than CPython 3.10.11. We’ve been able to determine that this is primarily because .pyc files became much larger.
To confirm this, I’ve measured:
$ qlist -e dev-lang/python:3.10 | grep '.py$' | xargs du -ab | awk '{ sum += $1 } END {print sum }'
29822961
$ qlist -e dev-lang/python:3.11 | grep '.py$' | xargs du -ab | awk '{ sum += $1 } END {print sum }'
30864264
$ qlist -e dev-lang/python:3.10 | grep '.opt-2.pyc$' | xargs du -ab | awk '{ sum += $1 } END {print sum }'
22449729
$ qlist -e dev-lang/python:3.11 | grep '.opt-2.pyc$' | xargs du -ab | awk '{ sum += $1 } END {print sum }'
46039037
So while the .py files distributed by CPython grew by roughly 3.5%, the size of the .opt-2.pyc files grew by 105%!
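For anyone who wants to double-check the percentages, here is a minimal sketch of the arithmetic. The byte totals are copied from the output above; the helper name `growth_percent` is just something made up for this illustration.

```python
# Byte totals copied from the qlist/du measurements above.
py_310, py_311 = 29822961, 30864264
pyc_310, pyc_311 = 22449729, 46039037


def growth_percent(old, new):
    """Relative growth from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100


print(f".py growth:        {growth_percent(py_310, py_311):.1f}%")    # 3.5%
print(f".opt-2.pyc growth: {growth_percent(pyc_310, pyc_311):.1f}%")  # 105.1%
```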
When poking around, I’ve been pointed to PEP 657. However, it claims an estimated increase of 22%, which is roughly a fifth of what we’re observing.
Is this change expected, or are we perhaps hitting some bug?
Have you made sure of what is being measured? For example, could it be that more of the files have been bytecode-compiled with optimizations (to produce .opt-2.pyc) than before?
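One way to check this would be to count the files in each category as well as summing their sizes. The following is only a rough sketch: the function name `tally` and the suffix list are my own choices, and you would point it at whatever directory your Python's standard library is installed in.

```python
import os
from collections import defaultdict

# Check the more specific suffixes first so that "foo.cpython-311.opt-2.pyc"
# is not counted as a plain ".pyc".
KINDS = (".opt-1.pyc", ".opt-2.pyc", ".pyc", ".py")


def tally(root):
    """Return {suffix: (file_count, total_bytes)} for files under root."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            for kind in KINDS:
                if name.endswith(kind):
                    entry = totals[kind]
                    entry[0] += 1
                    entry[1] += os.path.getsize(os.path.join(dirpath, name))
                    break
    return {kind: tuple(entry) for kind, entry in totals.items()}
```

If the file counts per suffix are about the same for both versions, then the difference really is in the size of each file rather than in how many got compiled.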
test_typing.py grew from 180K to 280K but the .pycs (each) grew from 235K to ~720K. test_socket.py grew from 247K to 250K but .pycs grew from ~210K to ~460K.
It’s possibly due to “PEP 657: Fine-grained error locations in tracebacks”. The location tables take extra space in the compiled .pyc file. There are a number of other bytecode changes in 3.11.x, but I suspect PEP 657 makes the biggest size change. I’m guessing though, I haven’t confirmed that.
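A rough way to test that guess is to compile a file in memory and compare the size of the bytecode with the size of the location tables. This is only a sketch: `size_breakdown` and `iter_code_objects` are names I made up, and I'm assuming the marshalled code object is a fair approximation of the .pyc payload (the file adds a small header on top). Running it under both 3.10 and 3.11 on the same source should show where the extra bytes go.

```python
import marshal


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            yield from iter_code_objects(const)


def size_breakdown(source, filename="<example>"):
    """Return the marshalled size plus bytecode and location-table byte counts."""
    top = compile(source, filename, "exec")
    bytecode = 0
    locations = 0
    for code in iter_code_objects(top):
        bytecode += len(code.co_code)
        # co_linetable exists on 3.10+; fall back to co_lnotab on older versions.
        table = getattr(code, "co_linetable", None)
        if table is None:
            table = code.co_lnotab
        locations += len(table)
    return {
        "total": len(marshal.dumps(top)),
        "bytecode": bytecode,
        "locations": locations,
    }


if __name__ == "__main__":
    print(size_breakdown("def f(x):\n    return x + 1\n"))
```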
Yes, that was OP’s initial lead; but the size increases seem to be more than what OP expected. The PEP claims
As an illustrative example to gauge the impact of this change, we have calculated that including the start and end offsets will increase the size of the standard library’s pyc files by 22% (6MB) from 28.4MB to 34.7MB.
but OP is seeing them increase by more than 100%.
My guess: the PEP calculation is based on .pyc generation at normal optimization level. At full optimizations (.opt-2.pyc which I assume corresponds to the -OO setting), docstrings would be removed, but apparently the error-location tracking info is not - so the same amount of new info is being attached to a smaller base.
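To see how much the optimization level changes the base size, one could compile the same source at each level and compare. A minimal sketch, assuming the built-in `compile()` with its `optimize` argument behaves like `-O`/`-OO` for this purpose; the sample source and the name `marshalled_sizes` are invented for the illustration.

```python
import marshal

# Hypothetical sample: a docstring (dropped at level 2) and an assert
# (dropped at levels 1 and 2).
SAMPLE = '''
def greet(name):
    """Return a greeting.

    This long docstring is dropped at optimization level 2.
    """
    assert name, "name must not be empty"
    return "Hello, " + name
'''


def marshalled_sizes(source):
    """Return {optimization level: size in bytes of the marshalled code}."""
    return {
        level: len(marshal.dumps(compile(source, "<sample>", "exec", optimize=level)))
        for level in (0, 1, 2)
    }


if __name__ == "__main__":
    for level, size in marshalled_sizes(SAMPLE).items():
        print(f"optimize={level}: {size} bytes")
```

Comparing the output between 3.10 and 3.11 at each level would show whether the location info survives at level 2.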
Honestly, now that I think about it, that sounds like an oversight to me. People who select full optimization for bytecode are, I would imagine, fully expecting to lose debugging info for the sake of saving space. The setting should probably cause that information to be removed, assuming the overall design of the new feature makes that possible.
Actually, I see roughly the same growth for all optimization levels (and the differences being minute at best). I’ve chosen .opt-2.pyc for the paste arbitrarily because I wanted all the results for the same optimization level.
Also, since I’m getting confused at this point: are we two (i.e. the user originally reporting the problem and myself) the only people experiencing larger .pyc sizes? Is the problem perhaps specific to Gentoo somehow?
This is a known issue, and I believe the situation should be at least somewhat improved in Python 3.12: see python/cpython issue #99554 on GitHub, “`.pyc` files are larger than they need to be”. It is due to PEP 659 and PEP 657 (note that the one PEP compounds the impact of the other PEP, which is why there’s such a significant increase in size). It seems unlikely to me that there’s a Gentoo-specific problem here.