Reduce size and improve compression of .pyc files

rth · April 1, 2023, 10:07am

When shipping Python source code for the browser the download size is rather important.

Recently in Pyodide we have started distributing py-compiled packages for Python 3.11, which are faster to import. However, the problem is that they are noticeably larger and compress less well than the original .py sources. Since the CDN and browsers use compression, typically gzip or Brotli, what matters is mostly the size after compression rather than the initial size.

For instance, for the standard library in Python 3.11 (only .py files, excluding tests, and some modules we unvendor):

original size (8.5 MB), gzip compressed (2 MB, x4.25 compression ratio), brotli compressed (1.4 MB, x6 compression ratio)
py-compiled size (9.8 MB), gzip compressed (3.7 MB, x2.6 compression ratio), brotli compressed (2.5 MB, x3.9 compression ratio)

Here, there are two main issues:

The py-compiled files are larger than the original, but from what I understand there was a significant increase in py3.11 and some of it will be fixed for py3.12 cpython#99443
A more problematic point is that standard compression algorithms are not very good at compressing .pyc files, which is a bit surprising given that .pyc contains less information in absolute terms than the source (since comments and docstrings are stripped, from what I understand, but I imagine there must be other badly compressible data)

So I was wondering if anyone has ideas of what could be done to improve the compressibility of .pyc files with gzip and brotli compression? We can’t change the compression parameters, since those are fixed by the environment we are in.
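A rough way to reproduce this kind of comparison on a single module is sketched below. It is not the script used for the numbers above. It assumes gzip from the standard library only (Brotli needs a third-party package), it compares the source bytes with the marshalled code object (a real .pyc adds a 16-byte header on top), and `size_report` is a hypothetical helper name. `argparse` is just an arbitrary stdlib module.

```python
import argparse
import gzip
import marshal
from pathlib import Path


def size_report(source: bytes, filename: str = "<demo>") -> dict:
    """Raw and gzip-compressed sizes for source vs. compiled code (hypothetical helper)."""
    code = compile(source, filename, "exec")
    # The body of a .pyc file is the marshalled code object; the header is ignored here.
    pyc_body = marshal.dumps(code)
    return {
        "py": len(source),
        "py.gz": len(gzip.compress(source)),
        "pyc": len(pyc_body),
        "pyc.gz": len(gzip.compress(pyc_body)),
    }


# argparse is an arbitrary example; any pure-Python module shipped as a .py file works.
source_path = Path(argparse.__file__)
report = size_report(source_path.read_bytes(), str(source_path))
for label, size in report.items():
    print(f"{label:7s} {size / 1024:8.1f} KiB")
```

Results for one module will not match the stdlib-wide figures, but the same function can be run over a whole directory and the sizes summed.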

malemburg · April 1, 2023, 11:54am

Have you tried compiling the .pyc files without line debug infos, e.g. using PYTHONNODEBUGRANGES (1. Command line and environment — Python 3.11.2 documentation) ?
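A minimal sketch of how one might check the effect of this setting, assuming Python 3.11 or later: the same source is compiled in two fresh interpreters, once normally and once with `-X no_debug_ranges` (the command-line equivalent of PYTHONNODEBUGRANGES), and the marshalled sizes are compared. `marshalled_size`, `SOURCE` and `CHILD` are hypothetical names for this illustration.

```python
import subprocess
import sys

SOURCE = "def f(a, b):\n    return (a + b) * (a - b)\n"

# Child program: compile whatever arrives on stdin and print the size
# of the marshalled code object.
CHILD = (
    "import marshal, sys\n"
    "code = compile(sys.stdin.read(), '<demo>', 'exec')\n"
    "print(len(marshal.dumps(code)))\n"
)


def marshalled_size(interpreter_args=()):
    """Size of the compiled SOURCE in a fresh interpreter (hypothetical helper)."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", CHILD],
        input=SOURCE,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


with_ranges = marshalled_size()
without_ranges = marshalled_size(["-X", "no_debug_ranges"])
print("with debug ranges:   ", with_ranges)
print("without debug ranges:", without_ranges)
```

If PYTHONNODEBUGRANGES is already set in the environment, or the interpreter is older than 3.11, both numbers should come out the same.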

CAM-Gerlach · April 1, 2023, 4:42pm

You can also compile your .pycs with the -OO option or the PYTHONOPTIMIZE=2 env var to strip assert statements, __debug__ blocks and docstrings. Not sure it will make a huge difference, but it should help some (especially with the often rather lengthy stdlib docstrings).
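To illustrate within a single process, the built-in `compile()` takes an `optimize` argument where 0, 1 and 2 correspond to no flag, -O and -OO. The snippet below is only a sketch on a made-up function (`area` and `code_size` are hypothetical names), so the size difference says nothing about the stdlib as a whole.

```python
import marshal

SOURCE = '''
def area(width, height):
    """Return the area of a rectangle.

    A long docstring like this one is kept in the compiled code by default.
    """
    assert width >= 0 and height >= 0, "sides must be non-negative"
    return width * height
'''


def code_size(optimize):
    """Marshalled size of SOURCE at a given optimization level (hypothetical helper)."""
    code = compile(SOURCE, "<demo>", "exec", optimize=optimize)
    return len(marshal.dumps(code))


sizes = {level: code_size(level) for level in (0, 1, 2)}
print(sizes)

# With optimize=2 the docstring is dropped, so __doc__ is no longer available.
namespace = {}
exec(compile(SOURCE, "<demo>", "exec", optimize=2), namespace)
print(namespace["area"].__doc__)
```

The same `optimize` argument is accepted by `py_compile.compile()` when writing actual .pyc files.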

This isn’t actually that surprising, at least to me, because the standard compression algorithms you mention are generally best at compressing text. The original source files match the cases these algorithms are designed and optimized for much better than binary bytecode files do, and they contain more easily recognizable, higher-level patterns for such algorithms to work with.

kknechtel · April 2, 2023, 9:41am

Docstrings are not stripped by default, since they can be introspected and used (by the built-in help and by directly looking up a __doc__ attribute). Stripping them requires the -OO flag to Python (which also implies -O, which will remove asserts and statements dependent on __debug__).

.pyc files consist mainly of compiled bytecode, which is fundamentally binary and should not be expected to be very compressible. Try hex-dumping one to get a sense of it.
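For anyone who wants to try the hex dump without leaving Python, here is a small sketch. It writes a throwaway module to a temporary directory, compiles it with `py_compile`, and splits the result into the 16-byte header (magic number, flags, and either mtime plus source size or a source hash) and the marshalled code object. The module content and the name `greet` are made up for the illustration.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.py"
    src.write_text('def greet(name):\n    """Say hello."""\n    return "hello " + name\n')
    # py_compile.compile returns the path of the .pyc file it wrote.
    pyc_path = py_compile.compile(str(src), cfile=str(Path(tmp) / "demo.pyc"), doraise=True)
    data = Path(pyc_path).read_bytes()

# The first 16 bytes are the header; the rest is the marshalled code object.
header, body = data[:16], data[16:]
code = marshal.loads(body)

print("magic matches:", header[:4] == importlib.util.MAGIC_NUMBER)
print("header:", header.hex(" "))
print("body  :", body[:64].hex(" "), "...")
```

Even in this tiny file the readable parts (names, the docstring, string constants) sit between short runs of opcode and table bytes.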

If you have specifically noticed increased size in 3.11, this is most likely because of new additional metadata for remembering what part of the source code corresponds to each opcode, so that tracebacks can highlight the appropriate part of the line (after fetching the source from the corresponding .py). Of course, this is useless to you if you are not shipping .py files. As mentioned, generating this debug info can also be disabled.

rth · April 5, 2023, 8:07pm

Thank you all for your responses and suggestions!

Here are the sizes of the stdlib (in MB) when those two options are additionally applied during py-compilation,

Processing \ Compression	None	.gz	.br
Original	8.91	2.07	1.45
pyc	10.16	3.8	2.6
pyc: OO	8.69	3.26	2.21
pyc: nodebug-ranges	9.35	2.78	1.96
pyc: nodebug-ranges + OO	7.88	2.26	1.58

and the corresponding compression ratios,

Processing \ Compression	None	.gz	.br
Original	1	4.3	6.1
pyc	1	2.7	3.9
pyc: OO	1	2.7	3.9
pyc: nodebug-ranges	1	3.4	4.8
pyc: nodebug-ranges + OO	1	3.5	5

So indeed, compiling with -OO (or the PYTHONOPTIMIZE=2 env var) and without debug ranges (PYTHONNODEBUGRANGES) does help significantly, particularly after compression. The result is still larger than the compressed original source, however.
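The ratios in the second table are simply the uncompressed size divided by the compressed size, rounded to one decimal. As a quick check, using the numbers from the first table (the variable names are just for this illustration):

```python
# (uncompressed, .gz, .br) sizes copied from the first table above.
sizes = {
    "Original": (8.91, 2.07, 1.45),
    "pyc": (10.16, 3.8, 2.6),
    "pyc: OO": (8.69, 3.26, 2.21),
    "pyc: nodebug-ranges": (9.35, 2.78, 1.96),
    "pyc: nodebug-ranges + OO": (7.88, 2.26, 1.58),
}

# Compression ratio = uncompressed size / compressed size.
ratios = {
    name: (round(raw / gz, 1), round(raw / br, 1))
    for name, (raw, gz, br) in sizes.items()
}

for name, (gz_ratio, br_ratio) in ratios.items():
    print(f"{name:26s} gz x{gz_ratio}  br x{br_ratio}")
```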

Out of curiosity, I also tried a bunch of other compression algorithms on the bytecode (xz, lzma, zstd), but the results were similar to brotli at best.

pf_moore · April 5, 2023, 8:57pm

So I guess the takeaway from this is that if download size is the key factor, compressed source is best. Obviously, if you want faster load times, shipping .pyc files is better as they are pre-compiled and so you skip the compile step on import. As with most things, it’s a trade-off.

steve.dower · April 5, 2023, 10:47pm

ISTR some benchmarks showing that disabling .pyc writing and loading from .py files is fairly comparable on most systems. You don’t start getting real performance wins until you merge all the .pyc into a single file and prearrange data structures so you don’t need to unmarshal anything. (Not that this has any reason to hold true for WASM. Looking forward to new benchmarks here.)

rth · April 5, 2023, 10:59pm

If the choice is between shipping .py and .pyc it’s indeed a bit of a size / load time tradeoff.
If one accepts a post-processing step when installing files the situation might be slightly better.

For instance, if we look at parsing / compilation time for the stdlib with the following script,

from pathlib import Path
import ast

for path in Path('stdlib').glob("**/*.py"):
    code = path.read_text()
    ast_tree = ast.parse(code)
    compiled_code = compile(ast_tree, filename="<string>", mode="exec")

on my laptop I get 0.67 s for ast.parse and 0.42s for compile (while using py_compile directly takes around 1s). So, I was thinking,

if we serialize the AST to JSON and compress that with brotli, we get 1.2 MB, which is less than the original source compressed, while also preserving e.g. docstrings and avoiding the parsing part of the py-compilation. Unless it’s a bad idea to distribute AST code (assuming the Python version is fixed)?
I hear that ruff is quite fast for linting, from what I understand in part thanks to the RustPython parser. For instance, it takes 250ms of user time to process the stdlib, including loading files. Could one potentially use RustPython for parsing code and CPython for compiling it? Though of course if it only saves 10% of load time, that effort would not be worth it.
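To make the first idea concrete, here is a minimal sketch of an AST to JSON round trip. It is not the code behind the 1.2 MB figure: `to_jsonable` and `from_jsonable` are hypothetical helpers, zlib stands in for brotli (which is not in the standard library), and constants that JSON cannot represent (bytes, complex numbers, Ellipsis) are deliberately not handled.

```python
import ast
import json
import zlib


def to_jsonable(node):
    """Turn an AST into nested dicts and lists (hypothetical helper)."""
    if isinstance(node, ast.AST):
        return {
            "_type": type(node).__name__,
            "fields": {name: to_jsonable(value) for name, value in ast.iter_fields(node)},
            # Line and column attributes are kept so the tree can be compiled later.
            "attrs": {name: getattr(node, name) for name in node._attributes if hasattr(node, name)},
        }
    if isinstance(node, list):
        return [to_jsonable(item) for item in node]
    if node is None or isinstance(node, (str, int, float, bool)):
        return node
    # Assumption of this sketch: bytes, complex and Ellipsis constants are unsupported.
    raise TypeError(f"unsupported constant: {node!r}")


def from_jsonable(obj):
    """Rebuild the AST produced by to_jsonable (hypothetical helper)."""
    if isinstance(obj, dict):
        node_class = getattr(ast, obj["_type"])
        node = node_class(**{name: from_jsonable(value) for name, value in obj["fields"].items()})
        for name, value in obj["attrs"].items():
            setattr(node, name, value)
        return node
    if isinstance(obj, list):
        return [from_jsonable(item) for item in obj]
    return obj


SOURCE = 'def double(x):\n    """Return twice x."""\n    return 2 * x\n'
tree = ast.parse(SOURCE)

# Serialize and compress, as one would before shipping.
payload = zlib.compress(json.dumps(to_jsonable(tree), separators=(",", ":")).encode("utf-8"))

# On the receiving side: decompress, rebuild the tree, and compile without re-parsing.
restored = from_jsonable(json.loads(zlib.decompress(payload)))
namespace = {}
exec(compile(restored, "<restored>", "exec"), namespace)
print(namespace["double"](21), namespace["double"].__doc__)
```

The AST node classes differ between Python versions, so a payload like this would only be usable with the Python version that produced it, as assumed above.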

The other direction is specialized pre-compressors. As you rightly mentioned, classical compression algorithms do not work well with bytecode. However, specialized bytecode compressors do exist; for instance, the PAQ family of compressors had some for x86 executables. Also, WebAssembly defines a binary code format that works for the web. They achieve this by multi-layer compression where there is a first domain-specific compressor (layer 1)

Layer 1 provides structural compression on top of layer 0, exploiting specific knowledge about the nature of the syntax tree and its nodes. The structural compression introduces more efficient encoding of values, rearranges values within the module, and prunes structurally identical tree nodes.

on top of which more standard compression algorithms, such as gzip or brotli, are applied. It probably doesn’t help that there is no standard spec for the marshal output format. But I guess it’s not impossible to design some custom compressors (possibly ones that would read headers for a given Python version) for people who enjoy those kinds of problems?
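Before designing such a pre-compressor, one would probably first want to know where the bytes of a code object actually go. The sketch below does not compress anything. It only walks nested code objects and counts bytes per component, with `iter_code_objects` and `byte_breakdown` as hypothetical helper names and a made-up sample function. The attribute lookups use `getattr` because `co_linetable` and `co_exceptiontable` only exist on newer Python versions.

```python
import marshal
import types
from collections import Counter

SOURCE = '''
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    total = 0
    for value in values:
        total += value
    return total / len(values)
'''


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def byte_breakdown(code):
    """Count bytes per component across all nested code objects (hypothetical helper)."""
    counts = Counter()
    for co in iter_code_objects(code):
        counts["bytecode"] += len(co.co_code)
        counts["line/position table"] += len(getattr(co, "co_linetable", b""))
        counts["exception table"] += len(getattr(co, "co_exceptiontable", b""))
        counts["names"] += sum(len(name.encode()) for name in co.co_names + co.co_varnames)
        counts["string constants"] += sum(
            len(const.encode()) for const in co.co_consts if isinstance(const, str)
        )
    return counts


module_code = compile(SOURCE, "<demo>", "exec")
breakdown = byte_breakdown(module_code)
marshalled_total = len(marshal.dumps(module_code))

for component, size in breakdown.items():
    print(f"{component:20s} {size:5d}")
print(f"{'marshalled total':20s} {marshalled_total:5d}")
```

The components do not add up to the marshalled total, since marshal adds type tags, lengths and other fields; the point is only to see which streams might be worth encoding separately.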

rth · April 5, 2023, 11:05pm

ISTR some benchmarks showing that disabling .pyc writing and loading from .py files is fairly comparable on most systems.

Browser runtime is a bit weird, where CPython execution time is currently several times slower than native and the filesystem is in memory but implemented in JavaScript.

So for instance, the stdlib takes ~1s to py-compile on my laptop, single threaded. It would take several seconds in a browser, plus additional packages, and that starts to be noticeable. Here are some benchmarks where py-compiled sources were between 1.5x and 2x faster to import than .py with writing of .pyc disabled, though that was for Python 3.10.