If one wanted to reduce the size on disk of stdlib what do you think would be the best approach?
The use case is Pyodide, where a REPL with the CPython interpreter + stdlib currently takes ~6.4 MB to download (and 4 to 5 s to load). A fair amount of that is due to the size of pure Python files in the stdlib. The bundle is gzip-compressed, but there is very likely still some overhead from extracting individual .py files (and importing them without .pyc files), so reducing the size would help.
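As a toy illustration of that per-file overhead (a sketch, not a measurement of the actual Pyodide bundle; the helper name is made up): compressing many small files individually repeats the gzip header and resets the compression dictionary for each file, whereas compressing one archive shares context across all of them.

```python
import gzip
import io
import tarfile

def individual_vs_archive(files: dict[str, bytes]) -> tuple[int, int]:
    """Return (total size of per-file gzip streams, size of one tar.gz).

    Per-file compression pays a gzip header per file and cannot exploit
    redundancy across files; a single archive can.
    """
    individual = sum(len(gzip.compress(data)) for data in files.values())
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return individual, len(buf.getvalue())
```

On a set of many small, similar .py files the single archive typically comes out well ahead, which is one reason packing the stdlib as one compressed blob can beat compressing each file separately.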
There are several potential approaches:
- Use a minifier such as python-minifier. There the question is how much minification is too much. For instance, I imagine it might be better to preserve local variable names so that tracebacks stay readable.
- Only ship .pyc files. This does reduce size and could also help startup performance. This post from 10 years ago suggests that it would likely be very brittle, but I’m not sure how up to date that analysis is. Also, does anyone know the performance impact of not writing .pyc files?
- Remove some of the infrequently used modules (and package them as standalone packages). This is related to a long thread about the stdlib here. Then one can’t really say that the stdlib is included, though. We are already removing some stdlib modules that don’t make sense in the browser, but the cost/benefit of removing more is not clear.
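For the minifier route, purely as a rough illustration of what whitespace and comment stripping alone buys, here is a stdlib-only sketch built on `ast.unparse`, which drops comments and normalizes whitespace by construction (real minifiers like python-minifier go considerably further, e.g. renaming locals); the `strip_source` helper and its `keep_docstrings` flag are my own invention, not python-minifier's API:

```python
import ast

def strip_source(source: str, keep_docstrings: bool = True) -> str:
    """Re-emit *source* without comments or extra whitespace.

    ast.unparse() discards comments and formatting by construction;
    optionally docstrings are removed too (probably a bad idea for an
    interactive REPL, where help() should keep working).
    """
    tree = ast.parse(source)
    if not keep_docstrings:
        for node in ast.walk(tree):
            if isinstance(node, (ast.Module, ast.ClassDef,
                                 ast.FunctionDef, ast.AsyncFunctionDef)):
                body = node.body
                if (body and isinstance(body[0], ast.Expr)
                        and isinstance(body[0].value, ast.Constant)
                        and isinstance(body[0].value.value, str)):
                    # Drop the docstring; keep the body non-empty.
                    node.body = body[1:] or [ast.Pass()]
    return ast.unparse(tree)
```

Even this crude version shows the trade-off: comments are free to remove, but anything past that (docstrings, variable names) costs something at the REPL.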
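On the .pyc route, one relevant detail is the `optimize` level baked into the bytecode: level 1 (`-O`) strips asserts and level 2 (`-OO`) also strips docstrings, which interacts with the interactive-use constraint. A small sketch comparing the resulting sizes, using only the stdlib `py_compile` module (the `pyc_sizes` helper is made up for illustration):

```python
import pathlib
import py_compile
import tempfile

def pyc_sizes(source: str) -> dict:
    """Byte-compile *source* at optimize levels 0, 1 and 2 and return
    {level: size of the resulting .pyc in bytes}.
    """
    sizes = {}
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "mod.py"
        src.write_text(source)
        for level in (0, 1, 2):
            cfile = pathlib.Path(tmp) / f"mod.opt{level}.pyc"
            py_compile.compile(str(src), cfile=str(cfile), optimize=level)
            sizes[level] = cfile.stat().st_size
    return sizes
```

Note that shipping pre-compiled .pyc files is a separate question from *writing* them at runtime: `sys.dont_write_bytecode = True` (or `PYTHONDONTWRITEBYTECODE`) only skips the cache write, so source-only modules still pay the compile cost on import.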
Another constraint is that a fair amount of use is interactive, so keeping docstrings, for instance, is still very useful.
Are there other things that could be attempted (when building CPython from source)? Any feedback would be much appreciated.