In Pyodide, we have started testing shipping the standard library and packages in compiled form (.pyc) to reduce load time and download size.
For the standard library, we pack the modules into a zip file (using PyZipFile, adapted from the upstream CPython wasm build, thanks to @tiran); for packages, we are thinking of compiling every .py file inside the wheel archive to a .pyc file. So far, it works nicely.
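As a rough illustration of the zip packing described above, here is a minimal sketch using the standard `zipfile.PyZipFile` class. The module name `pyc_zip_demo` and the archive name `bundle.zip` are made up for the example; this is not Pyodide's actual build script.

```python
import importlib
import pathlib
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    src_dir = root / "src"
    src_dir.mkdir()
    # Hypothetical module standing in for a standard library file.
    (src_dir / "pyc_zip_demo.py").write_text("VALUE = 42\n")

    archive = root / "bundle.zip"
    with zipfile.PyZipFile(str(archive), "w") as zf:
        # writepy() byte-compiles each .py file in the directory and
        # stores only the resulting .pyc in the archive.
        zf.writepy(str(src_dir))
        names = zf.namelist()

    # Import the module straight from the archive, with no .py source present.
    sys.path.insert(0, str(archive))
    try:
        module = importlib.import_module("pyc_zip_demo")
        module_file = module.__file__
        value = module.VALUE
    finally:
        sys.path.remove(str(archive))
        sys.modules.pop("pyc_zip_demo", None)

print(names)
print(module_file)
```

The archive ends up containing only the `.pyc` entry, and `__file__` points inside the zip file.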
However, there is a problem: the file path shown in exception tracebacks is incorrect. This happens because the path used in tracebacks is baked into the .pyc file at compile time.
So if the .pyc file is installed in a different directory at runtime, the path in the traceback is unhelpful and confusing.
For example, @tiran's CPython WASM REPL (which ships the standard library in a zip file) shows:
>>> import pathlib
>>> pathlib.__file__
'/usr/local/lib/python311.zip/pathlib.pyc'
>>> pathlib.Path("a/b").write_text("1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python-wasm/cpython/Lib/pathlib.py", line 1078, in write_text
  File "/python-wasm/cpython/Lib/pathlib.py", line 1044, in open
FileNotFoundError: [Errno 44] No such file or directory: 'a/b'
/python-wasm/cpython/Lib/pathlib.py in the traceback is not very useful.
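To make the cause concrete, here is a small sketch that reads back the filename stored in a .pyc file. It assumes the 16-byte .pyc header used since Python 3.7; the helper `compiled_filename` and the placeholder `<bundle>/demo.py` are invented for illustration. It also shows that the `dfile` argument of `py_compile.compile` controls which name gets recorded.

```python
import marshal
import pathlib
import py_compile
import tempfile


def compiled_filename(pyc_path):
    """Return the co_filename recorded inside a .pyc file."""
    data = pathlib.Path(pyc_path).read_bytes()
    # Skip the 16-byte header (magic, flags, mtime or hash, source size).
    code = marshal.loads(data[16:])
    return code.co_filename


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    src = root / "demo.py"
    src.write_text("def f():\n    raise ValueError('boom')\n")
    src_name = str(src)

    # Default: the path passed to the compiler is recorded as-is.
    default_pyc = py_compile.compile(src_name, cfile=str(root / "default.pyc"))
    # dfile overrides the name that is recorded and shown in tracebacks.
    custom_pyc = py_compile.compile(
        src_name, cfile=str(root / "custom.pyc"), dfile="<bundle>/demo.py"
    )

    default_name = compiled_filename(default_pyc)
    custom_name = compiled_filename(custom_pyc)

print(default_name)
print(custom_name)
```

Moving `default.pyc` elsewhere afterwards does not change the recorded name, which is why the build-time path shows up in the traceback.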
Therefore, I want to discuss: “If we only ship .pyc files without the corresponding .py files, what would be the most appropriate path to show in the exception traceback?”
Here’s what I use in eGenix PyRun, which comes with a .pyc-compiled stdlib frozen into the binary:
>>> import pathlib
>>> pathlib.__file__
'<pyrun>/pathlib.py'
>>> pathlib.Path("a/b").write_text("1")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<pyrun>/pathlib.py", line 1152, in write_text
  File "<pyrun>/pathlib.py", line 1117, in open
FileNotFoundError: [Errno 2] No such file or directory: 'a/b'
This approach has been working fine for years. Since PyRun does not include the source code, the source location is not meaningful, and by using the prefix <pyrun> we make sure that stdlib traceback tools such as linecache fail early.
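For illustration, this is roughly what a source lookup for such a placeholder path looks like from `linecache`'s side. It is a sketch that assumes no file named `<pyrun>/pathlib.py` can be found on disk or relative to any `sys.path` entry.

```python
import linecache

# Placeholder path taken from the PyRun traceback above; no such file
# is expected to exist on disk.
placeholder = "<pyrun>/pathlib.py"

# With nothing to read, linecache returns no lines.
lines = linecache.getlines(placeholder)
first_line = linecache.getline(placeholder, 1)

print(lines, repr(first_line))
```

Traceback formatting then shows only the file name and line number, without a source line.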
The broader question is: why can’t we display the actual path detected at runtime? I understand that for .pyc files in __pycache__ that’s not useful, but for the use case of distributing py-compiled modules only, it would be much less confusing to users, particularly when such a distribution is used for teaching Python.
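One conceivable way to get there, shown only as a sketch: rewrite `co_filename` on the loaded code object (and on every nested code object) with `CodeType.replace`, available since Python 3.8. The helper name `relocate_code` and both paths are made up for the example; nothing in this thread says an importer should or does work this way.

```python
import types


def relocate_code(code, new_filename):
    """Return a copy of code with co_filename replaced, recursively."""
    new_consts = tuple(
        relocate_code(const, new_filename)
        if isinstance(const, types.CodeType)
        else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_filename, co_consts=new_consts)


# Stand-in for a code object unmarshalled from a .pyc built elsewhere.
original = compile("def f():\n    return 1\n", "/build/dir/mod.py", "exec")
relocated = relocate_code(original, "/runtime/dir/mod.pyc")

# Code objects for nested functions are carried in co_consts.
nested = [c for c in relocated.co_consts if isinstance(c, types.CodeType)]
print(relocated.co_filename, [c.co_filename for c in nested])
```

The original code object is left untouched, so a loader could apply this after unmarshalling and before executing the module.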
How much of a performance difference are you seeing?
I think Meta made a proposal at some point for a mechanism to point to the source in a separate directory, but I don’t remember where that ended up, and I don’t have a link handy, so this is from memory.
Otherwise, the closest current proposal around this is probably the “__pysource__” file layout for installed modules. Beyond that, I don’t think this comes up often enough for anyone to have proposed a mechanism that has been accepted.