Proper file paths to show in exceptions for py-compiled files?

In Pyodide, we have started testing vendoring the standard library and packages in compiled form (.pyc) to reduce load time and download size.

For the standard library, we pack the modules into a zip file (using PyZipFile, adapted from the upstream CPython wasm build thanks to @tiran). For packages, we are thinking of compiling all .py files inside the wheel archive into .pyc files. So far, it works nicely.

However, there is a problem: the file path shown in the exception traceback is incorrect. This happens because the path used in tracebacks is stored in the .pyc file at compile time.

So if the .pyc file is installed in a different directory at runtime, the path in the traceback is confusing and not very useful.

For example, @tiran's CPython WASM REPL (which ships the standard library in a zip file) shows:

>>> import pathlib
>>> pathlib.__file__
'/usr/local/lib/python311.zip/pathlib.pyc'
>>> pathlib.Path("a/b").write_text("1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python-wasm/cpython/Lib/pathlib.py", line 1078, in write_text
  File "/python-wasm/cpython/Lib/pathlib.py", line 1044, in open
FileNotFoundError: [Errno 44] No such file or directory: 'a/b'

/python-wasm/cpython/Lib/pathlib.py in the traceback is not very useful.
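
The mismatch can be reproduced outside the browser. Below is a minimal sketch, assuming a throwaway module named demo_pyc_mod and an archive named stdlib_demo.zip (both names are made up for illustration). It packs the module with PyZipFile, imports it from the zip, and compares the module's __file__ with the path recorded in its code object.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_and_import(workdir: Path):
    """Compile a module into a zip, import it, and report the paths involved."""
    build_dir = workdir / "build"
    build_dir.mkdir()
    src = build_dir / "demo_pyc_mod.py"  # hypothetical module name
    src.write_text("def where():\n    return where.__code__.co_filename\n")

    archive = workdir / "stdlib_demo.zip"  # hypothetical archive name
    with zipfile.PyZipFile(archive, "w") as zf:
        # writepy() compiles the source and stores only demo_pyc_mod.pyc.
        zf.writepy(str(src))

    sys.path.insert(0, str(archive))
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("demo_pyc_mod")
    finally:
        sys.path.remove(str(archive))
    return str(src), module.__file__, module.where()


with tempfile.TemporaryDirectory() as tmp:
    build_path, runtime_file, traceback_path = build_and_import(Path(tmp))

print("compiled from:     ", build_path)
print("module.__file__:   ", runtime_file)
print("path in tracebacks:", traceback_path)
```

The last two lines differ in the same way as the REPL session above: __file__ points into the zip, while the code object still carries the build-time source path.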

Therefore, I want to discuss: “If we only ship .pyc files without corresponding .py files, what would be the most proper path to show in the exception traceback?”

Thank you!

cc: @rth

Here are some ideas discussed in Pyodide:

For zipped standard libraries,

  • show the relative path only (e.g. pathlib.py). It seems the embeddable package for Windows shows paths this way.
  • show the actual path inside a zip file (e.g. /lib/python311.zip/pathlib.py)
  • prepend some fake path (e.g. /lib/python3.11/pathlib.py)

For packages (wheels), maybe just show relative paths, since in theory users can install packages into any directory.
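
Any of these options can be tried at build time with the dfile argument of py_compile.compile, which sets the file name stored in the compiled code. The sketch below is only an illustration: the module name demo.py and the three display paths are placeholders modelled on the options listed above, not what Pyodide actually does.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path


def compile_with_display_path(source: Path, target: Path, display_path: str) -> str:
    """Compile `source` to `target` and return the path stored in the .pyc."""
    py_compile.compile(str(source), cfile=str(target), dfile=display_path, doraise=True)
    # A .pyc file starts with a 16-byte header (Python 3.7+); the rest is
    # the marshalled code object.
    code = marshal.loads(target.read_bytes()[16:])
    return code.co_filename


# Placeholder display paths, one per option discussed above.
CANDIDATES = {
    "relative": "demo.py",
    "inside_zip": "/lib/python311.zip/demo.py",
    "fake_prefix": "/lib/python3.11/demo.py",
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "demo.py"
    source.write_text("def f():\n    return 1\n")
    for label, display_path in CANDIDATES.items():
        target = Path(tmp) / f"{label}.pyc"
        results[label] = compile_with_display_path(source, target, display_path)

for label, stored in results.items():
    print(f"{label}: {stored}")
```

compileall.compile_dir accepts a similar ddir argument for whole directory trees.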

Here’s what I use in eGenix PyRun, which comes with a PYC compiled stdlib frozen into the binary:

>>> import pathlib
>>> pathlib.__file__
'<pyrun>/pathlib.py'
>>> pathlib.Path("a/b").write_text("1")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<pyrun>/pathlib.py", line 1152, in write_text
  File "<pyrun>/pathlib.py", line 1117, in open
FileNotFoundError: [Errno 2] No such file or directory: 'a/b'

This approach has been working fine for years. Since PyRun does not include the source code, the source location is not meaningful and by using the prefix <pyrun> we make sure that stdlib traceback tools such as linecache fail early.
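
To illustrate the linecache point, here is a small sketch that compiles a snippet under a <pyrun>-style file name and then asks linecache for the failing line. The module name demo.py is made up, and this is not how PyRun itself builds its frozen stdlib.

```python
import linecache
import traceback

# Compile a snippet under a display name that cannot exist on disk.
DISPLAY_NAME = "<pyrun>/demo.py"  # hypothetical module name
code = compile("def boom():\n    raise ValueError('x')\n", DISPLAY_NAME, "exec")

namespace = {}
exec(code, namespace)

try:
    namespace["boom"]()
except ValueError as exc:
    last_frame = traceback.extract_tb(exc.__traceback__)[-1]

filename = last_frame.filename
# linecache finds no file under this name, so it returns an empty string.
source_line = linecache.getline(filename, last_frame.lineno)

print(filename, last_frame.lineno, repr(source_line))
```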

Thanks for the feedback!

The broader question is: why can’t we display the actual path detected at runtime? I understand that this is not useful for .pyc files in __pycache__, but for the use case of distributing only py-compiled modules, it would be much less confusing to users, particularly when such a distribution is used for teaching Python.
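
For the sake of discussion, this is roughly what rewriting the path at load time could look like. It is only a sketch: retarget_code is a hypothetical helper, not an existing import hook, and both paths are placeholders.

```python
import traceback
import types


def retarget_code(code: types.CodeType, runtime_path: str) -> types.CodeType:
    """Return a copy of `code` whose co_filename is `runtime_path`.

    Nested functions and classes live in co_consts as their own code
    objects, so they are rewritten recursively.
    """
    new_consts = tuple(
        retarget_code(const, runtime_path) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=runtime_path, co_consts=new_consts)


SOURCE = (
    "def outer():\n"
    "    def inner():\n"
    "        raise RuntimeError('boom')\n"
    "    inner()\n"
)
BUILD_PATH = "/build/machine/demo.py"        # placeholder build-time path
RUNTIME_PATH = "/lib/python311.zip/demo.py"  # placeholder runtime path

compiled = compile(SOURCE, BUILD_PATH, "exec")
namespace = {}
exec(retarget_code(compiled, RUNTIME_PATH), namespace)

try:
    namespace["outer"]()
except RuntimeError as exc:
    frames = traceback.extract_tb(exc.__traceback__)

# The last two frames are outer() and inner() from the retargeted module.
reported = [frame.filename for frame in frames[-2:]]
print(reported)
```

Walking every code object on import has a cost, so this would need to be weighed against the load-time gains.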

How much of a performance difference are you seeing?

I think Meta made some proposal at some point about having a mechanism to point to the source in a separate directory, but I don’t remember where that ended up. I don’t have a link handy, though, so this is from memory.

Otherwise, the closest current proposal around this is probably the “__pysource__” file layout for installed modules. Beyond that, I don’t think this comes up often enough for anyone to have proposed a mechanism that has been accepted.

According to the benchmark by @rth (pyodide#3166, pyodide#3253):

Initial load time

| Method | Firefox (106.0.5) | Chrome (106, console opened) | Chrome (106, console closed) |
|---|---|---|---|
| .py | 1335 ± 50 ms | 1427 ± 32 ms | 877 ± 70 ms |
| .pyc (+ zipimport) | 852 ± 90 ms | 618 ± 10 ms | 461 ± 5 ms |

Package load time

| Package | Load + import time, .py (s) | Load + import time, .pyc (s) | Speed-up |
|---|---|---|---|
| pandas | 3.7 ± 0.1 (Firefox) / 4.5 ± 0.01 (Chrome) | 2.4 ± 0.005 (Firefox) / 2.2 ± 0.005 (Chrome) | x1.5 (Firefox) / x2.0 (Chrome) |
| sympy | 2.8 ± 0.02 (Firefox) / 5.0 ± 0.05 (Chrome) | 1.3 ± 0.02 (Firefox) / 1.7 ± 0.01 (Chrome) | x2.1 (Firefox) / x2.9 (Chrome) |
| scikit-learn, scipy, numpy, joblib | 7.87 ± 0.15 (Firefox) / 6.2 ± 0.1 (Chrome) | 6.65 ± 0.2 (Firefox) / 4.6 ± 0.6 (Chrome) | x1.2 (Firefox) / x1.3 (Chrome) |