Here’s a proposal to fix several niggles we found when distributing Python libraries in Fedora. What do you think?
For modules loaded directly from bytecode cache (*.pyc) files, Python will
look for corresponding source in a
The existing ability to load modules from *.pyc files only is
unchanged, but conceptually it becomes a special case of a “pyc-first”
Most pure Python code is installed as a source file (*.py), combined with a
bytecode cache file (__pycache__/*.pyc), which is created/updated ahead of
time or on demand.
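As a small illustration of this layout, the standard library can already map between a source path and its cache path. The sketch below uses importlib.util.cache_from_source and importlib.util.source_from_cache; the path pkg/mod.py is a made-up example, and the version tag in the result depends on the interpreter running the code.

```python
import importlib.util
import os

# Hypothetical module path, used only for illustration.
source_path = os.path.join("pkg", "mod.py")

# Where the import system would look for (or write) the bytecode cache.
cache_path = importlib.util.cache_from_source(source_path)

# The reverse mapping recovers the source path from the cache path.
round_trip = importlib.util.source_from_cache(cache_path)

print(cache_path)  # a path under pkg/__pycache__/ with an interpreter-specific tag
print(round_trip)
```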
This layout is designed for rapid iteration. Each time a module is imported,
Python assumes the source might have changed: if a bytecode cache is present,
Python normally checks whether it still corresponds to the source.
PEP 552 introduced an “unchecked” mode, in which this check is skipped.
However, this causes updates to the source to be silently ignored, possibly
confusing users who aren’t aware of this rarely used mode.
The remaining checking modes have their own disadvantages.
In both, even in the best-case scenario (the cache is present and fresh), Python must
access at least two files (the source and the cache). Further:
- In the timestamp-based mode, the source file’s last-modification time is
used as part of the cache key, causing issues with reproducible builds
as described in PEP 552.
- In the hash-based mode, the entire source file is read and hashed.
This is potentially a slow operation. [XXX data needed.]
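To make the three PEP 552 modes concrete, here is a minimal sketch that compiles one throwaway file in each mode with py_compile and reads back the flags field of the resulting *.pyc header. The helper names pyc_flags and compile_in_all_modes are made up for this example; the header layout (4 bytes of magic number, then a 4-byte little-endian flags field) is the one described in PEP 552.

```python
import py_compile
import tempfile
from pathlib import Path


def pyc_flags(pyc_path):
    """Return the PEP 552 flags field (bytes 4..8 of the .pyc header)."""
    header = Path(pyc_path).read_bytes()[:16]
    return int.from_bytes(header[4:8], "little")


def compile_in_all_modes(source_text="x = 1\n"):
    """Compile one source file in every invalidation mode; map mode name to flags."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "demo.py"
        src.write_text(source_text)
        for mode in py_compile.PycInvalidationMode:
            out = Path(tmp) / f"demo.{mode.name}.pyc"
            py_compile.compile(
                str(src), cfile=str(out), doraise=True, invalidation_mode=mode
            )
            results[mode.name] = pyc_flags(out)
    return results


# Bit 0 set: hash-based pyc. Bit 1 set: the source hash is checked on import.
flags_by_mode = compile_in_all_modes()
print(flags_by_mode)
```

This only shows which mode a given cache file was written in; it does not measure the cost of hashing, which is the data the XXX note above still asks for.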
Another way to install Python modules is to not install the source,
and use the *.pyc file directly in place of the
(removing the Python version tag from the filename and moving the file
out of the
This layout has two main issues:
- The Python version tag is not used, meaning that modules using
this layout are only usable by a specific version, and
- the source is not available, making it hard to debug (tracebacks and the
inspect module don’t show code; file is unreadable to the
The first issue is usually not relevant, as most installations are tightly
tied to a specific interpreter. [XXX any examples where this isn’t the case?]
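The sourceless layout and its debugging drawback can be reproduced today. The sketch below compiles a tiny throwaway module, deletes the source, loads the bare *.pyc, and asks the loader for source text. The module name sourceless_demo and the helper load_sourceless_demo are made up for this example.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def load_sourceless_demo():
    """Compile a tiny module, delete its source, and load the bare .pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sourceless_demo.py"
        src.write_text("VALUE = 42\n")

        # Sourceless layout: an untagged .pyc sitting where the .py would be.
        pyc = Path(tmp) / "sourceless_demo.pyc"
        py_compile.compile(str(src), cfile=str(pyc), doraise=True)
        src.unlink()

        spec = importlib.util.spec_from_file_location("sourceless_demo", str(pyc))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # The module runs, but the loader has no source to offer.
        source = spec.loader.get_source("sourceless_demo")
        return module.VALUE, type(spec.loader).__name__, source


value, loader_name, source = load_sourceless_demo()
print(value, loader_name, source)
```

The module executes normally, while get_source returns None, which is why tracebacks and inspect have nothing to display for such modules.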
This PEP proposes to solve the second issue by allowing installers to
distribute the source file alongside the file with the bytecode.
The new file layout is optimized for “installed libraries”: third-party
libraries installed on a user’s system.
This can include the Python standard library.
We assume that these files will most likely not be edited after installation.
Python will only consult the bytecode file (*.pyc) when loading
a module, and not check whether a *.py file was edited.
We assume that retrieving a module’s source is useful, but that it is not a
performance-sensitive operation. It is used when displaying tracebacks
This makes it more palatable for distributors to use the resource-intensive
“checked hash” bytecode files and enjoy their benefits (explained in PEP 552).
On the other hand, we believe that Python should remain “hackable”: if a
source file is available, it should be possible to modify it and use the
result – for example, to add a few
some quick-and-dirty debugging (in a throwaway virtual environment, of course),
or even to explore the standard library by breaking it.
The proposed file layout makes this relatively straightforward: when the
*.py) file is moved out of the
Python will ignore the bytecode file and load the source instead, producing
a cache in __pycache__. (This is the existing behavior when both a
*.pyc are present for a given name.)
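The move described above could be scripted roughly as follows. This is a sketch under assumptions, not part of the proposal: it assumes the auxiliary source for pkg/mod.pyc lives at pkg/__pysource__/mod.py (our reading of the __pysource__/*.py wording used elsewhere in this text), and the helpers expected_source_path, make_editable and demo are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def expected_source_path(pyc_path):
    """Assumed location of the auxiliary source for a given .pyc file."""
    pyc = Path(pyc_path)
    return pyc.parent / "__pysource__" / (pyc.stem + ".py")


def make_editable(pyc_path):
    """Move the auxiliary source next to the .pyc so that the source is loaded."""
    pyc = Path(pyc_path)
    target = pyc.with_suffix(".py")
    shutil.move(str(expected_source_path(pyc)), str(target))
    return target


def demo():
    """Build a fake installed package in a temp dir and make one module editable."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "pkg"
        (pkg / "__pysource__").mkdir(parents=True)
        (pkg / "mod.pyc").write_bytes(b"")  # placeholder, not real bytecode
        (pkg / "__pysource__" / "mod.py").write_text("VALUE = 1\n")

        target = make_editable(pkg / "mod.pyc")
        return (
            target.relative_to(tmp).as_posix(),
            target.is_file(),
            expected_source_path(pkg / "mod.pyc").exists(),
        )


moved_to, moved_ok, aux_still_there = demo()
print(moved_to, moved_ok, aux_still_there)
```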
We hope that users who’d like to do this, but aren’t familiar
with the proposed mechanics, will notice the extra directory, search the Web
for __pysource__ and find relevant instructions.
The proposed layout makes it easy to omit the source files, which will be
useful in resource-constrained environments (e.g. minimal Linux containers).
Omitting them should not affect non-debug functionality.
Adding the sources to an installation that omits them involves only creating
directories and copying source files to the right places, which is relatively
easy even for non-Python-specific tools (like Linux package managers).
This PEP does not propose that any particular distributor or installer
(including Python’s build system) should immediately switch to the new layout.
The PEP will be implemented when importlib supports reading the layout
and stdlib tools like py_compile can generate it. Switching to it should be
a separate decision – although one that might not need a PEP.
importlib.machinery.SourcelessFileLoader, the loader that handles
*.pyc files, will be renamed to
The old name will remain as an alias for the foreseeable future,
DeprecationWarning. However, third-party linters and code-quality
tools are encouraged to treat the old name as suboptimal.
get_source_filename method of
be changed to return the expected location of an auxiliary source file, e.g.
get_source method of
check if the auxiliary source file corresponds to the bytecode file
(as returned by
This check is done at the time of the call. There is no check that the
source file corresponds to an in-memory module loaded by the
BytecodeFileLoader. For example, if both
changed after a module is loaded, tracebacks will show lines of the updated
source, which might not correspond to the running code.
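One way such a correspondence check could work for hash-based bytecode files is sketched below. This is an assumption for illustration, not the mechanism the proposal specifies: it compares the source hash stored in the *.pyc header (bytes 8..16, per PEP 552) against importlib.util.source_hash of the candidate source file. The helper names source_matches_pyc and demo are hypothetical, and timestamp-based files are simply reported as not matching.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def source_matches_pyc(source_path, pyc_path):
    """Return True if a hash-based .pyc was compiled from this exact source."""
    data = Path(pyc_path).read_bytes()
    if len(data) < 16 or data[:4] != importlib.util.MAGIC_NUMBER:
        return False
    flags = int.from_bytes(data[4:8], "little")
    if not flags & 0b1:
        return False  # timestamp-based pyc: not handled in this sketch
    source_bytes = Path(source_path).read_bytes()
    return data[8:16] == importlib.util.source_hash(source_bytes)


def demo():
    """Check a fresh source, then edit it and check again."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "mod.py"
        pyc = Path(tmp) / "mod.pyc"
        src.write_bytes(b"VALUE = 1\n")
        py_compile.compile(
            str(src),
            cfile=str(pyc),
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
        )
        fresh = source_matches_pyc(src, pyc)
        src.write_bytes(b"VALUE = 2\n")  # edit the source after compilation
        stale = source_matches_pyc(src, pyc)
        return fresh, stale


fresh, stale = demo()
print(fresh, stale)
```

As the text notes, a check like this only compares files on disk at call time; it says nothing about a module object that was loaded earlier.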
The same “gotcha” applies to current handling of
compileall modules will gain arguments and CLI
options for compiling to the new layout.
[XXX: This needs fleshing out. The original source needs to be moved. Need to ensure that compilation is still idempotent.]
The following behavior follows naturally [XXX verify this!] from the changes above, but will
be tested separately.
python -m inspect CLI will retrieve source for modules using the new
layout (if the __pysource__/*.py file is available and current).
Tracebacks will show source lines for modules using the new layout
(if the __pysource__/*.py file is available and current).
The proposal is backwards compatible.
However, once an installer (including Python’s build process) switches to the
new layout, tools that are not prepared for it may stop working.
This affects tools like IDEs, debuggers, API doc generators, etc. if they
either don’t use
inspect, or use these modules from a
different version of Python than the code they are handling.
Even in that case, the failure – not being able to retrieve source code
for a third-party module – is usually a quality-of-life issue rather than
a serious flaw.
The proposal adds source code information to modules that can already be
loaded and executed.
This change does not affect code that users write directly.
Most teaching materials can stay unchanged.
Authors of existing installer tools should read this PEP.
Authors of future installer tools should read documentation that will be added.
Searching for the __pysource__ directory name in Python’s documentation
should yield relevant documentation.
We hope that people exploring the libraries installed on their system will
naturally reach relevant docs by searching for
See XXX’s above.
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.