Hi all,
In short: resolve relative paths in pyvenv.cfg relative to the pyvenv.cfg file itself.
Why? This makes a venv relocatable enough that the basic Python bootstrapping process gets far enough to hand execution over to application code that can handle the rest. (Spoiler: this is actually possible today, but it is brittle and relies on specific code paths being taken.) Having a well-defined anchor for relative paths (the pyvenv.cfg location) allows the surrounding environment to be constructed to respect it.
The context here is that build systems such as Bazel impose a couple of constraints when building programs (i.e. creating the files and directory structure a program will use at runtime). In the context of Python, this means creating the equivalent of what one sees after running e.g. python -m venv myvenv; that is, a pyvenv.cfg file (to indicate a venv), a bin/python symlink, and a lib/<version>/site-packages directory with third-party packages.
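For readers less familiar with venv internals, here is a minimal sketch that builds that kind of tree by hand. It is POSIX-only (it relies on os.symlink), the helper name build_venv_layout is hypothetical, and it is not how Bazel or rules_python actually generate their outputs. It writes an absolute home, as python -m venv does, which is exactly what the constraints described next rule out.

```python
import os
import sys
import tempfile
from pathlib import Path


def build_venv_layout(root: Path) -> Path:
    """Hypothetical helper: create a minimal venv-shaped tree under `root`.

    Mirrors the pieces described above: a pyvenv.cfg marker file,
    a bin/python symlink, and a lib/pythonX.Y/site-packages directory.
    """
    root.mkdir(parents=True, exist_ok=True)

    # A pyvenv.cfg next to the interpreter (or one directory above it) is
    # what tells Python it is running inside a venv. Simplified: venv itself
    # writes a few more keys. Note the absolute `home`, as venv writes it.
    home = os.path.dirname(sys.executable)
    (root / "pyvenv.cfg").write_text(
        f"home = {home}\n"
        "include-system-site-packages = false\n"
    )

    # bin/python is a symlink to the base interpreter (POSIX only).
    bin_dir = root / "bin"
    bin_dir.mkdir(exist_ok=True)
    os.symlink(sys.executable, bin_dir / "python")

    # Third-party packages are installed into this directory.
    version_dir = f"python{sys.version_info.major}.{sys.version_info.minor}"
    site_packages = root / "lib" / version_dir / "site-packages"
    site_packages.mkdir(parents=True)
    return site_packages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_venv_layout(Path(tmp) / "myvenv")
        print(sorted(p.name for p in (Path(tmp) / "myvenv").iterdir()))
        # prints ['bin', 'lib', 'pyvenv.cfg']
```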
(For context, I'm a maintainer of rules_python, which allows Bazel to build Python programs.)
Specifically, systems like Bazel impose two constraints:
- Absolute paths aren't allowed. This is because the machine that generates the files (performs the build) may differ from the one that runs them. For example, a cluster of remote build workers may collaborate to generate the pyvenv.cfg, bin/python, site-packages, and other files. Those machines won't have the same absolute paths to files as your local machine.
- The build outputs, i.e. the venv directory tree, must be assumed to be read-only. This is because Bazel relies on that immutability to cache artifacts and share them with local/remote build processes or consuming users.
The first means the typical behavior of writing an absolute path to pyvenv.cfg's home key isn't an option. Writing a relative path doesn't work either because, currently, such paths get interpreted relative to the process's CWD (which seems nonsensical, and I can't imagine is intentional).
The second means that fixing up the pyvenv.cfg file in place isn't possible, which instead requires e.g. creating a venv at runtime in a temporary directory (which has its own problems and is generally a headache best avoided).
As I alluded to earlier, there is a workaround today, though: don't write the home key in pyvenv.cfg at all. This happens to lead Python down a route where it recognizes it's in a venv and sets up its home, sys.prefix, sys.executable, etc. appropriately. Mostly, anyway. The key part is that the Python home is found and the venv's site-packages is found, which allows application code to hook into the startup process (via .pth files in site-packages) and fix up one or two oddities (I forget exactly what, but it's just a one-line fixup to something in sys).
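The startup hook relies on a standard site behavior: a line in a .pth file that starts with "import" followed by a space or tab is executed when site processes that directory. Below is a minimal sketch of that mechanism. The fixup itself is a hypothetical placeholder (the actual sys adjustment isn't specified above), and install_startup_hook and the .pth file name are illustrative, not part of rules_python.

```python
import site
import sys
import tempfile
from pathlib import Path

# Hypothetical placeholder fixup: it only records that the hook ran.
# A real hook would adjust whichever sys attribute needs fixing.
FIXUP_LINE = "import sys; sys._venv_fixup_applied = True\n"


def install_startup_hook(site_packages: Path, name: str = "zz_venv_fixup.pth") -> Path:
    """Write a .pth file whose `import` line site executes at startup."""
    # site processes .pth files in sorted filename order, so a "zz_" prefix
    # makes this one run after most others in the same directory.
    pth = site_packages / name
    pth.write_text(FIXUP_LINE)
    return pth


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        install_startup_hook(Path(tmp))
        # At real startup, site processes the venv's site-packages
        # automatically; addsitedir() runs the same .pth handling on demand.
        site.addsitedir(tmp)
        print(getattr(sys, "_venv_fixup_applied", False))  # prints True
```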
The net effect of this trick is that the venv is relocatable (or, more specifically, the directory tree Bazel generates is, which contains the venv and anything else the program needs). That is, one can cp -r it to arbitrary other machines, within the confines of platform compatibility (OS, CPU, etc.).
Anyway, this is fairly brittle. It relies on bin/python3 being a symlink, which makes it hard to get working on Windows, and chains of symlinks-to-symlinks-to-symlinks can cause oddities.
So, back to my idea: treat a relative path in the home key of pyvenv.cfg as relative to the pyvenv.cfg file. When one knows where the venv is relative to the Python installation (as in the case of Bazel), one can then write a stable relative path for the home key. Python can then find its home directly, as normal, without requiring symlinks.
Right now, relative paths in pyvenv.cfg are relative to the process's CWD. But that doesn't seem to be intentional: reading the getpath.py code, I get the impression it assumes home is always absolute. It also doesn't make sense; the CWD could be anywhere, which makes relative paths largely useless.
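To make the difference concrete, here is a minimal sketch contrasting the two interpretations of a relative home. The parser is deliberately simplified and does not reproduce CPython's getpath.py logic; the function names are hypothetical, and home_relative_to_cwd only models the CWD-relative behavior described above.

```python
import os
from pathlib import Path


def read_pyvenv_cfg(cfg_path: Path) -> dict[str, str]:
    """Parse `key = value` lines (simplified; not CPython's getpath parser)."""
    config = {}
    for line in cfg_path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def home_relative_to_cwd(home: str) -> str:
    # Models the current behavior: a relative `home` is anchored to
    # whatever the process's working directory happens to be.
    return os.path.abspath(home)


def home_relative_to_cfg(cfg_path: Path, home: str) -> str:
    # Models the proposal: a relative `home` is anchored to the directory
    # containing pyvenv.cfg. An absolute `home` is unaffected, because
    # os.path.join discards the base when its second argument is absolute.
    return os.path.normpath(os.path.join(cfg_path.parent.resolve(), home))
```

With a hypothetical layout where the venv lives at app/venv and the interpreter at python/bin, a pyvenv.cfg containing home = ../../python/bin resolves to the same directory under the proposed rule no matter where the process starts, and keeps doing so after the whole tree is copied elsewhere, since only the relative positions matter.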