Interpreter independent isolated/virtual environments

Warning: very long post.

I finally found the time to finish this draft with my recent thoughts on the topic. Thankfully the thread hasn’t grown too much recently.

To me, virtual environments have two main problems I wish could be resolved: portability and inspectability.

Portability: There is no reliable way to move a virtual environment to another location, even within the same machine. Historically this was not a huge issue (although a minor footgun for novices), but it is increasingly problematic with the rise of containers and distributed deployment. Because the environment cannot be moved, users must configure the production machine with the build tools needed to populate it, with performance and provisioning drawbacks. It would be immensely useful to be able to create and populate a virtual environment somewhere else (e.g. CI) and push it to production, the way statically-linked binaries can be copied directly. (Yes, I am aware there are multiple workarounds that achieve a similar result, e.g. multi-stage builds or replicating the exact filesystem structure. But those are all inconvenient hoops to jump through.)

Inspectability: A virtual environment’s internal structure is defined by the base interpreter it is created against, and the structure cannot be reliably determined without invoking that base interpreter. This means it is impossible to cross-provision (a term I invented by analogy with cross-compilation) a runtime environment, that is, to populate an environment for an interpreter you cannot run. Scripts vs bin is the least of the problems. You can’t even know where to install the packages. What value should I use for X.Y in lib/pythonX.Y/site-packages? There is no way to tell without running the actual interpreter.
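To make the inspectability point concrete, here is a small sketch using the standard-library sysconfig module. It expands the purelib directory for the same prefix under two of the built-in install schemes. The prefix /opt/env and the helper name purelib_for are placeholders I made up for illustration. Note that the version component of the posix_prefix result comes from whichever interpreter runs the snippet, which is exactly the information a cross-provisioning tool does not have.

```python
import sysconfig

# Hypothetical environment prefix, used only for illustration.
PREFIX = "/opt/env"


def purelib_for(scheme: str, prefix: str = PREFIX) -> str:
    """Expand the pure-Python library directory for `prefix` under `scheme`."""
    return sysconfig.get_path(
        "purelib", scheme, vars={"base": prefix, "platbase": prefix}
    )


# The two schemes disagree on the directory, and the posix_prefix result
# embeds the version of the interpreter running this code.
for scheme in ("posix_prefix", "nt"):
    print(f"{scheme:>12}: {purelib_for(scheme)}")
```

The nt result has no version component, which is why that scheme is the easier target for a tool that cannot run the interpreter.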

Both problems raised here would be resolved by the proposal, since it would remove the need for a fixed prefix in pyvenv.cfg, and an environment could take any form by specifying environment variables. But I would prefer a less drastic approach that keeps more of the current virtual environment architecture.

__PYVENV_LAUNCHER__ is actually almost doing what’s needed here. This environment variable is used by the “command redirector” on macOS and Windows to carry the base interpreter information to the actual executable. But we can easily fake that interaction: (examples in Bash on macOS; the same strategy works on Windows with different shell commands)

$ cd /<WORKINGDIR>
$ mkdir -p fake-env/lib/python3.9/site-packages
$ echo 'include-system-site-packages = false' > fake-env/pyvenv.cfg
$ __PYVENV_LAUNCHER__=$PWD/fake-env/bin/python python3.9 -c '
> import sys
> for p in sys.path:
>     print(repr(p))
> '
''
'/<PYTHONHOME>/3.9/lib/python39.zip'
'/<PYTHONHOME>/3.9/lib/python3.9'
'/<PYTHONHOME>/3.9/lib/python3.9/lib-dynload'
'/<WORKINGDIR>/fake-env/lib/python3.9/site-packages'

No absolute paths, no symlinked executables. Completely movable as long as you set the correct environment variable. (Entry point scripts are out of scope here, but that’s solvable with packaging tooling improvements.)

The problem is that this does not work on Linux (or on other Unix-like systems except macOS). Would it be viable to add this environment variable universally?

The inspectability problem is more difficult, but I feel it can potentially be solved with tooling support. The nt scheme could be used by default, with a tool that automatically creates symlinks to “trick the interpreter”. Something like: (continuing from the previous example)

$ tree fake-env
fake-env
├── bin -> scripts/
├── include
│   └── python3.9 -> fake-env/include
├── lib
│   ├── python3.9
│   │   └── site-packages -> fake-env/lib/site-packages
│   └── site-packages
├── pyvenv.cfg
└── scripts

This would be enough for pip to almost work (except for entry point script shebangs, which I think could be fixed by pip also using __PYVENV_LAUNCHER__ to override sys.executable):

$ __PYVENV_LAUNCHER__=fake-env/bin/python python3.9 -m ensurepip
...
$ __PYVENV_LAUNCHER__=fake-env/bin/python python3.9 -m pip install -U pip
...
Successfully installed pip-20.2.4
$ ls fake-env/lib/site-packages/
__pycache__ easy_install.py pip pip-20.2.4.dist-info pkg_resources setuptools setuptools-49.2.1.dist-info
$ ls fake-env/scripts
easy_install-3.9 pip pip3 pip3.9
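The symlink layout from the tree listing above could be produced by a small tool along these lines. This is only a sketch under my own assumptions: the function name make_nt_style_env is made up, the link targets are written as relative paths so the tree stays movable, and the include link is left out for brevity. Creating symlinks on Windows may need extra privileges, so treat this as the POSIX half of the idea.

```python
from pathlib import Path


def make_nt_style_env(env_dir: Path, version: str) -> None:
    """Create an nt-style layout plus POSIX-style symlinks pointing into it.

    `version` is the X.Y string of the interpreter that will use the
    environment, e.g. "3.9". It is passed in rather than detected, so the
    layout can be built without running that interpreter.
    """
    # The real directories follow the nt scheme.
    scripts = env_dir / "scripts"
    site_packages = env_dir / "lib" / "site-packages"
    scripts.mkdir(parents=True)
    site_packages.mkdir(parents=True)

    (env_dir / "pyvenv.cfg").write_text("include-system-site-packages = false\n")

    # POSIX-style names are relative symlinks into the nt-style directories.
    (env_dir / "bin").symlink_to("scripts", target_is_directory=True)
    versioned_lib = env_dir / "lib" / f"python{version}"
    versioned_lib.mkdir()
    (versioned_lib / "site-packages").symlink_to(
        Path("..") / "site-packages", target_is_directory=True
    )
```

Calling make_nt_style_env(Path("fake-env"), "3.9") gives a tree similar to the one shown earlier. The version still has to come from somewhere, but it is now a single string supplied by the user or a lock file instead of something that requires running the target interpreter.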

Now all we need is a cross-platform command redirector that sets the environment variable automatically.
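A rough sketch of what such a redirector might look like if written in Python. Everything here is an assumption for illustration: the names build_redirect and main are made up, BASE_PYTHON is a hypothetical variable for locating the base interpreter, and the script is assumed to live one directory below the environment root. It also only helps where the interpreter honors __PYVENV_LAUNCHER__, which today means macOS and Windows.

```python
import os
import sys
from pathlib import Path


def build_redirect(env_dir, base_python, args):
    """Return (argv, environ) for running `base_python` inside `env_dir`."""
    env_dir = Path(env_dir).resolve()
    environ = dict(os.environ)
    # Point the variable at <env>/bin/python, as in the shell examples above,
    # so the interpreter looks for pyvenv.cfg relative to that path.
    environ["__PYVENV_LAUNCHER__"] = str(env_dir / "bin" / "python")
    return [base_python, *args], environ


def main():
    # Assumption: this file is installed as <env>/bin/<name> (or
    # <env>/scripts/<name>), so the environment root is two levels up.
    env_dir = Path(__file__).resolve().parent.parent
    # Assumption: BASE_PYTHON is a hypothetical way to pick the interpreter.
    base_python = os.environ.get("BASE_PYTHON", "python3")
    argv, environ = build_redirect(env_dir, base_python, sys.argv[1:])
    # Replace the current process with the base interpreter.
    os.execvpe(argv[0], argv, environ)
```

Because the environment root is computed from the script's own location at run time, nothing in the environment records an absolute path, and the whole directory can be moved. A real redirector would probably be a small native executable instead, since a Python script needs an interpreter to start it in the first place.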

Would this be a worthwhile direction to pursue? This would be much less invasive than both PEP 582 and the proposal here, retaining much of the PEP 405 structure and most of the existing tools around it.
