Interpreter-independent isolated/virtual environments

Two things to consider here:

  1. It’s probably not safe to trust the nominal version of packages that get installed. In the majority of cases it would be safe, but there are lots of ways to pip install some patched version of a library that looks identical to installing that version from PyPI. You would want to make sure that your shared reference actually refers to an identical version. You could probably do something like build-0.0.1-{short hash}-{n}, where the short hash would be the first 8 hex digits of a hash of the wheel’s contents, and the full hash would be recorded somewhere on disk for disambiguation in the event of a short-hash collision (hence the -n); see the sketch after this list.

  2. How would you clean up these packages? Right now, virtual environments are self-contained, so when I’m done with one I just do rm -rf <venv>. That seems like it would rule out any sort of reference-counted solution. Maybe the best you could do would be to have venvs that use the shared installation area record that they used it, and have a separate python -m venv.gc command that tries to find the original venvs and verify which packages are still in use.

    This kind of thing seems like it can be handled reasonably well for basic venvs created by build for its own purposes (since it would create and destroy them as desired and can handle the reference counting), but I don’t think it would work as a general-purpose mechanism.
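For point 1, a minimal sketch of what that naming scheme could look like (the function name and the registry layout are invented for illustration; nothing like this exists today):

import hashlib
import json
from pathlib import Path

def shared_key(name: str, version: str, wheel: Path, registry: Path) -> str:
    """Return a shared-install directory name like 'build-0.0.1-1a2b3c4d-0'."""
    full = hashlib.sha256(wheel.read_bytes()).hexdigest()
    short = full[:8]
    # Record the full hash on disk so a short-hash collision can be
    # disambiguated by the trailing -{n}.
    registry.mkdir(parents=True, exist_ok=True)
    index = registry / f"{name}-{version}-{short}.json"
    known = json.loads(index.read_text()) if index.exists() else []
    if full not in known:
        known.append(full)
        index.write_text(json.dumps(known))
    return f"{name}-{version}-{short}-{known.index(full)}"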

Presumably any new solution couldn’t support only wheels, though. Sdists that require compilation sometimes embed venv paths in (unsigned) binaries, IIRC.

The pip download cache is an improvement. I’m also not sure when/how TUF checks signatures on sdists/bdists/wheels drawn from a local cache that’s writable by the user.

It would be fair to only support wheels, because in most cases pip first builds a wheel, then installs that wheel. In the long term, I believe the plan is to make the wheel building phase required and drop the legacy mode.

This whole thing would need support from the installer in the first place anyway, so you can easily require a wheel as an intermediate artifact. You wouldn’t get any caching benefits if you repeatedly installed an sdist that doesn’t have reproducible builds, but considering that it would basically fall back to the old behavior anyway and you can work around it by caching your own wheel builds via something like devpi, that’s not a huge concern.

I suppose that the way to test this is to install something (?) with C extensions by having pip build and install the wheel from an sdist, mv the venv/virtualenv to a new path, change the necessary shebangs, and see what fails.
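For what it’s worth, a rough version of that experiment might look like this (markupsafe is just an arbitrary small package with a C extension; --no-binary forces pip to build the wheel from the sdist):

$ python3.9 -m venv /tmp/origin-env
$ /tmp/origin-env/bin/pip install --no-binary :all: markupsafe
$ mv /tmp/origin-env /tmp/moved-env
$ # fix the entry point shebangs (GNU sed; macOS needs sed -i '')
$ sed -i 's|/tmp/origin-env|/tmp/moved-env|' /tmp/moved-env/bin/pip*
$ /tmp/moved-env/bin/python -c 'import markupsafe; print(markupsafe.__file__)'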

It may be that wheel building has fixed this issue of paths being embedded in binaries (how could wheels work otherwise?).

Warning: very long post.

I finally found the time to finish this draft with my recent thoughts on the topic. Thankfully the thread hasn’t grown too much recently.

To me, virtual environments have two main problems I wish could be resolved: portability and inspectability.

Portability: There is no reliable way to move a virtual environment to another location, even within the same machine. This is historically not a huge issue (although a minor footgun for novices), but it is increasingly problematic with the rise of containers and distributed deployment. The inability to move means users must configure the production machine with the necessary build tools to populate an environment, with performance and provisioning drawbacks. It would be immensely useful if it were possible to create and populate a virtual environment somewhere else (e.g. CI) and push that to production, the way statically-linked binaries can be copied directly. (Yes, I am aware there are multiple workarounds to achieve a similar result, e.g. multi-stage builds, or replicating the exact filesystem structure. But those are all inconvenient hoops to jump through.)

Inspectability: A virtual environment’s internal structure is defined by the base interpreter it is created against, and that structure cannot be reliably determined without invoking the base interpreter. This means it is impossible to cross-provision (a term I invented, analogous to cross-compilation) a runtime environment. Scripts vs bin is the least of the problems; you can’t even know where to install the packages. What value should I use for lib/pythonX.Y/site-packages? There is no way to tell without running the actual interpreter.
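To make that concrete: the only reliable way to learn where site-packages lives under some target prefix is to ask an interpreter, and a different interpreter (another platform, another implementation) would give a different answer. For example:

$ python3.9 -c '
> import sysconfig
> print(sysconfig.get_path("purelib", "posix_prefix", vars={"base": "/srv/env"}))
> '
/srv/env/lib/python3.9/site-packages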

Both problems raised here would be resolved by the proposal, since it would remove the need for a fixed prefix in pyvenv.cfg, and an environment could take any form by specifying environment variables. But I would prefer a less drastic approach that keeps more of the current virtual environment architecture.

__PYVENV_LAUNCHER__ is actually almost doing what’s needed here. This environment variable is used by the “command redirector” on macOS and Windows to carry the base interpreter information to the actual executable. But we can easily fake that interaction (examples in Bash on macOS; the same strategy works on Windows with different shell commands):

$ cd /<WORKINGDIR>
$ mkdir -p fake-env/lib/python3.9/site-packages
$ echo 'include-system-site-packages = false' > fake-env/pyvenv.cfg
$ __PYVENV_LAUNCHER__=$PWD/fake-env/bin/python python3.9 -c '
> import sys
> for p in sys.path:
>     print(repr(p))
> '
''
'/<PYTHONHOME>/3.9/lib/python39.zip'
'/<PYTHONHOME>/3.9/lib/python3.9'
'/<PYTHONHOME>/3.9/lib/python3.9/lib-dynload'
'/<WORKINGDIR>/fake-env/lib/python3.9/site-packages'

No absolute paths, no symlinked executables. Completely movable as long as you set the correct environment variable. (Entry point scripts are out of scope here, but that’s solvable with packaging tooling improvements.)

The problem is, this does not work on Linux (and other Unix-like systems except macOS). Would it be viable to add this environment variable universally?

The introspection problem is more difficult, but I feel it can potentially be solved with tooling support. The nt scheme can be used by default, with a tool that automatically creates symlinks to “trick the interpreter”, something like this (continuing from the previous example; a sketch of such a tool follows the tree):

$ tree fake-env
fake-env
├── bin -> scripts/
├── include
│   └── python3.9 -> fake-env/include
├── lib
│   ├── python3.9
│   │   └── site-packages -> fake-env/lib/site-packages
│   └── site-packages
├── pyvenv.cfg
└── scripts
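A minimal sketch of that symlinking tool (POSIX-only; the function name is invented, and whether pip and friends are happy with the result is exactly what would need testing):

from pathlib import Path

def make_fake_env(env: Path, pyver: str = "3.9") -> None:
    # The nt-style layout is the real one...
    (env / "scripts").mkdir(parents=True)
    (env / "include").mkdir()
    (env / "lib" / "site-packages").mkdir(parents=True)
    (env / "lib" / f"python{pyver}").mkdir()
    (env / "pyvenv.cfg").write_text("include-system-site-packages = false\n")
    # ...and the posix-scheme names are relative symlinks onto it.
    (env / "bin").symlink_to("scripts")
    (env / "include" / f"python{pyver}").symlink_to(".")
    (env / "lib" / f"python{pyver}" / "site-packages").symlink_to("../site-packages")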

This would be enough for pip to almost work (except for entry point script shebangs, which I think could be fixed by pip also using __PYVENV_LAUNCHER__ to override sys.executable):

$ __PYVENV_LAUNCHER__=fake-env/bin/python python3.9 -m ensurepip
...
$ __PYVENV_LAUNCHER__=fake-env/bin/python python3.9 -m pip install -U pip
...
Successfully installed pip-20.2.4
$ ls fake-env/lib/site-packages/
__pycache__ easy_install.py pip pip-20.2.4.dist-info pkg_resources setuptools setuptools-49.2.1.dist-info
$ ls fake-env/scripts
easy_install-3.9 pip pip3 pip3.9

Now all we need is a cross-platform command redirector that sets the environment variable automatically.
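On POSIX systems, a toy stand-in for that redirector is just a shell wrapper (assuming a python3.9 on PATH, and of course it only helps on platforms where the interpreter honors the variable):

$ cat > fake-env/scripts/python <<'EOF'
> #!/bin/sh
> # Point the variable at ourselves, then hand off to the real interpreter.
> export __PYVENV_LAUNCHER__="$(cd "$(dirname "$0")" && pwd)/python"
> exec python3.9 "$@"
> EOF
$ chmod +x fake-env/scripts/python
$ fake-env/scripts/python -m pip --version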

Would this be a worthwhile direction to pursue? This would be much less invasive than both PEP 582 and the proposal here, retaining much of the PEP 405 structure and most of the existing tools around it.


So __PYVENV_LAUNCHER__ actually behaves slightly differently on Windows vs macOS, but you should find that PYTHONPATH works fine.

I don’t see how it would be less invasive than PEP 582 though? Other than making everyone come up with a name and also learn how to set environment variables (and presumably a special option for pip to inform it which directory to install into), which you can already do today with PYTHONPATH (and --target). PEP 582 deliberately said nothing about scripts, leaving that entirely in the hands of the tools that generate them. Your proposal is similar, in that most of the work belongs to pip, rather than CPython.

Yeah, I intentionally skipped those details since the post is long enough without diving into the subtle implementation differences. Some unification would be needed if we’re to promote __PYVENV_LAUNCHER__ to a universally usable variable, instead of an implementation detail.

The problem with PYTHONPATH is that it does not allow removing an existing sys.path entry. You can see from the example above that __PYVENV_LAUNCHER__ triggers site configuration, so the system site-packages directory is not visible (unless include-system-site-packages is true in pyvenv.cfg). I would expect many existing virtual environment users to refuse to switch if whatever is proposed as a replacement does not offer this feature.
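Running the same sys.path dump as in the earlier example, but via PYTHONPATH, shows the difference: the entry is merely prepended, and the system site-packages entry is still there.

$ PYTHONPATH=$PWD/fake-env/lib/site-packages python3.9 -c '
> import sys
> for p in sys.path:
>     print(repr(p))
> '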

I probably used a wrong word; “disruptive” would’ve been better. PEP 582 proposed a new environment structure that is incompatible with the schemes in virtual environments. This would break most existing tooling that expects a virtual environment, an adoption cost that is likely too high for most people. By mimicking the virtual environment scheme, tools can almost work as-is, and an environment can quite easily be “upgraded” to a real PEP 405 virtual environment simply by putting the Python executables and pyvenv.cfg configuration back. I expect this would ease the transition a lot.

Touching venv at all breaks people… as one of the few people to have touched it in the last few years, I know :slight_smile:

(e.g. adding __PYVENV_LAUNCHER__ broke people’s assumptions that subprocess.Popen("python") would do the same thing as subprocess.Popen(sys.executable))

Everything around Python Packaging will hit Hyrum’s Law.

Yeah, I’m not imagining that __PYVENV_LAUNCHER__ won’t break anything; it’s less disruptive, not undisruptive. Also, that environment variable already managed to break macOS and Windows (in that order), so why should Linux folks get a free pass :wink:

We could get the Python Launchers to set __PYVENV_LAUNCHER__ appropriately (I was already looking into this for the Python Launcher for UNIX as a possible thing down the road).

So there’s the __PYVENV_LAUNCHER__ idea and PEP 582 proposal being bandied about.

What I’m hearing from @uranusjr and the __PYVENV_LAUNCHER__ proposal is that it has the nice effect of keeping the general directory structure. The hope is that this will break less code than PEP 582 while letting people transition their tooling over. The worry, though, is that there will be breakage (how easy it will be to diagnose, or how widespread it would be, I don’t think any of us can claim to know). It will also require an update to CPython and such to make __PYVENV_LAUNCHER__ do the right thing on all OSs.

PEP 582, meanwhile, views being a new approach as a virtue: it makes the mechanism opt-in and won’t lead to unexpected interactions. But it does put the onus on pip and other installers to know what --local or some such flag means for __pypackages__. The other concern has been that the global site-packages is left on sys.path.

I will say that my time with the Python extension for VS Code has shown that people will do stuff that, to put it kindly, is unwise. As such, I would argue that a simple solution is the most important thing here, and to me that’s PEP 582 (said by the person who is not a pip maintainer :wink:).

Having said that, I would advocate coming up with an environment variable or a marker file in __pypackages__ that could be set to get PEP 582 to leave off site-packages, to simulate virtual environments even more. Then it would be a simple matter of generating shell scripts that launch an interpreter with that environment variable set. We could also have the Python Launcher(s) always set that environment variable when __pypackages__ is found so that’s the default experience for people in that regard (assuming there’s no marker file). I think asking experienced users who are going to care about isolation to set an environment variable or touch/write a simple file isn’t asking too much, while users who don’t care and are too new to Python to be stressing over this simply won’t notice.
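Concretely, the check I’m imagining is something like this (purely hypothetical; both the variable name and the marker file name are invented for illustration):

import os
from pathlib import Path

def isolate_site_packages(project: Path) -> bool:
    """Should this project's __pypackages__ replace, not extend, site-packages?"""
    pkgs = project / "__pypackages__"
    if not pkgs.is_dir():
        return False
    if os.environ.get("PYTHONISOLATEPYPACKAGES") == "1":  # invented variable
        return True
    return (pkgs / ".isolated").exists()  # invented marker file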


I just remembered another problem with PYTHONPATH. The site-packages directory is placed after the stdlib paths in normal environments, but PYTHONPATH entries are put at the beginning of sys.path. This is problematic for backport packages that use the same name as stdlib packages, such as enum34 and typing.
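The shadowing is easy to demonstrate (typing here stands in for any backport that shares a stdlib name):

$ mkdir backports
$ echo 'print("the backport shadows the stdlib module")' > backports/typing.py
$ PYTHONPATH=$PWD/backports python3.9 -c 'import typing'
the backport shadows the stdlib module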

To make PEP 582 work (no matter how we decide to lay out the environment), the interpreter will need to grow a new environment variable to do the correct thing. That can be __PYVENV_LAUNCHER__, PYTHONPATH, or an entirely new setting. They all have advantages and drawbacks, but personally I feel PYTHONPATH is the one with the most problems.

To make PEP 582 work, you need to launch a script file (adjacent to the __pypackages__ folder), or have your CWD be the directory containing __pypackages__ if you want -m modules to resolve.

These are the workflow changes that upset people: it broke “-m from anywhere on disk”, and also “python subdir/script.py”. Adding an environment variable to specify this directory would be possible, and would fix both of these cases, but it should not be the core workflow.
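For reference, the lookup rule amounts to roughly this (a simplified sketch; PEP 582 keys the directory by Python version, and the real argv handling is more involved):

import sys
from pathlib import Path
from typing import List, Optional

def pep582_dir(argv: List[str]) -> Optional[Path]:
    # A script run directly looks next to the script; -m, -c, and the
    # REPL look in the current working directory.
    if argv and argv[0] not in ("", "-c", "-m"):
        base = Path(argv[0]).resolve().parent
    else:
        base = Path.cwd()
    xy = f"{sys.version_info[0]}.{sys.version_info[1]}"
    lib = base / "__pypackages__" / xy / "lib"
    return lib if lib.is_dir() else None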

Most users are going to python path/to/the/script.py and expect path/to/the/__pypackages__ to be used (probably via some shell script in /usr/bin), or are going to hit F5 to launch their project from a top-level script, because most users are not like us. Which is why we all think venv et al. is fine, while the rest of the world thinks it’s disastrously overcomplicated.

Unfortunately, it’s really hard to convince people that they’re not part of the majority, which is why so many packaging topics get stalled on pushback from very highly skilled and learned individuals (otherwise known as “outliers”). There are far fewer users on this server than the 5 million that is the lowest recent estimate for the size of the Python userbase - we’re all outliers here. It’s okay if we design things that don’t meet our needs, because they’re not for us. </rant>


This overcomplicated thing was the main reason for me to pick up work on PEP 582. It is already very easy to introduce Python to newcomers, but when we have to start installing dependency modules and debugging errors on multiple OSes, the happy story changes very quickly.

Most users are going to python path/to/the/script.py and expect path/to/the/__pypackages__ to be used

This currently works in the PoC at https://github.com/kushaldas/pep582
