Interpreter independent isolated/virtual environments

Superficially, I’m a pretty strong -1. Setuptools in the past tried to allow multiple versions to co-exist peacefully, and it was never very popular, had a number of issues (I believe) and is generally considered a failed experiment. I’d want to see strong evidence that any new proposal along those lines had learned from the setuptools experience, or had a very clear parallel in other languages so that we could learn from them how they had addressed the issues, before being comfortable with such a proposal.

I’m also worried about any proposal that has a “shared directory” which only describes the proposal in terms of Unix. Where would such a directory go in Windows? How would it interact with the multiple versions of Python (user install, system install, store distribution, nuget package)? You seemed up to this point to be pushing to make all the various directories configurable, but now you’re talking about a new directory altogether?

I think “having multiple versions co-exist” is a different topic, and should be posted separately if you want to pursue it. Let’s not have this thread end up with so many things being discussed that no-one can follow it - it’s hard enough already…

I’m happy to bounce ideas around. But they will need fleshing out at some point, and they need to integrate with the Python ecosystem. They also need use cases to justify them - as @steve.dower said, new features have to be better than “the devil you know” at solving real world problems, or they just change the issues people have to fight with, rather than fixing them…

That is fair.

I believe setuptools’s approach was significantly different from this one. Unless I am mistaken, setuptools required users to request specific versions at runtime: projects needed to explicitly support it and modify their code to use the custom import mechanism. There are many issues with this, but the main ones for me are, first, adoption inertia - if nobody starts adopting it, other people probably won’t adopt it either, and it takes a big push to get adoption going in the first place - and, second, that I think it would actually complicate things further as different packages started requiring different versions at runtime.

My proposal, by contrast, does not require any modifications to the source code and does not have the runtime dependency problem. It just adds a way to select what is available to the import statement. All of this gives it zero-effort adoption.

Anyway, I can work on this. This was something that I wanted to do anyway, just maybe not exactly in this format. If anyone wants to join me, please let me know!

The directory could go in the same folder as site-packages.

It wouldn’t really matter much. Any Python could run an environment; the only requirement is that there is a compatible version of each requested package. Pure-Python packages would be compatible with everything; for binary packages, there would need to be a version compatible with the interpreter you are using.

Yes, because I thought it fit your vision well. I was planning a project that did this, but having it as a native option in the interpreter provides better UX, so I just threw it out there.

Sure. I can bring this up once I have a PoC.

This would have the same benefits as the original proposal. But we still have to get a PoC working so that we can evaluate if it has any unexpected drawbacks.

Thanks. I think I understand your suggestion better now. I still have reservations (for example, I’ve no idea how installers would need to change to handle this) but feel free to try the idea out and we can see how well it works in practice.

As has already been mentioned, most of this can be done without needing changes to core Python - the only hard bit is setting up sys.path, but for a prototype I don’t see anything wrong with having a chunk of boilerplate at the top of the script that sets up sys.path manually. That should be good enough to test out the ideas.
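For illustration, a minimal sketch of what that boilerplate could look like, assuming a hypothetical environment description file (env.json, a name invented for this sketch) that simply lists the package directories to expose:

import json
import os
import sys

# Prototype-only boilerplate at the top of a script; "env.json" and its
# "paths" key are invented for this sketch, not part of any proposal.
_env_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), "env.json")
with open(_env_file) as f:
    _env = json.load(f)
# Prepend the environment's package directories so they win over site-packages.
sys.path[:0] = _env["paths"]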

1 Like

I also have my reservations, but I am optimistic :blush:

I’d imagined they could put the dependencies in the dependency pool and just install a .pth file in site-packages. We could also fully embrace the new environments by making site-packages itself an environment (site-packages.json) and letting it be the default. But that comes later.
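To sketch how the .pth half could work (the pool location and package directories here are invented): the site module reads each non-comment line of a .pth file in site-packages as a path and appends it to sys.path, so the installer would only need to drop in something like:

# shared-deps.pth (hypothetical name), installed into site-packages;
# each path line below is appended to sys.path at startup
/usr/local/lib/python3.9/pool/requests-2.24.0
/usr/local/lib/python3.9/pool/urllib3-1.25.11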

Yeah, I was thinking of having a simple wrapper that does all the parsing and path setup, acting like the proposed interpreter.

I’ll probably hold on until we have https://github.com/pradyunsg/installer usable (not necessarily stable).

I think it’s an interesting idea. Certainly an import hook could do it, and you could assemble the packages into the format you want by hand at first.

I’m not entirely clear which problem it’s solving, though I can certainly think of a few. But I’m hard pressed to come up with any that are bad enough now to justify revising the entire approach.

Also, a second-order effect of this kind of approach that scares me is packages caring less about breaking compatibility with their previous versions. I really want the ecosystem to get better at this as a whole, and I think encouraging packages to just keep multiple versions floating around, by making that easier/obvious/cheap/official, would work against this.

  • pip and virtualenv (and virtualenvwrapper) predate the __pycache__ directory, which was added to solve .pyc/.pyo collisions when sharing the same source directory between versions of the Python interpreter: https://www.python.org/dev/peps/pep-3147/
  • launching a subprocess for a different version of the interpreter (according to a config file that must be read before any interpreter invocation) is suboptimal, IMHO. .pth files are also a dirty hack.
  • pip caches (all downloaded versions of all) packages in the pip cache directory
  • Debian removing venv is annoying and unnecessary. Blocking sudo pip install would be more reasonable.
  • Actual package managers set appropriate permissions; such that the app cannot overwrite itself or its dependencies.
  • I suspect that package-install-time compilation into a shared site-packages directory will be problematic. TIL that the --relocatable flag in virtualenv has been removed. Changing the shebangs was never a fully sufficient solution anyway, due to paths getting compiled into things. Wheels may fix that now?

I don’t have need for this: having tox create appropriately-isolated envs in order to run tests with different interpreter versions (and pypy) is all I ever need to do. But nonetheless I wish you luck with this endeavor.

1 Like

Two things to consider here:

  1. It’s probably not safe to trust the nominal version of packages that get installed. In the majority of cases it would be safe, but there are lots of ways to pip install some patched version of a library that looks identical to installing that version from PyPI. You would want to make sure that your shared reference actually refers to an identical version. You could probably do something like build-0.0.1-{short hash}-{n}, where the short hash would be the first 8 digits of a hash of the wheel’s contents, and the full hash would be recorded somewhere on disk for disambiguation in the event of a short-hash collision (hence the -n); see the sketch after this list.

  2. How would you clean up these packages? Right now, virtual environments are self-contained, so when I’m done with one I just do rm -rf <venv>. That seems like it would rule out any sort of reference-counted solution. Maybe the best you could do would be to have venvs that use the shared installation area record that they used it, and have a separate python -m venv.gc command that tries to find the original venvs and verify which packages are still in use.

    This kind of thing seems like it can be handled reasonably well for basic venvs created by build for its own purposes (since it would create and destroy them as desired and can handle the reference counting), but I don’t think it would work as a general-purpose mechanism.
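To make item 1 above concrete, here is a hedged sketch of how such a directory name could be derived (the function name and exact scheme are invented for illustration; a real implementation would also need to handle the -n disambiguation against what’s already on disk):

import hashlib

def shared_install_dirname(project, version, wheel_path, n=0):
    # Hash the wheel's bytes; record the full digest on disk for
    # disambiguation, and use the first 8 hex digits in the name.
    with open(wheel_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return "{}-{}-{}-{}".format(project, version, digest[:8], n), digest

# e.g. shared_install_dirname("build", "0.0.1", "build-0.0.1-py3-none-any.whl")
# might return ("build-0.0.1-1a2b3c4d-0", "<full sha256 digest>")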

Presumably any new solution couldn’t support only wheels. Sdists that require compilation sometimes embed venv paths in (unsigned) binaries, IIRC.

The pip download cache is an improvement. I’m not sure when/how TUF checks signatures on sdists/bdists/wheels drawn from a local cache that’s writeable by the user?

It would be fair to only support wheels, because in most cases pip first builds a wheel, then installs that wheel. In the long term, I believe the plan is to make the wheel building phase required and drop the legacy mode.

This whole thing would need support from the installer in the first place anyway, so you can easily require a wheel as an intermediate artifact. You wouldn’t get any caching benefits if you repeatedly installed an sdist that doesn’t have reproducible builds, but considering that it would basically fall back to the old behavior anyway and you can work around it by caching your own wheel builds via something like devpi, that’s not a huge concern.

I suppose that the way to test this is to install something (?) with C extensions by having pip build and install the wheel from an sdist, mv the venv/virtualenv to a new path, change the necessary shebangs, and see what fails.

It may be that wheel building has fixed this issue of paths being embedded in binaries (how could wheels work otherwise?).

Warning: very long post.

I finally found the time to finish this draft with my recent thoughts on the topic. Thankfully the thread hasn’t grown too much recently.

To me, virtual environments have two main problems I wish could be resolved: portability and inspectability.

Portability: There is no reliable way to move a virtual environment to another location, even within the same machine. This is historically not a huge issue (although a minor footgun for novices), but it is increasingly problematic with the rise of containers and distributed deployment. The inability to move means users must configure the production machine with the necessary build tools to populate an environment, with performance and provisioning drawbacks. It would be immensely useful if it were possible to create and populate a virtual environment somewhere else (e.g. CI) and push it to production, like how statically-linked binaries can be copied directly. (Yes, I am aware there are multiple workarounds to achieve a similar result, e.g. multi-stage builds, or replicating the exact filesystem structure. But those are all inconvenient hoops to jump through.)

Inspectability: A virtual environment’s internal structure is defined by the base interpreter it is created against, and that structure cannot be reliably determined without invoking the base interpreter. This means it is impossible to cross-provision (a term I invented, analogous to cross-compilation) a runtime environment. Scripts vs bin is the least of the problems: you can’t even know where to install the packages. What value should I use for lib/pythonX.Y/site-packages? There is no way to tell without running the actual interpreter.
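For example, the only reliable way today to learn an interpreter’s install scheme is to ask that very interpreter (output path illustrative):

$ python3.9 -c 'import sysconfig; print(sysconfig.get_path("purelib"))'
/<PYTHONHOME>/3.9/lib/python3.9/site-packages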

Both problems raised here would be resolved by the proposal, since it would remove the need for a fixed prefix in pyvenv.cfg, and an environment could take any form by specifying environment variables. But I would prefer a less drastic approach that keeps more of the current virtual environment architecture.

__PYVENV_LAUNCHER__ is actually almost doing what’s needed here. This environment variable is used by the “command redirector” on macOS and Windows to carry the base interpreter information to the actual executable. But we can easily fake that interaction (examples in Bash on macOS; the same strategy works on Windows with different shell commands):

$ cd /<WORKINGDIR>
$ mkdir -p fake-env/lib/python3.9/site-packages
$ echo 'include-system-site-packages = false' > fake-env/pyvenv.cfg
$ __PYVENV_LAUNCHER__=$PWD/fake-env/bin/python python3.9 -c '
> import sys
> for p in sys.path:
>     print(repr(p))
> '
''
'/<PYTHONHOME>/3.9/lib/python39.zip'
'/<PYTHONHOME>/3.9/lib/python3.9'
'/<PYTHONHOME>/3.9/lib/python3.9/lib-dynload'
'/<WORKINGDIR>/fake-env/lib/python3.9/site-packages'

No absolute paths, no symlinked executables. Completely movable as long as you set the correct environment variable. (Entry point scripts are out of scope here, but that’s solvable with packaging tooling improvements.)

The problem is, this does not work on Linux (and other Unix-like systems except macOS). Would it be viable to add this environment variable universally?

The introspection problem is more difficult, but I feel it can potentially be solved with tooling support. The nt scheme can be used by default, with a tool that automatically creates symlinks to “trick the interpreter”. Something like this (continuing from the previous example):

$ tree fake-env
fake-env
├── bin -> scripts/
├── include
│   └── python3.9 -> fake-env/include
├── lib
│   ├── python3.9
│   │   └── site-packages -> fake-env/lib/site-packages
│   └── site-packages
├── pyvenv.cfg
└── scripts
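For what it’s worth, a sketch of the commands such a tool might run to produce that layout (using relative symlink targets so the environment stays movable; the targets are spelled slightly differently from the tree output above):

$ mkdir -p fake-env/scripts fake-env/include fake-env/lib/site-packages fake-env/lib/python3.9
$ ln -s scripts fake-env/bin
$ ln -s .. fake-env/include/python3.9
$ ln -s ../site-packages fake-env/lib/python3.9/site-packages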

This would be enough for pip to almost work (except entry point script shebangs, which I think can be amended by pip also using __PYVENV_LAUNCHER__ to override sys.executable):

$ __PYVENV_LAUNCHER__=fake-env/bin/python python3.9 -m ensurepip
...
$ __PYVENV_LAUNCHER__=fake-env/bin/python python3.9 -m pip install -U pip
...
Successfully installed pip-20.2.4
$ ls fake-env/lib/site-packages/
__pycache__ easy_install.py pip pip-20.2.4.dist-info pkg_resources setuptools setuptools-49.2.1.dist-info
$ ls fake-env/scripts
easy_install-3.9 pip pip3 pip3.9

Now all we need is a cross-platform command redirector that sets the environment variable automatically.
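As a rough sketch (assuming __PYVENV_LAUNCHER__ were honored on every platform, which is exactly what’s proposed above), the redirector dropped at fake-env/bin/python could be as simple as:

#!/usr/bin/env python3
# Hypothetical redirector: record its own location in __PYVENV_LAUNCHER__,
# then hand off to the base interpreter found on PATH.
import os
import sys

os.environ["__PYVENV_LAUNCHER__"] = os.path.abspath(__file__)
os.execvp("python3.9", ["python3.9"] + sys.argv[1:])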

Would this be a worthwhile direction to pursue? This would be much less invasive than both PEP 582 and the proposal here, retaining much of the PEP 405 structure and most of the existing tools around it.

4 Likes

So __PYVENV_LAUNCHER__ actually behaves slightly differently on Windows vs macOS, but you should find that PYTHONPATH works fine.

I don’t see how it would be less invasive than PEP 582 though? Other than making everyone come up with a name and also learn how to set environment variables (and presumably a special option for pip to inform it which directory to install into), which you can already do today with PYTHONPATH (and --target). PEP 582 deliberately said nothing about scripts, leaving that entirely in the hands of the tools that generate them. Your proposal is similar, in that most of the work belongs to pip, rather than CPython.

Yeah, I intentionally skipped those details since the post is long enough without diving into the subtle implementation differences. Some unification would be needed if we’re to promote __PYVENV_LAUNCHER__ to a universally usable variable, instead of an implementation detail.

The problem with PYTHONPATH is that it does not allow removing an existing sys.path entry. You can see from the example above that __PYVENV_LAUNCHER__ triggers site configuration, so the system site-packages directory is not visible (unless include-system-site-packages is true in pyvenv.cfg). I would expect many existing virtual environment users to refuse to switch if whatever is proposed to replace it does not offer this feature.

I probably used the wrong word; “disruptive” would’ve been better. PEP 582 proposed a new environment structure that is incompatible with the schemes in virtual environments. This would break most existing tooling that expects a virtual environment, an adoption cost that is likely too high for most people. By mimicking the virtual environment scheme, tools can almost work as-is, and an environment can quite easily be “upgraded” to a real PEP 405 virtual environment simply by putting the Python executables and pyvenv.cfg configuration back. I expect this would ease the transition a lot.

Touching venv at all breaks people… as one of the few people to have touched it in the last few years, I know :slight_smile:

(e.g. adding __PYVENV_LAUNCHER__ broke people’s assumptions that subprocess.Popen("python") would do the same thing as subprocess.Popen(sys.executable))

Everything around Python Packaging will hit Hyrum’s Law: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.”

Yeah, I’m not imagining __PYVENV_LAUNCHER__ won’t break anything; it’s less disruptive, not undisruptive. Also, that environment variable already managed to break macOS and Windows (in that order), why should Linux folks get a free pass :wink:

We could get the Python Launchers to set __PYVENV_LAUNCHER__ appropriately (I was already looking into this for the Python Launcher for UNIX as a possible thing down the road).

So there’s the __PYVENV_LAUNCHER__ idea and the PEP 582 proposal being bandied about.

What I’m hearing from @uranusjr and the __PYVENV_LAUNCHER__ proposal is that it has the nice effect of keeping the general directory structure. The hope is that this will break less code than PEP 582 while letting people transition their tooling over. The worry, though, is that there will be breakage (how easy it will be to diagnose, or how widespread it would be, I don’t think any of us can claim to know). It will also require an update to CPython and such to make __PYVENV_LAUNCHER__ do the right thing on all OSes.

As for PEP 582, its supporters view it as a virtue that it’s a new approach, since that makes it opt-in and won’t actually lead to unexpected interactions. But it does put the onus on pip and other installers to know what --local or some such flag means for __pypackages__. The other concern has been that the global site-packages is left on sys.path.

I will say that my time with the Python extension for VS Code has shown that people will do things that, to put it kindly, are unwise. As such, I would argue that a simple solution is the most important thing here, and to me that’s PEP 582 (said by the person who is not a pip maintainer :wink:).

Having said that, I would advocate coming up with an environment variable or a marker file in __pypackages__ that could be set to get PEP 582 to leave off site-packages, to simulate virtual environments even more. Then it would be a simple matter of generating shell scripts that somehow launch an interpreter with that environment variable set. We could also have the Python Launcher(s) always set that environment variable when __pypackages__ is found, so that’s the default experience for people in that regard (assuming there’s no marker file). I think asking experienced users who are going to care about isolation to set an environment variable or touch/write a simple file isn’t asking too much, while users who don’t care and are too new to Python to be stressing over this simply won’t notice.
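To illustrate (every name here - the environment variable and the marker file - is invented, not a proposed spelling), the launcher-side check could be as small as:

import os

def site_packages_enabled(pypackages_dir):
    # Hypothetical logic: leave site-packages off sys.path if either a
    # made-up environment variable or a made-up marker file says so.
    if os.environ.get("PYTHON_ISOLATED_PYPACKAGES"):
        return False
    if os.path.exists(os.path.join(pypackages_dir, ".isolated")):
        return False
    return True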

1 Like

I just remembered another problem with PYTHONPATH. The site-packages directory is placed after stdlib paths in normal environments, but PYTHONPATH entries are put at the beginning of sys.path. This is problematic for backport packages that use the same name as stdlib packages, such as enum34 and typing.
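A quick demonstration of the ordering problem (directory name invented): a PYTHONPATH entry lands ahead of the stdlib, so a backport’s module shadows the stdlib one even on interpreters that ship it:

$ mkdir -p /tmp/backports && touch /tmp/backports/typing.py
$ PYTHONPATH=/tmp/backports python3.9 -c 'import typing; print(typing.__file__)'
/tmp/backports/typing.py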

To make PEP 582 work (no matter how we decide to lay out the environment), the interpreter will need to grow a new environment variable to do the correct thing. That can be __PYVENV_LAUNCHER__, PYTHONPATH, or an entirely new setting. They all have advantages and drawbacks, but personally I feel PYTHONPATH is the one with the most problems.

To make PEP 582 work, you need to launch a script file (adjacent to the __pypackages__ folder) or have your CWD be the one containing __pypackages__ if you want to resolve -m modules.

These are the workflow changes that upset people: it broke “-m from anywhere on disk” and also “python subdir/script.py”. Adding an environment variable to specify this directory would be possible, and would fix both of these cases, but it should not be the core workflow.

Most users are going to python path/to/the/script.py and expect path/to/the/__pypackages__ to be used (probably via some shell script in /usr/bin), or are going to hit F5 to launch their project from a top-level script, because most users are not like us. Which is why we all think venv et al. is fine, while the rest of the world thinks it’s disastrously overcomplicated.

Unfortunately, it’s really hard to convince people that they’re not part of the majority, which is why so many packaging topics get stalled on pushback from very highly skilled and learned individuals (otherwise known as “outliers”). The users on this server number far fewer than 5 million, the lowest recent estimate for the size of the Python userbase - we’re all outliers here. It’s okay if we design things that don’t meet our needs, because they’re not for us. </rant>

2 Likes

This overcomplicated thing was the main reason for me to pick up work on PEP 582. It is already very easy to introduce Python to newcomers, but when we have to start installing dependency modules and debugging errors on multiple OSes, the happy story changes very quickly.

Most users are going to python path/to/the/script.py and expect path/to/the/__pypackages__ to be used

This currently works in the PoC at https://github.com/kushaldas/pep582

1 Like