Interpreter independent isolated/virtual environments

FFY00 · October 6, 2020, 1:01am

Hi all, I would like to propose adding a way to run isolated/virtual environments completely independently of the interpreter. This means we would be able to run a virtual environment from any interpreter, allowing for control

Before we get into it, I wanted to present the current and past approaches.

Python 2

In Python 2, we create isolated environments by setting the PYTHONHOME environment variable.

PYTHONHOME   : alternate <prefix> directory (or <prefix>:<exec_prefix>).
               The default module search path uses <prefix>/lib/pythonX.X.

Setting this changes the modules search path, allowing us to isolate the system defaults and control what exactly goes into sys.path.

Drawbacks

Setting PYTHONHOME impacts not only purelib/platlib (site-packages), but also stdlib/platstdlib (standard library). This means that if we want to have a custom environment, we need to copy or symlink the standard library to our environment, making it more difficult to create environments, and making the environments heavier.

Python 3

Python 3 solved the issues of the PYTHONHOME approach with PEP 405. PEP 405 introduces a new first step when calculating the prefix: pyvenv.cfg. pyvenv.cfg is a file that should live alongside, or in the parent directory of, the Python executable and stores the configuration for a virtual environment.

You can read more about it in the PEP, but this is what a normal config looks like:

home = /usr
implementation = CPython
version_info = 3.8.5.final.0
virtualenv = 20.0.23
include-system-site-packages = false
base-prefix = /usr
base-exec-prefix = /usr
base-executable = /usr/bin/python
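
To make the file format concrete, here is a minimal sketch of how such a file can be read. It assumes the simple "key = value" layout shown above; the function name parse_pyvenv_cfg is made up for this illustration and is not a CPython API.

```python
def parse_pyvenv_cfg(text):
    """Parse the simple 'key = value' layout used by pyvenv.cfg (illustrative helper)."""
    config = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines and lines without '='
        config[key.strip().lower()] = value.strip()
    return config


example = """\
home = /usr
include-system-site-packages = false
version_info = 3.8.5.final.0
"""

config = parse_pyvenv_cfg(example)
print(config["home"])                          # /usr
print(config["include-system-site-packages"])  # false
```

Note that every value stays a string; interpreting "false" as a boolean is left to the caller.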

PEP 405 also introduces the venv module, which is capable of creating such environments.

Improvements

This approach no longer needs to copy/symlink the standard library to the environment, which can be expensive in some cases.

Drawbacks

Even though it does not need to copy/symlink the standard library, it now needs to copy/symlink the interpreter.
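
As a rough illustration of that point, the sketch below creates an environment with the stdlib venv module (without pip) and reports whether the interpreter inside it is a symlink or a copy. The helper name create_bare_env and the directory name myenv are placeholders, and the symlinks setting is an assumption chosen to mirror what the venv command line does by default on POSIX.

```python
import os
import tempfile
import venv


def create_bare_env(env_dir):
    """Create a virtual environment without pip and return its interpreter path."""
    # Assumption: prefer symlinks everywhere except Windows.
    builder = venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe_name = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe_name)


with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "myenv")
    env_python = create_bare_env(env_dir)
    print("pyvenv.cfg present:", os.path.isfile(os.path.join(env_dir, "pyvenv.cfg")))
    print("interpreter is a symlink:", os.path.islink(env_python))
```

Either way, something representing the interpreter has to be placed inside the environment directory, which is the drawback described above.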

The introduction of pyvenv.cfg as the first step of the search breaks the old-style environments. PYTHONHOME can no longer be relied upon because if the interpreter happens to be from a virtual environment, pyvenv.cfg is present and takes over.

There are some cases where the interpreter can’t be copied and symlinks are not available, like the Windows Store Python. AFAIK, for the Windows Store this is solved by patching venv so that, instead of copying or symlinking the interpreter, it puts a wrapper in its place that calls the original interpreter. I might be a little fuzzy on the details, so feel free to correct me, but the point is that it needs special handling.

Proposal

So, I would like to propose introducing an environment variable to ignore pyvenv.cfg, making PYTHONHOME isolated environments reliable again, and introducing a few environment variables to configure the sysconfig paths.

$ python -m sysconfig
Platform: "linux-x86_64"
Python version: "3.8"
Current installation scheme: "posix_prefix"

Paths:
	data = "/usr"
	include = "/usr/include/python3.8"
	platinclude = "/usr/include/python3.8"
	platlib = "/usr/lib/python3.8/site-packages"
	platstdlib = "/usr/lib/python3.8"
	purelib = "/usr/lib/python3.8/site-packages"
	scripts = "/usr/bin"
	stdlib = "/usr/lib/python3.8"
...

The idea here would be to be able to just take any interpreter and be able to reliably run an isolated environment from it.

A simple example of how it would look:

$ PYTHONPLATLIB=~/env/myenv PYTHONPURELIB=~/env/myenv python -m sysconfig
Platform: "linux-x86_64"
Python version: "3.8"
Current installation scheme: "posix_prefix"

Paths:
	data = "/usr"
	include = "/usr/include/python3.8"
	platinclude = "/usr/include/python3.8"
	platlib = "~/env/myenv"
	platstdlib = "/usr/lib/python3.8"
	purelib = "~/env/myenv"
	scripts = "/usr/bin"
	stdlib = "/usr/lib/python3.8"
...
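
The intended lookup can be modelled in a few lines. This is only a sketch of the proposal: the PYTHON<NAME> variables are hypothetical and no released CPython reads them, and the function name proposed_paths is invented for the illustration.

```python
import os
import sysconfig


def proposed_paths(environ=os.environ):
    """Model the proposal: a PYTHON<NAME> variable overrides the sysconfig path <name>.

    The variable names are hypothetical; CPython does not read them.
    """
    paths = dict(sysconfig.get_paths())
    for name in paths:
        override = environ.get("PYTHON" + name.upper())
        if override:
            paths[name] = os.path.expanduser(override)
    return paths


paths = proposed_paths({"PYTHONPURELIB": "/tmp/myenv", "PYTHONPLATLIB": "/tmp/myenv"})
print(paths["purelib"])                                 # /tmp/myenv
print(paths["stdlib"] == sysconfig.get_path("stdlib"))  # True
```

Paths without an override keep whatever the interpreter reports today, which is what keeps the standard library shared with the base installation.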

Improvements

Firstly, it makes creating virtual environments much easier and drops the requirement of venv for cross-platform usage. Creating/running isolated environments in a cross-platform way would be dead simple. It also makes isolated environments a little bit more lightweight.

This enables us to simply and easily run subprocesses from the current executable in an isolated environment. This is something that would be really helpful in https://github.com/FFY00/python-build.

Drawbacks

Introducing this does have a drawback: you can no longer rely on a Python executable to run with a specific environment. This mainly impacts console scripts (I am not sure what else it impacts, hence the thread), but this can be fixed by just clearing the environment variables, which would actually be pretty easy to implement.

So, any thoughts? Is there anything I am missing? Does anyone have ideas on how to improve this? Please let me know

pf_moore · October 6, 2020, 8:43am

I suspect this would need to be raised on the python-ideas (and ultimately python-dev) mailing list, rather than here, as it’s more related to core functionality than to packaging.

From a packaging point of view, my immediate question is why add yet another way to do things? If I assume that you want pip install to work in these new environments, then pip will need to know about them. And other tools like pkg_resources and importlib.metadata will need updating, etc. That’s a lot of additional code, to implement something that already exists.

I don’t see why we need a whole new mechanism here. If venv has issues, can’t we just fix them?

brettcannon · October 6, 2020, 10:31pm

I agree with Paul that this isn’t a package thing and more of a Python stdlib / CPython interpreter sort of thing (although I am interested in a solution so that the Python launcher could be used to launch from within a virtual environment to avoid having to copy any files over in virtual environment construction).

For those interested in how this all works, see:

https://github.com/python/cpython/blob/044a1048ca93d466965afc027b91a5a9eb9ce23c/Lib/site.py#L477-L526

def venv(known_paths):
    global PREFIXES, ENABLE_USER_SITE

    env = os.environ
    if sys.platform == 'darwin' and '__PYVENV_LAUNCHER__' in env:
        executable = sys._base_executable = os.environ['__PYVENV_LAUNCHER__']
    else:
        executable = sys.executable
    exe_dir, _ = os.path.split(os.path.abspath(executable))
    site_prefix = os.path.dirname(exe_dir)
    sys._home = None
    conf_basename = 'pyvenv.cfg'
    candidate_confs = [
        conffile for conffile in (
            os.path.join(exe_dir, conf_basename),
            os.path.join(site_prefix, conf_basename)
            )
        if os.path.isfile(conffile)
        ]

(Excerpt truncated; see the linked file for the rest of the function.)
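
The visible effect of this pyvenv.cfg handling can be checked from Python code. A minimal sketch, using only sys attributes that exist in Python 3 (the function name is illustrative):

```python
import sys


def in_virtual_environment():
    # Inside a PEP 405 environment sys.prefix points at the environment
    # directory, while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
print(sys.prefix, sys.base_prefix)
```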

pf_moore · October 7, 2020, 8:00am

Just to clarify, I’m not against simplifying the creation of virtual environments (far from it!) but if we can simplify the process, I’d rather see the simplified approach be the one way of doing it, rather than having two ways with non-obvious trade-offs.

I’d also say that while the existing venv is complicated, the complexity was introduced over time as fixes for edge cases (Mac framework builds, Windows Store python, …) and any new isolation mechanism needs to take those into account.

FFY00 · October 7, 2020, 3:24pm

Sorry! I wasn’t clear. I am going to propose this to python-ideas but first I wanted to get feedback from the packaging group since it is something that will mainly impact packaging.

I totally understand your sentiment, and I somewhat feel the same, but I also want the issues I presented resolved. So I am a bit conflicted, but I don’t see a better solution.
The main issue here is that we are forced to use venv for reliable cross-platform usage, which I don’t think is healthy.

I will not necessarily use pip install for them. I do have some projects planned and was thinking of using pradyunsg/installer (on GitHub) once it’s ready. But if I wanted to use pip, I could too, as this should be able to achieve the same as pyvenv.cfg but without being dependent on the interpreter.

So, we would need to fix packages that generate entrypoint scripts. This is my list so far:

setuptools
pkg_resources
installer

Can you clarify what needs to be fixed in importlib.metadata?

Well, the main problem here IMO is us being dependent on venv. Other than that, there is also the complexity we need to introduce in venv, forcing it to implement alternative ways to handle these edge cases. There is also the impossibility of spinning up a virtual environment directly from sys.executable: I am forced to create a venv. Being able to do that would be really helpful in python-build.

If you are worried about adding a whole new mechanism, what if we drop the pyvenv.cfg parsing from the interpreter and instead of copying the interpreter, we make venv place a wrapper script that will parse pyvenv.cfg, set up the environment properly and call the interpreter? That seems like a pretty good idea, actually

pf_moore · October 7, 2020, 4:05pm

How can that be a problem? venv is a stdlib module and a core mechanic, so it’s absolutely OK to depend on it.

You need to explain your use case better. Is this in the context of python-build, or some other project?

I’ve no real understanding of what you mean by “run an environment”. You don’t run environments, you run interpreters. And environments are pretty tightly tied to interpreters - installed packages are potentially specific to a particular Python version and implementation (basically, look at wheel compatibility tags to get an idea of granularity). I think you probably know most of this, so I’m confused as to what you are trying to achieve here.

Can you clarify what needs to be fixed in importlib.metadata?

Honestly, I’ve no idea. I was just pointing out that there’s a lot of places that introspect Python environments, and you’ll need to make sure they work with your new mechanism. Whereas if you use venv, you can assume that most places already know how to handle venvs, so there’s nothing extra to do.

steve.dower · October 7, 2020, 6:01pm

You might want to look at the discussion around PEP 582 to make sure you’re handling the cases raised there.

I’m generally in favor of the idea, but I also understand that very few people mind the status quo enough to want to change their workflow, and so they just won’t.

That said, if you have bugs that impact using PYTHONPATH for this with a base (not venv) interpreter, please file them.
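
For reference, this is roughly what the PYTHONPATH approach looks like with a base interpreter today. It is a sketch, not full isolation: the helper name run_with_extra_path and the module hello_env are made up for the example, and the interpreter's own site-packages remains importable.

```python
import os
import subprocess
import sys
import tempfile


def run_with_extra_path(code, extra_dir):
    """Run `code` in a child interpreter with extra_dir added via PYTHONPATH."""
    env = dict(os.environ)
    env["PYTHONPATH"] = extra_dir
    # -s skips the user site-packages directory; the interpreter's own
    # site-packages is still on sys.path, so this is not full isolation.
    result = subprocess.run(
        [sys.executable, "-s", "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "hello_env.py"), "w") as f:
        f.write("GREETING = 'from the custom directory'\n")
    print(run_with_extra_path("import hello_env; print(hello_env.GREETING)", tmp))
```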

FFY00 · October 8, 2020, 12:45am

Except it is not present by default in environments like Debian: it needs to be installed separately and, unlike normal packages, can’t be installed via pip. IMHO that is really bad, but we’ve recently discussed this with the Debian maintainer and they are not willing to change. Their reasons do make sense, but it is very unfortunate.
This also means people on more exotic systems need to patch venv to work there.

There is also venv being, arguably, time-consuming.

What do you think of my proposal here?

Run an interpreter with a custom environment. And I’d argue most pure wheels are not tied to the interpreter but there are some cases where that is true.

But putting all this aside, I still think it’s a good idea and would remove some complexity from venv.

I had a look and I don’t think we have issues there.

bernatgabor · October 8, 2020, 5:08am

What makes you so confident that they won’t patch out this solution the same way they do with venv (ensurepip)? I see no guarantee of that.

@pf_moore always wanted to fix this on the pip side, to make it fast enough that it’s not time-consuming.

The part that’s patched out by Debian is installing the seed packages: pip + setuptools. Where would this virtual environment created via env vars get those? (On Debian you can do python -m venv --without-pip, which is blazing fast and works without the python3-venv module, but having an isolated Python environment without pip is just marginally useful.)

pf_moore · October 8, 2020, 7:37am

Strong +1 on this. The “problem” with isolated environments is essentially getting a viable installer in there.

One trick that pipx uses is to have a “shared” virtualenv containing pip, and then it drops a pipx_shared.pth file into any virtual environments it creates, that points to the “shared” files. Have you explored that sort of approach?
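
The mechanism behind that trick is the stdlib's .pth handling: each line of a .pth file in a site directory that names an existing directory is appended to sys.path. A minimal sketch follows; only the file name pipx_shared.pth comes from the description above, while the helper name and directory names are placeholders, and site.addsitedir stands in for what interpreter startup does with a real site-packages directory.

```python
import os
import site
import sys
import tempfile


def link_shared_packages(site_packages, shared_dir, name="pipx_shared.pth"):
    """Write a .pth file so that shared_dir is added to sys.path."""
    pth_path = os.path.join(site_packages, name)
    with open(pth_path, "w") as f:
        f.write(shared_dir + "\n")
    return pth_path


with tempfile.TemporaryDirectory() as tmp:
    site_packages = os.path.join(tmp, "env-site-packages")
    shared = os.path.join(tmp, "shared-site-packages")
    os.makedirs(site_packages)
    os.makedirs(shared)
    link_shared_packages(site_packages, shared)
    # Process the directory's .pth files, as startup does for site-packages.
    site.addsitedir(site_packages)
    print(shared in sys.path)
```

The environment itself stays empty; anything installed in the shared directory (pip, in pipx's case) becomes importable through the extra sys.path entry.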

(Again, it addresses the “creating environments is slow” and “debian don’t ship some stuff” points, but it’s not clear if it solves @FFY00’s use case, because I’m still not clear what that is…)

virtualenv or venv. I still don’t see why that’s not sufficient.

True, but that implies that “pure wheels only” is an acceptable constraint for your use case. That’s fine, but it just confirms that I don’t get what your use case is yet.

I’d need to know how you expect that wrapper script to work (specifically on Windows, where it has to be an exe - the OS treats exe files specially in ways that mean wrappers that are not exes don’t work in some edge cases).

steve.dower · October 8, 2020, 9:06am

Really? You read the whole discussion thread on this site? Note that all the arguing burned me out before I updated the PEP text. I would be very surprised if this rough idea didn’t in any way overlap with or address the existing PEP, especially since you seem to have had the same insights that led me to propose it in the first place.

There’s also an implementation floating around that someone did, though I forget the name of it now.

I want to encourage you, because I think this is absolutely the right approach. I also want to discourage you, because you’ll be “threatening” people’s workflows, and they’ll get very defensive about it. Good luck, but be prepared

takluyver · October 8, 2020, 10:41am

I’m not sure I’ve fully understood the implications, but does this mean that specifying a path/to/env/bin/python would no longer be enough to run Python properly inside that environment? That’s a guarantee that I’ve absolutely relied on for all kinds of things.

In many cases, what I want is to start a Python subprocess in the same environment as the current one, using sys.executable. That would presumably still work in most cases, because environment variables are inherited by default. But there are also cases that don’t fit that pattern:

Telling users to use an explicit path to Python to ensure code runs in a given environment, e.g. recommending path/to/python -m pip.
Recording the value of sys.executable somewhere to be run not as a child of the current process (e.g. in Jupyter kernelspecs)
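
The first pattern, a child started from sys.executable landing in the same environment, can be checked with a short sketch like this one (the helper name child_prefix is made up):

```python
import subprocess
import sys


def child_prefix(executable=sys.executable):
    """Return sys.prefix as reported by a child started from `executable`."""
    result = subprocess.run(
        [executable, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# With today's pyvenv.cfg mechanism the child is expected to report the
# same prefix as the parent, because the path to the executable decides it.
print(child_prefix() == sys.prefix)
```

The two cases listed above differ in that the recorded path is used later, from a process that does not inherit the current environment variables.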

pf_moore · October 8, 2020, 11:03am

You can see this happening here already. I think there’s a lot to be said for approaches like this, but backward compatibility and “not breaking people’s workflows” is a big deal. So being very clear on how a proposal interacts with the “current approach”, as well as being very clear about what issues the proposal solves, is really important. Hence my insistence on understanding what the use case is here.

FFY00 · October 8, 2020, 12:10pm

I think I am gonna drop this as I think my use case is not compelling enough for this. There are workarounds that could be used to achieve what I originally wanted. I thought that + simplifying venv was enough, but it seems it isn’t.

If anyone thinks simplifying venv is a compelling enough point on its own, please let me know, I would be happy to continue pursuing this.

Anyways, I’d like to thank everyone for their feedback.

brettcannon · October 8, 2020, 5:56pm

I believe you’re thinking of Single file implementation of PEP582 from @kushaldas.

I think people do, but it’s a matter of how to make it very much an opt-in thing so it won’t break preexisting workflows. I mean I can’t even get the virtual environment that make venv creates in CPython’s documentation directory renamed out of people’s fear it will break someone’s workflow.

Steve isn’t exaggerating: people are extremely protective of their workflows, almost like a superstition or ritual before playing a sport, where if someone’s workflow changes they will code worse or something. Some of the angriest responses we have received for the Python extension for VS Code haven’t been from things breaking, but from a change that forced folks to do two clicks when setting up their environment for the first time.

FFY00 · October 8, 2020, 6:42pm

I think my proposal to Paul would work without breaking anything, because it would keep the outside behavior the same.

So I will reiterate the proposal with it described.

The idea would be to introduce an environment variable for each sysconfig path. So:

PYTHONDATA (data)
PYTHONINCLUDE (include)
PYTHONPLATLIB (platlib)
PYTHONPLATSTDLIB (platstdlib)
PYTHONPURELIB (purelib)
PYTHONSCRIPTS (scripts)
PYTHONSTDLIB (stdlib)

(another naming scheme could be proposed, but this was what I was thinking of)

We now have a problem of a Python interpreter not guaranteeing that it will run in a certain environment, because that depends on the environment variables.

To solve this I propose that we remove the pyvenv.cfg parsing from the interpreter and change venv so that instead of copying/symlinking the Python interpreter to the environment, it puts a wrapper in its place. The wrapper would unset all environment variables, search for pyvenv.cfg and parse it. This would maintain backward compatibility, the public behavior would be the same.
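
A rough sketch of what such a wrapper would have to do, written in Python for readability. Everything here is an assumption layered on the proposal: the variable names are the hypothetical ones listed above, the function names are invented, and how the wrapper learns the environment's site-packages directory is left open.

```python
import os

# Hypothetical variable names from the proposal; no released CPython reads them.
PROPOSED_VARS = (
    "PYTHONDATA", "PYTHONINCLUDE", "PYTHONPLATLIB", "PYTHONPLATSTDLIB",
    "PYTHONPURELIB", "PYTHONSCRIPTS", "PYTHONSTDLIB",
)


def find_pyvenv_cfg(wrapper_path):
    """Look next to the wrapper, then one directory up, as PEP 405 describes."""
    exe_dir = os.path.dirname(os.path.abspath(wrapper_path))
    for directory in (exe_dir, os.path.dirname(exe_dir)):
        candidate = os.path.join(directory, "pyvenv.cfg")
        if os.path.isfile(candidate):
            return candidate
    return None


def wrapper_environment(environ, env_site_packages):
    """Drop inherited overrides, then point the interpreter at this environment."""
    env = {k: v for k, v in environ.items() if k not in PROPOSED_VARS}
    env["PYTHONPURELIB"] = env_site_packages
    env["PYTHONPLATLIB"] = env_site_packages
    return env


clean = wrapper_environment(
    {"PYTHONDATA": "/elsewhere", "PATH": "/usr/bin"}, "/env/site-packages"
)
print(sorted(clean))  # ['PATH', 'PYTHONPLATLIB', 'PYTHONPURELIB']
```

A real wrapper would finish by handing control to the base interpreter with the cleaned environment (for example via os.execve on POSIX) and, as discussed in this thread, would need to be a native executable on Windows.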

takluyver:

I’m not sure I’ve fully understood the implications, but does this mean that specifying a path/to/env/bin/python would no longer be enough to run Python properly inside that environment? That’s a guarantee that I’ve absolutely relied on for all kinds of things.

In many cases, what I want is to start a Python subprocess in the same environment as the current one, using sys.executable. That would presumably still work in most cases, because environment variables are inherited by default. But there are also cases that don’t fit that pattern:

Telling users to use an explicit path to Python to ensure code runs in a given environment, e.g. recommending path/to/python -m pip.

Recording the value of sys.executable somewhere to be run not as a child of the current process (e.g. in Jupyter kernelspecs)

With my proposal above, we reinstate the original behavior. /env/bin/python would be a wrapper, responsible for clearing the environment variables and making sure it runs under the /env environment.

That is fine, we can have a binary that does the environment cleanup and pyvenv.cfg parsing.

Would this proposal be good enough? Am I missing anything that could lead to breakage?

AFAIK all public behavior would still be the same, please let me know if that is not true.

bernatgabor · October 8, 2020, 6:59pm

The part I’m missing is how this environment would get pip.

pf_moore · October 8, 2020, 7:14pm

I agree. Pip is full of special cases and weird features that seem to have no reason for existing except that they (probably) support some one guy’s workflow somewhere. But if we removed them, you’d hear the screams on the moon

The problem for me with this is that superficially, it seems like it would add further cases for pip to cover. Maybe these could be handled in a way that minimised the impact (I assume the environment variables would affect the information returned from sysconfig, for example) but that’s something that needs to be confirmed rather than assumed.

The biggest problem pip has is the combinatorial explosion of interactions among the various options that can be specified. How would these variables interact with --user, --root, --prefix, or --target? If the user specifies PYTHONSCRIPTS, how would pip’s warning that “the place scripts are installed to is not on your PATH” need to change?

There’s a lot to think through here. It may work, but IMO that’s quite an optimistic assumption. I definitely don’t have the time to work through all the possible ways that might affect users, so I’m going to just suggest a couple of possible issues as strawman examples of “things you should think about”.

There is lots of code in the wild that works out where scripts live by doing “env base plus bin for Unix and Scripts for Windows”. That’s hard coded all over the place. I think the existence of PYTHONSCRIPTS makes that code no longer valid (it would need to check the environment variable as well).
Pip has code to locate the include directory, because sysconfig didn’t expose that location. That code now needs to change.
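
The first of these two examples can be made concrete. The sketch below contrasts the hard-coded convention with asking sysconfig; the function names are illustrative, and whether sysconfig would reflect a variable like PYTHONSCRIPTS is part of what the proposal would have to define.

```python
import os
import sys
import sysconfig


def guessed_scripts_dir(prefix=sys.prefix):
    # The hard-coded convention: "bin" under the environment on Unix,
    # "Scripts" on Windows.
    return os.path.join(prefix, "Scripts" if os.name == "nt" else "bin")


def reported_scripts_dir():
    # Ask the interpreter where scripts go instead of guessing.
    return sysconfig.get_path("scripts")


print("guessed: ", guessed_scripts_dir())
print("reported:", reported_scripts_dir())
```

The two values often agree, but nothing guarantees it, and code using the first form would not see any override that only sysconfig knows about.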

There’s also the fact that anyone who (for whatever reason) has an environment variable called PYTHONDATA or one of the other values, will have broken their system Python.

How does that interact with the launcher for Windows? It sounds like every time someone runs a Python script, the launcher will run (via the file association), which launches the wrapper in the active venv, which in turn runs the core interpreter. Or would your proposed wrapper functionality be incorporated into the launcher?

I’ve given some examples above. Also, while I have no problem with helping you find potential issues, the responsibility is ultimately going to be on you (assuming this proposal gets as far as being written up as a PEP) to do the bulk of the research, I’m afraid.

Sorry, that’s a bit of a disorganised brain-dump. I don’t have time at the moment to organise it better, I’m afraid.

pf_moore · October 8, 2020, 7:34pm

I’m not sure that’s relevant to the use case here (of course, I don’t really know what the use case is, as we now seem to be discussing a more general “simplifying venv because that’s a good thing in itself” proposal) but either just a normal ensurepip/get-pip.py, or a .pth file to a shared pip installation, seem like reasonable approaches.

vsajip · October 8, 2020, 8:18pm

What about existing code that manages venvs and e.g. updates pyvenv.cfg, parses it to determine e.g. base python location, prompt to use for the venv and so on?

What is complicated about it, exactly? What exactly are the pain points? Remember, venv creation is pretty fast apart from the need to install pip. If you install scripts into a venv, what would their shebang lines point to?

Seems like it could be a big, potentially breaking change - to build confidence that it might work, it would make sense for the proposer to build a prototype / proof of concept, as we did when implementing PEP 405. Such a prototype could perhaps be tested in many scenarios more readily than having to think about all potential edge cases in advance.