Interpreter independent isolated/virtual environments

steve.dower · October 9, 2020, 12:51am

Yes, we in fact already have exactly this since I replaced the symlink/deep copy on Windows with the redirector we have since 3.7.

There was an earlier one. I think it was pythonloc? @cs01 did it (he also did pipx, which is what I was stuck on this morning while also knowing it was the wrong one )

It’s mostly good enough most of the time, but it’s already not perfect. But the only issues you’re likely to hit are people relying on environment variables directly (including PATH!) instead of using the sys or sysconfig modules.

pf_moore · October 9, 2020, 7:35am

But I think this proposal would end up with an extra layer, wouldn’t it? Anyway, I acknowledge that trying to argue “processes are expensive on Windows” is probably a lost cause these days, but it does add complexity (trying to work out which python process is your actual stuck application can be frustratingly difficult, for example).

steve.dower · October 9, 2020, 9:04am

They’re definitely expensive, but unavoidable if we want to preserve the semantics of how venvs work/are used. All the attempts to change this in a way that might help performance and reliability have failed in favour of maintaining existing workflows, so yeah, the performance argument is pointless (though likely not for the reason you thought )

FWIW, you can still symlink venvs on Windows instead of using the redirector, but it has some additional rough edges because of how/when Windows will resolve symlinks.

pf_moore · October 9, 2020, 9:14am

That might be an argument for @FFY00’s proposal - rather than changing existing workflows, this is a new approach that offers higher performance/reliability (at the cost of not necessarily supporting all existing workflows). That may actually be what he was originally suggesting, and I just didn’t get it. But the latest proposal seems to be about modifying venv, so maybe not.

But from experience with PEP 582, it’s likely to be really hard to distinguish the two mechanisms cleanly. It’s a “there should be one obvious way” type of problem - Python packaging is full of multiple ways to do things with no “obvious approach”. Maybe we need to recruit someone Dutch

FFY00 · October 9, 2020, 3:36pm

I don’t see why ensurepip or any of the other ways wouldn’t work.

--user would still work the same, this approach would have the same as venv or virtualnv would. It wouldn’t impact root, --root since it just prefixes the end directory. --prefix would work as expected as long as the environment you are installing to follows the normal directory structure, I think it’s fair to say “if you are using a custom directory structure, pip will not be able to install there when using --prefix”, you’d want to use --target in this case. --target would also remain the same, AFAIK it has no impact on this, it is a way pip provides to install directly to a custom directory.

That is unfortunate. I’d say that this is maybe the time to get those people to move to sysconfig.get_path('scripts'). I will add this to the potential breakage list.

Then pip should probably move to sysconfig.get_path('scripts')

Yes, that would be in the launcher.

Thank you, and no worries I really appreciate the help so far.

Would be in the wrapper.

Then that gives me more confidence. Could you tell what mechanism you used to achieve this?

So the original proposal for my use-case was that I wanted to introduce a mechanism to run interpreters with a custom environment that do not actually depend on the interpreter itself. Right now if I want to run an environment I am forced to use venv and have it create a “special” python (at myenv/bin/python) for me to use. I find this a bit wasteful in processing and time when I just want to say “run python with site-packages at myfolder”. It also originates issues in tools that use sys.executable to spawn subprocesses (eg. Allow calling hooks from a custom Python interpreter · Issue #92 · pypa/pyproject-hooks · GitHub). Some other use-case that gets more complicated with this is if I want to run a custom environment with another executable from Python, I need to invoke target-python -m venv --without-pip to create the environment and then only then I can run it. I am unhappy that I cannot create virtual environments myself and need to use venv. Ultimately, given the amount of scrutiny this was receiving, I thought this was not strong enough to survive so I decided to drop it and focus solely on the point of improving venv.

I think the proposal enables more performant virtual environments, and enables reliability, not in any o the main platforms, but in more exotic environments that may not be getting the same attention. Right now venv behaves differently on some platforms/scenarios, the goal was to solve that problem by adopting an approach that drops any dependency on the Python interpreter binary itself and shifts those mechanisms to the outside, making things more portable.

Anyway, I have mixed feelings because agree with you that we should strive to not break anyone’s use-case, but I also think things can, and should, be improved. Unfortunately Python is too big that every time we change something minor we get people complaining

bernatgabor · October 9, 2020, 3:54pm

But ensurepip would not be available just as much on Debian as it’s today with venv. So the new way and your proposed way will have the exact same issue on Debian. The sole difference would be your proposal would be operated via env-vars while currently it’s a pyenv.cfg plus a symlink/copy to the python interpreter.

FFY00 · October 9, 2020, 3:58pm

Yes, we can forget the Debian point, I didn’t realize --without-pip still worked.

pf_moore · October 9, 2020, 8:33pm

I’ll have to take your word on this for now. At some point I’d want to see at least a proof of concept, because I don’t share your optimism, to be honest - but it’s pointless worrying about it in the abstract. Let’s park this question for now.

I do this quite often in Powershell scripts, so sysconfig isn’t available (the performance cost of starting up Python to query sysconfig isn’t always acceptable, either). I agree that at some point we have to stop blocking all progress because of “somebody’s workflow”. However, I think that the script directory is a legitimate special case, as it holds executables and those need to be located from the shell, not just from Python.

Maybe it’s enough to just have a stdlib base and an “environment” base, with a standard layout within the base? Is having variables for every component an over-generalisation?

pf_moore · October 9, 2020, 8:58pm

A Different Approach

I’m wondering whether the bigger problem here is that the idea of an “environment” is far less well-defined than we assume. After all, what’s available to import is affected by a multitude of things, but what pip can install to is constrained very differently. And what can be introspected via tools like pkg_resources and importlib.metadata is different again.

So maybe we should go to the root of the problem, and properly define an “environment” object, that captures what it actually means to be a Python environment (virtual or not).

For me, in an ideal world,

An environment is a set of Python packages that can be queried and managed as a unit by Python packaging tools, and can be added to or removed from sys.path as a unit.
An environment contains additional files as well as Python packages - executables and header files are common, but other files like documentation and data files (sysconfig.get_path('data')) can be present.
Non-Python tools may need to be able to locate those additional files.
The stdlib, sitelib and usersite are standard environments available by default in a Python interpreter.
Python satisfies imports from a group of environments - which is essentially defined by what’s in sys.path.

Environments would need to be exposed as both Python objects (for manipulation in code) and as some sort of external representation, so that non-Python tools like shell scripts can handle them. An environment would be all about a set of packages, and wouldn’t be tied to a specific interpreter. (A virtual environment is basically an environment in the sense I’m defining it, linked with a particular interpreter).

The Python runtime would need a way to say what environments would be available on startup - that mechanism could include a way to say to omit the built in environments, as well as a way to add extra environments.

Pip and other installers could work in terms of installing into an environment, existing mechanisms like --user and --target would become specialised (and eventually deprecated and removed) ways of handling particular types of environment.

Mechanisms like .pth files would be a way to “link” environments. They become part of an “environment API” rather than a special case.

This is a much bigger change than you’re proposing, but the intention is that it’s a unifying mechanism. It would definitely require a PEP, and would need a lot of attention to backward compatibility and migration, but by approaching the problem as a “clean up of interpreter mechanisms” type of change I think there would be a better chance of gaining support, rather than it looking like “rethink number 27 of virtual environments”.

I’ve thought that something like this would be useful for quite some time now (mostly in the context of unifying the idea of “places pip can install into”) but never really fleshed it out. And I definitely don’t have the time to take it forward myself (although I’d love to be part of any group that did). So please, take it as a possibility and nothing more.

brettcannon · October 9, 2020, 9:03pm

Speaking from the Python Launcher for UNIX side (which is waiting on me having time to have add implicit venv recognition/usage), it’s actually both. If this is done with a shell script then it’s a matter of naming it appropriately so the Launcher picks up on it and can run it (i.e. when VIRTUAL_ENV is set).

But there also isn’t anything stopping the Launcher from parsing the pyvenv.cfg file directly and completely forgoing the proposed shell script.

To me, the way I’m reading this proposal is having a way to specify the various paths Python uses via environment variables which seems like a potentially reasonable thing (although how does this interact with -I?). Then the appropriate code in site would read those environment variables, if set, instead of using pyvenv.cfg. And if we wanted we could make the executable placed in bin/Scripts set those environment variables instead of having site read pyvenv.cfg, and potentially even deprecate the reading of pyvenv.cfg from site if we take this that far (but once again, -I might make that level of deprecation not feasible).

github.com

python/cpython/blob/9975cc5008c795e069ce11e2dbed2110cc12e74e/Lib/site.py#L513-L524


      
          sys.prefix = sys.exec_prefix = site_prefix
          
          # Doing this here ensures venv takes precedence over user-site
          addsitepackages(known_paths, [sys.prefix])
          
          # addsitepackages will process site_prefix again if its in PREFIXES,
          # but that's ok; known_paths will prevent anything being added twice
          if system_site == "true":
              PREFIXES.insert(0, sys.prefix)
          else:
              PREFIXES = [sys.prefix]
              ENABLE_USER_SITE = False

steve.dower · October 10, 2020, 12:15am

You’ve basically just described sys.path (which is great! Because I agree with you, AND it’s already implemented and works well ), and all that’s missing is for the packaging tools to respect it rather than relying on sysconfig

I know it’s more complex than that last point, but that’s essentially all that all of these proposals have been about. The “environment” is what sys.path starts as, and sysconfig provides a queryable guess as to what it will look like, since the actual logic is not in that module (it’s in getpath.c/getpathp.c).

The logic to select a venv sys.path vs a non-venv sys.path is based on argv[0] (just another way that CPython has made itself difficult to host…), but other than that there’s nothing special about a venv at all: it’s just an alternate initial sys.path.

The “magic” comes from the packaging tools, primarily pip. IIUC, pip uses some hybrid of sysconfig/distutils.sysconfig/sys.executable/pip.ini/argv to determine which of the entries in sys.path it should write to. And over time this has gotten very complicated, but it’s still the primary workflow driver in this whole thing. What pip does is what users will do, because Python doesn’t care as long as sys.path is useful.

Unfortunately, what nobody but CPython can do is change how the default sys.path is calculated. Right now the options are few, and are also convoluted. I was hoping that the new initialisation API would straighten it out, but we rushed into the current one instead of designing something that would be really useful right about now

So we’re kind of stuck with:

pip (et al.) needs to design/support any new environment structure
CPython needs to provide enough support for the new env to be specified
CPython (rightly) won’t commit to an unproven design
pip (et al.) (legitimately) can’t experiment without CPython changes

Plus, the more I talk with people using these tools in various ways, the more convinced I become that “the devil you know” is preferable to forcing change. This is why I backed off from fixing the PATH issues on Windows by renaming “py.exe” to “python.exe” - the issues would still be there, but different, undocumented, harder to diagnose, and harder to fix. Whereas the current issues are annoying, but at least are well understood and easy to find info about them. Virtual environments are similar, IMHO.

pf_moore · October 10, 2020, 9:16am

You’re mostly correct but a sys.path entry is more like a “directory” (ignoring the path hook mechanisms, which I’m allowed to do as I invented them ) whereas an environment is a bit more than that (include, data and script directories). Much of the historical complexity in packaging tools comes from the need to infer locations for those other bits from a single directory.

So you could argue that my “environment” abstraction is a “directory with metadata saying where to put associated files” and not be too far wrong. And with importlib.metadata, we’re starting to get mechanisms to fit that sort of information back onto path items.

Yes, this is the crux of the issue. If it weren’t for this one point, we could experiment with everything in this discussion in userspace, with no need to involve core CPython until there’s a fully proven proposal (and maybe not even then).

pradyunsg · October 10, 2020, 11:26am

Is there any example other than Debian here?

edit: to elaborate a bit, the primary motivation here seems to be wanting to work around the breaking up of Python that Debian does on principle that Debian users don’t need “unnecessary stuff included”. I do agree that a change is needed here, but I really think the place where the change needs to be made is not Python packaging tools or Python itself either.

FFY00 · October 10, 2020, 2:55pm

pf_moore:

An environment is a set of Python packages that can be queried and managed as a unit by Python packaging tools, and can be added to or removed from sys.path as a unit.

An environment contains additional files as well as Python packages - executables and header files are common, but other files like documentation and data files ( sysconfig.get_path('data') ) can be present.

Non-Python tools may need to be able to locate those additional files.

The stdlib, sitelib and usersite are standard environments available by default in a Python interpreter.

Python satisfies imports from a group of environments - which is essentially defined by what’s in sys.path .

Environments would need to be exposed as both Python objects (for manipulation in code) and as some sort of external representation, so that non-Python tools like shell scripts can handle them. An environment would be all about a set of packages, and wouldn’t be tied to a specific interpreter. (A virtual environment is basically an environment in the sense I’m defining it, linked with a particular interpreter).

The Python runtime would need a way to say what environments would be available on startup - that mechanism could include a way to say to omit the built in environments, as well as a way to add extra environments.

Okay, I like this. If are gonna possibly break people’s workflows we could take that opportunity to introduce major improvements here.

Your proposal here actually does tie in pretty well with some projects I was planning. For eg. I was already planning a project to abstract Python environment into Python objects and provide an easy to use, simple API to manage them in code.

(beware: very opinionated and big proposal here)

If we think on a high level, what we are managing boils down to specific versions of packages/modules.

How would you feel about possibly introducing a shared package directory (eg. /usr/lib/python3.8/packages) which could hold different versions of multiple packages.

$ ls /usr/lib/python3.8/packages
pip-20.1.1
pip-20.2.3
build-0.0.1
build-0.0.2
build-0.0.3
build-0.0.3.1
build-0.0.4
packaging-20.4
toml-0.10.1

And have environments just being a list of packages, making it easily modified in Python code and by external tools.

$ cat myenv.json
{
    "pip": "20.2.3",
    "build": "0.0.3.1",
    "packaging": "20.4",
    "toml": "0.10.1"
}

$ PYTHONENVIRONMENT=myenv.json python
>>> import build
>>> build.__version__
0.0.3.1
>>> import setuptools
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'setuptools'

This isn’t as simple as just injecting something to sys.path, although it could be implemented that way. There are a lot of things to figure out here, and I am not gonna annoy anyone by going into details now.

What I’d like to know if this would be a possible direction you could see things moving forward. It is a very different approach to what we do currently, but I think could simplify things.

@FFY00 runs away before people start throwing stones…

Debian is not the primary point here, you can ignore it

pf_moore · October 10, 2020, 3:56pm

Superficially, I’m a pretty strong -1. Setuptools in the past tried to allow multiple versions to co-exist peacefully, and it was never very popular, had a number of issues (I believe) and is generally considered a failed experiment. I’d want to see strong evidence that any new proposal along those lines had learned from the setuptools experience, or had a very clear parallel in other languages so that we could learn from them how they had addressed the issues, before being comfortable with such a proposal.

I’m also worried about any proposal that has a “shared directory” which only describes the proposal in terms of Unix. Where would such a directory go in Windows? How would it interact with the multiple versions of Python (user install, system install, store distribution, nuget package)? You seemed up to this point to be pushing to make all the various directories configurable, but now you’re talking about a new directory altogether?

I think “having multiple versions co-exist” is a different topic, and should be posted separately if you want to pursue it. Let’s not have this thread end up with so many things being discussed that no-one can follow it - it’s hard enough already…

I’m happy to bounce ideas around. But they will need fleshing out at some point, and they need to integrate with the Python ecosystem. They also need use cases to justify them - as @steve.dower said, new features have to be better than “the devil you know” at solving real world problems, or they just change the issues people have to fight with, rather than fixing them…

FFY00 · October 10, 2020, 6:50pm

That is fair.

I believe setuptools’s approach was significantly different from this one. The approach that setuptools took, unless I am mistaken, required the users to request specific versions at runtime. Projects needed to explicitly support this, they needed to modify their code to use the custom import method. There are many issues with this, but the main ones for me are adoption inertia, if nobody starts adopting this other people would probably also not adopt and it is required a big force to get people adopting in the first place, the 2nd one is that I think it would actually complicate things more as different packages start requiring different versions at runtime.

My proposal, as opposed to this, does not require any modifications to the source code and does not have the runtime dependency problem. It just adds a way to select what is available in the import statement. All this makes it so that it has a 0-effort adoption.

Anyway, I can work on this. This was something that I wanted to do anyway, just maybe not exactly in this format. If anyone wants to join me, please let me know!

The directory could go on the same folder as site-packages.

Wouldn’t really matter much. Any Python could run an environment, the only thing is that the there should be a compatible version of the requested packages. On pure Python, everything would be compatible, on binary packages, there should be a version compatible with the interpreter you are using.

Yes, because I thought it fit well your vision. I was planning a project that did this, but having this as a native option in the interpreter provides better UX, so I just threw it out in the air.

Sure. I can bring this up once I have a PoC.

This would have the same benefits as the original proposal. But we still have to get a PoC working so that we can evaluate if it has any unexpected drawbacks.

pf_moore · October 10, 2020, 7:35pm

Thanks. I think I understand your suggestion better now. I still have reservations (for example, I’ve no idea how installers would need to change to handle this) but feel free to try the idea out and we can see how well it works in practice.

As has been already mentioned, most of this can be done without needing changes to core Python - the only hard bit is setting up sys.path but for a prototype I don’t see anything wrong with having a chunk of boilerplate at the top of the script that sets up sys.path manually. That should be good enough to test out the ideas.

FFY00 · October 10, 2020, 9:03pm

I also have my reservations, but I am optimistic

I’d imaged they could put the dependencies in the dependency pool and just install a .pth file in site-packages. We could also fully embrace the new environments by making site-packages an environment (site-packages.json) and letting it be the default. But that comes later.

Yeah, I was thinking of have a simple wrapper that does all the parsing and path setup, acting like the proposed interpreter.

I’ll probably hold on until we have GitHub - pradyunsg/installer: A fork for making that other thing better. usable (not necessarily stable).

steve.dower · October 10, 2020, 11:25pm

I think it’s an interesting idea. Certainly an import hook could do it, and you could assemble the packages into the format you want by hand at first.

I’m not entirely clear which problem it’s solving, though I can certainly think of a few. But I’m hard pressed to come up with any that are bad enough now to justify revising the entire approach.

Also, a second order effect of this kind of approach that scares me is packages caring less about breaking compatibility with their previous versions. I really want the ecosystem to get better at this as a whole, and I think encouraging packages by making it easier/obvious/cheap/official to just keep multiple versions floating around would work against this.

westurner · October 11, 2020, 5:46am

pip and virtualenv (and virtualenvwrapper) predate the __pycache__ directory that was added in order to solve for .pyc/.pyo collisions when sharing the same source directory between versions of the Python interpreter https://www.python.org/dev/peps/pep-3147/
a subprocess to launch a different version of the interpreter (according to a config file that must be read before any interpretor invocation) is suboptimal, IMHO. .pth files are also a dirty hack.
pip caches (all downloaded versions of all) packages in the pip cache directory
debian removing venv is annoying and unnecessary. Blocking sudo pip install would be more reasonable.
Actual package managers set appropriate permissions; such that the app cannot overwrite itself or its dependencies.
I suspect that package-install-time compilation into a shared site-packages directory will be problematic. TIL that the --relocatable flag in virtualenv has been removed. Changing the shebangs was never a fully-sufficient solution anyway due to paths getting compiled into things. Wheels may fix that now?

I don’t have need for this: having tox create appropriately-isolated envs in order to run tests with different interpreter versions (and pypy) is all I ever need to do. But nonetheless I wish you luck with this endeavor.