PEP 704 - Require virtual environments by default for package installers

The discussion has been focused mainly on pip, but the PEP talks about “installers” in general. Does your view change if I ask you whether you would be happy implementing this functionality in python -m installer <path_to_wheel>? That’s not a troll, it’s a genuine question - we really do have multiple installers these days, and it’s a good sanity check on whether a proposal is saying “pip should do X” or “installers should do X”.

No, for the same reason that it won’t implement PEP 668’s suggested behaviours for “Python-specific package installer (that is, a tool such as pip - not an external tool such as apt)”.

The key difference is that it’s not meant to be a user-facing tool like pip, mousebender, etc. are intended to be. It is primarily a shared implementation of unpacking wheels (the hope is that it’ll gain parity with pip and that pip would switch to it), and the CLI is meant for “low level” use (i.e. setting up your Linux distro’s lowest layers before you have pip).

I share most of the concerns brought up above, particularly:

At the end of the day, if import foo.bar resolves to something, then foo is “installed”, independent of how the import mechanism discovered it, how those files (if any!) ended up on disk, or whether there are any metadata files floating around. While pip and venv are obviously very privileged implementations of how to install packages and isolate things given their relation to core Python, they are far from the only, and in many cases not the best, solutions to these problems. Please be careful that these privileged tools do not become overly indexed for the kinds of use cases that their maintainers happen to have.


Related, this proposal seems to take as given that in-tree virtual envs are the best (or at least sufficiently consensus best) option and should be suggested as the “standard”.

This pattern prevents having multiple envs with different versions of Python/dependencies/both for the same project because it picks a privileged name / location. Further, it makes it very awkward (particularly coupled with auto-activate / discovery based on cwd) to work on multiple projects that interact with each other (e.g. multiple libraries that depend on each other or a library and an application that uses it).

I do not think these are “fringe” or “advanced” use cases. In the sciences a very common pattern, and one that is being actively encouraged, is to put the reusable parts of the analysis into a library and have a separate “application” which uses it for the (embargoed) science analysis. In my now ~15 years of using Python, I am not sure I was ever in a situation where in-tree was a suitable, let alone ideal, solution.

That in-tree venvs are encouraged by some tools has also led to pain in downstream projects when users have checked the whole venv in, requiring re-writing history (see Minor re-writing of history and moving from master to main for default branch - Development - Matplotlib), and any search for a function from Matplotlib on GitHub to estimate actual usage is completely spoiled by the many (many) people who have committed the whole venv to their hello-world project.


A better solution is for the pip that conda ships to simply patch out this behavior (or at least change the default)

6 Likes

My concern with this is that people would update pip with pip, in a Conda environment, following the instructions that pip prints.

3 Likes

Can you clarify this bit? If the shared code is in a library, then aren’t you installing it for your application? And if so, can’t you install it for each application? Is the convenience of not installing it per application what you are suggesting by having it all in a single environment?

In my 20 years I have constantly found it suitable, and from a tooling perspective, ideal. :wink: But I’m going to start up a separate topic to see if we can’t find some solution that works for more use cases (including yours).

2 Likes

I completely believe you, but that is my whole point! There are a lot of valid ways to solve this particular problem and the differences between them likely skew into subjective preference. If I wrote this PEP, the message would look like:

Please create a venv! Run the command python -m venv ~/.virtualenvs/sys310 && . ~/.virtualenvs/sys310/bin/activate and then re-run the command

(with templating based on the Python version) and would expect to be argued with because that would be overly privileging my (possibly narrow) view of how this should be done.

I’ll reply to this in the new topic.

2 Likes

The analogy that comes to mind here is the, in principle, very helpful message GitHub gives the user when they try to push to a branch which has diverged from their local branch, suggesting they pull the remote branch and then push again. In many cases this is the right thing to do; however, on Matplotlib, when there are merge conflicts between a feature branch and upstream main, we prefer rebasing over merging upstream main into the feature branch (please accept this position, and that we do not squash merge, as a given to avoid also spawning a discussion of the “right” git workflow :wink: ).

This means that the first time many contributors have to do a rebase, they (reasonably) follow the instructions git gave them and end up with a whole bunch of extra commits, and then we have to walk them through undoing it (we finally wrote it up in the docs).

If this PEP does go in I fear the amount of “yes, pip does say that and in some cases it is right, but …” discussions that will have to be had.

1 Like

I’m generally someone who avoids putting my virtual environments into my project directory, and that’s honestly because doing so ends up having a lot of bad behaviors by default. Lots of tools recurse from the current directory, and when you stick your virtual environment in that current directory, it means it’s going to recurse into that as well.

Now almost all of those tools offer some mechanism to fix it, typically by ignoring that directory, but that ends up requiring me to configure each of those tools independently, and oftentimes per project, so when I switch to another project, the bad behavior comes back.

That being said, presumably if we standardized on something in tree, eventually tools would ignore those paths by default, and the biggest pain point of them goes away.

Though it would likely be ideal if we could pick something that supported multiple interpreters, because I suspect a non-trivial number of people have reason to have multiple environments per project, and .venv doesn’t enable that.

4 Likes

I share some of Thomas’ concerns (although for many projects of mine an in-tree virtualenv would work fine).

The solution I use these days is to have a .venv file that contains the path to a virtualenv that lives somewhere else. It works pretty well and is very flexible, e.g. for switching between multiple interpreters on the same project or re-using the same environment for multiple projects.

This is easy to build tooling off of, e.g. I use a shell plugin that checks this to automatically activate environments when I change cwd. And maybe Brett could teach this trick to VSCode :slight_smile:
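
For anyone wanting to build on the same convention, here is a rough sketch (purely illustrative; the file format and names are this workflow’s, not any standard) of how a tool might resolve such a .venv marker, whether it is an in-tree environment or a file pointing elsewhere:

```python
# Sketch of resolving the ".venv marker" convention described above:
# .venv is either an in-tree environment directory or a plain-text file
# whose contents are the path to an environment stored elsewhere.
import sys
from pathlib import Path


def resolve_venv(project_dir: str) -> Path | None:
    marker = Path(project_dir) / ".venv"
    if marker.is_dir():
        return marker  # ordinary in-tree virtual environment
    if marker.is_file():
        target = Path(marker.read_text().strip()).expanduser()
        if (target / "pyvenv.cfg").is_file():  # looks like a real environment
            return target
    return None


def interpreter_for(env: Path) -> Path:
    # POSIX environments keep the interpreter in bin/, Windows in Scripts/.
    subdir, exe = ("Scripts", "python.exe") if sys.platform == "win32" else ("bin", "python")
    return env / subdir / exe
```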

1 Like

Personally, I’ve been using .venv as a symlink to the actual virtual environment for some time now to be able to switch between multiple venvs for a project. Some things expect .venv to be a folder rather than a file, and symlinks work great with that. I still put all the venvs in-tree within a .venvs folder (which has caused me problems a bit more often than .venv does, as not as many tools ignore it by default) because I like to keep everything in the project folder, but there’s nothing that would prevent this from working with a global directory for all venvs. Sadly, symlinks aren’t universally supported by all file systems, so it’s not really a viable option as a standard. Still, using a .venv file would probably cause more problems with various tooling than a painless symlink.

3 Likes

I’m open to ideas for better conveying this in the PEP, but here’s my take on this: An in-tree .venv is a reasonable default. We’re not locking people out of their workflows by deciding on a default.


I want to go a bit further though: does anyone think it’s not reasonable to say “An active virtual environment is required. If you don’t have one, you can create it by …”?

While it clearly biases toward one approach (the one suggested), it certainly doesn’t lock people out of other approaches for managing virtual environments. It also provides a clear “here’s how to get to a free-of-paper-cuts setup” approach.

Yes, I understand and appreciate that there are workflows where the proposed approach isn’t sufficient. However, no one is talking about preventing people from creating virtual environments in other locations.

“We can’t do something that works for every known workflow, therefore we shouldn’t pick a default” is a bad approach.

My take on multiple interpreters is that you should be using an environment management tool at that point, e.g. nox and friends, to do that work.
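
For readers who have not used such tools: a minimal noxfile.py along these lines (illustrative only, not taken from the PEP) is what handling multiple interpreters per project looks like without a single privileged .venv:

```python
# noxfile.py -- nox creates and reuses one environment per session/interpreter,
# so the project itself does not need a single privileged .venv.
import nox


@nox.session(python=["3.10", "3.11", "3.12"])
def tests(session):
    session.install("pytest")
    session.install("-e", ".")  # install the project into the session's venv
    session.run("pytest")
```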


I’ll update the PEP to cover Conda, multiple environments per project and centralised storage of environments; but that’s not gonna happen until tomorrow.

If tools like venv and virtualenv supported that usage, that would be a reasonable possibility (although it still doesn’t solve the “back link” issue for identifying where the venv is being used). But manually creating a virtual environment in a shared location and then creating a symlink is a lot less convenient than python -m venv .venv. Particularly as I very rarely use symlinks, so I can never remember how to create them - for reference, it’s New-Item -Type SymbolicLink .venv -Target C:\Some\Path\To\shared\venv where it appears that you can’t use ~ in the target or some weird things go wrong…
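
For what it’s worth, the link can also be created from Python itself, which sidesteps remembering shell-specific syntax; a small sketch (the paths are examples, and on Windows creating symlinks may still require elevation or Developer Mode):

```python
# Sketch: point an in-tree .venv symlink at a shared environment using the
# standard library instead of shell-specific commands. Paths are examples.
from pathlib import Path

shared_env = Path.home() / ".virtualenvs" / "myproject"  # example shared location
link = Path(".venv")

if not link.exists():
    # target_is_directory matters on Windows, where file and directory
    # symlinks are distinct kinds of object.
    link.symlink_to(shared_env, target_is_directory=True)
```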

If there’s a standards-based recommendation for the name of the expected virtual environment (i.e. what this PEP is trying to establish) then yes, this is a reasonable message. But like most other people, I don’t think it’s up to a PEP to make that statement, it’s for the maintainers of the individual tool(s) to choose how to let the user know.

Even without an established standard, tools are welcome to add this message now, and can suggest whatever environment name they like. A PEP is only needed for us all to agree on what name we want to assume in the absence of any other information.

It seems like this PEP is drifting towards just saying:

  1. Installers SHOULD refuse to install into any environment that isn’t a virtual environment without an explicit opt-in from the user (in the form of either an explicit install destination like --prefix or --target, or a specific --i-know-what-i-am-doing flag).
  2. Tools wanting to determine a user’s default/intended virtual environment SHOULD look for a virtual environment named .venv, in the project root (or “by searching directories upwards from the current directory”, or “alongside pyproject.toml”, or whatever you want to say here). One possible reading of that search is sketched after this list.
  3. Tools wanting to create a virtual environment on a user’s behalf SHOULD give it (or recommend) the name .venv, in (whatever directory matches the logic from (2) above) unless the user explicitly requests another name.
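
To make (2) a bit more concrete, the discovery logic being described is small; here is a sketch of one possible reading (the marker names and the stopping condition are not settled anywhere, and this is not text from the PEP):

```python
# One possible reading of (2): walk upwards from the current directory looking
# for a .venv environment, treating a directory containing pyproject.toml as
# the project root / search boundary.
from pathlib import Path


def find_default_venv(start: Path | None = None) -> Path | None:
    directory = (start or Path.cwd()).resolve()
    for candidate_dir in (directory, *directory.parents):
        venv = candidate_dir / ".venv"
        if (venv / "pyvenv.cfg").is_file():
            return venv
        if (candidate_dir / "pyproject.toml").is_file():
            break  # reached the project root without finding an environment
    return None
```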

To be honest, (1) seems quite different from the other two points.

Also, (2) and (3) probably need some refining, as we may need something to address the possibility that the user explicitly requests a different name (as allowed by (3)) and then the logic in (2) gets hopelessly confused because it’s not aware of that decision…

To put this another way:

  1. I think requiring an active virtual environment is an independent point, and given that in reality pip is the only installer likely to be affected, I think it’s something pip can do without a PEP. (The check involved is sketched after this list.)
  2. I think virtual environment naming is worthy of an (interoperability) PEP if we want to standardise it, but it needs more substance, to cover how we track the user’s choice of name if they override the standardised default. Given that venv is a stdlib module, this may even take such a PEP beyond packaging standards and into the area of a core standard (for example, if we want to add “owner” metadata to virtual environments).
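
For point 1, the underlying check an installer could use is small; a minimal sketch of the commonly used heuristic (not a description of pip’s actual implementation):

```python
# Commonly used heuristic for "am I running inside a virtual environment?":
# venv/virtualenv point sys.prefix at the environment while sys.base_prefix
# keeps pointing at the base interpreter. VIRTUAL_ENV is only set once an
# activate script has been sourced.
import os
import sys


def in_virtual_environment() -> bool:
    return sys.prefix != sys.base_prefix


def environment_is_activated() -> bool:
    return "VIRTUAL_ENV" in os.environ
```
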
4 Likes

I definitely agree with that; I’m mostly working on Linux nowadays, but I still have trouble remembering if the first argument is the target or the link name. Since it’s a workflow I haven’t seen anywhere else, I personally just wrote a direnv layout script that prepares this for me (it creates a .venvs/3.x venv using a specified version with a “$dirname-3.x” prompt if it doesn’t exist and updates the .venv symlink appropriately), but having some support from venv/virtualenv would make this more manageable. Even then, I’m not convinced this would be a good solution due to the problems with symlinks/junction links on Windows and file systems that simply don’t have support for symlinks at all.

I’m unsure what you mean by that. If a venv is put in a .venvs directory inside the project folder, it should be known where the venv is used, but if it’s not (in case the symlink just points to a venv in some global venvs directory), the custom venv’s prompt should still make it clear what the venv is used for (unless you’re looking to figure this out programmatically). Did you mean something else by the “back link” issue?

Sorry, the conversation is split across two threads, I think. What I mean is that if I delete my project directory, there’s nothing that lets me know that the environment (in a central location) is now “orphaned” and can be deleted.

Tools could exist that manage this (remove orphaned environments, for example) and disciplined use of tools/processes could avoid it (a delete-project script that tidies up referenced environments). But I’m not disciplined, and mistakes happen, so having the information recorded in the first place is important.
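
To illustrate what such a tool could look like, here is a deliberately hypothetical sketch: it assumes each centrally stored environment records the project it was created for in a project.txt file (an invented convention, not something venv or virtualenv write today):

```python
# Hypothetical orphan-finder: relies on an invented project.txt back link
# inside each environment; without some recorded back link, no tool can
# decide which environments are orphaned.
from pathlib import Path

ENV_ROOT = Path.home() / ".virtualenvs"  # example central location


def orphaned_environments() -> list[Path]:
    orphans = []
    if not ENV_ROOT.is_dir():
        return orphans
    for env in ENV_ROOT.iterdir():
        record = env / "project.txt"
        if not record.is_file():
            continue  # no back link recorded, so nothing can be decided
        project = Path(record.read_text().strip())
        if not project.exists():
            orphans.append(env)  # the project it served is gone
    return orphans
```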

Yes, that is what bothers me with virtual environments that are not next to the project. How do you handle the orphaned environments? For example, Poetry does that, and last time I checked there was no clean way to handle this. No way to know if an environment is still in use or not. I do not know what a clean workflow would look like. How do you do it? Do you have maybe some text file or database of which environment corresponds to which project? I am honestly curious.

I guess this belongs in the other thread.

Yes, I think it’s unreasonable to say that, at least so briefly. “Required” for what, exactly? I’m pretty sure a venv is not required simply for running Python; is a venv required for any use of pip? For any use of pip with these parameters? (Or without, same difference.)

I’m also not a fan of the statement that virtual environments are “essential”, since - again - anything that doesn’t require any third-party software shouldn’t require a venv (unless I’m completely misunderstanding something here).

To be completely honest, if pip starts saying “there MUST be a venv active AT ALL TIMES”, I’m just going to create a single venv in my home directory and activate it in my .bashrc to shut up the message. In effect, it would be exactly the same as the current form of user-level installation. What’s the advantage? (The PEP as currently written hints at a theoretical way to opt out of this demand, so if that exists, I’d use it; but if it doesn’t, a single global venv is basically the same thing anyway.)

Virtual environments are extremely helpful for applications that are going to get deployed. For everything else, why are they mandatory?

6 Likes

For installing packages. No?

The PEP (and particularly the proposed error message) isn’t entirely clear on that point, hence my post.

1 Like