PEP 704 - Require virtual environments by default for package installers

I completely believe you, but that is my whole point! There are a lot of valid ways to solve this particular problem, and the differences between them likely skew into subjective preference. If I wrote this PEP, the message would look like:

Please create a venv! Run the command python -m venv ~/.virtualenvs/sys310 && . ~/.virtualenvs/sys310/bin/activate and then re-run the command

(with templating based on the Python version) and would expect to be argued with because that would be overly privileging my (possibly narrow) view of how this should be done.

I’ll reply to this in the new topic.

2 Likes

The analogy that comes to mind here is the, in principle, very helpful message GitHub gives the user when they try to push to a branch that has diverged from their local branch, which suggests pulling the remote branch and then pushing again. In many cases this is the right thing to do; however, on Matplotlib, when a feature branch has merge conflicts with upstream main, we prefer rebasing over merging main into the feature branch (please accept this position, and that we do not squash merge, as a given, to avoid also spawning a discussion of the “right” git workflow :wink: ).

This means that the first time many contributors have to do a rebase, they (reasonably) follow the instructions git told them and end up with a whole bunch of extra commits, and then we have to walk them through undoing it (we finally wrote it up in the docs).

If this PEP does go in I fear the amount of “yes, pip does say that and in some cases it is right, but …” discussions that will have to be had.

1 Like

I’m generally someone who avoids putting my virtual environments into my project directory, and that’s honestly because doing so ends up having a lot of bad behaviors by default. Lots of tools recurse from the current directory, and when you stick your virtual environment in that current directory, it means it’s going to recurse into that as well.

Now, almost all of those tools offer some mechanism to fix it, typically by ignoring that directory, but that ends up requiring me to configure each of those tools independently, and oftentimes per project, so when I switch to another project, the bad behavior comes back.

That being said, presumably if we standardized on something in tree, eventually tools would ignore those paths by default, and the biggest pain point of them goes away.

Though it would likely be ideal if we could pick something that supported multiple interpreters, because I suspect a non-trivial number of people have reason to have multiple environments per project, and .venv doesn’t enable that.

4 Likes

I share some of Thomas’ concerns (although for many projects of mine an in-tree virtualenv would work fine).

The solution I use these days is to have a .venv file that contains the path to a virtualenv that lives somewhere else. It works pretty well and is very flexible, e.g. for switching between multiple interpreters on the same project or re-using the same environment for multiple projects.

This is easy to build tooling off of, e.g. I use a shell plugin that checks this to automatically activate environments when I change cwd. And maybe Brett could teach this trick to VSCode :slight_smile:
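For the curious, the tooling side of this convention is tiny. Here’s a minimal sketch of how a tool might resolve the environment under this scheme; the walk-upwards behaviour and the file format (first line is the venv path) are my assumptions about the convention described above, not any standard:

```python
from pathlib import Path
from typing import Optional

def resolve_venv(start: Path) -> Optional[Path]:
    """Walk upwards from `start`, looking for a `.venv` entry.

    A `.venv` directory is an in-tree environment; a `.venv` *file*
    contains the path of an environment that lives somewhere else.
    """
    for directory in (start, *start.parents):
        candidate = directory / ".venv"
        if candidate.is_dir():
            return candidate  # in-tree environment
        if candidate.is_file():
            target = Path(candidate.read_text().strip()).expanduser()
            # A relative path is taken relative to the .venv file itself.
            return (directory / target).resolve()
    return None
```

A shell plugin or editor integration then only needs to run something like this on directory change and source `<venv>/bin/activate` if it finds a match.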

1 Like

Personally, I’ve been using .venv as a symlink to the actual virtual environment for some time now, to be able to switch between multiple venvs for a project. Some things expect .venv to be a folder rather than a file, and symlinks work great with that. I still put all the venvs in-tree within a .venvs folder (which has caused me problems a bit more often than .venv does, as not as many tools ignore it by default) because I like to keep everything in the project folder, but there’s nothing that would prevent this from working with a global directory for all venvs. Sadly, symlinks aren’t universally supported by all file systems, so they’re not really a viable option as a standard. Still, using a .venv file would probably cause more problems with various tooling than a painless symlink does.

3 Likes

I’m open to ideas for better conveying this in the PEP, but here’s my take on this: An in-tree .venv is a reasonable default. We’re not locking people out of their workflows by deciding on a default.


I want to go a bit further though: does anyone think it’s not reasonable to say “An active virtual environment is required. If you don’t have one, you can create it by …”?

While it clearly biases toward one approach (the one suggested), it certainly doesn’t lock users out of other approaches for managing virtual environments. It also provides a clear “here’s how to get to a free-of-paper-cuts setup” path.

Yes, I understand and appreciate that there are workflows where the proposed approach isn’t sufficient. However, no one is talking about preventing people from creating virtual environments in other locations.

“We can’t do something that works for every known workflow, therefore we shouldn’t pick a default” is a bad approach.

My take on multiple interpreters is that you should be using an environment management tool at that point, such as nox and friends, to do that work.


I’ll update the PEP to cover Conda, multiple environments per project and centralised storage of environments; but that’s not gonna happen until tomorrow.

If tools like venv and virtualenv supported that usage, that would be a reasonable possibility (although it still doesn’t solve the “back link” issue for identifying where the venv is being used). But manually creating a virtual environment in a shared location and then creating a symlink is a lot less convenient than python -m venv .venv. Particularly as I very rarely use symlinks, so I can never remember how to create them - for reference, it’s New-Item -Type SymbolicLink .venv -Target C:\Some\Path\To\shared\venv where it appears that you can’t use ~ in the target or some weird things go wrong…

If there’s a standards-based recommendation for the name of the expected virtual environment (i.e. what this PEP is trying to establish) then yes, this is a reasonable message. But like most other people, I don’t think it’s up to a PEP to make that statement, it’s for the maintainers of the individual tool(s) to choose how to let the user know.

Even without an established standard, tools are welcome to add this message now, and can suggest whatever environment name they like. A PEP is only needed for us all to agree on what name we want to assume in the absence of any other information.

It seems like this PEP is drifting towards just saying:

  1. Installers SHOULD refuse to install into any environment that isn’t a virtual environment without an explicit opt-in from the user (in the form of either an explicit install destination like --prefix or --target, or a specific --i-know-what-i-am-doing flag).
  2. Tools wanting to determine a user’s default/intended virtual environment SHOULD look for a virtual environment named .venv, in the project root (or “by searching directories upwards from the current directory”, or “alongside pyproject.toml”, or whatever you want to say here).
  3. Tools wanting to create a virtual environment on a user’s behalf SHOULD give it (or recommend) the name .venv, in (whatever directory matches the logic from (2) above) unless the user explicitly requests another name.
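To make (1) concrete, the check an installer would do is small. A rough sketch follows; the opt-out flag names are taken from the list above and are illustrative, and this is not pip’s actual logic:

```python
import sys

def require_virtualenv(argv) -> None:
    """Refuse to proceed outside a virtual environment unless the user
    explicitly opted out. Illustrative only, not pip's real behaviour."""
    opted_out = any(
        arg.startswith(("--prefix", "--target")) or arg == "--i-know-what-i-am-doing"
        for arg in argv
    )
    # A venv is active exactly when sys.prefix differs from sys.base_prefix.
    in_venv = sys.prefix != sys.base_prefix
    if not in_venv and not opted_out:
        raise SystemExit(
            "error: no active virtual environment found; create one with "
            "'python -m venv .venv' and activate it, or pass an explicit "
            "opt-in flag"
        )
```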

To be honest, (1) seems quite different from the other two points.

Also, (2) and (3) probably need some refining, as we may need something to address the possibility that the user explicitly requests a different name (as allowed by (3)) and then the logic in (2) gets hopelessly confused because it’s not aware of that decision…

To put this another way:

  1. I think requiring an active virtual environment is an independent point, and given that in reality pip is the only installer likely to be affected, I think it’s something pip can do without a PEP.
  2. I think virtual environment naming is worthy of an (interoperability) PEP if we want to standardise it, but it needs more substance, to cover how we track the user’s choice of name if they override the standardised default. Given that venv is a stdlib module, this may even take such a PEP beyond packaging standards and into the area of a core standard (for example, if we want to add “owner” metadata to virtual environments).
4 Likes

I definitely agree with that, I’m mostly working on Linux nowadays but I still have trouble remembering if the first argument is the target or link name. Since it’s a workflow I haven’t seen anywhere else, I personally just wrote a direnv layout script that prepares this for me (creates a .venvs/3.x venv using a specified version with “$dirname-3.x” prompt if it doesn’t exist and updates the .venv symlink appropriately) but having some support from venv/virtualenv would make this more manageable. But even then, I’m not convinced this would be a good solution due to the problems with symlinks/junction links on Windows and file systems that simply don’t have support for symlinks at all.
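For anyone wanting to replicate that setup without direnv, here is a rough Python rendering of the idea (hypothetical; the author’s actual script is a direnv shell layout, and the function and argument names here are mine): create .venvs/&lt;version&gt; if missing, with a project-specific prompt, then repoint the .venv symlink.

```python
import subprocess
from pathlib import Path

def ensure_versioned_venv(project: Path, version: str, python: str = "") -> Path:
    """Create `<project>/.venvs/<version>` if needed and point
    `<project>/.venv` at it via a symlink. Sketch only."""
    python = python or f"python{version}"  # e.g. "python3.11"
    venv = project / ".venvs" / version
    if not venv.is_dir():
        prompt = f"{project.name}-{version}"  # e.g. "myproject-3.11"
        subprocess.run(
            [python, "-m", "venv", "--prompt", prompt, str(venv)], check=True
        )
    link = project / ".venv"
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(venv, target_is_directory=True)
    return venv
```

This inherits the symlink caveats discussed above, of course: on Windows and on file systems without symlink support, `symlink_to` will fail.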

I’m unsure what you mean by that. If a venv is put in a .venvs directory inside the project folder, it should be known where the venv is used, but if it’s not (in case the symlink just points to a venv in some global venvs directory), the custom venv’s prompt should still make it clear what the venv is used for (unless you’re looking to figure this out programmatically). Did you mean something else by the “back link” issue?

Sorry, the conversation is split across two threads, I think. What I mean is that if I delete my project directory, there’s nothing that lets me know that the environment (in a central location) is now “orphaned” and can be deleted.

Tools could exist that manage this (remove orphaned environments, for example) and disciplined use of tools/processes could avoid it (a delete-project script that tidies up referenced environments). But I’m not disciplined, and mistakes happen, so having the information recorded in the first place is important.

Yes, that is what bothers me with virtual environments that are not next to the project. How do you handle the orphaned environments? For example Poetry does that, and last time I checked there was no clean way to handle this. No way to know if a environment is still in use or not. I do not know what a clean workflow would look like. How do you do it? Do you have maybe some text file or database of which environment corresponds to which project? I am honestly curious.
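FWIW, if environments did record a back link, the cleanup being described here becomes a few lines of code. A sketch, assuming the (hypothetical, non-standard) convention that each centrally-stored env contains a project.txt file naming the project it was created for:

```python
from pathlib import Path

def orphaned_envs(envs_root: Path, marker: str = "project.txt") -> list:
    """Return envs under `envs_root` whose recorded project directory is
    gone. `project.txt` is an invented convention, not something venv,
    virtualenv or Poetry actually writes."""
    orphans = []
    for env in sorted(p for p in envs_root.iterdir() if p.is_dir()):
        record = env / marker
        if not record.is_file():
            continue  # no back link recorded; can't tell, so leave it alone
        project = Path(record.read_text().strip())
        if not project.exists():
            orphans.append(env)
    return orphans
```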

I guess this belongs in the other thread.

Yes, I think it’s unreasonable to say that, at least so briefly. “Required” for what, exactly? I’m pretty sure a venv is not required simply for running Python; is a venv required for any use of pip? For any use of pip with these parameters? (Or without, same difference.)

I’m also not a fan of the statement that virtual environments are “essential”, since - again - anything that doesn’t require any third-party software shouldn’t require a venv (unless I’m completely misunderstanding something here).

To be completely honest, if pip starts saying “there MUST be a venv active AT ALL TIMES”, I’m just going to create a single venv in my home directory and activate it in my .bashrc to shut up the message. In effect, it would be exactly the same as the current form of user-level installation. What’s the advantage? (The PEP as currently written hints at a theoretical way to opt out of this demand, so if that exists, I’d use it; but if it doesn’t, a single global venv is basically the same thing anyway.)

Virtual environments are extremely helpful for applications that are going to get deployed. For everything else, why are they mandatory?

6 Likes

For installing packages. No?

The PEP (and particularly the proposed error message) isn’t entirely clear on that point, hence my post.

1 Like

@Rosuav I’m not sure I follow what you’re saying. The PEP’s language is:

When a user runs an installer without an active virtual environment, the installer SHOULD print an error message and exit with a non-zero exit code.

Do you want “runs an installer” to be more specific?

If it isn’t this, what language in the PEP isn’t sufficiently clear?

Only if you do additional things – i.e. create the virtual environment with --system-site-packages.

That’s not however the default for virtual environments, and that isolation from the global/system environment is (partly) the whole point of virtual environments in this context.

Is pip an installer, or is pip install an installer? Can I pip search without a venv? (Probably but I’m not entirely sure.) Can I pip freeze without a venv? (No idea.)

Would appreciate some clarity on this point too, then; what exactly ARE the differences between all the different ways of isolating? Clearly my mental model of user installations and virtual environments is wrong, given that I have generally thought that /usr/local/lib/python3.X/dist-packages is “stuff installed by your system package manager”, ~/.local/lib/python3.X/site-packages is “stuff you installed globally with pip”, and the currently-active venv is “stuff you installed for this app only, with pip”. Where am I wrong here? Or is it that venvs are supposed to replace the second category?
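One way to check that mental model against a given interpreter is to ask it directly; sys and site in the stdlib expose exactly these locations (the output paths vary by platform and distro, so none are shown here):

```python
import site
import sys

# True exactly when a venv is active: a venv makes sys.prefix point at
# itself while sys.base_prefix keeps pointing at the base installation.
print("in a venv:         ", sys.prefix != sys.base_prefix)

# Where `pip install --user` puts things (the middle category above).
print("user site-packages:", site.getusersitepackages())

# Global site-packages directories (system/distro packages and global
# pip installs land in these, depending on the platform).
print("site-packages dirs:", site.getsitepackages())
```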

1 Like

Some additional more content-relevant notes on the PEP, originally made on the review:

  • What counts as a “virtual environment”? Only an environment created with venv/virtualenv? What about Conda environments? Or PEP 582 environments? Or other types of isolated environments? IMO, the PEP should either explicitly define this, or link (:term:) to a precise and authoritative definition, e.g. in the PyPA glossary.

  • Also, what counts as “an installer”? Obviously, pip counts (as it’s mentioned by name), and I’m assuming Hatch, PDM and Poetry count as well when used in that capacity, while apt, dnf, brew and choco don’t, though that isn’t explicitly stated.

    If you limit it to “Python-specific” installers, what about tools like shiv, pex, etc.? And what about, of course, Conda? IMO, the PEP should provide a precise definition of that term and examples of what tools would and would not qualify (similar to PEP 668 for what qualifies as an externally-managed environment).

  • Further, what “Python version” is being referred to: the version in the environment being installed into? The version of the installer’s own runtime? Something else? The PEP should be specific here.

Workflow tools (which manage virtual environments for the user, under the hood) should be unaffected, since they should already be using the virtual environment for running the installer.

They will be affected if they don’t use venv/virtualenv virtual environments (e.g. Conda envs, I’d assume PDM PEP 582 envs, and possibly others), assuming tools don’t detect them. Either way, seems like that should be explicitly addressed in the PEP.

FWIW, this is also my experience; this or something somewhat similar is a pretty common workflow, and one I often practice and recommend to others. Additionally, another very common scenario I run into a lot, both inside and outside of the sciences, is multiple libraries and/or applications developed together that interact and must live in the same environment.

Yeah; that was my concern above as well. My impression is that, given how common this is on the interwebs and in printed messages, despite it usually (though not always) being a bad idea, enough users are likely to do this that it would become a large-scale UX problem.

As helpfully advised by @pradyunsg, I am cross-posting my comment here. I’m coming from the Apache Airflow context, where we’ve been discussing this for a long time and have been involved in many discussions about it:

(copied from PEP 704: Require virtual environments by default for installers by pradyunsg · Pull Request #2964 · python/peps · GitHub)

Just a comment about that one. I really like that this is now opt-out rather than opt-in, and it’s good we have an explicit opt-out, which is useful in many cases (for example, most container use cases). Those are specific use cases, and having an opt-out available is more than enough for them, especially for legacy use cases.

Also, having a default convention for venv usage is great as part of this PEP. This will make a number of use cases simpler, with fewer decisions to make, and having an implicit step for activation of the ~/.venv environment is a good one too.

2 Likes

I’ve been reading through the discussion again prior to another round of updates. Other than requests for clarifying language like “what is an installer”/“what is an environment” etc, I’m noticing two things here:

  1. Concerns that UX of tooling isn’t supposed to be a PEP.
  2. Concerns that having a single virtual environment workflow documented as the default is problematic because single virtual environment based workflows don’t cover all use cases/workflows.

For the first… I guess I’m hitting a governance/process issue. I’d figured that we’d want this to be a widely discussed thing that benefits from going through the same framework as a PEP, so why not make it a PEP? And the counter-argument of “we don’t do PEPs like that” is… frustrating but fair. I’m not sure what to do about this. I don’t think that discussing this only on pip’s issue tracker is the right way to go, because it affects not just pip but also how pip interacts with multiple other things! I guess I’m hitting the wall of our process not fitting what I think we need here, and I’ll take that discussion to a separate thread.

For the latter… I know and agree. Nothing is blocking you from having a multiple virtual environment workflow or having a workflow where there’s a centralised management of virtual environments — having a consistent default suggestion isn’t going to block projects that need more complex workflows from continuing to use them.

Regarding Conda: if we draw the line as “Conda environments are basically system environments rather than Python environments, because Conda ships everything”, the obvious corollary is that they shouldn’t be treated differently and should also require virtual environments. Now, it is well known and well documented that conda and pip interoperability isn’t great, and that the two operate on different metadata models. This PEP would effectively enforce a clear split between managed-by-conda and managed-by-pip. I’ll admit that, with this view, I’m suggesting that we break user workflows; that aspect of this PEP should be better clarified, and I’ll do so. I do think that enforcing this clear separation between managed-by-pip and managed-by-conda packages will be a good thing.

It’s a balancing act though, and if folks think that we should be doing something different, I’m all ears.


This PEP currently does not require any sort of automatic activation of environments. That will be a stumbling block for people who want a no-extra-steps workflow. However, it also means that we’re replacing a subtle failure (a thing that would cause issues later) with an explicit error that also provides guidance on what to do. Subjectively, I think that’s a better place to be in, since consistent errors and clear guidance are better than inconsistent failure modes and hard-to-find, hard-to-apply guidance.

Agreed. No one is saying that you need to create a virtual environment to use Python. The PEP is saying you should be creating a virtual environment for installing and using third-party software, by default. There’s an opt-out for workflows that need it.

Perfect, you’re the exact sort of user persona that I want to have an opt-out for. :slight_smile:

FWIW, this isn’t limited to sciences. :slight_smile:

This is generally what gets recommended for reusable functionality vs business logic, for example.

I noticed that I didn’t clarify when I responded to this earlier: the proposal is that in-tree virtualenvs are good-enough to be a default suggestion while being easy-enough to discover and reason about. The difference is perhaps subtle but important. To be explicit, they’re not universally the best!

I wasn’t sure how to respond to this, or whether to let it slide unresponded to. I’m reading this as implying that this is what is happening here, and that I should be careful not to do it; if so, IMO that is not correct. Avoiding exactly that is why I wanted this to be something that’s not just discussed on pip’s issue tracker.

FWIW, I guess I should clarify that the things that the PEP suggests aren’t “things someone likes and wants to push on everyone”. The whole point of this PEP is to change a workflow expectation: that pip itself can be used outside of virtual environments and it’ll unpack to the user-site or system-site by default. If we don’t want to change that workflow expectation, that’s fine. I don’t have a horse in this race; how a PEP like this changes things for experts who maintain Python or Python’s packaging tooling isn’t really something I want to optimise for, I’d much rather focus on the UX aspects here for the broader audience.

2 Likes

This is not a good idea. Virtualenvs on top of conda envs do not work well (worse than pip-installing into a conda env). It’s also not necessary, and treating conda envs like system envs is conceptually not quite right. Conda envs are like system envs in terms of what they are able to contain, but much more like virtualenvs than like real system envs in terms of their most important characteristics: they need activation, you can have multiple of them, they have their own lock/requirements files, and they’re ephemeral (destroying and recreating them, rather than updating them often, is recommended).

If you want to include conda environments in your picture here, then you could:

  • rely on the externally managed designator for installers (good idea for the base env at least, and there’s an active discussion on that),
  • treat non-base conda envs like virtualenvs rather than like system envs, either by special-casing conda envs or by generalizing whatever you do to “user-activated environments” (I’d quite like the latter),
  • or leave things as they are.

All those options are better than what you are suggesting here.

3 Likes