PEP 704 - Require virtual environments by default for package installers

I’ve been reading through the discussion again prior to another round of updates. Other than requests for clarifying language like “what is an installer”/“what is an environment” etc, I’m noticing two things here:

  1. Concerns that UX of tooling isn’t supposed to be a PEP.
  2. Concerns that having a single virtual environment workflow documented as the default is problematic because single virtual environment based workflows don’t cover all use cases/workflows.

For the first… I guess I’m hitting a governance/process issue. I’d figured that we’d want this to be a widely discussed thing that benefits from going through the same framework as a PEP, so why not make this a PEP? And the counter argument of “we don’t do PEPs like that” is… frustrating but fair. I’m not sure what to do about this. I don’t think that discussing this only on pip’s issue tracker is the right way to go about this because it affects not just pip but also how it interacts with multiple other things! I guess I’m hitting the wall of our process not fitting what I think we need here, and I’ll take that discussion to a separate thread.

For the latter… I know and agree. Nothing is blocking you from having a multiple virtual environment workflow or having a workflow where there’s a centralised management of virtual environments — having a consistent default suggestion isn’t going to block projects that need more complex workflows from continuing to use them.

Regarding conda, if we draw the line as “Conda environments are basically system environments rather than Python environments because Conda ships everything”, the obvious corollary to that is that it shouldn’t be treated differently and should require virtual environments. Now, it is well known and well documented that conda and pip interoperability isn’t great and that the two operate on different metadata models. This PEP would effectively enforce a clear split between managed-by-conda and managed-by-pip. Now, I’ll admit, with this view, I’m suggesting that we break user workflows, and that aspect of this PEP should be better clarified; I’ll do so. I do think that enforcing this clear separation between managed-by-pip and managed-by-conda packages will be a good thing.

It’s a balancing act though, and if folks think that we should be doing something different, I’m all ears.


This PEP currently does not require any sort of automatic activation of environments. That will be a stumbling block for people who want a no-extra-steps workflow. However, it also means that we’re replacing a subtle failure/thing-that-would-cause-issues-later with an explicit error that also provides guidance on what to do. Subjectively, I think that’s a better place to be in, since consistent errors and clear guidance are better than inconsistent failure modes and guidance that is difficult to find and apply.

Agreed. No one is saying that you need to create a virtual environment to use Python. The PEP is saying you should, by default, be creating a virtual environment for installing and using third-party software. There’s an opt-out for workflows that need it.

Perfect, you’re the exact sort of user persona that I want to have an opt-out for. :slight_smile:

FWIW, this isn’t limited to sciences. :slight_smile:

This is generally what gets recommended for reusable functionality vs business logic, for example.

I noticed that I didn’t clarify when I responded to this earlier: the proposal is that in-tree virtualenvs are good-enough to be a default suggestion while being easy-enough to discover and reason about. The difference is perhaps subtle but important. To be explicit, they’re not universally the best!

I wasn’t sure how to respond to this or whether to let it slide without a response. I’m reading this as implying that this is what is happening here and that we should be careful not to do it. If so, IMO that is not correct. Preventing that from happening is literally why I wanted this to be something that’s not just discussed on pip’s issue tracker.

FWIW, I guess I should clarify that the things that the PEP suggests aren’t “things someone likes and wants to push on everyone”. The whole point of this PEP is to change a workflow expectation: that pip itself can be used outside of virtual environments and it’ll unpack to the user-site or system-site by default. If we don’t want to change that workflow expectation, that’s fine. I don’t have a horse in this race; how a PEP like this changes things for experts who maintain Python or Python’s packaging tooling isn’t really something I want to optimise for, I’d much rather focus on the UX aspects here for the broader audience.


This is not a good idea. Virtualenvs on top of conda envs do not work well (worse than pip-installing into a conda env). It’s also not necessary, and treating conda envs like system envs is conceptually not quite right. Conda envs are like system envs in terms of what they are able to contain, but much more like virtualenvs than like real system envs in terms of their most important characteristics: they need activation, you can have multiple of them, they have their own lock/requirements files, and they’re ephemeral (destroying and recreating them, rather than updating them, is often recommended).

If you want to include conda environments in your picture here, then you could:

  • rely on the externally managed designator for installers (good idea for the base env at least, and there’s an active discussion on that),
  • treat non-base conda envs like virtualenvs rather than like system envs, either by special-casing conda envs or by generalizing whatever you do to “user-activated environments” (I’d quite like the latter),
  • or leave things as they are.

All those options are better than what you are suggesting here.


Fascinating! OK, my understanding is that we don’t want people mixing them. That is based on my limited use, the Anaconda blog post “Using Pip in a Conda Environment”, frequent communication that pip-directly-in-Conda can (and does!) break conda environments, and the fact that Conda’s pip interoperability is experimental.

I guess you’re saying that we do want them to do that, and should be treating conda’s python differently?

I just tried this (again), and it seemed to work fine. Am I missing something?

āÆ conda create -n py310
āÆ conda activate py310
āÆ conda install python=3.10
[...]
āÆ conda install numpy
[...]
āÆ ./example/bin/pip install numpy       
Requirement already satisfied: numpy in /Users/pradyunsg/miniconda/envs/py310/lib/python3.10/site-packages (1.23.5)
āÆ python -m venv --system-site-packages example
āÆ ./example/bin/python -c "import numpy; print('good')"
good

That sounds like a useful idea to have independently of this issue. But to work well, it would need a standard way to signal “this is a user-activated environment”. Currently, and as part of the language definition, this is done by checking if sys.base_prefix != sys.prefix. I don’t know if conda adheres to this approach, but if it doesn’t, then my first thought would be that they should (I’m taking your comment that conda envs are much more like virtualenvs than system envs as the basis of this view). If it does, then there’s nothing more to be said - we can just use that.

If, for some reason, conda can’t use the difference between sys.prefix and sys.base_prefix the way it’s documented, then IMO we need a new definition, that is agreed between everyone. And that will require conda to explain what they need that the current approach doesn’t do.

If conda does set sys.base_prefix != sys.prefix, what’s going to go wrong if we just say pip can install into such environments, and will need an override to install if sys.base_prefix == sys.prefix? Ultimately, isn’t it just the case that pip/virtualenv and conda interact a bit uncomfortably, and we’re trying to pick where we draw the line, knowing that nothing’s ideal?
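For concreteness, here is a minimal sketch of the rule being floated above. The function names and the `override` parameter are made up for illustration; this is not how pip is implemented, just the check as described in this thread.

```python
import sys


def in_virtual_environment(prefix: str = sys.prefix,
                           base_prefix: str = sys.base_prefix) -> bool:
    """Return True when the two prefixes differ, i.e. a venv is in use."""
    return prefix != base_prefix


def may_install(prefix: str, base_prefix: str, override: bool = False) -> bool:
    """Hypothetical installer policy: installs are allowed inside a virtual
    environment and need an explicit override everywhere else."""
    return override or in_virtual_environment(prefix, base_prefix)
```

With this rule, any environment that does not move sys.prefix away from sys.base_prefix would need the override, whatever tool created it.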

That’s my sense of the situation as well!

Poking @jezdez since, well, this sounds like a thing where him wearing both pip maintainer and conda maintainer hats would be useful! :slight_smile:

As a sidenote, the reason for opening the discussion on the former approach was due to the latter approach being initially suggested to @pradyunsg (by me) in order for this PEP to better interoperate with Conda envs. :slightly_smiling_face:

Yes, it most certainly can and does, to the point where we made our very first Spyder Says video about it:

However, it is necessary for using PyPI-only packages, editable installs and various other scenarios, and can typically be avoided if you are careful, know what you are doing and understand the basic mechanics involved (namely, installing only “tip-of-the-stack” packages with pip, and avoiding further updates to the existing env). And if things do go wrong, it is cheap to remove and recreate the env, so long as it is not base.

That’s the reason I originally brought up the above: the real harm is pip install-ing into base, and if that could be prevented by default (with EXTERNALLY-MANAGED or this PEP) without interfering with pip install-ing into non-base envs (or maybe adding an opt-out warning) then that would be a big win for Conda users, IMO (considering how many bork their whole Conda install because of it).

The unfortunate truth is that many/most “regular” Conda users are students, scientists, engineers, analysts, etc. and not programmers, and thus don’t have the knowledge and experience to know this (because many/most of the people they’re learning from don’t know themselves, because no one taught it to them).

In fact, even the basic concept of what an environment is and why you should always use one is a difficult one to teach for the first time to students who are graduate-level scientists but with limited programming backgrounds, as I struggled with doing for a student just this Friday, who had run into issues with her base environment. This is especially so when those students are typically already busy and often come to me tired, frustrated and just wanting to get their project/research done and go home, and for whom Python is but one means to an end.

Which I suppose, circling back, is pretty relevant to think about here—while I remain a strong advocate of teaching users (how to) use environments early and often, due to the countless times I’ve seen users struggle with the aftermath of not doing so, it is equally important that we make them as easy and painless to use as practical if we are requiring them.

Because all that is being tested is that Conda-installed packages in the outer Conda env can be imported by the venv’s Python when --system-site-packages is set on creation and both envs are activated, in the correct order.

Where things tend to go wrong is everything that comes after: installing further packages in the venv and/or the conda env, attempting to update one or the other, installing duplicate packages in one or the other, activating one but not the other, installing packages with incompatible binary deps, etc., etc. I believe things are less broken than they were in the past, where I’m not even totally sure that the above would work right, but the things that can go wrong in this setup are close to a superset of the issues with installing PyPI packages directly into the Conda env, the limited protections and warnings on the Conda side don’t work, the likelihood of user error is much higher, and problem recovery still requires recreating both (vs. just the Conda env).

Unfortunately, it’s the former and that check won’t work:

$ python
Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:30:19) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.base_prefix
'C:\\Miniconda3\\envs\\sphinx-5'
>>> sys.prefix
'C:\\Miniconda3\\envs\\sphinx-5'

Per the definitions of sys.prefix and sys.base_prefix in the Python docs, sys.prefix/sys.base_prefix is set at install time to a “value for the Python installation”. Since each Conda environment contains its own install of Python (unlike with a venv), this path is thus naturally that of the Python installed into that environment.

And also, per those definitions, site.py will change those paths when a “virtual environment” is in use, a term which links to Python’s venv module, and for which the site.py code has a bespoke check that specifically looks for virtual environments as created by venv (and similar). Therefore, in both cases, the current behavior seems to be the closest match to what is actually documented, and also the behavior of an unpatched Python installed inside a Conda environment.
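As a rough illustration of that bespoke check: to my understanding, site.py has traditionally looked for a pyvenv.cfg file next to the interpreter or one directory above it, and only then treated the environment as virtual. The helper below is a simplified sketch of that lookup, not a copy of CPython’s code, and `find_pyvenv_cfg` is a made-up name.

```python
import os
from typing import Optional


def find_pyvenv_cfg(executable: str) -> Optional[str]:
    """Return the path of a pyvenv.cfg located beside the interpreter or in
    its parent directory, or None when neither location has one."""
    exe_dir = os.path.dirname(os.path.abspath(executable))
    candidates = (
        os.path.join(exe_dir, "pyvenv.cfg"),
        os.path.join(os.path.dirname(exe_dir), "pyvenv.cfg"),
    )
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

A Conda environment created the usual way has no such file, which is consistent with the two prefixes above staying equal.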

That being said, I’m not an expert on any of this. Perhaps @jezdez could comment more.

It’s better not to mix them if you can avoid that, but often you can’t. E.g., niche packages that aren’t packaged for conda-forge, or installs from source (there’s no conda install ., you use pip install .). What @CAM-Gerlach said is all about right. Things have improved since that 2018 blog post you link above - but of course mixing two package managers is more fragile than doing everything with one package manager. Where it tends to go wrong is what the very first sentence of that post alludes to: “Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times.”

These are more up-to-date docs: conda user guide - using pip in an environment.

You can even put PyPI requirements to install with pip into an environment.yml file. Basic example:

# To use (you can use `mamba` instead of `conda` if you want):
#
#   $ conda env create -f environment.yml
#   $ conda activate test-env
#
name: test-env
channels:
  - conda-forge
dependencies:
  - python
  - numpy
  - pandas
  - pip
  - pip:
    - a-pkg-from-pypi

@CAM-Gerlach already explained why that doesn’t hold for conda envs. I’d also say that it’s not a definition of user-activated envs, it’s simply an implementation detail of virtualenvs that’s bubbling up there. Think of conda envs simply as more powerful virtualenvs, that can also contain non-PyPI dependencies and Python itself, while sharing the other properties of virtualenvs (creation of named envs with activation/deactivation).

A user-activated environment should probably be defined by the properties it has. There really are only a few: it’s a full-fledged Python environment that packages can be installed to, after an environment with a given name is created and activated. It could have some new property, like sys.is_user_activated_env, which for virtualenvs could be defined as sys.base_prefix != sys.prefix and for other types of envs in a similar appropriate way.
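A sketch of what such a property could compute, written as a plain function. Everything here is hypothetical: sys.is_user_activated_env does not exist, and the conda-meta check is only a heuristic based on the directory conda keeps at the root of each environment, not an agreed standard.

```python
import os
import sys


def is_venv(prefix: str, base_prefix: str) -> bool:
    """The existing virtualenv/venv signal."""
    return prefix != base_prefix


def looks_like_conda_env(prefix: str) -> bool:
    """Heuristic, not a standard: conda stores its package records in a
    conda-meta directory at the root of every environment it creates."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def is_user_activated_env(prefix: str = sys.prefix,
                          base_prefix: str = sys.base_prefix) -> bool:
    """Hypothetical stand-in for the proposed sys.is_user_activated_env."""
    return is_venv(prefix, base_prefix) or looks_like_conda_env(prefix)
```

Note that this heuristic cannot tell the base env apart from other conda envs, so protecting base would still need a separate marker such as EXTERNALLY-MANAGED.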

Probably, yes. The two are quite different though - using pip inside conda envs is fairly common and there are multiple use cases where it’s the right thing to do (I gave two examples above). Virtual environments on the other hand are completely unnecessary when you’re using conda; they offer nothing one needs, and using a virtualenv derived from a conda env would also require the user to do some quite strange double activation (first the conda env, then the virtualenv).


Let’s pursue the angle of having non-base Conda environments set sys.real_prefix (see PEP 704 – Require virtual environments by default for package installers | peps.python.org) then, along with EXTERNALLY-MANAGED in the base environment?

I’d avoid referring to the base Conda environment at all, because it’s not at all related to the base of a venv.

The ā€œbaseā€ environment for Conda is more like the hidden pipx environment where conda is installed to. Consider it this way: if conda was rewritten in Rust and the Python dependency was removed, the base environment would go away entirely.

What you want is a marker in the base Conda environment to say ā€œusers have to explicitly opt-in to modify thisā€. And that’s probably the marker that you want in the base Linux distro environments as well. And arguably in every system install by default.

But then for a known single-use install (such as a Docker container, or a temporary CI system, or a Nuget package, or a layout generated from a build), you don’t want that marker, because there’s no reason to discourage those users from installing directly - they’ve essentially already opted into the flag.

So really, is this whole proposal about installer UX in the face of PEP 668? Which seems to be pretty well described in that PEP:

If both of these conditions are true, the installer should exit with an error message indicating that package installation into this Python interpreter’s directory are disabled outside of a virtual environment.

The two tests it describes could be simplified down to “does {sys.prefix}/EXTERNALLY-MANAGED exist?” That way it’s entirely up to the distributor whether the file is there, and we know that existing venvs move sys.prefix away from the base install (which is why sys.prefix != sys.base_prefix) and so won’t see the file. A Conda environment can have the file if they want it (it could be distributed in its own conda package, for example).
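The simplified test could look roughly like this. It is a sketch of the single-file check suggested in this post, not the exact procedure in PEP 668 (which, as I read it, looks for the marker in the stdlib directory and separately checks for a virtual environment); `is_externally_managed` is a name chosen for illustration.

```python
import sys
from pathlib import Path


def is_externally_managed(prefix: str = sys.prefix) -> bool:
    """Return True when the environment rooted at `prefix` carries the
    EXTERNALLY-MANAGED marker file (simplified single-location check)."""
    return (Path(prefix) / "EXTERNALLY-MANAGED").is_file()
```

Because a venv has its own sys.prefix, the check naturally comes out False there unless someone deliberately puts the marker inside the venv.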

It could be even simpler for pip, which is going to look in sys.prefix for a conf file, to simply add a configuration option to prevent installs. When using a venv, pip will find a different file (probably none) that doesn’t have the option, so the install will just work. If Conda doesn’t include that file (they won’t), then installs are allowed.
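To make the idea concrete, here is a sketch of reading such an option from a per-environment config file. The option name `block-installs` is entirely hypothetical (pip defines no such setting), and the sketch uses plain configparser rather than pip’s own configuration machinery; the only thing taken from pip is that its per-environment file lives under sys.prefix (pip.conf, or pip.ini on Windows).

```python
import configparser
from pathlib import Path

# Hypothetical option name, invented for this sketch; pip does not define it.
BLOCK_OPTION = "block-installs"


def installs_blocked(prefix: str, basename: str = "pip.conf") -> bool:
    """Return True when the config file under `prefix` sets the hypothetical
    option in its [install] section; a missing file means not blocked."""
    parser = configparser.RawConfigParser()
    # read() silently skips files that do not exist.
    parser.read(Path(prefix) / basename)
    return parser.getboolean("install", BLOCK_OPTION, fallback=False)
```

A distro would ship the file with the option set, while a fresh venv (which has a different sys.prefix and usually no such file) would fall through to the default of allowing installs.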

I think we got the problem framing correct in PEP 668. This current proposal is trying to reframe it as “we need to protect distros from their users”, when there’s a perfectly good way for distros to protect themselves.


Indeed. There’s another option that we could take then – to fit Conda’s needs of “don’t touch a package that Conda installed” in pip in addition to the blanket “Don’t install anything here” that we added for Linux distributions.

I think all Conda needs to do is “Create separate distro and local directories”.

Conda has no concept of the distinction between “distro-installed packages and […] packages installed by the local system administrator” – there’s no distro where conda is coming from, everything is installed by either the user or a local admin. Furthermore, it’s intentional that each environment has one path ($PREFIX) where everything is found.

The closest thing to that distinction (in spirit) might be the conda base environment (which can/should often not be touched, like the distro packages) vs. user-installed environments, but they are completely separate from the POV of path-lookup and package installation.

That said, I think Steve’s example of “does {sys.prefix}/EXTERNALLY-MANAGED exist?” would be enough to make things work.


Conda is the distro; all it would need to do is have two site-packages directories in the environment, one that conda packages get installed into, one that pip installs into.

This doesn’t really make sense, at least not without a lot more definition of what you mean by “distro” here (if PyPI is also a “distro”, then sure, but I’m pretty sure you weren’t implying this :wink: )

Perhaps what you mean is “the Conda environment and its Conda-installed packages is the distro, and pip-installed packages go into site-packages”? Which does make sense,[1] but only actually works if pip is searching both directories and then choosing to install only unsatisfied requirements into its own directory.[2] Without this, it’s just as messy as a venv based on a Conda env would be today.

The best way to make this work really is for Conda to build its packages directly into Lib and not Lib/site-packages, which it can easily do - it’s just how the files are packaged. It also saves having to convince Python to find another search path on startup, which is also way more messy than it should be. But of course, the transition costs are pretty significant here, even if the end result might be smoother in some ways.


  1. Big aside, this was my idea behind PEP 582 inheriting system packages by default - if you start from a Conda env, you want all those packages, and then layer on your app-specific ones in a separate directory.

  2. Again, part of the 582 workflow I’d imagined.

Yes that’s what I mean.

How you’re describing it should work is how it works today on distros like Debian that have a directory for apt-get installed packages to get installed into (/usr/lib/.../site-packages/) and a directory for pip installed packages to get installed into (/usr/local/lib/.../site-packages/). That’s exactly what that section that I linked to suggests to do, with what you need to do to make pip prefer the ā€œpipā€ directory.

The linked issue to make this easier to do hasn’t been finished yet, but both Debian and Fedora patch their copies of Python to work as the PEP suggests, and Conda can do the same.

I haven’t tried it recently, but doesn’t pip then ignore anything installed into the apt-installed packages when deciding what to install itself? Or is it already searching all referenced locations and if anything is satisfied from the distro package then it won’t install it again?

The latter.
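To illustrate the behaviour confirmed above (every referenced location is searched, and anything already satisfied there is not installed again), here is a deliberately simplified sketch. It only compares project names and ignores version specifiers, which a real resolver would not, and the function names are made up.

```python
import re
from importlib import metadata
from typing import Iterable, List, Set


def normalize(name: str) -> str:
    """Normalize a project name for comparison (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_names(search_dirs: Iterable[str]) -> Set[str]:
    """Collect the names of all distributions found in the given directories,
    e.g. a distro site-packages plus a local one."""
    return {
        normalize(dist.metadata["Name"])
        for dist in metadata.distributions(path=list(search_dirs))
    }


def names_to_install(requested: Iterable[str],
                     search_dirs: Iterable[str]) -> List[str]:
    """Return only the requested names that no search directory satisfies."""
    present = installed_names(search_dirs)
    return [name for name in requested if normalize(name) not in present]
```

Anything returned by names_to_install would then go into the pip-owned directory, leaving the distro-owned one untouched.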


In that case, yeah it would be ~easy for conda to build packages into a different install directory and then package them up (the full install path is inside each package), or to patch CPython so that when pip queries for its install directory it gets a different one.

But I don’t think it would solve things here any better than [the absence of] the magic file/config setting that allows pip to install by default.

In summary, then, I think what’s being said is that there’s no problem for conda, but it’s something conda needs to deal with - the tools exist but it’s up to them to implement the solution? Or, to put it another way, the PEP doesn’t need to worry about conda except in the very high level sense of “they haven’t done their bit yet”.

Is that right?

I think the tools are already there for Conda to prevent pip from messing with its packages (e.g. omit the RECORD file, add EXTERNALLY-MANAGED to their stdlib directory), and if they want to allow their users to use pip against their own packages without an additional prompt, they can.

The proposal here seems like it would put Conda in the position of blocking pip by default, because a Conda environment doesn’t “look” like a venv, and would force Conda’s users to unblock themselves. I don’t think that’s necessary. Let the distributor decide whether to prevent pip from touching their files, rather than pip deciding it independently.
