PEP 704 - Require virtual environments by default for package installers

pradyunsg · February 5, 2023, 1:10pm

Fascinating! OK, my understanding, based on my limited use, the Anaconda post “Using Pip in a Conda Environment”, frequent communication that pip-directly-in-Conda can (and does!) break conda environments, and the fact that Conda’s pip interoperability is experimental, is that we don’t want people mixing them.

I guess you’re saying that we do want them to do that, and should be treating conda’s python differently?

I just tried this (again), and it seemed to work fine. Am I missing something?

❯ conda create -n py310
❯ conda activate py310
❯ conda install python=3.10
[...]
❯ conda install numpy
[...]
❯ python -m venv --system-site-packages example
❯ ./example/bin/python -c "import numpy; print('good')"
good
❯ ./example/bin/pip install numpy
Requirement already satisfied: numpy in /Users/pradyunsg/miniconda/envs/py310/lib/python3.10/site-packages (1.23.5)

pf_moore · February 5, 2023, 3:02pm

That sounds like a useful idea to have independently of this issue. But to work well, it would need a standard way to signal “this is a user-activated environment”. Currently, and as part of the language definition, this is done by checking if sys.base_prefix != sys.prefix. I don’t know if conda adheres to this approach, but if it doesn’t, then my first thought would be that they should (I’m taking your comment that conda envs are much more like virtualenvs than system envs as the basis of this view). If it does, then there’s nothing more to be said - we can just use that.

If, for some reason, conda can’t use the difference between sys.prefix and sys.base_prefix the way it’s documented, then IMO we need a new definition, that is agreed between everyone. And that will require conda to explain what they need that the current approach doesn’t do.

If conda does set sys.base_prefix != sys.prefix, what’s going to go wrong if we just say pip can install into such environments, and will need an override to install if sys.base_prefix == sys.prefix? Ultimately, isn’t it just the case that pip/virtualenv and conda interact a bit uncomfortably, and we’re trying to pick where we draw the line, knowing that nothing’s ideal?
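For reference, the check being discussed can be sketched in a few lines. This is only an illustration: the helper name `in_virtual_environment` is made up here, and only `sys.prefix` and `sys.base_prefix` come from the language itself.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the prefixes differ, i.e. a venv-style environment is in use.

    The parameters default to the running interpreter's values; they exist
    only so the comparison can be tried with other paths.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


# Report the result for the interpreter running this snippet.
print(in_virtual_environment())
```

Whether a conda environment reports True here depends entirely on whether conda makes the two prefixes differ, which is the open question above.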

pradyunsg · February 5, 2023, 3:22pm

That’s my sense of the situation as well!

Poking @jezdez since, well, this sounds like a thing where him wearing both pip maintainer and conda maintainer hats would be useful!

CAM-Gerlach · February 5, 2023, 8:42pm

As a sidenote, the reason for opening the discussion on the former approach was that the latter approach was initially suggested to @pradyunsg (by me) in order for this PEP to better interoperate with Conda envs.

Yes, it most certainly can and does, to the point where we made our very first Spyder Says video about it:

However, it is necessary for using PyPI-only packages, editable installs and various other scenarios, and can typically be avoided if you are careful, know what you are doing and understand the basic mechanics involved (namely, installing only “tip-of-the-stack” packages with pip, and avoiding further updates to the existing env). And if things do go wrong, it is cheap to remove and recreate the env—so long as it is not base.

That’s the reason I originally brought up the above: the real harm is pip install-ing into base, and if that could be prevented by default (with EXTERNALLY-MANAGED or this PEP) without interfering with pip install-ing into non-base envs (or maybe adding an opt-out warning) then that would be a big win for Conda users, IMO (considering how many bork their whole Conda install because of it).

The unfortunate truth is that many/most “regular” Conda users are students, scientists, engineers, analysts, etc. and not programmers, and thus don’t have the knowledge and experience to know this (because many/most of the people they’re learning from don’t know themselves, because no one taught it to them).

In fact, even the basic concept of what an environment is and why you should always use one is a difficult one to teach for the first time to students who are graduate-level scientists but with limited programming backgrounds, as I struggled to do for a student just this Friday, who had run into issues with her base environment. This is especially true when those students are already busy and often come to me tired and frustrated, just wanting to get their project/research done and go home, and for whom Python is but one means to an end.

Which I suppose, circling back, is pretty relevant to think about here—while I remain a strong advocate of teaching users (how to) use environments early and often, due to the countless times I’ve seen users struggle with the aftermath of not doing so, it is equally important that we make them as easy and painless to use as practical if we are requiring them.

Because all that is being tested is just that Conda-installed packages in the outer Conda env can be imported inside its Python with the venv when --system-site-packages is set on creation and both envs are activated, in the correct order.

Where things tend to go wrong is everything that comes after: installing further packages in the venv and/or the conda env, attempting to update one or the other, installing duplicate packages in one or the other, activating one but not the other, installing packages with incompatible binary deps, etc., etc. I believe things are less broken than they were in the past, where I’m not even totally sure that the above would work right, but the things that can go wrong in this setup are close to a superset of the issues with installing PyPI packages directly into the Conda env, the limited protections and warnings on the Conda side don’t work, the likelihood of user error is much higher, and problem recovery still requires recreating both (vs. just the Conda env).

Unfortunately, it’s the former and that check won’t work:

$ python
Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:30:19) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.base_prefix
'C:\\Miniconda3\\envs\\sphinx-5'
>>> sys.prefix
'C:\\Miniconda3\\envs\\sphinx-5'

Per the definitions of sys.prefix and sys.base_prefix in the Python docs, sys.prefix/sys.base_prefix is set at install time to a “value for the Python installation”. Since each Conda environment contains its own install of Python (unlike with a venv), this path is thus naturally that of the Python installed into that environment.

And also, per those definitions, site.py will change those paths when a “virtual environment” is in use, a term which links to Python’s venv module, and for which the site.py code has a bespoke check, specifically for virtual environments as created by venv (and similar). Therefore, in both cases, the current behavior seems to be the closest match to what is actually documented, and also the behavior of an unpatched Python installed inside a Conda environment.
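As far as I understand it, that bespoke check amounts to looking for a pyvenv.cfg file next to the interpreter or one directory up. The sketch below only approximates that lookup; the function name `find_pyvenv_cfg` is hypothetical, and the exact logic lives in CPython’s startup code and has moved around between versions.

```python
import os
import sys


def find_pyvenv_cfg(executable=None):
    """Return the path of a pyvenv.cfg beside the executable or one level up, else None."""
    executable = executable or sys.executable
    if not executable:
        # Some embedded interpreters do not report an executable at all.
        return None
    exe_dir = os.path.dirname(os.path.abspath(executable))
    # Check the executable's own directory first, then its parent.
    for directory in (exe_dir, os.path.dirname(exe_dir)):
        candidate = os.path.join(directory, "pyvenv.cfg")
        if os.path.isfile(candidate):
            return candidate
    return None


# Shows the config path inside a venv; a plain install has no such file.
print(find_pyvenv_cfg())
```

A Conda environment normally has no such file, which would be consistent with the two prefixes staying equal in the session above.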

That being said, I’m not an expert on any of this. Perhaps @jezdez could comment more.

rgommers · February 5, 2023, 11:09pm

It’s better not to mix them if you can avoid that, but often you can’t. E.g., niche packages that aren’t packaged for conda-forge, or installs from source (there’s no conda install ., you use pip install .). What @CAM-Gerlach said is all about right. Things have improved since that 2018 blog post you link above - but of course mixing two package managers is more fragile than doing everything with one package manager. When it tends to go wrong is what the very first sentence of that post alludes to: “Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times.”

These are more up-to-date docs: conda user guide - using pip in an environment.

You can even put PyPI requirements to install with pip into an environment.yml file. Basic example:

# To use (you can use `mamba` instead of `conda` if you want):
#
#   $ conda env create -f environment.yml
#   $ conda activate test-env
#
name: test-env
channels:
  - conda-forge
dependencies:
  - python
  - numpy
  - pandas
  - pip
  - pip:
    - a-pkg-from-pypi

@CAM-Gerlach already explained why that doesn’t hold for conda envs. I’d also say that it’s not a definition of user-activated envs, it’s simply an implementation detail of virtualenvs that’s bubbling up there. Think of conda envs simply as more powerful virtualenvs, that can also contain non-PyPI dependencies and Python itself, while sharing the other properties of virtualenvs (creation of named envs with activation/deactivation).

A user-activated environment should probably be defined by the properties it has. There really are only a few: it’s a full-fledged Python environment that packages can be installed to, after an environment with a given name is created and activated. It could have some new property, like sys.is_user_activated_env, which for virtualenvs could be defined as sys.base_prefix != sys.prefix and for other types of envs in a similar appropriate way.
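A rough sketch of what such a property could compute is below. Everything here is hypothetical: `sys.is_user_activated_env` does not exist, the function name is only borrowed from that idea, and treating the presence of a `conda-meta` directory under the prefix as the sign of a conda env is an assumption made for illustration.

```python
import sys
from pathlib import Path


def is_user_activated_env(prefix=None, base_prefix=None):
    """Hypothetical stand-in for the proposed sys.is_user_activated_env."""
    prefix = Path(sys.prefix if prefix is None else prefix)
    base_prefix = Path(sys.base_prefix if base_prefix is None else base_prefix)
    # venv/virtualenv: the two prefixes differ.
    if prefix != base_prefix:
        return True
    # Assumption: a conda env is recognised by a conda-meta directory in its prefix.
    return (prefix / "conda-meta").is_dir()


print(is_user_activated_env())
```

Note that this sketch cannot tell the conda base environment apart from any other conda env, so it would need a further signal before an installer could rely on it.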

Probably, yes. The two are quite different though - using pip inside conda envs is fairly common and there are multiple use cases where it’s the right thing to do (I gave two examples above). Virtual environments on the other hand are completely unnecessary when you’re using conda; they offer nothing one needs, and using a virtualenv derived from a conda env would also require the user to do some quite strange double activation (first the conda env, then the virtualenv).

pradyunsg · February 6, 2023, 8:56am

Let’s pursue the angle of having non-base Conda environments set sys.real_prefix (see PEP 704 – Require virtual environments by default for package installers | peps.python.org) then, along with EXTERNALLY-MANAGED in the base environment?

steve.dower · February 6, 2023, 12:34pm

I’d avoid referring to base Conda environment at all, because it’s not at all related to the base of a venv.

The “base” environment for Conda is more like the hidden pipx environment where conda is installed to. Consider it this way: if conda was rewritten in Rust and the Python dependency was removed, the base environment would go away entirely.

What you want is a marker in the base Conda environment to say “users have to explicitly opt-in to modify this”. And that’s probably the marker that you want in the base Linux distro environments as well. And arguably in every system install by default.

But then for a known single-use install (such as a Docker container, or a temporary CI system, or a Nuget package, or a layout generated from a build), you don’t want that marker, because there’s no reason to discourage those users from installing directly - they’ve essentially already opted into the flag.

So really, is this whole proposal about installer UX in the face of PEP 668? Which seems to be pretty well described in that PEP:

If both of these conditions are true, the installer should exit with an error message indicating that package installation into this Python interpreter’s directory are disabled outside of a virtual environment.

The two tests it describes could be simplified down to “does {sys.prefix}/EXTERNALLY-MANAGED exist?” That way it’s entirely up to the distributor whether the file is there, and we know that existing venvs move sys.prefix away from the base install (which is why sys.prefix != sys.base_prefix) and so won’t see the file. A Conda environment can have the file if they want it (it could be distributed in its own conda package, for example).
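To make the comparison concrete, here is a minimal sketch of both variants. The function names are made up for illustration, and the second function is only my reading of the two PEP 668 tests (not in a virtual environment, and the marker file present in the stdlib directory reported by sysconfig), not pip’s actual implementation.

```python
import sys
import sysconfig
from pathlib import Path

MARKER = "EXTERNALLY-MANAGED"


def marker_in_prefix(prefix=None):
    """Simplified test: does {sys.prefix}/EXTERNALLY-MANAGED exist?"""
    prefix = Path(sys.prefix if prefix is None else prefix)
    return (prefix / MARKER).is_file()


def pep668_blocks_install():
    """Two-part test as described in PEP 668 (a reading, not pip's code)."""
    not_in_venv = sys.prefix == sys.base_prefix
    stdlib_dir = sysconfig.get_path("stdlib")
    return not_in_venv and Path(stdlib_dir, MARKER).is_file()


print(marker_in_prefix(), pep668_blocks_install())
```

With the simplified form, a venv needs no special case: its sys.prefix points somewhere that does not contain the file.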

It could be even simpler for pip, which is going to look in sys.prefix for a conf file, to simply add a configuration option to prevent installs. When using a venv, pip will find a different file (probably none) that doesn’t have the option, so the install will just work. If Conda doesn’t include that file (they won’t), then installs are allowed.

I think we got the problem framing correct in PEP 668. This current proposal is trying to reframe it as “we need to protect distros from their users”, when there’s a perfectly good way for distros to protect themselves.

pradyunsg · February 7, 2023, 8:18am

Indeed. There’s another option that we could take then – to fit Conda’s needs of “don’t touch a package that Conda installed” in pip in addition to the blanket “Don’t install anything here” that we added for Linux distributions.

dstufft · February 7, 2023, 5:32pm

I think all Conda needs to do is “Create separate distro and local directories”.

h-vetinari · February 8, 2023, 12:03am

Conda has no concept of the distinction between “distro-installed packages and […] packages installed by the local system administrator” – there’s no distro where conda is coming from, everything is installed by either the user or a local admin. Furthermore, it’s intentional that each environment has one path ($PREFIX) where everything is found.

The closest thing to that distinction (in spirit) might be the conda base environment (which can/should often not be touched, like the distro packages) vs. user-installed environments, but they are completely separate from the POV of path-lookup and package installation.

That said, I think Steve’s example of “does {sys.prefix}/EXTERNALLY-MANAGED exist?” would be enough to make things work.

dstufft · February 8, 2023, 1:48am

Conda is the distro, all it would need to do is have two site-packages directories in the environment, one that conda packages get installed into, one that pip installs into.

steve.dower · February 8, 2023, 2:06pm

This doesn’t really make sense, at least not without a lot more definition of what you mean by “distro” here (if PyPI is also a “distro”, then sure, but I’m pretty sure you weren’t implying this)

Perhaps what you mean is “the Conda environment and its Conda-installed packages is the distro, and pip-installed packages go into site-packages”? Which does make sense,^[1] but only actually works if pip is searching both directories and then choosing to install only unsatisfied requirements into its own directory.^[2] Without this, it’s just as messy as a venv based on a Conda env would be today.

The best way to make this work really is for Conda to build its packages directly into Lib and not Lib/site-packages, which it can easily do - it’s just how the files are packaged. It also saves having to convince Python to find another search path on startup, which is also way more messy than it should be. But of course, the transition costs are pretty significant here, even if the end result might be smoother in some ways.

^[1] Big aside, this was my idea behind PEP 582 inheriting system packages by default - if you start from a Conda env, you want all those packages, and then layer on your app-specific ones in a separate directory.
^[2] Again, part of the 582 workflow I’d imagined.

dstufft · February 8, 2023, 2:25pm

Yes that’s what I mean.

How you’re describing it should work is how it works today on distros like Debian that have a directory for apt-get installed packages to get installed into (/usr/lib/.../site-packages/) and a directory for pip installed packages to get installed into (/usr/local/lib/.../site-packages/). That’s exactly what that section that I linked to suggests to do, with what you need to do to make pip prefer the “pip” directory.

The linked issue to make this easier to do hasn’t been finished yet, but both Debian and Fedora patch their copies of Python to work as the PEP suggests, and Conda can do the same.
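One way to see which layout a given Python uses is to compare where sysconfig says pure-Python packages get installed with the site-packages directories the interpreter searches. This is just an inspection sketch; the helper name is made up, and on an unpatched Python the install directory will usually be one of the searched directories rather than a separate one.

```python
import site
import sysconfig


def install_and_search_dirs():
    """Return (default install dir for pure-Python packages, searched site dirs)."""
    install_dir = sysconfig.get_path("purelib")
    search_dirs = site.getsitepackages()
    return install_dir, search_dirs


install_dir, search_dirs = install_and_search_dirs()
print("installs go to:", install_dir)
for directory in search_dirs:
    print("searched:", directory)
```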

steve.dower · February 8, 2023, 2:27pm

I haven’t tried it recently, but doesn’t pip then ignore anything installed into the apt-installed packages when deciding what to install itself? Or is it already searching all referenced locations and if anything is satisfied from the distro package then it won’t install it again?

dstufft · February 8, 2023, 2:31pm

The latter.

steve.dower · February 8, 2023, 2:43pm

In that case, yeah it would be ~easy for conda to build packages into a different install directory and then package them up (the full install path is inside each package), or to patch CPython so that when pip queries for its install directory it gets a different one.

But I don’t think it would solve things here any better than [the absence of] the magic file/config setting that allows pip to install by default.

pf_moore · February 8, 2023, 3:02pm

In summary, then, I think what’s being said is that there’s no problem for conda, but it’s something conda need to deal with - the tools exist but it’s up to them to implement the solution? Or, to put it another way, the PEP doesn’t need to worry about conda except in the very high level sense of “they haven’t done their bit yet”.

Is that right?

steve.dower · February 8, 2023, 3:20pm

I think the tools are already there for Conda to prevent pip from messing with its packages (e.g. omit the RECORD file, add EXTERNALLY-MANAGED to their stdlib directory), and if they want to allow their users to use pip against their own packages without an additional prompt, they can.

The proposal here seems like it would put Conda in the position of blocking pip by default, because a Conda environment doesn’t “look” like a venv, and would force Conda’s users to unblock themselves. I don’t think that’s necessary. Let the distributor decide whether to prevent pip from touching their files, rather than pip deciding it independently.

dstufft · February 8, 2023, 4:08pm

All the tools currently exist (though some of them still require patching Python itself until Allow Python distributors to add custom site install schemes · Issue #88142 · python/cpython · GitHub is solved) for someone who is distributing Python to select one of the following behaviors:

1. Allow pip to freely manage Python packages, including uninstalling packages that some other system has installed.
   - This requires doing nothing special, it’s the default you get from Python.
2. Allow pip and another package manager to safely ^[1] cooperate on managing a Python install such that they each install into their own directories, and do not trample over each other.
   - This requires the custom site install scheme thing from above, which Debian, Fedora, etc carry patches to implement it until Python itself provides functionality for it.
3. Disallow pip (by default) from touching their installation, marking it as something that pip shouldn’t be operating on unless you really know what you’re doing.

In the above, (2) and (3) can be combined so that pip won’t touch their Python by default, but if they do it anyways, it will be done in a safe way.

For Debian, Fedora, etc they should implement (2) and (3), Conda should probably only implement (2).

^[1] Safely at the file system level, of course if you’re installing packages you may break Python applications by installing versions of a dependency they don’t expect.

pf_moore · February 8, 2023, 5:15pm

Thanks. So the PEP should probably note this, and say that the proposed change will have an impact on conda users unless the conda developers take the recommended actions. I don’t think there’s a need to add another mechanism just so that conda don’t have to do anything at their end. The impact on users isn’t fatal, the PEP provides for an “opt out” flag (and I’m sure pip will implement that), so it’s an inconvenience rather than a showstopper.

There’s also the question of how actively the conda developers follow this sort of discussion, and as a result whether the changes proposed are going to take them by surprise. I’m pretty sure someone has already pinged @jezdez in this thread, but I just did again in any case.

And of course, it’s not a foregone conclusion that the PEP will be accepted - whether because of this or for some other reason.