PEP 668: Marking Python base environments as "externally managed"

Thanks @dstufft, @merwok, @kpfleming, and @pf_moore. I appreciate your time and your willingness to discuss my objections on the container case.

I don’t think I have much hope of convincing base OS image distributors to ignore the PEP’s “Keep the marker file in container images” recommendation. So I’ll probably wait for --break-system-packages rather than deleting EXTERNALLY-MANAGED, because the former might be more informative to people reading my Dockerfiles. And as Donald mentioned, switching to a python:* base image may be the cleanest and easiest option in most cases.

FWIW I followed up with #debian-python, and the folks I talked to there were amenable to making the EXTERNALLY-MANAGED file a configurable option in their python package, defaulting to on, that someone could reconfigure on an installed system to remove it.

I didn’t open an issue or do more than chat with some folks who happened to be around, so I suspect it won’t get done unless someone spends time opening issues and championing it… but I feel like it’s probably a reasonable request of most distros that are often used in containers to provide something like that if possible.

I’m not entirely satisfied with this: it’s usually easier to install Python into a complex base Docker image (eg osgeo/gdal, nvidia/cuda) than to add complexity to a Python Docker image. All Dockerfiles and CI scripts installing Python on these images, and installing the latest pip (etc), will need to be updated (I’m likely to install pip < 23.0 until the --break-system-packages flag is available, or otherwise try to install an earlier version of the Python package which doesn’t write EXTERNALLY-MANAGED).

This must be one of the most lopsided PEPs I’ve seen so far.
Breaking all user installs that don’t use a virtual environment
(which happens to be one of the most wasteful constructs, BTW),
just for the convenience of “distros” completely misses the end-user
perspective; and, at last count, end-users far outnumbered distros,
although if PEPs like this keep getting accepted in an echo chamber
of core developers and distro maintainers, that might change… :wink:

Just out of curiosity: if distro folks prefer virtualenv, why don’t they
encapsulate their system-managed environment in a venv that’s
auto-entered/left when system commands are invoked?

This PEP’s approach is going to cause a lot of grief for virtually
no benefit to the average Python user.

AND, the real problem is that the most likely action end-users
will take is that of least resistance, i.e. predictably the majority
will simply set PIP_BREAK_SYSTEM_PACKAGES=1 (or, edit
pip.conf to the same effect, as passing that long flag every time
isn’t very convenient). In other words, the original problem this PEP
set out to solve is going to persist…

This wasn’t really added to appease distros. Quite to the contrary,
a number of them only grudgingly package pip and venv at all, and
stick them in separate packages which aren’t installed by default
when you install their normal Python interpreter and stdlib
packages. The usual sentiment, at least on the GNU/Linux
distribution side, is that if you want to go out of your way to
install packages with alternative package tools and overwrite or
otherwise break things you installed from system packages (or vice
versa), then that’s on you. As is often said in open source, if it
breaks you get to keep the pieces.

Yes this gives distributions a way to set a marker in that system
environment saying that installing packages with pip into that
environment is unsafe, but why should their asserting it get under
your skin? If you really want to use pip to install things into the
system context, this PEP gives you as a user several options:

  1. Vote with your feet and choose a distribution which doesn’t mark
    its system environment as externally-managed because it caters to
    users like you who prefer to mix distribution and language
    ecosystem package managers in a single environment without
    isolation. If there’s not one, this is your opportunity to make a
    new distribution that satisfies your needs; after all, they’ve
    been created over less.

  2. Thumb your nose at your chosen distro by removing the marker
    file. Depending on your distro you may even be able to set a
    package manager rule that tells it not to put that file back
    (e.g. dpkg-divert on Debian derivative distros). If you’re
    generating container images for example, this is a simple
    one-liner addition to your Dockerfile or whatever. If it’s in a
    CI job and the system is going to be thrown away as soon as your
    tests are done, an rm in a script suffices.

  3. Use one of the multiple options pip provides to tell it to ignore
    the presence of that marker when installing (command line switch,
    envvar). Yes they’re scary-looking, but that’s just to make it
    clear that when you choose to ignore the advice of both your
    distributor and the Python packaging ecosystem, and subsequently
    end up with a tangled mess of metal where your computer used to
    be, you’re fully aware that you’re on the hook to sort it out
    yourself.

Additionally, this PEP is an opt-in mechanism for Linux distributions, which provide Python for more than a single purpose; it allows the distros to separate those use cases.
https://peps.python.org/pep-0668/#motivation discusses this.

As it stands, you are as much a user of Python packaging tooling as you are of your Linux distribution. This PEP is primarily a mechanism for Linux distributions to opt in to behaviours that are necessary to protect the operating system itself from an innocent user mistake, and to prevent breakage of core OS tooling.

And, I’ll note that even fairly novice users will see “break system packages” and realise that it’s not exactly a normal thing to do, and that breaking your system is the expected outcome when you do that.

You are not wrong, and we’re cognizant of that.

And, here, I’m going to disagree with you. For starters, it’s well-documented that trying to optimise for an “average” user is not a good way to design things (eg: Jet Cockpits and The End of Average - Partnership for the Future of Learning) and it’s not like we’ve got visibility into every user anyway. Secondly, I don’t think that what you are implying is correct. I think this is, overall, beneficial to the Python + Linux story even if there are going to be migration pains.

Right. And, predictably, that breakage will remain (path of least resistance);
so, the net effect of this PEP is inconvenience to Python users with zero benefit.

A user-friendly solution would protect the OS (and those innocent users who
make mistakes causing breakage) in a transparent way without forcing
users to jump through hoops.

I don’t know… I’ve been using pip for ages and never managed to break
any system packages… :smirk_cat: So, apparently, I’m not the “average
user”… maybe designing for me specifically would be a better idea… LOL

That’s fair. Let’s agree to disagree on that one. Time will tell…

Definitely not :slight_smile: There are thousands of stories out there like “All I did was sudo pip install flask and now my system is broken”… because that pulled in a new version of jinja2 or click and broke some distro-provided tool written in Python.

Yes, seriously. This is one of the single biggest user errors we would get reported on pip.

Put another way, it’s a means of childproofing the tools so that
relative “toddlers” (in a systems administration sense) are less
likely to cut themselves on the sharp edges. Those of us who are
experienced and know the consequences of such choices can still
rather easily take off the training wheels when we want to do ramp
jumps.

You keep stating this as if it is a fact but it’s pure supposition on your part, motivated by the fact that you personally are content to risk mucking up your system python. (Good for you, I guess) It’s really not clear at all that most other users would see a warning about a flag like --break-system-packages and conclude “sure, this is a good idea”.

If you, for example, build your own Python parallel to the system Python and put it somewhere that won’t impact on the system (for example, using make altinstall), by my reading nothing changes. It won’t have a marker file, because you didn’t put one there.

If you want to install packages into a Python that is neither a virtual environment nor a separate installation of Python (sans marker file) that didn’t come with the computer, that leaves the one that did come with the computer - i.e., the system Python.

If you are installing things into the system Python then you are, inherently, not using it as intended. That Python is there to help your computer do operating-system-y things, not to provide a way to develop Python code.

If Python installations that came with the computer were intended to be used that way, surely Microsoft would have just arranged to ship one (and, perhaps, find some obnoxious way for Windows Update to bump the minor version under your nose without warning you - after all, newer = better, right?) instead of going through this Windows Store rigmarole. It’s not as if they care about the bloat.

The distro is not “breaking” anything by trying to prevent you from installing into the system Python. It is protecting the integrity of the system Python - so that system scripts don’t have to think about dependency resolution (or whether the APIs they’re using will get deprecated later), and so that the exposure surface is minimized if someone manages to hijack Numpy or something and inject malware.

The end-user perspective is supposed to be that making a virtual environment based off the system Python gives you a Python that you can play with worry-free, because it has been decoupled from anything important that might depend on it in nebulous ways. A quick check on my system shows that it weighs less than 25MB. Numpy weighs more than twice that by itself (30MB in the Python part and 35MB more in three compiled .so files). I have a “main sandbox” virtual environment with over half a GB of libraries installed.

But actually, that’s misrepresenting it completely. The environment itself weighs almost nothing (less than 16KB on my system of activation scripts, a config file and some symlinks). The weight of a fresh virtual environment is made up almost entirely of pip and setuptools, which the system Python might not have, and which venv might therefore need to download.
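
A rough way to check the size claim yourself: the sketch below creates a virtual environment with `venv.create(..., with_pip=False)`, so ensurepip never runs and no pip/setuptools wheels are installed, then adds up the on-disk size of what was created without following symlinks, and finally asks the new interpreter whether it sees itself as a virtual environment. The helper names are my own (hypothetical), and the byte count will vary by platform and Python version; on Windows the interpreter is not symlinked, so this sketch only requests symlinks elsewhere.

```python
import os
import subprocess
import tempfile
import venv


def make_bare_venv(env_dir: str) -> None:
    # with_pip=False skips ensurepip: no pip/setuptools are installed,
    # and nothing needs to be downloaded.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))


def venv_python(env_dir: str) -> str:
    # Location of the environment's interpreter on Windows vs. POSIX.
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def environment_size(root: str) -> int:
    # Sum sizes with lstat so symlinks count as links, not as their targets;
    # os.walk does not descend into symlinked directories by default.
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def reports_virtual_environment(env_dir: str) -> bool:
    # Inside a venv, sys.prefix is the environment while sys.base_prefix
    # still points at the base interpreter it was created from.
    result = subprocess.run(
        [venv_python(env_dir), "-c",
         "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "sandbox")
        make_bare_venv(env_dir)
        print(f"bare venv size: {environment_size(env_dir)} bytes")
        print("is a venv:", reports_virtual_environment(env_dir))
```

Running `python -m venv` normally (with pip) is what adds most of the weight described above.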

“Protecting the OS” entails protecting against any modification to the site-packages installed in the system Python. Of course, we’re talking about Linux and Python here, so “protection” generally involves still being allowed to do whatever dangerous thing - but you have to jump through hoops. If you weren’t jumping through hoops, and you were allowed to do the thing, we could not say that there was any “protection”.

sudo is one such hoop that is fundamental to Linux (and to the Unices before it). In my distro, there are additional hoops:

  • ensurepip’s bundled pip and setuptools wheels are removed;
  • ensurepip is modified to check whether the system Python was used, and if so put up a lengthy error message explaining the rules that have been laid down;
  • since the data isn’t there by default, you have to get it from the system’s package manager instead (i.e., apt install python3-pip, which in turn is going to ask for sudo).

This represents the system taking over and replacing the default pip install workflow, exactly because that workflow is deemed dangerous. At least to my understanding, any arbitrary piece of the system functionality is permitted to be implemented in a way that requires Python, in turn depending on any arbitrary pre-installed library (whether in the standard library or in something that the distro carefully selected, version-pinned, pre-installed and tested) - without necessarily being documented anywhere - and is thus permitted to break arbitrarily if you try to install, upgrade or downgrade anything. We’re talking about “demons may fly out your nose if you dereference a null pointer” levels of risk here, except now it’s your operating system instead of a random application. Keep in mind that if someone succeeds in putting malware that affects Linux on PyPI and getting downloads, it will be able to wreak havoc from the system Python that would not be possible from a virtual environment, even though the virtual environment is not doing any sandboxing - simply because ordinary system functionality could end up indirectly invoking that code.

And yes, people do sudo pip install all the time even though they clearly shouldn’t, because there are any number of people out there who are willing to counsel you to do that - simply because it “worked for them” for some problem, that one time. If you’ve been actually modifying your system Python without sudo somehow, well… your distro trusts you a lot more than most would.

Yes, and there have been many times that countless beginners writing code in C let their pointers step outside the bounds of an array, and got the right output from the program anyway.

Can we get this confirmed? Also, I use make install rather than altinstall, but it’s still separate from the system Python (/usr/local/bin/python3 separate from /usr/bin/python3).

This is correct. The canonical CPython source is actually entirely unchanged by this PEP, so make install is also unchanged. The PEP only recommends Python redistributors to add the marker file to the compiled Python.
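
If you want to check whether a particular interpreter would be treated as externally managed, here is a minimal sketch based on my reading of PEP 668: the marker is a file named EXTERNALLY-MANAGED in the directory returned by `sysconfig.get_path("stdlib")`, it is ignored inside a virtual environment (where `sys.prefix` differs from `sys.base_prefix`), and it is an INI-style file whose `[externally-managed]` section carries an `Error` message. The function names and the fallback wording are hypothetical, not pip's; pip's actual check may differ in details.

```python
import configparser
import os
import sys
import sysconfig
from typing import Optional

# Hypothetical default wording, used when the marker has no usable message.
GENERIC_MESSAGE = "This environment is externally managed."


def marker_path() -> str:
    # The marker lives next to the standard library of the default
    # installation scheme.
    return os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")


def in_virtual_environment() -> bool:
    # A venv has its own sys.prefix; sys.base_prefix points at the base install.
    return sys.prefix != sys.base_prefix


def parse_marker_text(text: str) -> str:
    # The message is the "Error" key of the [externally-managed] section.
    # Fall back to a generic message if the text cannot be parsed or the
    # key is missing.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error:
        return GENERIC_MESSAGE
    return parser.get("externally-managed", "Error", fallback=GENERIC_MESSAGE)


def externally_managed_error() -> Optional[str]:
    # None means installs into this interpreter are not blocked by a marker.
    if in_virtual_environment():
        return None
    path = marker_path()
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        return parse_marker_text(f.read())


if __name__ == "__main__":
    message = externally_managed_error()
    if message is None:
        print("Not externally managed:", marker_path(), "is absent or ignored")
    else:
        print("Externally managed:", message)
```

Run against a self-built interpreter (e.g. /usr/local/bin/python3), this should report no marker unless you put one there yourself.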

Cool, thanks. Will this then remove the scary warning on running pip as root, replacing it instead with the new warning about running pip against the system Python? I’ve always just ignored that warning since it’s my own Python that I’m installing into.

(Though I am also in the “expecting breakage” camp, happily running unreleased Python versions and then seeing magnificent segfaults and gorgeous desynchronization errors as things get updated and the packages haven’t been recompiled. It’s good fun.)

Yes, replacing the warning is one of the main goals. It can already be disabled now: https://github.com/pypa/pip/pull/10990

I’m not sure when (or if) it will be entirely dropped; feel free to open an issue on GitHub to kick off the discussion.

Excellent, thanks for the clarity!

Now that my account has finally been unblocked, I’d be happy to correct the inaccuracies here.

First, my prediction (not “supposition”) is motivated by the contents of the overwhelming majority of self-help pages returned by a simple Google search on the topic. I included a sample link in my first post above. And, while I did repeat my prediction a few times, I always qualified it with “predictably”, i.e. I did not present it “as a fact”.

Second, I’m not at all “content to risk mucking up your system python”; BUT, I do have a few simple rules of hygiene that completely eliminate that risk, which makes me a collateral victim of this PEP. That’s also what “gets under my skin”, as @fungi so aptly put it. BUT, I can see the value in stopping people without such self-imposed rules from shooting themselves in the foot; especially from the perspective of increased support burden. I just wish that this had been done without forcing the same remedy down the throat of those who don’t actually need it… (see more on that in my next post below)

I can also sympathize with the idea that a big scary flag is likely to make some people think twice before they copy and paste the solution from StackOverflow or some other trendy page. I’d still predict that a good 50% (or more) of people will simply turn off the intended protection, but I obviously cannot know, and on that point you are exactly correct. Time will tell how effectively this PEP works its magic.