PEP 668: Marking Python base environments as "externally managed"

Agreed. It should probably be a separate topic dedicated to collecting use cases for sysconfig, rather than derailing this thread, though.

1 Like

Adding a cross reference to the thread Linux distro patches to `sysconfig` are changing `pip install --prefix` outside virtual environments: the distro's needs for /usr vs /usr/local were discussed there too, and the point of that thread is to figure out/discuss a design for a solution to the problem.

Debian is getting ready to implement this, for Debian 12 (“bookworm”). This will carry into Ubuntu 23.04 (“lunar”) too.

python-pip version 23.0+dfsg-1 includes PEP 668 support, from upstream. This version warns users (who have apt-listchanges enabled) about the feature.

python3.11 version 3.11.2-1 or 3.11.1-3 (depending on timing) will declare itself to be EXTERNALLY-MANAGED. This version will carry a README explaining the situation.

We are already hearing from concerned users, whose workflows are going to get broken. I’m expecting some more of this. I wish it wasn’t right before our freeze in Debian, but that’s the timing that this worked out at. If necessary, we can roll back EXTERNALLY-MANAGED in our python3.11 for bookworm’s release, but I’d like to make this happen…

3 Likes

Excited to hear that Debian’s adopting this!

I think the Debian message could mention the --system-site-packages flag (and equivalents)? That would likely resolve the main concern that the user had (sorry, I didn’t click through to check for replies that might’ve been sent already).

He just followed up himself to say it’s an option, but not a great option.

I’m likely missing context, not being a Debian user, but I would freak out reading this if I didn’t know what was actually going on:

Practically, this means that you can’t use pip to install packages outside a virtualenv, on a Debian system, any more.

This sounds like Debian is breaking me, leaving no choice but to either use virtualenv or switch systems entirely. But in fact I can still install packages outside a virtualenv on a Debian system; I just can’t do that against the Python installation(s) Debian provides. I am not sure whether (or how) this subtle difference matters to certain people, but I do wonder whether the message could be tweaked to be less absolute.

1 Like

That’s a fair point, I’ll try to get that across.

A number of my Alpine-edge based Linux container builds are already broken because of this PEP’s somewhat contradictory guidance regarding containers.

Deciding that “if you want to use pip in a container you must use a venv now” adds ~15MB per container image with no additional functionality. The entire busybox image could fit in the venv overhead four times over. Here are the Dockerfiles I used and the resulting image sizes (arm64 architecture, 6-Feb-2023):

without venv - 84.2MB

FROM alpine:3
RUN apk add --no-cache python3 py3-pip;
WORKDIR /app
RUN set -eux; \
  pip install requests; \
  pip cache purge
ENTRYPOINT /bin/sh

with venv - 99MB

FROM alpine:3
RUN apk add --no-cache python3 py3-pip;
WORKDIR /app
RUN set -eux; \
  python3 -m venv venv; \
  . venv/bin/activate; \
  pip install requests; \
  pip cache purge
ENTRYPOINT /bin/sh

The question isn’t really about containers; it’s about who manages the Python that you’re installing into. For instance, the official Python containers, which have a dedicated Python install that isn’t managed by the OS’s package manager, should not be marked as externally managed, but rather should be managed by pip etc.

I believe the idea is that there is going to be a flag you can pass to override the externally-managed marker file, so you won’t be forced to use a virtual environment. However, system tools rely on the system Python with system libraries even inside of containers, so you may very well break your system if you install things into your system Python using pip.

3 Likes

I understand where you’re coming from here, but I am talking about the specific case of python in containers made by distros like Alpine and Debian. I think we disagree on whether containers are “special”. Some of my concerns are echoed in the PEP:

  1. A distro Python when used in a single-application container image (e.g., a Docker container). In this use case, the risk of breaking system software is lower, since generally only a single application runs in the container, and the impact is lower, since you can rebuild the container and you don’t have to struggle to recover a running machine. There are also a large number of existing Dockerfiles with an unqualified RUN pip install ... statement, etc., and it would be good not to break those. So, builders of base container images may want to ensure that the marker file is not present, even if the underlying OS ships one by default.

So in a way the PEP acknowledges the breakage it will cause in the container world, but then goes on to recommend the breakage anyway by saying “Keep the marker file in container images”. We’ve had ~9 years of people working this way, generally without significant breakage, and if things did break they could always have added a venv on their own without outside steering.

I’m arguing that containers are an exceptional case. In the above example Dockerfiles (which aren’t too far from what people do in real images) I installed Python immediately before first using pip, so no OS-level tools are there to break, except maybe pip itself. I’m generally not going to use Python to shell out to an OS-owned Python program.

Anyway if the idea is to always use venv in containers because there could be package conflicts then in addition to the venv overhead we’ll have the os-owned version of some packages in the container as well as the venv-owned version of those same packages – and that’s not a great outcome for a container image author. Wasted space and a larger number of packages that might get flagged in vulnerability scans are established as anti-practices in the container world.

I’m not shipping a stable OS that happens to have an app in it - I’m shipping a packaged app. I think it’s like a race car where you rip out most of the interior so that what remains is optimized for the car’s single function. Anyway I await the --break-system-packages flag making it into the distros that already have EXTERNALLY-MANAGED implemented. I wanted to give some feedback directly instead of having it arrive second hand.

BTW, if I’m not mistaken, I think this PEP breaks PEP 370 for the majority of Python users (who have installed Python via a package manager), but maybe I’ve misunderstood.

1 Like

Given that you’re configuring the container, and are freely using root permissions to do so, why not just remove the EXTERNALLY-MANAGED file yourself, before you start running pip? If you’re asserting control over the full system stack in the container, you’re entirely within your rights to do that.
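As a concrete sketch (a hypothetical Dockerfile fragment, not an official recommendation; the exact marker path varies by distro and Python version, so the glob is an assumption):

```shell
FROM debian:bookworm
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip
# Remove the PEP 668 marker so pip will install into the system environment.
# The glob covers whichever Python 3.x the distro ships; adjust for your base image.
RUN rm -f /usr/lib/python3*/EXTERNALLY-MANAGED
RUN pip install requests
```

Since you control the whole image, deleting the file once at build time is enough; nothing in a single-purpose container will put it back unless you reinstall or upgrade the Python package.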

1 Like

In addition, you could request that the creators of the base image that you are using do that when they create their image, so that consumers of that image don’t need to deal with this at all.

2 Likes

The quoted passage recommends the opposite! It says that even if Debian ships the marker file, the Debian Docker image could remove it to avoid these issues.

2 Likes

I’ll point out that the PEP doesn’t actually require that EXTERNALLY-MANAGED be used in any specific case; it just defines what happens if that file exists. There is a section, explicitly marked as non-normative, where the PEP authors offer some recommendations about what they think would be best practice under varying conditions, but distros are free to ignore those recommendations where that makes sense.

For the container use case, you’re using a container that doesn’t ship with Python, and you’re ultimately installing it yourself through the package manager. There’s no good way for the package to differentiate between your installation, where this is “safe”, and a standard installation that includes several tools depending on system packages, where it is unsafe.

You’re ripping out most of the interior sure, but you’re also “buying” (downloading) normal street car parts (the apt install python3 package) and expecting them to be satisfactory for your race car use case out of the box.

I think the recommendations in the PEP are still the right thing to do by default here, but if you’re willing to take the functionality of your container image into your own hands, there are options for you to do that:

  1. Delete the EXTERNALLY-MANAGED file.
  2. Use the --break-system-packages flag once it’s available.
  3. Ask the OS distributors to provide a way to configure the python package to omit the EXTERNALLY-MANAGED file.
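For illustration, option 2 looks like this once a PEP 668-aware pip (23.0+) is in the image; the package name is just an example:

```shell
# Per-invocation override of the EXTERNALLY-MANAGED marker:
pip install --break-system-packages requests

# The equivalent environment variable, handy in CI or Dockerfiles:
PIP_BREAK_SYSTEM_PACKAGES=1 pip install requests
```

The per-invocation form keeps the override visible at the call site, which is arguably better for Dockerfiles that other people will read.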
3 Likes

Thanks @dstufft, @merwok, @kpfleming, and @pf_moore. I appreciate the time and your willingness to discuss my objections on the container case.

I don’t think I have much hope of convincing base OS image distributors to ignore the PEP’s “Keep the marker file in container images” recommendations. So I’ll probably wait for --break-system-packages rather than deleting EXTERNALLY-MANAGED because the former might be more informative to people reading my Dockerfiles. And as Donald mentioned, switching to a python:* base image may be the cleanest and easiest option in most cases.

1 Like

FWIW I followed up with #debian-python and the folks I talked to there were amenable to making the EXTERNALLY-MANAGED file a configurable option for their python package, that defaulted to ON, but that someone could reconfigure in an installed system to remove it.

I didn’t open an issue or do more than chat with some folks who happened to be around, so I suspect it won’t get done unless someone spends time opening issues and championing it… but I feel it’s probably a reasonable request that most distros often used in containers provide something like that, if possible.

1 Like

I’m not entirely satisfied with this: it’s usually easier to install Python into a complex base Docker image (e.g. osgeo/gdal, nvidia/cuda) than to add complexity to a Python Docker image. All Dockerfiles and CI scripts that install Python on these images, along with the latest pip (etc.), will need to be updated. (I’m likely to pin pip < 23.0 until the --break-system-packages flag is available, or otherwise try to install an earlier version of the Python package that doesn’t write EXTERNALLY-MANAGED.)

3 Likes

This must be one of the most lopsided PEPs I’ve seen so far.
Breaking all user installs that don’t use a virtual environment
(which happens to be one of the most wasteful constructs, BTW),
just to convenience “distros” completely misses the end-user
perspective; and, at last count, end-users far outnumbered distros,
although if PEPs like this keep getting accepted in an echo chamber
of core developers and distro maintainers, that might change… :wink:

Just out of curiosity: if distro folks prefer virtualenv, why don’t they
encapsulate their system-managed environment in a venv that’s
auto-entered/left when system commands are invoked?

This PEP’s approach is going to cause a lot of grief for virtually
no benefit to the average Python user.

AND, the real problem is that the most likely action end-users
will take is that of least resistance, i.e. predictably the majority
will simply set PIP_BREAK_SYSTEM_PACKAGES=1 (or, edit
pip.conf to the same effect, as passing that long flag every time
isn’t very convenient). In other words, the original problem this PEP
set out to solve is going to persist…
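For reference, that path of least resistance really is just two lines of configuration (assuming pip 23.0+; the file location shown is the Linux per-user default, and other platforms differ, see pip's configuration docs):

```shell
# Persistently disable the PEP 668 guard for this user.
mkdir -p ~/.config/pip
cat >> ~/.config/pip/pip.conf <<'EOF'
[global]
break-system-packages = true
EOF
```
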

This wasn’t really added to appease distros. Quite to the contrary,
a number of them only grudgingly package pip and venv at all, and
stick them in separate packages which aren’t installed by default
when you install their normal Python interpreter and stdlib
packages. The usual sentiment, at least on the GNU/Linux
distribution side, is that if you want to go out of your way to
install packages with alternative package tools and overwrite or
otherwise break things you installed from system packages (or vice
versa), then that’s on you. As is often said in open source, if it
breaks you get to keep the pieces.

Yes this gives distributions a way to set a marker in that system
environment saying that installing packages with pip into that
environment is unsafe, but why should their asserting it get under
your skin? If you really want to use pip to install things into the
system context, this PEP gives you as a user several options:

  1. Vote with your feet and choose a distribution which doesn’t mark
    its system environment as externally-managed because it caters to
    users like you who prefer to mix distribution and language
    ecosystem package managers in a single environment without
    isolation. If there’s not one, this is your opportunity to make a
    new distribution that satisfies your needs, after all they’ve
    been created over less.

  2. Thumb your nose at your chosen distro by removing the marker
    file. Depending on your distro you may even be able to set a
    package manager rule that tells it not to put that file back
    (e.g. dpkg-divert on Debian derivative distros). If you’re
    generating container images for example, this is a simple
    one-liner addition to your Dockerfile or whatever. If it’s in a
    CI job and the system is going to be thrown away as soon as your
    tests are done, an rm in a script suffices.

  3. Use one of the multiple options pip provides to tell it to ignore
    the presence of that marker when installing (command line switch,
    envvar). Yes they’re scary-looking, but that’s just to make it
    clear that when you choose to ignore the advice of both your
    distributor and the Python packaging ecosystem, and subsequently
    end up with a tangled mess of metal where your computer used to
    be, you’re fully aware that you’re on the hook to sort it out
    yourself.
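As a sketch of the dpkg-divert variant of option 2 (Debian-derived systems only; /usr/lib/python3.11/ is an assumed path, so adjust for the Python version your distro ships):

```shell
# Divert the marker file so dpkg won't restore it on package upgrades.
# Requires root; --local means the diversion applies to all packages.
dpkg-divert --local --rename \
  --divert /usr/lib/python3.11/EXTERNALLY-MANAGED.orig \
  --add /usr/lib/python3.11/EXTERNALLY-MANAGED
```

Unlike a plain rm, the diversion survives a reinstall or upgrade of the python3.11 package, which matters on long-lived systems more than in throwaway containers.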

6 Likes

Additionally, this PEP is an opt-in mechanism for Linux distributions, which don’t provide Python for a single purpose; it allows distros to unmix those use cases.
https://peps.python.org/pep-0668/#motivation discusses this.

As it stands, you are as much a user of Python packaging tooling as you are of your Linux distribution. This PEP is primarily a mechanism for Linux distributions to opt in to behaviours that protect the operating system itself from innocent user mistakes and prevent breakage of core OS tooling.