Graceful cooperation between external and Python package managers (PEP 668)

At the “Linux in Distros” sprint at PyCon US in May, we drafted a PEP about making external package managers like apt/dnf/etc. and Python-specific package managers like pip play more nicely together. This includes the “sudo pip install” problem, but it’s a little more general than that.

The short version is it has two recommendations:

  1. When a distro indicates it’s managing a Python installation, tools like pip should only install into a virtualenv, by default (with a way to override it), and show an error message that the distro can customize.

  2. Distros should have two site-packages directories, one for distro-packaged files and one for local-sysadmin-installed files (e.g., /usr/lib/python3.x/site-packages vs. /usr/local/lib/python3.x/site-packages), and tools like pip should only create, delete, or modify files in the latter directory.

Please see PEP 668 [will update the link once published] and let us know what you think. The PEP has an extended rationale for these recommendations and discusses a couple of alternative approaches.

This has been previously discussed on linux-sig - thanks to all the folks who provided feedback on the draft!

This looks really nice! One small ask: in the Use Cases table, please refer to a ‘single-application container’ instead of a ‘Docker container’. The remainder of the text does this, just the table specifically refers to Docker and there are far more container-image tools than Docker out there :slight_smile:

I didn’t know this was a thing. I should probably share my trials and tribulations trying to wrap Python packages for the Chocolatey package manager (the Windows apt-like package manager). Is there some place where I should post a report or present on it or whatever?

A couple of questions:

  • Anyone have feedback beyond what @kpfleming said above regarding Docker → container?
  • Should @dstufft or @pf_moore be the PEP delegate on this? Or should this be put up to the SC to delegate or decide on?

As a Homebrew user, I don’t care for recommendation 1 (aside from the fact that I feel user installs should be allowed; I didn’t see how/where the actual PEP addresses that). I always pip install packages system-wide, and if I were to use brew to install a Python package, Homebrew would install it inside a virtualenv pipx-style so that it wouldn’t interfere with pip’s operation. (At least, I think that’s Homebrew policy. I recently found that brew installs Mercurial as though it were installed with pip; not sure what’s up with that.)

I’m willing to be told I’m in the minority on this, but I won’t like it if both system-wide and user installs become discouraged.

Regarding PEP Delegate, I don’t think this needs to go to the SC, as it doesn’t involve any changes to Python itself (assuming I didn’t miss anything in my brief skim of the PEP!)

I’m willing to be PEP delegate if necessary (it sort of feels like “interoperability” to me, I guess) but I’d have no objection if someone else wanted to offer to take on the role. I don’t feel that I have any special expertise here.

One thing I would be looking for is a bit more discussion - the linux-sig discussion mentioned was only 6 messages since May, and there are only a couple of messages here. I’m not convinced that “silence means approval” is sufficient here: it’s difficult to be sure where interested parties hang out, so silence seems far more likely to mean “wasn’t aware of the proposal” in this case. In fact, I’d suggest that the PEP gets a section listing distributions that have confirmed their intent to support this proposal, naming each distribution and linking to where the commitment was made.

It wouldn’t be me, it’d either be Paul (probably?) or the SC.

To be clear, these are recommendations for distros, not for the general Python world. It would be up to Homebrew to decide which of these recommendations make sense for them. If they’ve already taken the approach that the default Python installation area is for users and they’re putting their installed tools into virtual environments, then they might continue to do that and ignore that recommendation (or they might decide to change!).

The PEP doesn’t mandate anything in that recommendations section; it exists only to document how a distro can, we think, best interact with the Python world given the “typical” constraints a distro has.

The meat of what this PEP actually specifies can be summarized into:

  • Python installers should not modify files outside of the scheme they are targeting (so if you’ve set --user, don’t modify the global one; if you’re modifying the global one, don’t start uninstalling stuff that happens to be on random paths in sys.path).
  • Python installers should look for the EXTERNALLY-MANAGED file in its defined location, and if it exists (unless we’re inside of a virtual environment) should refuse to run without some additional flag or confirmation mechanism to override that as an “are you sure? You’re possibly breaking your system here” check.
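
For anyone who wants to see the second point in code, here is a rough sketch of what an installer-side check might look like. It is not pip's actual implementation. It assumes the marker lives in the directory returned by sysconfig.get_path("stdlib"), that it is an INI-style file with an [externally-managed] section and an Error key, and that "inside a virtual environment" means sys.prefix differs from sys.base_prefix. The function names and the default message are made up for illustration.

```python
import configparser
import sys
import sysconfig
from pathlib import Path


def marker_path(stdlib_dir=None):
    """Location of the marker file (assumed: the stdlib directory)."""
    if stdlib_dir is None:
        stdlib_dir = sysconfig.get_path("stdlib")
    return Path(stdlib_dir) / "EXTERNALLY-MANAGED"


def in_virtual_environment():
    """True when running inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def externally_managed_error(stdlib_dir=None, in_venv=None):
    """Return a message if the install should be refused, else None."""
    if in_venv is None:
        in_venv = in_virtual_environment()
    if in_venv:
        return None  # virtual environments are never blocked
    path = marker_path(stdlib_dir)
    if not path.is_file():
        return None  # no marker: behave as before
    # Hypothetical fallback text, used when the distro supplies no message.
    default = "This environment is externally managed."
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
        return parser.get("externally-managed", "Error", fallback=default)
    except (configparser.Error, UnicodeDecodeError, OSError):
        return default  # an unreadable marker still blocks the install
```

An installer would call externally_managed_error() before touching anything and, if it gets a message back, print it and stop unless the user passed whatever override flag the tool provides.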

Everything else is justifications, deep dives into how these two changes will affect different scenarios, recommendations that we felt are good “do this, unless you have a good reason” defaults for distros, and what other alternatives we explored.

If a distro doesn’t add the EXTERNALLY-MANAGED file, very little should change. The edge case here is if the distro is patching Python or otherwise causing additional paths to show up on sys.path where pip would previously uninstall stuff from, but under this PEP, the first item I listed means it would no longer touch those files. I think this is a better default anyways as it’s far less likely to randomly break things, but it is potentially a minor backwards incompatibility.
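
To make the "only touch the target scheme" rule concrete, here is a minimal sketch of the kind of containment check an installer could run before deleting or overwriting a file. It is an illustration only, not how pip does it: target_dirs and may_modify are hypothetical helpers, and only the purelib and platlib entries of the scheme are considered.

```python
import sysconfig
from pathlib import Path


def target_dirs(scheme=None):
    """site-packages directories of the scheme being installed into.

    With scheme=None the interpreter's default scheme is used.
    """
    paths = sysconfig.get_paths(scheme) if scheme else sysconfig.get_paths()
    return {Path(paths[key]).resolve() for key in ("purelib", "platlib")}


def may_modify(file_path, allowed_dirs):
    """True only if file_path lives under one of the allowed directories."""
    resolved = Path(file_path).resolve()
    return any(resolved == d or d in resolved.parents for d in allowed_dirs)
```

With a check like this, a distribution found elsewhere on sys.path (say, in the distro's own site-packages) would be left alone during an upgrade into /usr/local, and the new copy would simply shadow it.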

Some additions to what Donald has already said.

Packages installed in user-site have about the same possibility to break system-wide packages as those in the system-site, so the PEP’s recommendation is to also block user-site installs by default.

I mentioned the PEP’s existence to some of the Homebrew maintainers before it was published. Homebrew does not currently have any concrete policies around third-party Python packages; most of them use a virtualenv-style installation because a) that works for most packages and b) most formula authors just copy the approach from an existing formula :stuck_out_tongue: This PEP will likely only slightly impact Homebrew (if they are to adopt it), and will likely have no negative impact on it. (It would impact users that pip install stuff directly against the Homebrew Python, of course, but my impression is that Homebrew maintainers don’t seem very concerned either way: not about that breaking Homebrew packages, nor about users no longer being able to do it in the future.)

On the Homebrew topic, I hope people are familiar with this: “Homebrew Python Is Not For You” by Justin Mayer.

I am one of the maintainers of the Python interpreter in Fedora, Red Hat Enterprise Linux (RHEL), and CentOS. My main focus is on Fedora, but Fedora eventually determines what things are gonna be like in RHEL/CentOS. I’ve been selected to summarize our feedback.

tl;dr we want to participate as much as possible, but we have some small concerns and we want to test it out in practice before it is approved

What we currently do

This also applies to Python 3.6+ in RHEL 7, 8, and future 9 as well as the appropriate CentOS Linux/Stream releases.

  1. We patch distutils, as is mentioned in the PEP. The patch makes sure that we install to /usr/lib(64)/python3.X/site-packages when we create RPM packages, but to /usr/local/lib(64)/python3.X/site-packages for other use cases. The patch is indeed conceptually the sort of hook envisioned by bpo-43976, except implemented as a code patch to distutils instead of as a changed sysconfig scheme. That difference is quite important, stay tuned.

  2. We also patch pip to prevent uninstallation from /usr/lib(64)/python3.X/site-packages when it is upgrading to /usr/local/lib(64)/python3.X/site-packages. This is conceptually what the PEP describes in the “Writing to only the target sysconfig scheme” section, except very hacky and specific to our schemes. I am glad that the PEP addresses this, thank you!

  3. Our modern Python packaging macros use pip to install packages, and we remove the RECORD file as well as set INSTALLER to rpm. This works nicely to prevent pip from uninstalling packages, but would still fail if the user attempted an upgrade and we didn’t have (2). Also, the majority of Python RPM packages in Fedora still use the old macros that install with setup.py install and have egg-info instead of dist-info. Having the marker helps us to address this distro-wide and display a specific actionable error message without patching pip.

  4. Our patch from (1) also patches the site module to include /usr/local... if Python is invoked without -s and we encourage our packagers to use #!/usr/bin/python3 -s in shebangs. Our macros from (3) do that automatically. But some packages need to explicitly see Python packages installed in /usr/local... (e.g. pip-installed plugins) and they cannot use -s.
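
As an aside on item (3), the INSTALLER and RECORD files can be inspected from Python with importlib.metadata. The sketch below is only an illustration: installer_info is a hypothetical helper, and the value "rpm" is whatever the distro's macros wrote, as described above.

```python
from importlib import metadata


def installer_info(dist_name):
    """Report who installed a distribution and whether RECORD is present.

    Returns None if no distribution with that name can be found.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    installer = dist.read_text("INSTALLER")  # None if the file is missing
    return {
        "installer": installer.strip() if installer else None,
        "has_record": dist.read_text("RECORD") is not None,
    }
```

On a system packaged the way item (3) describes, one would expect an RPM-installed dist-info package to report an installer of "rpm" and no RECORD, which is what stops an installer from uninstalling it file by file.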

What problems do we face

  1. Distutils is deprecated and our patch patches distutils instead of sysconfig. Once distutils is gone, we will no longer be able to patch it in the standard library and we would need to apply patches to setuptools/pip instead, which is problematic because users tend to upgrade to newer versions of pip/setuptools from PyPI. We have discussed this at length with interested parties (namely @jaraco, @pradyunsg, and a bit with @doko42 and @steve.dower) – a plan emerged and was summarized in the PEP 632 discussion.
    We planned to redo our patch in Python 3.10+ to patch sysconfig instead. See our proof of concept as a GitHub pull request. We have submitted several patches to CPython upstream, namely merging distutils.sysconfig into sysconfig and making distutils load install schemes from sysconfig.
    When everything was ready upstream, we attempted to update our patch to use sysconfig, only to realize there were unexpected consequences – namely, the distutils patch was quite limited in its scope and impact (package installation) while the sysconfig one leaked into many different places. Currently, we know how to solve some of the issues we found, but we did not finish this in time for Fedora 35, which introduced Python 3.10, due to other urgent tasks, and we plan to continue with Python 3.11.

  2. Pip requires patching. That means once the user does pip install -U pip, the patch is gone. And the next invocation of pip install -U setuptools might as well wipe the RPM-installed setuptools (making it invisible for system tools with #!/usr/bin/python3 -s shebangs, which was a disaster until recently, because entry_point console scripts used to import from pkg_resources (part of setuptools), but even now is a big problem). We were unable to solve this properly and once again, I am very glad that this PEP solves it.

  3. Many people did not understand/like the /usr/local... change when it was introduced, and they don’t really understand/like the RECORD-less installations either; see for example one recent bug that was reported because get-pip.py is unable to uninstall system-installed pip.

PEP 668 in Fedora

We want to implement this PEP in Fedora, as it will allow us to drop most of our patches (or do them in an upstream-supported way) and hopefully solve the problems. However, our implementation heavily depends on bpo-43976, which is kinda mentioned in the PEP but is not described in much detail. As with our previous attempts to do this, we expect the devil to be in the details and we would like to have a working proof of concept before this PEP is approved. Making bpo-43976 work in Fedora will be my priority for the upcoming months, but unfortunately we still have some details to figure out (such as the -s behavior). Ideally, we would play with the implementation for a while before this PEP is approved. Is that acceptable to you? A draft pull request for pip and Python would be really helpful.

Our concerns in the current draft

  1. The PEP recommends removing the marker file in container images. It specifically mentions dpkg-based systems, however, this will be really tricky in RPM-based systems. We don’t control all places where Fedora creates container images to be able to explicitly rm that marker. We don’t want to remove it in the post-installation scriptlet, because there is no “configuration flag” here: the installation of packages is not interactive. We could possibly package that marker as a separate package, marking it Recommended by Python, but that only makes it even harder for us to influence what installations will (not) have it.
    We would prefer if the marker is recommended to be installed by distributors even in base container images and people who build on top of those images would explicitly remove it if they want to use pip in a way that conflicts with the purpose of that marker.
  2. The PEP recommends that we make Python require pipx. That is extremely unlikely to happen. Our Python does not even require a system-installed pip. Our Python has venv working out of the box as well as ensurepip because we explicitly want to avoid problems of that thing being broken. pipx is not part of the standard library and we don’t want to pull it into systems that only have Python installed as a dependency of e.g. dnf. There is a certain compromise between “Python is too bloated” and “standard things work out of the box” and requiring pipx is way too far.
  3. bpo-43976 is not part of this PEP’s specification.

Thank you for working on this and trying to standardize things! You rock.

I think the idea we had was that, prior to having the built-in mechanism for extending sysconfig, distros could continue (or start) patching sysconfig until the upstream-supported patch was finished (and if you wanted that on older Pythons you would have to backport it anyways).

We also have the provisional state where we can accept an idea in theory with the details we think are going to work, and then continue to do refinements as we get more real world use or implementation discovered issues.

Ultimately though, from my POV distro packagers like yourself are the main “customer” served by this PEP (though it of course makes things better for pretty much everyone who touches a distro or uses a distro or develops software that gets packaged for a distro), so I’m personally more than happy to move at whatever pace distro packagers want to move on this change. The other authors might feel differently, I dunno.

Note that this is part of a non-normative section and represents what we thought would likely arise as best practices, but which are not required in any way. The reasoning was that container images are kind of weird in that their somewhat ephemeral nature means that the breakages this attempts to avoid are lessened, and we didn’t want to just suddenly break all Docker users who are using a system-installed python + pip.

I think it would be entirely fine for Fedora to make its own recommendation here that better matches how that system works. I also think it would be fine for us to update those recommendations as we get more real-world experience with any unforeseen side effects or interactions and what the best way to avoid or mitigate them ends up being.

Like above, this is part of the non-normative general recommendations section, which you should feel free to ignore. I don’t think the intention here is that you can’t have python packaged & installed without getting pipx (or similar). But rather it’s an attempt to recognize that users are less likely to reach for a hypothetical “just do what I said, even if it breaks my system” switch in pip (or whatever installer they are using) if they already have another alternative at hand that they can be recommended to use.

Thus “consider packaging pipx or similar and incorporating that into the error message” is the biggest point for that section IMO. Then the installing it by default thing is just further paving the cowpath to getting users to use it. I will note that the recommendation there is to install it for “python for end users”, not “python”. Maybe Fedora doesn’t have a distinction like that, I’m not sure, but the latest idea in Debian land is that the python3-full package is intended for end users to get a full setup for using python itself as an end user, but that other packages would depend on some other python package that doesn’t install pipx.

As above though, this section is non normative and should be ignored if the recommendations don’t make sense in a particular situation.

Others may feel differently, but I think our point of view here was that this PEP assumes either that that patch will land OR that distros will continue (or start) patching sysconfig.

However, even if sysconfig remains unpatched, that just means that this PEP loses some safety features around the hypothetical “I know what I’m doing, please pip install anyways” flag, because it is assumed that distros will, at a minimum, add the externally managed marker so that flag would end up being required for an end user to even be in a situation where sysconfig patches matter.

tl;dr:

I think we can move at whatever speed distros need, but I don’t think waiting for bpo-43976 is required if a distro is willing to either patch sysconfig and/or only have one layer of protection before that lands. The rest is non normative recommendations which you should feel free to ignore if they don’t make sense for Fedora, but if there are better recommendations it is certainly something that can change too.

Same here. @FFY00 would be able to provide some more context, but I believe the intention is to move PEP 668 and bpo-43976 independently, since the former still provides much benefit without the latter, and the latter does not need an Informational PEP (although it may be nice to have one of its own describing why and how distros may use the feature).

Yes, these two improvements are parallel and are being driven independently. The purpose of the PEP is simply standardizing an externally managed marker, not CPython customization mechanisms. The PEP does acknowledge some possible mechanism to achieve such customizations, but that is only informational.

Perhaps it would make sense to split the PEP into two, one for the EXTERNALLY-MANAGED specification, and an informational one describing the issues and possible solutions.

Hmm… thinking about the container recommendation a bit more — I think we want the containers to have the same protections of not clobbering distro-managed packages even inside the containers.

It’s not like container environments are fundamentally different in terms of how the packages are laid out on the file system. The fundamental problem of clobbering over each other in the same directory remains unchanged and we shouldn’t recommend removing the protections we’re adding in this PEP within containers.

Things seem to have settled down here.

So far, it seems like the actionable bits here are:

  • Add a list of folks/distros/redistributors who we’ve reached out to, and explicitly list them in the PEP with their response that this works for them.
  • Flip the container recommendation, to recommend including the same protections in a container.

If that’s really all, I’ll try to file a PR for this by Friday/this weekend. If anyone beats me to it, all good.

I think an informational PEP, described explicitly as a core Python PEP rather than “just” a packaging PEP, and ratified by the SC, would be useful here.

It could describe all of the machinery involved, and give a full description of what 3rd parties maintaining any sort of “system package manager” tool need to do in order to interoperate with the Python ecosystem.

It wouldn’t cover just Linux distros at that point - it would also cover cygwin, conda, msys2, homebrew and even things like embedded systems that want Python packages to work cleanly with their toolsets.

An alternative to a PEP would be a section in the Python docs (maybe under “Setup and Usage” although that’s more about the user experience than the redistributor experience). But traditionally I think informational PEPs have been the vehicle for this sort of document.

I wasn’t going to speak up, as the current state of things is not really a problem for us, but ActiveState has no issues with this PEP and will implement it as soon as it is accepted.

Thanks to everyone that contributed to this PEP! It is much more detailed than anything I was imagining at PyCon.
