PEP 668: Marking Python base environments as "externally managed"

At the “Linux in Distros” sprint at PyCon US in May, we drafted a PEP about making external package managers like apt/dnf/etc. and Python-specific package managers like pip play more nicely together. This includes the “sudo pip install” problem, but it’s a little more general than that.

The short version is it has two recommendations:

  1. When a distro indicates it’s managing a Python installation, tools like pip should only install into a virtualenv, by default (with a way to override it), and show an error message that the distro can customize.

  2. Distros should have two site-packages directories, one for distro-packaged files and one for local-sysadmin-installed files (e.g., /usr/lib/python3.x/site-packages vs. /usr/local/lib/python3.x/site-packages), and tools like pip should only create, delete, or modify files in the latter directory.
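
One way to see which directories are in play on a given interpreter is to ask the standard library. The snippet below is only a minimal sketch and not part of the PEP: the helper name `describe_install_locations` is made up for illustration, and the paths it reports depend entirely on how the distro built or patched Python.

```python
import site
import sysconfig


def describe_install_locations():
    """Return the directories the default install scheme would write to."""
    paths = sysconfig.get_paths()  # default scheme for this interpreter
    return {
        # Pure-Python and platform-specific site-packages of the default scheme.
        "purelib": paths["purelib"],
        "platlib": paths["platlib"],
        # Per-user site-packages (what `pip install --user` targets).
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for name, location in describe_install_locations().items():
        print(f"{name}: {location}")
```

On a distro that follows recommendation 2, one would expect the default scheme to point somewhere under /usr/local while distro packages live under /usr, but that is an expectation to check on the actual system, not something this sketch guarantees.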

Please see PEP 668 and let us know what you think. The PEP has an extended rationale for these recommendations and discusses a couple of alternative approaches.

This has been previously discussed on linux-sig - thanks to all the folks who provided feedback on the draft!


This looks really nice! One small ask: in the Use Cases table, please refer to a ‘single-application container’ instead of a ‘Docker container’. The remainder of the text does this, just the table specifically refers to Docker and there are far more container-image tools than Docker out there :slight_smile:


I didn’t know this was a thing. I should probably share my trials and tribulations trying to wrap Python packages for the Chocolatey package manager (the Windows apt-like package manager). Is there some place where I should post a report or present on it or whatever?


A couple of questions:

  • Anyone have feedback beyond what @kpfleming said above regarding Docker → container?
  • Should @dstufft or @pf_moore be the PEP delegate on this? Or should this be put up to the SC to delegate or decide on?

As a Homebrew user, I don’t care for recommendation 1 (aside from the fact that I feel user installs should be allowed; I didn’t see how/where the actual PEP addresses that). I always pip install packages system-wide, and if I were to use brew to install a Python package, Homebrew would install it inside a virtualenv pipx-style so that it wouldn’t interfere with pip’s operation. (At least, I think that’s Homebrew policy. I recently found that brew installs Mercurial as though it were installed with pip; not sure what’s up with that.)

I’m willing to be told I’m in the minority on this, but I won’t like it if both system-wide and user installs become discouraged.


Regarding PEP Delegate, I don’t think this needs to go to the SC, as it doesn’t involve any changes to Python itself (assuming I didn’t miss anything in my brief skim of the PEP!)

I’m willing to be PEP delegate if necessary (it sort of feels like “interoperability” to me, I guess) but I’d have no objection if someone else wanted to offer to take on the role. I don’t feel that I have any special expertise here.

One thing I would be looking for is a bit more discussion - the linux-sig discussion mentioned was only 6 messages since May, and there’s only a couple of messages here. I’m not convinced that “silence means approval” is sufficient here, it’s difficult to be sure where interested parties hang out, so silence seems far more likely to imply “wasn’t aware of the proposal” in this case. In fact, I’d suggest that the PEP gets a section listing distributions that have confirmed their intent to support this proposal, including the distribution, and a link to where the commitment was made.


It wouldn’t be me, it’d either be Paul (probably?) or the SC.

To be clear, these are recommendations for distros, not for the general Python world. It would be up to Homebrew to decide which of these recommendations make sense for them. If they’ve already taken the approach that the default Python installation area is for users and they’re putting their installed tools into virtual environments, then they might continue to do that and ignore that recommendation (or they might decide to change!).

The PEP doesn’t mandate anything in that recommendations section, that just exists for documenting how a distro can, we think, best interact with the Python world given “typical” constraints given by a distro.

The meat of what this PEP actually specifies can be summarized into:

  • Python installers should not modify file paths outside of the scheme they are targeting (so if you’re set to --user, don’t modify the global one; if you’re modifying the global one, don’t start uninstalling stuff that happens to be on random paths in sys.path).
  • Python installers should look for the EXTERNALLY-MANAGED marker file in its defined location, and if it exists (unless we’re inside of a virtual environment) should refuse to run without some additional flag or confirmation mechanism to override that, as an “are you sure? You’re possibly breaking your system here” check.
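
The second point can be sketched roughly as follows. This is an illustration under stated assumptions, not pip's actual implementation: the function names and the `override` parameter are hypothetical, the marker is assumed to live in the interpreter's stdlib directory as reported by `sysconfig` (which is where the PEP text places it), and the optional message is assumed to be an INI-style `Error` key in an `[externally-managed]` section.

```python
import configparser
import sys
import sysconfig
from pathlib import Path

MARKER_NAME = "EXTERNALLY-MANAGED"


def in_virtual_environment():
    """A venv-style environment has a prefix that differs from the base one."""
    return sys.prefix != sys.base_prefix


def marker_path(stdlib_dir=None):
    """Assumed marker location: the stdlib directory of the default scheme."""
    base = stdlib_dir if stdlib_dir is not None else sysconfig.get_path("stdlib")
    return Path(base) / MARKER_NAME


def read_error_message(marker):
    """Return the distro-provided message, or None if it cannot be read."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(marker, encoding="utf-8")
        return parser.get("externally-managed", "Error")
    except (configparser.Error, OSError, UnicodeDecodeError):
        return None


def should_refuse_install(stdlib_dir=None, in_venv=None, override=False):
    """Hypothetical decision helper: refuse only outside a venv, without override."""
    if in_venv is None:
        in_venv = in_virtual_environment()
    if override or in_venv:
        return False
    return marker_path(stdlib_dir).is_file()
```

An installer using something like this would print the distro's message (falling back to a generic one) and exit before touching any files when `should_refuse_install()` returns true.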

Everything else is justifications, deep dives into how these two changes will affect different scenarios, recommendations that we felt are good “do this, unless you have a good reason” defaults for distros, and what other alternatives we explored.

If a distro doesn’t add the EXTERNALLY-MANAGED file, very little should change. The edge case here is if the distro is patching Python or otherwise causing additional paths to show up on sys.path where pip would previously uninstall stuff from, but under this PEP, the first item I listed means it would no longer touch those files. I think this is a better default anyways as it’s far less likely to randomly break things, but it is potentially a minor backwards incompatibility.
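
To make the “only touch the target scheme” rule concrete, here is a small sketch of the kind of containment check an installer could run before deleting or overwriting a file. The helper names are hypothetical and this is not how pip is implemented; it only assumes that the target scheme's directories can be read from `sysconfig.get_paths()`.

```python
import sysconfig
from pathlib import Path


def target_roots(scheme=None):
    """Directories an installer may touch for the chosen sysconfig scheme."""
    paths = sysconfig.get_paths(scheme) if scheme else sysconfig.get_paths()
    return {Path(paths["purelib"]), Path(paths["platlib"])}


def is_inside(path, roots):
    """True if `path` is located under any directory in `roots`."""
    candidate = Path(path).resolve()
    for root in roots:
        try:
            candidate.relative_to(Path(root).resolve())
            return True
        except ValueError:
            # Not under this root; try the next one.
            continue
    return False
```

With a check like this, an existing copy of a package that merely happens to be importable from another sys.path entry would be left alone during an upgrade, which is the behavior change described above.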


Some additions to what Donald has already said.

Packages installed in the user site are about as likely to break system-wide packages as those in the system site, so the PEP’s recommendation is to also block user-site installs by default.

I mentioned the PEP’s existence to some of the Homebrew maintainers before it was published. Homebrew does not currently have any concrete policies around third-party Python packages; most of them use a virtualenv-style installation because a) that works for most packages and b) most formula authors just copy the approach from an existing formula :stuck_out_tongue: This PEP will likely only slightly impact Homebrew (if they adopt it), and should have no negative impact on it. (It would impact users who pip install stuff directly against the Homebrew Python, of course, but my impression is that the Homebrew maintainers aren’t very concerned either way: not about that breaking Homebrew packages, nor about users no longer being able to do it in the future.)

On the Homebrew topic, I hope people are familiar with this: “Homebrew Python Is Not For You” by Justin Mayer.

I am one of the maintainers of the Python interpreter in Fedora, Red Hat Enterprise Linux (RHEL), and CentOS. My main focus is on Fedora, but Fedora eventually defines what things are going to look like in RHEL/CentOS. I’ve been selected to summarize our feedback.

tl;dr we want to participate as much as possible, but we have some small concerns and we want to test it out in practice before it is approved

What we currently do

This also applies to Python 3.6+ in RHEL 7, 8, and future 9 as well as the appropriate CentOS Linux/Stream releases.

  1. We patch distutils, as is mentioned in the PEP. The patch makes sure that we install to /usr/lib(64)/python3.X/site-packages when we create RPM packages, but to /usr/local/lib(64)/python3.X/site-packages for other use cases. The patch is indeed conceptually the sort of hook envisioned by bpo-43976, except implemented as a code patch to distutils instead of as a changed sysconfig scheme. That difference is quite important, stay tuned.

  2. We also patch pip to prevent uninstallation from /usr/lib(64)/python3.X/site-packages when it is upgrading to /usr/local/lib(64)/python3.X/site-packages. This is conceptually what the PEP describes in the Writing to only the target sysconfig scheme, except very hacky and specific to our schemes. I am glad that the PEP addresses this, thank you!

  3. Our modern Python packaging macros use pip to install packages and we remove the RECORD file, as well as set the INSTALLER to rpm. This works nicely to prevent pip uninstalling packages, but will still fail if the user attempts an upgrade if we didn’t have (2). Also, the majority of Python RPM packages in Fedora still use the old macros that install with setup.py install and have egg-info instead of dist-info. Having the marker helps us to address this distro-wide and display a specific actionable error message without patching pip.

  4. Our patch from (1) also patches the site module to include /usr/local... if Python is invoked without -s and we encourage our packagers to use #!/usr/bin/python3 -s in shebangs. Our macros from (3) do that automatically. But some packages need to explicitly see Python packages installed in /usr/local... (e.g. pip-installed plugins) and they cannot use -s.
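
For readers who want to see what (3) looks like from Python, the following sketch lists each installed distribution together with its recorded installer and whether a file list is available. It relies only on `importlib.metadata` from the standard library; the helper names are made up for illustration, and treating `files is None` as “no RECORD” is a simplification (egg-info installs use other files).

```python
from importlib import metadata


def installer_of(dist):
    """Content of the INSTALLER file (e.g. 'pip' or 'rpm'), or None if absent."""
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


def summarize(path=None):
    """Map distribution name -> (installer, has_file_list)."""
    dists = metadata.distributions(path=path) if path else metadata.distributions()
    summary = {}
    for dist in dists:
        name = dist.metadata.get("Name")
        if name:
            # `files` is None when no RECORD (or equivalent) file list exists.
            summary[name] = (installer_of(dist), dist.files is not None)
    return summary


if __name__ == "__main__":
    for name, (installer, has_files) in sorted(summarize().items()):
        print(f"{name}: installer={installer}, file list={has_files}")
```

On a system packaged as described above, RPM-installed distributions would be expected to show up with installer `rpm` and no file list, which is what stops pip from uninstalling them.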

What problems do we face

  1. Distutils are deprecated and our patch patches distutils instead of sysconfig. Once distutils are gone, we will no longer be able to patch them in the standard library and we would need to apply patches to setuptools/pip instead which is problematic because users tend to upgrade to newer versions of pip/setuptools from PyPI. We have discussed this at length with interested parties (namely @jaraco, @pradyunsg, and a bit with @doko42 and @steve.dower) – a plan emerged and was summarized in PEP 632 discussion.
    We planned to redo our patch in Python 3.10+ to patch sysconfig instead. See our proof of concept as a GitHub pull request. We have submitted several patches to CPython upstream, namely merging distutils.sysconfig into sysconfig and making distutils load install schemes from sysconfig.
    When everything was ready upstream, we attempted to update our patch to patch sysconfig, only to realize there were unexpected consequences: the distutils patch was quite limited in its scope and impact (package installation), while the sysconfig one leaked into many different places. Currently, we know how to solve some of the issues we found, but we did not finish this in time for Fedora 35, which introduced Python 3.10, due to other urgent tasks, and we plan to continue with Python 3.11.

  2. Pip requires patching. That means once the user does pip install -U pip, the patch is gone. And the next invocation of pip install -U setuptools might as well wipe the RPM-installed setuptools (making it invisible for system tools with #!/usr/bin/python3 -s shebangs, which was a disaster until recently, because entry_point console scripts used to import from pkg_resources (part of setuptools), but even now is a big problem). We were unable to solve this properly and once again, I am very glad that this PEP solves it.

  3. Many people did not understand/like the /usr/local... change when it was introduced, and they also don’t really understand/like the RECORD-less installations; see for example one recent bug that was reported because get-pip.py is unable to uninstall system-installed pip.

PEP 668 in Fedora

We want to implement this PEP in Fedora, as it will allow us to drop most of our patches (or do them in an upstream-supported way) and hopefully solve the problems. However, our implementation heavily depends on bpo-43976, which is kinda mentioned in the PEP but is not described in much detail. As with our previous attempts to do this, we expect the devil to be in the details and we would like to have a working proof of concept before this PEP is approved. Making bpo-43976 work in Fedora will be my priority for the upcoming months, but unfortunately we still have some details to figure out (such as the -s behavior). Ideally, we would play with the implementation for a while before this PEP is approved. Is that acceptable to you? A draft pull request for pip and Python would be really helpful.

Our concerns in the current draft

  1. The PEP recommends removing the marker file in container images. It specifically mentions dpkg-based systems, however, this will be really tricky in RPM-based systems. We don’t control all places where Fedora creates container images to be able to explicitly rm that marker. We don’t want to remove it in the post-installation scriptlet, because there is no “configuration flag” here: the installation of packages is not interactive. We could possibly package that marker as a separate package, marking it Recommended by Python, but that only makes it even harder for us to influence what installations will (not) have it.
    We would prefer if the marker is recommended to be installed by distributors even in base container images and people who build on top of those images would explicitly remove it if they want to use pip in a way that conflicts with the purpose of that marker.
  2. The PEP recommends that we make Python require pipx. That is extremely unlikely to happen. Our Python does not even require a system-installed pip. Our Python has venv working out of the box as well as ensurepip because we explicitly want to avoid problems of that thing being broken. pipx is not part of the standard library and we don’t want to pull it into systems that only have Python installed as a dependency of e.g. dnf. There is a certain compromise between “Python is too bloated” and “standard things work out of the box” and requiring pipx is way too far.
  3. bpo-43976 is not part of this PEP’s specification.

Thank you for working on this and trying to standardize things! You rock.


I think the idea we had was that prior to having the built in mechanism for extending sysconfig, distros could continue to, or start patching sysconfig until the upstream supported patch was finished (and if you wanted that on older Pythons you would have to backport it anyways).

We also have the provisional state where we can accept an idea in theory with the details we think are going to work, and then continue to do refinements as we get more real world use or implementation discovered issues.

Ultimately though, from my POV distro packagers like yourself are the main “customer” served by this PEP (though it of course makes things better for pretty much everyone who touches a distro or uses a distro or develops software that gets packaged for a distro), so I’m personally more than happy to move at whatever pace distro packagers want to move on this change. The other authors might feel differently, I dunno.

Note that this is part of a non-normative section and represents what we thought would likely arise as best practices, but which are not required in any way. The goal of that was that container images are kind of weird in that their somewhat ephemeral nature means that the breakages this attempts to avoid are lessened, and we didn’t want to just suddenly break all Docker users who are using a system-installed python + pip.

I think it would be entirely fine for Fedora to make its own recommendation here that better matches how that system works. I also think it would be fine for us to update those recommendations as we get more real-world experience with any unforeseen side effects or interactions and with what the best way to avoid or mitigate them ends up being.

Like above, this is part of the non-normative general recommendations section, which you should feel free to ignore. I don’t think the intention here is that you can’t have python packaged & installed without getting pipx (or similar). Rather, it’s an attempt to recognize that users are less likely to reach for a hypothetical “just do what I said, even if it breaks my system” switch in pip (or whatever installer they are using) if they already have another alternative at hand that they can be recommended to use.

Thus “consider packaging pipx or similar and incorporating that into the error message” is the biggest point for that section IMO. Then the installing it by default thing is just further paving the cowpath to getting users to use it. I will note that the recommendation for that is to install it for “python for end users”, not “python”. Maybe Fedora doesn’t have a distinction like that, I’m not sure, but the latest idea in Debian land is that the python3-full package is intended for end users to get a full setup for using python itself as an end user, but that other packages would depend on some other python package that doesn’t install pipx.

As above though, this section is non normative and should be ignored if the recommendations don’t make sense in a particular situation.

Others may feel differently, but I think our point of view here was that this PEP assumes that either that patch will land OR distros will continue to (or start) patching sysconfig.

However, even if sysconfig remains unpatched, that just means that this PEP loses some safety features around the hypothetical “I know what I’m doing, please pip install anyways” flag, because it is assumed that distros will, at a minimum, add the externally managed marker so that flag would end up being required for an end user to even be in a situation where sysconfig patches matter.

tl;dr:

I think we can move at whatever speed distros need, but I don’t think waiting for bpo-43976 is required if a distro is willing to either patch sysconfig and/or only have one layer of protection before that lands. The rest is non normative recommendations which you should feel free to ignore if they don’t make sense for Fedora, but if there are better recommendations it is certainly something that can change too.

Same here. @FFY00 would be able to provide some more context. I believe the intention is to move PEP 668 and bpo-43976 independently, since the former still provides much benefit without the latter, and the latter does not need an Informational PEP (although it may be nice to have one of its own describing why and how distros may use the feature).

Yes, these two improvements are parallel and are being driven independently. The purpose of the PEP is simply standardizing an externally managed marker, not CPython customization mechanisms. The PEP does acknowledge some possible mechanism to achieve such customizations, but that is only informational.

Perhaps it would make sense to split the PEP into two, one for the EXTERNALLY-MANAGED specification, and an informational one describing the issues and possible solutions.

Hmm… thinking about the container recommendation a bit more — I think we want the containers to have the same protections of not clobbering distro-managed packages even inside the containers.

It’s not like container environments are fundamentally different in terms of how the packages are laid out on the file system. The fundamental problem of clobbering over each other in the same directory remains unchanged and we shouldn’t recommend removing the protections we’re adding in this PEP within containers.


Things seem to have settled down here.

So far, it seems like the actionable bits here are:

  • Add a list of folks/distros/redistributors who we’ve reached out to, and explicitly list them in the PEP with their response that this works for them.
  • Flip the container recommendation, to recommend including the same protections in a container.

If that’s really all, I’ll try to file a PR for this by Friday/this weekend. If anyone beats me to it, all good.

I think an informational PEP, described explicitly as a core Python PEP rather than “just” a packaging PEP, and ratified by the SC, would be useful here.

It could describe all of the machinery involved, and give a full description of what 3rd parties maintaining any sort of “system package manager” tool need to do in order to interoperate with the Python ecosystem.

It wouldn’t cover just Linux distros at that point - it would also cover cygwin, conda, msys2, homebrew and even things like embedded systems that want Python packages to work cleanly with their toolsets.

An alternative to a PEP would be a section in the Python docs (maybe under “Setup and Usage” although that’s more about the user experience than the redistributor experience). But traditionally I think informational PEPs have been the vehicle for this sort of document.


I wasn’t going to speak up, as the current state of things is not really a problem for us, but ActiveState has no issues with this PEP and will implement it as soon as it is accepted.

Thanks to everyone that contributed to this PEP! It is much more detailed than anything I was imagining at PyCon.


I hope I am not too late (looking at the Sep 16 submission information from @pradyunsg), but as I was encouraged several times to contribute and help with making PEP668 better, I would like to make some proposals in response to those requests.

TL;DR: I have two proposed amendments to PEP 668, specifically in the area of containers, where I consider I have quite some experience. I would love to add more clarifications to the “container” case briefly described in PEP 668. That includes removing the “marker removal” recommendation for containers, accompanied by a set of best practices, guidelines, and recommendations for image writers to help and encourage them to follow PEP 668 and use venv in their images. While discussing and reading the PEP I had the feeling that this subject has been a bit neglected (possibly even for a good reason) in PEP 668, but it could be treated with a bit more care (and I am happy to take care of it).

Apologies for the long message that follows, but I think not everyone knows the context, and I would like to introduce my experience and findings to build some trust that I could take care of it.

Some of my background

I have worked on Apache Airflow for years. It is one of the most complex open-source Python projects when it comes to dependencies (with >500 dependencies overall), and it is accompanied by very well developed, community-maintained, and highly optimized Dockerfiles and Docker images. We manage those for three purposes: development environment, CI, and production. The image and CI we have also include extensive automated testing of the image and automated management of those >500 dependencies, which is not easy for multiple reasons, the main one being that Airflow is both an application to install and a platform that allows users to write and execute their custom Python code. This creates a unique set of challenges: we need fixed dependencies to install Airflow, but also open dependencies that let our users install their own versions of dependencies and write their own custom code.

Airflow is also not the only project where I worked on Python-based images. I worked for 1.5 years at NoMagic.ai (a robotics + AI startup in Poland), where I moved the company to a Docker-based environment in which we ran both development and production of Python ROS (Robot Operating System) with Nvidia GPU-accelerated simulations, and this is where I learned the ins and outs of building Python-centric Docker images. I consider Python’s ROS the second most complex Python project out there when it comes to dependencies (I think Airflow beats it by just a bit, but I might be wrong).

I am also one of the few lucky people who are fully focused on contributing to open source. This is my daily job. I am an independent contributor, with parts of my time paid by several open-source stakeholders, but I have a lot of freedom to choose what I work on in my day job (plus I tend to spend 50% of my other free time contributing to OSS, especially Apache; I am a member of the Apache Software Foundation). Bringing my combined Python and container-image experience to bear is something I would love to help others with.

Where I lack the experience

First, sincere apologies for not being proficient with the PEP process. I am an experienced contributor, committer, and PMC member of Apache Airflow, and I created and led to completion quite a number of AIPs (Airflow Improvement Proposals), but the PEP process is something I have no experience with. So I would really love some guidance on how (if) I can make my proposals happen.

Also, the subject of PEP 668 is relatively new to me: I was only recently made aware of it and I have read a lot since. I understand what it does and where it came from, but I likely do not have the full context, so apologies if I state the obvious or if the points I raise have been discussed already.

Maybe it will even turn out that what I propose should not be part of PEP 668, or that it needs a follow-up for some of the details, but in that case I’d love to hear some guidance and suggestions on what to do and how to approach it (and how to make sure PEP 668 can be amended or linked to the proposal in such a way that the current recommendation will not undermine it).

Context

A bit more context from my side, as only a few people from the discussion here were involved with Disable warning from pip install when executed as "root" user. · Issue #10556 · pypa/pip · GitHub. Initially I was quite opposed to the way pip currently complains about running as root and directs users to virtualenv instead. I still personally do not like the message there, because it is ambiguous and the problem seemingly has nothing to do with the proposed remediation. But as I understand it, the wording is a by-product of “virtualenv” being “recommended” only “between the lines”, and not wholeheartedly and straightforwardly presented as the “only future-proof solution”. Since a lot of the people who complained about the “unremovable warning” came from the “container” world, I figured that PEP 668 could be a bit more detailed and “bold” in proposing it, but that it should be accompanied by good practices, recommendations, and a rationale that make it easier to understand why and how PEP 668 and “going venv” is also good for containers (as this is not at all obvious or clear, and the current PEP 668 sends ambiguous messages about it).

I admit that the discussion was possibly not always going in the best direction from my side either, but after reading this discussion, re-reading PEP 668 several times, and reading the history of a dozen similar issues, I understand much better why virtualenv is the way forward for containers too. I asked a few clarifying questions, but I found out that asking too many overly precise questions might be too much of a demand (though I personally believe that challenging the status quo, asking questions where you have doubts, and generally being curious is a good thing), so I decided to try it out and answer the questions myself by converting Airflow to use it and fixing the problems along the way:

Proposal

Here is the gist of my proposal:

  1. Following @hroncok’s suggestion, I think “removing marker files in container images” is not the best recommendation. Even if it is non-normative, it is still part of the recommendation and people might be quite misled by it. I think there is no good reason why containers should be different, and a somewhat stronger statement there would not hurt.

Especially that

  2. To make it more helpful to image creators who might have doubts similar to the ones I had initially, I think we can extend the container part of PEP 668 with a set of recommendations, dos and don’ts, and best practices for people who build their images. I have quite a good set of findings and recommendations for container images based on the exercise I’ve done for Airflow. I know it’s not “comprehensive” enough to cover all the ways container images with Python are built. But I have quite extensive experience from years of developing images, mostly including Python, and I have already gone through the exercise of converting the Airflow images, as well as finding fixes and workarounds for the issues I anticipated it would bring (documented in the “Disable Warnings” issue above). The Airflow images are very sophisticated, allow for both extension and customization, have gone through many iterations, and serve many cases. You can see our docs in the Airflow docs (sorry, as a new user I can only add two links in the post) and watch the 45-minute “Production Docker Image for Apache Airflow” talk I gave last year at the Airflow Summit 2020, where I explain the whys and whats and provide more context on the decisions made there: Production Docker image for Apache Airflow - YouTube

Maybe this is a bold statement, but during the discussions and the image conversion I implemented, I think I identified and figured out how to address most of the issues people might have when converting to venv-based images. In any case, I am willing not only to write it up but also to extend, discuss, and defend it, and generally to become a co-author of the PEP and its follow-ups when it comes to the container part. I think I have all the experience and skills (and time) needed for that.

Recommendation/best-practice areas

During the discussions and testing I identified the following areas that need explanation/clarification, and I think once we do it, we could change the recommendation to also make venv and marker files first-class citizens in container usage for Python:

  • the impact of venv on the size of certain container images (I think this means recommending alternatives for the Alpine image, which is particularly affected, and providing some basic calculations to support a conscious “yes, it will be a bit bigger, but this is justified” decision)
  • recommended ways to use venv (fixed paths and cloning) in order to optimize image sizes (including multi-segmented images)
  • ways to share a venv between multiple (and often arbitrary) users, which is necessary for OpenShift best practices for writing good images
  • recommended ways (and needs) to activate the venv inside the images, covering multiple cases: regular users, sudo, sudo with interactive login. This is the most needed part, because Dockerfiles work a bit counter-intuitively for users who are used to terminal sessions, and some packages, even popular ones, are still not compatible with the “obvious” way of adding the venv bin folder to the PATH
  • guidance on creating a venv from a venv. This is an edge case, but one that caused me a lot of headaches when converting Airflow to a venv-based image (though I think I solved all those problems and can come up with a good set of recommendations).
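
As a rough illustration of the “fixed path” and “activation via PATH” points, here is a sketch using only the standard library `venv` module. It is not taken from the Airflow images: the helper names are hypothetical, any path such as /opt/venv is just an example location, and the environment it builds only mimics the main effects of the `activate` script (setting VIRTUAL_ENV, prepending the venv's bin directory to PATH, and dropping PYTHONHOME).

```python
import os
import sys
import venv
from pathlib import Path


def create_fixed_venv(env_dir):
    """Create a virtual environment at a fixed, known location (no pip bootstrap)."""
    venv.create(env_dir, with_pip=False)
    return Path(env_dir)


def venv_bin_dir(env_dir):
    """Directory holding the venv's executables on this platform."""
    return Path(env_dir) / ("Scripts" if sys.platform == "win32" else "bin")


def activated_environ(env_dir, base_environ=None):
    """Environment variables that mimic `activate` for a child process."""
    env = dict(os.environ if base_environ is None else base_environ)
    env["VIRTUAL_ENV"] = str(env_dir)
    env["PATH"] = str(venv_bin_dir(env_dir)) + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)
    return env
```

In a Dockerfile, the same effect is usually achieved by setting the PATH environment variable once for all later layers; as noted above, that alone may not cover sudo or interactive-login cases.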

Looking forward to comments and suggestions. Apologies for any mishaps I might have made not knowing the etiquette here. I’d really love to help make Python + images work better, and I look forward to helping with that.


Hello there. It’s been several days now and I’ve been patiently waiting.

I’d love to hear some comments. Is it possible to propose some clarifications to the “container” part of PEP 668? Has it been submitted for approval yet? How can I propose the changes, @pradyunsg?

The comment here: Disable warning from pip install when executed as "root" user. · Issue #10556 · pypa/pip · GitHub suggests it has not, and from my own experience I think adding some clarifications for container environments might make it much more appealing for container image creators to accept that “venv is also ok for containers”. So if the goal is to make it more likely to be adopted, I’d love to improve it.

Is there any guidance that could help me with making the proposal?


Hi @potiuk, thank you for giving feedback. The PEP is still in draft, so it is possible to change it. Unfortunately, I think all of the authors are fairly busy, so I think probably the best move forward is to open a PR with your proposed changes and ask for reviews.

I read your post when you originally commented, but haven’t had time to properly review it. I am now skimming it to give you some feedback. Here are some of my initial thoughts:

While I acknowledge this might not be the best practice, not following it will break a lot of setups. This was the main motivation behind the recommendation.
Perhaps the PEP does not accurately reflect this situation, so it could be updated to better do so, and maybe only recommend this action when it is desired to keep backwards compatibility. Though I am not really sure what this might mean in the long term; this is just a thought.

One thing to note here is that Airflow has a very specific use-case, which is not at all representative of the majority of the users, who are the main target. It does, however, represent the needs of more complex projects, which we should still attend to.
The PEP is already very extensive, so perhaps this proposal could be split up to its own informational PEP. I think documenting your experience would be incredibly valuable, I am just not sure if this PEP is the best place to do so.