PEP 668: Marking Python base environments as "externally managed"

I wasn’t going to speak up, as the current state of things is not really a problem for us, but ActiveState has no issues with this PEP and will implement it as soon as it is accepted.

Thanks to everyone who contributed to this PEP! It is much more detailed than anything I had imagined at PyCon.

I hope I am not too late (looking at the Sep 16 submission information from @pradyunsg), but since I was encouraged several times to contribute and help make PEP 668 better, I would like to make some proposals in response to those requests.

TL;DR: I have two proposed amendments to PEP 668, specifically in the area of containers, where I have quite a lot of experience. I would love to add more clarification to the “container” case that PEP 668 describes only briefly. That includes removing the “marker removal” recommendation for containers, accompanied by a set of best practices, guidelines, and recommendations that help image writers follow PEP 668 and use a venv in their images. While discussing and reading the PEP, I had the feeling that this subject was somewhat neglected in PEP 668 (possibly for a good reason), but that it could be treated with a bit more care, and I am happy to take care of it.

Apologies for the long message that follows, but not everyone knows the context, and I would like to describe my experience and findings to build some trust that I could take care of this.

Some of my background

I have worked on Apache Airflow for years. It is one of the most complex open-source Python projects when it comes to dependencies (more than 500 dependencies overall), and it comes with very well developed, community-maintained, and highly optimized Dockerfiles and Docker images. We maintain those for three purposes: the development environment, CI, and production. Our image and CI setup also include extensive automated testing of the image and automated management of those 500+ dependencies. That is not easy for multiple reasons, but the main one is that Airflow is both an application to install and a platform that lets users write and execute their own custom Python code. This creates a unique set of challenges: we need fixed dependencies to install Airflow, but also open dependencies that let our users install their own versions of dependencies and write their own custom code.

Airflow is also not the only project where I have worked on Python-based images. I worked for 1.5 years at NoMagic.ai (a robotics and AI startup in Poland), where I moved the company to a Docker-based environment in which we ran both development and production of Python ROS (Robot Operating System) with NVIDIA GPU-accelerated simulations. This is where I learned the ins and outs of building Python-centric Docker images. I consider Python ROS the second most complex Python project out there when it comes to dependencies (I think Airflow beats it by just a bit, but I might be wrong).

I am also one of the few lucky people who are fully focused on contributing to open source. This is my day job. I am an independent contributor, with part of my time paid for by several open-source stakeholders, but I have a lot of freedom to choose what I work on (plus I tend to spend 50% of my free time contributing to OSS, especially Apache projects; I am a member of the Apache Software Foundation). I would love to help others by bringing in my cross-cutting experience with Python, images, and containers.

Where I lack the experience

First, sincere apologies for not being proficient with the PEP process. I am an experienced contributor, committer, and PMC member of Apache Airflow, and I have created and led to completion quite a number of AIPs (Airflow Improvement Proposals), but the PEP process is something I have no experience with. So I would really appreciate some guidance on how (and whether) I can make my proposals happen.

The subject of PEP 668 is also relatively new to me: I only recently became aware of it and have read a lot since. I understand what it does and where it came from, but I likely do not have the full context, so apologies if I state the obvious or if the points I raise have already been discussed.

It may even turn out that what I propose should not be part of PEP 668, or that some of the details I propose need a follow-up, but in either case I would love to hear guidance and suggestions on what to do and how to approach it (and how to make sure PEP 668 can be amended or linked to the proposal so that its current recommendation does not undermine it).

Context

Some more context from my side, as only a few people in this discussion were involved in Disable warning from pip install when executed as "root" user. · Issue #10556 · pypa/pip · GitHub. Initially I was quite opposed to the way pip currently complains about running as root and directs users to virtualenv instead. I still personally do not like the message there, because it is ambiguous and the problem seemingly has nothing to do with the proposed remedy. But as I understand it, the wording is a by-product of virtualenv being “recommended” only between the lines, rather than wholeheartedly and straightforwardly presented as the “only future-proof solution”. Since many of the people who complained about the “unremovable warning” came from the “container” world, I figured that PEP 668 could be more detailed and bolder in proposing it, but that this should be accompanied by good practices, recommendations, and a rationale that makes it easier to understand why and how PEP 668 and “going venv” are also good for containers (which is not at all obvious; the current PEP 668 sends ambiguous messages about it).

I admit that the discussion on my side did not always go in the best direction, but after reading this discussion, re-reading PEP 668 several times, and reading the history of dozens of similar issues, I understand much better why virtualenv is the way forward for containers too. I asked a few clarifying questions, but found that asking too many, too precise questions might be too much of a demand (though I personally believe that challenging the status quo, asking questions where you have doubts, and generally being curious are good things), so I decided to answer the questions myself by converting Airflow to use venv and fixing the problems along the way:

Proposal

Here is the gist of my proposal:

  1. Following @hroncok’s suggestion, I think “removing marker files in container images” is not the best recommendation. Even if it is non-normative, it is still part of the recommendation, and people might be misled by it. I see no good reason why containers should be different, and a somewhat stronger statement there would not hurt.

Especially that

  2. To make it more helpful to image creators who might have the same doubts I initially had, I think we can extend the container part of PEP 668 with a set of recommendations for people who build images: dos and don’ts and best practices. I have quite a good set of findings and recommendations for container images based on the exercise I did for Airflow. I know it is not comprehensive enough to cover all the ways container images with Python are built. But I have extensive experience from years of developing images, mostly including Python, and I have already gone through converting the Airflow images, along with the findings and fixes/workarounds for the issues I anticipated it would bring (documented in the “Disable warning” issue above). Airflow images are very sophisticated, allow for both extension and customization, have gone through many iterations, and serve many use cases. You can see our docs in the Airflow documentation (sorry, as a new user I can only add two links per post) and watch the 45-minute “Production Docker Image for Apache Airflow” talk I gave last year at Airflow Summit 2020, where I explain the whys and whats and provide more context on the decisions made there: Production Docker image for Apache Airflow - YouTube

This may be a bold statement, but during the discussions and the image conversion I implemented, I think I identified most of the issues people might have when converting to venv-based images and figured out how to address them. In any case, I am willing not only to write it up but also to extend, discuss, and defend it, and generally to become a co-author of the PEP and its follow-ups as far as the container part is concerned. I think I have the experience, skills, and time needed for that.

Recommendation and best-practice areas

During the discussions and testing I identified the following areas that need explanation or clarification. Once we address them, I think we could change the recommendation to make venvs and marker files first-class citizens in container usage of Python as well:

  • the impact of a venv on the size of certain container images (I think we should recommend alternatives to the Alpine image, which is particularly affected, and provide some basic calculations so the trade-off becomes a conscious “yes, it will be a bit bigger, but this is justified”)
  • recommended ways to use a venv (fixed paths and cloning) to optimize image sizes (including multi-segmented images)
  • ways to share a venv between multiple (and often arbitrary) users, which is necessary to follow OpenShift best practices for writing good images
  • recommended ways (and reasons) to activate the venv inside images, covering multiple cases: regular users, sudo, and sudo with an interactive login. This is the most needed area, because Dockerfiles work somewhat counter-intuitively for users accustomed to terminal sessions, and some packages (even popular ones) are still not compatible with the “obvious” approach of adding the venv bin folder to the PATH
  • guidance on creating a venv from a venv. This is an edge case, but one that caused me a lot of headaches when converting Airflow to a venv-based image (I think I solved all those problems and can come up with a good set of recommendations).
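
As a rough illustration of the “fixed paths” and “activation” points above (not taken from the Airflow images), here is a minimal Python sketch. It assumes a venv created with the standard-library venv module at a fixed location (a temporary directory stands in for a hypothetical /opt/venv in an image), and it treats “activation” for non-interactive processes as putting the venv’s bin directory first on PATH, roughly what an ENV PATH=... line in a Dockerfile does. The function names are illustrative, and the sudo and interactive-login cases are not covered.

```python
import os
import shutil
import subprocess
import tempfile
import venv
from pathlib import Path


def create_fixed_venv(path: Path) -> Path:
    """Create a venv at a fixed path and return its scripts directory.

    with_pip=False keeps the sketch fast and offline; a real image would
    normally keep pip. symlinks mirrors the `python -m venv` default on POSIX.
    """
    builder = venv.EnvBuilder(with_pip=False, clear=True, symlinks=(os.name != "nt"))
    builder.create(path)
    return path / ("Scripts" if os.name == "nt" else "bin")


def env_with_venv_on_path(bin_dir: Path) -> dict:
    """Emulate non-interactive 'activation': put the venv's bin dir first on PATH."""
    env = dict(os.environ)
    env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")
    env["VIRTUAL_ENV"] = str(bin_dir.parent)
    return env


def python_on_path_is_venv(bin_dir: Path) -> bool:
    """Resolve `python` via the modified PATH and ask it whether it runs in a venv."""
    env = env_with_venv_on_path(bin_dir)
    exe = shutil.which("python", path=env["PATH"])
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = create_fixed_venv(Path(tmp) / "venv")  # e.g. /opt/venv in an image
        print(python_on_path_is_venv(bin_dir))  # → True
```

Keeping the venv at one fixed path is what makes the PATH approach predictable across users and image layers; packages that bypass PATH (the incompatibility mentioned above) would still need the venv interpreter invoked explicitly.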

Looking forward to comments and suggestions, and apologies for any mishaps I might have made not knowing the etiquette here. I would really love to help make Python + images work better, and I look forward to helping with that.

Hello there. It’s been several days now and I’ve been patiently waiting.

I’d love to hear some comments. Is it possible to propose some clarifications to the “container” part of PEP 668? Has it been submitted for approval yet? How can I propose the changes, @pradyunsg?

The comment here: Disable warning from pip install when executed as "root" user. · Issue #10556 · pypa/pip · GitHub suggests it has not. From my own experience, I think adding some clarifications for container environments might make it much more appealing for container image creators to accept that “venv is also OK for containers”, so if the goal is to make the PEP more likely to be adopted, I would love to improve it.

Is there any guidance that could help me make the proposal?

Hi @potiuk, thank you for the feedback. The PEP is still in draft, so it is possible to change it. Unfortunately, I think all of the authors are fairly busy, so probably the best way forward is to open a PR with your proposed changes and ask for reviews.

I read your post when you originally commented, but haven’t had time to review it properly. I am now skimming it to give you some feedback. Here are some of my initial thoughts:

While I acknowledge this might not be the best practice, not following it will break a lot of setups. This was the main motivation behind the recommendation.
Perhaps the PEP does not accurately reflect this situation, so it could be updated to do so better, and maybe only recommend this action when it is needed to keep backwards compatibility. I am not really sure what this might mean in the long term, though; this is just a thought.

One thing to note here is that Airflow has a very specific use-case, which is not at all representative of the majority of the users, who are the main target. It does, however, represent the needs of more complex projects, which we should still attend to.
The PEP is already very extensive, so perhaps this proposal could be split out into its own informational PEP. I think documenting your experience would be incredibly valuable; I am just not sure this PEP is the best place to do so.

While writing this up, I wasn’t 100% sure that I hadn’t missed a major redistributor with its own package manager. Here’s the list so far:

  • Fedora / RHEL / CentOS (commented in discussions)
  • Debian (co-authors)
  • Arch Linux (co-authors)
  • Conda / Anaconda (commented in discussions)

I guess one good group to reach out to would be Homebrew / Linuxbrew. I can’t think of who else though, so thoughts on that are welcome!

It’s worth mentioning Spack I think. It works in a very similar way to Conda.

(Conda [3] is a bit of a special case, as the conda command can install much more than just Python packages, making it more like a distro package manager in some senses.

I don’t actually think about it like this at all. The conda base environment is like a Linux distro, and you should never mess with it as a user. However, user-created conda environments are conceptually similar to Python’s virtual environments, just more powerful. They do environment activation and have a “one environment for one development task” philosophy. The environment activation does a bunch of magic, which is why you can’t layer a virtualenv on top of an activated conda env. And a major part of the reason why you need environments is that the package manager/repository provides multiple versions of every package, just like PyPI and unlike Linux distros and Homebrew.

The above all applies to Spack environments too I believe.

This makes the “using a virtual environment is best-practice” advice more generalizable. We have two kinds of Python installs:

  1. “base” or “system”: macOS system, Linux distros, Homebrew, Python.org installers, Python built from source, conda base env Python, etc.
  2. “dev environment”: virtual/conda/spack environments

The advice for all of these is: never mess with (1), always work in (2).
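
To make the two categories concrete for the venv case, here is a minimal Python sketch (an illustration, not something proposed in this thread). It relies on the standard behaviour of venv (and modern virtualenv) environments, where sys.base_prefix points at the parent installation and therefore differs from sys.prefix. Conda environments are separate installations rather than PEP 405 virtual environments, so this check alone cannot place them in category (2); tool-specific detection for conda, Spack, and similar tools is deliberately outside this sketch, and the function names are hypothetical.

```python
import sys


def is_venv() -> bool:
    """True inside a venv/virtualenv: the venv records its parent install in sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def install_kind() -> str:
    """Map the running interpreter onto the two kinds of Python installs above."""
    if is_venv():
        return "dev environment"
    # Could be a "base" install, or a conda/Spack environment this check cannot see.
    return "base (or a non-venv environment)"


if __name__ == "__main__":
    print(f"{sys.executable}: {install_kind()}")
```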

The NixOS Python docs page linked from the PEP is kind of illustrative in this regard: it has a python-with-my-packages example where everything is in the “base” OS packages, which are immutable, so you can’t pip install anything. Further down, it explains how to get virtual/conda/micromamba/mach-nix environments, all user-level dev envs where you can install whatever you want.

I like this characterisation. Apart from being a good model for this discussion, it also leads nicely into the (completely off-topic for this discussion, but maybe worth having a separate thread on at some point) question of shipping Python “applications”, which is essentially the question of how you take your code and bundle up the essential aspects of your “dev environment” in a way that allows your code to be deployed to a “base” environment without violating the “never mess with the base environment” rule.

FWIW, I’ve started a slightly bike-sheddy topic around this PEP: Renaming PEP 668

If I remember correctly, that statement was less about conda’s ability to create virtual environments and more about the fact that it cannot rely solely on Python-level metadata, because a number of its packages have nothing to do with Python at all. Thus, in this case, it operates more like a Linux distribution, in that it has its own database of installed packages that exists outside the Python-level metadata.

Granted, conda itself integrates with pip, but pip doesn’t integrate with conda.

Though, if I recall correctly, the PEP doesn’t expect conda to write the “don’t touch me” file, just the sysconfig split.

Based on the discussion in Renaming PEP 668, I’m going to go ahead and make a PR changing the PEP title to the following.

Marking Python base environments as “externally managed”

If someone has concerns around this renaming, please bring those up in that thread.

Alrighty, are there any other folks that we should reach out to? I only have Spack and Homebrew on my list of groups to reach out to for this.

@rockobonaparte What’s the story for Chocolatey potentially implementing the bits that this PEP proposes?

There’s also ActiveState Python here, that I missed earlier!

MacPorts is a major distributor of Python, third-party Python packages, and many other open source packages on macOS, using its own package manager. It was based on the FreeBSD ports package manager, which is also still in use.

There’s another package manager for macOS called Fink that distributes Python.

On Windows, Cygwin and MSYS2 might also be interested, especially with all the patching they have to do to their Python distributions.

In Nixpkgs/NixOS there is not really any difference between system packages and user packages. With NixOS and Nix in general the system and users can have profiles where they can install packages/environments, or they can open a shell with whatever packages/environments. The built packages are exactly the same and will be shared as well. Everything built with Nix is read-only. Using pip or conda outside of Nix means you have to give it a writable folder so it can work.

If I understand the PEP correctly, in Nixpkgs we will add an EXTERNALLY-MANAGED file to our CPython build. For NixOS we will not add any directories for system-wide local installation, because that goes against our way of working.
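
For readers wondering what such a marker does at install time, here is a minimal Python sketch of the installer-side check. The details come from the PEP text rather than from this thread, and the draft may still change: a file named EXTERNALLY-MANAGED is looked up in the directory returned by sysconfig.get_path("stdlib"), it is only honoured outside a virtual environment, and it is an INI-style file whose [externally-managed] section may carry an Error message. The function names and the fallback message are hypothetical, and localized Error keys are not handled.

```python
import configparser
import sys
import sysconfig
import tempfile
from pathlib import Path
from typing import Optional

MARKER_NAME = "EXTERNALLY-MANAGED"
# Hypothetical fallback text, used when the marker has no readable Error key.
FALLBACK_ERROR = "This Python environment is externally managed."


def marker_error(stdlib_dir: Path) -> Optional[str]:
    """Return the error an installer should show if a marker exists in stdlib_dir, else None."""
    marker = stdlib_dir / MARKER_NAME
    if not marker.is_file():
        return None
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(marker, encoding="utf-8")
    except (configparser.Error, UnicodeDecodeError):
        return FALLBACK_ERROR  # the marker still counts even if unparsable
    # Localized keys such as Error-de_DE are not handled in this sketch.
    return parser.get("externally-managed", "Error", fallback=FALLBACK_ERROR)


def current_interpreter_error() -> Optional[str]:
    """Apply the check to the running interpreter; venvs are never blocked by the base marker."""
    if sys.prefix != sys.base_prefix:
        return None
    return marker_error(Path(sysconfig.get_path("stdlib")))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / MARKER_NAME).write_text(
            "[externally-managed]\nError = Use the system package manager.\n", encoding="utf-8"
        )
        print(marker_error(Path(tmp)))  # → Use the system package manager.
    print(current_interpreter_error())
```

Because the venv check comes first, a venv created from a marked interpreter stays installable, which is the base/venv split the PEP relies on.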

By the way, the link to the “official” documentation on Python in Nixpkgs is NixOS - Nixpkgs 21.11 manual.

I’ve reached out to the Spack folks via the Slack channel they have for the project.

Quote from @tgamblin, about Spack and this PEP:

We are trying to move people to environments so they don’t modify the existing python installation directory (as it’s likely that the python install is a dependency of many things with different needs). Ideally it’d be read-only, and they could use pip in an env. We’d probably mark the installed python as externally managed and not the python in the spack environment

So, if I’m understanding correctly, this would be useful to Spack!

@pf_moore (and others as well) Is the following list of redistributors of Python good-enough to show that this PEP is reasonably generally applicable? (this was the concern flagged in PEP 668: Marking Python base environments as "externally managed" - #6 by pf_moore)

  • ActiveState Python
  • Arch Linux
  • Conda
  • Debian (and derivatives)
  • Fedora/RHEL/CentOS
  • Nix/NixOS
  • Spack

I’d like to get a better sense of whether there’s any need to reach out to more redistributors to establish that this PEP would be useful and beneficial overall, primarily because I don’t think we want to go to every potential redistributor of Python.

Seems pretty good to me. The only other ones I can even think of are Homebrew and Pyenv, but like you say, we’re not trying to be exhaustive here.

From what I understand based on past interactions, Homebrew doesn’t have a designated group of people working on policy decisions like this. Homebrew maintainers (as in, the people ensuring brew works as intended) aren’t that interested in ecosystem-specific things, and generally don’t really know who to talk to either. So what I would do is simply make the decision and contribute the externally-managed marker to Homebrew afterwards. Hopefully “the community” will welcome the change, but if not, things continue as-is there and we won’t get complaints either.

And, as a pyenv user, I don’t think it would be adding such a marker file — pyenv doesn’t manage Python packages, but instead Python installations.
