I hope I am not too late (looking at the Sep 16 submission information from @pradyunsg), but since I was encouraged several times to contribute and help make PEP 668 better, I would like to make some proposals in response to those requests.
TL;DR: I have two proposed amendments to PEP 668, specifically in the area of containers, where I consider myself quite experienced. I would love to add more clarifications to the “container” case briefly described in PEP 668. That includes removing the “marker removal” recommendation for containers, accompanied by a set of best practices, guidelines and recommendations for image writers to help and encourage them to follow PEP 668 and use venv in their images. While discussing and reading the PEP I had the feeling that this subject has been a bit neglected in PEP 668 (possibly even for a good reason), but it could be treated with a bit more care (and I am happy to take care of it).
Apologies for the long message that follows, but I think not everyone knows the context, and I would like to introduce my experience and findings to build some trust that I could take care of it.
Some of my background
I have worked on Apache Airflow for years. It is one of the most complex open-source Python projects when it comes to dependencies (more than 500 overall), and it is accompanied by very well developed, community-maintained and highly optimized Dockerfiles and Docker images. We manage those for three purposes: development environment, CI, and production. Our image and CI setup also includes extensive automated testing of the image and automated management of those 500+ dependencies. That is not easy for multiple reasons, but the main one is that Airflow is both an application to install and a platform that allows users to write and execute their own custom Python code. This creates a unique set of challenges: we need fixed dependencies to install Airflow, but also open dependencies that let our users install their own versions of dependencies and write their own custom code.
Airflow is also not the only project where I have worked on Python-based images. I worked for 1.5 years at NoMagic.ai (a Robotics + AI startup in Poland), where I moved the company to a Docker-based environment in which we ran both development and production of Python ROS (Robot Operating System) with Nvidia GPU-accelerated simulations. This is where I learned the ins and outs of building Python-centric Docker images. I consider Python’s ROS the second most complex Python project out there when it comes to dependencies (I think Airflow beats it by just a bit, but I might be wrong).
I am also one of the few lucky people who are fully focused on contributing to Open Source. This is my daily job. I am an independent contributor, with parts of my time paid by several Open-Source stakeholders, but I have a lot of freedom to choose what I work on in my day job. On top of that, I tend to spend 50% of my other free time contributing to OSS, especially Apache (I am a member of the Apache Software Foundation). Bringing my cross-cutting experience with Python, images and containers is something I would love to help others with.
Where I lack the experience
First - sincere apologies for not being proficient with the PEP process. I am an experienced contributor, committer, and PMC member of Apache Airflow, and I have created and led to completion quite a number of AIPs (Airflow Improvement Proposals), but the PEP process is something I have no experience with. So I would really love some guidance on how (and if) I can make my proposals happen.
The subject of PEP 668 is also relatively new to me - I was made aware of it only recently and I have read a lot since. I understand what it does and where it came from, but I likely do not have the full context, so apologies if I state the obvious or if the points I raise have been discussed already.
It may even turn out that what I propose should not be part of PEP 668, or that some of the details need a follow-up. In either case I’d love to hear some guidance and suggestions on what to do and how to approach it (and how to make sure PEP 668 can be amended or linked to the proposal in a way that the current recommendation will not undermine it).
Context
A bit more context from my side, as only a few people from the discussion here were involved with the pip issue “Disable warning from pip install when executed as "root" user” (Issue #10556 · pypa/pip · GitHub). Initially I was quite opposed to the way the PEP currently complains about using root and directs people to virtualenv instead. I still personally do not like the message there (because it is ambiguous and the problem seemingly has nothing to do with the proposed remediation), but as I understand it, the way it is worded is a by-product of “virtualenv” being “recommended” only “between the lines”, and not wholeheartedly and straightforwardly seen as the “only future-proof solution”. Since a lot of the people who complained about the “unremovable warning” came from the “container” world, I figured that PEP 668 could be a bit more detailed and “bold” in proposing it. It should, however, be accompanied by good practices, recommendations and a rationale that make it easier to understand why and how PEP 668 and “going venv” are also good for containers (this is not at all obvious or clear, and the current PEP 668 gives ambiguous messages about it).
I admit that the discussion possibly did not always go in the best direction from my side either, but after reading this discussion, re-reading PEP 668 several times and reading the history of a dozen similar issues, I understand much better why virtualenv is the way forward for containers too. I asked a few clarifying questions, but I found out that asking too many overly precise questions might be too much of a demand (though I personally believe that challenging the status quo, asking questions where you have doubts and generally being curious is a good thing). So I decided to try it out and answer the questions myself by converting Airflow to use venv and fixing the problems along the way.
Proposal
Here is the gist of my proposal:
- following @hroncok’s suggestion - I think “removing marker files in container images” is not the best recommendation. Even if it is non-normative, it is still part of the recommendation and people might be quite misled by it. I think there is no good reason why containers should be different, and a somewhat stronger statement there would not hurt.
Especially that
- In order to make it more helpful to image creators who might have the same doubts I had initially, I think we can extend the container part of PEP 668 with a set of recommendations for people who build their images: dos and don’ts and best practices. I have quite a good set of findings and recommendations for container images based on the exercise I’ve done for Airflow. I know it is not “comprehensive” enough to cover all the ways container images with Python are built. But I have quite extensive experience from years of developing images, mostly including Python, and I have already gone through the exercise of converting the Airflow images, including finding fixes and workarounds for the issues I anticipated it would bring (documented in the “Disable Warnings” issue above). The Airflow images are very sophisticated, allow for both extension and customization, have gone through many iterations and serve many cases. You can see our docs in the Airflow docs (sorry, as a new user I can only add two links in the post) and watch the 45-minute “Production Docker Image for Apache Airflow” talk I gave last year at the Airflow Summit 2020, where I explain the whys and whats and provide more context on the decisions made there: Production Docker image for Apache Airflow - YouTube
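For readers who have not looked at the marker file mentioned in the first point, here is a minimal Python sketch of the check PEP 668 describes, as I understand it: an installer refuses to touch the environment when it is not running in a virtual environment and an `EXTERNALLY-MANAGED` file exists in the interpreter's stdlib directory. The function names are my own (hypothetical), and this is not pip's actual implementation.

```python
import sys
import sysconfig
from pathlib import Path


def externally_managed_marker() -> Path:
    """Location where PEP 668 expects the marker file for this interpreter."""
    # PEP 668 places the marker in the "stdlib" directory of the default
    # sysconfig scheme; get_path() without a scheme uses that default.
    return Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def installer_should_refuse() -> bool:
    """Sketch of the PEP 668 rule: refuse only outside a venv with a marker."""
    return not in_virtual_environment() and externally_managed_marker().is_file()


if __name__ == "__main__":
    print("marker path:", externally_managed_marker())
    print("inside venv:", in_virtual_environment())
    print("installer should refuse:", installer_should_refuse())
```

The point for containers is the first condition: inside a venv the marker is simply not consulted, so an image that installs into a venv never needs to delete it.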
Maybe this is a bold statement, but during the discussions and the image conversion I implemented, I think I identified and figured out how to address most of the issues people might have when converting to venv-based images. In any case, I am willing not only to write it up but also to extend, discuss and defend it, and generally to become a co-author of the PEP and its follow-ups when it comes to the container part. I think I have all the experience and skills (and time) needed for that.
Recommendation/best-practice areas
During the discussions and testing I identified the following areas that need explanation or clarification. I think once we cover them, we could change the recommendation so that venv and marker files become first-class citizens in container usage for Python:
- the impact of venv on the size of certain container images (I think we should recommend alternatives to the alpine image, which is particularly affected, and include some basic calculations to support a conscious “yes, it will be a bit bigger, but this is justified” decision)
- recommended ways to use venv (fixed paths and cloning) in order to optimize image sizes (including multi-segmented images)
- ways to share a venv between multiple (and often arbitrary) users, which is necessary for the OpenShift best practices for writing good images
- recommended ways (and the need) to activate the venv inside the images, covering multiple cases: regular users, sudo, and sudo with interactive login. This is the most needed part, because Dockerfiles work a bit counter-intuitively for users who are used to terminal sessions, and some packages, even popular ones, are still not compatible with the “obvious” way of adding the venv bin folder to the PATH
- guidance on creating a venv from a venv. This is an edge case, but one that caused me a lot of headaches when converting Airflow to a venv-based image (I think I solved all those problems and I can come up with a good set of recommendations).
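To make the “fixed path” and “activation via PATH” points a bit more concrete, here is a small Python sketch of what I mean, using only the stdlib `venv` module. The helper names and the temporary directory are just for illustration (hypothetical); in an image the venv would live at one fixed, well-known path, and the PATH change would typically be done once with an `ENV` instruction in the Dockerfile rather than in Python.

```python
import os
import shutil
import subprocess
import tempfile
import venv
from pathlib import Path


def create_image_venv(venv_dir: Path) -> Path:
    """Create a venv at a fixed location and return its bin/Scripts folder."""
    # with_pip=False keeps this sketch fast and offline; a real image would
    # normally want pip inside the venv.
    builder = venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt"))
    builder.create(venv_dir)
    return venv_dir / ("Scripts" if os.name == "nt" else "bin")


def activated_env(venv_dir: Path, bin_dir: Path) -> dict:
    """Environment variables equivalent to 'activating' the venv."""
    env = dict(os.environ)
    env["VIRTUAL_ENV"] = str(venv_dir)
    # Putting the venv bin folder first on PATH is the "obvious" activation.
    env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        venv_dir = Path(tmp) / "venv"  # stand-in for a fixed path in an image
        bin_dir = create_image_venv(venv_dir)
        env = activated_env(venv_dir, bin_dir)
        python_exe = shutil.which("python", path=env["PATH"])
        if python_exe is not None:
            # The interpreter found first on PATH should now be the venv one.
            subprocess.run(
                [python_exe, "-c", "import sys; print(sys.prefix)"],
                env=env,
                check=True,
            )
```

Note that this only covers the simplest case (a process that inherits the environment). The sudo and interactive-login cases from the list above are exactly where a PATH-only approach can stop working, which is why I think they need written guidance.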
Looking forward to comments and suggestions, and apologies for any mishaps I might have made by not knowing the etiquette here - I’d really love to help make Python + images work better, and I look forward to helping with that.