Pre-PEP: Add ability to install a package with reproducible dependencies

I agree we need to narrow down the use cases and indeed, come up with a stronger case.

Yes. My understanding is that it should be rejected. As I see it, we should allow EXACTLY ONE package to “drive it”.

I actually now think that this is more of a use case for pipx than for pip - because this is much closer to what “airflow” users have in mind when they install Airflow - they think of it as an “application” with additional dependencies.

Also, the “power user” story of airflow is a bit different from “let the user solve the conflict”. The story is to let pip (or pipx) resolve the “additional” installation as a second resolution step, with additional packages, after the constraints have been used to install the “base” application with “selected standard options”.

So my ideal “user story” is:

pipx install --locked apache-airflow[amazon,google,sentry]==3.1.1 my-custom-package==3.5.6,my-other-custom-package==10.4.2

How I imagine it could work - mimicking what we suggest to our users - is to run a 2-step installation:

  • step 1: pipx installs airflow 3.1.1 + all the optional dependencies specified, using the lockfile stored in the registry
  • step 2: pipx installs all the other packages on top of the original installation WITHOUT the lockfile (for example, it would be allowed to upgrade some of the deps airflow uses to a different version than in the constraints, if the custom packages need it)

One important part is that the second installation step should also have apache-airflow==3.1.1 as an additional dependency, in order to prevent the resolver from downgrading or upgrading airflow, because “airflow 3.1.1” is meant to be the result of such a pipx install step - that’s the intention of the user. Also, “--locked” is a good name in this case because it effectively “locks” airflow 3.1.1 (and initially the other deps) - but then it allows upgrading those deps (but not airflow). There should also be a possibility to update that environment:

pipx update "apache-airflow" new-dep==3.2.3

That would only perform the 2nd step (without constraints, still pinning “apache-airflow” to 3.1.1).
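
As a rough illustration of the two steps with today's tooling, the sketch below only builds the pip command lines and does not run them. The function name `two_step_install` and its `constraints` argument (a path or URL to a constraints file) are made up for illustration; `pipx install --locked` does not exist today, so plain `pip install --constraint` stands in for step 1.

```python
def two_step_install(driver, version, extras, extra_packages, constraints):
    """Build the two pip command lines for a constrained install of `driver`.

    Hypothetical helper: it only returns argv lists, nothing is installed.
    """
    pinned = f"{driver}=={version}"
    with_extras = f"{driver}[{','.join(extras)}]=={version}" if extras else pinned

    # Step 1: install the "driving" package with the known-good constraints.
    step1 = ["pip", "install", with_extras, "--constraint", constraints]

    # Step 2: install the additional packages WITHOUT constraints, but keep
    # the driving package pinned so the resolver cannot up- or downgrade it.
    step2 = ["pip", "install", pinned, *extra_packages]
    return step1, step2


step1, step2 = two_step_install(
    "apache-airflow",
    "3.1.1",
    ["amazon", "google", "sentry"],
    ["my-custom-package==3.5.6", "my-other-custom-package==10.4.2"],
    "constraints.txt",  # assumption: a constraints file already downloaded
)
print(" ".join(step1))
print(" ".join(step2))
```

The later "update" case would be step 2 alone, with the same pin and the new packages appended.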

Following from that, what exactly are we standardizing here? A registry for lockfiles?

What I would like to standardise is a mechanism, built into any registry (including PyPI), by which package maintainers can upload a “golden” set of dependencies that they know worked. I would love not to have to tell the user two things:

  • where to get constraints from
  • that they should pin their “driving” project to the original version when they are updating the environment

That is, I think, what I would seek from standardisation efforts.

Currently, there is significant friction with the constraints mechanism Airflow uses:

  • users have to discover that this is the “recommended” way - this is not a standard, it is specific to airflow, and many of our users don’t realise this is “THE” way we recommend
  • they have to construct the URL to get constraints from - and that URL contains not only the version of airflow but also the Python version they use - which often leads to confusion, as they might not even realise which Python version they are using. With a standardized lockfile this is gone.
  • they should pin the “driving” dependency the next time they update dependencies - forgetting this is an error many of our users make - they add new dependencies and the pip resolver finds that the easiest way to install those is to downgrade or upgrade airflow - which is against the intention of the user
  • we rely on GitHub - this is something we would like to get rid of, especially where GitHub has control over how it behaves. Recently they decreased rate limits for raw URLs - and I can easily imagine some of our users behind NATs getting throttled, because these are unauthenticated requests. Also, one day they might change how such raw URLs are constructed. We would prefer that our users only rely on the registry they are already installing airflow from (public PyPI, or a private one they are using internally). If we add it to the standard registry APIs, this problem is solved.
  • our constraints generation is a “poor man’s” version. We miss a lot of potential variations people might have - architecture, operating system, etc. With a standardized lockfile, this problem is gone.
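
To make the URL friction concrete, here is a minimal sketch of what users effectively have to do by hand today. It assumes the URL pattern from Airflow's installation instructions as I understand it (`constraints-<airflow version>/constraints-<python version>.txt` on raw.githubusercontent.com); the function name `airflow_constraints_url` is made up for illustration.

```python
import sys


def airflow_constraints_url(airflow_version, python_version=None):
    """Return the constraints URL for an Airflow version and a Python version.

    Hypothetical helper. If `python_version` is omitted, the major.minor
    version of the running interpreter is used, which is exactly the detail
    users tend not to know or to get wrong.
    """
    if python_version is None:
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


print(airflow_constraints_url("3.1.1", "3.12"))
print(airflow_constraints_url("3.1.1"))  # depends on the interpreter running this
```

Note that only the Python version varies here; architecture and operating system are not part of the URL at all, which is the “poor man’s” limitation mentioned above.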

Standardized lockfiles are still pretty new. It’s natural that we should think about “next steps”, but personally I don’t think we’re ready for that next step standard until we start trying out the practice a bit more.

I think the problem currently is that standardized lockfiles can be produced, yes, but their use is, I guess, almost 0. I don’t understand what the use case for “using” standardized lockfiles currently is, or why someone would want to produce them at all (but I might be wrong of course - maybe I am missing something). So I have a hard time understanding how we are even “testing” this in the community. Is there someone who uses them for something?

I quite agree that maybe it’s “early” for standardizing it, but the thing is that the biggest benefit we can get from those files comes when we have some standard, low-friction way of using them. Without that, they seem pretty useless, if you ask me. So yes - it’s early, but I don’t think it is “too early” - especially since cases like Airflow’s, and the use cases behind them, are very well tested and tried using a custom mechanism (we have been running Airflow with constraints for more than 5 years now and it has proven to be immensely useful). Yeah, maybe we are an outlier, but maybe it’s actually a valid use case that will be unlocked for others once it gets standardized? I guess we won’t find out until we try, and no amount of “waiting” will change that. “Doing” is the only way to find out, as I see it.