I am also adding a few more notes here. As discussed with @ofek, I agreed to become a co-author of that PEP, also because I have two specific cases that I wanted to mention as motivation for this PEP.
I opened a PR explaining the motivation (PEP 752: Updates including co-authoring of the PEP by potiuk · Pull Request #4292 · python/peps · GitHub), and here are some extra details:
Airflow consists of the core packages and 90+ providers. Those providers are released more or less every two weeks (yep, up to 90+ of them), and every few months or so we add a new provider. Each provider is an integration with some technology or service, and we follow a single naming convention: apache-airflow-providers-SOMETHING. For example:
- apache-airflow-providers-google
- apache-airflow-providers-trino
- apache-airflow-providers-atlassian-jira
and so on. For the Apache Software Foundation, it is important to have “apache-” prefixes in the ASF packages, because this is one of the ways the ASF can signal that the package is developed and released following “The Apache Way”, including vendor neutrality, community decision making, and the release process (including signalling that the released package has been voted +1 by at least 3 PMC members of Apache Airflow, thus making the release a “legal act of the Foundation”).
Now, when we want to accept a new provider, we discuss it on the devlist, and one of the things we choose is the name we use after “apache-airflow-providers-”. We have a whole set of CI actions and scripts that follow the naming convention, and our whole monorepo is structured around it. For example, airflow/providers/amazon at main · apache/airflow · GitHub is where “apache-airflow-providers-amazon” lives. And there are many, many scripts and tools that rely on that convention, including documentation building, tests, linting, etc. We simply have to follow that naming convention.
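To illustrate how tightly the directory layout and the package name are coupled, here is a minimal sketch of the mapping. It is not taken from the actual Airflow tooling: the function names are hypothetical, and the rule that nested directories are joined with hyphens (e.g. atlassian/jira) is my assumption for the example.

```python
from pathlib import PurePosixPath

# Prefix and monorepo location as described in this post.
PREFIX = "apache-airflow-providers-"
PROVIDERS_ROOT = PurePosixPath("airflow/providers")


def package_name_for(provider_dir: str) -> str:
    """Derive the distribution name from a provider directory (hypothetical helper).

    Assumption: nested directories are joined with hyphens.
    """
    # relative_to() raises ValueError if the path is not under PROVIDERS_ROOT.
    relative = PurePosixPath(provider_dir).relative_to(PROVIDERS_ROOT)
    if not relative.parts:
        raise ValueError("expected a provider sub-directory")
    return PREFIX + "-".join(relative.parts)


def follows_convention(package_name: str) -> bool:
    """Check that a name carries the prefix and a non-empty provider suffix."""
    return package_name.startswith(PREFIX) and len(package_name) > len(PREFIX)


if __name__ == "__main__":
    print(package_name_for("airflow/providers/amazon"))
    print(follows_convention("apache-airflow-providers-edge"))
```

The point is that the name after the prefix is not a free choice made at release time: once the directory exists in the repo, the package name is effectively fixed.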
But… this has drawbacks. Two cases:
- apache-airflow-providers-teradata → this is an obvious name for the Teradata provider, yet when Teradata approached us with a proposal to contribute the provider, it turned out that apache-airflow-providers-teradata had already been published by someone else. Luckily it was a good community member, and the provider he released was not really used and mostly abandoned, so he agreed to transfer the ownership to us (this conversation is on the private@airflow.apache.org mailing list, so unfortunately I can’t share it).
- apache-airflow-providers-edge → this issue is still not resolved for us, and we are not sure if we will be able to resolve it. We discussed a new provider which is more of an internal one, and “edge” came out of that discussion as the best name (though a few others were considered). We are not releasing it yet; it will be released with the upcoming Airflow 3. However, we already have the provider in our repo (airflow/providers/edge at main · apache/airflow · GitHub), but we had not checked or reserved the name. Some time ago a security vulnerability was raised to us (https://lists.apache.org/thread/m396pvn9p6kg5pf9lv7oon4b5lsh95k2), and it turned out that someone (not even the security researcher) had already reserved that name on PyPI without publishing the package, so we don’t even know whom to contact to transfer the ownership in case it was a non-malicious act.
We are still weeks from releasing the provider, but we will have to reach out to the PyPI maintainers now to find out how we can claim the ownership, or alternatively go with a different name. That would be rather inferior, because the name perfectly matches what we would like to do. And soon we will have other provider ideas, which we will discuss with several names proposed by different people. If we do not have PEP 752, the only way to protect against such a situation is to proactively reserve all potentially discussed names, which would be a pretty terrible waste of our time and effort.
I hope that explains why we have the motivation and why we think this PEP is really needed.