PEP 752: Implicit namespaces for package repositories

To directly answer this question, I don’t think it’s possible for a PEP to be authored by a company. The author is always one or more individuals.

There may be rules on conflict of interest - if so, they would be in PEP 1 and the devguide (as the PEP process is defined by the core dev team, not by the packaging community). The only such rules I’m aware of, though, are for PEP delegates, not authors. In any case, the PEP sponsor (@barry in this case) would ensure that the PEP presents the arguments fairly, so I don’t think this is an issue.

6 Likes

Also for the SC themselves (where at most two members can share an employer).

It’s technically fine for authors to be biased from a process perspective (presumably they have some personal or professional interest, otherwise they wouldn’t be investing time in a proposal!). It’s just not conducive to establishing consensus if readers feel the PEP text isn’t adequately addressing concerns that are raised (for any PEP, not just these ones).

1 Like

If it’s precisely as you say, it’s not a perception of conflict of interest, it’s the very definition of a conflict of interest.

See the Wikipedia article:

A conflict of interest (COI) is a situation in which a person or organization is involved in multiple interests, financial or otherwise, and serving one interest could involve working against another.

Note: having a conflict of interest doesn’t imply someone is dishonest or working against the community.

1 Like

I think this is not very relevant to our domain of technical proposals; rather, the text is the most important part. For example, somebody could just as easily have come from a community project with the same proposal and, by your definition, could in theory be advocating for something that would not be in the best interest of companies, etc.

Since the very first draft, namespace grants have been available to any type of organization.

3 Likes

@ofek You’ll likely find Ideas for client side package provenance checks of interest.

While that initial post is derived from the PEP 752 and PEP 755 discussions, it covers enough topics that PEP 752 considers out of scope that it seemed better to give it its own thread.

1 Like

@ofek asked me for some feedback here from the side of Sentry. I think as far as Sentry is concerned we have no strong feelings about this in any way and I don’t think having this RFC or not having that RFC would change much for us.

I think from the side of a package index, reserved prefixes or namespaces are always tricky because they create a higher administrative burden. The prefix all of a sudden is very meaningful and someone controls it. What happens if that person becomes unresponsive and control over the prefix is lost? What if the wrong person registered the prefix? The complexity of this is already somewhat obvious from the related RFC that sets out a policy for PyPI: PEP 755 – Implicit namespace policy for PyPI | peps.python.org

Out of all the options I think the NuGet approach is the most reasonable one, and that’s also the one that I was hoping the Rust project would choose. It at least leaves it open to the index how to deal with delegations and the politics that this will undoubtedly cause.

I think the real question here is if the folks that are tasked with this for PyPI are willing to deal with the load that prefix reservations will cause on them. Given the current already pretty sad state of affairs about 2FA revocations/reissuance my hunch is that one should be very careful with that.

10 Likes

I just re-read the PEPs and I’m in favor.

I’d like to suggest allowing a mechanism to explicitly add/keep non-authorized projects in closed grants, as well as allowing grants to be closed even if non-authorized projects are part of them (in which case the non-authorized projects would be grandfathered), but I understand if that gets rejected due to the additional organizational stress and technicalities.

1 Like

I have a pending PR (rendered)!

I added a section about this idea. I would appreciate feedback on Java because I’m not familiar with how dependency definitions happen nowadays regarding resolution behavior so I didn’t include it in the table.

Thanks for bearing with me Mike, it took me a bit to understand what this idea was about :sweat_smile:

I also added a section about this, thank you for the detailed writeup! Although I don’t think it’s a viable complete alternative, I made it clear that using attestations for more things is an expressly good idea.

Done anyway!

1 Like

That thread isn’t about one specific idea. It is summarising a variety of different ideas, most of which wouldn’t depend on PEP 470, and some of which would not require any index server changes at all (since they could be implemented entirely as new client side checks against externally hosted metadata files). PEP 752 itself is included in that list of possibilities.

As a result, I’m not claiming that I think PEP 752 is a bad idea in an absolute sense (I actually think it’s a genuinely plausible approach for improving the trust levels of the overall package index), I’m simply stating that I think it’s a bad place to start.

If we build a mechanism for explicit provenance assertions first, then prefix reservations can tie into that by implicitly associating particular PyPI responses with particular default provenance assertions. It would allow a prefix reservation PEP to just be about prefix reservations, not the underlying provenance attestation mechanism.

We don’t have a fully general mechanism for making provenance assertions though, so PEP 752 is having to propose one just for the prefix reservations, which then makes it more difficult to ever introduce a general purpose one (since we’d end up with two different provenance assertion mechanisms existing side by side forever - three if you count end-to-end TUF signing, four if you count digital attestations, although the key management issues with those make me skeptical of their generality).

The variation I currently most like would consist of a few different components:

  • a new “HTTPS with a well-known relative URL” “provenance” API, which allows third parties to explicitly state “we publish these PyPI projects” along with any optional related info that is deemed relevant (e.g. Sigstore info, end-to-end TUF signing keys, organisation level contact details). Ideally (to avoid concerns with the infrastructure management burdens of hosting provenance metadata) this would be designed such that dumping a bunch of JSON files on a GitHub static pages site or ReadTheDocs subdomain would be a valid publication method.
  • an update to the main repository API to allow projects to specify their provenance host, so client side tools can be instructed to flag projects that aren’t from known providers
  • a way to explicitly request client-side validation of the provenance of specific packages against specific provenance hosts
  • and then building on those in a revised prefix reservation proposal that relies on that repository-independent provenance host information rather than having to rely on PyPI-specific organisation and user account information

Edit: PEP 694 would also tie neatly into this approach, by allowing package upload tools to confirm the generated provenance metadata against the staged release, and allow that metadata to be published before the release is flipped over to general availability.
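
As a rough illustration of the first and third bullets above, a client-side check against a provenance document might look like the sketch below. Everything in it is hypothetical: the well-known path, the JSON field names (`publisher`, `projects`), and the helper names are invented for illustration and are not part of any PEP or existing API. The document is supplied as a string so the example runs without network access.

```python
import json
import re
from urllib.parse import urlunsplit

# Hypothetical well-known path; no such path has been standardised.
WELL_KNOWN_PATH = "/.well-known/python-provenance.json"


def normalize(name: str) -> str:
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def provenance_url(host: str) -> str:
    """Build the URL a client would fetch for a given provenance host."""
    return urlunsplit(("https", host, WELL_KNOWN_PATH, "", ""))


def host_claims_project(document: str, project: str) -> bool:
    """Return True if the provenance document lists the project."""
    data = json.loads(document)
    claimed = {normalize(p) for p in data.get("projects", [])}
    return normalize(project) in claimed


# Example document a provenance host might serve (invented schema).
EXAMPLE_DOCUMENT = json.dumps(
    {
        "publisher": "example.org",
        "projects": ["Example-Core", "example_plugin"],
    }
)

if __name__ == "__main__":
    print(provenance_url("example.org"))
    print(host_claims_project(EXAMPLE_DOCUMENT, "example.plugin"))
    print(host_claims_project(EXAMPLE_DOCUMENT, "unrelated"))
```

Because the document is plain JSON, it could in principle be served from a static site, which is the low-maintenance publication method the first bullet asks for.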

2 Likes

Would the following changes be acceptable/adequately reflect what you’re thinking?

  • Exclusive Reliance on Digital Attestations → Exclusive Reliance on Provenance Assertions
  • The idea [5]_ here would be to solely rely on :pep:`740` attestations to verify certain properties of dependencies, such as: → One of the ideas here [5]_ would be to solely rely on provenance assertions, such as those made possible by :pep:`740` attestations, to verify certain properties of dependencies, such as:
  • A general downside to this approach is that → A general downside to this approach, should digital attestations become the default mechanism, is that

No, because that’s still missing the essential point: we should design a provenance assertion mechanism that can be shared across repositories first, and then use that for prefix reservations.

That gives two key points for PEP 752 to defend (if you genuinely have your heart set on tying namespace grants directly to PyPI’s account management system):

  • using a provenance assertion mechanism that doesn’t work across repositories (when we can readily design one that does so by using JSON files hosted at a well-known URL on a nominated provenance host)
  • using a provenance assertion mechanism that is only useful for prefix reservations (when we can readily design one that can also be checked directly, regardless of whether prefix reservations exist or not)
4 Likes

I understand now, thank you! I will update the text when I awake tomorrow.

1 Like

I pushed an update! Does the new text adequately represent your proposal? PEP 752 – Implicit namespaces for package repositories | peps.python.org

No, because you’re still treating that thread as a single coherent proposal that I think we should implement as written.

That is not the point of that thread. The point of that thread is to say “Hey, PEP 752 is lumping together a bunch of different things because implementing prefix reservations on PyPI requires underlying package and project provenance assertion and attestation capabilities that don’t exist yet. Maybe we should take a step back and invest some genuine thought in building solid foundations for a prefix reservation capability that are useful in their own right, rather than trying to design everything as part of a single monolithic proposal”.

As far as multiple repositories go, one of the proposals that doesn’t handle them is PEP 752 because only PyPI has any way of checking the claimed organisational account links, and even PyPI’s ability to check them is weak since they’re likely to be tied to a set of credentials that also allows package publication. How do I as a PyPI user check that the “Microsoft” account on PyPI is actually controlled by the same entity that operates “microsoft.com”? If the answer is “You trust the PSF to have checked that”, then that’s putting a big security burden on the PSF. It also doesn’t answer the question of how the PSF verifies that organisation accounts are created by suitably authorised people after the org account feature becomes generally available.

There is at least one alternative we could use (specifically, domain based package publisher records) that would work across repositories, would support independent verification by PyPI clients, and could be used by the PSF to verify org account authorisation. So that’s the key point PEP 752 needs to address: why use PyPI org accounts directly for project provenance assertions when, with a bit of additional work, we can almost certainly come up with something better that is useful for more than just the prefix reservation idea?

The rest of that section is still a weird mish-mash of responses to things the survey thread doesn’t say (because it’s an overview of potential subproposals, not a proposal in its own right). The only bit of the survey thread that is directly relevant to PEP 752 is the fact that I believe we should be able to design a domain based mechanism for project provenance assertions with much nicer security and administrative properties than we get from relying directly on PyPI’s org accounts feature.

3 Likes

Could you please tell me how you would like me to break down your enumeration of possibilities? To me it reads mostly as a general proposal for provenance assertions like you mentioned above:

And if I’m reading the sizing of the letters in your post properly the headers seem to indicate that as well:

# Explicit provenance assertions
## Using email addresses
## Using repository user and/or organisation names
## Using domain names
## Using HTTPS URLs
# Implicit provenance constraints
## Trust on first use
## Sharing trusted provenance lists
## Defining a default verified publisher list
# Namespace prefix provenance constraints with open namespace grants
# Project registration prevention with restricted namespace grants

If my response to the proposal does not indicate that understanding then I will have to think about how to make it more clear that it’s a response to a broad category-level proposal.

For PEP 752 purposes, I think you can ignore the initial survey post. For that, I was doing my best to present everything as neutrally as I could without expressing my own opinions. The headings are just grouping the different ideas into categories rather than indicating any kind of importance level.

The second post in the thread is the one that describes my personal opinion: Ideas for client side package provenance checks - #2 by ncoghlan

That’s the origin of my requests in this thread to focus on comparing the pros and cons of domain control based assertions and PyPI org account based ones:

However, at this point, I would strongly encourage you not to try to immediately come up with a response that explains why my suggestion to focus on using domain control as the basis for account and project provenance assertions is wrong and using PyPI org accounts directly is the way to go. I get the impression my comments are coming across as attacks, and that is triggering a defensive response rather than a contemplative one.

Instead, take some time to consider how the namespace prefix proposal would really change if everywhere a PyPI account name appears in the proposal a domain name were to appear instead. Then consider what it would mean for clients if they could go to a well known URL on the claimed domain and obtain information about which accounts, projects, and namespace prefix reservations on PyPI are controlled by the same entity as the one that controls that domain name.

I think you’ll find that the repository side of things wouldn’t change much (we’d just be replacing one string with another), but on the client side instead of an opaque token that we just have to trust we’d instead have an identifier that we can use to start doing our own additional verification of provenance.
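
To make that comparison concrete, here is a minimal sketch of what “replacing one string with another” could mean. The `NamespaceGrant` record, its field names, and the `claims` mapping are hypothetical and do not reflect how PyPI stores grants. The point is only that an account name is opaque to clients, while a domain name identifies a place where the claim can be checked independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceGrant:
    """Hypothetical record of a prefix reservation held by a repository."""

    prefix: str  # e.g. "example-"
    owner: str   # a repository account name, or a domain name


def client_can_verify(grant: NamespaceGrant, claims: dict[str, set[str]]) -> bool:
    """Check the grant against the prefixes claimed by the owner's domain.

    `claims` maps a domain to the prefixes published at that domain's
    well-known URL (assumed to have been fetched out of band).
    """
    return grant.prefix in claims.get(grant.owner, set())


# What a client might have retrieved from example.org (invented data).
domain_claims = {"example.org": {"example-"}}

account_grant = NamespaceGrant(prefix="example-", owner="example-org-account")
domain_grant = NamespaceGrant(prefix="example-", owner="example.org")

if __name__ == "__main__":
    # The account name gives the client nothing to look up.
    print(client_can_verify(account_grant, domain_claims))
    # The domain name points at a claim the client can confirm.
    print(client_can_verify(domain_grant, domain_claims))
```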

2 Likes

I feel like this discussion is still going a bit in circles, but it triggered a different-but-possibly-related-thought on how we might be able to drive meaningful progress on this topic. I don’t recall seeing this idea (of “virtual indexes”) discussed before, but I’m also not deeply involved in the packaging ecosystem…my participation is much more as a consumer that cares greatly about software supply chain security. And please forgive me if I’ve just re-proposed an idea that’s been previously rejected - I didn’t intend to resurrect old debates.

1 Like

Re: community buy-in in the PEP.

My team, Charm Tech @ Canonical, would gladly claim the charms- namespace, which we use internally for “charm” libs, although most “charmers” don’t publish on PyPI for historical reasons.

Having a namespace would enable us to move from our historical distribution system to PyPI.

2 Likes

I am also adding a few more notes here. As discussed with @ofek → I agreed to become a co-author of that PEP, also because I have two specific cases that I wanted to mention now as motivation for this PEP.

I added my PR: PEP 752: Updates including co-authoring of the PEP by potiuk · Pull Request #4292 · python/peps · GitHub explaining the motivation, and here are some extra details:

Airflow consists of the core packages and 90+ providers. Those providers are released more or less every two weeks (yep, up to 90+ of them) - and every few months or so we add a new provider. Each provider is an integration with some technology or service, and we follow a single naming convention: apache-airflow-providers-SOMETHING . For example:

  • apache-airflow-providers-google
  • apache-airflow-providers-trino
  • apache-airflow-providers-atlassian-jira

and so on. For the Apache Software Foundation, it is important to have “apache-” prefixes in the ASF packages, because this is one of the ways the ASF can signal that the package is developed and released following “The Apache Way” - including vendor neutrality, community decision making, and release process (including signalling that the released package has been voted +1 by at least 3 PMC members of Apache Airflow - thus making the release a “legal act of the Foundation”).

Now - when we want to accept a new provider we discuss it on the devlist - and one of the things we choose is the name we use after “apache-airflow-providers-”. We have a whole CI set of actions and scripts to follow the naming convention, and our whole monorepo is structured to follow that naming convention - for example airflow/providers/amazon at main · apache/airflow · GitHub is where “apache-airflow-providers-amazon” lives. And there are many, many scripts and tools that rely on that convention - including documentation building, tests, linting etc. We simply have to follow that naming convention.
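
The naming convention above amounts to a prefix check on normalised project names. A minimal sketch, assuming names are normalised per PEP 503 and that a namespace matches any project whose normalised name begins with the prefix followed by a hyphen. The `in_namespace` helper is hypothetical and is a simplification, not PEP 752’s exact matching rule or anything from Airflow’s tooling.

```python
import re


def normalize(name: str) -> str:
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def in_namespace(project: str, namespace: str) -> bool:
    """Return True if the project name falls under the namespace prefix.

    Assumption: a project matches when its normalised name starts with
    the normalised namespace followed by a hyphen.
    """
    prefix = normalize(namespace).rstrip("-") + "-"
    return normalize(project).startswith(prefix)


NAMESPACE = "apache-airflow-providers"

if __name__ == "__main__":
    for candidate in (
        "apache-airflow-providers-google",
        "Apache_Airflow.Providers-Trino",   # same namespace after normalisation
        "apache-airlfow-providers-edge",    # misspelled prefix, does not match
        "apache-airflow-providersfoo",      # no separating hyphen, does not match
    ):
        print(candidate, in_namespace(candidate, NAMESPACE))
```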

But… this has drawbacks. Two cases:

  • apache-airflow-providers-teradata → this is an obvious name for the Teradata provider, yet when Teradata approached us with a proposal to contribute the provider, it turned out that apache-airflow-providers-teradata had already been published by someone else. Luckily it was a good community member, and the provider he released was not really used and mostly abandoned - so he agreed to transfer the ownership to us (this conversation is on the private@airflow.apache.org mailing list so I can’t share it, unfortunately).
  • apache-airflow-providers-edge → this issue is still not resolved for us and we are not sure if we will be able to resolve it. We discussed a new provider which is more of an internal one, and “edge” came out of that discussion as the best name (though a few others were considered). We are not releasing it yet (it will be released with the upcoming Airflow 3), but we already have the provider in our repo (airflow/providers/edge at main · apache/airflow · GitHub), and we had not checked/reserved the name. Some time ago a security vulnerability was reported to us https://lists.apache.org/thread/m396pvn9p6kg5pf9lv7oon4b5lsh95k2 and it turned out that someone (not even the security researcher) had already reserved that name on PyPI without publishing the package, so we don’t even know whom to contact to transfer the ownership in case it was a non-malicious act.

We are still weeks from releasing the provider, but we will have to reach out to the PyPI maintainers now to find out how we can claim the ownership - or alternatively go with a different name, but that would be rather inferior, because the name perfectly matches what we would like to do. And soon we will have other provider ideas, which we discuss with several names proposed by different people - and if we do not have PEP 752, the only way to protect against such a situation is to proactively reserve all potentially discussed names - which would be a pretty terrible waste of our time and effort.

I hope that explains why we have the motivation and why we think this PEP is really needed.

2 Likes

A slight tangent, but I’m not sure I follow with apache-airflow-providers-edge – it seems to be a 404 on PyPI and is also not found on Inspector. I tried to open the mailing list thread but I encountered a “Content not found” error.
