PEP 752: Implicit namespaces for package repositories

https://lists.apache.org/thread/zs6ymo5yh8sms67wqjvchkt07sootyym might be a better link - the reporter posted to both the private security mailing list and the public devlist. The link I posted was to the private security list.

And yes, you get a 404 because the package has never been published. It has been reserved by someone, but we do not know by whom. Or at least we believe that’s the reason. When someone registers a project for their organization and does not publish a release there, this is what you get - a 404, and inspector returns nothing.

We believe someone registered the name because that is the only reason that makes sense from the list below: Help · PyPI (last point)

Your publishing tool may return an error that your new project can’t be created with your desired name, despite no evidence of a project or release of the same name on PyPI. Currently, there are four primary reasons this may occur:

  • The project name conflicts with a Python Standard Library module from any major version from 2.5 to present.
  • The project name is too similar to an existing project and may be confusable.
  • The project name has been explicitly prohibited by the PyPI administrators. For example, pip install requirements.txt is a common typo for pip install -r requirements.txt, and should not surprise the user with a malicious package.
  • The project name has been registered by another user, but no releases have been created. See How do I claim an abandoned or previously registered project name?

From that post (and guessing a bit, as the first message isn’t shown), it seems that Katwal as reporter uploaded a test package with the same name to PyPI, which was then reported and removed:

I would be happy to proceed with option 1 and transfer the name to you but now I don’t have control over that project because someone has reported that project and the project got removed from the pypi registry and now I am unable to claim it also

You could try a request to the PyPI admins (by email or pypi/support) to release the name, if the above is a correct guess.

Yes, just a few minutes ago, before you responded, I created a PEP 541-compliant request to get access to the apache-airflow-providers-edge name: PEP 541 Request: apache-airflow-providers-edge · Issue #5829 · pypi/support · GitHub.

This actually shows the kind of overhead that similar problems might cause - right now we have to spend a lot of our time, and the PyPI maintainers’ time, to find out why we cannot use the apache-airflow-providers-edge name and who claimed it (and whether my guess that someone reserved the package in the first place is indeed correct).

With PEP 752 in place, an organization like ours would only have to request the prefix once, and we would avoid running into similar problems in the future.

Hello everyone, long time no see :slight_smile:

I think we’re getting closer to asking for a pronouncement. Since last time:

I think all that is left to do in the coming week or two is to respond to an outstanding feedback item (if I remember correctly) and to make explicit something in the proposal that turned out to be ambiguous during the implementation.

Some questions I had on the PEP:

  1. Should indexes be required to provide a list of all reserved namespaces? The PEP mentions a new ‘namespace detail’ endpoint in the JSON API, but I don’t believe I saw anything about a full list of all grants. Knowing which names are reserved can be useful when choosing new project names.
  2. I’m unclear how project creation works with this PEP. The ‘uploads’ section states that if the proposed name is under a reserved prefix and doesn’t yet exist (i.e. a new project), the upload must fail if “the project is not owned by an organization with an active grant for the namespace”. This seems circular, though, as PyPI to my knowledge doesn’t allow package reservations.
  3. A recent update stated that there are 9,800 outstanding requests for PyPI organisations. Should this PEP talk to the resource impact of namespace requests? At the very least as an expectation-setting exercise, it would be useful to know as a community project that such requests may take several years, or that PyPI may explicitly prioritise requests from paid subscribers (as a hypothetical).

Further editorial questions:

  1. Is a better title for the ‘Uploads’ section something like ‘Project creation’? It specifies nothing else about package uploads under reserved prefixes.
  2. From a specification point of view, please could the PEP provide a better definition of the ‘organisations’ concept than linking to a PyPI blog post? I believe there’s been previous discussion about trying to avoid making interoperability standards too PyPI-specific.
  3. From memory, the authors chose to reject reserved prefixes for individuals, making them exclusive to organisations. Could this be added as an explicit rejected idea?

A

I’ve been thinking about this quite a bit, and in general all of the provenance based features suffer from one big, glaring problem.

How does a client know which values are correct for a given project? Every proposal I’ve seen for any sort of provenance based feature glosses over this completely, and just assumes that the client will know… somehow?

Yes we could surface the domain name in the PyPI UI, but I don’t think that’s particularly useful for a few reasons:

  • Millions of dollars have been spent trying to make phishing attacks harder and humans are still terrible at determining which domain is the one they actually meant to use. I don’t think we’re likely going to do better than they are, so I don’t think human verification of the domain name in the UI is going to be an amazing source of safety [1].
  • There are already signals in the UI that someone could use if they wanted to manually vet that the packages they’re installing come from some claimed source. Adding yet another one doesn’t seem to be of great advantage to me.
  • Domain names are temporal in nature; I have to renew them every year. What do we do when a domain name is no longer registered, or is registered to a different user? If we change the display in the browser, do we expect users to come back and re-validate their packages every time a domain changes hands?

However the bigger problem is how does a client know automatically which domains are valid? We can create a complicated mechanism to have a well known url at a domain that the client goes and fetches provenance attestations at… but we can’t actually trust any of that information unless we know that the domain itself is trusted, and the UI is completely useless for that.

I’ve not seen a single idea on how to actually do that, and until we have that, client side verification of provenances is a pipe dream. It brings back memories of trying to decide how to know which signing keys to trust for a project, where everyone is gung-ho to implement a signing mechanism without figuring out how to actually know how to trust the keys to begin with.

I’ve yet to see someone come up with an idea that actually has all the required pieces. Could we design something like that? Maybe! I don’t know, but nobody has done it yet and I personally think that solving trust is a significantly harder problem than the comments about this idea suggest.


  1. You could argue the same holds true for package names at all, but that’s not something we can really get away from. But we can choose not to add a second hard to validate token.

The idea of grants is a good one IMO. All the better if this can help fund PyPI - for the benefit of everybody - by being an exclusive and expensive feature. Clearly, if this is not an exclusive concept then we are back to the single namespace issue, just for namespace grants instead of projects…

My major comment on this PEP relates to the question of why this is a PEP at all? There seems to be no treatment of benefits to repositories other than PyPI, and other than the promise of future work to “prove cryptographically that a specific release came from an owner of the associated namespace”, there is no implication for installers.

Sans-PEP, PyPI could happily have a new /namespaces endpoint, and it can legitimately start injecting _namespace keys into its JSON response (per PEP-700) if so needed.

By making a PEP/standard, we are incurring a cost on all package repositories, since this PEP adds new namespace metadata and a new mandatory PEP-503 endpoint, with no obvious benefit to those repositories.

Another point worth noting for this PEP: if you search for “A package to prevent Dependency Confusion attacks against”, you will find 1,272 results (at the time of writing), all of them empty packages that exist exclusively to reserve the name for a single company. It looks to me that the namespace concept won’t be of value to them, though perhaps if it had existed they may have followed a different naming convention.

Others have expressed a desire to know all grants, which I share, however I think there are valid cases where a repository would choose to not expose all grants for privacy reasons [1]. If a repository wants to, there is nothing in the spec that precludes the creation of such an endpoint.

Thanks! I updated the PEP to better express the intent:

If the name of a package being uploaded matches a reserved namespace and either of the following criteria are true:

  • The project does not yet exist.
  • The project is not owned by an organization with an active grant for the namespace.

Then the upload MUST fail with a 403 HTTP status code.
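Read literally, the quoted rule boils down to one boolean condition. Here is a minimal Python sketch of that reading; the function name and its three boolean inputs are made up for illustration and are not part of the PEP or of any index implementation.

```python
from http import HTTPStatus

# Status code named in the quoted PEP text for a rejected upload.
REJECTION_STATUS = HTTPStatus.FORBIDDEN


def must_reject_upload(
    name_matches_reserved_namespace: bool,
    project_exists: bool,
    owner_has_active_grant: bool,
) -> bool:
    """Literal reading of the quoted rule (hypothetical helper).

    Reject when the name falls under a reserved namespace AND either
    the project does not exist yet OR its owner holds no active grant.
    """
    if not name_matches_reserved_namespace:
        # The rule only applies to names under a reserved namespace.
        return False
    return (not project_exists) or (not owner_has_active_grant)


if __name__ == "__main__":
    # Existing project, owned by an organization with a grant: allowed.
    print(must_reject_upload(True, True, True))
    # Existing project under the namespace, owner without a grant: rejected.
    print(must_reject_upload(True, True, False))
```

Under this literal reading, the first upload of a brand-new project under a reserved namespace is always rejected, so the project would presumably have to be created some other way first. That is an inference from the wording, not something the quoted text states.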

I think this would be more appropriate for PEP 755 since this PEP is purely about the technical specification.

I think the new edit as mentioned above rectifies this.

Rather than linking the word “organizations” to that blog post I added a footnote mentioning that it is merely an example implementation. I don’t think I can express the concept of organizations any better than they are currently described.

Good idea! I added this:

Granting Reservations to Users

As package repositories have a flat namespace, allowing any user to reserve a namespace would be untenable not just because there would be contention for a finite resource, but also because no repository has enough human operators to manage the vetting of an arbitrary number of users.


  1. The policy proposal for PyPI rejects the idea for such a page.

I very much appreciate everyone who has contributed to these discussions over the course of the last several months! At this point I think I addressed all feedback to the best of my ability and am content with the content of the proposal, so I would like to formally request pronouncement.

@dustin, this is ready whenever you have time to spare :slight_smile:

Thank you for making the updates! Indeed some points are more relevant to PEP 755, sorry for the confusion.

I would like to push back on one aspect though, which I think does bridge the divide between the two PEPs.

It feels odd that (as proposed) package indexes are gaining a large new feature (namespaces) without a specified index of that feature. I think that this is a gap in the specification, as either (a) index developers will be hesitant to provide an unspecified feature, providing a worse experience to index consumers, or (b), developers of different package indexes might implement a /namespace/ endpoint differently, leading to confusion and difficulty in later standardising such an endpoint.

As a sketch example of the usefulness of an index of all namespaces, a workflow tool could fetch & cache all reserved namespaces, using that list to pre-warn a user that their package will fail to publish on {package repository}, avoiding putting extra load on that repository’s servers and the time taken to make potentially slow network requests.
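To make that workflow concrete, here is a rough Python sketch of such a pre-publish check. Everything in it is an assumption: no listing endpoint is specified, so `fetch` is a hypothetical callable standing in for whatever a repository might offer, and the matching rule (same name, or the namespace followed by a hyphen, after PEP 503 normalization) is my guess at how names would relate to namespaces.

```python
import re
import time
from typing import Callable, Iterable, Optional


def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


class NamespaceCache:
    """Caches a repository's reserved namespaces for `ttl` seconds."""

    def __init__(self, fetch: Callable[[], Iterable[str]], ttl: float = 3600.0):
        self._fetch = fetch  # hypothetical: stands in for an unspecified listing endpoint
        self._ttl = ttl
        self._prefixes = frozenset()
        self._expires_at: Optional[float] = None

    def _current(self):
        # Refetch only when the cache is empty or older than the TTL.
        now = time.monotonic()
        if self._expires_at is None or now >= self._expires_at:
            self._prefixes = frozenset(normalize(p) for p in self._fetch())
            self._expires_at = now + self._ttl
        return self._prefixes

    def reserved_prefix_for(self, project: str) -> Optional[str]:
        """Return the longest cached namespace covering `project`, if any."""
        name = normalize(project)
        hits = [
            p for p in self._current()
            if name == p or name.startswith(p + "-")  # assumed matching rule
        ]
        return max(hits, key=len, default=None)


if __name__ == "__main__":
    # Illustrative data only; not a real list of grants.
    cache = NamespaceCache(lambda: ["apache-airflow-providers"])
    prefix = cache.reserved_prefix_for("apache-airflow-providers-edge")
    if prefix is not None:
        print(f"warning: name falls under reserved namespace {prefix!r}")
```

The fetch function is injected so the tool only talks to the repository when the cached list has expired, which is the load-saving point made above.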

Beyond a standards perspective, I also think it is worth having the index of all grants. Non-PyPI repositories/indexes are often (entirely?) used as ‘private’ (e.g. full PyPI mirrors, internal organisation repositories, personal devpi instances). In these cases, the co-ordination problem solved by namespaces is lower, and namespace prefixes themselves may be more changeable. In this ‘private’ setting, there is no reason not to publish a list of namespaces.

PyPI, though, serves as a community resource. In this capacity, I think the rationale in PEP 755 for rejecting this idea, that “[it] has potential to leak private information such as upcoming products” is the wrong philosophical approach. Projects on PyPI are not secret, and are immediately world-readable upon upload. Organisations that care about secrecy of upcoming products have had ways to manage, and will continue to do so. I do not think that this is a good enough rationale to reject publication of an index of namespaces.

As you say though, and discussed above, there is nothing prohibiting the creation of an endpoint, so a reasonable response here would be that this isn’t a problem for PEP 752, and that we should debate this in 755. I just wanted to note my concerns from both a standards/future standardisation perspective, and the slightly philosophical point above.

A

Cool. Looking forward to it.

Not material to the PEP itself, but a small thing under Repository Metadata:

The JSON API version will be incremented from 1.2 to 1.3. The following API changes MUST be implemented by repositories that support this PEP. Repositories that do not support this PEP MUST NOT implement these changes so that consumers of the API are able to determine whether the repository supports this PEP.

I believe this needs to be 1.4 now – PEP 740 bumped the API version to 1.3 :slightly_smiling_face:
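For what it's worth, this is roughly how a consumer might use the version number to decide whether a repository could support the PEP. It is only a sketch: the `meta.api-version` key comes from the JSON simple API, but the `(1, 4)` threshold is just the number suggested above and may not be the final one, and the function names are made up.

```python
def api_version(index_response: dict) -> tuple[int, int]:
    """Parse the `meta.api-version` field of a JSON simple API response."""
    major, minor = index_response["meta"]["api-version"].split(".")
    return int(major), int(minor)


def may_support_namespaces(
    index_response: dict,
    minimum: tuple[int, int] = (1, 4),  # assumption: taken from this thread, not final
) -> bool:
    """True if the response's API version is at least `minimum` within the same major."""
    version = api_version(index_response)
    # Compare as integer tuples so that "1.10" sorts after "1.4".
    return version[0] == minimum[0] and version >= minimum


if __name__ == "__main__":
    response = {"meta": {"api-version": "1.3"}, "projects": []}
    print(may_support_namespaces(response))
```

Comparing parsed integers rather than raw strings matters once minor versions reach two digits.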

(I’ve recently done some cleanup on the living specs to make the version history easier to follow: Simple repository API - Python Packaging User Guide)

Update: I’ve reviewed the PEP and I’m overall very supportive. I’ve provided my remaining low-level feedback to Ofek and Jarek (and made sure the rest of the PyPI team was aligned with my suggestions) and I think once that feedback is resolved we can move forward with pronouncement.

For transparency’s sake, my high-level feedback is that the PEP should be better aligned with the following goals & anti-goals:

Goals for the PEP:

  • Define a standard for how package names relate to namespaces
  • Define a standard for the interface package indexes provide to users regarding namespaces
    • What happens when an upload conflicts with a namespace
    • How a user knows if a namespace is available or in use
    • How a user knows what namespace a given project is in
  • Backwards compatibility with existing installation methods
  • Backwards compatibility with previously existing project names
  • Don’t limit the possibility of moving toward ‘structured’ namespaces in the future

Anti-goals for the PEP:

  • Define a policy for how a particular index will distribute namespaces to users
  • Define how this relates to the concept of organizations on PyPI
  • Determine which namespaces should/shouldn’t be generally available to users on a given index

(Regarding the anti-goals: the PEP should be fully index-agnostic, and also not dictate policy for any given index including PyPI, but rather focus more on general standards and how interoperability should work for all indexes and all installers.)

I plan to re-review once Ofek & Jarek have updated the PEP and I’m looking forward to acceptance & implementation!
