PEP 752: Package repository namespaces

I was going to ask “but what if you’re a corporation with no users” and then I answered myself “well then probably you can’t afford a namespace”. But then this made me think of something that seems incompletely addressed in the PEP: what happens if a company reserves a prefix but then goes out of business?

The PEP says:

If a grant is shared with other organizations, the owner organization MUST initiate a transfer as a prerequisite for organization deletion.

That’s all very well, but what if they don’t? The situation I’m envisioning here is where some startup buys a prefix and then later suddenly collapses without transferring the prefix or even attempting to delete it. Does this result in a “zombie” namespace that isn’t deleted but can never be used because no one owns it?

The other text in the grant removal section of the PEP talks about “unclaiming”. That term seems to suggest to me that the namespace would, after unclaiming, be available for claiming by someone else. Is that the case?

If an unclaimed namespace can never be claimed again, then I would worry about the possibility of startups using VC funding to grab prefixes that then wind up becoming zombified and unusable, even if the company never did much with the prefix.

Or, if an unclaimed namespace can be claimed again, then that would compound what I still see as the main problem with this PEP, namely that it’s difficult for users to tell at a glance what a given name prefix means. Already we have the problem that foo-bar could either mean “FooCorp owns this package” or “someone else owns this package because they created it before FooCorp bought the prefix”, or even “there is no prefix here because no one ever bought the foo- prefix”. But now there could also be the possibility of “FoozerCorp owns this package because they bought the foo- prefix formerly owned by FooCorp”. And of course this could be extended ad infinitum if a single prefix keeps getting bought by one company after another.

This is similar to the problem that exists with domain names, but with domain names it’s clear there is no “MUST” about transferring the name to someone else, or any need for an explicit “unregistration”; if you stop paying for your domain name, you just lose it and it can be bought by anyone else.

Because of these issues I think the PEP should talk a bit about the possibility of prefix “abandonment”, and clarify whether unclaimed prefixes can be re-claimed by entities unrelated to the original owner. It might even be good to include an explicit duration for prefix ownership (i.e., you must “renew” ownership every N years) — or maybe something like that is planned for PyPI organizations already?

My assumption is that if you’re getting a prefix through a corporate org, if you stop paying for the org then you also lose the prefix and it gets abandoned similar to how a domain name works. Otherwise there’s nothing stopping people from paying for one (month, year, whatever) to get the prefix and then canceling.

For OSS projects, a process similar to the one in PEP 541 would probably be needed for a defunct OSS project to “lose” its namespace.

It would be good for this to be explicitly addressed though.

I’ve not been keeping up with this discussion fully and have only skimmed through it. I apologise if I’m bringing up something that has already been considered… Thanks @ofek for keeping this PEP updated!

I think we could have this literally hit https://pypi.org/simple/%25prefix/projectname to fetch that project, with PyPI putting the project in that namespace based on a METADATA key on upload.

It’s a bit more disruptive but I think it’d be fine/feasible too. Nonetheless, I’m not writing the PEP, so whoever considers doing so can have this additional idea to consider.

Copying NPM’s syntax (e.g. @foo/bar) would alienate a large number of Windows users because the @ character is considered special in PowerShell.

As someone who regularly uses multiple platforms… this feels like an extremely weak argument IMO and it would be good to tighten this up.

For starters, PowerShell isn’t exclusively a Windows thing. And we already have enough variants of this with extras + zsh, URL prefixes + PowerShell, as well as markers + everything(?).

TBH, there’s lots of possible bikeshed colours we can use for the specific syntax to avoid the problem that whole portion covers. organisation::package.name is one that should work on most platforms AFAIK. But, if it doesn’t on some specific shell, I think that’s fine too - basically every shell has quoting capabilities and I don’t think we have to actively try to cater to every one.

It’s obviously an issue if every shell needs quoting for a potentially common operation like this, so like… let’s not use spaces or something like that here.

(not calling out anything else in that section coz that all seems good to me)

I don’t think “you can’t afford it” is a particularly strong corrective measure here.

[snip] raised $2.5 million for a company [snip] $1.8 million to secure the domain name. [snip] “It was worth it. No regrets,”

(from https://www.geekwire.com/2024/covid-era-whiz-kid-is-back-and-he-brought-a-friend-a-wearable-always-listening-99-ai-companion/)

More generally though, I am a bit confused by this PEP. I think this PEP should completely keep out of the business of specifying how exactly the namespaces should be granted/owned and defer that to the index?

Then, we can have policy that the PSF and PyPI can evolve outside of the PEP process entirely. This means that we can try to figure out the technical details of how installers should interact with PyPI (which is “unchanged” in the proposed design and significantly different in the rejected idea) and what the UX should be separately from… well… the much more complicated problem of how names should be allocated (given it interacts with trademarks, is possibly a liability for the PSF and all that good stuff - let’s leave that to the PSF and lawyers to figure out).

In other words, I think this PEP really shouldn’t have anything around the processes for managing these namespaces and that should be a PyPI detail (similar to how project ownership isn’t a PEP level specification, but rather a PyPI detail – there’s a separate PEP discussing package name policy that underwent legal review and all that jazz). At most, this PEP can be suggesting things to do for pypi.org as recommendations but it’s gotta be non-binding.

TBH, I had limited energy this evening to look at this and “why is there so much of the policy stuff in this PEP” was a thing I said out of frustration while reading this PEP… because it’s really a weird mix of a PyPI policy process document that’s structured like a PEP and a minimal design for how namespaces would work mixed in.

We do have variants of this problem, but this would introduce the first such issue with package names themselves; there are currently none.

Are you recommending that I remove this line from that section?

Are you suggesting to move the Grant Applications section under Recommendations? Everything else seems expressly required.

I understand you probably have a particular interest in how this impacts installers (being a maintainer of pip) and I realize there is not much discussion of client-side behavior currently, but I plan to improve that in the next PR :slightly_smiling_face: Please keep in mind, however, that, as you mentioned, there is not a whole lot to speak about with the current proposal because the metadata is now at the level of the project rather than artifacts, so the best I can do is add some educated guesses on how tools could incorporate this data for extra security checks.

That’d work as would removing the discussion of syntax quirks in that section since it doesn’t feel to me that it adds much to the argument of “why not that style”.

To be clear, I’d also be OK if this stuff stays as-is as well. :slight_smile:

Oops… I’m realising I didn’t make any concrete suggestions at all around this, which isn’t particularly productive. Thanks for asking this! Concrete suggestions below…

  • Almost everything mentioning “organization”, “grant”, “application” and “policy” should be in the motivation or recommendations sections – not in terminology, rationale or specification sections – and shouldn’t use RFC-style MUST words.
  • All of PEP 752 – Package repository namespaces | peps.python.org should also be a UX recommendation.

I do, indeed, and I like the “let’s not be disruptive” approach taken.

That said, I’m also interested as someone whose employer might be interested in this, as a community member who might end up asking for a namespace for their project, as well as a PyPI moderator interested in the specific procedures we’d end up having around this.[1]

And… Having seen how non-trivial things were for project name retention policies, I’m also wary of those details drowning out the discussion about implementation and UX of this functionality, or deferring agreement on the design for a long time. :sweat_smile:


  1. why do I have so many hats…

I’m confused by this. The feature is fundamentally about giving organizations a grant so I have no idea how I can take those words out.

I think open/restricted is the best option that has come up so far:

  • open namespaces/prefixes:
    • anyone can create new projects using the given prefix
    • variants:
      • claimed: a namespace grant applies, but it is set to “open” for this prefix
      • unclaimed: there is no namespace grant covering this prefix
  • restricted namespaces/prefixes
    • only approved publishers can create new projects using the given prefix
    • variants:
      • private: only the org owning the namespace can create new projects
      • shared: one or more additional orgs or users have been granted permission to publish projects using the given prefix (either through the “existing project” exception that applies when the namespace was granted, or via an access delegation from the org owning the namespace)

Using public/private instead still makes sense (due to the two namespace categories being “open for public use” and “restricted to private use”), but using private/shared doesn’t (because neither “shared for private use” nor “restricted to shared use” accurately describe the intent of the private/restricted category)
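To make the open/restricted split above concrete, here is a minimal sketch of how the four variants could be derived from a single record. The `Prefix` class, its field names, and `may_create_project` are hypothetical and only for illustration; nothing here is taken from the PEP or from PyPI.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Access(Enum):
    OPEN = "open"              # anyone can create new projects under the prefix
    RESTRICTED = "restricted"  # only approved publishers can create new projects


@dataclass
class Prefix:
    """Hypothetical record for one prefix (illustrative names only)."""
    name: str
    access: Access = Access.OPEN
    owner: Optional[str] = None                      # None: no grant covers this prefix
    delegates: Set[str] = field(default_factory=set)  # extra approved publishers

    def variant(self) -> str:
        """Map the record onto the claimed/unclaimed/private/shared variants."""
        if self.access is Access.OPEN:
            return "claimed" if self.owner else "unclaimed"
        return "shared" if self.delegates else "private"

    def may_create_project(self, publisher: str) -> bool:
        """Open prefixes accept anyone; restricted ones need owner or delegate status."""
        if self.access is Access.OPEN:
            return True
        return publisher == self.owner or publisher in self.delegates
```

In this sketch a restricted prefix starts out "private" and becomes "shared" as soon as the owner delegates access to anyone else, which is the reading of the list above that I am assuming.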

If there are genuine technical reasons for the restriction, then the PEP should state them. Company confidentiality isn’t a good enough reason, since the answer there can easily be “If you want to keep your association with the name secret, then you’re not allowed to stop anyone else from using it”.

However, I’d prefer to see the PEP say “If a repository provides a way to list all granted namespaces, it should expose them using this API” (and then describe the API). That entire section can then be flagged as SHOULD or MAY so it isn’t a blocker for PyPI adding prefix reservation support (and if it’s only as weak as MAY, cite the reasons why the transparency rationale that applies for trademark applications doesn’t apply to namespace grants in the Python packaging ecosystem, otherwise make it a SHOULD).

Even if the simple repository API and PyPI’s search functionality may not be 100% comprehensive, they’re good enough that if they give you the green light you’re unlikely to be disappointed when you actually attempt the package upload.

Repository operators should be actively discouraged from keeping namespace grants secret, not told that grants must be kept as secret as possible.

All good questions, and amongst the reasons I’m not convinced about the idea myself. I only wrote it up because I don’t think we have a strong case that NPM-style namespacing can’t be done, which means this PEP shouldn’t frame the situation that way.

Instead, it should make the case that for backwards compatibility reasons, any NPM-style namespacing proposal would need to be built on top of a NuGet-style prefix reservation proposal anyway in order to control how the explicitly scoped namespaces interact with the top level flat namespace. Since prefix reservations are a necessary, and potentially sufficient, step, it makes sense to start with just that part of the proposal, and leave the explicit namespace option until later (if anyone is sufficiently motivated to pursue the idea, and is able to make the case that adding explicit namespace syntax is worth the hassle of introducing it).

On this front, I’ve started thinking of “reserved prefixes” as “implicit namespaces”: the namespace is there in the package name, but there’s no dedicated syntax making it stand out. In foo-bar-baz, the prefix may be foo or it may be foo-bar, or there may be no prefix at all.

The NPM-style syntax is then a potential follow-up “explicit namespaces” feature: the namespace portion of the package name is made syntactically distinct from the rest of it. foo-bar-baz remains ambiguous about its namespacing, but @foo/bar-baz is specifically the bar-baz package inside the foo namespace while @foo-bar/baz is specifically the baz package inside the foo-bar namespace. Only one of the three would be permitted to exist on any given repository (since they would all normalize to foo-bar-baz), but attempting to install the forms with the explicit namespace syntax would fail if the project prefix in the repository API didn’t match the requested prefix. (As Pradyun noted, that could potentially be handled on the server API side by exposing the metadata for namespaced projects under both prefix-projectname and @prefix/projectname. So in the example, requests for foo-bar-baz would always work, but requests for @foo/bar-baz and @foo-bar/baz would only work if they matched the actual repository namespace setup)
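A rough sketch of the implicit/explicit distinction described above, assuming the `@namespace/project` spelling from the example. Only `normalize` reflects an existing rule (PEP 503 name normalization); `candidate_prefixes`, `parse_explicit`, `resolve`, and the `registered_prefix` argument are hypothetical helpers, and no installer or index implements this today.

```python
import re
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def candidate_prefixes(name: str) -> List[str]:
    """Every prefix an implicit name could carry, e.g. foo-bar-baz -> foo, foo-bar."""
    parts = normalize(name).split("-")
    return ["-".join(parts[:i]) for i in range(1, len(parts))]


def parse_explicit(requirement: str) -> Tuple[str, str]:
    """Split the hypothetical '@namespace/project' form into (namespace, flat name)."""
    match = re.fullmatch(r"@([^/]+)/(.+)", requirement)
    if match is None:
        raise ValueError(f"not an explicit namespace requirement: {requirement!r}")
    namespace = normalize(match.group(1))
    project = normalize(match.group(2))
    return namespace, f"{namespace}-{project}"


def resolve(requirement: str, registered_prefix: Optional[str]) -> str:
    """Return the flat project name; fail when an explicit namespace does not match.

    registered_prefix stands in for whatever prefix the repository reports
    for the project (None when no grant applies).
    """
    if not requirement.startswith("@"):
        return normalize(requirement)  # the implicit form always resolves
    namespace, flat_name = parse_explicit(requirement)
    if namespace != registered_prefix:
        raise LookupError(f"{flat_name} is not in namespace {namespace!r}")
    return flat_name
```

With a repository that reports `foo` as the prefix, `resolve("foo-bar-baz", "foo")` and `resolve("@foo/bar-baz", "foo")` both give the flat name, while `resolve("@foo-bar/baz", "foo")` raises.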

Thanks @pradyunsg, this nicely articulated something that had been bugging me, but I hadn’t been able to consciously frame, let alone put into words.

As Pradyun noted, when it comes to project names, we have PyPI policy & procedures defined by PEP 541, and technical simple repository API documentation defined by PEP 503 (HTML) and PEP 691 (JSON).

PEP 752 is currently combining a proposal for “how namespace prefix reservations should be published in repository APIs” and a proposal for “how namespace prefix reservations will be managed on PyPI” into one document. While it does make sense to publish both proposals concurrently (it doesn’t make sense to approve the technical spec without some sense of how the namespace grants might be managed in practice), the relevant approval processes are different:

  • the technical proposal primarily needs developer consensus that the feature is worth implementing and the specific API is viable (i.e. the usual process for Standards track PEPs)
  • the policy proposal needs to be formally reviewed and signed off by the PSF to make sure they’re on a sound legal footing in applying it to PyPI (i.e. the process followed for PEP 541, where the BDFL-Delegate was the PSF’s infrastructure lead)

(For easier review purposes, PEP 752 could draft the policy proposal inline but it should still be clearly separated from the technical proposal. Adding an Appendix could be a good way to go)

It would be helpful to know what exactly I should extract/what is “policy”. An enumeration would assist me very much.

I like this and after we figure out how to split the document up I will begin using this language.

Do you think this meshes well with the concept of “implicit” and the potential future “explicit”?

It’s probably easier to approach it from the other direction and identify the purely technical parts together with the key assumptions informing the design. Everything else will be draft policy that can go in an appendix.

My suggested key assumptions for splitting out the technical parts:

  • API proposal only applies to repositories that allow creation of new projects. Pure mirrors and supporting repositories like Pi Wheels that take project names from a base repository are not affected, since they don’t allow any new projects to be created.
  • API proposal assumes that the repository defines named entities that hold authority over projects and packages. For technical API design purposes, these owner IDs are opaque strings. In the draft PyPI policy, they would map to user IDs and/or organisation IDs. The distinction between users and orgs (if any) would exist inside the opaque strings, so it won’t affect the technical API design. The fact owner IDs are repository specific is the reason it doesn’t make sense to mirror namespace grants to a different repository.
  • projects are assumed to have at most one owner, but may have multiple maintainers
  • namespace grants are assumed to have exactly one owner, but may have multiple authorised publishers
  • publishing approval within a registered namespace grant may be:
    • open: any publisher may create new projects under the namespace
    • restricted: only the namespace owner, and publishers they specifically authorize, may create new projects under the namespace

Based on those assumptions, the technical API design would need to cover:

  • defining this as a minor JSON API version bump (from 1.0 to 1.1). The HTML API version will remain unchanged as the new fields are only defined for the JSON API and not added to the legacy HTML API.
  • reporting project ownership via a new owner string field (already in the PEP)
  • reporting associations between projects and namespaces via a new namespace attribute (as proposed in the PEP, but with the field details changed):
    • prefix: the reserved prefix that defines the relevant namespace grant. When multiple potentially relevant namespace grants exist, this should be the longest prefix that matches the project name.
    • restricted: is publishing to this namespace restricted or not?
    • authorized: is this project published by an authorized publisher for the namespace?
  • the expected interpretation of the five possible namespace states:
    • no namespace field: there is no namespace grant associated with this project
    • restricted: false and authorized: false: unofficial package in an open/public namespace
    • restricted: false and authorized: true: official package in an open/public namespace
    • restricted: true and authorized: false: pre-existing or no longer official package in a restricted/private namespace
    • restricted: true and authorized: true: official package in a restricted/private namespace
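As an illustration of the five states listed above, here is a small sketch of how a client might interpret the suggested `namespace` field. The payload shape (a `namespace` object with `prefix`, `restricted` and `authorized`) follows the suggestion in this post, not any published API, and `describe_namespace` is a hypothetical helper.

```python
from typing import Optional


def describe_namespace(namespace: Optional[dict]) -> str:
    """Map the suggested namespace field onto the five states listed above."""
    if namespace is None:
        return "no namespace grant associated with this project"
    restricted = namespace["restricted"]
    authorized = namespace["authorized"]
    if restricted and authorized:
        return "official package in a restricted namespace"
    if restricted:
        return "pre-existing or no longer official package in a restricted namespace"
    if authorized:
        return "official package in an open namespace"
    return "unofficial package in an open namespace"


# Hypothetical project entry; the names and values are made up for the example.
project = {
    "name": "foo-bar",
    "owner": "foocorp",
    "namespace": {"prefix": "foo", "restricted": True, "authorized": False},
}
print(describe_namespace(project.get("namespace")))
```

Using `project.get("namespace")` means an index that omits the field entirely falls into the first state without any special handling.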

My own preference would be for the technical API design to also cover a parallel JSON API (separate from the existing project API) that defined the following scheme for describing namespace information:

  • prefix: the reserved prefix that defines the relevant namespace grant
  • owner: the specific entity that is responsible for the namespace
  • restricted: whether publishing to the namespace is restricted or not
  • authorized_publishers: a list of the additional entities (if any) that are permitted to publish “official” packages for the namespace
  • namespace_prefix: the reserved prefix for the root namespace grant that defined this namespace (omitted for root namespace grants). When multiple potentially relevant namespace grants exist, this should be the shortest matching namespace prefix.
  • namespaces: a list of nested namespace grants (if any) that exist within this namespace

Actually providing that API would be a SHOULD requirement rather than a MUST. Repositories would also be free to offer “hidden” namespace grants that enforce the upload restrictions, but aren’t included in the public list.
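For illustration, the namespace record above could be modelled roughly as follows, together with the "longest match for a project, shortest match for the root" rule. The `NamespaceGrant` class and the helper functions are hypothetical. I am also assuming, as a simplification, that a prefix covers any normalized name starting with the prefix followed by a hyphen.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NamespaceGrant:
    """One entry of the suggested namespace listing (illustrative only)."""
    prefix: str
    owner: str
    restricted: bool
    authorized_publishers: List[str] = field(default_factory=list)
    namespace_prefix: Optional[str] = None               # root grant prefix; None for root grants
    namespaces: List[str] = field(default_factory=list)  # prefixes of nested grants


def matching_grants(project: str, grants: List[NamespaceGrant]) -> List[NamespaceGrant]:
    """Grants covering an already-normalized project name, shortest prefix first."""
    hits = [g for g in grants if project.startswith(g.prefix + "-")]
    return sorted(hits, key=lambda g: len(g.prefix))


def project_prefix(project: str, grants: List[NamespaceGrant]) -> Optional[str]:
    """Longest matching prefix: the value a project's `prefix` field would report."""
    hits = matching_grants(project, grants)
    return hits[-1].prefix if hits else None


def root_prefix(project: str, grants: List[NamespaceGrant]) -> Optional[str]:
    """Shortest matching prefix: the root grant that the nested grants hang off."""
    hits = matching_grants(project, grants)
    return hits[0].prefix if hits else None


grants = [
    NamespaceGrant("foo", "foocorp", restricted=True, namespaces=["foo-bar"]),
    NamespaceGrant("foo-bar", "barteam", restricted=False, namespace_prefix="foo"),
]
```

With these two grants, `foo-bar-baz` reports `foo-bar` as its prefix and `foo` as its root, while `foobar` matches nothing because of the hyphen boundary.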

I think so, as the two concepts relate to different aspects of the namespacing proposal. Specifically:

  • the open vs restricted distinction relates to the restricted boolean in my API suggestion (describing restricted: false and restricted: true respectively). The distinction primarily matters at package upload time.
  • the explicit vs implicit distinction relates to the authorized boolean, where the explicit namespace syntax (if it were to be proposed and accepted) would only work for projects with authorized: true set, while the implicit namespace syntax would work regardless of whether the project was authorized or not. As a result, the implicit syntax is best when artifacts can be authenticated by other means (such as exact hashes and/or TUF metadata), but the explicit syntax would potentially be desirable when a project losing (or never having) authorization really should result in a resolution failure. The distinction primarily matters at package resolution time (which may not be installation time due to the common use of artifact hash based lock file formats).

Thanks for loosening this (don’t forget to update the ‘Rejected ideas’ section), but I think the current language is still not great:

Root grants given to community projects SHOULD only be shared but is ultimately up to the reviewer of the application.

I would assume most projects, community or otherwise, will want ‘private’ prefixes, because restricting who can upload new packages with a prefix is - IMO - 90% of the point. By all means ask more large projects about that, but it sounds like you’ve heard that from at least two projects already, and I think @minrk said something similar for Jupyter:

The current PEP wording still suggests that ‘shared’ prefixes are right for almost all community projects, and gives no guidance as to when a private prefix might be appropriate.

I think the PEP would be better if all prefixes were private by default, and organisations that wanted to open up a (sub)prefix for anyone to upload could do that themselves through PyPI after getting the grant.

Also, a smaller note:

Reviews of corporate organizations applications MUST be prioritized.

Robotically applying this could mean that community organisations never get their applications reviewed, if the corporate queue never empties. I would hope that PyPI admins wouldn’t actually follow this to the letter in that scenario - but then someone could say ‘you approved a community organisation yesterday, but PEP 752 says…’

I agree with this split (and thanks @pradyunsg for articulating the issue here). Also I’ll note that the whole organisation feature of PyPI was implemented without any PEP (for the same reason as we are talking about here, it’s purely a PyPI implementation and policy issue). I’m uncomfortable with the idea of a PEP defining a standard API that uses a feature that isn’t itself standardised.

I don’t think we have an option here to retroactively try to standardise organisation support, so at a minimum I’d like the technical PEP to explicitly state what an index that doesn’t support organisations should do in the API (presumably all fields will be optional and the answer will simply be to omit them, but let’s be explicit).

I was thinking of an “informal community” case, e.g. pydantic- seems to be a somewhat popular prefix for packages related to/extending Pydantic, which is itself owned and maintained by a company. FWICT these pydantic- community projects are not generally mentioned or referenced by the upstream, suggesting that it’s a purely user-driven community without involvement from the parent project.

Agreed! I think this could be as brief as “Python packaging has both normalized and non-normalized names; packages with names like foo.bar are normalized to foo-bar, which in turn may represent ‘bar in namespace foo’ if foo is a namespace.”

I agree that publishers might not (immediately) want this, but to push back a little: I think having top-level prefixes that disambiguate index namespaces solves a lot of the potential problems raised so far while also sidestepping a lot of potential sources of community disruption (such as projects having to leave a namespace that’s been given to an unrelated org or company, or having to change from -suffix to prefix- naming).

(I think my bottom-line view here is that fixed prefixes a la corp- aren’t the only way to solve this, but one of the least disruptive if the goal is to design this PEP without any new package name syntax. But I agree with @pradyunsg’s comment about a new syntax not being too difficult to handle on the index side itself.)

Hi all,

I maintain the Chaos Toolkit which has spawned a few community projects over the years. By convention, most of these projects use the chaostoolkit- prefix.

I’ve read the PR and various messages on this thread but I have a few questions I cannot find answers to:

  1. On the approval criteria for community organizations, the PR says:

Organizations that are not corporate organizations MUST represent one of the following:

  • Large, popular open-source projects with many packages

But then the question is how do you protect the prefix before that point? Chaos Toolkit is niche and probably not massive enough to warrant such a grant. But assuming it can reach the right status, it seems counterintuitive that I can only protect the community once I’ve become big enough.

  2. Why use the name to carry such semantics rather than rely solely on the metadata? It seems to me that overloading the name might make it a burden in the future. It also prevents packages with a different name from belonging to a given namespace when the project is not named after that prefix.

  3. Could we define “actively” in:

The organization SHOULD be actively using the namespace.

  4. Is this valid for aws then?

The namespace SHOULD be greater than three characters.

  5. Would there be any merit in letting organisations associate certs/keys and sign the packages they want to be signalled/protected instead? This could be highlighted to users as well.

  6. This seems quite bad to me:

Projects that are not tied to a grant owner and are part of a private grant (i.e. existed before the grant) should have an indicator that conveys inauthenticity or lack of trust. A good choice might be a warning sign (:warning:).

So I created a project for aws 10 years ago, I’m not part of a private grant (which I’m therefore not even aware of) and suddenly I should be tagged with “inauthenticity or lack of trust”? Am I reading this wrong? That feels deeply wrong to me.

  7. As the PR mentions, there is a lack of resources for accepting new organizations (I submitted one in late June and it’s still in the queue, for instance, but I fully appreciate this is the way it is). But this means the PR already narrows the audience for this feature to paying organisations. I might be missing a lot of context here, but is this a trend we are meant to see more and more in the future?

Overall, while I can appreciate that this feature might bring some value, it feels like it’s mostly of interest to a rather specific set of users (large organizations which naturally need to protect themselves), but I’m not quite clear what we, maintainers of a medium-ish set of packages, actually gain, and I’m afraid the fact you can end up in conflict with a private grant could cause damage.

But then how do you prevent people from squatting on a name at the same time?

It integrates into existing systems. The current proposal, for instance, doesn’t require the entire packaging ecosystem to be updated in order for this to be supported, just PyPI. But in spite of that, you can potentially know that some package is official by name alone w/o having to go and manually check on PyPI.

Look at it from the other perspective: chaostoolkit gets a grant to signal official projects, but you have pre-existing community projects that are allowed to continue with their names. If you don’t have any indicator then people may overlook that those projects don’t fall under your grant which is a potential risk when using naming to fight supply chain attacks via name confusion.

How? Considering the name itself doesn’t tell you if:

  1. a grant actually exists for that name
  2. that package belongs to said grant (if it was created before the grant)

If you really want to make the most of these security features, you still have to go to PyPI to figure it out. At that stage, the name is still not helping since it’s the metadata of the package and the PyPI repository which will be the source of truth.

Unless pip starts showing this information somehow? That would be at least something I’d find useful, if pip told me “hey, this package you want to install doesn’t belong to a namespace owned by this organization, do you want to carry on?”. My knowledge of pip internals is too shallow to know if that could be done without a specific round trip though.
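To sketch what such a prompt could be based on: assuming the index response for a project carried a `namespace` object with `prefix`, `restricted` and `authorized` fields (the names suggested earlier in this thread, not an existing pip or PyPI feature), an installer-side check might look roughly like this. `namespace_warning` is a hypothetical function.

```python
from typing import Optional


def namespace_warning(project: dict) -> Optional[str]:
    """Return a message to show the user, or None when there is nothing to flag.

    `project` is a hypothetical, already-parsed index response for one project.
    """
    namespace = project.get("namespace")
    if not namespace:
        # No grant applies, or the index does not report namespaces at all.
        return None
    if namespace.get("restricted") and not namespace.get("authorized"):
        return (
            f"{project['name']} uses the reserved prefix {namespace['prefix']!r} "
            "but is not published by an authorized publisher. Continue?"
        )
    return None
```

Since the data would sit on the project rather than on individual artifacts, a check like this could in principle reuse the response an installer already fetches to list a project's files, but that is an assumption about how the field would be exposed.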

Is there a place where this “potential risk” is quantified?

The downside of suddenly setting a rather negative indicator on packages that have been there for years and are well respected is very strong. I have projects in the chaostoolkit ecosystem that aren’t official but are well cared for, and I would hate for them to suddenly get penalized because they aren’t part of a chaostoolkit grant. As someone who cares for that particular community, my only option would be to ignore the grant altogether and therefore not benefit from the security improvements mentioned here.

I’m also curious what’s the benefit for the most downloaded packages which simply don’t belong to an organization. Are these projects left out entirely of the announced benefits purely because they don’t belong to a paying organization/large OSS community organization?

A project such as boto3 is interesting. It’s the most downloaded package, and we can assume that is because it is downloaded from automated builds all around the globe. Yet its name implies it wouldn’t belong to an aws namespace. I guess a boto namespace could be created. But at the end of the day, I can already determine that this package belongs to a well-known user, the aws profile on PyPI (funnily enough, it seems AWS doesn’t have an organization). So what would a namespace offer?

Overall, I’m just confused about the data points that led to this PEP and I’m concerned about its unexpected consequences.

Not if my project’s own documentation says, “the <…> prefix is reserved and thus projects with that prefix are official and verified to be owned by us.”

No, and I don’t even know how you would go about quantifying it beyond running a user study.

It depends on what package you’re talking about. Solo projects that don’t have related/sibling packages are never going to benefit from namespacing anyway since their namespace has a single entry already.

Note that while the default state of pre-existing packages under a reserved prefix would mark them as unofficial, it is expected that PyPI would offer a way for grant owners to indicate that use of the prefix is authorised for that project even though it is published by someone other than the grant holder.

It probably wouldn’t happen very often with restricted corporate grants, but community projects with open grants might be more likely to extend that level of trust to another publisher.

Ok, but. . .

  1. That still means you have to “manually” check somewhere rather than knowing just from the package name itself.
  2. It still doesn’t really tell you, since anyone could put that in their documentation without actually reserving the namespace. Okay, they probably wouldn’t, since it would be sticking their neck out and people would get mad at the org when they found out it wasn’t true. But still, it seems to me the point of this namespace thing is for the ownership to be indicated by a trusted third party (i.e., the package index).
  3. It seems likely that projects will rarely be able to say something that clear, because even if the namespace is private and restricted, there could still be grandfathered-in packages that aren’t controlled by the namespace owner. So in many cases the project docs would have to say “the prefix foo- is reserved and everything there is official. . . except for foo-bar and foo-whatsit, don’t trust those ones”.

I fully appreciate the amount of work here but yes, I would assume a user study would have made sense for this PEP, at least with well-known corporate orgs and large community projects.

I’m also assuming the PyPA team has indicators about reports where this was an issue.

it is expected that PyPI would offer a way for grant owners to indicate that use of the prefix is authorised for that project even though it is published by someone other than the grant holder.

I support this approach because there are many third-party and community packages available at Search results · PyPI, for example, and among those results there are some that I encourage users to utilize when facing challenges or looking to avoid duplicating existing solutions. While we couldn’t validate all of them, the specific ones that have been vetted shouldn’t be flagged in some negative way…