PEP 752: Package repository namespaces

sethmlarson · August 19, 2024, 6:14pm

Thanks for writing this up @ofek! I’m in favor of NuGet-style namespace implementation and tying a namespaces’ status to projects (instead of releases or files) and having that status be point-in-time (e.g. is this namespace active right now?) instead of historical.

Visualizing a namespace as a list of rules, where the root namespace is the entry point into “deny uploads outside organization” and then having an optional list of other name matches (either exact foo-contrib-stubs or wildcard foo-contrib-*) to delegate as “public” makes sense to me. I also like the idea of having the full list of “existing” projects encoded this way in the UI (perhaps greyed out, to show that these delegations can’t be one-click removed).

By adding the ability to “allow” a single name to be published within a namespace by non-organization users it lets owners control the namespace more carefully if they’d like to without the complete “open season” that a wild card would represent.

ronaldoussoren · August 19, 2024, 7:49pm

Why is this feature limited to organisations?

I maintain a set of related packages with a common prefix and might want to use this feature for the reasons as sketched in the PEP.

ncoghlan · August 20, 2024, 1:40am

Because we don’t want the PyPI administrators to be overwhelmed with reservation requests.

There’s already a vetting process in place as part of registering organisations, so the PEP imposes the requirement that people go through that process first before gaining access to the namespace reservations feature.

As a DNS analogy: uploading a project is akin to registering a domain name with a registrar, while requesting a namespace reservation is akin to applying to become a registrar.

This sounds like something else that could be tied to the orgs feature: in addition to letting orgs list entire projects and reserved namespaces as official, let them also say “maintainer X represents organisation Y on project Z” to indicate when an org co-maintains a project even if they don’t control it outright.

If you wanted to get fancy, let each PyPI user maintain their own lists of “trusted orgs” and “trusted maintainers”, then display ticks in the web UI after trusted names and question marks after unlisted ones.

That feels more like a PyPI design discussion than a PEP, though (unless there’s a proposal to standardise an API for reporting that trust info)

After thinking about this a bit more, I’m coming around to the point of view that rather than there being a blanket rule, it makes more sense to let the PyPI admins make the final determination on a case-by-case basis. Instead, only the default expectations would be documented:

community orgs would be expected to typically define a public root reservation, potentially with reserved private child namespaces. A genuinely convincing argument that a private root namespace is beneficial to the affected community would be needed for exceptions to be granted (e.g. reducing security risks for potential typosquatting targets in a namespace with few or no other users).
corporate (aka paying) orgs would still need to demonstrate a legitimate claim to their requested reservation, but if the claim is granted, the org would be able to freely choose between “private by default, potentially with public exceptions” and “public by default, potentially with private exceptions”.

Tangent: is public/private the right word pair here? “Private” is being used in a different sense from the way we use it when referring to “private packages” and “private repositories” (where it is more a synonym for “unpublished”). I guess public spaces can exist on private property when talking about land (think shopping centres, entertainment venues, etc), and that aligns with the way “private namespace” is being used here (it’s still public for read access, but write access is privately controlled). There also aren’t any other obvious candidates in English to consider as alternatives.

hauntsaninja · August 20, 2024, 4:32am

open/closed maybe?

BrenBarn · August 20, 2024, 6:19am

This is an interesting proposal but I’m not sure whether it accomplishes what I’d hope for from namespaces.

I don’t really understand the purpose of public namespaces. The PEP says:

The owner of a grant may choose to allow others the ability to release new packages with the associated namespace. Doing so MUST allow uploads for new packages matching the namespace from any user but such releases MUST NOT have the visual indicator.

This seems to say that a public namespace requires that the owner have no control over who uses it. What then is the point of registering such a namespace?

Is the idea that other people using the namespace won’t get the “visual indicator”, but only packages uploaded by the namespace owner will? My feeling is (like @steve.dower said) that that suggests what we really want is some kind of “verified user” thing rather than a namespace. If I trust SomeOrg to upload someorg-package I probably trust them to upload otherpackage as well, whereas I can’t trust someorg-whatever (if it’s a public namespace) since it may have no relationship with SomeOrg. So what I really want to know is “did SomeOrg upload this”, not “does the package name begin with someorg-”.

The PEP also says:

It is possible for the owner of a namespace to both make it public and allow other organizations to use it. In this case, the permitted organizations have no special permissions and are essentially only public.

If the namespace is public, can’t anyone use it anyway? What does it mean to say “the permitted organizations … are essentially only public” (i.e., not that the namespace is public but that the organizations are public).

I agree with comments by some others that the inability to affect existing packages may mean namespacing causes more confusion than it alleviates.

More generally, it took me a while reading this thread and the PEP to realize that this proposal doesn’t really handle what I think of when I think of namespaces. To me a namespace is a space that names live in; what is described here is more like a restriction on names. The difference is that with name spaces, the space name(s) and the “object” (e.g., package) names are distinct, with the latter living inside the former. This is stuff like files in directories, or Python attributes on objects: You can have foo.blah and bar.blah and in both cases the name is blah, and the foo/bar part is not part of that name, but a container for the name.

In this PEP, all names would still be global and unitary; it would just restrict what people can name their packages. That’s not necessarily bad but personally I’m not sure it’s enough of a gain to be worth it rather than waiting for our chance to employ a full-fledged namespace system.^[1]

I tend to think the opposite. I think there are a lot of issues with packaging that won’t be solvable until and unless we take this kind of approach, basically saying “you can keep using the old way, but if you do you won’t get the new features” and thus shave off the rough edges in terms of what workflows are supported. (Ideally we can do that without shaving off too much functionality, just cleaning up the ways of accessing it.) I also think that many users will be perfectly happy to switch from pip install blah to somenewtool install @someorg/blah as long as it still gives them a package that works. As with other packaging issues, there is likely an 80/20 issue where a small number of packages will have a large number of issues, and the important part is to make sure the experience is smooth for the most package users (not necessarily the most packages).

What I mean by this is a system where there is a namespace someorg and in that namespace can live a package; that package may be called mylib and if so you still ask for it with install @someorg/mylib, but what you get is a package called mylib that has some extra metadata that says “this package is part of the @someorg namespace”. That is, the namespace and package name are structurally separated and the namespace is not part of the package name per se. ↩︎

ncoghlan · August 20, 2024, 10:45am

(on the topic of whether there are any potential alternatives to public/private worth considering)

open/restricted could definitely work. I’m not sure it’s really any better than public/private though (that was the problem with all of the potential alternatives that occurred to me: they were at best only arguably as good as public/private; they definitely weren’t clearly better).

jezdez · August 20, 2024, 8:41pm

To be honest, given that this easily straddles the PSF’s efforts around PyPI (e.g. the PyPI organization feature), I’d appreciate it if we’d do this by the book for fear of tainting the draft with concerns of conflict of interest, or at worst setting a precedence.

ofek · August 20, 2024, 9:48pm

By “by the book” do you mean find a different person to be the delegate?

jezdez · August 20, 2024, 10:18pm

Yep, that’s what I meant.

ofek · August 20, 2024, 10:22pm

Okay! I am looking for a new PEP Delegate, if anyone is interested

I’m not sure if this helps assuage your concern but before opening the PEP I shared what was happening amongst many folks and actually had a meeting with Deb and Loren from the PSF the week before opening the PR.

Basically I’m saying I tried to make sure this was broadly useful to the community, folks weren’t in the dark about the proposal, and that organizations/billing/donations were already set up.

jezdez · August 20, 2024, 10:36pm

Not as such, since this is the first time I hear about it.

trishankatdatadog · August 21, 2024, 1:16pm

Thanks for the PEP, @ofek: namespaces are, indeed, a needed idea!

A few comments FWIW:

I still don’t see the point of child grants. They seem to complicate things for I’m not sure what benefit. Would you mind enumerating a few use cases? For example, is it to allow an existing, popular package to continue to exist under a now private namespace?
How would a private namespace interact exactly with existing packages? For example, is it using child grants?
I wonder if there are security implications we hadn’t thought about yet, particularly due to maintaining backwards compatibility with flat namespaces. For example, how should package managers handle mixing packages under namespaces vs not? (One way appears to be through parsing repository metadata.) Also, what happens to distributions that were previously not under any namespace? To me at least, an explicitly demarcated namespace (ala NPM) is better than trying to infer from additional metadata.

Edit: I just read Donald’s replies to why an NPM-style scope wouldn’t work for PyPI, and they make sense to me. In that sense, I agree that backwards-compatibility is a hard requirement here. Still, I am slightly concerned about security implications we may have missed with flat namespaces.

ofek · August 21, 2024, 8:56pm

Thanks for all the feedback, everyone! Since last time, I landed a PR which added more examples of projects (wow how did I forget Django and pytest) and another PR which did the following:

Changed most uses of the word “package” to “project”.
Explicitly state that PEP 708 does not offer protection against these types of dependency confusion attacks.
Changed the semantics to be based on the project rather than individual artifacts. It appears it’s more useful to consider the current namespace permissions rather than if a version was part of the namespace at the time of release.
As a consequence of 2, removed recommended fields in the Simple API.
Added optional top-level owner and namespace keys to the JSON API.
Added NPM-style scoping explicitly as a rejected idea and enumerate all of the rationale.
Rather than giving permission to users, made it clear that authorization for uploads is about identical project and grant owners.
Made it clear that private root grants may create public child grants and the reverse is only possible when the owner also happens to own every project that matches the namespace.

Please let me know if I’ve forgotten anything so far!

I didn’t do this for now to reduce complexity as there is nothing stopping us from doing this in the future.

Alyssa Coghlan:

dstufft:

I wouldn’t put any stipulations on whether a community project’s namespace is public or private. I’d just allow that up to the organization (whether they’re a community org or a corporate org) to make that decision for themselves.

I took that part of the PEP as assuming that community organisations would be low overhead to set up with minimal oversight, so allowing them to reserve private namespaces would be problematic from the point of view of potential malicious disruption to the smooth operation of the package ecosystem.

If a community organisation did want to make their root namespace private with specifically carved out public child namespaces then they’d need to apply to the PSF for reclassification as a corporate organisation (potentially requesting a waiver of any associated fee, but otherwise providing all the same contact information and assurances as any other registered corporate organisation). That doesn’t seem like an unreasonable barrier to me - if you’re reserving parts of the Python packaging ecosystem namespace, the PSF really should know exactly who they’re dealing with.

I share this sentiment and chose to keep the public enforcement for now.

Alyssa Coghlan:

Slightly tweaking @dstufft’s suggested field names:

    "namespace": {
        "name": "foo",
        "namespace-owner": "foo org", # Replaces `owner` field
        "project-owner": "foo org",   # New field
        "approved-project": true      # Replaces `owned-by` field
        "public": false
    }

Instead, I simplified it to the following:

    "owner": "project-owner", # useful outside of namespace stuff
    "namespace": {
        "name": "foo",
        "owners": ["org1", "org2"], # replaces approved-project
        "public": false
    }

Please correct me if I’m wrong but I think for matching projects this captures every case:

The project is “official”. This means the project owner is one of the namespace owners and public can be true or false.
The project is “unofficial”. This means the project owner is not one of the namespace owners and public is true.
The project existed before the namespace grant. This means the project owner is not one of the namespace owners and public is false.

I might be misinterpreting what you’re saying but in case you didn’t notice there is an option to make any namespace grant public so new Jupyter extensions would not need to pick different prefixes than now. In fact, grants for non-corporate organizations are required to be public in part to reduce such churn as you mention (private might be allowed eventually but for now I don’t see a compelling reason).

Was that clear before?

I changed it to per-package repository i.e. PyPI versus some other index. I was trying to prevent only the latter. How would you recommend that I make this more clear?

No, because grant exceptions for non-corporate organizations are currently always public.

I think globs are too complex for what is required so for now I haven’t added this feature.

Yes, this was my rationale. It’s far too impactful to let everyone carve out a slice of a flat namespace and we cannot get rid of the flat namespace.

I’m open to a different name for this feature but currently I think most people would understand namespace in the context of a package repository to mean special prefixes.

Google reserves a top-level namespace google. They have a large number of organizations, one of which is GCP. Only teams within that organization should be allowed to publish projects that match the namespace google-cloud.

There is no relation; existing projects remain untouched.

barry · August 21, 2024, 10:45pm

This is closer to the meaning of namespaces in Python’s import system, although “namespace package” is a very specific way of composing import system namespaces.

But then, we already multiply overload the term “package” anyway. Context is everything^[1]

I lament that too, and have for as long as I can remember. I do think that largely that ship has sailed, but also wonder if there isn’t some other way to create a database that helps navigate that mapping. Of course, it isn’t even 1:1 today since PyPI packages can export any import path they want, even if it collides with other PyPI packages.

Of the bikeshed colors presented so far, I like this one the best. Let’s keep brainstorming!

I was tempted to say @contextlib.contextmanager is everything ↩︎

woodruffw · August 22, 2024, 12:40am

Thanks for opening this PEP @ofek! I’m broadly in favor of package namespacing, and I think you’ve made a very judicious case for preferring a backwards-compatible approach rather than an “NPM style” approach of adding new domain separators/delimiters.

Apart from the PEP’s technical fundamentals, I have some policy/procedural questions about how you and others see the namespace set being maintained:

Given that there are already packages with disjoint owners forming an informal namespace, is there an expectation that PyPI will retroactively enforce the namespace? In other words, if foo is a product from FooCorp and there’s a large foo-* ecosystem of packages, will FooCorp be allowed to register foo- or will they be expected to pick a new name?
How will PyPI handle forward-looking namespace requests? Imagine for example that bar-* is a cluster of lightly maintained packages related to an older OSS project, and BarCorp (not related to the original project) comes along and wants to take over the bar- namespace. Does BarCorp get the name in that case?

(I think these two cases are very similar, but different in a few important ways: in the first FooCorp has some prior claim to the foo- prefix, while in the latter BarCorp has a brand-name interest but no prior claim.)

Separately, on a technical level:

How does this play with PyPI’s existing normalization behavior? A lot of users probably aren’t aware that foo.bar and foo-bar are the same package name, and users of the former may be surprised to discover that new (to them) semantics are assigned to a part of their package name.
For “open” namespaces: do we give people a way to opt out of their package being visually included in a namespace? A scenario where I can imagine this mattering is something like community/project/org disputes: a project might want to preserve the name foo-frobulator to minimize disruption to users but not be visually grouped with the foo org.

Thanks again for opening this PEP, and apologies if my questions overlap in part or in whole with earlier responses – I did a scan over the earlier comments, but I heartily apologize if I missed similar questions.

woodruffw · August 22, 2024, 12:49am

And related to my last comment: have you considered an approach with fixed prefixes?

In other words, PyPI could declare two (or more) global, top level prefixes:

corp- (or com-) for paying/corporate PyPI organization owners
org- for non-profit/community organizations

Because these top-level prefixes would be unique and novel, the procedural challenges about handling pre-existing project names become moot: a new namespace will always be globally unique by construction, rather than by convention/enforcement.

By way of example: if FooCorp wants a namespace, it could be given corp-foo-, under which it could define child grants per the current semantics in the PEP.

I’m curious what you and others think of this idea – I think it has advantages in terms of enabling uniqueness out-of-the-box and giving PyPI/other repositories a way to “type” namespaces, but I can also see (strong!) arguments that it’s too long/verbose and also potentially is confusable with Java-style reverse domains.

Edit: I just realized this idea doesn’t work very cleanly, since projects can already register stuff under the org- etc. prefixes. So PyPI would need to assign new prefixes that are already globally unique, and there probably aren’t many of those left that are also short and descriptive. That, or consider the sunk cost of org and corp (or similar) small relative to an unbounded number of top-level prefixes.

ncoghlan · August 22, 2024, 1:49am

Given that the PSF would be shouldering the eventual administrative burden here, maybe we should be asking Ee to be PEP delegate as the PSF’s infrastructure lead?

Considering how this version would look in the metadata API, I’m genuinely liking the notion of “restricted” namespaces vs private ones.

Rather than “public: false”, the metadata for restricted namespaces would say “restricted: true” to indicate that publishing controls were in place for that namespace.

Similarly, open namespaces would say “restricted: false” rather than “public: true” to indicate that there is an organisation setting expectations for that prefix, but abiding by those guidelines is voluntary rather than enforced.

The three per-project states would still be as @ofek suggested:

project org is a namespace org: approved publisher
project org is not a namespace org (or project has no org), namespace is open: unofficial project
project org is not a namespace org (or project has no org), namespace is restricted: approved project (most likely due to existing prior to the namespace grant, but also covering situations like formerly official projects being retired and turned over to community maintenance without forcing a disruptive name change)

On “owner” as a field name: maybe we should use more neutral field names like “org” at the project level and “orgs” at the namespace level? It may not seem like a big deal, but open source communities often have a fractious relationship with authority and there are plenty of levels of stewardship that would justify an org being given the authority to curate a namespace or endorse projects where calling it “ownership” would be too strong a term. By keeping the technical terminology as purely descriptive as we can, it may help to pre-emptively smooth feathers that may otherwise get ruffled. (I’m OK with namespaces themselves having owners, just as projects have owners: admin authority in PyPI genuinely does belong to specific people and organisations, no matter how any given wider community works. This concern is just about the way the API describes the relationships between projects, namespaces and organisations in general, which may be looser than PyPI’s access controls)

Edit: a pathological case with this structure did occur to me, where a namespace grant has an absurd number of orgs listed. So maybe the namespace level API field should just be “authorized_org: true” or “authorized_org: false”? We don’t actually care about the full list of namespace orgs at this point in the API, just whether the project org is one of them. There could be a separate namespace API to query the full list of approved organisations.

dstufft · August 22, 2024, 1:59am

I think it’s somewhat of a semantics thing. You could argue that foo-* is a “space that names live in” as well (and remember, that maps to foo.* as well). We sit in an awkward place because technically PyPI has a flat namespace

Nuget calls their feature “reserved prefixes”, which may be a better name than “namespaces”?

On the other hand, calling npm style namespaces “namespaces” may also be confusing, because namespace packages in the import system just look like foo.bar, which would map to PyPI as foo.bar (normalized to foo-bar), and it doesn’t really have a concept quite like npm style scopes.

So maybe neither thing should really be called namespaces And it should just be reserved prefixes vs package scopes. Either way, naming is almost certainly the very definition of bikeshedding so I don’t really care too much what we call it.

One thing that recently occurred to me is the two options aren’t technically mutually exclusive, although having both definitely increases the chance for confusion.

In other words, there’s no technical reason why eventually we couldn’t allow something like @foo/bar, where @foo is some org or user ^[1], but that within the “unscoped” set of names, that we still support nuget style prefix reservation.

There’s still other problems to solve with that (@foo/bar being similar to foo-bar, dealing with how to map @foo/bar to the import system, etc), but choosing one doesn’t technically preclude the other one ^[2].

Brendan Barnwell:

I tend to think the opposite. I think there are a lot of issues with packaging that won’t be solvable until and unless we take this kind of approach, basically saying “you can keep using the old way, but if you do you won’t get the new features” and thus shave off the rough edges in terms of what workflows are supported. (Ideally we can do that without shaving off too much functionality, just cleaning up the ways of accessing it.) I also think that many users will be perfectly happy to switch from pip install blah to somenewtool install @someorg/blah as long as it still gives them a package that works. As with other packaging issues, there is likely an 80/20 issue where a small number of packages will have a large number of issues, and the important part is to make sure the experience is smooth for the most package users (not necessarily the most packages).

Part of the reason why I think the way I do, is that we’ve previously had attempts to basically do that bifurcated ecosystem to solve issues… and almost universally people just ignored the new way because there was a huge chicken and egg problem. Nobody wanted to be the first package that jumped to the new way of doing things and break all of the users who were still using the old way.

I mean one of the biggest examples of this is Python 2 vs Python 3, the original expectation people had was that people would just migrate code to Python 3 and leave Python 2 behind, but very few people were willing to do that. Eventually the strategy that worked and that was realistic was one where people could straddle the line and support both the old and the new way while the ecosystem slowly migrated to the new way.

So from my experience, both with Python 2 → 3 and with varying attempts at break-the-world-and-fix-all-the-packaging-problems endeavors in the past, is that you’re signing yourself up for either nobody to use your new thing, or a decade+ of painful transition, so you better be really sure it’s worth it and you better have a thorough migration plan in place.

For example, I cannot imagine a single one of the named orgs (Datadog, Microsoft, Google, AWS) or one of the example open source projects (Django, Mkdocs, Jupyter) being willing to use the new namespacing feature if it meant that only people using the very latest and greatest pip (or even worse, some other non-pip tool) could install their thing ^[3].

My understanding is existing packages just get allowed to exist, they do not get a child grant (child grants would allow them to create new packages under their existing name).

Essentially the restriction applied by a private namespace is only applied when a new project is created, it does not apply at any other time.

TBH I’m not entirely sold that package managers need to do anything. The way I view this feature is largely around applying restrictions in PyPI as to who can register a name, rather than being something that package managers will use much (if at all).

Maybe as a visual indicator in search results?

I’ll be honest, I’ve had nothing to do with the organization code in PyPI, so I’m not exactly sure what the differences are between company. From looking at the admin it appears it’s just a flag on the account (and presumably payment information on file).

My expectation is the answer will be “PyPI Admins will use their judgement”. The PEP allows this to happen, but existing packages using a name like foo-bar will continue to be allowed to use their existing names (but not register new names).

I believe they’re only part of the namespace if they’re owned by the owner of the namespace.

Other than the issues around users and orgs having separate namespaces on PyPI so you need some kind of discriminator or we need to come up with a strategy to flatten those two namespaces together into one combined one. ↩︎
If we could wave a wand and either have prefix reservation or package scopes from the start, then there probably wouldn’t be a huge motivation to add the other-- but if we had package scopes from the start we probably wouldn’t have unscoped packages to begin with. Of course, having two things that can sort of accomplish similiar (but not quite the same) goals likely does add to the confusion, so it would be preferable to have only one, but we can only exist in the world as it is and try to move forward! ↩︎
That’s not saying that other tools don’t matter, just that pip is obviously the 800lb gorilla and if you can’t support pip and use some feature, then the feature might as well not exist for most packages distributed on PyPI. ↩︎

dstufft · August 22, 2024, 2:00am

No problem, I’m fine having someone else do it.

I’ll post in the PyPI Admin back channel to see if any of them want to do it, otherwise I’m sure we can scrounge someone else up.

ncoghlan · August 22, 2024, 2:13am

One idea that did occur to me is that @foo/bar could still normalise to foo-bar for file naming and metadata generation purposes, but with the convention that the package import name would be “import bar” rather than “import foo.bar” or “import foo_bar”.

So it would just be a way of conveying information about the distribution package to import package name mapping rather than about breaking up the flat namespace (beyond the level of separation that PEP 752 grants)