PEP 752: Package repository namespaces

Slight error here: this should be “Matches the name of a standard library module.”, since the list was a series of reasons why a package name could be disallowed.

I’m also against inventing new syntax for installers or further muddying the linkage between “what you typed into pip install X” and “what package you are using / need to vet on PyPI”. In my mind, having a prefix is enough? It protects users who can at least type the first word correctly, and prevents typosquats/malicious packages from ever appearing to come from an organization protecting its reputation.

This wasn’t suggesting “globs”, the foo-* syntax was showing a way that the UI (and/or API) could present a “prefix” versus a name that doesn’t allow further suffixing.

I like this naming! Matches the mechanism much closer than “namespaces”.

I’m not 100% convinced on this either, I’m in agreement with @dstufft. My mental model for making “policy” about the origin of packages on PyPI involves pushing “identity” outside of PyPI as much as possible and using PyPI as a convenient place where all the attestations are hosted (with TUF applied so we have integrity of all those attestations and artifacts).

1 Like

Another approach is to add metadata such that projects can declare the package names they provide, but this is getting off-topic.

2 Likes

That’s the problem though - the prefix isn’t actually reserved, because there are existing packages that already have it and aren’t by the organization.

This makes all the benefits of this idea a lot less useful, since the prefix doesn’t actually promise anything about the package.

1 Like

That’s true, but that’s a problem regardless of what we call it.

Certainly the more existing packages within a prefix the less useful that reservation is.

I don’t think that makes it not useful though? There’s metadata and UI indicators to help differentiate between the two cases, and the reservation acts to “stop the bleeding” so no new packages can use that prefix.

Given we’re retrofitting this nearly 20 years later, I don’t think there is a solution without problems. I’m in favor of the reserved prefix idea because I think it gives us something like 80% of the benefit, with pretty minimal problems.

The worst of the problems is that there are possibly existing packages using that prefix not owned by the org. I find that a far simpler problem than all the ones around trying to introduce npm-style scopes at this point, and there are 3 pretty straightforward options:

  • Disallow claiming a namespace that has any prior use outside of that org.
  • Forcibly rename projects using that namespace out of it when it is claimed.
  • Allow the namespace to be registered, blocking new registrations but not existing.

The first of those is technically possible, but I think it’s a hard sell, particularly around the fringes (e.g. what if an org has 500 packages named foo-bar and there is one other package named foo-frob that isn’t theirs).

The second one I strongly dislike, as it turns a prefix delegation into a contentious issue that is bound to cause hurt feelings and drama every time it happens, and I think it’s also just wrong to do [1].

The third of those is the one the PEP takes, and I think represents a reasonable middle ground. My understanding is that there is no guarantee that any org would get a prefix that they ask for either, so if an org is asking for one that would overlap with a significant number of existing projects, the PyPI team is empowered to decide if adding the reservation will actually help or not.


  1. To some extent, the first option also boils down to this option, except that without support from PyPI for doing it, orgs would be incentivized to engage in legal processes, bullying, paying people money, etc. ↩︎

5 Likes

Thank you Donald for addressing this thoroughly!

I agree with the problems raised with options 1 and 2. Specifically option 2 sounds the worst, and I’m very against it.

Option 1 seems like it would be hardest for large organizations (for example, Google), but could work for smaller ones. I think we would need some real-world data to see how well it works. Specifically, we could have an “expedited” version of PEP 541 name transfers, kind of mixing options 1 and 2, to assist organizations with claiming a prefix while not forcing any rename. Assuming the organization is willing to pay, that route might be viable for a lot of organizations.

Though I’m not sure that even going with option 2 instead of 3 would solve all my issues with this approach. Your options 1 and 2 made me think of reserved prefixes without existing projects polluting them, and exploring it a bit more, I’ve formulated an additional reason why I want the NPM approach.


One of my issues with reserved prefixes, the approach that maximizes backwards compatibility, is exactly that. I think the visual indicator of an @google/cloud-storage is a much, much more obvious one than an indicator on PyPI. I don’t know exactly how this marker would look, but I’m pretty sure that if this PEP went through with reserved prefixes, I would have no idea those are even a thing.

I would probably assume it’s something like this:

Really, I think for most users, this idea is going to only be noticed as that. Verified “ticks” on some projects, regardless of their names.

That’s why I think a new character for scoping is the only thing that will make users aware that namespaces are a thing, which will actually make it work for the intention of this PEP: “signal a verified pattern of ownership”. Otherwise, I think a verified “tick” works just as well as a reserved prefix.

And as a PyPI, NPM, and NuGet user (though admittedly a bit less of NuGet), I never even knew NuGet had namespaces, while it was very obvious to me on NPM.

I like this idea of NPM-style namespaced names as an alias for traditional project names. What are the problems with it?

It seems both this and the OP’s idea (PEP 752) could be introduced, but it would be duplicating functionality.

Names on PyPI must start with a letter, right? That means to guarantee no collisions, the NPM alias idea needs those aliases to start with a symbol.

2 Likes

I’m not a huge fan of the specific UI affordances NuGet uses, but it’s the blue checkmark in search, or the “Prefix Reserved” badge on the details page:

See this post (and to be fair, the rebuttal/response is here).

I think it largely comes down to which trade offs and which downsides we’re willing to live with.

I will say if someone wanted to write out a proposal for npm style scoping, I’m more than happy to read it (not as a PEP-Delegate of course!), but as someone who knows the internals well and can help point out what will work (or not) on a technical level.

2 Likes

I think something like this could work:


Dedicated syntax for reserved prefixes

The syntax for project names is amended to accept @prefix/projectname. In all name fields in interoperability specifications, file names, and URLs, this is normalised to prefix-projectname. A new optional Name-Prefix field is added to the core metadata. If set, this field must match a portion of the name up to, but not including, the hyphen separator between the prefix and the rest of the name.

When rendering distribution names for display to users, if Name-Prefix is set, the name SHOULD be rendered as @prefix/projectname rather than prefix-projectname.

When a package index supports prefix reservation, it MUST restrict use of Name-Prefix to approved publishing organisations for that namespace. The field MUST NOT be used outside the scope of an approved namespace grant.

Note that this restriction is applied at the time of artifact publication. It indicates that at the time the artifact was published, namespace access was authorised, even if that access has subsequently been relinquished or revoked.
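As a rough sketch of the normalisation and matching rules above (the function names and the exact validation pattern are mine, purely for illustration, and not part of any specification):

```python
import re

# Hypothetical helpers illustrating the proposed rules; the names and
# the regex are illustrative only, not part of any specification.
_SCOPED = re.compile(r"@([A-Za-z0-9][A-Za-z0-9._-]*)/([A-Za-z0-9][A-Za-z0-9._-]*)$")

def normalize_scoped_name(name: str) -> str:
    """Normalise @prefix/projectname to the flat prefix-projectname form
    used in interoperability specs, file names, and URLs."""
    m = _SCOPED.match(name)
    if m is None:
        return name  # already a plain project name
    prefix, project = m.groups()
    return f"{prefix}-{project}"

def name_prefix_matches(project_name: str, name_prefix: str) -> bool:
    """Check that a Name-Prefix value matches the portion of the name
    up to, but not including, the hyphen separator."""
    return project_name.startswith(name_prefix + "-")
```

So `@foo/bar` would normalise to `foo-bar`, and a `Name-Prefix: foo` field would be valid for `foo-bar` but not for `foobar`.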


I’m not sure if I think that’s a good idea or not, but it’s likely worth mentioning in this PEP as a potential future way to add explicit prefix separation, even though this PEP leaves the prefix implicit at the project naming level.

I really like that!

Both to make sure I understood correctly, and to make things clearer:

  1. A user installs with the command pip install @foo/bar.
  2. pip actually tries to download foo-bar.
  3. If the package index supports it, it returns a Name-Prefix metadata field if the package is by a verified organization.
  4. If this field is returned, pip validates that it starts with foo.
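The four steps could be sketched like this (purely illustrative; `fetch_metadata` stands in for an index lookup, and none of these names exist in pip):

```python
def install_scoped(requested: str, fetch_metadata) -> str:
    """Illustrative sketch of the four steps above; `fetch_metadata`
    stands in for an index lookup returning a dict of core metadata."""
    if not requested.startswith("@"):
        return requested                      # plain names are unchanged
    prefix, _, rest = requested[1:].partition("/")
    flat_name = f"{prefix}-{rest}"            # step 2: @foo/bar -> foo-bar
    metadata = fetch_metadata(flat_name)      # step 3: index-provided metadata
    declared = metadata.get("Name-Prefix")    # absent for unverified packages
    if declared != prefix:                    # step 4: client-side check
        raise ValueError(f"{flat_name} is not part of the @{prefix} namespace")
    return flat_name
```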

Since the Name-Prefix field is not part of the installation name, but just used on the client side to verify metadata, this means an existing package called foo-bar will prevent an organization from creating a @foo/bar package, but not a @foo prefix.

If this is what you meant, it sounds good to me and addresses my concerns. Just want to clarify you didn’t mean something else.

It solves the issue with my original idea where the customizable alias was confusing. It also lets users know @foo/bar is verified, without disrupting existing packages. That sounds awesome.


While we’re exploring this idea, it’s also worth discussing backwards compatibility and early adoption, as that was a big concern raised with NPM-style namespaces.

On the organization side, there’s no drawback to registering a name prefix, and no backwards compatibility concerns.

For users developing applications, this will probably not be a problem. I assume that within a year or two of releases, a big chunk of teams will be able to easily enforce that all developers and CI/CD tools use a compatible pip version, and will have no fear of adopting it. (In my team, for example, I think everyone currently uses pip>=23.)

For users developing libraries, I fear a dependency or sub-dependency will introduce a @foo/bar dependency, preventing users on older pip versions from using it. I think a reasonable assumption, following Python’s end-of-life policy, is that if a library’s minimum Python version bootstraps a pip version that supports name prefixes, it’s safe to use them. Maybe libraries should be discouraged from using name-prefix installs indefinitely, though I’m not sure about that part.

I think it’s important to acknowledge this would be an additional pain point for Python users for a while, though I don’t think it’s a huge one.

How would pip do this? And what would pip do if it’s not the case? Remember:

  1. Pip doesn’t download the metadata initially, it selects a candidate version based solely on what’s in the simple index API (or for --find-links, what’s in the filename).
  2. Not all versions of a project will necessarily have the same Name-Prefix metadata.
  3. If pip finds the Name-Prefix doesn’t match later in the install process, it may not be able to backtrack to a different version (the interaction between the resolver and the finder is limited).
  4. Determining Name-Prefix for an sdist could require a build step, and doing a build just to reject a candidate is extremely expensive.

Of course, for “pip” here, you can pretty much read “any installer”. I don’t know how uv’s resolver works, but I imagine the same concerns would apply somehow.

The main point here being that this proposal would result in a lot more breakage than the proposal in PEP 752. It’s quite possible that it would be possible to address all of this, but I like PEP 752 precisely because it takes great care to minimise the impact on existing uses and tools.

Maybe, but it’s still breakage that we’d have to deal with (people would raise issues on the pip tracker that we’d need to respond to).

1 Like

Yes to the first two parts. Probably not to the last two parts, as it still wouldn’t be mandatory to set the “Name-Prefix” field in the metadata of specific artifacts even if such an optional field was defined.

Instead, if pip (or another installer) were to perform live validation of an explicit namespace prefix, it would use the PEP 752 API metadata to check for a currently valid namespace grant at the project level rather than checking any specific candidate artifact’s internal metadata.

I do think there’s some potential merit to the concept, but I see it as something that could be built atop PEP 752, rather than as a direct alternative to the basic concept of prefix reservation.

Thanks for the quick answers, @pf_moore and @ncoghlan!

Sorry, it appears I misunderstood and oversimplified how this would work. I understand this may be harder than I initially expected to integrate as part of this PEP.

I started reading pip’s internals guide last night to get a better grasp of how things work, but I will try to read some more about how it actually works and how pip reads metadata to understand how hard this will be.

1 Like

Root grants given to community projects SHALL always be public [i.e. no restrictions on other people uploading packages with that prefix]

I appreciate that it’s good to have incentives for companies to pay for PyPI, and that the admins don’t want to be swamped with requests to reserve prefixes, but developing a significant security feature and then saying that only paying companies can turn this on really feels like the wrong direction.

I understand that you can still see which projects belong to a community organisation associated with a prefix. But I can already see on PyPI that e.g. notebook is owned by the jupyter organisation, so this isn’t a significant change. The added value for namespaces is knowing that any* jupyter- package comes from the Jupyter project.

(* obviously pre-existing packages owned by other people are a wrinkle, but preventing arbitrary new jupyter- packages from random strangers is still a big win)

What are the possible outcomes of this rule?

  1. People learn that corporate prefixes like google- or microsoft- are ‘safe’ but community ones like jupyter- or mkdocs- are not. PyPI gives corporations a significant advantage over community projects (plus potentially academia, government & charities, unless they’re willing to pay up to get ‘corporate’ status).
  2. People see that e.g. jupyter- is a registered prefix, but don’t realise the difference between public & private prefixes, so they pip install jupyter-big-wooden-horse, mistakenly assuming that it comes from Project Jupyter, and bad stuff :tm: ensues.

As this proposal stands, I’d be reluctant to register a public ‘namespace’ at all, because of the risk of people thinking that I’m in control of it. But even if community organisations don’t request a namespace, once this system launches and people learn about it, some will assume that e.g. the jupyter- prefix belongs to the jupyter org. :slightly_frowning_face:

This distinction between ‘corporate organisations’ and everyone else seems like a terrible idea. At a quick glance, it seems like neither NPM nor NuGet has an equivalent distinction. Please reconsider this.

3 Likes

Since last time, I landed a PR yesterday which did the following:

  1. Changed the Delegate from Donald Stufft to Dustin Ingram.
  2. Renamed private/public terminology to private/shared.
  3. Allowed community projects to have private namespaces, though they should usually still be shared-only. The determination is at the discretion of reviewers. This was requested by various projects such as Apache Airflow and OpenTelemetry.
  4. Added a separate visual indicator for projects that are part of a shared namespace and are not owned by a grant holder.
  5. Added a separate visual indicator for projects matching a private namespace that existed before the grant.
  6. Made it explicit that root grants cannot overlap e.g. if there is a grant for foo-bar then one cannot apply for foo.
  7. Adjusted language about the application review prioritization of corporate organizations.
  8. Clarified that it is forbidden to replicate namespace configuration from other repositories.
  9. Clarified the link between name-squatting attacks and dependency confusion.
  10. Required a page in the UI for each namespace with metadata like the current owners.
  11. Rather than encouraging a page that enumerates every namespace, this is now expressly forbidden.
  12. Added a recommendations section that lives outside of the concrete specification.

Again, please let me know if I’ve forgotten anything so far!

An informal namespace is an interesting edge case; do you mean no ecosystem and just people using foo as a convention? I would assume a grant would be given in the case where it’s just a convention rather than community, although I would be interested in concrete examples!

Yes, I would assume so if the other project is lacking maintenance or, far more importantly, users. I can add this to the new recommendations section in the next PR.

This would be something nice to add in the “How to Teach This” section although I don’t think it should be in the context of namespaces in particular but rather as an ecosystem. I’m open to suggestions!

I am against this generally because it has the potential to cause confusion for users, which is more important than the desires of maintainers.

I don’t know about this, I’d have to think more. My intuition is telling me it’s not worth it because publishers would be unlikely to want their packages prefixed in that way.

If projects were only owned by organizations, that would make sense, but the top-level field would also apply to users, so I think it’s better to keep the same name.

I totally missed this in the second round of feedback PR but it will be in the next, thank you!

I’m ambivalent on switching away from the namespace terminology. At least currently I don’t see a significant benefit.

This should not be a MUST because it’s possible for a well-known prefix to be not the name of the company but rather that of a popular product.

Can you please explain a bit more?

If the namespace is private, as I’m sure Google’s would be, then yes.

I originally attached meaning to the state of grants at release time but based on feedback folks seemed to find more value in and be less confused by project-level metadata.

As for the NPM approach, I haven’t yet heard a clear cost-benefit analysis refuting the reasons for rejecting that approach although I’m open to more feedback!

I definitely agree that this proposal does not preclude scoping syntax and in fact would provide an iterative improvement strategy.

This is changed based on feedback from projects desiring a namespace.

There is now no distinction other than that there is a higher bar for accepting applications from community projects, e.g. if you have no users because you just came up with an idea at a hackathon, then it’s extremely unlikely you would be granted a namespace.

4 Likes

I think that terminology is genuinely worse: with “shared” taken to mean “open to the public”, what would you call a private namespace where the owning org has granted other orgs permission to upload new projects under the namespace?

I was actually puzzled about this with the original wording. What should a supporting repository like piwheels.com do with reserved prefixes? I think ignoring their existence is the right thing for piwheels.com to do, but I don’t think the PEP clearly explains why that is the case.

If the wording is supposed to mean “If you care about reserved prefixes on PyPI, you also can’t trust any binary builds that are built by someone other than the PyPI project owner and then published elsewhere”, it doesn’t actually say that.

That seems excessive given the stated motivation (to avoid leaking company confidential information earlier than intended). It also seems genuinely problematic if the only way to find out if a given name is reserved is to try it and see rather than being able to check against a published list of active prefix reservations.

It’s standard for companies seeking formal registration of things like trademarks to weigh up the balance between getting those protections in place as early as possible against the fact that the existence of the application is public information (at least in the jurisdictions I checked: Australia, the US, and the EU). This feels like a similar situation: if you want to take away a piece of the commons by reserving a namespace prefix, you have to be public about the fact that you’re doing it. If you don’t want the information to be public, then you don’t get the protections.

PyPI may allow companies to have pending reservations under review that aren’t public yet but also aren’t immediately enforced when granted; that would be purely between the PSF and the paying orgs (since any such feature wouldn’t have a public API by definition).

If the namespace-level field becomes something like authorized_project (as a boolean) to avoid the pathological “too many orgs to reasonably list” scenario, then the project-level field name wouldn’t matter for the PEP (since API clients would just be checking the boolean value provided by the server rather than doing their own client-side check).

2 Likes

I would view both of the following situations as “sharing” because there is a centralized owner:

  1. An organization shares a grant with another organization in order to allow package uploads.
  2. An organization sets a grant to shared thus allowing anybody to upload packages.

I thought that shared would be good because ultimately a grant has a single owner and is able to share permissions. What would you recommend that also conveys there is one true owner who determines how a grant is used?

True, I will update to explain the rationale. Thanks!

I was also a bit saddened by this because I love discoverability for users but based on talking to certain other folks who maintain a very similar service, this will not work operationally.

That’s a good point, this will be remedied in the next PR. However, outside of this proposal I still think it is useful to have the owner of a project returned by the API.

I think the place where I struggle to see this working is how pip would know whether or not a project is expected to be available via the @prefix/project name (and @pf_moore touched on this as well).

The fact that pip (and other clients) are responsible for normalizing @foo/bar to foo-bar, means that they need to know ahead of time whether foo-not-bar is intended to be part of the namespace. They can’t rely on in-the-package metadata because the “negative” case is the important case here and a bunch of historic packages under foo-bar won’t have that metadata.

I guess we could add a bit of metadata to the /simple/$project/ page, which would tell a client to treat an install for @foo/not-bar as a 404 even if there were results for foo-not-bar?
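For instance, a PEP 691-style JSON response for /simple/foo-not-bar/ might carry an invented "namespace" key telling clients whether the scoped spelling is valid; nothing like this exists in any current spec, it’s purely a sketch:

```python
import json

# Hypothetical extension of a PEP 691-style Simple API response; the
# "namespace" key is invented for illustration and exists in no spec.
response = json.loads("""
{
  "meta": {"api-version": "1.1"},
  "name": "foo-not-bar",
  "namespace": {"prefix": "foo", "authorized": false},
  "files": []
}
""")

ns = response.get("namespace")
# A client resolving "@foo/not-bar" would treat it as a 404 here,
# because the project is not an authorized member of the "foo" namespace.
scoped_name_valid = bool(ns and ns.get("authorized"))
```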

Would we prevent new foo-* projects from being released under this model? IOW would this basically be reserved prefixes with a special syntax or would we still allow people to register arbitrary names, so that the owner of a prefix doesn’t know if @foo/whatever is going to be available to them in some amount of time?

I think that with the metadata added to the API to indicate whether @foo/not-bar should be a valid name for foo-not-bar or not this proposal would be technically feasible.

I don’t think I particularly like it (though I like it better than other attempts at NPM scoping I’ve seen), primarily for two reasons:

  • The mechanism for ensuring that foo-not-bar isn’t available under @foo/not-bar being a flag that basically says “if this condition holds true, disregard the rest of this message body” feels very much like the kind of thing that is easy to mistakenly “lose” somehow. Particularly in cases where names are being shuffled through a variety of different projects, which may or may not end up emitting the normalized name rather than the “scoped” name, I fear that we’ll end up in situations where different combinations of tools can allow arbitrary people to publish to @foo/* unless we treat this as basically PEP 752 plus extra syntax [1].
  • In either case, we effectively still have a flat namespace; it’s just that some packages have an extra name that can be used to refer to them. However, that extra name is basically not supportable by any tool released prior to some future hypothetical date. So to me it feels like a syntax that very few people will actually use, since we’re still effectively in a flat namespace and compatibility for the new syntax would be very low.

I think requiring a build step would be a hard blocker [2], because a large portion of the purpose of the feature is to prevent typo squatting attacks, and if you’re executing the build step of an attacker’s project then the attack was successful.

Important to note, that you already can’t find out if a given name is available for you to use without just trying it.

  • We already don’t have a “real” way to enumerate which names are in use on PyPI [3].
  • Even if a name isn’t registered on PyPI, that doesn’t mean a registration for that name will be successful [4].

So I don’t think this particular thing is adding any new special thing.


  1. And in the case where it’s PEP 752 with extra syntax, I don’t personally find the extra syntax very compelling as a feature on its own. ↩︎

  2. Though avoidable by lifting the metadata up to the API response at the project level. ↩︎

  3. There technically is a way, but it relies on very specific implementation details of how the queries are constructed on PyPI and on differentiating between a 404 and an empty response… and there are no promises that won’t change in the future. The closest to an official way is to iterate over the entire change log from the beginning of time and try to reconstruct the current state of the database from it. ↩︎

  4. The prohibited names table rows are not public, and the “similar name” feature uses predefined rules, but determining if a name would match them requires iterating the entire list of registered names, which isn’t possible. ↩︎