PEP 752: Package repository namespaces

yoavdw · August 22, 2024, 3:24am

Thinking about this more, and please excuse me if this is impossible on PyPI’s side, but what if we had the following system for migration to NPM-style namespaces:

The namespacing will be NPM-like - foo/bar.
Any package that uses the new namespace feature will be required to select a single server-side alias for their package, defaulting to foo-bar for foo/bar, but allowing a different alias if that happens to be taken or if a change is to be avoided. This alias will be displayed on PyPI.
This means users that haven’t updated their pip and don’t care about the namespaces feature wouldn’t need to change anything. They could still download any package using the old syntax, and if an existing package is moved by an org to the new namespacing feature, they can choose the old name to not disrupt existing users.
Users that did update their pip can be sure that when they download a package starting with foo/, it is an official package by the foo organization, and not an old, unrelated package.
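
To make the alias idea above concrete, here is a minimal sketch of how a scoped name could map to a flat server-side alias. Everything in it (the `Index` class, `register`, `resolve`) is hypothetical and only illustrates the proposal; it does not model ownership checks or anything PyPI actually implements.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Index:
    """Toy in-memory index; every name here is hypothetical."""

    # Maps a scoped name such as "foo/bar" to its flat server-side alias.
    aliases: dict[str, str] = field(default_factory=dict)

    def register(self, scoped: str, alias: Optional[str] = None) -> str:
        """Register a scoped package; the alias defaults to 'foo-bar'."""
        org, sep, project = scoped.partition("/")
        if not (org and sep and project):
            raise ValueError(f"expected 'org/project', got {scoped!r}")
        flat = alias or f"{org}-{project}"
        if flat in self.aliases.values():
            raise ValueError(f"alias {flat!r} is already used by another scoped name")
        self.aliases[scoped] = flat
        return flat

    def resolve(self, requested: str) -> str:
        """Return the flat name to download for either spelling."""
        # Old clients ask for the flat name directly. New clients may use the
        # scoped form, which must map to a registered alias (KeyError if not).
        if "/" in requested:
            return self.aliases[requested]
        return requested
```

With this shape, `resolve("foo/bar")` and `resolve("foo-bar")` both lead to the same flat project, which is what lets old clients keep working unchanged.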

trishankatdatadog · August 22, 2024, 3:30am

In my mind, there then needs to be a way to securely audit that this property has not been even accidentally violated. The best way I can think of right now is to securely delegate namespaces to their respective owners (perhaps using something like PEP 480), and then record these delegations on a tamper-evident log (e.g., Sigstore) so that independent third parties can monitor whether these desired security properties (e.g., existing packages continue as usual, but new packages belong under a namespace) continue to hold true for packages under namespaces.

dstufft · August 22, 2024, 4:04am

I’m going to start with this part of your post first

I think namespaces provide two broad, but related, benefits:

They help protect users from an attacker who is typosquatting and pretending to be someone else by using a well established naming scheme (e.g. aws-*) to trick users into installing a malicious package.
They help protect organizations from reputational risk, from attackers using their name as part of an attack.

These two properties are basically the same thing, but from two sides of the equation. This idea would let users decide (assuming the company opted in themselves) to engage that protection, but it doesn’t offer any options for organizations.

There have also already been typosquatting attacks against the way NPM scoping works.

In that case, the package names in question were @scope/name and they were uploading names like name (dropping the scope). It’s not clear how successful that attack was or would be, but the packages were downloaded around 50 times each over the 2 days they were up.

I believe the way this would have to work at the protocol/standards level is that the flat namespace would essentially be the “real” package name, and the scoped name would be an alias. It’s been a while since I’ve looked at this code, but I’m pretty sure that pip validates that the package name matches the name of the metadata inside of the package. PyPI also cannot modify any contents inside of the package to “fix” this for users.

We would probably need to add this dual concept of two names to the METADATA standard itself so that pip could verify that the file they got was what they expected.

PyPI would then presumably have /simple/foo-bar/ and /simple/@foo/bar/ (or maybe the / would be escaped, IDK)… and I guess would have to rename the files so pip knows which ones are valid? Or would have to include both names in the API response and have people switch from parsing filenames?
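
As a rough illustration of the "two names in METADATA" idea, the sketch below checks a downloaded file's metadata against whichever spelling the user requested. The flat-name normalization follows PEP 503; the `Scoped-Name` field, the `@org/project` syntax, and both helper functions are assumptions made up for this example, not part of any existing standard or of pip.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """Normalize a flat project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def normalize_scoped(name: str) -> str:
    """Normalize '@Org/Project' part by part (hypothetical syntax)."""
    org, sep, project = name.lstrip("@").partition("/")
    if not (org and sep and project):
        raise ValueError(f"expected '@org/project', got {name!r}")
    return f"@{normalize(org)}/{normalize(project)}"


def file_is_expected(requested: str, metadata: dict[str, str]) -> bool:
    """Check that a file's metadata matches the name the user asked for.

    `metadata` holds the real 'Name' field plus an optional, hypothetical
    'Scoped-Name' field carrying the alias.
    """
    if requested.startswith("@"):
        scoped = metadata.get("Scoped-Name")
        return scoped is not None and normalize_scoped(scoped) == normalize_scoped(requested)
    return normalize(metadata["Name"]) == normalize(requested)
```

A file that declares only the flat name would fail a scoped request here, which is one way an installer could refuse packages that never opted in to the alias.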

I don’t want to say it’s impossible to make this work, because I don’t think that it is, but I think that there is a lot of subtle nuance there that someone would have to sit down and work through, and I think it would complicate our model by a fair amount.

More fundamentally though (and this may just be a me thing!), I find the concept of packages effectively having two names to be really confusing, and when those two names can be different (e.g. is there anything stopping someone from having @foo/bar aliased to blah-thing?) I think that makes things really hard for users to reason about.

Why? If someone is in a position to violate this they’re in a position to upload arbitrary packages to PyPI anyways. Why is validating that for namespaces special when validating it for names themselves currently isn’t possible? I’d suggest that we’d presumably handle namespaces when we handle it for projects in general.

We also already place other restrictions on uploaded names that people can’t attest that we haven’t violated either (for instance, we block registrations of names for any stdlib module, including new ones).

I’d suggest that a tamper-evident log of changes to PyPI is its own topic/PEP.

yoavdw · August 22, 2024, 4:29am

I’m not sure I understand how this differs from NuGet style namespaces. Because the NuGet style will keep existing packages with the namespace, an existing package called google-something could still be used for hurting Google’s reputation in the future. In NPM style namespaces, users will be more suspicious that all of their google packages start with google/ except one.

Yeah, I assumed this is how it would work. That the canonical name would actually be the flat one, with an option to install using the scoped name.

That’s fair. I’m not one to judge how hard these changes would be.

I think a lot of this will go on how this is presented. I understand that this is not true at all on the backend, but I would want users to think of it not in the sense that the package has 2 names, but just another normalization that will be applied, with the clause that all packages installed using the new prefix are actually verified to be by that organization.

It could get a bit confusing when a user adds a new requirement to a project using the new foo/bar style, and then someone else working on the project clones it and tries to install using an old pip version - which will fail. That seems like it will be the biggest migration pain here, but I think it will usually be solved with a fairly simple google search.

The @foo/bar being aliased to blah-thing should hopefully almost never happen - and the choice to make it configurable is just there in case foo-bar is already taken. I don’t know if anything is stopping an organization from doing this, besides it being really confusing for their users.

Overall I agree with your points that this complicates things a lot. I don’t know the insides of PyPI and the metadata standards, but I do think this has more advantages than disadvantages from the user side, and that it addresses a lot of the issues brought up in this thread - so it might still be worth it.

hugovk · August 22, 2024, 4:30am

Does the PEP-Delegate need to be a core team member?

PEP 1 says a core dev can volunteer:

The final authority for PEP approval is the Steering Council. However, whenever a new PEP is put forward, any core developer who believes they are suitably experienced to make the final decision on that PEP may offer to serve as its PEP-Delegate by notifying the Steering Council of their intent.

But also can be a community member if no core dev volunteers:

If no volunteer steps forward, then the Steering Council will approach core developers (and potentially other Python community members) with relevant expertise, in an attempt to identify a candidate that is willing to serve as PEP-Delegate for that PEP.

pawamoy · August 22, 2024, 10:10am

Thanks for the update @ofek!

Such projects are uniquely vulnerable to dependency confusion attacks.

For example, […]

Although PEP 708 attempts to address this attack vector, it is specifically about the case of multiple repositories being considered during dependency resolution and does not offer any protection to the aforementioned use cases.

I’m probably nit-picking, and I’m not a security expert, so feel free to disregard my comment, but IIUC dependency confusion attacks are exactly and only the kind where the attacker reserves a public name (on PyPI) that matches the name of an internal, private project (not published on PyPI) used by a specific company. This description seems to match the description in the linked article by the way.

To me, the “aforementioned use cases” are not dependency confusion attacks, but rather name-squatting or popularity-squatting attacks (calling them typo-squatting would be a stretch IMO).

So, in short, I would rephrase as:

Such projects are vulnerable to name-squatting attacks. For example […].

Furthermore, internal projects of corporate organizations can be vulnerable to dependency confusion attacks, which reserved prefixes would alleviate too, in a different way than PEP 708. For example, […Corp E has corp-pkg-abc internal package, attacker tries to publish corp-pkg-abc on PyPI, they cannot because the corp or corp-pkg prefixes are reserved by Corp E…].

Namespaces are per-package repository and SHALL NOT be shared between repositories.

Hmmm, maybe I would add a second sentence:

Namespaces are per-package repository and SHALL NOT be shared between repositories. Each repository is responsible for managing namespaces the way they want, and repositories are not expected to replicate namespaces from one repository to another.

minrk · August 22, 2024, 11:51am

Sorry, I had definitely misunderstood the public/private namespace distinction, since the definitions in the PEP are not what I expected those words to mean (the “open/closed” labels some folks have mentioned feel more intuitive to me). I believe I understand now, thank you.

IIUC, really the only thing public namespaces get is a visual indicator on ‘official’ packages (and in metadata), not any real significance to the names or namespaces themselves. In that case, I think this is a small positive impact for projects like Jupyter (visual indicators are nice), but since only private namespaces appear to address typosquatting, dependency confusion, etc., I’m not sure Jupyter adopting a public namespace would have any real impact on users. The main real user pattern I see being affected for public namespaces is pypi.org search (which I definitely use!).

It might be worth addressing this significant difference in Public/Private scope in the Motivation (or Public Namespaces section) because I don’t really see the goals in the motivation as addressed by public namespaces, and I started with the incorrect impression that Jupyter might have the benefits laid out in Motivation (typo squatting, dependency confusion), when it really won’t.

For Public namespaces, I might also suggest adding to User Interface a visual indicator for a package that is in a namespace but not official. It’s hard to make a decision on the absence of information in UI, so if someone visited a typo-squatted package within a Public namespace, there is no indication that it’s not official, only an absence of official indicator, which I suspect most folks wouldn’t know to look for.

dstufft · August 22, 2024, 1:38pm

@dustin has graciously agreed to be the PEP Delegate, I’ll throw an email to the SC to make sure they sign off on it, but I don’t foresee any concerns.

trishankatdatadog · August 22, 2024, 1:43pm

Right: to be clear, I didn’t mean that auditability and monitoring should be requirements for this PEP (for the reasons you mentioned above), but they should at least be discussed in Security Implications IMHO.

trishankatdatadog · August 22, 2024, 1:51pm

You are right. To be super clear, maybe a good clarification would be something like:

“Many orgs are vulnerable to name-squatting attacks which can ultimately result in dependency confusion.”

jaraco · August 22, 2024, 3:02pm

I’ve been working on just such a database. In coherent-oss/coherent.build#3, in order to infer dependencies from imports, I’ve created a MongoDB database. The draft implementation is in the import-detection branch.

In the pypi module, there’s a function distribution_for, which takes an import like jaraco.compat.py38.r_fix and resolves it to the jaraco.compat package in PyPI using a MongoDB database that’s world-readable. You can grab that connection string and paste it into MongoDB Compass and explore it.
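
For readers who want a feel for the lookup without connecting to the database, here is a minimal sketch of resolving a dotted import to a distribution by trying progressively shorter prefixes. It is not the coherent.build implementation: the in-memory `ROOTS` mapping stands in for the MongoDB collection, and `distribution_for_import` is a hypothetical name chosen to avoid confusion with the real `distribution_for`.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical sample data: importable "root" package -> distribution name.
ROOTS = {
    "jaraco.compat": "jaraco.compat",
    "jaraco.functools": "jaraco.functools",
    "yaml": "PyYAML",
}


def distribution_for_import(
    import_name: str, roots: dict[str, str] = ROOTS
) -> Optional[str]:
    """Resolve an import to a distribution using the longest known prefix."""
    parts = import_name.split(".")
    # Try "a.b.c.d", then "a.b.c", then "a.b", and so on.
    for end in range(len(parts), 0, -1):
        candidate = ".".join(parts[:end])
        if candidate in roots:
            return roots[candidate]
    return None
```

Under these assumptions, `distribution_for_import("jaraco.compat.py38.r_fix")` finds the `jaraco.compat` root after discarding the trailing `py38.r_fix` components.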

Currently, the database is only populated with the 8000 most popular packages as published by hugovk (of which 7130 had resolvable “root” packages), plus a few less popular ones used by coherent.build.

Feel free to explore that and let me know if you find anything surprising.

barry · August 22, 2024, 4:35pm

Sounds good to me, thanks @dustin! Please do submit a PR to switch that over and you can ping me as PEP Sponsor^[1] for review.

^[1] and SC member; you don’t really need to bother the whole SC

ofek · August 22, 2024, 4:49pm

I have to address other feedback so I can include that change if it’s alright with you!

dustin · August 22, 2024, 5:33pm

Please do!

bavalpey · August 22, 2024, 7:39pm

While I like the spirit of the idea, I am thoroughly against it because of the ramifications it would have. It upsets existing standards and retrofitting a policy like this onto an existing packaging system has consequences. It also unfairly favors established corporations and projects, increasing the barriers for new projects/corporations to gain traction. It should be fairly evident why this is undesirable.

The guidelines outlined are highly subjective and apt to be inconsistently applied

The namespace MUST NOT be something common like tool or apps.

Who is going to define “common”? There should be explicit criteria for what constitutes “common” that isn’t subjective. This also doesn’t play nicely with existing names. Your opening example goes against this stipulation: “types-” Well, “types-” is pretty common, and probably shouldn’t be reserved to an organization. Is prefixing your word with “py” enough to make it no longer common? “test” is a really common word, but “pytest” should be a namespace?

The namespace SHOULD properly and clearly identify the reservation owner.

Ideally, this means that all namespaces should be prefixed with an identifier for the organization. There should be clearer guidelines. Also, this point should be a MUST, not a SHOULD.

The organization SHOULD be actively using the namespace.

What does “actively using” mean? Does it mean that the packages are actively maintained? Does it mean they are being updated frequently?

There SHOULD be evidence that not reserving the namespace may cause ambiguity, confusion, or other harm to the community.

The proposal makes this self-evident. Not reserving the namespace will open a package up to dependency-confusion attacks. Unless you’re looking for the evidence to be an existing example of a malicious package using dependency confusion? That sort of defeats the purpose, as namespace reservation should be a preventative measure, not a reactionary one.

Organizations that are not corporate organizations MUST represent one of the following:

Large, popular open-source projects with many packages

How are you going to define large? A user threshold? How will this work for keeping projects active? I don’t see a fair way of implementing this. This whole proposal just seems to needlessly favor established projects, and as a consequence will make it harder for projects to become established! It’s a catch-22.

NPOs/NGOs that actively publish packages like Our World in Data

Universities that actively publish packages

Government organizations that actively publish packages

NPOs/NGOs that actively publish packages like Our World in Data

Again, what does “actively publish packages” mean?

Other concerns

I agree with @steve.dower.

The list of maintainers is the critical indicator. I’d prefer to see “verified” ticks on those before any namespace registration feature at all.

But disagree with this:

I think any registration should come with a decent sized (thinking five figures annually) bill paid towards PyPI support and maintenance. If you want to claim a chunk of the namespace for yourself, you’d better really want it.

While this helps avoid abuse through blanket reservations, it is a disservice to upcoming open source projects that don’t have corporate backing. I think what will end up happening is that you will have an open source project start off, and then they get to a size where it’s popular enough they can reserve a namespace (a community namespace?), but at that point will have to change their name so they can reserve a namespace that isn’t already taken.
Also, with the subjective nature for permitting namespace reservations for corporate organizations, it is possible for such organizations to stifle competition by proactively reserving a namespace that is up and coming, doing the bare minimum to kill off a project that has already gained traction. (In part, this is why it is critical for requirement 3 to be a MUST)

Critiques

The title of the PEP is misleading. “Package Repository Namespaces” does not really describe what the PEP is suggesting. Based on name alone, I would expect the PEP to define what a “package repository namespace” is and how it should work (and nothing else). This PEP is not doing that. The PEP is about adding a way to reserve a package prefix. It does not define what a package repository namespace is anywhere within. It should be renamed to “Reserved Namespaces in Package Repositories”.

Also, I don’t really see the advantage offered by public namespaces. What is their use case? What exactly does a public namespace do? If I understand correctly, a public namespace does not prohibit new packages from outside the organization in its namespace.
If this is indeed the case, how does this fit in with any of the motivation? This doesn’t really do anything to stop “typo-squatting” or “dependency confusion”, unless there will be manual monitoring of packages for a public namespace.

dstufft · August 22, 2024, 8:57pm

I don’t understand what “existing standards” you’re talking about here?

To my eye, it complements the existing standards, because it’s pretty prototypical for organizations which wish to produce a large amount of packages to prefix their name with one or more shared strings, typically some identifier for their company or product. This takes that de facto standard and enshrines it as an available de jure standard.

PyPI generally functions by trusting the volunteers running it to make good decisions in ambiguous situations.

Have we gotten every one of those decisions correct in the 20ish years of PyPI? Of course not.

However, I think this model has worked well for us thus far, and I don’t see any reason why namespace/prefix reservations as described in this PEP would need to be special cased.

There are similar guidelines in trademark law. The idea is that broadly speaking, registering a namespace/prefix is something that is, at least in part, being done for the benefit of the overall consumer, so that they minimize the chances that someone is going to trick them by purporting to be someone else.

So this phrase is, IMO, largely about “would registering this prefix make things more or less confusing for the end user”.

I don’t see how this would make it harder for projects to become established. Every large established project has managed to do it so far without having their namespace reserved, and I don’t see how this would change the ability for someone to do that.

“Verified Ticks” are almost certainly never going to happen.

For one, people tend to get really confused by simple indicators like “Verified”, assuming that they imply some level of “the organization that hands these check marks out has vetted this person is trustworthy”, which PyPI has no way of actually doing.

For two, even if we did do that, that would actually be of net harm to new users and projects, since the only possible way we could have to verify someone is if they were a known entity in the community already.

This feature focuses on asserting who is allowed to own certain names on PyPI, it does not make any claims about whether the owner of those names are trustworthy or good people or anything of the sort. It provides no additional guarantees over what registering a name itself does, except that it applies to multiple names rather than a single name.

Namespaces are expected to have a human in the loop approval process (at least for the “root” claims). PyPI is largely volunteer run, and until just recently did not have a dedicated support person so support was entirely volunteer based.

This limitation helps ensure that this feature rolls out in a sustainable way. By tying namespaces to real life money, we essentially have those companies pay for the extra overhead they’re causing by having a human review their namespace grant, along with extra so that we can use that money to help fund the cost of having a human review namespace grants for OSS projects.

I don’t have any opinions on what the exact price should be, but if a corporation wants to claim a large swathe of our namespace, they should have to contribute to the funding of PyPI.

However, the PEP allows PyPI to grant namespaces for free to open source community projects so that they don’t require a financial backer. However, it recognizes that we simply don’t have the bandwidth (at least right now) to human review every random OSS project that might exist, but also that smaller projects likely aren’t going to be at risk of someone using their name to try and trick people.

Important to note here as well, is it’s a lot easier to get less restrictive over who we grant namespaces to than it is to get stricter. I suspect there will also be an initial wave of requests and anything we can do to restrict those to the names where people are going to be most helped by is a good thing.

If it works out well, and the load isn’t too bad, then opening it up more might be possible.

It serves two purposes:

It allows a project to not block people from registering names, but still provide an indicator whether a project within that namespace comes from the owner of that namespace or not (think third party vs first party addons).
It allows a project to have a namespace, but then open up a portion of that namespace to others.

ketozhang · August 22, 2024, 10:05pm

I share the sentiments here that dislike going away from “import name ≈ install name”. If you want them to be equal, then you must have a very long import name^[1]. I also see that migration is still necessary for both distributors and installers when you aren’t already following prefixes^[2].

Could the prefix (or some marker) not be part of the distribution name? Such that @foo/bar is the same distribution/project as bar^[3]. The marker serves as an “ownership constraint” rather than a distribution identifier during installs.

^[1] Org names can be very long!
^[2] Are distribution mirrors/aliases easy to do? Doesn’t it need two pyproject.toml files?
^[3] There is likely going to be redundancy in the namespace & prefix like @foo/foobar and @google/google-cloud-storage

bavalpey · August 22, 2024, 10:53pm

To my eye, it complements the existing standards, because it’s pretty prototypical…

You’re conflating convention with standards. I’m not trying to be pedantic here. I’m simply saying that this changes what is currently valid. Before this proposal, I would be able to submit any package called “foo-bar” as long as that package didn’t exist. With this, that now changes. You have to make sure the namespace isn’t reserved. The “standards” allow anyone to upload a python package to pypi, as long as that name itself isn’t taken (though no guarantees are made as to whether the package is allowed to stay).

PyPI generally functions by trusting the volunteers running it to make good decisions in ambiguous situations.

When volunteers are used to moderate or act as gatekeepers, there really should be as few ambiguities as possible. This ensures a fair and consistent process, and ensures that the guidelines are not selectively enforced, but rather are
The difference is that in this case, the ramifications harm accessibility and can bias the software ecosystem.

There are similar guidelines in trademark law. The idea is that broadly speaking, registering a namespace/prefix is something that is, at least in part, being done for the benefit of the overall consumer.

I don’t want to get sidetracked with ideological debates here, but the idea that a trademark is done for the benefit of the overall consumer is, in my view, naive/flawed. A trademark might help reduce confusion, but that is definitely not the only motivation for filing it, nor a necessary outcome. Did trademarking the term “threepeat” really benefit consumers? No, of course not. It just meant that people had to pay royalties to use the catchy term (e.g., news agencies couldn’t use the term on air without paying royalties).

However, the PEP allows PyPI to grant namespaces for free to open source community projects

Where in the PEP is this written? If it’s just implied, then that’s not enough. The PEP is too vague with regards to grants.

One thing that is left out completely from the PEP is the downstream implications this would have on package installers. PEP 708 did a good job here. It made it clear how an installer would use the information.
This PEP does not do that.

Important to note here as well, is it’s a lot easier to get less restrictive over who we grant namespaces to than it is to get stricter. I suspect there will also be an initial wave of requests and anything we can do to restrict those to the names where people are going to be most helped by is a good thing.

I like this sentiment. But there should probably be some research on who is going to be most helped by this before the PEP is accepted. I don’t think it’s appropriate to make assumptions as to who is and is not going to be helped.

“Verified Ticks” are almost certainly never going to happen.

Slight miscommunication here. By “verified” ticks, I did not mean verifying users / organizations. I meant the visual indicators that are referenced in the PEP. As in, something that tells users that, for instance, “google-cloud-compute” is definitely part of the official “google-cloud” ecosystem and not a third party package that extends functionality for “google-cloud”

I may be misunderstanding the PEP. Would this PEP prevent any package from being uploaded that began with namespace-?
Perhaps what I am suggesting by “verified ticks” would be everything in this proposal, except only “public” namespaces.

This feature focuses on asserting who is allowed to own certain names on PyPI, it does not make any claims about whether the owner of those names are trustworthy or good people or anything of the sort. It provides no additional guarantees over what registering a name itself does, except that it applies to multiple names rather than a single name.

Sure, I get that. But if adopted, it’s not hard to see what the immediate downstream effects are going to be. E.g, the third comment in this discussion:

Package metadata itself within the artifacts is unchanged but I am proposing we add it explicitly to the APIs so that consumers may do fancy stuff like extra security protocols.

Yeah, exactly. So this “fancy stuff” will start to bias packages towards those who have reserved namespaces. i.e., those projects that have already been established and are known to be trusted.
It’s not hard to imagine a world where an organization chooses to only allow packages from trusted namespaces (which, by the way, will have to be verified somehow!). This means that owners of trusted private namespaces can easily start to dominate and stifle competition. Let’s say an up and coming startup is working on a package that improves some mechanism. Some company B that already has a namespace notices this, decides that they don’t want the competition, and so pushes forth a competing product.

Since the namespace is already trusted, they now have an immediate advantage. There is no red tape for the package from company B to be used, while there would be for the package from company A.

As long as you are maintaining some sort of metadata in the package index that will have to be audited and trusted by pypi, then you can’t separate out the unfair favoring this will have on already-established organizations.

dstufft · August 23, 2024, 1:43am

So what you’ve described isn’t even the current “standard”, because we already disallow registration of names that are otherwise available, for reasons like:

The desired name is “too similar” to another package (e.g. look → 1ook).
The desired name is on the list of prohibited package names (~55k names).
The desired name matches the name of a standard library module.

Most changes to the packaging ecosystem change something about what is currently valid (either to add or remove something), as that’s the entire point of making a change?

Hard disagree.

It’s nearly impossible to preemptively determine ahead of time every single situation that might occur. When you attempt to overly constrain the rules that the people who run this have to follow, you make them fundamentally less equipped to deal with the ambiguity that is going to exist regardless of how much effort we put into it.

For instance, what if we draw the line at X number of users for an OSS project, but there is a project with 90% of the requisite users for which we know that people are actively trying to upload names that a namespace would protect?

What if two projects both meet the criteria and want to try and claim the same namespace?

The amount of power a PyPI admin currently has is already huge ^[1]. Attempting to shackle them in this one specific instance is honestly just pointless IMO. Even if we attempt to spell out every scenario, ambiguous situations are going to come up and the PyPI admins are going to have to use their best judgement.

I mean, whether that guideline is being upheld or not or whether that’s the reason why any individual trademark was filed (or attempted to be filed) doesn’t really have much to do with my statement, which was that trademark law has a similar guideline, which it does:

(1) Any person who, on or in connection with any goods or services, or any container for goods, uses in commerce any word, term, name, symbol, or device, or any combination thereof, or any false designation of origin, false or misleading description of fact, or false or misleading representation of fact, which—

You can argue about whether or not they are following that guideline, but it’s explicitly a part of the law that defines trademarks. Not that it matters much, since it was just an example of another situation where they had a similar guideline.

It appears it is not explicitly spelled out in the PEP.

The PEP depends on the fact that corporate orgs cost money and OSS orgs do not, which is an existing feature on PyPI, and only references money to indicate why corporate orgs have less restrictive guidelines than OSS orgs do.

The PEP could spell it out better, rather than depending on readers to know that on PyPI org accounts are paid for corps and free for oss.

I suspect most clients aren’t going to do much with this information tbh, but I could be wrong.

In my mind the main benefit here is the restriction on upload, indicators are a secondary benefit and clients don’t really have search anymore to my knowledge.

If you own a namespace, and you have it set private, users will be prevented from uploading new packages that use the prefix namespace-.

If the namespace is public, users are not restricted.

In either case, there is an indicator to show whether a package is part of a namespace, and if so which namespace and which org owns that namespace.
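
The private/public behavior described above can be sketched in a few lines. The `Namespace` record, the helper names, and the choice to let the longest matching prefix win are all assumptions for illustration; the PEP and PyPI may define matching differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional, Sequence


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class Namespace:
    prefix: str    # "foo" covers project names starting with "foo-"
    owner: str     # organization holding the reservation
    private: bool  # True blocks new uploads from anyone but the owner


def matching_namespace(
    project: str, namespaces: Sequence[Namespace]
) -> Optional[Namespace]:
    """Return the namespace covering this project, preferring the longest prefix."""
    name = normalize(project)
    matches = [ns for ns in namespaces if name.startswith(normalize(ns.prefix) + "-")]
    return max(matches, key=lambda ns: len(ns.prefix), default=None)


def may_create(
    project: str, uploader_org: Optional[str], namespaces: Sequence[Namespace]
) -> bool:
    """Only a private namespace restricts who may register a new name."""
    ns = matching_namespace(project, namespaces)
    if ns is None or not ns.private:
        return True
    return uploader_org == ns.owner


def namespace_info(
    project: str, namespaces: Sequence[Namespace]
) -> Optional[tuple[str, str]]:
    """Data behind the indicator: (prefix, owning org), or None if not in a namespace."""
    ns = matching_namespace(project, namespaces)
    return None if ns is None else (ns.prefix, ns.owner)
```

Note that in this sketch `foobar` is not covered by a `foo` reservation, because only names starting with `foo-` match.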

It is actually hard for me to imagine this, since the same could have been said for the other numerous cases where people could opt in to some stricter level of security, and they basically never do because as long as one package in your dependency chain doesn’t have that, then it’s basically impossible to actually opt in.

I also struggle to imagine the clients actually adding that, but even if they did, I suspect the effect will be to turn it on, see half of your deps aren’t installable, and to turn it back off.

This honestly just feels like a slippery slope argument that isn’t likely to happen?

I’d argue that to the extent this bias exists, it already exists for the existing metadata such as “author”.

I can right now go and take over any package I want, delete any files I want, etc. I can make a pseudo namespace by just going and deleting any names that use the namespace, etc. Would I do those things? Well no, not unless there was a good reason for it (like malware!), but if I can do all of that based on my judgement, it’s weird that I can’t also hand out a namespace based on my judgement. ↩︎

BrenBarn · August 23, 2024, 4:52am

That does seem like a much better name to me. In fact now that you mention it I see that the PEP itself refers to “package name prefixes” right in the first sentence. As you say, it’s just a naming issue, so it’s not super important.

That’s certainly a fair perspective, but we can also note that we already have a bifurcated (or polyfurcated) ecosystem with packaging, because people have created a variety of tools that extend the standards (like Poetry), and even entire alternative ecosystems (like conda), and people still keep doing those kinds of things and attracting users to those alternative tools due to gaps in the functionality of the standards. So there is a chicken and egg problem but some people are willing to be the chicken and then we get some eggs.^[1]

Well, for one thing, I think some of the problems there came from certain choices that (at least in retrospect) were clearly unwise and added needless friction to the process (like removing the u string prefix), as well as from not realizing that people would want something like six that provided a compatibility layer. So to some extent that can be avoided by having a more careful piloting process for identifying good bridges to build across the transition.

But even so, I don’t think what you describe is incompatible with what I’m suggesting. I’m not suggesting that we dump PyPI overnight. But if PyPI were “feature-frozen” while an alternative repository existed that added new features while removing the most painful parts of the old system, then people could (if they wanted to) release to both during a migration period in the way you mention, similar to how packages would have parallel release cycles for Py2 and Py3.^[2]

Of course, the real problem is we already don’t have enough resources to comfortably run one package repository, let alone two. But hey, we can just jack that corporate namespace fee up a little. . .

Anyway, aside from that, one other thing I’ll say about the PEP is that it seems I wasn’t the only one confused by the public-namespaces thing, so it would probably be good to expand that part of the PEP a bit to make it crystal-clear what exactly it means to have a public namespace.

Along the same lines, I’ve been trying to think about the PEP in terms of “what information does it give to who”. As I understand it, if I’m a user thinking of installing a package called foo-stuff, knowing that foo- is a public namespace controlled by The Foo Collective tells me. . . absolutely nothing. That could be a package that predated the namespace, or it could be a package that came after the namespace but was still allowed anyway (since the namespace is public), or it could be an official Foo Collective release, but the namespace itself doesn’t help me distinguish those cases.

And if I know foo- is a private namespace controlled by Foo Corp., then I know. . . still nothing. Because foo-stuff could still be a package released before the namespace grant.

Now, it seems like the idea is that something else (besides the namespace itself) would give me this information? Like if I look at the PyPI page for a package, I’ll see some kind of indicator that tells me “this package is released by the namespace owner”? Or, in the private case, if I know the namespace grant date and the package’s original release date, and the latter is after the former, then I know it’s an official Foo Corp. package? The thing is I’m not sure how useful the namespace per se is here, because in my experience a lot of people don’t find packages by going to the PyPI page; they google (“google-search”?) and find something on StackOverflow or some other third-party page that says “foo-stuff is a good package for all your needs” and then they just go to install it. So for that common case, there won’t be any meaningful risk reduction for the user in terms of avoiding name-squatting or the like.
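As a rough sketch of the date-based reasoning in the private case, assuming the grant date and the package's first release date were both available to the user (nothing above confirms they would be), the inference might look like this. `Grant`, `Provenance` and `infer_provenance` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Provenance(Enum):
    OFFICIAL = "released by the namespace owner"
    UNKNOWN = "cannot tell from the namespace alone"


@dataclass(frozen=True)
class Grant:
    prefix: str        # hypothetical example: "foo-"
    public: bool       # True for a public namespace
    granted_on: date   # assumed to be visible to the user


def infer_provenance(grant: Grant, first_release: date) -> Provenance:
    # Assumes the package name already matches grant.prefix.
    if grant.public:
        # Anyone may still upload under a public prefix, so dates say nothing.
        return Provenance.UNKNOWN
    if first_release > grant.granted_on:
        # Private prefix and first released after the grant: only the owner
        # could have created this name.
        return Provenance.OFFICIAL
    # The package may predate the grant and belong to someone else.
    return Provenance.UNKNOWN
```

Only one of the three branches yields a definite answer, and it needs two pieces of information beyond the namespace itself.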

I just had another thought, though, which is: even if this feature turns out to be not that useful to anyone except big corporations, for whom having “official control” over the namespace is relevant, it could still be a win, since it could provide a revenue stream for PyPI that would be paid by those who can most afford it and are probably free-riding the most right now. So maybe I’m in support of this PEP after all. . .

I have, as usual, been thinking about conda in connection with this PEP. Conda has the concept of channels, and packages can be installed from different channels. It’s possible for users to make their own channels, and in theory this could have been used to namespace packages. In practice it hasn’t turned out to be a major thing for individual orgs or authors to release their own packages on their own channels. But it has resulted in the growth of separate channels for collections of packages that are to some degree curated (most notably conda-forge, but also for instance bioconda). This suggests (again) to me that often times, on the user end, people wouldn’t feel they needed individually-namespaced packages if there is one big namespace that has everything you want and is curated to exclude stuff you don’t want. So like it doesn’t really matter if you have to install something called google-official-razzmatazz or just razzmatazz, so long as you know that someone is vetting packages to make sure no one can sneakily put in something that’s called razzmatazz that isn’t the official one everyone is expecting. Anyway I put this into a footnote to avoid cluttering the main post, but my point is that some of these “alternative chickens” are actually showing how things can work even in a more evolutionary way, not necessarily some kind of precisely master-planned system but just a new system that explicitly addresses some shortcomings of the old. ↩︎
And, again, we already sort of have this with conda, and that alternative ecosystem has a number of major advantages. But the continual incremental changes to the PyPI-standard system discourage a shift because it seems to be “getting better”, even though there is no long-range expectation that certain base problems with PyPI or packaging standards will ever be fixed. ↩︎