PEP 752: Implicit namespaces for package repositories

A lot of the tone of the discussion here feels pretty grim, TBH. I think that’s because there are no safety nets in the proposal[1], so everyone is concerned about worst-case situations in order to avoid them before it’s too late…

(There’s also a certain level of distrust of companies paying for privilege, which may be unwarranted, but again it’s extremely hard to find any way of getting assurances about that.)


  1. The main safety net is the vetting that would be done by PyPA admins. But recent history has given us strong evidence that the PyPA admins are so overloaded that it’s unreasonable to assume they have the bandwidth to handle that responsibility the way the community wants. ↩︎


At the top level, nothing significant.

Within a restricted grant, they offer a way to carve out a prefix for plugins and third party solutions.

Thanks, that makes sense. But coming back to my point above, I think this is better represented as a setting for grants rather than a whole separate type. I.e. once you get a top-level prefix, you can make a sub-prefix like foo-contrib-, then you can set that to allow either anyone to upload, or a limited set of people, but different than the parent foo- prefix.

I think the reason ‘open namespaces’ was a separate grant type in the first place is so that ‘restricted’ namespaces could be exclusive to paying customers. I think that’s a mistake, as discussed on the PEP 755 thread, and this is why I’m pushing back on them being a separate type.
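To make the "setting rather than a separate type" idea concrete, here is a minimal sketch. Everything in it is hypothetical (the `Grant` class, its field names, and the `may_upload` rule are not from either PEP): each prefix carries an open/restricted flag, and the longest matching prefix governs an upload.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Grant:
    """Hypothetical model of a prefix grant; not taken from PEP 752 or 755."""
    prefix: str                       # e.g. "foo-"
    owner: str                        # account holding the grant
    is_open: bool = False             # True: anyone may upload under the prefix
    uploaders: set[str] = field(default_factory=set)  # extra allowed accounts


def governing_grant(name: str, grants: list[Grant]) -> Optional[Grant]:
    """Return the grant with the longest prefix matching `name`, if any."""
    matches = [g for g in grants if name.startswith(g.prefix)]
    return max(matches, key=lambda g: len(g.prefix), default=None)


def may_upload(user: str, name: str, grants: list[Grant]) -> bool:
    """Assumed rule: the most specific grant decides who may upload."""
    grant = governing_grant(name, grants)
    if grant is None or grant.is_open:
        return True
    return user == grant.owner or user in grant.uploaders


# A restricted parent prefix with an open sub-prefix, as described above.
grants = [
    Grant(prefix="foo-", owner="foo-org"),
    Grant(prefix="foo-contrib-", owner="foo-org", is_open=True),
]
```

With this shape, "open" is just one value of a per-prefix setting. The sketch ignores name normalization and pre-existing projects, which a real index would have to handle.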


Many projects [2] support a model where some packages are officially maintained and third-party developers are encouraged to participate by creating their own. For example, Datadog offers observability as a service for organizations at any scale. The Datadog Agent ships out-of-the-box with official integrations for many products, like various databases and web servers, which are distributed as Python packages that are prefixed by datadog-. There is support for creating third-party integrations which customers may run.

That’s a lot of links to Datadog ; )


Among the motivations are prevention against typosquatting and dependency confusion. The assumption is that if a package on PyPI has the name prefix bigcorp- and I know that BigCorp – an organization that I trust – owns that prefix then I can trust this package. But as far as I understand, a pre-existing package bigcorp-trojan would be left untouched and allowed to keep existing under that name although it has no connection with BigCorp. Isn’t the proposal completely missing the goal it is trying to achieve? Should PyPI uploads get frozen temporarily to prevent a surge of name squatting from happening now?

Also in general I am not super convinced that prefixes help prevent typosquatting and dependency confusion. Is there data on this? How is it less likely that I accidentally install bigcrop-thing rather than bigcorp-thign?


I’d expect it to work that way at the implementation level, but at a policy level I’d expect the PyPI admins to want to be able to lock a top level grant as “must be an open grant”.

It’s a genuinely weaker form of grant, but one that still allows creation of restricted subgrants without going back to the admins to request permission for each subgrant that the organisation decides to define.

The PyPI admins’ ability to remove projects and packages that aren’t published in good faith isn’t going anywhere.

For the rest, the pre-existent project handling is there to cover packages that are being published in good faith. Each org considering seeking a grant would have to choose between a broader grant with exceptions, or a narrower grant with genuinely exclusive control. The PEP is designed to ensure that the structure of the grant is communicated clearly and unambiguously rather than specifying either of those choices as the universally correct answer for risk reduction. (Being exposed only to “these established projects may be compromised by malicious actors” is a reduced level of threat relative to being exposed to that risk and the risk of malicious actors arbitrarily publishing new packages that share a prefix with your official ones.)

Aha, are you saying that e.g. jupyter- would have to be open, but having registered it, the Jupyter project could then create a child prefix like jupyter-official- and make that restricted? I had assumed that an open root grant could only have open children, but neither PEP actually says this (though they don’t explicitly say the converse either).

If that is the case, I guess it makes ‘open namespaces’ not entirely pointless. But it seems like a weird, confusing twist - companies can restrict e.g. amazon- but community orgs have to request one level without restriction (jupyter-) and then make up a second level to restrict (jupyter-official-). :confused:

What’s the scenario we’re trying to protect against here? If PyPI admins don’t trust the requester, they can reject the prefix application entirely. There presumably needs to be some procedure for taking back a prefix that was granted but is now being used maliciously, or granted to a startup that went bust without ever publishing a package. Why would they want to approve a grant but lock it as open?

Whatever it’s trying to guard against, it leaves a distinctly sour taste that the PEP seems to assume community organisations are likely to be a problem, but not paying customers.


My point is that a reserved prefix is still no guarantee that everything in this prefix can be trusted as handled by the owners of the prefix. So the value of the main argument is greatly diluted in my opinion. One would still need to go check on PyPI for a visual indicator that every single project is truly handled by the trustworthy owner of the prefix.

Maybe what needs to happen is that in my installer (pip for example) or in my private PyPI proxy I can curate an allow-list of prefixes I trust, so that the installer and/or the proxy can be left in charge of verifying via the JSON API that only the packages that truly belong to the owners of the prefixes in my allow-list are downloaded and not the packages that just happen to have had the prefix since before the prefix ownership was granted.
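A rough sketch of what such an allow-list check could look like in an installer or proxy. The metadata shape is an assumption: the `namespace` mapping with `prefix` and `authorized` keys stands in for whatever the index would expose per project, and `TRUSTED_PREFIXES` and `is_allowed` are made-up names. The sketch takes the metadata as an argument, so it makes no network calls.

```python
from typing import Optional

# Hypothetical allow-list curated by the user or the proxy operator.
TRUSTED_PREFIXES = ("bigcorp-",)


def is_allowed(name: str, namespace: Optional[dict]) -> bool:
    """Decide whether a project passes the allow-list check.

    `namespace` stands in for per-project metadata served by the index;
    the keys "prefix" and "authorized" are assumptions for illustration.
    """
    trusted = [p for p in TRUSTED_PREFIXES if name.startswith(p)]
    if not trusted:
        # Names outside the allow-listed prefixes are left to other policy.
        return True
    if namespace is None:
        # Looks like a trusted prefix, but the index reports no grant.
        return False
    # Pre-existing projects that merely share the prefix would be
    # reported as not authorized, and are rejected here.
    return namespace.get("prefix") in trusted and namespace.get("authorized") is True
```

Under these assumptions a grandfathered bigcorp-trojan would be refused without anyone having to check the project page by eye.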


Both of those are probably equally likely to be typed, but in practice I would expect the first one to be easier to notice after it’s been typed.

In many use cases I would expect lists of dependencies to be sorted such that packages from the same namespace are right next to each other. This tends to help highlight differences near the beginning of the identifier.

But more importantly, the thing to keep in mind is that a prefix reservation protects you from typos in the part of the package name after the prefix, and this gets more valuable the longer and more incomprehensible the package name is.

# requirements.txt
bigcorp-argon2id-hash-provider
bigcrop-bcrypt-hash-provider
bigcorp-pbdkf2-hash-provider
bigcorp-scrypt-hash-provider

It’s fairly easy to see the typo in bigcrop-bcrypt-hash-provider in part because it stands out against the wall of bigcorp-* packages. But I’d probably never notice the typo in bigcorp-pbdkf2-hash-provider if I hadn’t put it there on purpose. If BigCorp has bigcorp-* reserved, that package name just doesn’t exist on the index.[1]


  1. Setting aside the whole “grandfathered package name” issue or the potential for supply-side typos. ↩︎
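The same point as a toy simulation. It assumes a made-up index state in which bigcorp- is reserved and BigCorp has published exactly the four correctly spelled packages, and it ignores grandfathered names as the footnote does. `classify` is a hypothetical helper, not any real index API.

```python
RESERVED_PREFIX = "bigcorp-"

# Hypothetical set of packages published by the prefix owner.
OFFICIAL = {
    "bigcorp-argon2id-hash-provider",
    "bigcorp-bcrypt-hash-provider",
    "bigcorp-pbkdf2-hash-provider",
    "bigcorp-scrypt-hash-provider",
}


def classify(name: str) -> str:
    """Classify a requested name against the simulated index."""
    if name in OFFICIAL:
        return "official"
    if name.startswith(RESERVED_PREFIX):
        # Nobody but the owner can register this name, so the typo
        # fails loudly instead of installing someone else's code.
        return "not found"
    # Outside the reserved prefix, anyone could have registered the name.
    return "squattable"


# The requirements file from the post, typos included.
requirements = [
    "bigcorp-argon2id-hash-provider",
    "bigcrop-bcrypt-hash-provider",
    "bigcorp-pbdkf2-hash-provider",
    "bigcorp-scrypt-hash-provider",
]
report = {name: classify(name) for name in requirements}
```

The typo after the prefix ends up as "not found", while the typo in the prefix itself is the only one left exposed.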


Continuing from @jamestwebber’s post in another thread:

AFAICS, a big problem here is that the third-party repo is not default. If an employee installs vanilla pip, or naïvely tries out a new packaging tool, it won’t have the private server set up, and will install from PyPI – where a malicious user could have uploaded a package with the same name as the intended internal one.
And PEP 752 solves this! If the org controls the namespace, malicious users now can’t shadow secret internal packages. Instead of a quiet install of untrusted code, you get a fixable “d’oh 404”.
(Also: for this use case, pre-existing packages on PyPI can simply stay there.)

I wouldn’t be opposed to giving namespace grants primarily to organizations that (intend to) run an internal index server.


Hm, I would say the opposite–you should only get a namespace grant if you’re going to run a public index server that serves up all of those packages. An org can always run a private server, mirror the parts of PyPI that they need/have vetted, and name things however they like with no limit[1]. If they want to claim real estate on PyPI they should be providing something to the community, IMO.


  1. and to mitigate the security risk you outline: disallow installs from PyPI ↩︎


This is correct and something I will talk about but the bigger issue is that it’s unrealistic to expect every project like Django and Jupyter to maintain their own infra.


I’m still very against string prefixes and placing corporations into a tier above volunteers, especially when that will increase the load on PyPI admins/support. Full namespacing by user available to everyone would not have this issue, however, short of that

(copied from the related 755 discussion)

This sort of syntax would seem to be possible to support today, with no changes to existing or future packages (other than installers and other tools that interact with user provided package info), no extra administrative overhead, and it should prevent the class of issues that this claims to help avoid.


It’s easy to set up global config for pip to default to a different index server. Although I concede this is not a solution in all cases, I’m still not entirely clear what cases this proposal is aiming to address, and so I don’t know if configuring pip is a reasonable alternative.

Having said that, I agree that expecting open source projects to set up their own server is an unwarranted cost to impose. But for commercial organisations (ones willing to pay for a closed org namespace) it is IMO a reasonable option to suggest. The fact that organisations don’t do this already suggests that there are other factors in play, although the cynic in me tends to think “organisations doing nothing and hoping open source will fix their problems for free” is the root cause here…

(with regard to owner::distribution syntax for verified owners)

The problem is that people will expect to be able to put this in dependency specifiers, and to make that possible requires a spec change, i.e. a PEP. That’s not an objection to the idea (which seems plausible to me given my continuing confusion over what problem we’re trying to solve here), but it does move it from the position of “something we could do today” to one of “a potential alternative PEP”.


I would assume corporations would be paying a fee high enough that the PSF would be able to hire enough people and then some to handle any additional load.


Feel free to read that without the especially part then. I don’t like the way this treats open source volunteers as if their packages are less important, or that the consequences of a typo squat [1] for them would be better for the ecosystem. The tiering of corporations above the volunteer ecosystem alone leaves a bad taste.


  1. Not that I actually think this PEP is the best we can do for that either. ↩︎


I think this is a quite radical interpretation of what I’ve written. Can you please be more explicit about what you find troublesome?

  • The closed prefix grants are exclusive to corporations that pay. This is structured in a way that requires review, due to the impact closing off entire prefixes can have.
  • There’s a presumption attached that there’s not a good reason for the community to use existing naming conventions when they build code that interacts with a commercial product even when the intention is not to typo squat, but to extend and improve developer experience.
  • Various other solutions are available which do not have these problems, but it appears that the goal here considered corporation use first.
  • Most of the examples are from a single company. Someone from another company already pointed out that just being able to specify the username that provides the package to pip would be sufficient.
  • There are examples of how this is disruptive to how the community already names things, and because those packages already exist, this provides no guarantee to users, so it’s unclear that this even solves the problem.

I will probably alter or remove that part, but that is a policy issue which is now in PEP 755 so let’s discuss over there.

I am under no illusion: there may well be cases where communities want to share the same prefix, which is why I added the concept of open namespaces. However, when the owners intend a namespace not to be open, it’s actually a trade-off of user security versus adherence to branding.

This part confuses me because non-paid organizations can claim prefixes in the same way. Are you talking about the open/restrictive concept as in your first point? If so, that is PEP 755.

That’s just not true. In the next draft I will move my employer’s bullet point to the appendix but that is one out of ~10 examples of projects that I use.

I think perhaps you’re talking about Steve’s recommendation of something like microsoft::pkg_name? If so, this is basically the same concept as NPM scoping as I put in the rejected ideas section.

I find it better to think about this as reducing the impact of malicious actors rather than eliminating it entirely. In the situation of a restricted namespace, having future unauthorized uploads prevented coupled with judicious use of PEP 541 requests when malicious uses are found reduces impact so much that the risk to a user becomes negligible.


This is not the same as npm scoping. This doesn’t get rid of the flat namespace, it’s an optional way to say “I want this package from this user”, and errors if that package is not provided by that user. It doesn’t require npm namespacing to function, but does prevent the issue you are claiming to want to prevent.
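A minimal sketch of that behaviour, assuming the `owner::distribution` spelling mentioned earlier in the thread. That syntax is not valid in dependency specifiers today, and `INDEX_OWNERS`, `parse_spec`, `resolve`, and `OwnerMismatch` are all hypothetical names standing in for what an installer and index would actually do.

```python
from typing import Optional

# Hypothetical index state: project name -> owning account.
# Project names stay in one flat namespace.
INDEX_OWNERS = {"pkg_name": "microsoft", "other_pkg": "someone-else"}


class OwnerMismatch(Exception):
    """Raised when a project exists but is not provided by the requested owner."""


def parse_spec(spec: str) -> tuple[Optional[str], str]:
    """Split an optional 'owner::' qualifier off a project name."""
    owner, sep, name = spec.partition("::")
    if not sep:
        return None, spec  # plain name, no owner constraint
    return owner, name


def resolve(spec: str) -> str:
    """Return the project name to install, enforcing the owner if one is given."""
    owner, name = parse_spec(spec)
    actual = INDEX_OWNERS.get(name)
    if actual is None:
        raise LookupError(f"no such project: {name}")
    if owner is not None and owner != actual:
        raise OwnerMismatch(f"{name} is provided by {actual}, not {owner}")
    return name
```

Here `resolve("pkg_name")` and `resolve("microsoft::pkg_name")` both resolve to the same flat name, while `resolve("microsoft::other_pkg")` raises instead of installing a look-alike.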

I don’t think this is a valuable tradeoff as presented when compared with various alternatives. You are essentially saying that even though Python got to where it is from a strong community that was free to name things in a way that made sense, we suddenly have to compromise on that. There are better ways forward than treating the volunteer community as a security risk.


That sounds like the same thing to me (and NPM also kept the flat namespace). Can you please help me understand how this is different?

I understand you’re trying to look out for the community but this is not an accurate statement. The security risk is not volunteers but rather whoever is not authorized for a namespace, which in the vast majority of cases includes every other organization and company.