PyPI package name restricted namespace proposal

I know that PEP 708 is trying to prevent dependency confusion attacks, but I have a simpler solution that may work in concert with it: reserve a package name prefix that will never be allowed on PyPI.

A solution would be a policy to disallow packages on PyPI starting with a particular prefix; for this example let’s call it nopypi-. Once such a policy is enacted, users of private repos can name a package nopypi-foobaz, knowing that their teammates can run $PKGMGR install nopypi-foobaz and under no circumstances will it install a package from PyPI matching that name, even if they didn’t configure their package manager correctly.

There seem to be too many ways that packaging can be misconfigured, so this line of defense would stop PyPI from serving package names that would never appear on PyPI legitimately.

The prefix should be something more friendly like internal- or private-, but effectively it means that a package with that name cannot be indexed on PyPI.

-ken


This would work nicely (in conjunction with PEP 708, as you say).

A lot of practical reasons exist to need the PEP’s changes, but also having a way forward that provides a “never on PyPI” scope means that those of us building private packages can use it to avoid conflicts at all.

It also helps us avoid accidentally publishing to PyPI (though so does the Private :: Do Not Upload classifier), but more importantly prevents someone else from publishing without the classifier.


I wasn’t aware of that one. Where is it documented? (And it should probably be documented/publicised better :slightly_smiling_face:)


It’s documented at Classifiers · PyPI and Packaging and distributing projects — Python Packaging User Guide

The Private :: Do Not Upload classifier kind of happened accidentally.

Its use came about accidentally: PyPI does not allow unknown classifiers in uploads, so if your project includes an unknown classifier, PyPI rejects the upload. Someone at some point realized that meant you could prevent private projects from being accidentally uploaded to PyPI by adding an unknown classifier. It’s always been a bit of a weird way of implementing it [1], but it works!

At some point PyPI documented explicitly on the classifiers page that any classifier that began with Private :: would never be accepted on PyPI.
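
To make the mechanism concrete, here is a minimal sketch of that kind of upload-time check. It is not PyPI's actual code: `KNOWN_CLASSIFIERS`, `rejected_classifiers`, and `upload_allowed` are hypothetical names, and the known-classifier set is a tiny stand-in for the real list.

```python
# Hypothetical sketch of upload-time classifier validation (not Warehouse's real code).
# The set below is a small stand-in for the full list of known classifiers.
KNOWN_CLASSIFIERS = {
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
}

# Any classifier starting with this prefix is never accepted.
PRIVATE_PREFIX = "Private ::"


def rejected_classifiers(classifiers):
    """Return the classifiers that would cause an upload to be refused."""
    return [
        c
        for c in classifiers
        if c.startswith(PRIVATE_PREFIX) or c not in KNOWN_CLASSIFIERS
    ]


def upload_allowed(classifiers):
    """An upload is allowed only if no classifier is rejected."""
    return not rejected_classifiers(classifiers)
```

A project that lists `Private :: Do Not Upload` fails this check, which is the same effect the classifier relies on.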


  1. Conceptually cleaner is probably an upload client feature that allows you to explicitly define which repositories it is OK to release your project for, but that’s harder to ensure will happen vs a classifier like this. ↩︎


And Private :: was given some extra protection so it can never be created on PyPI:


Based on the restricted trove classifier I’ve checked the index for usage of the prefix private.

There are 11 packages using private-.*, 15 using private[^-].*, and a package called private.
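
For anyone who wants to repeat the check, this is roughly how the names can be bucketed. It is a sketch only: the `names` list is a made-up sample, not real index data, and a real run would iterate over the full list of project names.

```python
import re

# Made-up sample names for illustration; not taken from the real index.
names = ["private", "private-foo", "privatefoo", "requests", "notpublic-demo"]

# The three buckets described above, matched against the whole name.
patterns = {
    "private-.*": re.compile(r"private-.*"),
    "private[^-].*": re.compile(r"private[^-].*"),
    "private": re.compile(r"private"),
}


def count_matches(names, patterns):
    """Count how many names fully match each pattern."""
    counts = {label: 0 for label in patterns}
    for name in names:
        for label, pattern in patterns.items():
            if pattern.fullmatch(name):
                counts[label] += 1
    return counts


counts = count_matches(names, patterns)
```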

If we didn’t want to boot them (or just grandfather them in?), an alternative would be notpublic and/or not-public which have zero packages using them today.

The key here is simplicity and reducing the likelihood of typos. For example, both prvt-.* and pvt-.* are unused on PyPI, but there are just too many ways people typo abbreviations like that and don’t notice that they’ve done so.

Could we just use x-? The convention exists in a few other spaces (HTTP headers, CSS), and it’s really hard to mistype.


There are 60 projects on PyPI starting with “x-” (or “x_” or “X-”). So we’d have the same problem of what to do with them. And “private-” is more obvious, IMO, if we do have to deal with existing projects.
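
The variants collapse together because project names are compared in normalized form. A minimal sketch, using the normalization rule from PEP 503; the `has_reserved_prefix` helper and its default prefix are assumptions for illustration:

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def has_reserved_prefix(name, prefix="x-"):
    """Hypothetical helper: check the prefix against the normalized name."""
    return normalize(name).startswith(prefix)
```

So a prefix check has to run on the normalized name, otherwise `x_foo` or `X-foo` would slip past a rule written for `x-`.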

You could allow a leading underscore or dash in the project name for building when the “do not upload” classifier is set (or alternatively some new property in the project metadata). Such a name is not a valid PEP 508 identifier today, so it would never clash with a valid PyPI project name.
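
As a quick illustration of why such names cannot collide: the name pattern given in PEP 508 requires a letter or digit at both ends. The function name below is made up for the example.

```python
import re

# Name pattern from PEP 508: must start and end with a letter or digit.
NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def is_valid_project_name(name):
    """Return True if the name is valid under the PEP 508 pattern."""
    return NAME_RE.match(name) is not None
```

Under this pattern `foobaz` is valid, while `_foobaz` and `-foobaz` are not.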

Just my engineer brain, this is how I would go about things:

  1. Get buy-in on the concept.
  2. Write the code to prevent new packages being created with a configured list of prefixes.
  3. Lock down x-.*, private.*, and not-public-.* for all new packages going forward.
  4. Discuss with existing x-.* and private-.* package owners whether they would agree to move to a different package name, maybe giving them an extended grace period if they have lots of users.
  5. After the grace period is up, prevent updates to those existing package names (for abandoned packages), or remove them entirely (for packages republished under a different name).
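
Steps 2 through 5 could look roughly like this. It is a sketch under assumptions, not a Warehouse patch: `RESERVED_PREFIXES`, `GRANDFATHERED`, and both function names are hypothetical, and the grandfathered names are placeholders.

```python
import re

# Hypothetical configuration: prefixes from step 3, placeholder grandfathered names.
RESERVED_PREFIXES = ("x-", "private", "not-public-")
GRANDFATHERED = {"private", "x-example"}


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def can_create_project(name):
    """Steps 2-3: refuse any new project whose name has a reserved prefix."""
    return not normalize(name).startswith(RESERVED_PREFIXES)


def can_upload_release(name, grace_period_over=False):
    """Steps 4-5: existing grandfathered projects may update until the grace period ends."""
    normalized = normalize(name)
    if not normalized.startswith(RESERVED_PREFIXES):
        return True
    return normalized in GRANDFATHERED and not grace_period_over
```

Keeping the prefix list in configuration means more prefixes can be reserved later without a code change.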

Presumably this means “published to PyPI”. The whole idea would be to create these packages and publish them to other indexes (and on a side note, I’d really appreciate any proposal here to explicitly say that anyone-who-isn’t-PyPI should not block these names, otherwise some will block them for “compatibility” :grimacing: ).

Chances are it wouldn’t be terrible to have a handful of existing projects grandfathered into the namespace. And once we lock it down, we can clean up anything malicious that’s been dumped in there since we started talking about it.

(I’m +0.5 on just grabbing both/all three namespaces while we’re at it. Seems likely to be no real harm, and technically we’ve already said that x-.* packages ought to be related to the x package, which probably means we won’t cause too much upset by blocking them.)


Agreed, it will totally defeat the purpose if private repos start using this package name filter.

Thanks. I’m kinda new here, so can you let me know how I can help get this idea rolling? This certainly needs to be formalized and documented, but it doesn’t seem like it needs to be a CPython PEP. I’m happy to create a warehouse PR (for both the filter implementation and the documentation) and let the feedback there show me the next steps.

PyPI is always a bit of a grey area in terms of PEPs.

For something like this, I think a PEP would be the right process to go through. Even though it only relies on changes in PyPI itself, it’s something that we should probably get broad agreement on as a solution that people are generally OK with, and, importantly, relying on it as a security feature is something that likely should get baked into a PEP.

You may want to give it more than a day for people to comment to see if anyone is violently opposed (but I don’t think you have to, waiting is mostly to save you effort).
