Dedicated prefix for internal, non-public packages?

We’re figuring out how best to install a mixture of public & internal Python packages. The simplest solutions all involve using either --find-links or --extra-index-url to tell pip where to find our internal packages, and letting it fetch the rest from PyPI. However, if someone finds the name of an internal package (which is hardly a secret), they could upload a malicious package of the same name to PyPI, and pip might install that instead of ours. This is not considered a bug in pip (issue #8606).
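For anyone unfamiliar with why this is a problem, here's a deliberately simplified model of what happens with --extra-index-url. It is not pip's code: the Candidate type, the URLs and the version numbers are made up. It only captures the relevant behaviour, which is that pip doesn't rank indexes. Candidates from all of them are pooled and the best version wins.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    version: Tuple[int, ...]  # simplified: numeric release segments only
    index_url: str


def pick_candidate(candidates: List[Candidate]) -> Candidate:
    """Toy model of selection across --index-url and --extra-index-url.

    All indexes are treated as equally trusted, so the highest version
    wins no matter which index served it.
    """
    return max(candidates, key=lambda c: c.version)


internal = [Candidate("foo", (1, 2, 0), "https://pypi.internal.example/simple")]
squatted = [Candidate("foo", (99, 0, 0), "https://pypi.org/simple")]

chosen = pick_candidate(internal + squatted)
print(chosen.index_url)  # https://pypi.org/simple
```

So an attacker only needs to upload foo with a higher version number than ours.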

The best workaround at present appears to be running an index server (with devpi, pypi-server or simpleindex) which can proxy or redirect requests to PyPI, but where internal packages shadow public ones: once I have made an internal foo package, no foo from PyPI will be found. You pass this as --index-url rather than --extra-index-url (though the latter will appear to work 🙁). If you use a lot of packages, though, there’s a risk that one day you’ll want a public package with the same name as an internal one, because they all share one namespace. And the server is extra complexity compared to just putting internal packages in a directory or in, e.g., a GitLab package registry.
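Roughly, the shadowing rule such a server applies looks like this. It's a sketch of the idea, not how devpi, pypi-server or simpleindex are actually implemented or configured; INTERNAL_PROJECTS, project_page_url and the internal URL are hypothetical. Names are normalized as in PEP 503 so that Foo, foo and FOO all hit the internal entry.

```python
import re

PYPI_SIMPLE = "https://pypi.org/simple"

# Hypothetical: names published on the internal index, already normalized.
INTERNAL_PROJECTS = frozenset({"foo", "acme-utils"})


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_page_url(name: str, internal_base: str) -> str:
    """Where a shadowing index would send a request for /simple/<name>/.

    Internal names never fall through to PyPI, so a public project with
    the same name becomes unreachable through this index.
    """
    project = normalize(name)
    if project in INTERNAL_PROJECTS:
        return f"{internal_base}/{project}/"
    return f"{PYPI_SIMPLE}/{project}/"


base = "https://pypi.internal.example/simple"
print(project_page_url("Foo", base))       # https://pypi.internal.example/simple/foo/
print(project_page_url("requests", base))  # https://pypi.org/simple/requests/
```

The same rule that protects us is the one that blocks a legitimate public foo later on.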

Alternatively, you could namesquat your internal packages on PyPI so no-one else can upload them. But it’s easy to forget to do that when you make a new package, and no doubt some organisations are reluctant to publish even package names.

Would it make sense to reserve some prefix for internal packages, so the same names could not be claimed on PyPI? I can see a couple of possible variants of this idea:

  1. Find some currently legal prefix, like internal-, that is not yet used on PyPI (internal- itself already is, so it's out), and make PyPI refuse to accept packages named with that prefix.
  2. Expand the legal characters in package names to allow a new prefix like internal:, without accepting it on PyPI.
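For variant 2, a spec change would be needed because : isn't allowed in project names today. Here's a quick sketch using the name rule from the core metadata specification. EXTENDED_RULE is a hypothetical grammar for illustration, not anything that has been specified.

```python
import re

# Project-name rule from the core metadata specification (case-insensitive).
_NAME = r"(?:[A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])"

CURRENT_RULE = re.compile(_NAME, re.IGNORECASE)
# Hypothetical variant 2: the same rule plus an optional "internal:" prefix.
EXTENDED_RULE = re.compile(r"(?:internal:)?" + _NAME, re.IGNORECASE)


def valid_today(name: str) -> bool:
    return CURRENT_RULE.fullmatch(name) is not None


def valid_extended(name: str) -> bool:
    return EXTENDED_RULE.fullmatch(name) is not None


def pypi_accepts(name: str) -> bool:
    """Under variant 2, PyPI would accept extended names except prefixed ones."""
    return valid_extended(name) and not name.lower().startswith("internal:")


for name in ("foo", "internal:foo"):
    print(name, valid_today(name), valid_extended(name), pypi_accepts(name))
# foo True True True
# internal:foo False True False
```

The upside of variant 2 is that no existing name can collide with the prefix; the downside is that every tool that validates names would need updating.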

I’m suggesting this as a rough and ready way to make the easy options more secure, not a perfect solution. It would treat PyPI specially, as the global default index, rather than trying to describe index preference or trust levels in general.


There is a (long) thread here about namespace support on PyPI, which I think is what you’re looking for. The idea is that company “foo” can reserve the foo prefix, so only it can upload packages like foo.bar to PyPI, but it is free (and safe) to use the name internally.

Thanks! I guess I’m suggesting a simpler alternative to full namespace support which could be put in place relatively quickly: a single, special-cased namespace. It looks like that thread started 2 years ago, and the last message suggests that it would take funding to make it happen, so I think it’s worth looking at what we can do in the meantime.


Having one single prefix that’s explicitly forbidden on PyPI so people can use it for their own indexes sounds like a good idea to me. We can for example block every name matching privatepi-* (and maybe reserve the username privatepi as well) and tell people they should use this prefix for any private packages. This would be entirely compatible with ideas raised in the namespace support thread (if I recall the details right) and is relatively easy to implement for PyPI.
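A registration-time check along these lines is roughly what the suggestion amounts to. This is a sketch, not Warehouse code, and the function names are hypothetical. The one detail that matters is normalizing first, since PyPI treats names that differ only in case or in -, _ and . as the same project.

```python
import fnmatch
import re

# Pattern and username from the suggestion above; the exact list would be PyPI's choice.
BLOCKED_PROJECT_PATTERNS = ("privatepi-*",)
RESERVED_USERNAMES = frozenset({"privatepi"})


def normalize(name: str) -> str:
    """PEP 503 normalization, so PrivatePI_foo and privatepi.foo are caught too."""
    return re.sub(r"[-_.]+", "-", name).lower()


def may_register_project(name: str) -> bool:
    normalized = normalize(name)
    return not any(fnmatch.fnmatchcase(normalized, p) for p in BLOCKED_PROJECT_PATTERNS)


def may_register_user(username: str) -> bool:
    return username.lower() not in RESERVED_USERNAMES


print(may_register_project("PrivatePI_database.connector"))  # False
print(may_register_project("database-connector"))            # True
print(may_register_user("PrivatePI"))                        # False
```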


My thinking on this topic has evolved a bit since I started that namespace thread a while back. These days I’m thinking we’re better off starting by enhancing pip rather than PyPI, specifically as I mentioned in a comment on the draft PEP on package indexes. To elaborate on that comment, what I’d love to see is:

  • pip adding support for repository namespace syntax like @repo/package, where the repo-to-URL mapping can be handled via a config file or env vars, as discussed in “Draft: Add PEP on package indexes” (fredrikaverpil/peps, Pull Request #2 on GitHub)
  • resolution for packages without a repository namespace would continue as currently implemented in pip
  • resolution for packages with a repository namespace specified would fail if no repo-to-URL mapping is defined. If PyPI added repository namespacing to its API in the future (where a single group controls a given namespace, much like GitHub organizations), then pip could query PyPI for such packages
  • pip would never query PyPI for repository namespaces containing the keywords “internal” or “private” (e.g. @exampleco-internal/secretpackage), instead requiring a defined repo-to-URL mapping and failing if none is defined.
  • the intent is that this approach would not affect any other language syntax, package imports, etc. The goal would simply be to influence how packages are installed, to eliminate confusion and give devs better control.
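To make the resolution rules above concrete, here's a sketch of how a tool might map a requirement to an index. Everything in it is hypothetical: pip has no @repo/package syntax today, and the PIP_REPO_<NAME> environment-variable convention, resolve_index and the example URLs are invented for illustration.

```python
import os
import re
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_INDEX = "https://pypi.org/simple"
PRIVATE_KEYWORDS = ("internal", "private")
SCOPED = re.compile(r"@(?P<repo>[A-Za-z0-9][A-Za-z0-9._-]*)/(?P<rest>.+)")


def repo_env_var(repo: str) -> str:
    """Hypothetical convention: @exampleco-internal -> PIP_REPO_EXAMPLECO_INTERNAL."""
    return "PIP_REPO_" + re.sub(r"[^A-Za-z0-9]", "_", repo).upper()


def resolve_index(requirement: str,
                  env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (requirement_without_namespace, index_url)."""
    env = os.environ if env is None else env
    match = SCOPED.fullmatch(requirement)
    if match is None:
        # No repository namespace: unchanged behaviour, use the default index.
        return requirement, DEFAULT_INDEX
    repo, rest = match["repo"], match["rest"]
    url = env.get(repo_env_var(repo))
    if url is None:
        # Namespaced but unmapped: fail rather than guess.
        raise LookupError(f"no index URL configured for @{repo}")
    is_private = any(word in repo.lower() for word in PRIVATE_KEYWORDS)
    if is_private and urlsplit(url).hostname == "pypi.org":
        raise LookupError(f"@{repo} is private and must never be looked up on PyPI")
    return rest, url


env = {"PIP_REPO_EXAMPLECO_INTERNAL": "https://pkgs.example.com/simple"}
print(resolve_index("@exampleco-internal/secretpackage>=1.0", env))
# ('secretpackage>=1.0', 'https://pkgs.example.com/simple')
print(resolve_index("requests", env))
# ('requests', 'https://pypi.org/simple')
```

Today both unmapped cases fail the same way; the private-keyword check only starts to matter if PyPI ever serves namespaces itself, which is why it is also applied to explicit mappings here.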

(And yes, I get that pip allows install via URL…that doesn’t work in my environment for a bunch of reasons I won’t bore you with. Having the additional abstraction and specificity described above would give us a ton of benefits for the various environments we manage.)

This (indeed, the whole proposal in your post) sounds like something that should be standardised via a PEP before being adopted in pip. We try to stick to behaviour defined by agreed standards in pip, to avoid getting stuck with implementation defined features that other tools end up having to reverse engineer and match bug-for-bug.

I’m not sure what the status is of @fredrikaverpil’s draft PEP, the discussion seemed to die down. Maybe the two proposals should be combined?

A concern with this approach, at least for me, is that the naming would likely be inconsistent between the distribution and the package.

If I published privatepi-database-connector at Bloomberg I’d want to import bloomberg.database.connector, not privatepi.database.connector. If the import name followed the distribution name instead, then if Bloomberg ever decided to publish that package on PyPI, not only would the distribution name change, but all of the code using the package would have to be changed too.
