Namespace support in PyPI

This would also provide a solution to people raising issues like pypa/pip#3454 and pypa/pipenv#2159, where the fundamental problem is pip does not have a way to prefer a package source.

The problem IMO, however, is how to work out a balanced policy. The general criteria mentioned above (can manually apply if the entity has a significant number of packages) are likely not useful to companies wanting to reserve names for internal packages, but a liberal approach (e.g. allow reservation of <name>-* if the entity owns the <name> package and/or username) would be very vulnerable to name-squatting. Maybe some compromise would be possible? Say, automatically reserve <name>-internal-* for the owners of package <name>.
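To make that compromise concrete, here is a minimal sketch of how such a rule might be checked when a new name is registered. Everything in it is hypothetical: the `OWNERS` table, the `may_register` helper, and the exact `-internal-` marker are illustrations only and not part of Warehouse or any existing PyPI policy. Names are compared after PEP 503 normalization so that `LargeCo_Internal.Tools` and `largeco-internal-tools` are treated as the same name.

```python
import re

# Hypothetical ownership table: normalized project name -> owner usernames.
OWNERS = {"largeco": {"alice"}}


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def may_register(new_name, user, owners=OWNERS):
    """Return True if `user` may register `new_name` under the sketched rule.

    Assumed rule: names of the form <name>-internal-* are reserved for
    the owners of the existing project <name>.
    """
    base, marker, rest = normalize(new_name).partition("-internal-")
    if not (base and marker and rest):
        return True  # name does not match the reserved pattern
    if base not in owners:
        return True  # no project called <name>, so nothing is reserved
    return user in owners[base]


print(may_register("largeco-internal-tools", "alice"))
print(may_register("LargeCo_Internal.Tools", "mallory"))
```

Under these assumptions the first call is allowed and the second is refused, while a name such as `largeco-utils` stays open to anyone. That openness is exactly the squatting trade-off discussed above.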

I proposed something similar about a year ago on pypa/warehouse; for more information, see my discussion and ideas here (a sort of idea draft).

So it sounds like we’ve identified three potential use cases for namespaces so far:

  1. Expanding the space of available package names to reduce conflicts and make it possible to publish forked packages without renaming everything.

    • Comment: IMO this doesn’t seem very promising right now, because we don’t have good ways to manage the resulting conflicts at the Python import level. Maybe it’s worth revisiting after we have a robust resolver and Conflicts metadata?
  2. Accurately signaling the origin of public packages. For example, if a package is called largeco-blah, end users might appreciate knowing whether the package is maintained by LargeCo Inc. or not.

    • Comment: this is essentially the same issue that classic trademark is trying to address – giving people accurate information about what they’re getting. We already have some relevant policies here – in particular, PEP 541 has mechanisms for handling trademark disputes – but they’re fairly ad hoc; this would be systematizing them. Some challenges include: how do we handle the tension between names that designate origin vs names that describe usage (e.g. pygithub is a package for working with github, so it’s an accurate descriptive usage, but it’s not maintained by GitHub Inc.)? How do we effectively communicate the difference to users? If PyPI is going to be in the business of promising to users that azure-storage comes from Microsoft, then how do the PyPI administrators figure out that they’re actually talking to Microsoft and not some scammer? (This is basically the same problem as Certificate Authorities have to solve, and it’s highly non-trivial.)
  3. Reserving portions of the namespace for private usage. Lots of organizations have internal packages; they definitely don’t want to accidentally get a public package that happens to use the same name, and they would prefer that no such public package exist (since it’s awkward to have unrelated packages where you can’t install both of them at the same time, and maybe their package will become public later).

    • Comment: This is essentially asking for PyPI to create a formal, blessed way to squat names. So the challenge would be to find a way to balance the public’s desire to keep names available to use and not be locked up by speculation or some opaque and unaccountable process, versus organizations’ desire to avoid accidental conflicts. One approach might be to carve out a specific namespace for this usage, e.g. prohibit packages on PyPI that start with private- and then document that everyone’s internal packages should use this. In the meantime, there are other options like using devpi (as noted upthread). This is clearly a common problem though, so at a minimum we should have some docs addressing it.

Thanks for the summary, @njs!

For those who haven’t been following it, here’s the GitHub issue about planning the rollout of the new pip resolver.

I believe @dustin is working on the PEP 541 process (and, towards that goal, on a user support ticket for PyPI.) Perhaps he could speak more to how frequently we see trademark questions come up currently among those support requests?

Perhaps this could be on https://packaging.python.org – anyone want to take a stab at writing this up as a guide and improving the “Hosting your own simple repository” page of the Python Packaging User Guide along the way? People do want clearer and more discoverable recommendations for the intersection of private stuff and PyPI.

The typeshed project, which provides PEP 484 type stubs, is currently discussing distributing non-standard-library type stubs via PyPI (https://github.com/python/typeshed/issues/2491). Currently all stubs are vendored by the type checkers, but this approach doesn’t scale. Similar to DefinitelyTyped in the JavaScript world, we’d like users of a package foo to be able to install the corresponding stubs by just typing pip install types.foo or something similar.

But we’d need to ensure that people can’t squat these names for security reasons. As opposed to other Python packages, people will just try to install the type stubs without previously checking them, and they should be able to. But without namespacing this would open a wide door to attackers. So for us, namespacing is essential.

From the descriptions above, I’d agree with that. It sounds like the problem statement they started with was very similar to ours, and so the solution ended up offering characteristics we consider desirable: genuinely opt-in (so the folks for whom the existing flat namespace is working well don’t need to care), and with a centrally administered approval process so you didn’t get a proliferation of vanity namespaces producing install time package conflicts.

Slightly related since it’s relevant for internal-only packages, pypi.org will never have a classifier that starts with "Private :: " and it rejects uploads with invalid classifiers. (PR w/ link to a tweet)
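As a small illustration of how an internal project could lean on this, the sketch below checks a classifier list for a `Private ::` entry before anything is uploaded. The helper name `blocks_public_upload` and the sample lists are assumptions made for illustration; `Private :: Do Not Upload` is simply one classifier string with the prefix described above.

```python
def blocks_public_upload(classifiers):
    """Return True if any classifier starts with "Private ::"."""
    return any(c.strip().startswith("Private ::") for c in classifiers)


# Example classifier lists (illustrative only).
internal = ["Programming Language :: Python :: 3", "Private :: Do Not Upload"]
public = ["Programming Language :: Python :: 3"]

print(blocks_public_upload(internal))
print(blocks_public_upload(public))
```

A check like this could run in CI for internal repositories, so that a package missing the marker is noticed before it is ever built. It only guards against accidental uploads of your own package; it does nothing to stop someone else from registering the same name publicly.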


Has there been any progress on Nuget-style namespaces? If not, is the blocker development time or a PEP?

As far as I am aware there has been no progress. It would require someone to write and champion a PEP, and then someone to implement it assuming it got accepted.

Adding a belated note on the “private package” problem: Linux distros face a variant of this with system API binding packages that are installed directly into the system Python by a primarily C/C++ focused build process instead of being published as regular Python packages.

I think any PEP should put this into the “No need to solve” category though, for the following reasons:

  1. These are open source projects, so they (or a consuming distro) are free to make an sdist and publish it to PyPI (best resolution)
  2. Distros that allow for multiple Python stacks or fully support venvs will likely want the sdist anyway, so the bindings can be used outside the system Python (encourages the best resolution)
  3. When there are technical barriers to the best resolution, actual namesquatting is a defensible interim measure given the availability of PEP 541 to address disputes (e.g. I’ve held the “rpm”, “dnf” and “solv” names on PyPI for years, and relatively recently allowed a team from Red Hat access to the last to publish real libsolv bindings).
  4. The namesquatting workaround could be made more systematic (similar to the blocking of stdlib names), but any prefix based name reservation would only apply to future explicitly distro-controlled packages and, for that purpose, distros fall under the same design category as “largeco” in NJS’s write-up.

See proposal for name reservations: PyPI as a Project repository vs. Name registry (a.k.a. PyPI namesquatting, e.g. for Fedora packages) - #12 by encukou

I don’t think distros need prefix name reservation. Who’d be the gatekeeper? If some random contributor wants to write a Fedora-specific tool and name it fedora-foobar, they should go right ahead. (Though of course, just foobar would be a better name; if it’s good, others might want to adopt it.)

#988 is now closed as pip 20.3 has the new resolver on by default.

Hi all, in light of “Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies” (Alex Birsan, Feb 2021, Medium), is now an appropriate time to revisit this, along with “Improving risks and consequences against typosquatting on pypi” (#12 by PoluX)?


IMO this is not (directly) related to the issue. As you can read from the article, npm (which does support namespacing) is still affected by the issue. On the flip side, the issue can be quite easily prevented by registering placeholder packages on PyPI even without namespacing, and I believe the PyPI team accept requests to reserve names without actually publishing any releases as well. Namespacing can make prevention a bit easier, but it doesn’t help much beyond that, nor is it necessary to keep the attack from happening.

Agreed. Although it may make it easier to make a clear distinction between what is <company A’s> and what isn’t.

and I believe the PyPI team accept requests to reserve names without actually publishing any releases as well

Is there a sanctioned/“official” means of doing this?

This is true only when package publishers fail to make use of the namespace support. If PayPal’s auth-paypal package had been named @paypal/auth, with PayPal in control of the @paypal scope, they would not have been attackable using this technique.

Solving this problem requires both namespace support, and awareness of that support’s existence and its importance.


You can squat a package name without namespacing as well (at least on PyPI); if auth-paypal were a Python package, they could claim that name on PyPI with a stub package, or ask for PyPI admins to give it to them. Namespacing makes the process much easier, but it is not required.

p.s. To be clear, I’m fully supportive of having namespace support in Python packaging. But it should not be incorrectly presented as a requirement.


Based on my discussion with them, PyPI maintainers prefer that you send a list of names to block to admin@pypi.org.
Name squatting is against PyPI policy.


In order to use the blocking/squatting method, we’d have to send a list of approximately 1,000 internal packages, and we’d have to update the list daily. To ensure safety, we’d have to delay publication of a new internal package until after the block/squat has been put in place.

So while it is true that namespace support is not required, I can’t imagine it being practical to handle enterprise package protection without it.
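To give a sense of what the recurring part of that process could look like, here is a rough sketch that checks a list of internal names against the public index. It assumes PyPI's JSON API at `https://pypi.org/pypi/<name>/json` answers with HTTP 404 for names that have no public project; the function names and the sample data are hypothetical. It only reports collisions that already exist. It reserves nothing, and a name that is free today can be registered by someone else tomorrow, which is why a check like this would need to run continuously.

```python
import re
import urllib.error
import urllib.request


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def exists_on_pypi(name):
    """Return True if PyPI's JSON API reports a project with this name."""
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False  # no public project under this name
        raise  # other HTTP errors are not treated as "name is free"


def find_collisions(internal_names, exists=exists_on_pypi):
    """Return the internal names that are already taken on the public index."""
    return sorted(name for name in internal_names if exists(name))
```

The `exists` parameter is there so the lookup can be replaced, for example with a cached copy of the index or a stub in tests, rather than issuing a thousand HTTP requests on every run.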


Speaking in a personal capacity, this would be a great thing to add to FUNDABLES.md in the psf/fundable-packaging-improvements repository on GitHub. There’s consensus that we want this in some form, and there are at least 2 “good models” for how this could work.

This is definitely a good candidate for “targeted funding to do this” style grants, especially given that some of the organisations that care about this might also be willing to fund this work.

And, if you’re reading this as an organisation interested in moving this forward (through funding), I suggest dropping an email to packaging-wg@python.org that you’re interested in funding this.
