PEP 755: Implicit namespace policy for PyPI

mikeshardmind · September 9, 2024, 7:37pm

Your motivation section is missing some crucial details, and is outright misleading in my opinion.

Typeshed is a community effort to maintain type stubs for various packages. The stub packages they maintain mirror the package name they target and are prefixed by types-. For example, the package requests has a stub that users would depend on called types-requests.

The issue with this is that the current convention is to postfix with -stubs, not prefix with types-. This basically inverts whether or not this idea makes sense.

The PyPI search results show this pretty clearly.

The alternative of “do nothing” as you put it actually seems preferable to me. I don’t see this PEP as providing benefit, but it will increase the work that needs to be done to support it. I’d rather see the effort invested in explicit namespaces, which would introduce less administrative overhead, as there is no scarcity there (at least, if namespacing is tied to users/orgs, there’s no new scarcity). This also leaves it as something that’s just available to community projects, without needing a policy on when a community project is “large enough” or “important enough” to count.

I also think it’s disingenuous to point to a history of people asking for namespaces, and not differentiate whether those users would consider this appropriate when this is an implicit option that isn’t really namespacing but reserved prefixes.

Liz · September 9, 2024, 7:46pm

This isn’t just convention, it’s actually a documented requirement for distributing type information as a stub-only package

For package maintainers wishing to ship stub files containing all of their type information, it is preferred that the *.pyi stubs are alongside the corresponding *.py files. However, the stubs can also be put in a separate package and distributed separately. Third parties can also find this method useful if they wish to distribute stub files. The name of the stub package MUST follow the scheme foopkg-stubs for type stubs for the package named foopkg .

https://typing.readthedocs.io/en/latest/spec/distributing.html#stub-only-packages

But there’s nothing that enforces that a package with such a name is well-behaved and doesn’t contain code. So existing standards, not just conventions, conflict with this PEP.

BrenBarn · September 9, 2024, 7:53pm

Would this PEP actually handle the situation described in that particular comment? It seems to be talking about typos, not namespaces. As I understand it, this PEP in itself would not prevent someone from creating a package called requets, nor would it prevent them from registering googel- as a prefix, nor would it even prevent someone from uploading a bunch of malicious google- packages as long as they did so before someone actually registered the google- prefix. Those things may indeed be preventable by sharp-eyed PyPI admins now, and will remain so under this proposal, but the question is how much marginal gain this proposal offers if it still requires manual vetting to prevent typosquatting.

I agree it’s clear from that issue thread that there is desire for namespacing of some kind. But like others have posted here, I’m not entirely convinced by the PEP that this specific proposal is going to be useful enough to outweigh its downsides. Just because people want namespacing “of some sort” and this PEP proposes namespacing of some sort doesn’t necessarily mean those two sorts will be the same.

jamestwebber · September 9, 2024, 7:57pm

If NVIDIA owned the cupy- prefix then it would prevent typo-squatting of those packages. But it wouldn’t prevent typo-squatting on cuppy- or something.

So yes and no: it prevents some malicious uploads but not all of them.
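To make the "yes and no" concrete, here is a minimal sketch of what a prefix reservation check might look like. The reservation table, the organization names, and the `upload_allowed` helper are all made up for illustration; only the name normalization follows the documented PyPI rule (PEP 503). This is not how PyPI or the PEP would necessarily implement it.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical reservation table: reserved prefix -> owning organization.
RESERVED_PREFIXES = {"cupy-": "nvidia"}


def upload_allowed(project: str, uploader_org: str) -> bool:
    """Reject a name only if it falls under a prefix reserved by another org."""
    normalized = normalize(project)
    for prefix, owner in RESERVED_PREFIXES.items():
        if normalized.startswith(prefix) and uploader_org != owner:
            return False
    return True


print(upload_allowed("cupy-cuda12x", "someone-else"))   # blocked by the reservation
print(upload_allowed("cuppy-cuda12x", "someone-else"))  # typo in the prefix slips through
print(upload_allowed("Cupy_Cuda12x", "nvidia"))         # the owner may still upload
```

The second call is the point: a typo inside the prefix itself lands outside the reservation, so nothing in this check stops it.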

sinoroc · September 9, 2024, 8:18pm

Yes, I think it would improve the quality of the proposal(s) if there was a list of concrete examples of the current issues and how this proposal would improve the situation.

Like others I am not sold on the idea that prefixes will help protect against dependency confusion and typosquatting. If we had had prefixes or real namespaces right from the start, for example: pip install 'ofek/hatch', then of course I could add ofek to some kind of allow-list in my pip config or in my corporate PyPI proxy settings so that I would never end up downloading and installing okef/hatch.
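As a rough sketch of the allow-list idea, assuming a purely hypothetical `owner/project` requirement syntax (no such syntax exists in pip today) and a made-up `ALLOWED_OWNERS` setting:

```python
# Hypothetical allow-list, e.g. loaded from a pip config or a proxy setting.
ALLOWED_OWNERS = {"ofek"}


def is_allowed(requirement: str) -> bool:
    """Accept only 'owner/project' requirements whose owner is allow-listed."""
    owner, sep, project = requirement.partition("/")
    if not sep or not owner or not project:
        # Not in owner/project form, so there is no owner to check.
        return False
    return owner.lower() in ALLOWED_OWNERS


for req in ("ofek/hatch", "okef/hatch", "hatch"):
    print(req, "->", is_allowed(req))
```

With the owner as part of the requirement, a typo in the owner fails the allow-list check instead of silently resolving to a different project.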

ntessore · September 9, 2024, 8:18pm

At the risk of repeating an old argument: why not a solution where a regular PyPI package foo can be marked as a namespace package in its PyPI metadata? A request for such a PyPI namespace package could go into the review queue, as per the current proposal. If accepted, only the owners of foo would be able to register PyPI packages foo.* going forward. This would bind namespaces to actual packages, so that, e.g., first-come-first-served and the existing policy on trademarks apply.

It would also make it so that PyPI namespaces should, in many cases, mirror the actual installed package namespaces (e.g., PyPI package foo installs namespace package foo, PyPI package foo.bar installs package foo.bar).
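A minimal sketch of the registration rule I have in mind, with a made-up `NAMESPACE_OWNERS` table and user names standing in for whatever PyPI would actually store:

```python
# Hypothetical state: packages approved as namespace packages -> their owners.
NAMESPACE_OWNERS = {"foo": {"alice"}}


def can_register(name: str, user: str) -> bool:
    """Allow 'foo.bar' only if the user owns every approved namespace above it."""
    parts = name.lower().split(".")
    # Check each proper ancestor: 'foo' for 'foo.bar'; 'foo' and 'foo.bar' for 'foo.bar.baz'.
    for i in range(1, len(parts)):
        parent = ".".join(parts[:i])
        owners = NAMESPACE_OWNERS.get(parent)
        if owners is not None and user not in owners:
            return False
    # Names with no approved namespace above them stay first-come-first-served.
    return True


print(can_register("foo.bar", "alice"))    # owner of foo
print(can_register("foo.bar", "mallory"))  # not an owner of foo
print(can_register("foobar", "mallory"))   # not under the foo namespace at all
```

One caveat: PyPI's name normalization currently treats `.`, `-` and `_` as equivalent, so the sketch skips normalization on purpose to keep the dotted form visible. A real design would have to decide how foo.bar relates to foo-bar.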

srittau · September 9, 2024, 8:20pm

Elizabeth King:

Michael H:

The issue with this is that the current convention is to postfix with -stubs, not prefix with types-. This basically inverts whether or not this idea makes sense.

This isn’t just convention, it’s actually a documented requirement for distributing type information as a stub-only package

For package maintainers wishing to ship stub files containing all of their type information, it is preferred that the *.pyi stubs are alongside the corresponding *.py files. However, the stubs can also be put in a separate package and distributed separately. Third parties can also find this method useful if they wish to distribute stub files. The name of the stub package MUST follow the scheme foopkg-stubs for type stubs for the package named foopkg .

https://typing.readthedocs.io/en/latest/spec/distributing.html#stub-only-packages

But there’s nothing that enforces that a package with such a name is well-behaved and doesn’t contain code. So existing standards, not just conventions, conflict with this PEP.

The typing spec is talking about import packages, not distribution packages, which is what PyPI and this PEP (to my understanding) is concerned about. This is the next paragraph in the typing spec after the one you quoted:

Liz · September 9, 2024, 8:24pm

While the typing spec allows the name of the distribution to be different, this isn’t how people actually distribute stub packages, and will just lead to further confusion of end users. It’s already surprising to most users when these differ.

ofek · September 9, 2024, 8:27pm

This is not true and here is a maintainer explicitly saying that this feature would decrease the chances that users are exposed to malicious packages.

mikeshardmind · September 9, 2024, 8:31pm

Okay, clearly we’re talking past each other here. The typeshed distributes under types-, while users may distribute with a postfix -stubs. There are far more packages distributed under the latter than the former because type checkers actually just vendor the typeshed in practice.

The typeshed doesn’t benefit from this because type checkers vendor the typeshed.
Other uses don’t benefit because other uses are told to use the postfix and not the prefix; a PyPI search was linked above showing the impact of that.
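To illustrate the two naming patterns side by side, here is a small sketch. The `stub_target` helper is hypothetical and only looks at the distribution name; it says nothing about what a package actually contains.

```python
from typing import Optional, Tuple


def stub_target(dist_name: str) -> Optional[Tuple[str, str]]:
    """Guess which naming pattern a stub distribution uses and what it targets."""
    name = dist_name.lower()
    if name.startswith("types-") and len(name) > len("types-"):
        return ("types- prefix", name[len("types-"):])
    if name.endswith("-stubs") and len(name) > len("-stubs"):
        return ("-stubs suffix", name[: -len("-stubs")])
    return None


for dist in ("types-requests", "asyncpg-stubs", "requests"):
    print(dist, "->", stub_target(dist))
```

A reserved types- prefix would only ever cover the first branch. Everything matched by the second branch shares a suffix, not a prefix, so a prefix reservation cannot describe it.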

Jelle · September 9, 2024, 8:39pm

Type checkers vendor typeshed’s standard library, but not necessarily the third-party packages (which is what this conversation seems to be about). Mypy, for one, does not vendor third-party packages in typeshed, though other type checkers might.

Liz · September 9, 2024, 8:40pm

The typeshed has published 234 packages that conform to the prefix types- (see: Profile of typeshed_bot · PyPI)

I can’t get a good number from the built-in PyPI search, but there are over a thousand stub-only packages matching “stubs” that took the time to use the right classifier and so can be filtered by it: Search results · PyPI

Various stub only packages I personally use aren’t marked with that classifier, such as asyncpg-stubs · PyPI

There’s over 3000 matches for just “stubs”, but a few of those are definitely not typing stubs.

mikeshardmind · September 9, 2024, 8:47pm

Realistically, the stubs situation isn’t even the only case where this comes into play. I don’t really want to get further into strict semantics of that, the reality on the ground right now is that there are plenty of namespaces that people might think a company would want to register that would require grandfathering.

Despite the documented policy for typeshed-provided stub packages, there are types- packages that predate that policy.

This removes the ability for users to trust this as a security feature, which in turn means we should stop talking about it as if it is one. Users cannot trust from a prefix alone that they are getting packages from a specific provider, they would need to check the status of a package as belonging to the organization they expect.

pf_moore · September 9, 2024, 9:44pm

OK, I’ll attempt to do that.

The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership.

Incorrect. Using an informal naming convention is an initial indicator, and combining this with checking the project author detail achieves this. It might not be simple to do this, but the proposed mechanism, with the need to allow pre-existing projects to remain in a reserved namespace, honestly doesn’t seem that much better.

(Typeshed has been discussed to death, so I’ll skip that example)

Major cloud providers like Amazon, Google and Microsoft have a common prefix for each feature’s corresponding package

Have they indicated that the current situation is a problem for them? Can representatives participate in this thread to make their case? I don’t think it’s unreasonable to expect community participation if they want to support this feature. And conversely, if they aren’t willing to engage with the community, I don’t think we should be expected to infer and implement their requirements.

Many projects support a model where some packages are officially maintained and third-party developers are encouraged to participate by creating their own. For example, Datadog offers observability as a service for organizations at any scale.

I’m sorry, and I don’t want to suggest any hidden agenda either on your part or on Datadog’s, but I honestly think that using Datadog as a motivating example in a PEP that’s being proposed directly as part of work that they are funding, is ill-advised. If the proposal has merit, then finding other examples would be much less controversial. This isn’t a matter of distrust, it’s simply a case of ensuring that the motivation section of the PEP is as compelling as possible to the audience reading it - which is mostly (at this point) the open source volunteer community^[1].

Such projects are uniquely vulnerable to name-squatting attacks which can ultimately result in dependency confusion.

We have PEP 708 which is designed to mitigate dependency confusion. This proposal needs to describe how it relates to that one. My impression is that this is more about typosquatting attacks, and as others have said, it only addresses vulnerabilities related to one very particular class of typo.

For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration.

It’s also entirely reasonable to assume that 3rd parties could produce safe, useful integrations. By blocking off the “official” namespace, those 3rd party integrations would have to use unrelated names (unless Datadog creates some form of “contrib” area, which is then just as vulnerable to malicious projects as the current open namespace). So users looking for integrations will get used to the idea of useful integrations having names outside of the Datadog namespace, negating the benefit of having an “official” namespace.

To be clear, I see the advantage here, I just think it’s relatively small, and not compelling, because it’s at best a partial mitigation for a small class of attacks. I think we should aim to do better than this, if we want to address this problem.

Namespacing also would drastically reduce the incidence of typosquatting because typos would have to be in the prefix itself which is normalized and likely to be a short, well-known identifier like aws-.

Do you have any evidence to support this? You mention the cupy-cuda case, but that seems to have been one instance (registering a lot of packages, certainly, but still only a single attack). I’d imagine there are at least as many typosquatting attacks against popular projects that don’t use a namespace-style name (like requests or flask). Without data, this statement is basically just speculation.

In contrast, the PEP says basically nothing about the potential risks of the proposal. It assumes “corporate organisations” are stable entities, and can be trusted to make reasonable use of the feature. While this may be true for the given examples, it ignores (for example) the possibility of ill-conceived startups grabbing a namespace based on speculative plans and VC funding, then burning out or changing direction and leaving a bunch of reservations that (as stated) the PEP has no means of dealing with.

Hopefully, the above is useful. I’m not a fan of this proposal as it stands, but I know there’s a need for something in this area. So if this helps in developing a more acceptable solution, then that’s great. And TBH, if it helps to clarify weaknesses in the existing proposal, even if that means the PEP fails and we’re left still looking for a solution, I’m OK with that outcome as well.

Nope, I’m not going to do this. I’m not at all convinced that “minimal ecosystem disruption” is the key constraint here. If the problem’s important enough to need a solution, we should come up with the best solution we can, and work out how to handle the ecosystem disruption, not settle for a suboptimal solution just because it can be shoehorned into the existing infrastructure.

And doing nothing is absolutely a valid alternative. After all, that’s what PEP rejection would be. So my proposed alternative at the moment is “reject the PEP, i.e. do nothing until a better proposal comes along”.

If I have to discuss alternatives, I think we should be taking a much longer-term view. That may not fit well with the amount of time you have funded to work on this, but I don’t think that should be the driver here. If we need to implement a solution that needs a multi-year migration process, then that’s fine. We’ve done that many times before, and it’s worked perfectly well. Slower than anyone might like, but the end result is what matters.

^[1] Again, if the companies this PEP is aiming to help were more active in the community, maybe that wouldn’t be as significant an issue.

brettcannon · September 10, 2024, 12:25am

Without officially speaking for my employer, there is a desire to lower the risk of someone mistaking some random person’s package as being official.

For instance, let’s say you’re trying to figure out how to log something in Azure. You might find a blog post that says, “install azure-logging”. Well, azure-logging is not from Microsoft. And actually there are a bunch of packages that have azure- as a prefix and “logging” in their name. A search for [azure-logging] on PyPI lists a bunch of stuff with an azure- prefix that isn’t from MS; the 15th package listed is the first official package from MS.

Now you can say that you should always check that your package is owned by Profile of microsoft · PyPI and/or Profile of azure-sdk · PyPI to know it’s 1st-party, but we all know people don’t go to PyPI directly all the time to check for this stuff. So having something visible in your pyproject.toml or lock file would be good to have, and the project name is the easiest thing in this case.

And this matters if some azure- prefix package is malicious, buggy, etc. and people naively blame MS for the problem caused by the package.

petersuter · September 10, 2024, 6:25am

To me it sounds like specifying something like pip install azure-loganalytics by microsoft (or equivalent syntax), with the tool automatically confirming the account, would be more useful.

Even that would only move the trust problem from the package to the account. How does one know the microsoft account is actually Microsoft?

Something like the DNS verified domain name linking takluyver suggested above would be a great solution. And the DNS verification doesn’t have to affect the package name; it could be linked to the account name.

And this would not be breaking anything because you could still verify it manually and omit the by microsoft if your tool does not support it yet.

pf_moore · September 10, 2024, 7:09am

But a key aspect of this proposal is that these packages would remain, so you’d still have this issue, surely?

Liz · September 10, 2024, 8:10am

It sounds like this case would benefit from actual namespacing and would not benefit from this pep. This pep isn’t getting rid of the existing azure-logging and other packages, but actual new namespacing by both org and account could signal this accurately.

ncoghlan · September 10, 2024, 8:36am

I found this idea intriguing enough to start a dedicated thread to discuss it separately from the namespace prefix PEP threads: Establish publisher authority via automated DNS backed challenges?

steve.dower · September 10, 2024, 3:28pm

I’ll speak officially: Brett is right

We would likely want to take full ownership of the azure- namespace, and suggest that third-parties use an -azure suffix instead. The main aim is to automatically protect users from attempted typosquats, though. We’re not so concerned about non-malicious users, but don’t really want to burden the PyPI team with having to evaluate each one.

If the team would prefer to scan and assess all new packages under azure- for malicious intent, rather than simply saying “Microsoft automatically asserts their ‘Azure’ trademark over the whole namespace”, then we would live with that. But it seems like a poor use of volunteer time.

(FTR, we’ve requested a few specific typosquats in this namespace be preemptively blocked, and have actively pursued one case of deliberately hijacking an actual name we intended to use, but have pursued more outside this namespace than within.)

I like this. pip install microsoft::azure-loganalytics or microsoft@azure-loganalytics might be hijacking the syntax from something potentially better, but interpreting it as “package must have this user as owner/maintainer” would likely be better than the whole namespace.
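A rough sketch of that interpretation, assuming the hypothetical `owner::project` spelling and a made-up `OWNERS_BY_PROJECT` mapping in place of whatever ownership data an installer could really query:

```python
from typing import Optional, Tuple

# Made-up ownership data, for illustration only.
OWNERS_BY_PROJECT = {"azure-loganalytics": {"microsoft"}}


def parse_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'owner::project' into its parts; a bare 'project' has no owner."""
    owner, sep, project = spec.partition("::")
    if not sep:
        return None, spec
    if not owner or not project:
        raise ValueError(f"malformed requirement: {spec!r}")
    return owner, project


def owner_matches(spec: str) -> bool:
    """Check the 'package must have this user as owner/maintainer' constraint."""
    owner, project = parse_spec(spec)
    if owner is None:
        return True  # no owner constraint given, so behave as today
    return owner in OWNERS_BY_PROJECT.get(project, set())


print(owner_matches("microsoft::azure-loganalytics"))
print(owner_matches("micrsoft::azure-loganalytics"))
print(owner_matches("azure-loganalytics"))
```

The bare form still works unchanged, so tools that don't understand the owner constraint wouldn't break; they just wouldn't check it.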

We’d publish on our own site (probably Python | Microsoft Developer) that it’s our account.^[1] Additional DNS verification might be neat, but simply acknowledging that it’s our account would likely also be sufficient.

^[1] I’m 99% sure we used to have it there, as a “check out our packages” link, but it clearly didn’t survive one of the many rewrites of that page.