PEP 755: Implicit namespace policy for PyPI

Your motivation section is missing some crucial details, and is outright misleading in my opinion.

  • Typeshed is a community effort to maintain type stubs for various packages. The stub packages they maintain mirror the package name they target and are prefixed by types-. For example, the package requests has a stub that users would depend on called types-requests.

The issue with this is that the current convention is to postfix -stubs, not to prefix types-; this basically inverts whether or not this idea makes sense.

Search results · PyPI shows this pretty clearly.

The alternative of ā€œdo nothingā€ as you put it actually seems preferable to me. I donā€™t see this pep as providing benefit, but it will increase the work that needs to be done to support it. Iā€™d rather see the effort invested in explicit namespaces, which would introduce less administrative overhead as a result, as there is no scarcity there (at least, if namespacing is done tied to users/orgs, thereā€™s no new scarcity). This also leaves it as something thatā€™s just available to community projects without having to have a policy of when a community project is ā€œlarge enoughā€ or ā€œimportant enoughā€ that it gets to count :roll_eyes:

I also think it’s disingenuous to point to a history of people asking for namespaces without differentiating whether those users would consider this appropriate, given that this is an implicit option that isn’t really namespacing but reserved prefixes.

This isn’t just a convention; it’s actually a documented requirement for distributing type information as a stub-only package:

For package maintainers wishing to ship stub files containing all of their type information, it is preferred that the *.pyi stubs are alongside the corresponding *.py files. However, the stubs can also be put in a separate package and distributed separately. Third parties can also find this method useful if they wish to distribute stub files. The name of the stub package MUST follow the scheme foopkg-stubs for type stubs for the package named foopkg.

https://typing.readthedocs.io/en/latest/spec/distributing.html#stub-only-packages

But there’s nothing that enforces that a package named this way is well-behaved and contains no code. So existing standards, not just conventions, conflict with this PEP.

Would this PEP actually handle the situation described in that particular comment? It seems to be talking about typos, not namespaces. As I understand it, this PEP in itself would not prevent someone from creating a package called requets, nor would it prevent them from registering googel- as a prefix, nor would it even prevent someone from uploading a bunch of malicious google- packages, as long as they did so before someone actually registered the google- prefix. Those things may indeed be preventable by sharp-eyed PyPI admins now, and will remain so under this proposal, but the question is how large the marginal gain from this proposal is if it still requires manual vetting to prevent typosquatting.

I agree it’s clear from that issue thread that there is a desire for namespacing of some kind. But like others have posted here, I’m not entirely convinced by the PEP that this specific proposal is going to be useful enough to outweigh its downsides. Just because people want namespacing “of some sort” and this PEP proposes namespacing of some sort doesn’t necessarily mean those two sorts will be the same.

If NVIDIA owned the cupy- prefix then it would prevent typo-squatting of those packages. But it wouldnā€™t prevent typo-squatting on cuppy- or something.

So yes and no: it prevents some malicious uploads but not all of them.
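To make the cupy-/cuppy- distinction concrete, here is a minimal sketch of how a reserved-prefix check might behave. It assumes the prefix is compared after PEP 503 name normalization; the `RESERVED_PREFIXES` set and the `falls_under_reserved_prefix` helper are hypothetical names for illustration, not part of PyPI.

```python
import re

# Hypothetical set of reserved prefixes (already written in normalized form).
RESERVED_PREFIXES = {"cupy-"}


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_' and '.' into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def falls_under_reserved_prefix(name: str) -> bool:
    """Return True if the normalized project name starts with any reserved prefix."""
    candidate = normalize(name)
    return any(candidate.startswith(prefix) for prefix in RESERVED_PREFIXES)


for name in ["cupy-cuda12x", "CuPy_CUDA12x", "cupy.cuda12x", "cuppy-cuda12x", "cupy"]:
    print(f"{name:15} reserved={falls_under_reserved_prefix(name)}")
```

Spelling variants of the reserved prefix itself are caught, but a name with a typo inside the prefix (cuppy-) sails straight through, which is exactly the "yes and no".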

Yes, I think it would improve the quality of the proposal(s) if there was a list of concrete examples of the current issues and how this proposal would improve the situation.

Like others, I am not sold on the idea that prefixes will help protect against dependency confusion and typosquatting. If we had had prefixes or real namespaces right from the start, for example pip install 'ofek/hatch', then of course I could add ofek to some kind of allow-list in my pip config or in my corporate PyPI proxy settings, so that I would never end up downloading and installing okef/hatch.

At the risk of repeating an old argument: why not a solution where a regular PyPI package foo can be marked as a namespace package in its PyPI metadata? A request for such a PyPI namespace package could go into the review queue, as per the current proposal. If accepted, only the owners of foo would be able to register PyPI packages foo.* going forward. This would bind namespaces to actual packages, so that, e.g., first-come-first-served and the existing policy on trademarks apply.

It would also make it so that PyPI namespaces should, in many cases, mirror the actual installed package namespaces (e.g., PyPI package foo installs namespace package foo, PyPI package foo.bar installs package foo.bar).
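A minimal sketch of the registration rule this suggestion implies, assuming PyPI kept a list of projects flagged as namespaces together with their owners. `NAMESPACE_OWNERS` and `may_register` are hypothetical names for illustration. One detail worth noting: PyPI already treats foo.bar, foo_bar and foo-bar as the same project under PEP 503 normalization, so a dotted namespace would in practice also cover the hyphenated form.

```python
import re
from typing import Dict, Set

# Hypothetical data: projects flagged as namespaces -> accounts that own them.
NAMESPACE_OWNERS: Dict[str, Set[str]] = {
    "foo": {"alice", "bob"},
}


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_' and '.' into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def may_register(new_name: str, uploader: str) -> bool:
    """Return True if `uploader` may create a new project called `new_name`.

    A name is a child of namespace `foo` when its normalized form starts with
    'foo-', so foo.bar, foo_bar and foo-bar are all treated as the same child.
    """
    candidate = normalize(new_name)
    for namespace, owners in NAMESPACE_OWNERS.items():
        if candidate.startswith(normalize(namespace) + "-"):
            return uploader in owners
    # Names outside every namespace stay first-come-first-served.
    return True


print(may_register("foo.bar", "alice"))    # owner of foo, inside the namespace
print(may_register("Foo_Baz", "mallory"))  # not an owner, inside the namespace
print(may_register("foobar", "mallory"))   # outside: no separator after 'foo'
```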

The typing spec is talking about import packages, not distribution packages, which is what PyPI and this PEP (to my understanding) are concerned with. This is the next paragraph in the typing spec after the one you quoted:

While the typing spec allows the name of the distribution to be different, this isn’t how people actually distribute stub packages, and it will just lead to further confusion for end users. It’s already surprising to most users when these differ.

This is not true and here is a maintainer explicitly saying that this feature would decrease the chances that users are exposed to malicious packages.

Okay, clearly we’re talking past each other here. The typeshed distributes under types-; users may distribute with a -stubs postfix. There are far more packages distributed under the latter than the former, because type checkers actually just vendor the typeshed in practice.

The typeshed doesn’t benefit from this because type checkers vendor the typeshed.
Other users don’t benefit because they are told to use the postfix and not the prefix; a PyPI search linked above shows the impact of that.

Type checkers vendor typeshed’s standard library, but not necessarily the third-party packages (which is what this conversation seems to be about). Mypy, for one, does not vendor third-party packages in typeshed, though other type checkers might.

The typeshed has published 234 packages that conform to the prefix types- (see: Profile of typeshed_bot · PyPI).

I can’t get a good number from the built-in PyPI search, but there are over a thousand stub-only packages matching “stubs” that took the time to use the right classifier, which makes them filterable: Search results · PyPI

Various stub-only packages I personally use aren’t marked with that classifier, such as asyncpg-stubs · PyPI

There are over 3000 matches for just “stubs”, but a few of those are definitely not typing stubs.

Realistically, the stubs situation isn’t even the only case where this comes into play. I don’t really want to get further into the strict semantics of that; the reality on the ground right now is that there are plenty of namespaces that people might think a company would want to register that would require grandfathering.

Despite the documented policy for typeshed-provided stub packages, there are types- packages that predate that policy.

This removes the ability for users to trust this as a security feature, which in turn means we should stop talking about it as if it is one. Users cannot trust from a prefix alone that they are getting packages from a specific provider, they would need to check the status of a package as belonging to the organization they expect.

OK, I’ll attempt to do that.

The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership.

Incorrect. Using an informal naming convention is an initial indicator, and combining this with checking the project author details achieves this. It might not be simple to do this, but the proposed mechanism, with the need to allow pre-existing projects to remain in a reserved namespace, honestly doesn’t seem that much better.

(Typeshed has been discussed to death, so I’ll skip that example)

Major cloud providers like Amazon, Google and Microsoft have a common prefix for each featureā€™s corresponding package

Have they indicated that the current situation is a problem for them? Can representatives participate in this thread to make their case? I don’t think it’s unreasonable to expect community participation if they want to support this feature. And conversely, if they aren’t willing to engage with the community, I don’t think we should be expected to infer and implement their requirements.

Many projects support a model where some packages are officially maintained and third-party developers are encouraged to participate by creating their own. For example, Datadog offers observability as a service for organizations at any scale.

I’m sorry, and I don’t want to suggest any hidden agenda either on your part or on Datadog’s, but I honestly think that using Datadog as a motivating example in a PEP that’s being proposed directly as part of work that they are funding is ill-advised. If the proposal has merit, then finding other examples would be much less controversial. This isn’t a matter of distrust; it’s simply a case of ensuring that the motivation section of the PEP is as compelling as possible to the audience reading it, which is mostly (at this point) the open source volunteer community[1].

Such projects are uniquely vulnerable to name-squatting attacks which can ultimately result in dependency confusion.

We have PEP 708 which is designed to mitigate dependency confusion. This proposal needs to describe how it relates to that one. My impression is that this is more about typosquatting attacks, and as others have said, it only addresses vulnerabilities related to one very particular class of typo.

For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration.

It’s also entirely reasonable to assume that 3rd parties could produce safe, useful integrations. By blocking off the “official” namespace, those 3rd party integrations would have to use unrelated names (unless Datadog creates some form of “contrib” area, which is then just as vulnerable to malicious projects as the current open namespace). So users looking for integrations will get used to the idea of useful integrations having names outside of the Datadog namespace, negating the benefit of having an “official” namespace.

To be clear, I see the advantage here; I just think it’s relatively small, and not compelling, because it’s at best a partial mitigation for a small class of attacks. I think we should aim to do better than this, if we want to address this problem.

Namespacing also would drastically reduce the incidence of typosquatting because typos would have to be in the prefix itself which is normalized and likely to be a short, well-known identifier like aws-.

Do you have any evidence to support this? You mention the cupy-cuda case, but that seems to have been one instance (registering a lot of packages, certainly, but still only a single attack). I’d imagine there are at least as many typosquatting attacks against popular projects that don’t use a namespace-style name (like requests or flask). Without data, this statement is basically just speculation.

In contrast, the PEP says basically nothing about the potential risks of the proposal. It assumes “corporate organisations” are stable entities, and can be trusted to make reasonable use of the feature. While this may be true for the given examples, it ignores (for example) the possibility of ill-conceived startups grabbing a namespace based on speculative plans and VC funding, then burning out or changing direction and leaving a bunch of reservations that (as stated) the PEP has no means of dealing with.

Hopefully, the above is useful. I’m not a fan of this proposal as it stands, but I know there’s a need for something in this area. So if this helps in developing a more acceptable solution, then that’s great. And TBH, if it helps to clarify weaknesses in the existing proposal, even if that means the PEP fails and we’re left still looking for a solution, I’m OK with that outcome as well.

Nope, I’m not going to do this. I’m not at all convinced that “minimal ecosystem disruption” is the key constraint here. If the problem’s important enough to need a solution, we should come up with the best solution we can, and work out how to handle the ecosystem disruption, not settle for a suboptimal solution just because it can be shoehorned into the existing infrastructure.

And doing nothing is absolutely a valid alternative. After all, that’s what PEP rejection would be. So my proposed alternative at the moment is “reject the PEP, i.e. do nothing until a better proposal comes along”.

If I have to discuss alternatives, I think we should be taking a much longer-term view. That may not fit well with the amount of time you have funded to work on this, but I don’t think that should be the driver here. If we need to implement a solution that needs a multi-year migration process, then that’s fine. We’ve done that many times before, and it’s worked perfectly well. Slower than anyone might like, but the end result is what matters.


  1. Again, if the companies this PEP is aiming to help were more active in the community, maybe that wouldn’t be as significant an issue

Without officially speaking for my employer, there is a desire to lower the risk of someone mistaking some random person’s package as being official.

For instance, let’s say you’re trying to figure out how to log something in Azure. You might find a blog post that says, “install azure-logging”. Well, azure-logging is not from Microsoft. And actually there are a bunch of packages that have azure- as a prefix and “logging” in their name. A search for [azure-logging] on PyPI lists a bunch of stuff with an azure- prefix that isn’t from MS; the 15th package listed is the first official package from MS.

Now you can say that you should always check that your package is owned by Profile of microsoft · PyPI and/or Profile of azure-sdk · PyPI to know it’s 1st-party, but we all know people don’t go to PyPI directly all the time to check for this stuff. So having something visible in your pyproject.toml or lock file would be good to have, and the project name is the easiest thing in this case.

And this matters if some azure- prefix package is malicious, buggy, etc. and people naively blame MS for the problem caused by the package.

To me it sounds like specifying something like pip install azure-loganalytics by microsoft (or equivalent syntax) and automatically confirming the account would be more useful.

Even that would only move the trust problem from the package to the account. How does one know the microsoft account is actually Microsoft?

Something like the DNS-verified domain name linking takluyver suggested above would be a great solution. And the DNS verification doesn’t have to affect the package name; it could be linked to the account name.

And this would not be breaking anything because you could still verify it manually and omit the by microsoft if your tool does not support it yet.
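The DNS-backed linking idea could work as a simple challenge/response. This is a minimal sketch under stated assumptions: the `_pypi-challenge.` record name and the `pypi-verification=` value format are invented for illustration, and `lookup_txt` stands in for a real TXT-record resolver (the standard library has none, so a third-party resolver would be needed). The token is bound to the account name, reflecting the point that verification could be linked to the account rather than to any package name.

```python
import hmac
import secrets
from typing import Callable, Iterable

# Hypothetical conventions, not an existing PyPI feature.
RECORD_PREFIX = "_pypi-challenge."
VALUE_PREFIX = "pypi-verification="


def issue_challenge() -> str:
    """Random, unguessable token that the index would show to the account owner."""
    return secrets.token_urlsafe(32)


def expected_record(account: str, token: str) -> str:
    """TXT value the domain owner publishes; it names the account being linked."""
    return f"{VALUE_PREFIX}{account}:{token}"


def verify_domain(
    domain: str,
    account: str,
    token: str,
    lookup_txt: Callable[[str], Iterable[str]],
) -> bool:
    """Return True if any TXT record at the challenge name matches the expected value."""
    expected = expected_record(account, token).encode()
    for value in lookup_txt(RECORD_PREFIX + domain):
        # Constant-time comparison of bytes, a cautious default when comparing tokens.
        if hmac.compare_digest(value.encode(), expected):
            return True
    return False


# Usage with an in-memory stand-in for DNS:
token = issue_challenge()
fake_dns = {"_pypi-challenge.example.com": [expected_record("example-org", token)]}
print(verify_domain("example.com", "example-org", token, lambda n: fake_dns.get(n, [])))
print(verify_domain("example.com", "someone-else", token, lambda n: fake_dns.get(n, [])))
```

Because the record names the account, a tool could then show "example-org is linked to example.com" without the domain appearing in any package name.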

But a key aspect of this proposal is that these packages would remain, so you’d still have this issue, surely?

It sounds like this case would benefit from actual namespacing and would not benefit from this PEP. This PEP isn’t getting rid of the existing azure-logging and other packages, but actual new namespacing by both org and account could signal this accurately.

I found this idea intriguing enough to start a dedicated thread to discuss it separately from the namespace prefix PEP threads: Establish publisher authority via automated DNS backed challenges?

I’ll speak officially: Brett is right :wink:

We would likely want to take full ownership of the azure- namespace, and suggest that third parties use an -azure suffix instead. The main aim is to automatically protect users from attempted typosquats, though. We’re not so concerned about non-malicious users, but don’t really want to burden the PyPI team with having to evaluate each one.

If the team would prefer to scan and assess all new packages under azure- for malicious intent, rather than simply saying “Microsoft automatically asserts their ‘Azure’ trademark over the whole namespace”, then we would live with that. But it seems like a poor use of volunteer time.

(FTR, we’ve requested a few specific typosquats in this namespace be preemptively blocked, and have actively pursued one case of deliberately hijacking an actual name we intended to use, but have pursued more outside this namespace than within.)

I like this. pip install microsoft::azure-loganalytics or microsoft@azure-loganalytics might be hijacking the syntax from something potentially better, but interpreting it as “package must have this user as owner/maintainer” would likely be better than the whole namespace.

We’d publish on our own site (probably Python | Microsoft Developer) that it’s our account.[1] Additional DNS verification might be neat, but simply acknowledging that it’s our account would likely also be sufficient.
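A rough sketch of how an installer might interpret an owner-qualified requirement such as microsoft::azure-loganalytics. Everything here is hypothetical: pip has no such syntax, `OWNERS` is made-up sample data standing in for whatever ownership information an index would expose, and only the `::` spelling is handled, since `name @ url` is already taken by PEP 508 direct references.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical sample data: normalized project name -> accounts that own it.
OWNERS: Dict[str, Set[str]] = {
    "azure-loganalytics": {"microsoft", "azure-sdk"},
    "azure-logging": {"some-third-party"},
}


def normalize(name: str) -> str:
    """PEP 503 normalization, so Azure_LogAnalytics matches azure-loganalytics."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_owned_requirement(spec: str) -> Tuple[Optional[str], str]:
    """Split 'owner::project' into (owner, project); a bare name has no owner."""
    owner, sep, project = spec.partition("::")
    if not sep:
        return None, spec
    return owner, project


def owner_check_passes(spec: str, owners: Dict[str, Set[str]] = OWNERS) -> bool:
    """Unqualified names pass unchecked (today's behaviour); qualified ones must match."""
    owner, project = parse_owned_requirement(spec)
    if owner is None:
        return True
    return owner in owners.get(normalize(project), set())


for spec in ["microsoft::azure-loganalytics", "microsoft::azure-logging", "azure-logging"]:
    print(spec, owner_check_passes(spec))
```

Letting unqualified names through unchecked keeps the "not breaking anything" property: tools without support keep working, and users can still verify ownership manually.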


  1. I’m 99% sure we used to have it there, as a “check out our packages” link, but it clearly didn’t survive one of the many rewrites of that page.
