PEP 755: Implicit namespace policy for PyPI

petersuter · September 10, 2024, 6:05pm

Yes, in a way it would be sufficient (even without the “by microsoft” CLI assertion feature) if it’s easy to find, e.g. could it be linked from the microsoft profile page on PyPI somehow?
But to make the CLI assertion feature much more useful (i.e. assure the user that manually confirming it every time is not needed) that should only work if the account is verified somehow.
Maybe the account name should match the domain name to make it even clearer?

To reduce PyPI admin resources I would have guessed automated DNS verification would be nice. (Or maybe some other form of verification is already part of paid organization accounts?)

sinoroc · September 10, 2024, 6:07pm

Yes, intuitively, this seems better than the current proposal to me. But I guess this should be discussed on PEP752.

steve.dower · September 10, 2024, 6:33pm

We’re getting far enough off the original topic that we probably shouldn’t take it further, but I don’t think DNS verification changes anything, either for the original proposal or this side one.

Either you trust the publisher/namespace, or you don’t. There’s no way to automatically verify whether the consumer trusts it - they need to indicate that in some way. An installer or index can only check against what the user specifies, not against other ambient information.

oscarbenjamin · September 10, 2024, 6:42pm

Maybe it could be more like the way that ssh asks if you trust the first time you connect to a new host but then stores it in known_hosts sort of like:

$ pip install azure-loganalytics
...
Package azure-loganalytics is published by organisation microsoft@pypi.org which is not in trusted organisation list.

Organisation: microsoft@pypi.org
Org page: https://pypi.org/microsoft
Verified Homepage: microsoft.com/some/where

Do you want to add microsoft@pypi.org to trusted organisations (yes/no)?

Then once you’ve accepted microsoft@pypi.org once you don’t get asked about it the next time you pip install something from the same organisation. Then if you do pip install a typosquatted package it becomes more noticeable because you are being asked about a new organisation.

Obviously you would need a -y flag to force install and some way to pass a list of trusted organisations like -t trusted_orgs.txt.
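
A rough Python sketch of that trust-on-first-use flow, in case it helps the discussion. Nothing here is an existing pip feature: the function names, the trusted_orgs.txt file format (one organisation per line) and the assume_yes parameter standing in for -y are all assumptions made for illustration.

```python
from pathlib import Path


def load_trusted(path: Path) -> set[str]:
    """Read one organisation per line, ignoring blank lines and # comments."""
    if not path.exists():
        return set()
    lines = (line.strip() for line in path.read_text().splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def confirm_publisher(org: str, trusted_file: Path, assume_yes: bool = False,
                      ask=input) -> bool:
    """Return True if installing a package published by `org` may proceed."""
    # assume_yes models the hypothetical -y flag: trust everything, store nothing.
    if assume_yes or org in load_trusted(trusted_file):
        return True
    answer = ask(f"Do you want to add {org} to trusted organisations (yes/no)? ")
    if answer.strip().lower() != "yes":
        return False
    # Remember the decision, much like ssh appending to known_hosts.
    with trusted_file.open("a") as f:
        f.write(org + "\n")
    return True
```

The second install from the same organisation finds it in the file and skips the prompt, so a prompt for an unfamiliar organisation is what makes a typosquatted name stand out.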

bwoodsend · September 10, 2024, 8:35pm

That would require all non-organisation packages to go through this ssh style prompt too though, right? Otherwise pip install azure-unofficial-package without an organisation would sail through unchallenged?

jamestwebber · September 10, 2024, 8:44pm

I would hope that this is an opt-in security setting, so most people wouldn’t worry about it.

But I’m still not sure how the users of it would deal with the fact that lots of their “official” packages depend on unofficial, third-party dependencies. I guess it could only apply to the top-level, actually-requested package.

oscarbenjamin · September 10, 2024, 10:04pm

I was actually imagining that you would get asked about the organisation for every package unless you pass -y in which case you get the current trust everything behaviour. If you want to protect against typo-squatting then somehow you need to have installers that don’t just automatically install whatever the user types by default.

There would be an incentive then for non-corporate projects to batch up into organisations and provide some more aggregated layer of trust in the packages that are on PyPI so that rather than “do you trust numpy?” it is more like “do you trust scientific-python?” which would comprise many common packages. I’m not sure if PyPI’s current organisations are exactly the right design for that though (I don’t actually understand how they work).

Of course you can still install some random package but you will (rightly IMO) be asked:

Do you want to add rando123@pypi.org to trusted authors (yes/no)?

It may be that the best default behaviour for most users is that if you trust a package (because you trust its org/author) then you implicitly trust its entire transitive dependency stack. If the focus is on typo-squatting then the primary benefit is just ensuring that the actual name typed is a trusted package. Obviously others will want to have some other option like --max-security-trust-no-one so that nothing that has not been listed in trusted.txt is ever allowed.
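
To make the difference between those two modes concrete, here is a small illustrative sketch. The function name, the strict parameter (standing in for something like --max-security-trust-no-one) and the dictionaries passed in are hypothetical; a real installer would get dependency and publisher data from the index.

```python
def untrusted_packages(requested, dependencies, publisher, trusted, strict=False):
    """List packages whose publisher is not in the trusted set.

    requested:    names the user actually typed.
    dependencies: dict mapping package name -> list of direct dependencies.
    publisher:    dict mapping package name -> publishing org/author.
    strict:       if True, check the whole transitive stack; otherwise only
                  the requested names are checked and their dependencies
                  are trusted implicitly.
    """
    to_check = list(requested)
    if strict:
        seen = set(requested)
        stack = list(requested)
        while stack:
            for dep in dependencies.get(stack.pop(), []):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
                    to_check.append(dep)
    # A package with no known publisher is treated as untrusted.
    return sorted(name for name in to_check if publisher.get(name) not in trusted)
```

In the default mode only the typed name is checked, which is enough for the typo-squatting case; strict mode reports every package in the stack whose publisher is not listed.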

ilotoki0804 · September 11, 2024, 4:03am

I think the most appropriate solution is for everyone to have their own namespace, like Github or Docker.

Unlike the proposed approach, this would require a transition, but it could be a long-term solution to the malware and name collision problem.
If Python is truly looking long term, I believe this is the best approach.

The transition process would be as follows:

1. Allow packages to be downloaded as <owner name>/<package name>.
2. Deprecate downloading packages with just <package name> and warn users who install packages that way.
3. Prohibit downloading packages as <package name>.
4. If there is no package in PyPI with an exact matching name, allow the package to be created with that name. Currently, it will refuse to create the package even if a package with a similar name exists.
5. Allow the package to be created no matter what other packages exist in PyPI.

In addition, we could provide “certified packages,” like Docker Official Images, which allow them to be installed without an owner name.
This would minimize the transition burden.
Transition steps 2 and 3 will not affect certified packages.
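
As a sketch only, this is roughly how an installer could treat names during the download side of that transition. The parse_spec function, the phase parameter and the CERTIFIED set are hypothetical names invented for this example; no installer implements this today.

```python
import warnings

# Hypothetical set of "certified" packages that stay installable by bare name.
CERTIFIED = {"certified-example"}


def parse_spec(spec: str, phase: int):
    """Split "<owner name>/<package name>" into (owner, package).

    phase mirrors the transition: 1 = qualified names allowed,
    2 = bare names deprecated, 3 = bare names prohibited.
    A bare name is returned with owner=None.
    """
    if not spec:
        raise ValueError("empty spec")
    owner, sep, package = spec.partition("/")
    if sep:
        if not owner or not package or "/" in package:
            raise ValueError(f"malformed spec: {spec!r}")
        return owner, package
    # Bare name: certified packages are exempt from the deprecation and the ban.
    if spec not in CERTIFIED:
        if phase >= 3:
            raise ValueError(f"{spec!r} must be given as <owner name>/<package name>")
        if phase == 2:
            warnings.warn(f"bare name {spec!r} is deprecated", DeprecationWarning)
    return None, spec
```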

Docker Official Images - a curated set of Docker repositories, serve as the starting point for the majority of users, and are some of the most secure on Docker Hub.

barry · September 11, 2024, 4:10pm

If this was an approach we adopted, we’d still have the potential for namespace conflicts (claim contentions, typosquatting etc.) at the top level, although perhaps it would be less common.

Also currently users and orgs are in different namespaces, so combining them at the top level (i.e. <owner name>) would have to be worked out.

bwoodsend · September 11, 2024, 10:12pm

I’d be very sad if this happened. One of my favourite things about PyPI is that it is a flat namespace. I can’t confuse or forget whether it’s bill/package or bob/package. Coupled with the distribution_name == module_name pattern (which is only a convention), it has also kept me from ever facing the issue of not being able to install two packages simultaneously because they have overlapping filenames.

I can understand people wanting namespaces but I think any approach that takes that simplicity away from what a user sees is not worth it.

BrenBarn · September 12, 2024, 2:00am

I appreciate that the PEP has been split into two. Of course they are still sort of interrelated in that it wouldn’t make sense to accept 755 in isolation, and the utility of 752 in isolation is unclear. Some aspects of the motivation also straddle the boundary. Most of my concerns lean toward the policy side so I’m going to reply in this thread but some of what I’m responding to is in the other thread or the other PEP. In particular, I think that concerns about the motivation or “what problems does this solve” are inherently policy-driven because, in practice, all of the issues come down to humans understanding “what will PyPI do”, and technical issues of how that happens are more in the background.

In PEP 755 the Motivation starts with this:

The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership.

I think a lot of the discussion indicates we need to be a bit more detailed about that. As in, what exactly are we signalling, and who is supposed to be receiving those signals, and how are they expected to make use of them.

In my view, the main useful thing would be signals to human beings installing packages, namely signals that ultimately boil down to “you can/cannot trust this package”. Some such signals may be mediated by something like “this package is/is not approved by this person/entity” (and then it’s up to the person using the tool to decide if they trust that entity). Some of the stuff in the PEPs (especially the controversial stuff about a corporate/community “double standard”) seems more to do with benefiting the organizations that upload the packages, and I see that as a way less important goal.

Given that, as I see it the proposal has two big problems:

1. The gaps in the namespace guarantees are so huge that it seems to me they won’t solve enough of the problem to be useful. One such gap is the grandfathering-in of existing packages, and the other is the “open” namespaces, which will still allow unrestricted uploads. With these gaps, the amount of information that a user can get just by looking at the name of a package foo-bar seems likely to be barely more than it is today.
2. The fact that all the information is going to only be displayed on PyPI webpages reduces the utility even more. A vast number of people install packages without visiting their PyPI page, or visit the page only long enough to copy the pip install info from the header. These people will derive no benefit from any kind of “verification” info that only shows up on PyPI project pages.
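
To illustrate the first problem, here is a deliberately simplified model of a prefix grant. The Grant class, its fields and the matching rule (a name belongs to a namespace if it starts with the prefix plus a hyphen after PEP 503 normalization) are my own assumptions for the sketch, not the PEP’s exact data model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Grant:
    prefix: str                  # "foo" is taken to cover names like "foo-bar"
    owner: str
    is_open: bool = False        # open namespaces still accept uploads from anyone
    # Normalized names of projects that existed before the grant was made.
    grandfathered: set = field(default_factory=set)


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def name_implies_owner(name: str, grant: Grant) -> bool:
    """True only if the name alone tells us the grant owner published it."""
    name = normalize(name)
    in_namespace = name.startswith(normalize(grant.prefix) + "-")
    return in_namespace and not grant.is_open and name not in grant.grandfathered
```

Under this model a user looking at foo-bar cannot conclude anything from the name unless they also know the grant is not open and that foo-bar was not grandfathered in, and neither fact is visible in the name itself.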

Like others I also have concerns about the differential treatment of company vs. community projects. One reason this can be tricky is that, if part of the rationale is to provide a funding stream for PyPI (and it is), then the decision about whether to approve this PEP becomes in part a financial one. Suppose we had a really limited namespace system where companies could pay $1 billion to reserve a prefix. Even if the system was totally useless for small-time package maintainers, as long as it didn’t actively harm them, it might still be a net win if we could just get one or two companies to cough up, because that money could be used to improve the PyPI experience for everyone else in various ways.

On the other hand, even a more full-featured system could wind up not being worth it if the fees were too low. The namespacing could become a victim of its own success if it results in a flood of namespace requests that winds up requiring more PyPI staff time than can be funded with the resulting fees.

Those are extreme scenarios, but the point is that I feel it’s difficult to evaluate a proposal like this without having some kind of concrete estimate of how much money it will bring in relative to how much work it will create. And of course even having to consider that as part of the PEP process feels a bit weird in itself...

Aside from that, I’ll echo some other people’s concerns about the two-tiered system for company vs. community names. It doesn’t sit right with me. Along the lines I said above, I could maybe be convinced it was worth it if we had a solid basis for believing we could raise a large amount of money while not inconveniencing anyone except a few big companies paying big fees. But at least for me, the burden of proof there is on the proposal — unless we have some clear and convincing rationale for believing we can do that, we should assume we cannot, and in that case the two big problems I mentioned above are enough to make me -1 on the idea.

Maybe the most egregious part of the policy proposal to me is the idea of forbidding a viewable global namespace registry “because this has the potential to leak private information such as upcoming products”. My feeling is that, whatever gets decided, it really should be presented to companies as a take-it-or-leave-it scenario. Like, if you want to keep your product name private, that’s fine, but then you can’t reserve it; or if you want to reserve it, great, but then you can’t keep it private.

This comes down to what I said about motivations. Helping individual Python users be more confident about the packages they install is a worthwhile goal for PyPI. Keeping some hypothetical CEOs happy about the secrecy of their products is not.

Liz · September 12, 2024, 4:27am

I almost like this. I think it needs to be microsoft@pypi.org::azure-loganalytics, though. The entire prefix argument falls apart in the multi-index situation (pip’s --extra-index-url); this doesn’t have to.
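
For what it’s worth, parsing such a qualified name is simple. The sketch below assumes the form <org>@<index>::<package>; that syntax is only a suggestion from this thread and is not supported by pip or any other installer, and the QualifiedName and parse_qualified names are made up for the example.

```python
from typing import NamedTuple, Optional


class QualifiedName(NamedTuple):
    org: Optional[str]     # e.g. "microsoft"
    index: Optional[str]   # e.g. "pypi.org"
    package: str


def parse_qualified(spec: str) -> QualifiedName:
    """Parse "<org>@<index>::<package>"; a bare "<package>" names no publisher."""
    publisher, sep, package = spec.rpartition("::")
    if not sep:
        return QualifiedName(None, None, spec)
    org, at, index = publisher.partition("@")
    if not (org and at and index and package):
        raise ValueError(f"malformed spec: {spec!r}")
    return QualifiedName(org, index, package)
```

Because the index is part of the name, the same check works no matter how many indexes are configured: the installer only has to confirm that the named index really lists that organisation as the publisher.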

From where I’m sitting, this is the best handling I’ve seen suggested, it handles all of the issues and provides stronger information to users without needing any new administrative policies. I’d be happy to help contribute time and code to make this happen.

ofek · September 13, 2024, 3:46am

I’m going to post an update soon but I would like to briefly comment on this:

I’ll also speak officially on behalf of my employer. We would pay to reserve datadog on the order of what Steve expressed here:

ofek · September 17, 2024, 4:24am

Since last time, I landed a PR which did the following:

Improved wording in the rationale to make it clear that community organizations can reserve namespaces.
Used “paid” organization over “corporate” organization to ease understanding for non-native speakers and reduce concern among those who consider the word corporate to have a negative connotation.
Removed expectation that grants for communities should be open.
Improved language around why reviews for paid organizations are prioritized.
Included PEP 541 in the expected teaching process.

I missed this and will include directives to make that easy in the next PR, thanks!

Paid organizations I think should have that ability. If you want I can make it optional and opt-in by default but there are operational challenges to exposing such a page. I’ll ask someone to comment on this (if they are able) who works on a similar project to this proposal.

Done! PEP 752: Implicit namespaces for package repositories - #46 by ofek

I added a buy-in section and am trying to have more people comment.

It’s a trade-off. Official packages have (unless mistakes happen) a 0% chance of distributing malicious code whereas unofficial packages do not have the same guarantee. I think the benefit outweighs the minor inconvenience of having to be cautious even for popular unofficial packages.

This will be in the next PR, thanks!

I don’t think the grandfathering-in will be very impactful because the PEP 541 process may be judiciously used in applicable scenarios such as bad actors and unmaintained packages with no users. What remains would dwindle over time.
The open namespace concept doesn’t detract from the guarantee because the most common scenario I envision would be a private root grant with an open child such as namespace-contrib.

This is my thinking as well although it will be significantly more than just two companies. I’m trying to have people repeat publicly in this discussion what was spoken about in private but it’s tricky.

It’s possible that this gives our community enough funding to attempt work on the explicit namespaces feature in future.

pf_moore · September 17, 2024, 7:34am

That specifically seems like a bad outcome. If the expected result of this feature is to generate revenue to fund its replacement, that suggests that the stated reasons for preferring this option are wrong, and the real reason is “because it’s cheaper”.

I’d much rather the expected revenue is targeted at other improvements, not at fixing the flaws in this one.

takluyver · September 17, 2024, 9:04am

Sorry, but I’m -1 on the pair of PEPs - 752 & 755. They’ve got better since the initial version, but I still don’t think they’re good, and it feels like the priority is to do something quickly rather than taking the time to do it right.

1. It still feels like the open communities that are such a big part of Python’s success would be a second class citizen to paying companies - companies are invited to reserve prefixes without using them, and there’s deliberately no transparency around that.
2. Related to the above, there’s nothing about when & how a prefix can be taken back without the owner choosing to release it. “Never” is a valid option, but not one I’d agree with - e.g. if a startup reserves a nice prefix and then goes bust, I’d definitely want PyPI to open it back up. I’d expect hashing out a compromise on this to be tricky, but I don’t think the policy PEP can just ignore the whole question.
3. The special treatment of pre-existing projects in a newly restricted prefix weakens the security properties of this model and makes it harder to understand (‘how to teach this’ doesn’t cover ‘how to stop people assuming the obvious but wrong thing’). I appreciate that all the alternatives have substantial drawbacks as well, but it seemed like they were ruled out without much discussion - apologies if that discussion happened somewhere and I didn’t notice.
4. It seems to be a major goal of this proposal to raise money for PyPI, but it’s not clear how much money it would raise (despite point 1). How many companies want to reserve a namespace? How many of those don’t already pay for an organisation? Would they keep paying for it, or can you pay for a year, reserve the namespace you want, and then keep it after you stop paying? And would any of this jeopardise the donated resources that PyPI depends on?

I think some kind of package namespacing on PyPI would be a useful way to make it easier to see who you’re trusting. Done carefully, it could even raise money as well. But we probably only get one go at introducing it.

ofek · September 17, 2024, 12:57pm

The flat namespace would never go away and that is specifically what this proposal targets.

That’s not the intention but rather more leniency is given when reviewing. I’ll see what I can do to update the language around that.

As I mentioned this will come in the next PR.

BrenBarn · September 17, 2024, 8:24pm

That’s debatable. Tons of companies routinely distribute software to end users that many would characterize as malware, or at least as sneaky and undesirable. The mere fact that a package has official status under a particular namespace doesn’t tell us anything about the likelihood of it being shady. As was mentioned earlier, trusting the package has less to do with the name and more to do with the authors.

In a way that’s even more worrisome, as it suggests that PEP 541 would be used to “clear the way” for official namespaces by weeding out small-timers who happened to be grandfathered in. Even assuming that’s not what you meant, I think assuming the problem of grandfathered-in names will solve itself is too optimistic to justify such a foundational aspect of the PEP.

But what we envision as being a common scenario cannot substitute for a guarantee. We need to consider a range of possible futures, not just hope that the feature will be used in the way we hope.

ofek · September 17, 2024, 8:43pm

Do you agree with my two examples of valid reasons for someone to relinquish their claim over a package? To reiterate:

1. A user publishing malicious code or causing equivalent harm
2. A user not responding to correspondence regarding their package that has no users

BrenBarn · September 18, 2024, 7:15am

I certainly agree with #1 (and I hope that’s uncontroversial!).

With #2, it depends. I was writing a longer explanation here but the simple answer is I don’t support having different rules for abandoned or unused packages just in the case where someone wants to use the new name themselves. And I especially don’t support having different rules just in the case where someone is willing to pay to get control of the name. In other words, if we want to have a process under which old packages are cleaned up, we should clean them up automatically under specified conditions regardless of whether someone is trying to claim the name.

I feel basically the same way with regard to namespaces. If some user uploaded some (working) package to display a pair of googly eyes and called it google-eyes, I am not in favor of Google being able to steal or invalidate that package name for their own new use unrelated to that purpose just on the basis of being willing to pay to do so, even if the original package hasn’t been updated or downloaded in fifty years.