PEP 752: Implicit namespaces for package repositories

mikeshardmind · September 11, 2024, 11:42pm

The two are one and the same. I’ve provided examples, the PEP itself provides existing examples, and others have provided examples. Python’s ecosystem has gotten to where it is with people often naming things that interact with a service after that service, or naming extensions after the package whose functionality they extend.

npm added full namespacing and kept the flat global namespace. This allows for packages to have the same name.

As it was presented, this is simply a tooling side validation that the package came from that user and no more, and therefore is never required because there are no colliding package names still. (And without the colliding package names, you can’t just typo the user account name, you’d need to typo both that and the package, similar to how prefixes claim to work to help here.) It’s an optional way of specifying it, though it functionally allows a company concerned about this to recommend installing their packages with that syntax.

There are also other suggestions in the PEP 755 thread, such as being able to have users you trust (think like ssh known keys, but for remote index uploaders) and being informed when a package is sourced from a user you haven’t said you trust; that would also catch the unintended case. The community has come up with many other solutions that don’t require this level of drastic change to how packages are named, or to the posture towards volunteers who make things and open source them for others.

ncoghlan · September 12, 2024, 1:56am

Another entry for the “alternative mechanics” list…

Something that occurred to me between this comment and the ideas in the “Establish publisher authority via automated DNS backed challenges?” thread is this: what if the open vs restricted namespace restriction didn’t prevent uploads entirely, but instead affected a new PyPI endpoint called filtered.pypi.org?

On the main PyPI endpoint, open and restricted namespaces would have no functional differences, with the namespace owner just getting notified of new project registrations within the namespace, and authorised and unauthorised projects having differing metadata.

On the filtered endpoint, by contrast, only packages authorised by the namespace owner would appear in restricted namespaces (regardless of when they were published), while open namespaces would continue to rely just on the API metadata differences.

That way namespace owners would only be claiming filtering rights for a namespace prefix, rather than claiming all publishing rights.

To improve typosquatting protection for new users, clients could switch to the filtered index as their default (after a suitable deprecation period) and add an --unfiltered-index option to request the use of the main index.

When using the PyPI.org web UI, authorised vs unauthorised would be a visual indicator, except in search results, where seeing the unfiltered results would be explicitly opt-in (think of the namespace filtering like a “safe search” feature that’s on by default).

The implementation PEP could also still define a third tier of namespace grant where new unauthorised project registrations were disallowed entirely (probably due to legal trademark rights). At the API level, those would behave like regular restricted grants. Not sure of a suitable name for that, but “locked” or “trademarked” might work (since I imagine any such grant would require legal weight behind it when filtering-only grants were an available option).

Edit to summarise the tiers of protection in this variant:

unclaimed: no protection (status quo)
open: metadata + owner notifications (typically for contribution namespaces inside a restricted namespace)
restricted: metadata + owner notifications + filtering (the main tier for typosquatting protection)
locked: metadata + owner control + filtering (rare, only the PSF could lock prefixes without some legal claim like an approved trademark in a jurisdiction the PSF acknowledges)

This summary also suggests another potential option for the tier naming: rather than redefining “restricted”, instead add “filtered” as a new tier between “open” and “restricted” (and change the corresponding namespace API field to a string instead of a boolean)
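To make the tier summary above concrete, here is a minimal sketch of how a filtered endpoint might decide whether a project is visible. It is an illustration only, not anything specified by the PEP: the `Tier` enum, the `Project.authorised` flag, the `grants` mapping, and the "longest matching prefix wins" rule are all assumptions made for this example. The only borrowed piece is the name normalization from PEP 503.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Tier names follow the summary in the post; the string values are hypothetical.
    UNCLAIMED = "unclaimed"
    OPEN = "open"
    RESTRICTED = "restricted"
    LOCKED = "locked"


@dataclass(frozen=True)
class Project:
    name: str
    authorised: bool  # hypothetical flag: approved by the namespace owner


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def tier_for(name: str, grants: dict[str, Tier]) -> Tier:
    """Return the tier of the longest granted prefix that matches the name.

    Assumption: a more specific prefix (e.g. an open "foo-contrib-") overrides
    its parent (e.g. a restricted "foo-").
    """
    normalized = normalize(name)
    matches = [prefix for prefix in grants if normalized.startswith(prefix)]
    if not matches:
        return Tier.UNCLAIMED
    return grants[max(matches, key=len)]


def visible_on_filtered(project: Project, grants: dict[str, Tier]) -> bool:
    """Decide whether the (hypothetical) filtered endpoint would list a project."""
    tier = tier_for(project.name, grants)
    if tier in (Tier.RESTRICTED, Tier.LOCKED):
        # Filtering tiers: only owner-authorised projects appear.
        return project.authorised
    # Unclaimed and open tiers rely on metadata only, so nothing is hidden.
    return True


# Hypothetical grants used for illustration.
GRANTS = {"foo-": Tier.RESTRICTED, "foo-contrib-": Tier.OPEN}
```

With these example grants, an unauthorised `foo-evil` would be hidden on the filtered endpoint, while an unauthorised `foo-contrib-bar` would still be listed because the open sub-prefix takes precedence in this sketch.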

encukou · September 12, 2024, 10:29am

So, something like: by registering package foo, you automatically reserve all of foo-* (but can give up or delegate the reservation)? That would match existing patterns like *-stubs or flufl-*! But it’d be hard to make it work with existing packages.
Something like that could evolve from PEP 755 by loosening the grant-giving policy.

steve.dower · September 12, 2024, 2:50pm

No, more like I could publish steve/spam and you could publish petr/spam [1] and then everyone would have to specify which one they intended by providing the full name.

The simpler proposal is that only one spam package exists, and if you published it and someone tries to install steve/spam, the install will fail because I’m not a maintainer of the package (but if someone just installs spam, they’ll get it regardless, unlike npm).

[1] Maybe “spam” is a poorly chosen name here, but I prefer using “spam” and “eggs” rather than “foo” and “bar”.
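The "simpler proposal" in this post can be sketched as a client-side check. This is a hypothetical illustration: the `owner/name` requirement syntax, the `maintainers` mapping, and the `OwnerMismatch` exception are invented for the example and do not correspond to any existing installer or index API.

```python
from __future__ import annotations


class OwnerMismatch(Exception):
    """Raised when the requested owner does not maintain the package."""


def parse_requirement(spec: str) -> tuple[str | None, str]:
    """Split a hypothetical 'owner/name' spec; the owner part is optional."""
    owner, sep, name = spec.rpartition("/")
    return (owner if sep else None, name)


def resolve(spec: str, maintainers: dict[str, set[str]]) -> str:
    """Return the package name to install, validating the owner if one is given.

    `maintainers` maps each package name to the accounts that maintain it;
    in practice this would have to come from index metadata.
    """
    owner, name = parse_requirement(spec)
    if name not in maintainers:
        raise LookupError(f"no such package: {name}")
    if owner is not None and owner not in maintainers[name]:
        # The flat name exists, but not under the account the user asked for.
        raise OwnerMismatch(f"{owner} is not a maintainer of {name}")
    # Names stay globally unique, so a bare name always resolves.
    return name
```

Under this sketch, with `spam` maintained only by `petr`, both `spam` and `petr/spam` resolve to the same package, while `steve/spam` fails instead of silently installing something else.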

ofek · September 17, 2024, 4:23am

Since last time, I landed a PR which did the following:

Expressed the use case for open/restricted namespaces in the motivation section.
Made it clear that the current automatic protection against typosquatting used by PyPI is insufficient.
Added examples of attacks to the appendix.
Fixed versioning of the JSON API.
Rejected the idea of encouraging projects to maintain their own package repositories.
Rejected the idea of fixed top-level prefixes used for reservations.
Rejected the idea of using DNS.
Expressed in the rationale that using PEP 541 requests in addition to this proposal would minimize risk to users.
Added a community buy-in section enumerating feedback from relevant projects and communities.

It’s not a very robust heuristic but in my mind I’m mapping all feedback that is not in favor as coming from proponents of explicit namespaces.

I agree but didn’t want to have a rejected ideas section in both proposals so I put it in 752.

Thomas Kluyver:

ncoghlan:

Within a restricted grant, they offer a way to carve out a prefix for plugins and third party solutions.

Thanks, that makes sense. But coming back to my point above, I think this is better represented as a setting for grants rather than a whole separate type. I.e. once you get a top-level prefix, you can make a sub-prefix like foo-contrib-, then you can set that to allow either anyone to upload, or a limited set of people, but different than the parent foo- prefix.

I think the reason ‘open namespaces’ was a separate grant type in the first place is so that ‘restricted’ namespaces could be exclusive to paying customers. I think that’s a mistake, as discussed on the PEP 755 thread, and this is why I’m pushing back on them being a separate type.

The concept of open/restricted (final naming) has, since the very first draft, been a setting on a grant and not a separate type of grant. I’m not quite sure where that came from?

I don’t want to go into it on this thread because it’s about policy, but there were originally two reasons for encouraging grants for community organizations to be open:

1. Projects from such organizations are in most cases comprised of at least some packages maintained unofficially and therefore this was a preventative measure to avoid the possibility of closing an ecosystem.
2. In practice it will be significantly easier to get an application accepted for such organizations than expected, which may exacerbate 1. to a level where it happens regularly and taints the perception of the Python community as a whole.

I think the cynical view is not correct and I’ve added a rejected idea Encourage Dedicated Package Repositories.

PyPI without a doubt, and likely packaging tooling after a year or two.

takluyver · September 17, 2024, 7:18am

I’m sorry, but I think this section is disingenuous (hopefully unintentionally). You point to 3 community projects, but only Airflow sounds actually enthusiastic about your specific proposal, and you heavily implied that they could have an exception to be treated like a paid organisation. I think these projects are more likely to be supportive with your latest changes, but their earlier comments are not great evidence.

ofek · September 17, 2024, 1:41pm

I must say, this experience is increasingly leaving me with unease. There is a tendency to assume bad faith on my part in the worst case, or to see me as little more than an avatar of corporate interest in the best case.

Here is what the enumeration pulls from:

Apache Airflow

This is cool. Actually we had a very relevant discussion today about a potential confusion and new provider created by someone […] would be nice if we could make it a convention and have a way to enforce the “root” namespace to only be used by Apache Airflow team. This is also the “expectation” of the ASF - naming convention of packages is part of “branding” expectations from project - and currently with PyPI is not enforceable - with that change however it could be.
Typeshed

[…] we’d need to ensure that people can’t squat these names for security reasons. As opposed to other Python packages, people will just try to install the type stubs without previously checking them, and they should be able to. But without namespacing this would open a wide door to attackers. So for us, namespacing is essential.
Project Jupyter (expanded)

I think this is really interesting, and a good idea. […] there may be a slight preference in parts of in our community for npm-style, largely because our projects often straddle npm and PyPI […] But I definitely see the strong backward-compatibility arguments of the prefix approach. […] Any transition will be tricky for us because the existing widespread use of prefixes is very much used by both official projects and community plugin packages, alike. This has been a communication issue, in that it is unclear from package names alone, what projects are ‘official’. […] The current proposal would not change that, but it would at least stop the unofficial population of the plugin namespace from growing.

A misunderstanding led to the following comment, which assumed that they couldn’t benefit, i.e. that open grants didn’t allow uploads from others:

I don’t think we would pick moving to a new only-official prefix unless it was @npm -style. I’m not 100% sure we would do that, given the user impact. […] I had definitely misunderstood the public/private namespace distinction, since the definitions in the PEP are not what I expected those words to mean (the “open/closed” labels some folks have mentioned feel more intuitive to me). I believe I understand now, thank you.
Microsoft

I’ll speak officially […] We would likely want to take full ownership of the azure- namespace, and suggest that third-parties use an -azure suffix instead.
DataDog

I’ll also speak officially on behalf of my employer. We would pay to reserve datadog

I said:

That’s a great idea actually, I could add an explicit exception that reviewers could grant community projects a private namespace at their discretion.

This was not some deal specifically for them to make them happy but rather for “community projects” as a concept. In any case, this part is gone in the latest policy update.

takluyver · September 17, 2024, 2:08pm

Sorry - I don’t mean to assume bad faith on anyone’s part. But the original proposal seemed to me strongly biased towards corporate interests, it feels like it has taken a lot of discussion to get concessions to community groups that seemed obvious from the start, and there are still bits that are uncomfortably corporate leaning.

I believe you’re trying to consider everyone’s needs, but it’s always easiest to understand the priorities of the environment we’re in and the people around us. I don’t think you’re doing anything wrong, just that you’re starting from what works for Datadog.

pf_moore · September 17, 2024, 6:03pm

I’m sorry you’re feeling like this. It’s possible the reason is that PEPs are typically submitted by people who have a direct interest in the feature being proposed, so that it’s very clear why they are arguing in favour of the PEP. In this case, though, I personally [1] find it hard to understand why you, Ofek, the maintainer of hatch, are interested in this feature. Rather, it feels like you are proposing it on behalf of your employer, and are representing their interests - and as a result, you’re getting feedback reflecting people’s discomfort with corporate influence over open source projects and communities.

Thank you for adding that, but can I suggest that you need to work more to build consensus on the proposal rather than simply rejecting opposing views with an explanation of your position. While I can’t speak for @dustin, my view on the PEP process is that it needs to establish a consensus with the community that the proposal is, indeed, the best solution to the problem. Your rejection of my suggestion is fair, but it hasn’t done anything to bring me round to thinking that your proposal is the right approach. So it doesn’t feel to me like we’re any closer to a consensus as a result of my contribution to the discussion, and that is reflected in the (to a certain extent, increasingly frustrated) tone of my responses here.

I’d like to find some common ground for a solution, but I don’t see how to do that - your motivation isn’t clear to me, and your approach of rejecting rather than discussing alternative proposals makes it hard for me to see how we can move towards a solution that the whole community supports.

[1] I don’t want to speak for others, although I’m speculating that this might be a more general cause for the tone of the feedback.

ofek · September 17, 2024, 6:32pm

That’s a good point, thanks. My view of the required Rejected Ideas section was that it’s for detailing why certain approaches are suboptimal without dumping everything in the Rationale section. Is the suggestion to wait some amount of time before translating what is in the discussion to that section?

pf_moore · September 17, 2024, 6:38pm

No, it’s to make your points here, rather than in the PEP, and only when consensus has been reached (whether it’s “no, that idea isn’t reasonable” or “yes, that’s a good point and the proposal needs to be changed to take it into account”) do we update the PEP to reflect the community consensus. (If no-one else makes any comment, or if the community is divided, then you get to make the decision as the PEP author, of course).

ofek · September 17, 2024, 6:43pm

Okay, I’ll post proposed updates here first to be explicit! Most of what is in the rejected ideas text came from other people’s thread comments and only a bit from myself, just fyi.

mikeshardmind · September 18, 2024, 9:47am

I did not read that as support for this PEP’s specific solution, but as a statement that Microsoft would use it. In the same breath, less heavy-handed alternatives were suggested that it was believed Microsoft would be okay with.

A direct quote of a portion of the linked message:

“better than the whole namespace”.

This feels very disingenuous to use as support for this specific method while rejecting other options, and I think the fact that you are presenting the information in such a way is also amplifying why people are so negative on this. It feels to me like this proposal requires deceptive presentation to even survive basic comparison with other options, and that the alternatives are not being given fair consideration as to whether they are actually better for the community and would solve the problems that have led to some level of support here.

mikeshardmind · September 18, 2024, 9:56am

Beyond that, it was raised that the other options (ssh-like presentation, or requiring a specific user) could actually also assist in the case of multiple indexes.

Flat string prefixes which are part of the package name can’t do this. I can’t even think of a way tooling could reconcile this for the user, given that you want paying corporations to be able to keep these grants secret on top of it. If the list of prefixes were public, a tool could at least query it in the multi-index situation and detect: “hey, one of these indexes provided a package that another index had a namespace for”
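A rough sketch of the check described above, assuming prefix lists were public and queryable per index. Everything here is hypothetical: the `resolved` and `public_prefixes` mappings stand in for data a real tool would have to fetch, and no existing installer exposes such a function. Names are normalized as in PEP 503 before matching.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_namespace_conflicts(
    resolved: dict[str, str],
    public_prefixes: dict[str, set[str]],
) -> list[tuple[str, str, str]]:
    """Report packages served by one index that fall in another index's namespace.

    resolved:        package name -> index it was actually fetched from
    public_prefixes: index name   -> prefixes reserved on that index
    Returns (package, source_index, claiming_index) tuples.
    """
    conflicts = []
    for package, source in sorted(resolved.items()):
        name = normalize(package)
        for index, prefixes in sorted(public_prefixes.items()):
            if index == source:
                continue  # a namespace on the serving index is not a conflict
            if any(name.startswith(prefix) for prefix in prefixes):
                conflicts.append((package, source, index))
    return conflicts
```

For example, if an index called `pypi` reserves `acme-` and a package `acme-utils` is fetched from an index called `internal`, the function reports `("acme-utils", "internal", "pypi")`, which a tool could surface as a warning. With secret grants, the `public_prefixes` input simply cannot be built.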

Liz · September 18, 2024, 12:27pm

Others have already mentioned that how you are presenting this is contributing to this perspective. I share their concerns, but I want to see some sort of solution here too.

What you have here appears to be conflating support for the idea that there is a problem worth solving with support for this specific solution. If you want to convince me (and it seems like others too) you need to show that this is the best way forward that balances all of the appropriate concerns and that it stands up to the needs whether there is funding backing it or not. That’s not really what has happened so far; you’ve rejected many alternatives people have raised without discussion, rationale, or consensus for why those aren’t sufficient.

As for the influence of money, one of my main concerns with the prefix-based solution is that it requires an ongoing administrative cost for PyPI. While there appear to be groups willing to pay for it, a lot of questions come up there. Do we avoid making better solutions because the current solution is funding more than just itself? I’d rather see interested corporations here be willing to contribute to funding PyPI’s security efforts, either monetarily or in dev hours, independent of specific solutions, allowing the best solutions to be the best on their own, with the funding coming because the companies believe in the community’s ability to create solutions that fit their needs.

ofek · September 18, 2024, 1:18pm

I’m doing the exact same thing I did for the rejected ideas of PEP 723. Ideas would be put forth and people would comment. If there was sufficient rationale to say that it wasn’t the best idea then I would summarize what people have said and provide my opinion when necessary. If there was a spectacular idea then I would change the proposal, which I did (the original proposal was a multi-line string). No one in that discussion was taking issue with modifying that section after a few days of feedback which is why I’m mildly confused why it’s happening now.

In any case, I’m going to stop responding for a few days.

sinoroc · September 18, 2024, 7:00pm

As usual words are nice, but we have to acknowledge when someone also does the work. So kudos to Ofek for bringing up a complete proposal (well, even two in this case). The only one we have so far as far as I can tell.

But it being the only one does not mean it is our only shot at this problem. I guess the author(s) of a proposal are free to write the “Rejected Ideas” section however they see fit. In the end it is the whole proposal that will be evaluated (including this section). And of course, it does not prevent anyone from writing an alternative proposal putting forward exactly one of those rejected ideas.

[I assume it is known and obvious to most but maybe not all.]

sinoroc · September 19, 2024, 5:31pm

Datadog offers observability as a service for organizations at any scale.

Can we do without this sentence? I do not know how that is relevant to the proposal. :D

Generally, is there a policy for when the author of a PEP is a company? Should there be a disclaimer? Should the company name be listed in the authors? Should the individual author(s) list their work email address? I very likely should ask this in its own discussion thread, but maybe there is a simple one-liner answer. So if it is worth its own thread, feel free to open it or let me know and I will do so.

pf_moore · September 19, 2024, 5:38pm

I’m not sure there is. But it isn’t really relevant - in this case the author is Ofek, not Datadog. However, as I understand it, Ofek’s time is funded by Datadog, and it’s obvious that Datadog would benefit from this proposal, so there’s a perception of conflict of interest, even if the reality is that Ofek is acting entirely appropriately.

But with all that said, I think it would be worth making particular effort to avoid the perception that this proposal is guided by Datadog’s interests. Maybe @ofek could find a co-author from an open source community project, who could help frame the proposal in a way that gave equal representation to the interests of companies and community projects?

ofek · September 19, 2024, 5:56pm

If you want me to, I will, but please understand this request is a critique of a footnote at the very end of the document and matches the preceding examples where the first sentence is a description:

Additional examples of projects with open namespaces:

pytest is Python’s most popular testing framework. They have the concept of plugins which may be developed by anyone and by convention are prefixed by pytest-.

MkDocs is a documentation framework based on Markdown files. They also have the concept of plugins which may be developed by anyone and are usually prefixed by mkdocs-.

Datadog offers observability as a service for organizations at any scale. The Datadog Agent ships out-of-the-box with official integrations for many products, like various databases and web servers, which are distributed as Python packages that are prefixed by datadog-. There is support for creating third-party integrations which customers may run.

Sure, I will try!