Hello everyone! This proposal addresses the needs of large projects that publish multiple independent packages. As outlined in the PEP, I’ve chosen the implementation that would cause the least amount of disruption for the community, i.e. no special NPM-style scope syntax.
I’m excited to begin gathering feedback more broadly.
Not directly to the PEP but I am curious: did we (PSF or someone else) finally manage to get the effort towards supporting namespaces funded?
As for the PEP, I’ve only given it a preliminary read and would prefer to hold my questions for now. However, so far I am worried that, while this specification may completely avoid disruption to tooling, allowing organisations to reserve package prefixes in the existing namespace is going to be a lot more disruptive to individual projects. They may suddenly need to change their name because they block an organisation from reserving a prefix, simply because the project chose to name its package in a specific way. Sure, the organisation could often have chosen to enforce its trademark earlier anyway (though it may have considered the name acceptable usage), but now it would be forced to do so just to be able to use the feature at all.
My employer Datadog is devoting engineering time (not publicly announced yet) in Q4 for improving the Python packaging ecosystem and one such item is the PoC of this proposal.
This is explicitly outlined in the proposal and existing packages would be untouched, nothing would happen to them! The first line of the Specification section says:
Such reservations neither confer ownership nor grant special privileges to existing packages.
And then further down in the Uploads section as the very first upload prevention criteria it says:
I’ve only skim-read it, but it seems this could be achievable with even fewer changes. It’s basically just a prefix name restriction on PyPI uploads, and there’s no real need to add anything to metadata. Packages uploaded by an organisation will (I believe) already be marked as such.
For what it’s worth, my employer would probably consider paying to become an organisation for this feature. None of the other benefits provide us enough value as things currently stand.
(Edit: Big +1 to the idea, of course. And totally fine with gating it on cash/donations.)
Package metadata itself within the artifacts is unchanged but I am proposing we add it explicitly to the APIs so that consumers may do fancy stuff like extra security protocols.
My employer is in the same boat; should this be accepted we will begin paying for an organization.
Before I comment on the PEP itself, I want to just call out a process level thing. I’m currently the PEP-Delegate for this PEP (as I have a standing delegation from the SC for PyPI related PEPs, so I’m effectively the default option).
I work at Datadog with @ofek, and I suggested a Nuget style approach like what the PEP describes (which I’ve been a fan of ever since I first saw the Nuget docs on their feature). I’ve also reviewed an earlier draft of the PEP (but have not yet reviewed this specific draft).
I normally don’t elect to be the PEP-Delegate for my own PEPs, to remove any chance of a conflict of interest arising, but this PEP is in a bit of a grey area for me. I’m personally fine acting as the PEP-Delegate for this PEP (I view the responsibility as largely being to capture the overall sentiment of the community, and to put my own opinions aside unless there isn’t a clear sentiment).
If anyone doesn’t feel comfortable with me serving in that role for this PEP, feel free to comment here, or reach out to me privately (DM, email, whatever), or you can contact the SC and I’m happy to have someone else do it as well.
My bad - I knew they would be untouched, but I overlooked that their existence wouldn’t prevent the creation of a grant, and thought this meant that organisations would either have to get all third-party projects to rename on their own or not be able to get the prefix. That is definitely not disruptive then.
I’m not sure how well it aligns with the motivation though, which mentions having a way to “signal a verified pattern of ownership”. For existing projects, the only effect would be the absence of a visual indicator in the UI (the PyPI website), and a user couldn’t rely on the fact that some prefix is reserved to determine that a package comes from the owner of the prefix. In fact, grants for community projects have even less protection, since even new uploads can be done by anyone (i.e. it’s a “public namespace” per the PEP’s terminology) and the only benefit is that visual indicator on PyPI.
This seems like a very narrow use case. I personally don’t see much value in public namespaces, and even the non-public ones seem limited. I guess the question is: do other companies consider this helpful, especially compared to npm-like namespaces, where it’s guaranteed that nobody’s project but the org’s can be part of the namespace?
Non-standard HTML attributes should be prefixed with data-.
I’m also not convinced that the simple repo API needs to expose this, especially as mandatory. If we do:
the API version should be increased
I would prefer the attributes to be optional
including it in the package-list page will drastically increase its size, especially for PyPI; perhaps only expose it in the package files page (in metadata)
I don’t think “proper” namespaces (ala NPM scoping) are that great of a solution when you’re trying to retrofit them.
The main problem I have with “proper” namespaces in PyPI is that I think it makes the problem of dependency confusion worse rather than better. If I tell someone to install a library of mine that is in the namespace foo, I might tell them to install foo-bar, but it’s unclear whether that means the package foo-bar in the flat namespace, or the package bar in the foo namespace.
This means that users will have to be a lot more careful about communicating package names and making sure that they install the “right” package than they have to be today.
One of the main reasons we implemented the normalization rules around names, and pretty much the primary driver behind the rules we did pick, was to make package names easy to communicate without having to be really careful about what punctuation or capitalization you use.
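For reference, the normalization rules being referred to here (PEP 503) reduce to a one-liner — this is the published standard form, not anything new:

```python
import re

def normalize(name: str) -> str:
    # PEP 503: collapse runs of "-", "_", "." into a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()

# All of these spellings refer to the same project:
assert normalize("Django-REST_framework") == "django-rest-framework"
assert normalize("django.rest.framework") == "django-rest-framework"
```

So today, no matter how a name is punctuated or capitalized when communicated, it resolves to one canonical project — which is exactly the property that an `@scope/name` syntax would undermine.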
I view “proper” namespaces as a major regression in that goal unless we make moves to remove the un-namespaced packages completely and move them into a namespace of some sort (which would probably mean blocking new registrations without a namespace and setting up a permanent list of aliases for any existing projects from the old style to new style names).
I don’t think that level of disruption is feasible, at least not without a significantly better justification, and I think Nuget style namespaces gives us the vast bulk of the benefit (and they even give us some extra power, like child namespaces), with minimal amount of disruption to the community and without regressing on how easy/hard it is to confuse someone with the name of a dependency.
Existing packages when a Namespace is granted
The way I’ve thought about this is that it’s a tradeoff.
As I mentioned above, I don’t think NPM style is achievable or a good idea for where this ecosystem has evolved to, so I’m going to focus on Nuget style, but between different options we could take for dealing with existing packages that collide with that namespace.
I think if you disallow namespaces that collide with existing projects, then it becomes a lot harder to find a namespace that both identifies your organization and has never been used by anybody else. I suspect the likely outcome of a policy like this would be exactly the fear you raised about organizations wanting to claim a namespace “bullying” existing projects, as orgs would be highly incentivized to “clear out” their desired namespace prior to applying.
Whether that means they’d reach out to projects and attempt to get them to voluntarily rename themselves or attempt to engage the PSF and lawyers to invoke trademark laws to get the project’s name taken away from them on trademark grounds, either way the incentives would exist to push them to do that.
This does mean that the guarantee of a namespace is somewhat weaker if there was an existing project using that prefix already. I think that’s OK because I don’t see a better alternative, and I suspect that most people claiming a namespace are going to request one that has minimal overlap with existing projects (and if there is a large overlap, PyPI would likely reject the name anyways).
Presumably that existing project isn’t malicious (or if is, we’d remove it), but the namespace would still prevent attackers from uploading new projects that attempt to masquerade as a project from someone else who has a namespace.
I can say that folks I’ve talked to within Datadog find that tradeoff acceptable, and it sounds like @steve.dower thinks it would be acceptable for Microsoft as well. We also have the prior art from Nuget, and if I look at their package list, it looks like a large number of the most downloaded projects have their prefixes registered (but not all of them, suggesting it’s not viewed as a “must have” either).
Public Namespaces
Public namespaces are a bit weird I agree. They’re primarily useful in cases like mkdocs or django where there is a wide array of related packages that all tend to follow a naming scheme like mkdocs-$something or django-$something.
Currently there’s not a great indicator if a django-$something is an official Django thing or a third party thing (though in some cases you can tell by the users displayed on the page). A public namespace helps a bit in that it gives you an indicator that definitively says “django-foo is or isn’t an official Django thing”. It’s just an indicator, so it’s not an amazing security control by any means, but it’s better than nothing I think?
Where I would go a different direction than the PEP is that I wouldn’t put any stipulations on whether a community project’s namespace is public or private. I’d just leave that up to the organization (whether they’re a community org or a corporate org) to decide for themselves.
I also don’t see it called out either way in the PEP (unless I missed it), but another interesting question I would have is whether a private namespace can have a public child namespace.
Personally I would say yes, particularly because I think it creates an interesting setup where an organization can get a namespace for foo, which protects all of foo-*, but then create a child namespace for foo-ext-* which is public and allows anyone to upload to it.
I think this is in line with the overall goal of the PEP, as the PEP creates a way for organizations to claim a namespace as “theirs”, and if the namespace is theirs, then they should also be able to carve out sections of it for anyone to use.
Repository Metadata
I don’t have a strong feeling about whether we should expose this in the repository metadata or not, as I think the main benefit is in the disallowing of new uploads. That being said, we are including a UI indicator, which presumes some sort of benefit for telling users that X project is part of a protected namespace, so it does feel somewhat logical that API clients would get the same benefit.
I like that rather than a boolean indicator for “owned by a namespace” or not, the metadata includes what the namespace is, and who owns it.
Where I’m a little concerned is around the scoping of the metadata.
The PEP currently puts the repository metadata as something attached to each individual artifact, which feels wrong to me. The protection that the PEP brings is centered around who owns a given name, not which artifacts.
I think this gets more confusing when we start considering what happens if a namespace gets deleted [2]. Currently the PEP says that the old artifacts should retain their indicators for the namespace. That feels wrong to me?
For one, clients and users don’t have any way to determine who the current owner of that namespace is, to know whether some artifact that was uploaded under a given namespace is from the current owner or not [3]. We could bundle more information about the current owner into the API as well, but that feels even weirder to me. Or we could provide an API that lets you query the current owner of a namespace, but that would either double the number of HTTP requests (one for the project page and one for the namespace page), or require a global “all of the namespaces” page, which would end up with the same size problems that /simple/ has unless we paginated it, made a search endpoint that can do multi lookups, or something similar.
It also feels weird because if I’ve been prefixing my packages with foo for several years and then I go and get the foo namespace assigned to me, what happens to my previous artifacts? Do they gain the “uploaded under namespace X” metadata even though they were not? Do they stay immutable (like what the PEP described for if the namespace is deleted) so then my old artifacts just forever look worse than my new artifacts?
A similar problem exists for namespace deletion too. If my old artifacts retain the uploaded-while-under-the-namespace marker after I delete a namespace, do my new artifacts just forever look worse than my old artifacts unless I get the namespace back?
My suggestion would be to nix the per-artifact metadata and just have project level metadata.
That metadata should say:
Whether this project’s name matches a namespace (regardless of whether they’re the owner of that namespace or not).
Whether this project is owned by the owner of the namespace or not.
Whether that namespace is a public namespace or not.
What the actual namespace is.
What organization owns the namespace.
More concretely, (using the JSON serialization as an example), something like:
{
"meta": {...}, # existing field
"name": "...", # existing field
"files": [...], # existing field
"namespace": {
"owned-by": true, # not sure I like this name
"name": "foo",
"owner": "foo org",
"public": false
}
}
Where if the project doesn’t match a namespace, the namespace key is missing/null/empty, and if it does then the namespace object has all 4 keys.
Within the namespace object, owned-by would be a boolean indicating if the owner of this project also owns the namespace, name would be the namespace itself, owner would be the name of the org who owns the namespace, and public is whether the namespace is public or not.
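As a sketch of how a client might consume that object (field names as in the example above, which are themselves still just a proposal, not a standardized API):

```python
# Hypothetical client-side interpretation of the proposed project-level
# "namespace" object; none of these fields are standardized yet.
def describe_namespace(project: dict) -> str:
    ns = project.get("namespace")
    if not ns:
        return "no reserved namespace applies"
    if ns["owned-by"]:
        return f"published by {ns['owner']}, the owner of '{ns['name']}'"
    if ns["public"]:
        return f"third-party project in the public '{ns['name']}' namespace"
    return f"pre-existing project in '{ns['name']}' (reserved by {ns['owner']})"
```

The last branch covers the only way a project can sit inside a private namespace without being owned by the grant holder: it predated the grant.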
I also agree with @EpicWink about making support for this in the repository API optional, something about if the repository in question doesn’t support namespace reservation, then they should always omit the namespace key or something.
Much of the doc has no bearing on any project other than PyPI, since one of the interesting parts of this proposal is it’s really just a feature for PyPI to control who can register certain names and doesn’t have much technical impact on any other project. However, by adding it to the repository API, we’re expanding that scope to include non-PyPI projects. That’s not wrong, and obviously the policies around namespace ownership are still PyPI only, but we should definitely consider what the impact will be for non PyPI projects.
Permissions
Something that Nuget does is extend their permission model such that if there is a namespace foo and a project foo-bar is owned by the namespace owner, then even if there are other owners on the project, the org that owns foo cannot be removed from it (i.e. granting ownership to the org is a one-way door).
This would add a bit of an extra layer of protection, so that a rogue owner can’t steal projects from the org… but our orgs can already have more advanced permissions IIRC (and we can add more if we need to), so it may not be needed.
I think it would be interesting to think about whether that additional constraint on namespaces is something that we would desire or not.
I will say that if we choose not to add it, then the questions around namespace creation/deletion/recreation and what that means for the per-artifact metadata mean that we also have to include “projects shifting in and out of namespace ownership” as an edge case.
Someday I’ll learn how to express myself with fewer words, but today is not that day. ↩︎
Nuget doesn’t have this problem as they don’t allow namespaces to be deleted, but since we’ll be charging companies for them, we have to consider what happens if someone decides to stop paying. ↩︎
Re-registering previously used namespaces is going to be fraught with problems anyways just culturally. I don’t think we should disallow it, but we might want to discourage it unless the PyPI Admin approving the request thinks it’s OK. ↩︎
I see a few major downsides to the Nuget style with Python, as well as a few ways to allow a minimally disruptive switch to npm style.
This can’t help many of the largest existing packages. There are plenty of legitimate, non-typosquatting libraries already out there that extend the behavior of existing libraries and take the form $base_pkg-something, or that legitimately wrap APIs provided by companies and use the name of the company or one of its services as the package name or prefix. So under the rules here, the prefix would be out of bounds for those packages by default (plenty of examples with various databases, web frameworks, numpy, pandas, linters, and even things like Amazon’s boto, “apache-”, “google-”, etc.).
The above is largely an existing convention on PyPI among library authors. It won’t help future typosquatting because practically no high-value names could ever have their prefix reserved without breaking existing legitimate libraries.
This additionally conflicts with the established practice of third-party typing stub packages being named $pkg-stubs.
npm style namespacing would be relatively trivial to switch to:
Ensure the namespace syntax is not syntax that is currently normalized to something with another meaning; user-$namespace@pkg and org-$namespace@pkg may work for this (e.g. user-mikeshardmind@some_library or org-google@jax).
All user accounts and organizations have a namespace matching the username or org name.
All current and future packages that aren’t namespaced are aliased to their owning user’s namespace. It remains possible to upload non-namespaced packages, but both package authors and those installing are encouraged to switch to an explicitly declared namespace.
I think it would be interesting to get some numbers on the projects that might be helped by this to see how rampant this is.
I will point out that even if google-* (for example) is widely used by non-Google projects, they of course have the option to pick a different prefix, or even multiple different prefixes for different areas.
Having to rename the project is kind of a drag, but with Nuget style you only have to do that if your desired namespace is already inundated with packages from other users. With npm style you have to rename your project no matter what, since the whole point of it is that foo-bar and @foo/bar are distinct names.
Of course they also have the option to ask to have the name reserved anyways, just to stop the bleeding so new projects can’t collide with their name.
I’m not sure how existing legitimate libraries would be broken by this proposal since existing libraries are not affected, only new projects are prevented from being uploaded. The only thing I can come up with is some library that uses a foo-bar name, where foo is someone else, and they expect new projects to regularly be released inside of the foo namespace, which feels like a pretty minor edge case?
I’ll admit that I had forgotten about the -stubs packages. However, there’s no reason why they couldn’t use stubs-$pkg as their names (and existing -stubs packages would continue to work), other than typing tools would need to be updated to support both naming schemes [1].
Even $pkg-stubs would continue to work in the general case, the only time it wouldn’t work is if an organization created a private namespace (because that would of course preclude creating the $pkg-stubs package).
Actually I guess they wouldn’t even have to update anything, since I believe they only care about the name “on disk” matching the foo-stubs package, and they don’t care about the name on PyPI, so there’s no reason why stubs-foo couldn’t work just fine other than it violates the convention (not the requirement) that the name on PyPI and the import name are equal (so stubs-foo on PyPI would install foo-stubs/* on disk).
In any case, this point seems to equally apply to NPM style, since presumably a random third party who wanted to release stubs for org-google@jax wouldn’t be allowed to publish a package at org-google@jax-stubs.
I don’t think this is as easy as you think. I may be wrong! But I think introducing “npm style” namespaces is hard enough that it’s effectively a non-starter unless someone comes up with something really clever that I’ve yet to see proposed.
Tools validate that package names contain only alphanumerics plus the characters “_”, “-”, and “.”.
Adding a new symbol would mean that every tool needs to update and if someone chooses to use a package with that name, they’re effectively deciding that anybody using an older tool will be incapable of using their package. I suspect most people will not find that acceptable, and will just ignore the namespace feature (unless we require it of them). Maybe in a decade the feature would have had enough time to “bake” that people would be willing to rely on its existence? [2]
Depending on how old of a client we care about, pip at least (and I’m not sure about the rest) used to normalize everything, so those older clients would treat org-google@jax exactly the same as org-google-jax.
In your examples, user-ns@pkg will be a distinct package from user-ns-pkg, but humans communicating that to each other will be extremely error prone I think, especially since the user-ns@pkg is the new syntax. I think people would have to be very careful to say “user DASH namespace AT pkg”, and even then people would get confused.
Your examples do point out something, though: PyPI has chosen (I think, I could be wrong) to have organizations and users exist in separate namespaces (so you can have a user and an org with the same name, I think). So using a mandatory namespace like that would require having a discriminator for users vs organizations, which I think makes the entire thing even more confusing for users [3].
We cannot break the existing package names, which means that the “flat” namespace will have to be maintained indefinitely. If we allow new packages to be registered to that flat namespace, the dependency confusion attack still exists, and we will have made it worse by taking a single namespace and creating three: user-foo@bar, org-foo@bar, and foo-bar (plus user-foo-bar and org-foo-bar) will all be names that, to many users, refer to the “same” project, but which we would have to treat as distinct packages.
Packages are also not owned by a single user. Take Django: it has 4 users associated with it, 2 owners and 2 maintainers (the UI doesn’t differentiate).
So should that be user-ubernostrum@django or user-felixx@django (assuming we limit it just to owners)? Regardless of what the answer is (even if it’s to pick both), how is an end user supposed to know that those are the correct users and it’s not user-dstufft@django? Let’s say it was an org, and it was org-django@django, how is a user supposed to know that this project uses orgs, and it’s not user-django@django?
We also allow people to transfer ownership and even be removed from a project. Would that be effectively renaming the project then? This is the case in Go right now, and when a project changes hands (or moves where it’s hosted from since they use URLs) it is incredibly disruptive, and every project that has done it ends up with a bunch of angry people in their issues. Go’s tooling has some functionality that helps mitigate this via the replace directive so that end users can rewrite imports/dependencies. Python doesn’t. [4]
There’s also a question of import names. Now, PyPI doesn’t require that project names and import names match, but the typical convention is that they do, and this helps make sure that two projects can be installed side by side. We have the benefit here that our names on PyPI are limited to characters that can exist in an import statement (other than the - character, which normalizes to one that does), which helps this convention.
Keeping this convention in mind, should a name like org-aws@s3 be import s3? What about org-aws@utils? The @ character can’t be used in imports (unless we extend Python itself to allow it), so we can’t include the namespace in the import name unless we translate it to a _ or a .. Since namespaces are kind of like namespace packages, let’s say the convention is to use a .; then org-aws@s3 becomes import aws.s3? Well, that isn’t quite right, because user-aws@s3 or an un-namespaced aws.s3 might exist. So I guess the convention would naturally want to be org_aws.s3, which I find extremely ugly (every package would now be prefixed with user_ or org_), and there’s still the fact that org-aws.s3 is a valid package in the flat namespace.
In general, I think NPM-style scoping was primarily a way to avoid namespace collisions between private and public packages, rather than a meaningful security feature against squatting or dependency confusion. It takes two names that look similar to a human (@foo/bar vs foo-bar) and turns them into distinct things, which makes typosquatting even easier.
I think the questions around how it would even work for Python are vastly harder than the questions for how Nuget style prefix reservation would work (which those questions for Nuget style basically boil down to “there are some use cases where it’s not helpful” rather than “we’ve made things worse”).
The way NPM style was implemented on NPM makes a lot more sense in this vein: the scope is just part of the package name and there’s no attempt at automatic translation (which is where a lot of the problems above come from). They just created a separate namespace with their “scopes” and gave people the option to use it. Which, yeah, makes sense if the primary purpose is to carve out a namespace where they can do whatever they want without worrying about collisions.
That’s not what we’re trying to do though.
Which should be a pretty trivial change to make, since it’s just looking for two name patterns instead of 1, but otherwise doesn’t change their functionality. ↩︎
We’ve generally tried very hard to make the enhancements we’ve made things that could be progressive enhancements, that wouldn’t require package authors to make a decision to not-support a large number of people. I’m not going to say we can’t do something like that, if most people wanted it or there was a really compelling reason then maybe! ↩︎
This is obviously subjective, but I have a visceral reaction to the idea that every project on PyPI is now going to need to be prefixed with something like user- or org- just to differentiate if the namespace is a user or an org. Just looking at it I immediately dislike the aesthetics of it, and I suspect others will too. People are very attached to their project names! ↩︎
There’s a lot about how go works that makes them better suited to handle this than Python is, and even then it’s still incredibly disruptive. ↩︎
Allowing private namespaces to have public children would also provide a way of explicitly capturing the scenario where there are existing names: when the organization goes to submit their private root namespace registration, they could be given the notification that along with that private root namespace, they will also be creating public child namespace registrations for the existing projects.
For example:
Submitting private root namespace application for package foo and prefix foo-.
For compatibility with previously uploaded packages not owned by this organisation, the following required public child namespaces will also be registered:
foo-baz
foo-bar
foo-stubs
(if a community already has a convention like foo-ext-, or foo-contrib- for third party packages, then just one exception should be registered for that common prefix, rather than individual exceptions for each package sharing the prefix)
This wouldn’t be necessary when the root registration is public anyway, but it allows the compatibility exceptions to private namespaces to be clearly recorded.
It likely also makes sense to allow private child namespaces within public namespaces. That way an organisation could allow free access to a top level prefix like foo-, but reserve something like foo-core- for official packages.
On a related point, the PEP should specify a rule for when it is legitimate for an organisation to convert a previously public namespace grant to a private one (either by changing the grant, or by dropping a public child grant of a wider private grant): when all packages currently uploaded under that namespace are already owned by the same organisation as the one that owns the namespace. If there are still public exceptions required, then the attempted conversion should emit a similar notice to the one suggested above.
I took that part of the PEP as assuming that community organisations would be low overhead to set up with minimal oversight, so allowing them to reserve private namespaces would be problematic from the point of view of potential malicious disruption to the smooth operation of the package ecosystem.
If a community organisation did want to make their root namespace private with specifically carved out public child namespaces then they’d need to apply to the PSF for reclassification as a corporate organisation (potentially requesting a waiver of any associated fee, but otherwise providing all the same contact information and assurances as any other registered corporate organisation). That doesn’t seem like an unreasonable barrier to me - if you’re reserving parts of the Python packaging ecosystem namespace, the PSF really should know exactly who they’re dealing with.
Since this feature is tied to organisational accounts, there’s also the question of what happens as a result of corporate mergers, acquisitions, and demergers (or their community equivalents).
That means I agree that reporting the “now” information at the project level where namespaces are concerned is far more interesting than reporting the “at time of publication” information at the artifact level. For historical releases that are already in use, security comes from comparing artifact hashes to their expected values. It’s mainly when reviewing new potential dependencies that we’re really interested in whether or not a project is authorised by an already trusted organisation. There may be use cases for capturing snapshots of the namespace and project ownership details at release publication time, but I don’t think that needs to be part of the initial design proposal.
Slightly tweaking @dstufft’s suggested field names:
"namespace": {
"name": "foo",
"namespace-owner": "foo org", # Replaces `owner` field
"project-owner": "foo org", # New field
"approved-project": true, # Replaces `owned-by` field
"public": false
}
The first change here is to rename the owned-by boolean to approved-project. This is due to the Grant Ownership section allowing delegated approvals between organisations to permit authorised uploads to private namespaces by organisations other than the one owning the namespace grant.
The owner field is renamed to namespace-owner to avoid ambiguity between namespace ownership and project ownership.
The project-owner field is added to allow true first-party projects to be distinguished from authorised third-party projects (via the client-side comparison namespace["project-owner"] == namespace["namespace-owner"]). If the project doesn’t have an organisational owner, then the project-owner field would be null (rather than being omitted completely).
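Expressed as code, the resulting first-party/authorised/unaffiliated distinction would be a simple client-side check (a sketch only, using the field names proposed above, which are not standardized):

```python
# Hypothetical classification of a project based on the tweaked "namespace"
# object fields; all field names here are still just a proposal.
def classify(ns: dict) -> str:
    if ns.get("project-owner") is not None and ns["project-owner"] == ns["namespace-owner"]:
        return "first-party"             # project org also owns the namespace grant
    if ns.get("approved-project"):
        return "authorised third party"  # delegated approval per Grant Ownership
    return "unaffiliated"                # e.g. a pre-existing project in the namespace
```

Treating a null project-owner as never first-party avoids the corner case where a lapsed grant (namespace-owner set to null) would otherwise compare equal to an unowned project.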
For fully lapsed namespace registrations, my inclination would be for the registry server to keep them around, but set the namespace-owner field to null, and the grant type to public (this approach would then apply to all child grants too). Requests to claim (or reclaim) unclaimed grants would go through the same review process as requests for new grants. Converting reclaimed grants from public back to private would need to follow the same guidelines as suggested above for organisations that want to amend their own existing grants that were initially created as public.
(FWIW, the fact organisation names naturally change over time is also the main reason I strongly dislike package management systems that incorporate organisational info into their reference mechanisms for software components. It’s like the packaging system designers saw Conway’s Law and thought “We gotta get ourselves some more of that!”, so now corporate shenanigans can mess up the build process for your software. So a big +1 for a NuGet style namespacing solution from me).
I think Datadog’s interests align with those of the wider community in relation to this topic, so I also think it makes sense to let the standing delegation apply here (without even relying on the fact that I’d trust you personally to put the community’s interests first).
I’m not convinced by the arguments here claiming the issues exist for npm- but not nuget-style syntax. Both syntaxes suffer from similar dependency confusion arising from their structure:
npm’s @org/ confused with org-
nuget’s prefixing org- confused with suffixing -org.
The points I am in favor of are those that aren’t syntax-related, but rather more about namespace-ownership relations:
Can keep namespace during owner’s migration (as @ncoghlan pointed out) without aliasing
Can have multiple namespaces tied to the same owner (i.e., user/org account)
I find it confusing that some folks here are calling this “nuget-style namespacing”. This to me is just namespace policy, not syntax.
Thanks @ofek for reaching out to Jupyter! Speaking on my own here, not on behalf of the project or anything.
I think this is really interesting, and a good idea. I don’t personally have a strong opinion, but there may be a slight preference in parts of our community for npm-style, largely because our projects often straddle npm and PyPI, and we and our users (extension authors, at least) are already used to @jupyterlab/pkg, and coming from GitHub, the meaning of @owner/pkg is clear for many, if not all. But I definitely see the strong backward-compatibility arguments of the prefix approach.
Any transition will be tricky for us because the existing prefixes (jupyter-, jupyterlab-, ipython-, and jupyterhub-) are in widespread use by both official projects and community plugin packages alike. This has been a communication issue, in that it is unclear from package names alone which projects are ‘official’. I don’t have numbers, but I think unofficial plugin packages outnumber official ones significantly with these prefixes (at least on jupyterlab-, the most plugged-into part of the project). The current proposal would not change that, but it would at least stop the unofficial population of the plugin namespace from growing.
Since the maximize backward-compatibility strategy (which I love) means all those packages get to stay, we would have the choice:
keep current prefixes, in which case most packages on our reserved prefix are unofficial, so the prefix itself is not meaningful (as is the case today). New plugins would need to pick different names, while existing ones would keep the privilege of occupying the official namespace indefinitely. The main problem solved by this is typo squatting, I think.
pick a new (less obvious) prefix, like jupyter-official, meaning we have to rename all of our packages, but the name would be strictly meaningful.
leave jupyterlab- namespace unreserved (status quo), but reserve others, e.g. jupyter-
A case could be made that we shouldn’t actually be able to reserve jupyterlab- when we already don’t control a majority of the packages in it.
I’m not sure what we would choose. I don’t think we would pick moving to a new only-official prefix unless it was @npm-style. I’m not 100% sure we would do that, given the user impact.
I think @ofek is already addressing this, but just to write it down: the PEP should be clear about the following scenario:
User pythondev has published hello-thing with their own account
The hello prefix is then reserved
pythondev is then listed as prefix owner
Does hello-thing get the “official” visual indicator?
pythondev now publishes hello-stuff with their own account
Does hello-stuff get the “official” visual indicator?
If I understand @ofek’s answer correctly, no, neither hello-thing nor hello-stuff will get the visual indicator, because pythondev published with their own account, not an account that would grant this “official” status. This is good, because retro-applying the visual indicator would be dangerous IMO.
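A minimal sketch of the rule as I understand it: the indicator depends solely on the account that performed the upload, never on who happens to hold the prefix grant, and never retroactively. Everything here (the `Release` type, the prefix-to-owner mapping) is my own illustrative modelling, not API from the PEP:

```python
from dataclasses import dataclass


@dataclass
class Release:
    name: str
    uploader: str          # account that published the release
    uploader_is_org: bool  # True if uploaded through an organisation account


def gets_official_indicator(release: Release,
                            reserved_prefixes: dict[str, str]) -> bool:
    """Return True if a release should carry the 'official' indicator.

    reserved_prefixes maps a reserved prefix (e.g. "hello") to the account
    holding the grant. Per the scenario above, uploads from a personal
    account never receive the indicator, even when that same user is
    listed as the prefix owner.
    """
    for prefix, owner in reserved_prefixes.items():
        if release.name == prefix or release.name.startswith(prefix + "-"):
            return release.uploader_is_org and release.uploader == owner
    return False


# pythondev publishes hello-stuff with their own (personal) account,
# even though they hold the "hello" grant: no indicator.
r = Release(name="hello-stuff", uploader="pythondev", uploader_is_org=False)
print(gets_official_indicator(r, {"hello": "pythondev"}))  # -> False
```

The same logic answers the hello-thing question: its historical release was also a personal-account upload, so it stays unmarked rather than being retro-blessed.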
Now a few things I am/was confused about.
Namespaces themselves
Can I reserve the prefix hello-world directly without reserving hello first?
Who can reserve namespaces?
In Approval Process:
The default policy of only allowing corporate organizations to reserve namespaces (except in specific scenarios) provides the following benefits: […]
This paragraph initially led me to think that only corporate organizations would be able to reserve namespaces. Why not.
So actually non-corporate organizations can also reserve namespaces.
Root grants
If I get a root grant for hello, can I myself generate another root grant for hello-world or do I have to request it too?
Semantics
Namespaces are per-repository and MUST NOT be shared between repositories.
What does repository mean in this context? The only meaning that would make sense to me as I read it would be “PyPI-like index”, as in “namespaces on any index must not be shared with other indices”. The phrasing is a bit confusing: do we want to actually prevent people from reserving the same namespace in multiple indices, or do we just mean that by default, there is no protocol or anything that would propagate namespaces across different indices?
To comment on the namespace feature itself: I would myself never reserve a non-public namespace (given I can legitimately reserve it). Imagine if Django had reserved django as non-public: does that mean all Django apps would have had to be named djangoapp-thing, just so Django itself could publish just a few django-thing packages?
The project-* naming pattern is so common that it would seem counter-productive to me to prevent your users from following it. I’m thinking about mkdocs-*, but also my own Griffe project, which supports extensions, and for which we already have a few griffe-* third-party packages on PyPI.
IMO the visual indicator is enough to make an app/plugin/extension “official” (even though official packages can still be tampered with through social engineering, etc.). As an open-source dev, I don’t see any obvious benefit to non-public namespaces (managed by open-source orgs/communities), beyond removing the risk of typo-squatting (popularity-squatting?), that would outweigh the constraints they bring. But I’m not super aware of the reality of such squatting, so, just my opinion.
For corporate orgs who do not expect their community to publish third-party packages however, non-public namespaces make a lot of sense!
Also, I wonder if having a “NON OFFICIAL” visual indicator would help, too. Seeing “official” on official packages is nice, but not seeing anything on non-official ones won’t trigger higher alert senses. The fact that a namespace is reserved should indicate that the org who reserved it wants to make it clear what is official and what is not, I guess.
Ah, the PEP should definitely mention dependency confusion attacks, because namespaces would allow preventing them! I imagine a world where many corporate orgs reserve namespaces on PyPI without ever publishing any package, just to prevent others from doing so. Sounds a bit weird (and could be abused?), but then there would be no more dependency confusion attacks possible on their internal packages.
I wouldn’t want to see this feature be made as complicated as some of the above discussion suggests. Either you want to totally close off your namespace (and live with the existing packages, or one-by-one get them to rename themselves, though I promise you virtually nobody can be bothered to do that until it’s a real security/trademark issue!), or you don’t close it off and rely on the list of maintainers to show your package is official.
The list of maintainers is the critical indicator. I’d prefer to see “verified” ticks on those before any namespace registration feature at all.
It was mentioned above, but one big tradeoff of registered namespaces is that it breaks the “package name == import name” assumption, which I know isn’t always true, but it’s nice to be close to it. We actively discourage teams at Microsoft from starting their packages with “microsoft-” today because there’s no way they’re going to claim import microsoft. And the team that controls import azure spends a lot of effort actively caring for that namespace. So the name would just be a marker rather than reflecting the import, and I’m not personally a huge fan of that.
Oh, you’re imagining Nuget! But yeah, this is why I think any registration should come with a decent sized (thinking five figures annually) bill paid towards PyPI support and maintenance. If you want to claim a chunk of the namespace for yourself, you’d better really want it.