PEP 752: Implicit namespaces for package repositories

Follow-up from the previous thread: PEP 752: Package repository namespaces

This is now only the concrete specification and everything related to policy or operational choices for PyPI is in this thread: PEP 755: Implicit namespace policy for PyPI

Thanks for splitting these up @ofek!

I’d like to bring up a pure implementation (non-policy) point from the original thread around the idea of fixed prefixes (i.e., this comment).

In particular, I think this PEP should consider that approach and, if there’s consensus that it’s not a good idea, then it should be listed under “Rejected Ideas” with a rationale.

To summarize the idea again, for continuity: rather than allowing grants of any top-level prefix (such as foo-bar- or foo-), namespaces would instead be defined as grants within a fixed subset of well-known prefixes.

The names of these prefixes are open to debate (and subject to logistical or index constraints), but the idea behind them is to limit the ambiguity of package names like foo-bar-baz, which under the current proposal could be bar-baz in namespace foo, baz in namespace foo-bar, or an ordinary package named foo-bar-baz.
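
To make the ambiguity concrete, here is a minimal Python sketch (not from either PEP) that lists every reading of a name when any hyphen-delimited prefix might be a namespace grant. The helper names are hypothetical; only the normalization follows the PEP 503 rule.

```python
import re
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-" and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_interpretations(name: str) -> List[Tuple[Optional[str], str]]:
    """Return every (namespace, remainder) split of a project name.

    Hypothetical helper: it assumes arbitrary top-level grants, so any
    hyphen-delimited prefix could be a namespace.
    """
    parts = normalize(name).split("-")
    # First reading: no namespace at all, just an ordinary flat name.
    readings: List[Tuple[Optional[str], str]] = [(None, "-".join(parts))]
    # Every other reading: the first i components form the namespace.
    for i in range(1, len(parts)):
        readings.append(("-".join(parts[:i]), "-".join(parts[i:])))
    return readings


for namespace, rest in possible_interpretations("foo-bar-baz"):
    print(namespace, rest)
# None foo-bar-baz
# foo bar-baz
# foo-bar baz
```

A name with n hyphen-separated components has n possible readings, and which one applies depends on grants that are invisible in the name itself.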

Concretely, assuming a known prefix like corpspace- (picked blithely to demonstrate that the actual prefix doesn’t matter), grants could look something like this:

  • corpspace-foo: foo is a corporate organization namespace for foo
  • corpspace-wing-ding: wing-ding is a similar corporate organization namespace

Similarly, there could be an orgspace- (again purely as an example) well-known prefix for non-corporate organizational namespace allotments.
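
For contrast, here is a minimal sketch of how a fixed-prefix scheme could be checked from the name alone. It assumes the illustrative corpspace- and orgspace- prefixes above and a hypothetical classify helper.

```python
import re
from typing import Optional, Tuple

# Illustrative well-known prefixes taken from the examples above; an index
# would choose its own as a matter of policy.
WELL_KNOWN_PREFIXES = ("corpspace-", "orgspace-")


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-" and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def classify(name: str) -> Tuple[Optional[str], str]:
    """Return (well-known prefix, remainder), or (None, name) for a flat name."""
    normalized = normalize(name)
    for prefix in WELL_KNOWN_PREFIXES:
        if normalized.startswith(prefix):
            return prefix, normalized[len(prefix):]
    # Outside the fixed prefixes, a name can never belong to a namespace.
    return None, normalized


print(classify("foo-bar-baz"))          # (None, 'foo-bar-baz')
print(classify("corpspace-wing-ding"))  # ('corpspace-', 'wing-ding')
```

Within a prefix, the index would still have to look up whether the grant is, say, wing or wing-ding. However, that question only arises for names that opt in by using a well-known prefix; every other name keeps its flat meaning.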

I think the two strongest arguments for this approach are that it’s (1) backwards compatible with the flat index like the current PEP is, and (2) avoids much of the ambiguity around how to interpret identifiers that may or may not have a namespace component in them.

At the same time, I think it’s been raised that companies may not like this naming scheme as much as having top-level prefixes (although it’s similar to the popular/mandatory RDN scheme in Java). However, my personal inclination is that the aesthetic value of corporate namespaces should be given relatively less priority versus efforts to generally reduce the ambiguity of Python packaging :slightly_smiling_face:

(I recognize that it’s poor form to repeatedly bring up an idea for consideration, so this will be my last time reintroducing it!)

No worries at all!

Do you think this should be in the specification or are you saying to ensure that the specification doesn’t prevent repositories from doing this if they so choose?

I was thinking the former: IMO it would be nice to have the namespace PEP define a namespacing scheme that’s minimally ambiguous, and I think fixed well-known prefixes accomplish that while not compromising the “must be backwards compatible” constraint.

(Of course, selecting those fixed prefixes ends up being a policy decision by each index, but IMO it’s a smaller + more constrained one…)

I like this idea. It also removes the issue of individuals picking a naming scheme like flufl.enum only to later find that some organisation has taken the flufl namespace, disrupting the individual’s naming scheme for future projects.

And last question, are you both saying this in addition to what has been specified or are you envisioning this is the only way?

My preference would be for it to be the only way; IMO that’s the only way the disambiguation benefits are fully realized.

(I could see an argument for the PEP standardizing it as an addition, in which case I’d advocate for PyPI itself - as a matter of policy - adopting only prefixes for namespacing. But given that PyPI is the elephant in the room, I think it would be parsimonious to have it be the only way, full stop.)

I see.

Are you imagining that Google, Amazon, and every company in addition to Django, Jupyter, and every project will voluntarily change all of their existing and future package names? My argument is that I truly don’t see that happening in any case, ever.

Answering for one company’s case: my current company has no strong standard for package/library names, and we have an internal registry for Python packages and other languages. I maintain several internal libraries used by a bunch of teams. While I’d strongly prefer to avoid renaming modules, renaming packages is a much easier step for me. Some of the packages I handle already have package names that don’t match their import names, so I wouldn’t mind adding a new prefix much.

Edit: The libraries I am thinking of are all internal, though, with users limited to some company engineers (maybe 100 at most), not widely used public libraries like the google-cloud ones.

Another aspect is that I feel it’s easier for companies to make dictatorial decisions like this than it is for open source projects. I’d be rather surprised if Django renamed, but I can much more easily see google-cloud doing a package rename with a push from some high-level internal engineers.

My argument here is twofold:

  1. Companies, as-is, already don’t consistently use naming schemes that are compatible with this PEP. Google has some packages under google-, but they also have many that aren’t. Amazon has boto3 and botocore, etc. In other words: regardless of what this PEP standardizes on, there is going to be some heartburn for corporations to shoehorn their pre-existing package names into the new scheme.
  2. Companies may be less happy with a non-fixed-prefix scheme in which they “control” a namespace, but pre-existing members of that namespace are carved out as grandfathered exceptions. If I were Google or Amazon, I would personally be unhappy with a fixed prefix, but even more unhappy with the idea that I do control google- except for a large number of pre-existing exceptions (including potentially illegible exceptions, e.g. projects that exist because they were registered but never had any releases).
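
Argument (2) can be made concrete: under top-level grants, the grantee’s “control” is the prefix minus every pre-existing project it doesn’t own. Below is a minimal sketch, assuming a hypothetical mapping of normalized project names to owner accounts; the data is invented for illustration and is not taken from PyPI.

```python
from typing import Dict, List, Set


def grandfathered_exceptions(
    prefix: str, grantee: str, projects: Dict[str, Set[str]]
) -> List[str]:
    """Projects under `prefix` that the namespace grantee does not own.

    `projects` maps normalized project names to their owner accounts
    (a hypothetical data model, not PyPI's actual schema).
    """
    return sorted(
        name
        for name, owners in projects.items()
        if name.startswith(prefix) and grantee not in owners
    )


# Invented example data.
projects = {
    "acme-core": {"acme"},
    "acme-utils": {"someone-else"},
    "acme-placeholder": {"squatter"},  # e.g. registered but never released
    "unrelated": {"acme"},
}
print(grandfathered_exceptions("acme-", "acme", projects))
# ['acme-placeholder', 'acme-utils']
```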

Or as a TL;DR: I think that migrating to a fixed-prefix namespace is no harder than the migrations that companies will likely need/want/feel pressured to do anyways. And, of course, companies can continue to publish packages under both package-level names, or do deprecation processes to move users over to the fixed-prefix name as they please.

Edit: As evidence for argument (1) above, here’s a Google PyPI account that owns hundreds of packages following a large number of naming conventions: Profile of google_opensource · PyPI

What I’d say is: if that is the only way to gain full control over a namespace, some of them might. This is especially true given that many are likely to never gain truly full control over any namespace under your current proposal, due to the existence of grandfathered-in packages sharing the namespace. There’s also nothing stopping them from publishing the same package under two names, or replacing one with a dummy package that says “mycorp-stuff has now been renamed to corp-mycorp-stuff, please use that instead”.

I feel very strongly that this is untenable, and that advocates should really just wait until somebody writes the proposal for explicit namespaces that this PEP implies. I will add this to the rejected ideas with sufficient rationale!

Simple repo API v1.1 is already defined: Simple repository API - Python Packaging User Guide

In addition, v1.2 is provisionally defined: PEP 708 – Extending the Repository API to Mitigate Dependency Confusion Attacks | peps.python.org

I might be misunderstanding, but what would this have to do with explicit namespacing? I think the “known prefix” approach is similarly implicit, in the sense that it’s composed of pre-existing syntax.

Thank you! I think I disagree about this being untenable, but I will reserve further discussion until there’s a rationale to ground the conversation against :slightly_smiling_face:

Slight nit in the way restricted namespaces are defined: “A restricted namespace only allows uploads from an owner of the namespace.”

They’re not quite that narrow. Repositories are free to define delegation mechanisms, such that a namespace owner may grant publishing authorization for the namespace without adding someone directly to the owning organisation.
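
Here is a minimal sketch of that broader rule. It assumes a hypothetical data model in which a restricted grant records both the owning organisation’s members and any separately delegated publishers; how a repository actually stores delegations is left to the repository.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RestrictedNamespace:
    """Hypothetical record for a restricted namespace grant."""

    prefix: str
    owner_members: Set[str]
    # Publishers the owner has authorized without adding them to the organisation.
    delegated_publishers: Set[str] = field(default_factory=set)


def may_create_project(ns: RestrictedNamespace, user: str, project: str) -> bool:
    """Hypothetical upload check for a new project under a restricted namespace."""
    if not project.startswith(ns.prefix):
        raise ValueError(f"{project!r} is not covered by {ns.prefix!r}")
    return user in ns.owner_members or user in ns.delegated_publishers


acme = RestrictedNamespace("acme-", owner_members={"alice"}, delegated_publishers={"ci-bot"})
print(may_create_project(acme, "alice", "acme-core"))    # True: owner member
print(may_create_project(acme, "ci-bot", "acme-tools"))  # True: delegated publisher
print(may_create_project(acme, "mallory", "acme-evil"))  # False
```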

(The glossary of terms in PEP 752 is as far as I’ve made it so far. The summary of changes in the previous thread looked very promising, though.)

Disallowing top level namespace grants feels like a PEP 755 (PyPI policy) topic rather than a PEP 752 (client & server API implementation) topic.

Even if there was a defined “permissible reserved prefixes” list outside PyPI’s policy for reviewing grant requests, what would clients be expected to do with that information? On upload, it’s up to the server to decide whether to accept the upload or not, and on download, if the client can’t even trust the server when it says it is controlling uploads to a particular namespace, there are much bigger problems to worry about.

The actual prefix does matter, quite a bit, since it says something about every single package using that prefix, and it has to be entirely unused on PyPI.

  • a PyPI search shows restricted- isn’t empty (ditto ns-, namespace-, reserved-)
  • org- wasn’t obviously in use, but may have not-for-profit connotations that aren’t applicable
  • etc

It would make sense for PyPI to try to find such a prefix and tightly control it, so organisations can choose between an easy request for a namespace within that controlled prefix and a potentially drawn-out review process for a top-level namespace grant. However, that’s all still PyPI policy when reviewing namespace grant requests, rather than anything that directly affects how client and server API implementations are written.

Maybe I’m slicing this too thinly: I was thinking that the concept (“namespaces are defined only within a fixed set of index-chosen top-level grants”) and the practical allocation (“PyPI chooses these prefixes for its namespaces”) would be split along implementation/policy lines.

My thinking behind that is that it’s beneficial to reduce ambiguity at the implementation layer if possible, rather than deferring to the policy layer – one of my main arguments for forbidding arbitrary top-level grants is that they violate the general security maxim of “don’t add new meanings to a pre-existing identifier” (package identifiers, in this case). Fixed prefixes also violate that maxim, but not as pervasively; hence, IMO, the benefit of saying “indices can pick any fixed prefixes as a matter of policy, but as a matter of implementation they must not allow arbitrary top-level grants.”

So in other words: I don’t expect this PEP to define the list of permissible reserved prefixes, but I would be interested in it adding constraining language saying that index policy should be defined over a set of index-chosen reserved prefixes :slightly_smiling_face:

(But still: possibly slicing too thin. If so, this can definitely wait until a policy discussion builds from the concrete prescriptions in this PEP!)

This is veering a little into policy, but IMO it doesn’t have to be entirely unused: PyPI could make the administrative decision to free an almost unused prefix. That wouldn’t be ideal if the prefix contains more than 1 or 2 unmaintained projects, but there may be some candidates that fit those qualifications.

But yeah, this is why I wanted this PEP to only prescribe “must be a fixed prefix” and not actually select them – selecting them is hard :sweat_smile:

(I think org- would be a good candidate, since PyPI already uses the term “organization” to refer to both paid and free organizations. If org- itself has too much of a “non-profit” connotation, maybe it could be org-corp and org-comm for corporate and non-corporate namespaces, respectively? But I’m pretty bad at naming :slightly_smiling_face:)

I think the furthest the API design PEP could reasonably go is to say that the PyPI policy MAY start conservatively and disallow top level grants to parties other than the PSF. Then all initial grants to publishers would be within the selected prefixes chosen in the PyPI policy PEP.

At the implementation level, I think the reserved prefixes would just be top level grants owned by a dedicated entity (with a name specified in the PyPI policy), so there wouldn’t be any technical changes needed to describe the prefixes separately from the namespace grants.

Potentially relaxing that in the future would then be a pure policy discussion that doesn’t require any technical changes.
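
Read at the implementation level, this could be nothing more than an ordinary grant table. The reserved prefix is one more top-level grant, and nested grants are resolved by the most specific match. This is a minimal sketch; the following are all assumptions made for illustration:

- the org prefix
- the pypi-admins owner
- the coverage rule (a grant for ns covers ns itself and any name starting with ns-)
- the helper functions

```python
from typing import Dict, Optional

# Hypothetical grant table: namespace -> owning entity.
GRANTS: Dict[str, str] = {
    "org": "pypi-admins",         # reserved prefix held by a dedicated entity
    "org-foo": "foo",             # grant allocated inside the reserved prefix
    "org-wing-ding": "wing-ding",
}


def covers(namespace: str, project: str) -> bool:
    # Assumed rule: a grant covers the bare name and anything continuing with "-".
    return project == namespace or project.startswith(namespace + "-")


def governing_grant(project: str) -> Optional[str]:
    """Return the most specific granted namespace covering `project`, if any."""
    matches = [ns for ns in GRANTS if covers(ns, project)]
    return max(matches, key=len) if matches else None


print(governing_grant("org-foo-client"))  # org-foo
print(governing_grant("org-bar"))         # org (only the reserved prefix applies)
print(governing_grant("organic"))         # None (no hyphen boundary)
```

Under this model, relaxing the policy later would just mean adding top-level rows owned by other parties; the lookup itself would not change.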

I’m repeating myself from the previous PEP 752 thread, but: I don’t think ‘open namespaces’ really solve a problem, and my impression in that thread was that the community organisations you had in mind for these (like Jupyter or mkdocs) would often prefer to have a restricted prefix.

Rather than ‘open namespaces’ being a separate type, why not make every prefix restricted when it’s registered, and allow the owner to delegate upload access to either a sub-prefix, or the whole of their prefix, as they choose - i.e. allowing anyone is just another delegation choice.
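
That suggestion could be modelled without a separate “open” namespace type. Every grant starts out restricted, and the owner attaches delegations to either the whole prefix or a sub-prefix. A hypothetical ANYONE marker stands in for what the PEP calls an open namespace. This is a minimal sketch with invented names (the jupyter grant, its sub-prefixes, and all accounts are illustrative only):

```python
from typing import Dict, Set

ANYONE = "*"  # hypothetical marker: any account may publish under this prefix

NAMESPACE = "jupyter"            # invented example grant
OWNERS: Set[str] = {"core-dev"}  # members of the owning organisation

# Delegations keyed by the whole namespace or a sub-prefix of it.
DELEGATIONS: Dict[str, Set[str]] = {
    "jupyter-contrib": {ANYONE},   # open nested prefix for third-party extensions
    "jupyter-docs": {"docs-bot"},  # one extra publisher for a sub-prefix
}


def covers(prefix: str, project: str) -> bool:
    # A prefix covers itself and any name continuing with "-".
    return project == prefix or project.startswith(prefix + "-")


def may_publish(user: str, project: str) -> bool:
    if not covers(NAMESPACE, project):
        raise ValueError(f"{project!r} is outside the {NAMESPACE!r} grant")
    if user in OWNERS:
        return True  # the owner can always publish anywhere in its namespace
    matches = [p for p in DELEGATIONS if covers(p, project)]
    if not matches:
        return False  # restricted by default
    allowed = DELEGATIONS[max(matches, key=len)]  # most specific delegation wins
    return ANYONE in allowed or user in allowed


print(may_publish("core-dev", "jupyter-server-extra"))     # True
print(may_publish("someone", "jupyter-contrib-myplugin"))  # True
print(may_publish("someone", "jupyter-server-extra"))      # False
print(may_publish("docs-bot", "jupyter-docs-theme"))       # True
```

In this model, a fully open namespace is just a grant whose delegations include {NAMESPACE: {ANYONE}}. The same mechanism covers open plugin or extension sub-prefixes inside an otherwise restricted grant.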

The idea is to offer orgs that choose to allow free registration of new projects the option of doing so.

However, if an org actually takes advantage of that option, the security benefits of the reserved prefix get much weaker, so it needs to be clearly advertised that this is a weaker form of prefix reservation than one where publishers are specifically vetted and approved by the holders of a namespace grant.

If the option is left out, the affected orgs would just not seek a namespace grant in the first place.

If the option is left in, but not clearly distinguished from restricted namespaces, the whole scheme becomes less trustworthy.

It’s also worth keeping in mind that if there’s no straightforward way to define open nested prefixes, it forces any extension or plugin namespaces to use a different top level name.

I agree that the distinction from a reserved prefix would need to be very clear. But it seems like the distinction from no prefix (i.e. every package today) doesn’t really matter, because it’s effectively the same.

I guess this seems fine to me. If you don’t want to restrict the prefix, why register it at all? I can already see on PyPI whether a project is owned by an organisation such as jupyter, so it’s not adding anything there. What problem does an open prefix solve?

The only concrete advantage that I can see is that registering a prefix like jupyter- prevents anyone else from maliciously registering it and blocking the jupyter organisation from creating new jupyter- packages. I guess organisations might register a prefix just for that, but it feels pretty grim if we need that.