Follow-up from the previous thread: PEP 752: Package repository namespaces
This is now only the concrete specification and everything related to policy or operational choices for PyPI is in this thread: PEP 755: Implicit namespace policy for PyPI
Thanks for splitting these up @ofek!
I'd like to bring up a pure implementation (non-policy) point from the original thread around the idea of fixed prefixes (i.e., this comment).
In particular, I think this PEP should consider that approach and, if there's consensus that it's not a good idea, then it should be listed under "Rejected Ideas" with a rationale.
To summarize the idea again, for continuity: rather than allowing grants of any top-level prefix (such as foo-bar- or foo-), namespaces would instead be defined as grants within a fixed subset of well-known prefixes.
The names of these prefixes can be subject to debate (or logistical/index constraints), but the idea behind them would be to limit the ambiguity of package names like foo-bar-baz, which under the current proposal could be bar-baz in namespace foo, baz in namespace foo-bar, or a normal package named foo-bar-baz.
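To make that ambiguity concrete, here is a minimal Python sketch (not part of either PEP) that lists every reading of a name when any hyphen-delimited top-level prefix might be a namespace grant. It assumes PEP 503 name normalization; the function name interpretations is hypothetical.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def interpretations(name: str) -> list[tuple[str | None, str]]:
    """Return every (namespace, remainder) reading of a name, assuming any
    hyphen-delimited top-level prefix could be a namespace grant.

    A namespace of None means "ordinary package, no namespace".
    """
    parts = normalize(name).split("-")
    readings: list[tuple[str | None, str]] = [(None, "-".join(parts))]
    for i in range(1, len(parts)):
        readings.append(("-".join(parts[:i]), "-".join(parts[i:])))
    return readings


print(interpretations("foo-bar-baz"))
# [(None, 'foo-bar-baz'), ('foo', 'bar-baz'), ('foo-bar', 'baz')]
```

Every extra hyphen adds one more candidate reading, which is the ambiguity a fixed-prefix scheme is meant to shrink.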
Concretely, assuming a known prefix like corpspace- (picked blithely to demonstrate that the actual prefix doesn't matter), grants could look something like this:
- corpspace-foo: foo is a corporate organization namespace for foo
- corpspace-wing-ding: wing-ding is a similar corporate organization namespace
Similarly, there could be an orgspace- (again purely as an example) well-known prefix for non-corporate organizational namespace allotments.
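A rough Python sketch of what the fixed-prefix scheme buys a reader: a single prefix check decides whether a name can be namespaced at all. The prefix tuple below reuses the illustrative corpspace-/orgspace- examples and is hypothetical, as is the function name well_known_prefix; PEP 503 normalization is assumed.

```python
from __future__ import annotations

import re

# Hypothetical well-known prefixes, borrowed from the examples above.
WELL_KNOWN_PREFIXES = ("corpspace-", "orgspace-")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def well_known_prefix(name: str) -> str | None:
    """Return the well-known prefix a project name falls under, or None if
    the name is an ordinary flat-index package outside every namespace."""
    normalized = normalize(name)
    for prefix in WELL_KNOWN_PREFIXES:
        if normalized.startswith(prefix):
            return prefix
    return None


print(well_known_prefix("foo-bar-baz"))          # None: no namespace reading
print(well_known_prefix("Corpspace_Wing.Ding"))  # corpspace-
```

Within a prefix, telling whether corpspace-wing-ding-utils belongs to wing or wing-ding would still require the index's grant list; the fixed prefix only guarantees that names outside it are never namespaced.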
I think the two strongest arguments for this approach are that it (1) is backwards compatible with the flat index, like the current PEP, and (2) avoids much of the ambiguity around how to interpret identifiers that may or may not have a namespace component in them.
At the same time, I think it's been raised that companies may not like this naming scheme as much as having top-level prefixes (although it's similar to the popular/mandatory RDN scheme in Java). However, my personal inclination is that the aesthetic value of corporate namespaces should be given relatively less priority than efforts to generally reduce the ambiguity of Python packaging.
(I recognize that it's poor form to repeatedly bring up an idea for consideration, so this will be my last time reintroducing it!)
No worries at all!
Do you think this should be in the specification, or are you saying to ensure that the specification doesn't prevent repositories from doing this if they so choose?
I was thinking the former: IMO it would be nice to have the namespace PEP define a namespacing scheme that's minimally ambiguous, and I think fixed well-known prefixes accomplish that while not compromising the "must be backwards compatible" constraint.
(Of course, selecting those fixed prefixes ends up being a policy decision by each index, but IMO it's a smaller + more constrained one…)
I like this idea. It also removes the issue of individuals picking a naming scheme like flufl.enum only to later find that some organisation has taken the flufl namespace, disrupting the individual's naming scheme for future projects.
One last question: are you both suggesting this in addition to what has been specified, or are you envisioning it as the only way?
My preference would be for it to be the only way; IMO that's the only way the disambiguating benefits are fully realized.
(I could see an argument for the PEP standardizing it as an addition, in which case I'd advocate for PyPI itself, as a matter of policy, adopting only prefixes for namespacing. But given that PyPI is the elephant in the room, I think it would be parsimonious to have it be the only way, full stop.)
I see.
Are you imagining that Google, Amazon, and every company in addition to Django, Jupyter, and every project will voluntarily change all of their existing and future package names? My argument is that I truly don't see that happening in any case, ever.
Answering for one company: at my current company there is no strong standard for package/library names, and we have an internal registry for Python packages and other languages. I maintain several internal libraries used by a bunch of teams. While I'd strongly prefer to avoid renaming modules, renaming packages is a much easier step for me. Some of the packages I handle already have package names that differ from their import names anyway, so adding a new prefix wouldn't bother me much.
Edit: The libraries I have in mind are all internal, though, with a user base limited to some company engineers (maybe 100 at most), not widely used public libraries like the google-cloud ones.
Another aspect: I feel it's easier for companies to make dictatorial decisions like this than for open source projects. I'd be rather surprised if Django renamed its packages, but I can much more easily see google-cloud renaming its packages with a push from some high-level internal engineers.
My argument here is twofold:
1. … google-, but they also have many that aren't. Amazon has boto3 and botocore, etc. In other words: regardless of what this PEP standardizes on, there is going to be some heartburn for corporations shoehorning their pre-existing package names into the new scheme.
2. … google- except for a large number of pre-existing exceptions (including potentially illegible exceptions, e.g. via projects that exist due to registration with no releases).
Or as a TL;DR: I think that migrating to a fixed-prefix namespace is no harder than the migrations that companies will likely need/want/feel pressured to do anyway. And, of course, companies can continue to publish packages under both names, or run deprecation processes to move users over to the fixed-prefix name as they please.
Edit: As evidence for argument (1) above, here's a Google PyPI account demonstrating ownership of hundreds of packages with a large number of naming conventions: Profile of google_opensource · PyPI
What I'd say is: if that is the only way to gain full control over a namespace, some of them might. This is especially true given that many are likely to never gain truly full control over any namespace under your current proposal, due to the existence of grandfathered-in packages sharing the namespace. There's also nothing stopping them from publishing the same package under two names, or replacing one with a dummy package that says "mycorp-stuff has now been renamed to corp-mycorp-stuff, please use that instead".
I feel very strongly that this is untenable, and that advocates should really just wait until somebody writes a proposal for explicit namespaces, which is implied by this PEP. I will add this to the rejected ideas with sufficient rationale!
Simple repo API v1.1 is already defined: Simple repository API - Python Packaging User Guide
In addition, v1.2 is provisionally defined: PEP 708 – Extending the Repository API to Mitigate Dependency Confusion Attacks | peps.python.org
I might be misunderstanding, but what would this have to do with explicit namespacing? I think the "known prefix" approach is similarly implicit, in the sense that it's composed of pre-existing syntax.
Thank you! I think I disagree about this being untenable, but I will reserve further discussion until there's a rationale to ground the conversation against.
Slight nit in the way restricted namespaces are defined: "A restricted namespace only allows uploads from an owner of the namespace."
They're not quite that narrow. Repositories are free to define delegation mechanisms, such that a namespace owner may grant publishing authorization for the namespace without adding someone directly to the owning organisation.
(The glossary of terms in PEP 752 is as far as I've gotten right now. The summary of changes in the previous thread looked very promising, though.)
Disallowing top level namespace grants feels like a PEP 755 (PyPI policy) topic rather than a PEP 752 (client & server API implementation) topic.
Even if there was a defined "permissible reserved prefixes" list outside PyPI's policy for reviewing grant requests, what would clients be expected to do with that information? On upload, it's up to the server to decide whether to accept the upload or not, and on download, if the client can't even trust the server when it says it is controlling uploads to a particular namespace, there are much bigger problems to worry about.
The actual prefix does matter, quite a bit, since it says something about every single package using that prefix, and it has to be entirely unused on PyPI:
- restricted- isn't empty (ditto ns-, namespace-, reserved-)
- org- wasn't obviously in use, but may have not-for-profit connotations that aren't applicable
It would make sense for PyPI to try to find such a prefix and tightly control it, so organisations can choose between an easy request for a namespace within that controlled prefix and a potentially drawn-out review process for a top-level namespace grant, but that's all still PyPI policy for reviewing namespace grant requests rather than anything that directly affects how client and server API implementations are written.
Maybe I'm slicing this too thinly: I was thinking that the concept ("namespaces are defined only within a fixed set of index-chosen top-level grants") and the practical allocation ("PyPI chooses these prefixes for its namespaces") would be split along implementation/policy lines.
My thinking behind that is that it's beneficial to reduce ambiguity at the implementation layer if possible, rather than deferring to the policy layer: one of my main arguments for forbidding arbitrary top-level grants is that they violate the general security maxim of "don't add new meanings to a pre-existing identifier" (package identifiers, in this case). Fixed prefixes also violate that maxim, but not as pervasively, hence IMO there is a benefit to saying "indices can pick any fixed prefixes as a matter of policy, but as a matter of implementation they must not allow arbitrary top-level grants."
So in other words: I don't expect this PEP to define the list of permissible reserved prefixes, but I would be interested in it adding constraining language saying that index policy should be defined over a set of index-chosen reserved prefixes.
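One way to picture that constraining language is as a server-side check on grant requests: the index chooses the reserved prefixes, and the implementation refuses any grant that is not nested inside one of them. This Python sketch is hypothetical (is_permissible_grant and RESERVED_PREFIXES are illustrative names, not anything PEP 752 defines) and assumes reserved prefixes are stored PEP 503-normalized, without a trailing hyphen.

```python
import re

# Hypothetical index-chosen reserved prefixes (a policy decision).
RESERVED_PREFIXES = {"corpspace", "orgspace"}


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_permissible_grant(requested: str, reserved_prefixes: set[str]) -> bool:
    """Accept a namespace grant only if it sits strictly inside one of the
    reserved prefixes, i.e. never allow an arbitrary top-level grant."""
    prefix = normalize(requested).rstrip("-")
    return any(prefix.startswith(reserved + "-") for reserved in reserved_prefixes)


print(is_permissible_grant("corpspace-foo", RESERVED_PREFIXES))  # True
print(is_permissible_grant("foo", RESERVED_PREFIXES))            # False
```

The set of prefixes stays a policy input; only the "no arbitrary top-level grants" rule is fixed in code.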
(But still: possibly slicing too thin. If so, this can definitely wait until a policy discussion builds from the concrete prescriptions in this PEP!)
This is veering a little into policy, but IMO it doesn't have to be entirely unused: PyPI could make the administrative decision to free an almost unused prefix. That wouldn't be ideal if the prefix contains more than 1 or 2 unmaintained projects, but there may be some candidates that fit those qualifications.
But yeah, this is why I wanted this PEP to only prescribe "must be a fixed prefix" and not actually select them: selecting them is hard.
(I think org- would be a good candidate, since PyPI already uses the term "organization" to refer to both paid and free organizations. If org- itself has too much of a "non-profit" connotation, maybe it could be org-corp and org-comm for corporate and non-corporate namespaces, respectively? But I'm pretty bad at naming.)
I think the furthest the API design PEP could reasonably go is to say that the PyPI policy MAY start conservatively and disallow top level grants to parties other than the PSF. Then all initial grants to publishers would be within the selected prefixes chosen in the PyPI policy PEP.
At the implementation level, I think the reserved prefixes would just be top-level grants owned by a dedicated entity (with a name specified in the PyPI policy), so there wouldn't be any technical changes needed to describe the prefixes separately from the namespace grants.
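Under this reading, a reserved prefix needs no special data type. The Python sketch below models it as an ordinary top-level grant held by a placeholder entity, with publisher grants nested underneath, and resolves a project name to its most specific covering grant. It assumes a grant covers its own prefix plus any name that continues with a hyphen; the Grant class, the owner names, and covering_grant are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    prefix: str  # PEP 503-normalized prefix, without a trailing hyphen
    owner: str


# Hypothetical grant table: the reserved prefix is just another top-level
# grant, owned by a placeholder entity; publisher grants nest beneath it.
GRANTS = [
    Grant("corpspace", "reserved-prefix-holder"),
    Grant("corpspace-foo", "foo"),
    Grant("corpspace-wing-ding", "wing-ding"),
]


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def covering_grant(name: str, grants: list[Grant]) -> Grant | None:
    """Return the most specific (longest-prefix) grant covering a name."""
    normalized = normalize(name)
    matches = [
        g for g in grants
        if normalized == g.prefix or normalized.startswith(g.prefix + "-")
    ]
    return max(matches, key=lambda g: len(g.prefix), default=None)


print(covering_grant("corpspace-wing-ding-utils", GRANTS).owner)  # wing-ding
print(covering_grant("corpspace-bar", GRANTS).owner)  # reserved-prefix-holder
print(covering_grant("requests", GRANTS))             # None
```

Relaxing the policy later would only mean adding other top-level rows to the same table, which matches the point that no technical change would be needed.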
Potentially relaxing that in the future would then be a pure policy discussion that doesnât require any technical changes.
I'm repeating myself from the previous PEP 752 thread, but: I don't think "open namespaces" really solve a problem, and my impression in that thread was that the community organisations you had in mind for these (like Jupyter or mkdocs) would often prefer to have a restricted prefix.
Rather than "open namespaces" being a separate type, why not make every prefix restricted when it's registered, and allow the owner to delegate upload access to either a sub-prefix or the whole of their prefix, as they choose? That is, allowing anyone is just another delegation choice.
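A minimal Python sketch of that suggestion, assuming names are already PEP 503-normalized: every namespace starts out restricted to its owners, and opening it up is expressed as a delegation to a wildcard publisher rather than as a separate namespace type. The Namespace class, the ANYONE marker, and the publisher names are hypothetical illustrations, not an API from PEP 752 or PyPI.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical marker: delegating to ANYONE opens (part of) the namespace.
ANYONE = "*"


def covers(prefix: str, project: str) -> bool:
    """True if a normalized project name falls under a normalized prefix."""
    return project == prefix or project.startswith(prefix + "-")


@dataclass
class Namespace:
    """A prefix that starts out restricted to its owners."""
    prefix: str
    owners: set[str]
    # sub-prefix (or the prefix itself) -> publishers allowed to upload there
    delegations: dict[str, set[str]] = field(default_factory=dict)

    def delegate(self, sub_prefix: str, publisher: str) -> None:
        if not covers(self.prefix, sub_prefix):
            raise ValueError(f"{sub_prefix!r} is outside namespace {self.prefix!r}")
        self.delegations.setdefault(sub_prefix, set()).add(publisher)

    def may_upload(self, project: str, publisher: str) -> bool:
        if not covers(self.prefix, project):
            raise ValueError(f"{project!r} is outside namespace {self.prefix!r}")
        if publisher in self.owners:
            return True
        return any(
            covers(sub, project) and (publisher in allowed or ANYONE in allowed)
            for sub, allowed in self.delegations.items()
        )


jupyter = Namespace("jupyter", owners={"jupyter-org"})
jupyter.delegate("jupyter-contrib", ANYONE)  # open only one sub-prefix
print(jupyter.may_upload("jupyter-core", "alice"))         # False
print(jupyter.may_upload("jupyter-contrib-foo", "alice"))  # True
```

Delegating ANYONE on the namespace's own prefix would reproduce a fully open namespace, so the weaker guarantee is visible in the delegation data rather than hidden in a separate type.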
The idea is to offer orgs that choose to allow free registration of new projects the option of doing so.
However, if an org actually takes advantage of that option, the security benefits of the reserved prefix get much weaker, so it needs to be clearly advertised that this is a weaker form of prefix reservation than one where publishers are specifically vetted and approved by the holders of a namespace grant.
If the option is left out, the affected orgs would just not seek a namespace grant in the first place.
If the option is left in, but not clearly distinguished from restricted namespaces, the whole scheme becomes less trustworthy.
It's also worth keeping in mind that if there's no straightforward way to define open nested prefixes, it forces any extension or plugin namespaces to use a different top-level name.
I agree that the distinction from a reserved prefix would need to be very clear. But it seems like the distinction from no prefix (i.e. every package today) doesnât really matter, because itâs effectively the same.
I guess this seems fine to me. If you don't want to restrict the prefix, why register it at all? I can already see on PyPI whether a project is owned by an organisation such as jupyter, so it's not adding anything there. What problem does an open prefix solve?
The only concrete advantage that I can see is that registering a prefix like jupyter- prevents anyone else from maliciously registering it and blocking the jupyter organisation from creating new jupyter- packages. I guess organisations might register a prefix just for that, but it feels pretty grim if we need that.