Yeah, there’s a messy history here - starting from the fact that the original wheel spec simply wasn’t anything like as precise as we expect a specification to be these days (times were different then).
Given where we are now, I’d say that we don’t really have much choice other than taking what’s in the official wheel spec as the current definition, but with the understanding that because of backward compatibility issues, parts of the ecosystem (notably Warehouse) don’t conform.
Maybe we need to change the spec, rather than fixing Warehouse. But if we do, then I think that at this point we have to make any spec change via a PEP, and not try to “just fix things”.
I think nobody actually conforms currently? Or at least nobody that is releasing to PyPI, since PyPI rejects those wheels, unless there are projects doing the normalization, and they currently just don’t support PyPI if you have a . in your name.
Regardless of what we decide, I think Warehouse needs a fix to allow the normalized name: the original PEP 427 spec didn’t require using the non-normalized name, so I think it’s wrong either way that Warehouse doesn’t allow that.
For Warehouse I think the question is whether it allows the normalized name, or requires the normalized name (or allows, but renames).
If we switch to requiring normalization, then I think that will break everyone releasing to PyPI today: since Warehouse doesn’t allow normalization, every build system that wants to release to PyPI must have not implemented the current state of the wheel spec.
So, switching to requiring normalization would have to be phased in, where we allow both, get all the build systems to switch to emit normalized, then after some period of time, switch to enforcing it in Warehouse.
It’s not really up to me since you’re the PEP Delegate here, but FWIW I think it’s better to update the spec to match reality, then treat restricting it further as requiring a PEP rather than treating a clause in the spec that is more restrictive than reality as canonical and requiring a PEP to update things to match reality.
The reality of the situation is pretty much what the original PEP 427’s code implemented, that normalization is not required (but it’s not prevented either) but you are required to escape the - character.
Installers have to handle non-normalized names already, because they exist everywhere on PyPI.
At least some of the major build tools don’t implement normalization.
The central repository doesn’t support it.
If the decision is that the spec stands as is without a PEP, then we need someone to act as the champion to go around and get all of these tools to update, otherwise it’s likely to sit there and bit rot with the spec and reality not matching.
When I say everyone, it’s not just the . in the name either, projects like Django are releasing wheels named Django-4.0.6-py3-none-any.whl, which as the spec stands should be django-4.0.6-py3-none-any.whl.
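To make the Django example concrete, here is a rough sketch of the normalization the current spec describes as I read it: PEP 503 normalization followed by replacing - with _. The function names are made up for illustration and aren't taken from any build backend.

```python
import re


def pep503_normalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into a single '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_name_component(name: str) -> str:
    """Hypothetical helper: the normalized name with '-' replaced by '_'."""
    return pep503_normalize(name).replace("-", "_")


print(wheel_name_component("Django"))   # django
print(wheel_name_component("Foo.bar"))  # foo_bar
```

So under that reading, Django’s wheel would start with django- and a project called Foo.bar would start with foo_bar-.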
I might be mistaken, but as I understand it most of the other backends, e.g. @ofek’s Hatch and @takluyver’s Flit, had already implemented what was eventually specified in the current wheel spec, and Setuptools was the main outlier? Perhaps they, @frostming and @sdispater could confirm their current situation?
As I seem to remember and as the spec implies, that was in the same boat as the . issue, where Setuptools (Django’s backend) was not normalizing that either, while most of the other backends were. But like that issue, that was somewhat before my time and (as you helpfully mentioned above) there is more to the backstory than I recalled.
Hatch can only release to PyPI because it doesn’t actually let you have a . in your name either; it hard-normalizes all names and removes the concept of display names completely.
I was a little broad in my “Nobody who is releasing to PyPI with a . in their name” because I didn’t realize that Hatch actually doesn’t allow your project name to be unnormalized at all, so they are able to release on a technicality.
However, I think I still accurately described the status quo, that the only real rule in practice is that you escape - to _ and compare the segments normalized at comparison time (not at production time).
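As a sketch of what “normalize at comparison time” means for a consumer (my own illustration, not code from pip or any other installer; the helper names are hypothetical):

```python
import re


def pep503_normalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into a single '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_project_name(filename: str) -> str:
    """Return the first '-'-separated component of a wheel filename, as written."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # Producers escape '-' in the name, so the first component is the name.
    return filename[: -len(".whl")].split("-")[0]


def same_project(filename: str, project: str) -> bool:
    """Compare the filename's name segment to a project name after normalizing both."""
    return pep503_normalize(wheel_project_name(filename)) == pep503_normalize(project)


print(same_project("Django-4.0.6-py3-none-any.whl", "django"))    # True
print(same_project("Foo.bar-0.1.0-py3-none-any.whl", "foo-bar"))  # True
```

The point is that the consumer never assumes the producer normalized anything beyond escaping -.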
I want to be clear, I don’t actually care which way we decide on this, just that if we decide to stick with the spec that currently doesn’t match reality, we need someone to manage these changes, and I’m fairly sure that @jaraco would be opposed to them, though I don’t know if he’d block adoption in setuptools over it or not.
To me, unless they got an Artifactory feature request at some point, Poetry choosing not to normalize is a positive signal that normalization is in fact closer to spec.
def safe_name(name: str) -> str:
    """Convert an arbitrary string to a standard distribution name

    Any runs of non-alphanumeric/. characters are replaced with a single '-'.
    """
    return re.sub("[^A-Za-z0-9.]+", "-", name)


def safe_version(version: str) -> str:
    """
    Convert an arbitrary string to a standard version string
    """
    try:
        # normalize the version
        return str(Version(version))
    except InvalidVersion:
        version = version.replace(" ", ".")
        return re.sub("[^A-Za-z0-9.]+", "-", version)


def to_filename(name: str) -> str:
    """Convert a project or version name to its filename-escaped form

    Any '-' characters are currently replaced with '_'.
    """
    return name.replace("-", "_")
that looks like the original interpretation of PEP 427.
I’m not sure what this statement is supposed to mean, it sounds like you’re suggesting that poetry choosing something means the opposite is the right thing, which sounds wrong? I don’t agree with everything that poetry does, but I don’t think they purposely choose directions opposed to the specs.
If you can describe the update you want precisely, then write that up and call it a pre-PEP. We can then try to decide whether it will work (and if not, what we do about that). I’m not looking to make the process onerous by saying it should be a PEP, just that we should be clear what we’re agreeing to (which is all a PEP is, IMO).
At the moment, this thread seems to be a mix of suggestions, discussions and proposals, and I can’t honestly say what exact changes anyone wants. A PEP (or “written proposal” if not calling it a PEP yet feels more comfortable) would cut through that confusion.
I’d written a bunch of stuff here, but deleted it, because what it basically came down to was:
Yeah, I can see a number of migration issues moving from the current state of affairs to what the spec says. If we’d had a PEP for the change to the spec, and if I’d insisted on a “Transition Plan” section in that PEP, then we could have thrashed out those details then. But we didn’t, and that sucks. More reason (IMO) to insist on a PEP if we’re changing things again.
The current situation feels like a mess, and I don’t see how anyone could document anything useful (I don’t count a spec that says “we’d like such and such to happen, but it’s optional for tools to conform” as useful…). If you want to try, that’s fine. But I don’t think where we are now is something we want to live with permanently.
I’d still rather treat the current spec as the intended end goal, and spend time working out how to get there, rather than spend time trying to document where we are now just so that we can then write another spec saying where we want to get to in the long term.
Also, I think that the fundamental issue here is that we’ve never formally agreed on a single, common normalisation for project names. PEP 503 and the original PEP 427 defined different normalisations, which is why we have this mess in the first place. The spec change for wheels moved to a common normalisation (PEP 503) with a minor adaptation for wheel filenames (replace dashes with underscores). So whatever its flaws, the current wheel spec is at least consistent with everywhere else where we normalise project names. I don’t think we should abandon that feature lightly. In fact, the core metadata specification for project name formalises that normalisation as the one to use[1] and furthermore states that it is to be used “for comparison purposes” - so that two project names that normalise the same must be considered the same project.
I’ll also note that all of this is a digression from @bernatgabor’s original request, which was that project name normalisation in wheels should not escape dots. Once we’ve thrashed out whether we want to consider the current spec as correct (whether or not projects have actually implemented it yet) then we can go back to the question of whether we want to change it…
[1] Another change that I think was made without a PEP, unfortunately. Frankly, I think our existing standards process, which states that changes can be approved as minor by agreement on Discourse, is too loose here, and is allowing too much to go through without a proper PEP.
Tools that produce wheels have to escape - to _ but are otherwise able to emit anything that normalizes (via PEP 503) to an equal value.
Tools that consume wheels cannot assume that the wheel is already normalized (via PEP 503) and must do the normalization themselves.
That means all popular tools (except Warehouse) are now in compliance, and Warehouse can be updated to correctly handle normalized names.
If we think it’s a good idea to restrict this (I don’t have an opinion) to require normalization, then we can manage that change with a PEP.
I can write that up in a different form if you want, but I figured a PR to the actual spec would be easier.
So what happens if someone uploads Django-1.0-py3-none-any.whl and django-1.0-py3-none-any.whl to a package index? Which one should consumers use? Are they required to have the same content? Or are indexes required to disallow this? In which case what does pip do when given two indexes, which each only have one of the two files, but different ones? Is pip allowed to use a cached django-1.0-py3-none-any.whl to avoid downloading Django-1.0-py3-none-any.whl?
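To illustrate the collision being asked about, here is a small sketch that groups wheel filenames by their PEP 503-normalized name plus the remaining components as written. It is purely illustrative: the function names are made up, versions are deliberately not normalized, and nothing here reflects how Warehouse or pip actually behave.

```python
import re
from collections import defaultdict


def wheel_key(filename: str) -> tuple:
    """Hypothetical grouping key: normalized name plus the other components as-is."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    name, *rest = stem.split("-")
    return (re.sub(r"[-_.]+", "-", name).lower(), *rest)


def find_collisions(filenames):
    """Return the groups of filenames that share a key, i.e. differ only in name spelling."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[wheel_key(filename)].append(filename)
    return {key: names for key, names in groups.items() if len(names) > 1}


files = [
    "Django-1.0-py3-none-any.whl",
    "django-1.0-py3-none-any.whl",
    "flask-1.0-py3-none-any.whl",
]
print(find_collisions(files))
```

Here the two Django files land in the same group, which is exactly the case where the spec would need to say which one wins, or that an index must reject one of them.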
The current spec is just as problematic, of course. None of the normalisation recommendations are MUST, they all say SHOULD[1], even the one that ensures that there are no hyphens in the project name - that’s only guaranteed by the global requirement that components cannot contain dashes. In fact, given that the existing spec has no mandatory requirements, I don’t actually see how any current tool can be failing to conform to the spec…
OK, it’s too late for me to dig any deeper into this right now. I still believe that we should require a PEP to change the spec (and the fact that the spec is all SHOULD rules reinforces that, because tools do currently conform to the spec, even if only in a useless sense, so there’s no immediate issue). If my role as PEP delegate gives me the right to demand a PEP, then I’ll do so. But I’m not clear if I have that right according to our governance rules, so if people want to debate governance and/or claim that “agreement on the Packaging discourse” is sufficient for a change to be made without a PEP, I’ll defer to whoever it is that handles governance decisions. And if no-one knows how we resolve questions like this, we have a different problem.
Marginally related, but PEP 503 normalization was adopted for extras names in PEP 685 as a single common normalization format with distribution names, with the consensus being that this advantage was sufficiently compelling to outweigh more closely matching the original core metadata spec for extras names as well as common usage.
I guess it depends on whether one considers not mandating normalization to be inconsistent with the status quo of much of the ecosystem doing so. To note, this also reverts the change to normalize case, which hasn’t been much discussed here, and didn’t play a part in the original request, though it does tend to match with whether or not . normalization is also implemented.
For reference, distilling the results you helpfully provided in your previous post and filling in a couple gaps, we have:
Tool          No .    Lowercase
Flit          *
Poetry
Hatch
Setuptools
PDM
* Doesn’t allow . in user-declared distribution names
These sound like good questions for a repository spec, I don’t see how they’re related to the wheel spec.
Nothing in the repository spec requires treating filenames as unique, PyPI has chosen to do that.
If we want to interpret those as the RFC SHOULD and not as the English word should, then the spec would be even looser than what I’d recommend it is in practice, and there are basically no requirements on filenames.
I’d say that as the PEP Delegate for these things, you’re well positioned to decide if they require a PEP or not. I’m not likely going to write a PEP to do so anytime soon as I have other PEPs that I want to focus on.
I’d be -1 on changing Warehouse until the spec is clear about what is required, as any change we make risks making the situation more complex.
Updated PDM with the results that @ofek posted, since he linked to the actual code in PDM, but to make sure I also tested it:
~/foo.bar is 📦 v0.1.0 via 🐍 v3.9.13 (t)
❯ pdm build
Building sdist...
Built sdist at /home/dstufft/foo.bar/dist/Foo.bar-0.1.0.tar.gz
Building wheel...
Built wheel at /home/dstufft/foo.bar/dist/Foo.bar-0.1.0-py3-none-any.whl
Tools that produce these distribution names MAY choose to emit them in any form that would result in the same result when using :pep:503 normalization rules.
I might be missing something but wouldn’t MUST be alright since all backends satisfy that currently?
I’m not going to update the PR, since Paul has stated he wants to see this as a PEP if we’re going to change things, and also that the spec is full of “shoulds” not “musts”, so it currently doesn’t really require much of anything from filenames. Since it’ll have to go through the PEP process, I’m not sure that spending time honing a PR is a particularly useful endeavor, and I’m actually just going to close it for now.
That being said, I mostly meant what you’re saying here, probably a better way to word it would have been:
Tools that produce these distribution names MAY choose to emit them in any form they desire, but that form MUST result in the same string when using :pep:503 normalization rules.
Or something along those lines. Basically I was trying to give permission for hatch/flit/etc to continue to normalize if they wanted to, but say that no matter what a project emits, it has to end up normalizing to the correct value.
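Spelled out as a check a producer could run on the name component it is about to emit, that wording might look something like this. It's only a sketch under my reading of the proposal; the function name is hypothetical and no backend implements it this way as far as I know.

```python
import re


def pep503_normalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into a single '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_acceptable_name_component(emitted: str, project: str) -> bool:
    """Hypothetical check: '-' is escaped and the result normalizes to the project's name."""
    return "-" not in emitted and pep503_normalize(emitted) == pep503_normalize(project)


print(is_acceptable_name_component("Foo.bar", "Foo.bar"))  # True: left as-is
print(is_acceptable_name_component("foo_bar", "Foo.bar"))  # True: fully normalized
print(is_acceptable_name_component("Foo-bar", "Foo.bar"))  # False: '-' not escaped
```

Both the unnormalized and the fully normalized spellings pass, which is the “MAY normalize, MUST normalize to the same value” behaviour I was going for.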
That phrasing is arguably somewhat confusing to me, but what I thought it was intending to say was that backends may use PEP 503 normalization instead so long as it is a superset of the escaping required in that section (which PEP 503 already ensures, AFAIK). Otherwise, I’d imagine the concern is that it could be read as not permitting further normalization including PEP 503, which would then mean ≈half the current backends are not conformant with the spec (by that interpretation).