Sorry Paul, I think I expressed myself badly (argh, Python nomenclature can get a bit messy sometimes). Let me paraphrase myself below:
As far as I understand (and please correct me if I am wrong), the main points seem to be:
PyPI (as a public package index) has very strong reasons for enforcing strict uniqueness checks (security reasons, competition between publishers that might confuse users, etc…). Therefore it is not viable to differentiate between distributions named after “normal packages” and namespace packages on PyPI.
pip, whose primary use case is to download from PyPI, prefers to rule out the possibility of treating distributions named after namespace packages and “normal packages” as two different distributions. This is compatible with PyPI and also helps users to fix unintentional typing errors and avoid downloading wrong/malicious distributions.
Having one normalisation rule to be applied everywhere would be simpler.
There is some advantage in normalising the .dist-info directory (as pointed out by Pradyun), and if I understood correctly this would also help to optimise the checks for conflicting distributions already installed (since .dist-info serves as a database).
If I understood correctly, although not strictly necessary, the idea is to rule out the coexistence of distributions named after namespace packages and “normal” packages even in the private index scenario.
Private indexes cannot treat zope.interface and zope-interface as different packages regardless of what happens with wheel filenames.
PEP 503 requires the normalized form of the name to be used in URLs when pip requests the Simple API for a given project from an index. So pip install zope.interface means pip does GET /simple/zope-interface/.
This was done to solve several real problems at the time, but in effect it means that an index server cannot treat names that normalize the same as different projects.
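To make the quoted point concrete, here is a minimal sketch of the PEP 503 normalization rule and the request path it implies. The `simple_path` helper is a hypothetical name used only for illustration; it is not part of pip or of any index server.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_path(name: str) -> str:
    # Hypothetical helper: the Simple API path an installer would request.
    return f"/simple/{normalize(name)}/"


# Both spellings lead to the same request path, so an index
# cannot serve them as two different projects.
print(simple_path("zope.interface"))
print(simple_path("zope-interface"))
```

Both calls produce the same path, which is why names that normalize the same cannot be distinct projects on an index that follows PEP 503.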
Given that the differentiation between distributions named after namespace packages and “normal” packages is already ruled out in private indexes regardless, there is not much point in continuing to produce different files for them… If we drop it, we can benefit from the optimisations mentioned earlier.
I will go back and summarise these points in the setuptools issue I mentioned earlier. If anyone in the community would like to submit a PR I will try to review it (although specifically talking about .whl files, it might be implemented in pypa/wheel).
Ultimately what I care about is that the spec isn’t just some aspirational document of things we think it would be cool if they were followed. They should be documents that clearly define what is a hard requirement, what is highly recommended, and what is fully optional, so that implementers through the chain know what things they can depend on, and what is required of them.
When a change is made that restricts something that was valid to make it no longer valid, there’s always going to be a transitional period. However, there needs to be some plan to move that transitional period along and to move us into a state where people can actually depend on the things that we spell out as hard requirements.
When we don’t do that, and we interleave hard requirements with things that are in effect, fully optional, it makes the specs unable to be relied upon. It forces every implementer to sit there and carefully figure out what the real, de facto spec is, because it differs from the specs as written.
For filenames, so many projects being released to PyPI fail the requirement of the spec that we cannot actually enforce it. However, we have to enforce something, otherwise even the most basic requirements, like the name not being allowed to contain -, will regress.
The situation is already crummy with the spec and reality not matching up, in this case in an obvious way. This creates a scenario where again, the spec on paper and the de facto spec are different, because what PyPI accepts is different than what the spec says. We could “just” fix PyPI to be more permissive, but all that does really is change the de facto spec to be differently different from the real spec.
This isn’t just some hypothetical problem of purity; it has real practical implications. Otherwise you end up with the mess that HTML is.
Absolutely agreed. However, we cannot dictate when tools will implement particular standards, and as a result tools “later” in the chain have to be more lenient than we might otherwise like.
To give a concrete example here, none of this would be an issue if backends had been the ones to promptly enforce the change to the spec, rather than PyPI. It sucks that PyPI can’t be strict yet, and as a result we continue to get non-standard filenames being uploaded. But it sucks just as much that PyPI won’t accept metadata 2.2 yet, and so backends can’t produce it and installers can’t use it to optimise.
I agree that we need a transition plan. But I don’t see what’s so bad about relaxing PyPI’s requirements until backends catch up. That’s a transition plan, and all it relies on is people being patient with each other (and specifically with the extended timescales involved in volunteer open source projects).
We did have an issue with the way the current spec was created, in that it didn’t go through “due process” and as a result setuptools objected to what we ended up with (which meant they weren’t willing to implement it). Hopefully that’s resolved now, but if not, we should focus on getting a normalisation standard that we do all agree on.
Until this thread, there was no evidence that the backends were going to catch up. AFAICT setuptools had an ideological disagreement with that requirement, and so there was not “until backends catch up”, it was just going to be “relax requirements… forever”.
It’s still not clear that there is agreement from setuptools that they’re willing to implement normalization of filenames. At least one maintainer seems to still be hard against it in the issue tracker. Until that disagreement actually gets resolved, it feels very much like relaxing PyPI’s requirement is just allowing the spec to continue to diverge from reality, and potentially makes a final resolution more complicated because it adds yet another axis of preexisting behavior to consider.
It’s hard for me to express just how little I care about what normalization requirements we have, I just want there to be an actual agreed upon spec that we can implement, and the reality of the situation is that setuptools is a large enough constituent that if they’re unwilling to implement something, then we can’t consider it an agreed upon spec.
For anyone else who wants to either chime in or provide a different perspective, I tried to summarize the responses to Jason’s original question in a post on that issue.
From the Gentoo (i.e. downstream packager) perspective, normalization makes things easier. Our package naming rules diverge from those for Python projects (and they’re over 20 years old, and changing them would be a major backwards compatibility hassle). The current normalization rules make it possible to have a clean 1:1 mapping from Gentoo package names to PyPI filenames.
The fact that setuptools diverges is a hassle, but it’s a minor one because it simply implements the old specification. We need to support it anyway because of old package versions, so it’s a matter of having a switch to restore the old behavior. It’s somewhat inconvenient because packagers now have to remember “you have to disable normalization if it’s setuptools or old”.
If setuptools finally started normalizing, the rule would eventually be simplified to “you may need to disable normalization if it is an old package”.
If the specs were changed again today, things would get really messy for Gentoo. For a start, maintainers would have to remember to switch between 3 normalization schemes. What’s worse, Gentoo package names can’t have a full stop character in them, so we wouldn’t be able to do 1:1 normalization and instead we’d have to keep manually defining whether the - there converts to _ or to . (the problem is already there for the non-normalized case, but normalization gives us hope that it will eventually disappear).
It sounds like the Setuptools maintainers involved have now indicated they are okay with following the lead of other tools and the general consensus here per pypa/setuptools#3777.
Additionally, Flit >=3.9 (released a month ago) now normalizes sdist names following PEP 625 (i.e. same as wheels) per pypa/flit#628.
Combined with pdm-pep517 being deprecated and no longer developed and pdm-backend having replaced it, which normalizes both sdist and wheel names, this leaves only Setuptools as the outlier, and it looks like that might change soon-ish. Therefore, it seems reasonable to update PEP 716 to reflect this new reality and mandate normalization, to be consistent with what existing tools do (or plan to, at least).
I’m now seeing what might be a related bug due to the broken package name normalization in some situations. I don’t have a root cause yet, but it seems related. At the very least, PyPI needs to honor the package name in its UI.
@dustin responded over on the PyPI bug, with what I suspected would be the case. pdm-backend normalized my pyproject.toml name in the package’s metadata, and PyPI honors that. So it does sound like @ofek’s suggestion of using hatchling might be a temporary solution, although in my case, that will require pyproject.toml churn for the non-standard tool settings. Does pdm-backend need to support a similar option? @frostming
Hatchling does not normalize the name found within distribution metadata files so PyPI and other consumers have the raw text the user defines. Is that what you’re asking?
Yes, exactly! I want the packaging tools to honor my name in pyproject.toml in the metadata. I think this is the clarification that @dstufft is working on.
Yup, that’s a major part of the PEP draft: that the metadata name field MUST NOT be normalized, and that this SHOULD be the name that is presented to the user. The other part is presumably, at least based on emerging consensus here and among tools, that sdist/wheel and .dist-info file/directory names shall be normalized. This seems to be the compromise that makes most people happiest and addresses the pain points of both normalizing and not normalizing without significantly regressing on the other.
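As a rough illustration of that split, the sketch below keeps the name exactly as the user wrote it for the metadata and uses the escaped form only for file and directory names. The `artifact_names` function and its dictionary keys are hypothetical, and the version is assumed to be already in normalized form; this is not taken from any backend’s code.

```python
import re


def filename_component(name: str) -> str:
    # Escaping used for wheel names in the binary distribution format spec
    # (and for sdists by PEP 625): PEP 503 normalization with "_" in place of "-".
    return re.sub(r"[-_.]+", "_", name).lower()


def artifact_names(raw_name: str, version: str) -> dict:
    # Hypothetical helper; `version` is assumed to be normalized already.
    stem = f"{filename_component(raw_name)}-{version}"
    return {
        "metadata_name": raw_name,        # shown to the user, never normalized
        "sdist": f"{stem}.tar.gz",        # normalized
        "dist_info": f"{stem}.dist-info", # normalized
    }


names = artifact_names("Zope.Interface", "5.4.0")
print(names["metadata_name"])
print(names["sdist"])
```

Under these assumptions the user still sees Zope.Interface, while tools that only look at filenames see zope_interface-5.4.0.tar.gz.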
What is the policy for the name field in the POST request? Here is what I got:
[PublishError]: 400 Client Error: Start filename for 'test.dot' with 'test.dot'. for url: https://test.pypi.org/legacy/
The name in both METADATA and the POST payload is in unnormalized form: test.dot, while the sdist filename is normalized per PEP 625: test_dot-0.1.0.tar.gz
But if I change the dot to hyphen: test-dot, it is uploaded successfully. The PyPI side treats dot and hyphen differently. hatchling is flexible about this, but it seems the validation logic is wrong.
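For comparison, here is a sketch of a check that would treat the dot and the hyphen alike by comparing normalized forms. This is not PyPI’s actual validation code; `filename_matches_project` is a hypothetical name, and the sketch assumes a PEP 625 style sdist filename in which the name part contains no hyphen.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_matches_project(project: str, filename: str) -> bool:
    # Hypothetical check, assuming "<name>-<version>.tar.gz" where <name>
    # has no "-" (as in PEP 625 filenames).
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        return False
    stem = filename[: -len(suffix)]
    name_part, sep, _version = stem.rpartition("-")
    if not sep:
        return False
    # Compare both sides in normalized form rather than as raw strings.
    return normalize(name_part) == normalize(project)


print(filename_matches_project("test.dot", "test_dot-0.1.0.tar.gz"))
print(filename_matches_project("test-dot", "test_dot-0.1.0.tar.gz"))
```

With a comparison like this, test.dot and test-dot would be accepted or rejected together, since both normalize to the same name.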