Change in PyPI upload behavior. Intentional, accidental, pebkac?

Yes, exactly! I want the packaging tools to honor my name in pyproject.toml in the metadata. I think this is the clarification that @dstufft is working on.

Yup, that’s a major part of the PEP draft: the metadata name field MUST NOT be normalized, and this SHOULD be the name that is presented to the user. The other part, at least based on emerging consensus here and among tools, is presumably that sdist/wheel and .dist-info file/directory names shall be normalized. This seems to be the compromise that makes most people happiest and addresses the pain points of both normalizing and not normalizing without significantly regressing on the other.

What is the policy for the name field in the POST request? Here is what I got:

[PublishError]: 400 Client Error: Start filename for 'test.dot' with 'test.dot'. for url: https://test.pypi.org/legacy/

The name in both METADATA and the POST payload is in unnormalized form, test.dot, while the sdist filename is normalized per PEP 625: test_dot-0.1.0.tar.gz

But if I change the dot to a hyphen (test-dot), it uploads successfully. PyPI treats dots and hyphens differently. hatchling is flexible about this, but it seems the validation logic is wrong.

In Hatchling?

Oh, I mean the validation done by PyPI Warehouse.


Ah, I found it reported at

Thanks for finding those @frostming. From reading the warehouse bug, it sounds like @dstufft basically knows how to fix the problem, but wants to clarify the spec first to avoid making the problem even murkier. That seems like a reasonable approach, although I think that also means that those of us with dots in our package names are stuck, unable to upload. It’s also not clear to me whether (temporarily) switching to a different upload tool than pdm publish would unblock me.

I have chosen hatchling as a build backend essentially only for its strict-naming option, which produces dotted package names that PyPI will accept.

Using a build backend that doesn’t normalize distribution file names is the only viable way to fix this. According to the previous discussion, setuptools and hatchling with strict-naming = false should work.

However, this is more of a workaround, because PEP 625 says the file names should be normalized. It needs to be fixed in Warehouse eventually: it should use packaging.utils.canonicalize_name instead of pkg_resources.safe_name, since the latter doesn’t normalize dots.
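
To illustrate the difference, here is a minimal sketch. Everything in it is a local stand-in rather than the real code: `safe_name_like` is assumed to mirror what pkg_resources.safe_name does (runs of characters other than letters, digits and dots become a hyphen), `canonicalize_like` applies the PEP 503 rule, and `filename_matches` is a hypothetical simplification of a "filename must start with the project name" check, not Warehouse's actual implementation.

```python
import re


def safe_name_like(name: str) -> str:
    # Assumption: mirrors pkg_resources.safe_name. Dots are left untouched.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def canonicalize_like(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" and "." collapse to one "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_matches(project_name: str, filename: str, normalize) -> bool:
    # Hypothetical prefix check: normalize both sides, then compare.
    prefix = normalize(project_name).lower()
    return normalize(filename).lower().startswith(prefix)


sdist = "test_dot-0.1.0.tar.gz"
print(filename_matches("test.dot", sdist, safe_name_like))     # False
print(filename_matches("test-dot", sdist, safe_name_like))     # True
print(filename_matches("test.dot", sdist, canonicalize_like))  # True
```

Under these assumptions, the dotted name fails only because the dot survives normalization on the project-name side while the filename uses an underscore, which matches the behavior I saw on upload.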

Sorry, some personal stuff has been happening; I will try to get the PEP out tomorrow. Since it appears this issue has gotten all the backends to agree, the PEP will clarify that, and given that agreement, it should be fine to fix the bug in Warehouse.

Thanks @dstufft and hope all is well.

Can you summarize what we’ve agreed on to be sure we’re all agreeing to the same thing? :wink:

The general consensus as I understand it, per the discussion here and with the Setuptools maintainers, and which all backends implement or plan to (Setuptools) appears to be:

  • The name field in PKG-INFO, METADATA and the project name on/uploaded to PyPI MUST NOT be normalized.
  • The package name as presented to the user SHOULD NOT be normalized.
  • The *.dist-info dirname and sdist/wheel filenames MUST be normalized.

(with “normalized” meaning PEP 503 normalization with - escaped to _)
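
As a rough sketch of what that means in practice (the function names here are mine, purely for illustration, and not taken from any backend): the metadata name is passed through untouched, and only the filename component is normalized.

```python
import re


def pep503_normalize(name: str) -> str:
    # PEP 503: collapse runs of "-", "_" and "." into a single "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_component(name: str) -> str:
    # Same normalization, with "-" escaped to "_" for use in file names.
    return pep503_normalize(name).replace("-", "_")


def sdist_filename(name: str, version: str) -> str:
    # Hypothetical helper: builds "<normalized name>-<version>.tar.gz".
    return f"{filename_component(name)}-{version}.tar.gz"


name = "flufl.enum"                      # stays as-is in the Name field
print(filename_component(name))          # flufl_enum
print(sdist_filename("test.dot", "0.1.0"))  # test_dot-0.1.0.tar.gz
```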

@dstufft , is that your understanding as well?

@CAM-Gerlach Agreed, except I think the package name as presented to the user MUST NOT be normalized. This follows from point 1.

Yes that’s basically it.

MUST NOT vs SHOULD NOT is one area I’m torn on still, but I think MUST NOT with stipulations is the way we’ll go.

The problem is basically just that you aren’t always operating in a context where you know what the “correct” project name is. For instance, if a user does pip install flufl-enum, pip can’t know until it’s selected a particular file to download what the “preferred” name is.

So at a minimum it can’t provide the flufl.enum name until that point, and it may end up being more confusing to users to have the name switch from flufl-enum to flufl.enum in pip’s output part way through the installation process?

So I may end up adding a carve out or something to handle the edge case where the project’s preferred name isn’t known.

I see what you’re getting at @dstufft. The contexts I care about are that the PyPI home page for flufl.enum MUST state flufl.enum as the package name and provide pip install flufl.enum as installation instructions. This matches the package’s own README, description, and RTD documentation. It would also match generally how dependents would specify a dependency on the package and pip (or whatever tool) would install it.

So I think that since flufl.enum is my canonical, preferred name of the package, that will be the most common use case, once this all lands.

For the case where someone tries to pip install flufl-enum, as long as flufl.enum ends up getting installed, I think it SHOULD display the proper name, but agree that a carve out is okay. Maybe once it finds the file it can say something like “you requested flufl-enum, but the package is actually called flufl.enum”. Of course, no one will ever read that output, so :man_shrugging: :stuck_out_tongue_winking_eye:
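
Just to make the idea concrete, a toy sketch. These helpers are hypothetical and have nothing to do with pip's internals: show the requested name until the metadata name is known, then switch to it and optionally emit a notice.

```python
import re
from typing import Optional


def pep503_normalize(name: str) -> str:
    # PEP 503 normalization, used only to check both names mean the same project.
    return re.sub(r"[-_.]+", "-", name).lower()


def display_name(requested: str, metadata_name: Optional[str]) -> str:
    # Before a file is selected, the preferred name is unknown (None),
    # so the only thing available to show is what the user typed.
    if metadata_name is None:
        return requested
    if pep503_normalize(requested) != pep503_normalize(metadata_name):
        raise ValueError("names refer to different projects")
    return metadata_name


def rename_notice(requested: str, metadata_name: str) -> Optional[str]:
    # Only say something when the spelling actually differs.
    if requested == metadata_name:
        return None
    return (
        f"you requested {requested}, "
        f"but the package is actually called {metadata_name}"
    )


print(display_name("flufl-enum", None))          # flufl-enum
print(display_name("flufl-enum", "flufl.enum"))  # flufl.enum
print(rename_notice("flufl-enum", "flufl.enum"))
```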

In a hypothetical future where more Unicode characters are allowed in non-normalised project names, would PyPI be expected to shell quote the install command?

I don’t think so. Even now, the difference between pip, python -m pip, py -m pip, python3 -m pip, pip3, etc. is ignored. I’d think it just as reasonable to ignore the question of shell metacharacters and quoting. We went through a lot of this on pip, and came to the conclusion that you can’t win, so just put something straightforward and leave it at that.

Besides the fact that (AFAIK) that’s what the PEP currently uses, after some similar thought I went with the SHOULD NOT wording because (as you mention) there are some valid use cases (both that we can think of now, and potentially some we haven’t) where a tool might have to (or in rare cases, want to) display the normalized name in some form. Per RFC 2119, SHOULD NOT does not mean that backends can show the normalized name if they feel like it or out of mere convenience, but rather only for specific cases with a valid, carefully thought out reason appropriate to the particular circumstances:

4. SHOULD NOT This phrase, or the phrase “NOT RECOMMENDED” mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

I worry it might be a little too prescriptive and over-specified to try to enumerate the totality of cases where showing the normalized name might be acceptable and to forbid conforming tools from showing it in all others, as opposed to relying on the defined meaning of SHOULD NOT and a careful evaluation of the specific circumstances, under the presumption of using the non-normalized name unless a clear reason exists otherwise.

@CAM-Gerlach

I see what you’re getting at, but I just want to narrowly push back a bit:

I just don’t want this to be used as a loophole to show the normalized name on the package’s home page on PyPI (or any other package index). I think it should be clear that in that narrow context, the non-normalized name MUST be used.

Has the following been discussed already?

What happens if the sdist and the wheel(s) contain different “display names”? Or if the display name is not consistent throughout releases? Is it even possible?

That’s a particular case of the general point that PyPI tries to infer project-wide metadata values, when metadata is actually per build artifact. There’s no good answer to that short of a new standard, but I think that “PyPI does the same in this case as it does in other cases where it displays metadata” is a practical answer for now.

FWIW, Metadata 2.2 (PEP 643) prohibits the Name metadata field from being dynamic, so when PyPI supports that PEP (and projects start using it) we can be sure that the project name is consistent within a release, at least.
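
For anyone who wants to check their own artifacts, here is a small sketch of the comparison. It assumes you have already extracted the PKG-INFO / METADATA text from each file; the helper names and the sample metadata strings are made up for illustration. Core metadata uses an email-header-style format, so the standard library parser can read the Name field.

```python
from email.parser import Parser
from typing import Iterable, Set


def metadata_name(metadata_text: str) -> str:
    # Read the Name header from PKG-INFO / METADATA content.
    return Parser().parsestr(metadata_text)["Name"]


def distinct_display_names(metadata_texts: Iterable[str]) -> Set[str]:
    # More than one entry means the artifacts disagree on the display name.
    return {metadata_name(text) for text in metadata_texts}


# Illustrative inputs, not taken from real packages.
sdist_meta = "Metadata-Version: 2.1\nName: flufl.enum\nVersion: 1.0\n"
wheel_meta = "Metadata-Version: 2.1\nName: flufl-enum\nVersion: 1.0\n"

names = distinct_display_names([sdist_meta, wheel_meta])
print(len(names) > 1)  # True: the two artifacts spell the name differently
```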