Amending PEP 427 (and PEP 625) on package normalization rules

To be perfectly honest, I hate the wheel spec: it’s insufficiently precise for the way things are these days. It was great when written (it got to the core of the issue and “sold” wheels when they were introduced), but now it’s showing its age, IMO.

The problem is that a wheel is defined by the filename format, so if you read the spec really, really strictly (and assume “must” rather than “should”…) then Django-1.0-py3-none-any.whl isn’t actually a wheel, because it doesn’t conform to the spec. No-one’s that pedantic in reality, of course, and compatibility means that we have to think about .whl files that don’t conform to the filename spec in any case, but it does mean that when we think too hard about details, things start to unravel a bit.
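To make the “really, really strict” reading concrete, here is a rough sketch. It assumes the escaping rule is “collapse runs of `-`, `_` and `.` into a single `_` and lowercase”, which is my reading of the current filename rules rather than a quote from them. `WHEEL_RE`, `escape_name` and `is_strictly_conformant` are made-up names, and the pattern is a simplification, not the spec’s own grammar.

```python
import re

# Simplified layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
# (hypothetical pattern, for illustration only)
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)"
    r"(-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def escape_name(name: str) -> str:
    """Assumed escaping rule: runs of '-', '_', '.' become '_', then lowercase."""
    return re.sub(r"[-_.]+", "_", name).lower()


def is_strictly_conformant(filename: str) -> bool:
    """True only if the filename parses AND its name part is already escaped."""
    match = WHEEL_RE.match(filename)
    if match is None:
        return False
    name = match.group("name")
    return name == escape_name(name)


# Under this reading, only the lowercase spelling passes the strict check.
print(is_strictly_conformant("Django-1.0-py3-none-any.whl"))
print(is_strictly_conformant("django-1.0-py3-none-any.whl"))
```

A tolerant reader would skip the final comparison and simply escape whatever name it finds, which is roughly what “no-one’s that pedantic in reality” amounts to.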

Hmm, you’ve nerd-sniped me a bit here. Maybe I’ll try to write a mathematically formal statement of some of these ideas and see where it leads. It’ll probably never work as a real spec, but it might clarify things a little (for people who consider mathematically formal definitions “clear” :slightly_smiling_face:)

Artifactory currently supports this via the period. Sure, we can ask them to switch to _, but note that a development request plus deployment in an enterprise environment takes at least a year. We should not break this workflow in the meantime. We should have a longer transition period during which the old way of doing things keeps working.

Given that there’s never been a spec yet that requires wheel producers to preserve dots in the project name (allows, yes, but it’s never been a requirement), that’s a reasonable but risky implementation-defined behaviour. I’ve no idea what Artifactory’s position is on complying with standards, but hopefully having a tighter standard would be better for them in the long run (even if it caused them short-term pain because they have to transition to a new approach).

If we change the current spec, then any PEP to do so should discuss the transition process and giving 3rd party index implementations time to change should be part of that. As for the current spec, it wasn’t done via a PEP and didn’t consider transition. I’ve already said that with hindsight I consider that a mistake, but I don’t think that we should compound the problem by making a second change without a PEP.

To be clear on my feelings here, I can completely understand people wanting project names that use uppercase, or dots, hyphens or underscores. All of “Django”, “flufl.enum”, “pip-tools” and “youtube_dl” make sense to me. And I have no problem with tools interpreting those unnormalised names (for example, to interpret a dot as a namespace separator). Normalisation should be solely for the purposes of allowing simple string-based comparisons and checks to work, and users shouldn’t need to deal with them directly. IMO, artifact names need to use normalised forms, because filesystems enforce uniqueness based on filenames, and we want to enforce uniqueness for artifacts. I think that Artifactory’s rules should be driven by the project display name, not by the normalised name, but that’s easy to say in a purely theoretical sense - the reality simply isn’t that straightforward, and I’m certain the choices they made were right in context. We should be evolving our standards in a way that helps tools like Artifactory move to a robust and sustainable model, but that needs engagement from both sides. In my experience the Artifactory developers are very approachable and flexible, but they don’t get involved in the packaging community much, so the more we can do to work with them (and other tool developers) the better. And one aspect of that is definitely to understand their timescales and transition challenges. But equally we can’t hold back the whole ecosystem just because enterprise tools have a slow adoption cycle. We already have enough trouble managing adoption with open source tools.
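For anyone following along, the “simple string-based comparison” being described is the PEP 503 rule. A minimal sketch, where `same_project` is just an illustrative helper name:

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalisation: collapse runs of '-', '_', '.' to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_project(a: str, b: str) -> bool:
    """Two spellings refer to the same project if they normalise identically."""
    return normalize(a) == normalize(b)


# The display names from the post above, next to their normalised forms.
for display_name in ["Django", "flufl.enum", "pip-tools", "youtube_dl"]:
    print(f"{display_name} -> {normalize(display_name)}")
```

The display name stays whatever the author chose; only the comparison key changes.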

While “project names are not the same as (import) package names”, I don’t think anyone should be prevented from naming a project the same as its only import package. …and allowing hierarchy in import names (using namespace import packages) is only a good thing. So I think allowing hierarchy to be expressed with the dot character would certainly be beneficial.

Having namespaces would be great! IMHO It should be standardised. We can hear how difficult it could be to get a suitable project name:

This is allowed in the project name, and I agree it’s useful there, if people want to use it. But it’s not necessary to have it in the normalised version as well.

The one issue here is that we don’t have an API to get the unnormalised project name without reading the actual project metadata (actually, PEP 643 and PEP 658 between them provide this, but they are not implemented fully yet). So tools try to use the normalised name (which is exposed), but normalisation isn’t reversible so they can’t get the actual project name.

So:

  1. Yes, dots in the project name are reasonable and useful.
  2. We have accepted PEPs that would let tools access that information easily.
  3. Those PEPs haven’t been implemented yet by key tools.
  4. In the meantime, tools are trying to “make do” by using the normalised name, but that loses information (and in particular, the dot).
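Point 4 is easy to see in a few lines. This sketch groups some made-up spellings by their PEP 503 normalised form; `group_by_normalized` is a hypothetical helper, not part of any library.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """PEP 503 normalisation: collapse runs of '-', '_', '.' to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def group_by_normalized(names):
    """Map each normalised form to every original spelling that produces it."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return dict(groups)


# Three different display names collapse to one key, so the key alone
# cannot tell you which spelling (dot, hyphen, underscore) the author used.
print(group_by_normalized(["flufl.enum", "flufl-enum", "Flufl_Enum"]))
```

Since several inputs share one output, a tool that only sees the normalised name has to get the original from somewhere else, i.e. the metadata.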

I think the biggest benefit all round for the community would be if we push to get PEPs 643 and 658 widely implemented. Unfortunately, the key projects that need to do this are warehouse and setuptools, and both projects are extremely tight on resources.

It would be an extremely useful fundable packaging project, IMO, to get these two PEPs rolled out. I created pull request #48 on psf/fundable-packaging-improvements (“Add 'Implement metadata PEPs'”) to suggest it.

Yea :slight_smile:

Something I want to do at some point is a Wheel 2.0, but it’s pretty low on my stack and I hope someone else beats me to it.

There is PEP 423 which is somewhat related, but it is from 10 years ago, Informational and Deferred indefinitely. Nevertheless, it may provide some useful foundations for such an effort.

In theory, though it does create a number of tooling issues in practice.

I was about to suggest that, but you were way ahead of me :smile:

Not only that, but as we’ve discussed before, because the Metadata-Version is sequential, PEP 643 also blocks adoption of PEP 685, PEP 639 (which I’m about to release a new version of today, finally) and any future metadata changes.

FWIW, Fedora is finishing up a transition to use PEP 503 normalization (no dots) for its Python package dependencies.

Another FWIW: packaging.utils.parse_wheel_filename() normalizes the name. And if the rules for escaping do end up differing this much from normalization, then we should probably create packaging.utils.escape_name(), as tracked in pypa/packaging issue #542 (“Add an `escape_name()` function”).

FYI, in Hatchling v1.5.0 (pypa/hatch) I added a way to disable normalization in file names, satisfying the Artifactory situation. Whenever they work around this I’m hoping to remove that option.

I think that normalizing during parse is a reasonable way to go regardless of what happens, so that seems good to me.

One semi-related issue I had recently is that I created a project named pkg-resources. The name wasn’t a big issue to me, and I used a hyphen for no really good reason other than I had to pick something. But it turned out that some of the tools I used (I can’t recall which right now) didn’t really like unnormalised project names. So I’ve switched to thinking of the project as pkg_resources and referring to it as that everywhere.

However, PyPI still considers the hyphenated form as the canonical name of the project. I haven’t dared try uploading a new version with the package metadata changed to use the underscore form for the name, but I suspect that wouldn’t work.

Ideally, I’d like to rename the project. Or if that’s not feasible, then delete it and re-upload everything with the new name. But of course I can’t do that either as deletion of projects isn’t allowed.

So maybe PyPI should allow renaming of projects when the old and new names both have the same canonical form?

(And yes, I know I could ask the PyPI admins for assistance renaming my project, but that seems like a lot of effort for them, just for a cosmetic change, and I know they are busy with much more important requests…)

PyPI looks the project up by normalized name during upload and will rename the project if the “display” name is different.

So, this should just work™ today the next time you upload that project to PyPI, assuming that your build tools and upload tool don’t forcibly normalize the name prior to artifact creation / upload [1].


  1. I know setuptools/twine does not forcibly normalize; unsure about any other projects.
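As a toy model of the behaviour described above (look up by normalised name, then adopt whatever display name the upload carries), something like the following. To be clear, `ToyIndex` is entirely hypothetical and is not Warehouse’s actual code; it only illustrates the lookup-then-rename idea.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalisation: collapse runs of '-', '_', '.' to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


class ToyIndex:
    """Hypothetical stand-in for an index; not how Warehouse is implemented."""

    def __init__(self):
        # Normalised name -> most recently uploaded display name.
        self._display_names = {}

    def upload(self, name: str) -> None:
        # The project is found by its normalised name; the display name
        # is overwritten with the spelling sent in this upload.
        self._display_names[normalize(name)] = name

    def display_name(self, any_spelling: str) -> str:
        return self._display_names[normalize(any_spelling)]


index = ToyIndex()
index.upload("pkg-metadata")   # project first registered with a hyphen
index.upload("pkg_metadata")   # later upload sends the underscore spelling
print(index.display_name("pkg-metadata"))
```

The rename only happens if the underscore spelling actually reaches the index, which is why an upload tool that rewrites the name first would defeat it.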

Do you mean normalised? Underscores are converted to hyphens

I’m sorry, it looks like I was unclear, not least because I don’t recall how things got into the state they are in now. Also, I can’t even get the name of my own project right :slightly_frowning_face:

The project is pkg-metadata on PyPI. Note that the URL, and the project name on the PyPI page, contain a hyphen. The wheel metadata file says Name: pkg_metadata, and the project name in pyproject.toml (used by flit_core) also says pkg_metadata.

I don’t honestly recall how it got this way - I suspect that I registered the project as pkg-metadata, and then found out that flit normalises the project name, but I can’t find any evidence of having uploaded any artifacts under that name.

What I want now, though, is to make PyPI show my project name as pkg_metadata. And I don’t have any means of doing that, which I can locate (short of deleting and re-creating the project, which I can do right now, but I don’t want to, and which I won’t be able to do when deletions are prohibited).

That doesn’t seem to be happening for me. All of my metadata uses the normalised name, but PyPI still has the hyphenated version.

Sounds like either one of the tools in the process of getting that data to PyPI normalized the name prior to sending that data to PyPI, or Warehouse has a bug. I don’t have time at the moment to dig into which of those things is causing it.

If you wanted to dig into it, you could see what twine (it looks like you uploaded with twine?) is sending on the wire, since Warehouse doesn’t introspect the package currently; it’s trusting data that is sent in an HTML form alongside the file by twine. I believe that happens here, if you wanted to drop some debugging code in there to see what is being sent.

If your uploader is sending a form field with the name name and the value pkg_resources and Warehouse isn’t updating the display name to pkg_resources then that’s a bug in Warehouse.

Looks like it’s twine that’s doing it. From twine/package.py:

```python
def _safe_name(name: str) -> str:
    """Convert an arbitrary string to a standard distribution name.

    Any runs of non-alphanumeric/. characters are replaced with a single '-'.

    Copied from pkg_resources.safe_name for compatibility with warehouse.
    See https://github.com/pypa/twine/issues/743.
    """
    return re.sub("[^A-Za-z0-9.]+", "-", name)
```
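To see what that does to the names in this thread, here is a standalone sketch. `safe_name` below just reuses the same regular expression as the quoted snippet (with the `re` import it needs); it is a copy for illustration, not an import from twine.

```python
import re


def safe_name(name: str) -> str:
    """Same substitution as the quoted snippet: non-alphanumeric/'.' runs -> '-'."""
    return re.sub("[^A-Za-z0-9.]+", "-", name)


def normalize(name: str) -> str:
    """PEP 503 normalisation, for comparison."""
    return re.sub(r"[-_.]+", "-", name).lower()


# The underscore is rewritten to a hyphen before anything is sent, while
# dots and uppercase letters are left alone, so this is neither the
# original name nor the PEP 503 normalised one.
for name in ["pkg_metadata", "flufl.enum", "Django"]:
    print(f"{name}: safe_name={safe_name(name)} normalize={normalize(name)}")
```

So with metadata saying pkg_metadata, the name in the upload form would come out as pkg-metadata, which matches what PyPI is showing.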

So this is probably more appropriate for Amending PEP 427 (and PEP 625) on package normalization rules

@admins, is it possible to move the thread from Stop Allowing deleting things from PyPI? - #79 above in this topic over to that topic?

Yes and someone has done it!