Source distribution specification clarification

The “Source distribution format” page of the Python Packaging User Guide says, “The name and version components of the filename MUST match the values stored in the metadata contained in the file.” A few questions:

  • Which “file” is this? Is it PKG-INFO?
  • Where it says that the name must match, is that before or after canonicalisation? In other words, are hyphens permitted in the metadata name, and would it count as a “match” if the filename has underscores instead?
  • Does this requirement apply to sdists not conforming to Core Metadata 2.2?

As the filename of an sdist is not standardised, it’s not entirely clear.

It would be good to standardise this. In a practical sense, though, I’d say the name must match up to canonicalisation, because pip assumes that. I’d personally prefer it if we insisted that the filename used the canonical form, but that’s not mandated at the moment. Similarly, the version in the filename must not contain hyphens, because otherwise how do you tell where the name ends and the version starts? But again, tools might try to guess, so maybe it’s OK.
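To make "match up to canonicalisation" concrete, here is a rough sketch of the kind of check a tool could do. It is not what pip actually runs. The function names are made up for illustration, and it assumes a `.tar.gz` sdist, a version with no hyphens (so the last hyphen in the filename separates name from version), and a plain string comparison for the version rather than any version normalisation. The name normalisation is the usual PEP 503 rule.

```python
import re


def canonicalize_name(name: str) -> str:
    # PEP 503 normalisation: runs of "-", "_" and "." become a single "-",
    # and the result is lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def split_sdist_filename(filename: str) -> tuple[str, str]:
    # Hypothetical helper. Assumes a ".tar.gz" sdist whose version has no
    # hyphens, so splitting on the LAST hyphen gives (name, version).
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz sdist: {filename!r}")
    stem = filename[: -len(suffix)]
    name, sep, version = stem.rpartition("-")
    if not sep or not name or not version:
        raise ValueError(f"cannot split name and version: {filename!r}")
    return name, version


def filename_matches_metadata(
    filename: str, metadata_name: str, metadata_version: str
) -> bool:
    # Names are compared after canonicalisation; versions are compared as
    # plain strings (a simplification, not a rule from any spec).
    name, version = split_sdist_filename(filename)
    return (
        canonicalize_name(name) == canonicalize_name(metadata_name)
        and version == metadata_version
    )
```

Under those assumptions, `filename_matches_metadata("my_package-1.0.tar.gz", "my-package", "1.0")` counts as a match, and a filename like `my-package-1.0.tar.gz` still splits cleanly because the split is on the last hyphen. If the version itself contained a hyphen, the split would silently go wrong, which is exactly the ambiguity described above.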

It’s hard to mandate anything about sdists that were created before the relevant documents were written. I’d say that you can probably assume these rules work, and can reasonably choose to reject sdists that you can’t process because they don’t follow these rules, but that’s a pragmatic answer, not a formal ruling.

We’ve tried to start discussions on standardising sdists, but they’ve always got bogged down for one reason or another. Feel free to have another go, if you feel it’s worth it.