PEP 376 vs. RPM packaging: Is RECORD necessary?

PEP 376, the current standard, specifies that the RECORD file is a mandatory part of the *.dist-info/ directory.

In Fedora, we package Python projects as RPMs, and we have trouble generating correct RECORD files. When a file changes (e.g. a shebang is adjusted), its hash needs to change. When a file is removed or added, that needs to be reflected in RECORD. We don’t really have the tooling to reflect file-set changes in a file that’s also part of the set.
It could be possible to build Python-specific tooling for this. It seems the solution would be quite fragile. (Tell people to use python-mv instead of rm to remove files? Run a tool, ensuring it runs as the very last build step and hoping Python can be privileged enough to “own” the last step?) But it could probably be done.
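
To make the shebang problem concrete, here is a minimal sketch of computing a single RECORD-style row. It assumes the `path,sha256=<digest>,<size>` layout with the URL-safe, unpadded base64 digest that wheel RECORD files use; `record_entry` and `tool.py` are made-up names for illustration, not part of any existing tool.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def record_entry(root: Path, relative_path: str) -> str:
    """Return one RECORD-style row: path,algorithm=digest,size.

    Simplification: paths containing commas or quotes would need CSV quoting.
    """
    data = (root / relative_path).read_bytes()
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with the trailing "=" padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{relative_path},sha256={encoded},{len(data)}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    script = root / "tool.py"  # hypothetical file name

    script.write_bytes(b"#!/usr/bin/env python\nprint('hi')\n")
    before = record_entry(root, "tool.py")

    # A later build step rewrites the shebang.
    script.write_bytes(b"#!/usr/bin/python3\nprint('hi')\n")
    after = record_entry(root, "tool.py")

    print(before)
    print(after)
```

Both the digest and the size differ between the two rows, so any RECORD written before the shebang rewrite is stale afterwards.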

But before we go there, I’d like to ask if RECORD is actually necessary at all.
I know it’s used to uninstall packages. But using pip (or other PyPA-standard-based tools) to uninstall or update system-installed packages always results in a giant mess. RPM already has its own tooling and file database for this.
I assume it can also be used to verify the integrity of an installation. Again, RPM has its own tools.
Is it useful for something else?
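
For reference, the integrity-check use case could look roughly like the sketch below. It is an illustration only, not how any particular tool does it: `find_mismatches` is a hypothetical helper, the RECORD rows are assumed to use the `algorithm=digest` form with URL-safe unpadded base64, and rows without a hash are skipped.

```python
import base64
import csv
import hashlib
import tempfile
from pathlib import Path


def find_mismatches(root: Path, record_text: str) -> list[str]:
    """Return paths whose on-disk contents no longer match their RECORD row."""
    mismatches = []
    for path, hash_field, size_field in csv.reader(record_text.splitlines()):
        if not hash_field:
            continue  # no hash recorded (e.g. RECORD itself), nothing to check
        algorithm, _, expected = hash_field.partition("=")
        target = root / path
        if not target.is_file():
            mismatches.append(path)  # listed in RECORD but missing on disk
            continue
        data = target.read_bytes()
        digest = hashlib.new(algorithm, data).digest()
        actual = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        if actual != expected or (size_field and int(size_field) != len(data)):
            mismatches.append(path)
    return mismatches


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mod.py").write_bytes(b"x = 1\n")  # hypothetical module
    digest = hashlib.sha256(b"x = 1\n").digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    record_text = f"mod.py,sha256={encoded},6\npkg-1.0.dist-info/RECORD,,\n"

    clean = find_mismatches(root, record_text)

    # Simulate a post-install modification of the file.
    (root / "mod.py").write_bytes(b"x = 2\n")
    changed = find_mismatches(root, record_text)

    print(clean, changed)
```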

Could the PEP be updated to say e.g.:

The METADATA and INSTALLER files are mandatory. The REQUESTED and RECORD files may be missing.
If the RECORD file is missing, it will not be possible to use tooling to uninstall the package. An alternative way to uninstall the package should be provided.
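
As a rough illustration of what "RECORD may be missing" means for consumers: the standard library's `importlib.metadata` already reports a missing file list as `None`. The sketch below builds a throwaway `example-1.0.dist-info` directory (a made-up name) and loads it with `Distribution.at`.

```python
import tempfile
from importlib.metadata import Distribution
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical installed distribution with METADATA but no RECORD.
    dist_info = Path(tmp) / "example-1.0.dist-info"
    dist_info.mkdir()
    (dist_info / "METADATA").write_text(
        "Metadata-Version: 2.1\nName: example\nVersion: 1.0\n"
    )

    # Without RECORD, there is no file list for an uninstaller to work from.
    files_without_record = Distribution.at(dist_info).files

    # Once RECORD exists, the file list becomes available.
    (dist_info / "RECORD").write_text("example-1.0.dist-info/METADATA,,\n")
    files_with_record = Distribution.at(dist_info).files

    print(files_without_record)
    print(files_with_record)
```

A tool that wants to refuse uninstalling such a package could check for `None` here and point the user at the system package manager instead.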

Also, the PEP includes some scattered info of when the hash of a file can be left out (for .pyc files and RECORD itself), but doesn’t clearly specify when this can/should be the case. It also doesn’t mention the file size can be left empty, but from the examples it looks like it should be left empty whenever the hash is.
What are the rules here?
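
To show the cases I mean, here is a small sketch that lists the rows of a RECORD file that carry no hash, and whether they still carry a size. The sample content is invented (including the placeholder digest), modelled on the kind of rows described above; it is not copied from the PEP.

```python
import csv

# Hypothetical RECORD content: one hashed file, one .pyc, and RECORD itself.
SAMPLE_RECORD = """\
pkg/__init__.py,sha256=placeholder-digest,120
pkg/__pycache__/__init__.cpython-39.pyc,,
pkg-1.0.dist-info/RECORD,,
"""


def rows_without_hash(record_text):
    """Return (path, has_size) for every row whose hash field is empty."""
    result = []
    for path, hash_field, size_field in csv.reader(record_text.splitlines()):
        if not hash_field:
            result.append((path, bool(size_field)))
    return result


unhashed = rows_without_hash(SAMPLE_RECORD)
for path, has_size in unhashed:
    print(path, "size present" if has_size else "size empty")
```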

The PEP doesn’t read like a spec. Would it be a good idea to distill the actual spec out of the document, clarify it and put it under ?


Probably, yes.

We’ve been moving things to the specifications page whenever we “touch” that area, like writing a PEP or something else.

I don’t think there’s any reason we can’t move things proactively (although idk if we want a PEP saying “hey, this moved” - that’s something for @pf_moore to decide on). :stuck_out_tongue:

As per this section of the PyPA specs page,

The preferred approach to handling corrections and clarifications for all recent interoperability specifications is to designate in the PEP that the actively maintained version of the specification is hosted in the PyPA Specifications section of the user guide

PEP 566 is an example of formally moving the “master location” for the spec to

I think that we probably should do something similar for the wheel spec - write a revised version of the spec that clarifies the formal specification of a wheel file, and support it with a PEP stating that the wheel specification is now maintained at such-and-such a location on, and future changes to the spec would need to be handled as a PEP proposing updates to that document. If nothing else, having a PEP cycle that moves the spec over will also give people a chance to debate any potentially controversial "interpretations" [1].

Note: I definitely don’t think we should allow changes to a spec as fundamental as the wheel one without those changes going through a PEP, so we’d need to be careful to make it clear that any future PR changing the spec must be backed by a PEP. That’s covered in the process, but I just want to make the point explicitly.

[1] For example, mandating a particular encoding for the RECORD file…


I’m +1 on having no RECORD on the filesystem for RPM. That file is not the same as the RECORD in a wheel; it comes from an older PEP.

You could probably always omit the hashes on disk as well. Haven’t seen a tool that checks.