Updating PEP 376; making RECORD optional in installed .dist-info

Hello,
PEP 376, the current standard, specifies that the RECORD file is a mandatory part of the *.dist-info/ directory.

In Fedora, we pack Python packages in RPM, and we have trouble generating correct RECORD files. When a file changes (e.g. a shebang is adjusted), the hash needs to change. When a file is removed or added, that needs to be reflected in the RECORD. We don’t really have the tooling to reflect file-set changes in a file that’s also part of the set.
It would be possible to build Python-specific tooling for this, but it seems such a solution would be quite fragile. (Tell people to use python-mv instead of rm to remove files? Run a tool, ensure it runs as the very last build step, and hope Python can be privileged enough to “own” the last step?) But it could probably be done.
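To make the bookkeeping problem concrete, here is a minimal Python sketch of what a “regenerate RECORD as the last step” tool might do. It assumes sha256 and the hash encoding used in wheel/pip RECORD files (`sha256=` followed by the urlsafe base64 digest with trailing `=` removed). The names `record_hash` and `build_record` are hypothetical and not part of any existing tool.

```python
from __future__ import annotations

import base64
import csv
import hashlib
import io
from pathlib import Path


def record_hash(data: bytes) -> str:
    """Hash in RECORD style: 'sha256=' plus the urlsafe base64 digest, '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_record(root: Path, files: list[Path], record_path: Path) -> str:
    """Return RECORD contents for `files`, with paths relative to `root`
    (the directory that contains the .dist-info directory)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in sorted(files):
        if path == record_path:
            continue  # RECORD itself is listed once, at the end
        data = path.read_bytes()
        writer.writerow([path.relative_to(root).as_posix(), record_hash(data), len(data)])
    # RECORD cannot contain its own hash, so its hash and size are left empty.
    writer.writerow([record_path.relative_to(root).as_posix(), "", ""])
    return out.getvalue()
```

Any change made after this runs (a rewritten shebang, a deleted file) leaves the RECORD stale, which is why such a step would have to be the very last one in the build.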

But before we go there, I’d like to ask if RECORD is actually necessary at all.
I know it’s used to uninstall packages. But using pip (or other PyPA-standard-based tools) to uninstall or update system-installed packages always results in a giant mess. RPM already has its own tooling and file database for this.
I assume it can also be used to verify the integrity of the installation. Again, RPM has its own tools for that.
Is it useful for something else?

Could the PEP be updated to say e.g.:

The METADATA and INSTALLER files are mandatory. The REQUESTED and RECORD files may be missing.
If the RECORD file is missing, it will not be possible to use tooling to uninstall the package. An alternative way to uninstall the package should be provided.
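A sketch of how an uninstaller could react to a missing RECORD under the proposed wording. `files_to_uninstall` and `RecordMissingError` are hypothetical names; the standard library’s `importlib.metadata` behaves similarly in that `Distribution.files` returns `None` when RECORD is missing.

```python
from __future__ import annotations

import csv
from pathlib import Path


class RecordMissingError(Exception):
    """Hypothetical error: the .dist-info directory has no RECORD file."""


def files_to_uninstall(dist_info: Path) -> list[Path]:
    """Return the files listed in RECORD, resolved against the directory
    that contains the .dist-info directory (e.g. site-packages)."""
    record = dist_info / "RECORD"
    if not record.is_file():
        # No RECORD: Python tooling does not know which files belong to the
        # package, so it refuses and defers to whatever installed it (e.g. RPM).
        raise RecordMissingError(
            f"{dist_info.name} has no RECORD; uninstall it with the tool that installed it"
        )
    root = dist_info.parent
    with record.open(newline="", encoding="utf-8") as f:
        # Only the first column (the path) is needed to remove files.
        return [root / row[0] for row in csv.reader(f) if row]
```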


Also, the PEP includes some scattered information about when the hash of a file can be left out (for .pyc files and RECORD itself), but doesn’t clearly specify when this can or should be the case. It also doesn’t mention that the file size can be left empty, although the examples suggest it should be left empty whenever the hash is.
What are the rules here?
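For illustration, an integrity checker might treat the optional fields as below. This assumes an empty hash means “not recorded, skip this row” and that an empty size is simply not compared; it is one possible reading, not an answer to the question above. `verify_record` is a hypothetical name.

```python
import base64
import csv
import hashlib
from pathlib import Path


def verify_record(dist_info: Path) -> list:
    """Return RECORD paths whose recorded hash or size does not match the file on disk.

    Rows with an empty hash (e.g. RECORD itself, .pyc files) are skipped.
    An empty size field is not checked.
    """
    root = dist_info.parent
    problems = []
    with (dist_info / "RECORD").open(newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue
            path, hash_field, size_field = row
            if not hash_field:
                continue  # no hash recorded: nothing to verify
            target = root / path
            if not target.is_file():
                problems.append(path)
                continue
            data = target.read_bytes()
            algo, _, expected = hash_field.partition("=")
            digest = hashlib.new(algo, data).digest()
            actual = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
            if actual != expected or (size_field and int(size_field) != len(data)):
                problems.append(path)
    return problems
```

With this reading, a shebang rewritten after RECORD was generated shows up as a mismatch, while RECORD’s own row is skipped.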


The PEP doesn’t read like a spec. Would it be a good idea to distill the actual spec out of the document, clarify it and put it under https://www.pypa.io/en/latest/specifications/ ?


Probably, yes.

We’ve been moving things to the specifications page whenever we “touch” that area, like writing a PEP or something else.

I don’t think there’s any reason we can’t move things proactively (although I don’t know if we want a PEP saying “hey, this moved”; that’s something for @pf_moore to rule on).

As per this section of the PyPA specs page,

The preferred approach to handling corrections and clarifications for all recent interoperability specifications is to designate in the PEP that the actively maintained version of the specification is hosted in the PyPA Specifications section of the user guide

PEP 566 is an example of formally moving the “master location” for the spec to https://packaging.python.org/specifications/.

I think that we probably should do something similar for the wheel spec: write a revised version of the spec that clarifies the formal specification of a wheel file, and support it with a PEP stating that the wheel specification is now maintained at such-and-such a location on packaging.python.org, and that future changes to the spec would need to be handled as a PEP proposing updates to that document. If nothing else, having a PEP cycle that moves the spec over will also give people a chance to debate any potentially controversial “interpretations”[1].

Note: I definitely don’t think we should allow changes to a spec as fundamental as the wheel one without those changes going through a PEP, so we’d need to be careful to make it clear that any future PR changing the spec must be backed by a PEP. That’s covered in the process, but I just want to make the point explicitly.

[1] For example, mandating a particular encoding for the RECORD file…


I’m +1 on not having RECORD on the filesystem for RPM-installed packages. The installed RECORD is not the same as the RECORD in a wheel; it comes from an older PEP.

You could probably always omit the hashes on disk as well; I haven’t seen a tool that checks them.

My proposed update to “Recording installed distributions”, moving from PEP 376 to a specification under PyPA, is now proposed as PEP 627 and a PR for packaging.python.org.

See the proposed spec rendered by GitHub. PEP 627 has rationales of changes.

I might have gone too far with some changes when “distilling” a spec out of PEP 376; I’ll be happy to limit the scope of the changes if something is controversial.


The newest pip writes the REQUESTED file (congrats!), so I’ve added REQUESTED back to the proposal in peps#1549. (Until the PEPs page is updated, see the GitHub render.)

What do you think?

I don’t think it’s necessary. REQUESTED is already an optional file, and any installer can already choose not to write it. pip does not always write it either (whether pip writes this file carries meaning, but that only matters when INSTALLER is pip).

I don’t understand the purpose of the file, then. If some tools write it but some don’t, it’s impossible to know when a project can be automatically cleaned up.
But this should be discussed separately; shall I declare it out of scope for PEP 627?

Note that PEP 627 now preserves the status quo from PEP 376, and lists the issue in its deferred ideas section.

Installers should only clean up distributions they installed themselves, so pip’s clean-up logic[1] associated with the REQUESTED information only matters if the distribution also has an INSTALLER file set to that same installer. Since non-pip installers presumably wouldn’t write pip as the INSTALLER value, whether a distribution they installed contains REQUESTED does not matter to pip.

[1] Which pip does not actually implement right now; REQUESTED is only the first step toward this feature.

Where is this discussed/specified?

Hmm, I thought PEP 376 says that (it does not); I guess I was over-interpreting. pip does indeed only clean up packages installed by pip though, so I should have said this instead:

pip only cleans up distributions installed by pip, so its clean-up logic associated with the REQUESTED information only matters if the distribution also has an INSTALLER file set to pip.
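The rule above can be sketched as a hypothetical predicate. pip has no such function (it does not implement this clean-up yet), and a real clean-up would also have to confirm that nothing else depends on the distribution; `may_auto_remove` is an illustrative name only.

```python
from pathlib import Path


def may_auto_remove(dist_info: Path, installer_name: str = "pip") -> bool:
    """Hypothetical check: may `installer_name` treat this distribution as an
    automatically installed dependency that it is allowed to clean up?"""
    installer_file = dist_info / "INSTALLER"
    # Anything not marked as installed by this installer is left alone,
    # including distributions with no INSTALLER file at all.
    if not installer_file.is_file():
        return False
    if installer_file.read_text(encoding="utf-8").strip() != installer_name:
        return False
    # REQUESTED present means the user asked for it directly: keep it.
    return not (dist_info / "REQUESTED").exists()
```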

Trying to give some kind of semantics to INSTALLER has mostly happened on various GitHub issues and at the in-person packaging summits rather than being part of the original PEP 376.

Trawling around a bit brought me back to Playing nice with external package managers, which reminded me that there are good reasons pip isn’t entirely strict when it comes to respecting INSTALLER. Anyway, I think we can declare resolving that mess out of scope for Petr’s PEP, and just go for the simple change of making RECORD optional (which I suspect will mostly solve the problem the MANAGED-BY idea was aimed at solving anyway: with RECORD missing, no Python-level utility is going to be able to uninstall the project).

Edit to make my view on the proposal more explicit: big +1 from me. It makes life easier for system package managers, and provides a clear and obvious “this is managed externally, so leave it alone” signal to Python level tooling.