Draft PEP: Recording the source hash of installed distribution

pf_moore · July 6, 2020, 7:45am

You can’t really say that in a PEP though, without defining what “freeze” is. This is where I think this PEP is too closely tied to pip, in current terms.

Things that have no standard meaning at the moment:

Freezing
Requirements files
Hash mode

I could, for example, write a script that introspected my site-packages, read the HASH files, and wrote a file that included the names of everything and a hash for pip. Is that script allowed by the PEP? (Hint: It should be, because you don’t know what I want and can’t mandate that I follow any rules). If it is, why? Because it’s not a “tool”? Because the operation isn’t a “freeze”? Because the hashes were found but I chose not to write them?

I know this feels like nit-picking (and it is!) but insufficiently precise standards can be a real problem for implementors.

I’d suggest that you strip back the scope of this PEP and concentrate solely on something that:

Allows (but doesn’t require) the existence of a HASH file in .dist-info.
States what it will contain, if it exists.

Leave handling of cases where it doesn’t exist, and deciding whether to write it or not, to the individual tools (pip, other installers, etc). That way you don’t have to think about questions like those I raise above.

Some further thoughts:

It’s not actually clear to me whether PEP 376 allows arbitrary files in .dist-info (see what I said above about unclear standards ). If arbitrary files are allowed, pip could just use an implementation-defined HASH file. But that risks clashes with other tools - having a namespacing mechanism for tool-specific files would be better (as would clarifying the intent of PEP 376!)
This PEP suggests recording the hash of the distribution source (where the source is a single file) but it doesn’t record what that file was, or where it came from. pip freeze might not need this information, but other tools might. Has this been considered? Maybe at least the source filename (if not the actual location) would be useful?
We’re getting very much into the area of lock files here (after all, requirement files with hashes are basically a form of lock file) so this discussion should probably be taken into account.