Hi everyone,
I would like to propose a small amendment to the PEP 610 direct_url.json data structure to allow multiple hashes of different algorithms in the archive_info dictionary.
The text below, in PEP format, explains the why and how.
Looking forward to reading your feedback.
Abstract
This PEP amends :pep:610 direct_url.json data structure to allow multiple hashes of different algorithms in the archive_info dictionary.
Motivation
When :pep:610 was written, it allowed only one hash in the archive_info dictionary. This limitation comes from the original design, which was influenced by the URL examples in :pep:503 and :pep:440.
The direct_url.json data structure is proving useful beyond its original intent, as a generic abstract representation of a source URL in the Python packaging ecosystem. In particular, it is used in the pip inspect and pip install --report output formats.
Users of pip inspect and pip install --report have suggested that it would be useful to report multiple hashes when they are available. This feature is also useful in the original context of :pep:610.
To avoid losing information when hashes of several algorithms are available, and to allow more flexibility, this PEP amends the direct_url.json format to allow multiple hashes in the archive_info dictionary.
Rationale
The Specification below extends the original in a backward-compatible manner, with a long-term goal of phasing out the single hash key.
The specification takes inspiration from the corresponding section of :pep:691 and is compatible with it.
Specification
A new, optional, hashes key is added to the archive_info dictionary. It is a dictionary mapping a hash name to a hex-encoded digest of the file. Multiple hashes can be included, and it is up to the consumer to decide what to do with them (it may validate all of them, a subset of them, or none at all). These hash names SHOULD always be normalized to lowercase.
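As an illustration only, a producer could build the hashes dictionary with a sketch like the one below. The compute_hashes helper is hypothetical (it is not part of this PEP or of pip); it lowercases the requested names and reads the file once, feeding every hasher from the same chunks.

```python
import hashlib


def compute_hashes(path, algorithms=("sha256",)):
    """Return {lowercase hash name: hex-encoded digest} for the file at `path`.

    Hypothetical helper for illustration; not defined by the PEP or by pip.
    """
    # Normalize hash names to lowercase, as the specification recommends.
    names = sorted({name.lower() for name in algorithms})
    hashers = {name: hashlib.new(name) for name in names}
    with open(path, "rb") as f:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, compute_hashes("demo-1.0.tar.gz", ["SHA256", "sha512"]) (a hypothetical file name) would return a dictionary with the keys sha256 and sha512, ready to be stored under archive_info["hashes"].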
The hashes key SHOULD be present, and it is recommended that at least one secure, guaranteed-to-be-available hash is included.
Any hash algorithm available via hashlib (specifically any that can be passed to hashlib.new() and do not require additional parameters) can be used as a key for the hashes dictionary. At least one secure algorithm from hashlib.algorithms_guaranteed SHOULD always be included. At the time of this PEP, sha256 specifically is recommended.
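A minimal sketch of how a tool might check a hashes dictionary against these rules follows. The is_usable_hash_name and check_hashes helpers are hypothetical, and RECOMMENDED_SECURE is an assumed, illustrative set of "secure" algorithms; this PEP itself only recommends sha256. Algorithms such as shake_128 can be passed to hashlib.new(), but their hexdigest() requires a length argument, so the sketch treats them as needing additional parameters.

```python
import hashlib

# Assumption: an illustrative set of secure algorithms; the PEP names only sha256.
RECOMMENDED_SECURE = {"sha256", "sha384", "sha512", "sha3_256", "sha3_384", "sha3_512"}


def is_usable_hash_name(name):
    """True if `name` works with hashlib.new() and needs no extra parameters."""
    try:
        hasher = hashlib.new(name)
    except ValueError:
        # Unknown or unsupported algorithm.
        return False
    try:
        # SHAKE digests require a `length` argument, so they fail here.
        hasher.hexdigest()
    except TypeError:
        return False
    return True


def check_hashes(hashes):
    """Return a list of problems found in a `hashes` dictionary (empty if none)."""
    problems = []
    for name in hashes:
        if name != name.lower():
            problems.append(f"hash name {name!r} is not lowercase")
        if not is_usable_hash_name(name):
            problems.append(f"hash name {name!r} is not usable with hashlib.new()")
    if not set(hashes) & RECOMMENDED_SECURE & hashlib.algorithms_guaranteed:
        problems.append("no secure algorithm from hashlib.algorithms_guaranteed")
    return problems
```

A dictionary such as {"sha256": "<digest>"} passes this check, while one containing only md5 is reported as lacking a secure guaranteed algorithm.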
When both the hash and hashes keys are present, the hash represented in the hash key MUST also be present in the hashes dictionary, so that consumers can use the hashes key alone when it is present and fall back to the hash key otherwise.
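The consumer-side fallback can be sketched as follows, assuming the legacy hash value uses the <hash-algorithm>=<expected-hash> form defined by :pep:610. The get_hashes and hash_is_consistent names are hypothetical, and the consistency check compares names and digests exactly, without normalizing case.

```python
def get_hashes(archive_info):
    """Return the recorded hashes as {name: hexdigest}.

    Prefer the `hashes` key; otherwise fall back to the legacy `hash` key,
    whose value has the form "<hash-algorithm>=<expected-hash>".
    """
    if "hashes" in archive_info:
        return dict(archive_info["hashes"])
    legacy = archive_info.get("hash")
    if legacy is None:
        return {}
    name, sep, digest = legacy.partition("=")
    if not sep:
        raise ValueError(f"malformed hash value: {legacy!r}")
    return {name: digest}


def hash_is_consistent(archive_info):
    """Check the MUST rule: when both keys exist, `hash` also appears in `hashes`."""
    if "hash" not in archive_info or "hashes" not in archive_info:
        return True
    name, _, digest = archive_info["hash"].partition("=")
    return archive_info["hashes"].get(name) == digest
```

Because of the MUST rule, a consumer that reads only hashes when it is present never misses the value carried by hash.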
Backwards Compatibility
This Specification is backwards compatible with the original :pep:610 specification.
This Specification prepares for a long-term goal of abandoning the hash key.
Therefore, producers of the data structure SHOULD emit the hashes key whether one or multiple hashes are available. Producers SHOULD continue to emit the hash key in contexts where they did so before, so as to keep backwards compatibility for existing clients.
New implementations MUST emit the hashes key whenever they record hashes, and MAY choose not to emit the hash key.
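A sketch of a producer following these recommendations is shown below, assuming it keeps emitting the legacy hash key from the sha256 entry of hashes. The make_archive_info helper, its parameters, and the example URL are hypothetical.

```python
import hashlib
import json


def make_archive_info(hashes, emit_legacy_hash=True, legacy_algorithm="sha256"):
    """Build an archive_info dict with `hashes` and, optionally, the legacy `hash`.

    Hypothetical helper: `emit_legacy_hash` models an existing producer that
    keeps emitting `hash` for older consumers; a new implementation may pass False.
    """
    # Hash names SHOULD be normalized to lowercase.
    normalized = {name.lower(): digest for name, digest in hashes.items()}
    archive_info = {"hashes": normalized}
    if emit_legacy_hash and legacy_algorithm in normalized:
        # Reuse an entry from `hashes`, so the legacy hash is also in `hashes`.
        archive_info["hash"] = f"{legacy_algorithm}={normalized[legacy_algorithm]}"
    return archive_info


# Usage with a hypothetical URL and the digest of some example bytes.
direct_url = {
    "url": "https://example.com/demo-1.0.tar.gz",
    "archive_info": make_archive_info({"SHA256": hashlib.sha256(b"demo").hexdigest()}),
}
print(json.dumps(direct_url, indent=2))
```

Deriving hash from hashes, rather than computing it separately, guarantees that the two keys cannot disagree.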
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.