Hi everyone,
I would like to propose a small amendment to the PEP 610 direct_url.json
data structure to allow multiple hashes of different algorithms in the archive_info
dictionary.
The text below, in PEP format, explains the why and how.
Looking forward to reading your feedback.
Abstract
This PEP amends the :pep:610 direct_url.json data structure to allow multiple hashes of different algorithms in the archive_info dictionary.
Motivation
When :pep:610 was written, it allowed only one hash in the archive_info dictionary. This is a limitation of the original design, which was influenced by the URL examples in :pep:503 and :pep:440.
The direct_url.json data structure is proving useful beyond its original intent, as a generic abstract representation of a source URL in the Python packaging ecosystem. In particular, it is used in the pip inspect and pip install --report formats.
Users of pip inspect and pip install --report have suggested it would be useful to report multiple hashes if available. This feature is also useful in the original context of :pep:610.
So, to avoid losing information about multiple hashes of different types and to allow for more flexibility, this PEP amends the direct_url.json format to allow multiple hashes in the archive_info dictionary.
Rationale
The Specification below extends the original in a backward-compatible manner, with a long-term goal of phasing out the single hash key.
The specification takes inspiration from the corresponding section of :pep:691 and is compatible with it.
Specification
A new, optional hashes key is added to the archive_info dictionary. It is a dictionary mapping a hash name to a hex-encoded digest of the file. Multiple hashes can be included, and it is up to the consumer to decide what to do with multiple hashes (it may validate all of them or a subset of them, or nothing at all). These hash names SHOULD always be normalized to lowercase.
The hashes key SHOULD be present, and it is recommended that at least one secure, guaranteed-to-be-available hash is included.
Any hash algorithm available via hashlib (specifically any that can be passed to hashlib.new() and do not require additional parameters) can be used as a key for the hashes dictionary. At least one secure algorithm from hashlib.algorithms_guaranteed SHOULD always be included. At the time of this PEP, sha256 specifically is recommended.
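As an illustration of the paragraphs above, here is a minimal sketch of how a producer might compute a hashes dictionary with hashlib. The function names compute_hashes and build_archive_info are hypothetical and are not part of any existing tool; the choice of sha256 plus sha512 is only an example.

```python
import hashlib


def compute_hashes(data: bytes, algorithms=("sha256", "sha512")) -> dict:
    """Return a mapping of lowercase hash name to hex-encoded digest.

    Only algorithms that hashlib.new() accepts without extra parameters
    are suitable here (for example, not the variable-length shake_* ones).
    """
    hashes = {}
    for name in algorithms:
        normalized = name.lower()  # hash names SHOULD be lowercase
        hashes[normalized] = hashlib.new(normalized, data).hexdigest()
    return hashes


def build_archive_info(data: bytes, algorithms=("sha256", "sha512")) -> dict:
    """Build an archive_info dictionary carrying the proposed hashes key."""
    # Hypothetical helper: a real producer would typically hash the archive
    # file it downloaded rather than an in-memory bytes object.
    return {"hashes": compute_hashes(data, algorithms)}
```

Calling build_archive_info(b"example") would yield a dictionary with a single hashes key whose value has one entry per requested algorithm.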
When both the hash and hashes keys are present, the hash represented in the hash key MUST also be present in the hashes dictionary, so consumers can consider the hashes key only if it is present, and fall back to hash otherwise.
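The consumer-side rule above can be sketched as follows. This is only an illustration under stated assumptions: the function names effective_hashes and legacy_hash_is_consistent are hypothetical, the legacy hash value is assumed to use the "<hash-algorithm>=<expected-hash>" form from the original :pep:610 text, and comparing digests case-insensitively is a choice made for this sketch, not something the specification requires.

```python
def effective_hashes(archive_info: dict) -> dict:
    """Return the hashes a consumer should consider, as {name: hex digest}.

    Prefer the hashes key when present; otherwise fall back to the
    single legacy hash key.
    """
    hashes = archive_info.get("hashes")
    if hashes is not None:
        return {name.lower(): digest for name, digest in hashes.items()}
    legacy = archive_info.get("hash")
    if legacy:
        algorithm, sep, digest = legacy.partition("=")
        if sep and algorithm and digest:
            return {algorithm.lower(): digest}
    return {}


def legacy_hash_is_consistent(archive_info: dict) -> bool:
    """Check that the legacy hash, if any, also appears in hashes."""
    legacy = archive_info.get("hash")
    hashes = archive_info.get("hashes")
    if legacy is None or hashes is None:
        return True  # nothing to cross-check
    algorithm, _, digest = legacy.partition("=")
    normalized = {name.lower(): value.lower() for name, value in hashes.items()}
    # Assumption of this sketch: hex digests are compared case-insensitively.
    return normalized.get(algorithm.lower()) == digest.lower()
```

A consumer that only needs one digest could then look up its preferred algorithm in the returned dictionary and ignore the rest.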
Backwards Compatibility
This Specification is backwards compatible with the original :pep:610 specification.
This Specification prepares for a long-term goal of abandoning the hash key.
Therefore, producers of the data structure SHOULD emit the hashes key whether one or multiple hashes are available. Producers SHOULD continue to emit the hash key in contexts where they did so before, so as to keep backwards compatibility for existing clients.
New implementations MUST emit the hashes key whenever they want to record hash(es) and MAY choose to not emit the hash key.
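To make the producer guidance above concrete, here is a rough sketch of a producer that always emits hashes and optionally keeps emitting the legacy hash key for existing clients. The function make_archive_info and its parameters are hypothetical, and the legacy value is assumed to follow the "<hash-algorithm>=<expected-hash>" form of the original :pep:610 text.

```python
def make_archive_info(hashes: dict, emit_legacy_hash: bool = True,
                      legacy_algorithm: str = "sha256") -> dict:
    """Build archive_info with hashes, plus the legacy hash if requested.

    hashes maps lowercase hash names to hex-encoded digests.
    """
    info = {"hashes": dict(hashes)}
    # Existing producers keep the single hash key for older consumers;
    # new implementations may pass emit_legacy_hash=False.
    if emit_legacy_hash and legacy_algorithm in hashes:
        # The legacy value is taken from hashes, so it is always present there too.
        info["hash"] = f"{legacy_algorithm}={hashes[legacy_algorithm]}"
    return info
```

Deriving the legacy value from the hashes dictionary, rather than computing it separately, is one simple way to keep the two keys from disagreeing.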
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.