Draft PEP: Amendment to the PEP 610 direct_url.json data structure

Hi everyone,

I would like to propose a small amendment to the PEP 610 direct_url.json data structure to allow multiple hashes of different algorithms in the archive_info dictionary.

The text below, in PEP format, explains the why and how.

Looking forward to reading your feedback.


Abstract

This PEP amends the :pep:610 direct_url.json data structure to allow multiple hashes of different algorithms in the archive_info dictionary.

Motivation

When :pep:610 was written, it allowed only one hash in the archive_info dictionary. This is a limitation of the original design, which was influenced by the URL examples in :pep:503 and :pep:440.

The direct_url.json data structure is proving useful beyond its original intent, as a generic abstract representation of a source URL in the Python packaging ecosystem. In particular it is used in the pip inspect and pip install --report formats.

Users of pip inspect and pip install --report have suggested it would be useful to report multiple hashes if available. This feature is also useful in the original context of :pep:610.

So, to avoid losing information about multiple hashes of different types and to allow for more flexibility, this PEP amends the direct_url.json format to allow multiple hashes in the archive_info dictionary.

Rationale

The Specification below extends the original in a backward-compatible manner, with a long-term goal of phasing out the single hash key.

The specification takes inspiration from the corresponding section of :pep:691 and is compatible with it.

Specification

A new, optional, hashes key is added to the archive_info dictionary. It is a dictionary mapping a hash name to a hex encoded digest of the file. Multiple hashes can be included, and it is up to the consumer to decide what to do with multiple hashes (it may validate all of them or a subset of them, or nothing at all). These hash names SHOULD always be normalized to be lowercase.
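As a rough illustration of the shape described above, here is a minimal sketch of an archive_info dictionary carrying a hashes mapping. The URL, the placeholder payload and the normalize_hashes helper are assumptions made up for this example; they are not part of the proposal.

```python
import hashlib
import json


def normalize_hashes(hashes):
    """Return a copy of the mapping with lowercase hash names (hypothetical helper)."""
    return {name.lower(): digest for name, digest in hashes.items()}


# Placeholder bytes standing in for the real archive content.
payload = b"placeholder archive bytes"

direct_url = {
    # Made-up URL, for illustration only.
    "url": "https://example.com/demo-1.0.tar.gz",
    "archive_info": {
        "hashes": normalize_hashes(
            {
                "SHA256": hashlib.sha256(payload).hexdigest(),
                "sha512": hashlib.sha512(payload).hexdigest(),
            }
        )
    },
}

print(json.dumps(direct_url, indent=2))
```

The mixed-case "SHA256" key is only there to show the lowercase normalization; a producer would normally use lowercase names from the start.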

The hashes key SHOULD be present, and it is recommended that at least one secure, guaranteed-to-be-available hash is included.

Any hash algorithm available via hashlib (specifically any that can be passed to hashlib.new() and do not require additional parameters) can be used as a key for the hashes dictionary. At least one secure algorithm from hashlib.algorithms_guaranteed SHOULD always be included. At the time of this PEP, sha256 specifically is recommended.
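One possible way for a producer to build such a mapping is sketched below. The compute_hashes function and its default argument are hypothetical. The sketch relies on hashlib.new() raising ValueError for unknown names, and on variable-length algorithms such as shake_128 needing a length argument to hexdigest(), which is what makes them unsuitable here.

```python
import hashlib


def compute_hashes(data, algorithms=("sha256",)):
    """Compute hex digests of `data` for each named algorithm (hypothetical helper)."""
    hashes = {}
    for name in algorithms:
        name = name.lower()
        # hashlib.new() raises ValueError if the algorithm is not available.
        hasher = hashlib.new(name)
        hasher.update(data)
        try:
            digest = hasher.hexdigest()
        except TypeError as exc:
            # Algorithms such as shake_128 require a length parameter,
            # so they do not qualify as keys of the hashes dictionary.
            raise ValueError(f"{name} requires additional parameters") from exc
        hashes[name] = digest
    return hashes


result = compute_hashes(b"abc", algorithms=("sha256", "sha512"))
print(sorted(result))
```

Including sha256 by default follows the recommendation above; callers can add further algorithms as needed.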

When both the hash and hashes keys are present, the hash represented in the hash key MUST also be present in the hashes dictionary, so consumers can consider the hashes key only if it is present, and fall back to hash otherwise.
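The fallback rule can be sketched on the consumer side as follows. The get_hashes function is a hypothetical name, and the sketch assumes the legacy hash value uses the "<hash name>=<hex digest>" string form defined by PEP 610.

```python
def get_hashes(archive_info):
    """Return a {hash name: hex digest} mapping from an archive_info dictionary.

    Hypothetical helper: prefers the hashes key and falls back to the
    legacy hash key only when hashes is absent.
    """
    hashes = archive_info.get("hashes")
    if hashes is not None:
        return dict(hashes)
    legacy = archive_info.get("hash")
    if legacy is None:
        return {}
    # Assumed legacy format: "<hash name>=<hex digest>".
    name, sep, digest = legacy.partition("=")
    if not sep:
        raise ValueError(f"malformed hash value: {legacy!r}")
    return {name.lower(): digest}


# Placeholder digests, for illustration only.
print(get_hashes({"hash": "sha256=abc123"}))
print(get_hashes({"hash": "sha256=abc123", "hashes": {"sha256": "abc123", "md5": "def456"}}))
```

Because the hash value must also appear in hashes when both keys are present, a consumer that reads hashes first never loses information.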

Backwards Compatibility

This Specification is backwards compatible with the original :pep:610 specification.

This Specification prepares for a long-term goal of abandoning the hash key.

Therefore, producers of the data structure SHOULD emit the hashes key whether one or multiple hashes are available. Producers SHOULD continue to emit the hash key in contexts where they did so before, so as to keep backwards compatibility for existing clients.

New implementations MUST emit the hashes key whenever they want to record hash(es) and MAY choose not to emit the hash key.
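A producer following these recommendations might look roughly like the sketch below. The build_archive_info function, its emit_legacy_hash flag and the preferred parameter are assumptions for illustration, and the legacy hash value is assumed to use the "<hash name>=<hex digest>" form from PEP 610.

```python
def build_archive_info(hashes, emit_legacy_hash=True, preferred="sha256"):
    """Build an archive_info dictionary from a {name: hex digest} mapping.

    Hypothetical helper: always emits hashes when any hash is known, and
    optionally emits the legacy hash key for existing clients.
    """
    info = {}
    if not hashes:
        return info
    normalized = {name.lower(): digest for name, digest in hashes.items()}
    info["hashes"] = normalized
    if emit_legacy_hash:
        # The legacy hash must also be present in hashes, so pick it from there.
        name = preferred if preferred in normalized else next(iter(normalized))
        info["hash"] = f"{name}={normalized[name]}"
    return info


# Placeholder digests, for illustration only.
print(build_archive_info({"sha256": "abc123", "md5": "def456"}))
print(build_archive_info({"sha256": "abc123"}, emit_legacy_hash=False))
```

Taking the legacy value from the hashes mapping itself keeps the two keys consistent by construction.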

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

In the absence of anyone having raised any objections, I’m happy to consider this proposal good to go.

Does anyone feel that it needs to be formally raised as a PEP? If not, I propose that we accept it as a text-only change, as per the standard process. I personally don’t think the change is disruptive enough to warrant a new PEP.

I’ll leave this for a further week, and assuming no objections, we can make the change directly as a PR to the spec.

As the person who first asked for this, doing this via a PR is something I’m 100% on board for.

This has had a week with no objections, so I’m going to formally say this can be implemented via a PR to the spec.

@sbidoul can you please ensure that the PR includes a changelog section in the spec that links back to this thread, so that the approval for the change is easy to locate? And ping me on the PR so that I can approve it there.

The PR is pypa/packaging.python.org Pull Request #1198 by sbidoul, "Add hashes key to the Direct URL data structure".

I took the liberty of extracting the data structure specification into a standalone document, so it is easier to refer to when used in contexts other than PEP 610.
