PEP 710 - Recording the provenance of installed packages

Pinging this thread again since there hasn’t been any pushback on the suggested changes. If we get those incorporated and then push the PEP draft to be review-ready, does that seem like a good plan? @fridex, if you’d like me to add the updates to the PEP, I’m happy to; just let me know.

There is one pending PR I need to revisit (will do ASAP). If you have time, feel free to contribute - I’m more than happy if people want to get involved.

Also, if anyone else has comments to be incorporated, please feel free to raise them.

My point here is not to reuse PEP 610 itself, but rather the Direct URL data structure specification, which is a standalone page independent of PEP 610.

For instance the parts about user:password and hashes could be dropped and the specification part of the PEP reduced to something like “a direct URL data structure, restricted to Archive URLs”.
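To illustrate what “restricted to Archive URLs” could mean in practice, here is a minimal sketch of a check over an already-parsed record. The key names url, archive_info, vcs_info and dir_info come from the Direct URL data structure; the function name and the example URL are made up for illustration, and this is not how pip does it.

```python
import json


def is_archive_direct_url(data: dict) -> bool:
    """Return True if `data` looks like a Direct URL record for an archive.

    Hypothetical helper: it only checks the shape discussed in this thread,
    not the full Direct URL specification.
    """
    if not isinstance(data.get("url"), str):
        return False
    # The restriction discussed here would rule out VCS and local directory
    # records, leaving archive_info as the only accepted kind.
    if "vcs_info" in data or "dir_info" in data:
        return False
    return isinstance(data.get("archive_info"), dict)


# Example record; the URL and the digest placeholder are illustrative only.
example = json.loads("""
{
  "url": "https://example.org/packages/demo-1.0-py3-none-any.whl",
  "archive_info": {"hashes": {"sha256": "<hex digest>"}}
}
""")

print(is_archive_direct_url(example))
```

A record carrying vcs_info or dir_info would be rejected by this check, which is the whole of the restriction.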

I insist on this partly because your prototype implementation in pip reuses the direct_url implementation. If we intend a common implementation to be used, there are benefits to having a single spec: it keeps spec and code in sync if and when the spec evolves, and it avoids problems where a change to PEP 610 related implementation details happens not to be compatible with PEP 710.

Alternatively, we may consider that reusing the direct URL data structure is overkill for PEP 710, because all we need here is an archive URL and hashes, and keep completely independent code bases for PEP 610 and PEP 710. In that case, it may be worth departing more radically from the existing data structure, to simplify and avoid potential confusion. For instance, the archive_info field may not be necessary here: PEP 610 needed it to discriminate between various kinds of URLs (archives, VCS, local directories), a requirement that is absent from PEP 710, which needs to support only one kind of URL.
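To make this second option concrete, here is a rough sketch comparing the nested shape with a flatter one. The flat layout (a top-level hashes key, no archive_info) is purely hypothetical and only meant to show what “departing more radically” might look like; nobody has proposed this exact format, and the function name is invented.

```python
def to_flat(record: dict) -> dict:
    """Convert a nested archive record into a flat, hypothetical layout.

    Input shape follows the Direct URL data structure (url + archive_info).
    Output shape is an assumption made for this example only.
    """
    archive_info = record.get("archive_info") or {}
    return {
        "url": record["url"],
        # Copy the hashes so the flat record does not alias the original.
        "hashes": dict(archive_info.get("hashes", {})),
    }


# Illustrative input; the URL and the digest placeholder are not real.
nested = {
    "url": "https://example.org/packages/demo-1.0.tar.gz",
    "archive_info": {"hashes": {"sha256": "<hex digest>"}},
}

flat = to_flat(nested)
print(flat)
```

With only one kind of URL to support, the flat record loses nothing, but it can no longer be read by code written for the Direct URL data structure.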

@sbidoul I noted this in the GitHub PR as well, but if we’re going to reference the Direct URL data structure and we want an additional index_url field, wouldn’t that require updating the Direct URL data structure to include that field? I believe having index_url is a good idea: it distinguishes mirrors that serve their own content from mirrors that forward to upstream, and it captures user intent more clearly.

If Direct URL data structure handlers are resilient to additional new fields, I think it may be worth updating the structure so we’re not reinventing more formats. The Direct URL archive type meets almost all of the needs of this PEP already.
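As a sketch of what “resilient to additional new fields” might look like on the reading side, the parser below picks out the keys it knows and ignores the rest. Placing index_url at the top level of the record is an assumption made for this example; where the field would actually live has not been decided here. The class and function names are also invented, and this is not pip’s implementation.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ArchiveRecord:
    url: str
    hashes: Dict[str, str] = field(default_factory=dict)
    # Hypothetical field under discussion; absent from existing records.
    index_url: Optional[str] = None


def parse_record(text: str) -> ArchiveRecord:
    """Parse a record, tolerating keys this reader does not know about."""
    data = json.loads(text)
    archive_info = data.get("archive_info", {})
    return ArchiveRecord(
        url=data["url"],
        hashes=dict(archive_info.get("hashes", {})),
        index_url=data.get("index_url"),  # None when the field is missing
    )


# Illustrative records; the URLs are placeholders.
old_style = '{"url": "https://example.org/demo-1.0.tar.gz", "archive_info": {}}'
new_style = (
    '{"url": "https://example.org/demo-1.0.tar.gz", "archive_info": {}, '
    '"index_url": "https://example.org/simple/", "some_future_key": 1}'
)

print(parse_record(old_style))
print(parse_record(new_style))
```

A reader written this way accepts both old and extended records, whereas a reader that rejects unknown keys would break as soon as the field is added.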

+1 on including the index_url field, seems like a valuable addition.

The prototype reused the Direct URL implementation in pip. Nevertheless, I’m not sure we should be influenced by this implementation detail; we could diverge from the Direct URL data structure if it makes sense.