Thanks for sharing!
Absolutely!
Would you envision this feeding into or using Sigstore to publish the hashes of the builds?
Yeah, we’ve been working with the Sigstore team on this project, and that’s one of the options for bringing this into wider use.
Another, smaller opportunity might be to provide feedback directly to maintainers when we detect, with high confidence, a change that is not present upstream. There’s definitely a spectrum of short- to long-term applications of this technique and many ways of getting to an end state of guaranteed source metadata for Python packages.
For those projects for which you couldn’t infer the repo, did you try against the sdist for at least that level of reproducibility?
That’s a great idea! I hadn’t thought of it, but it’s certainly worth exploring. I had some trouble with sdists early on and dropped them from the prototype, but I think sdist packaging processes have a lot of room for improvement in the ecosystem, too.
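As a rough illustration of what an sdist-level check could look like (this is not how the prototype works, and the function names here are made up for the example), one could hash every file in a wheel and see which ones have no byte-identical counterpart in the sdist. It matches on content rather than path so that src/ layouts don’t cause false alarms, and it assumes a .tar.gz sdist and ignores the generated .dist-info files:

```python
import hashlib
import io
import tarfile
import zipfile


def wheel_hashes(wheel_bytes: bytes) -> dict[str, str]:
    """Map each file path in a wheel (a zip archive) to its SHA-256 digest."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def sdist_hashes(sdist_bytes: bytes) -> dict[str, str]:
    """Map each regular file in a .tar.gz sdist to its SHA-256 digest."""
    hashes = {}
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tf:
        for member in tf.getmembers():
            if member.isfile():
                data = tf.extractfile(member).read()
                hashes[member.name] = hashlib.sha256(data).hexdigest()
    return hashes


def unexplained_files(wheel: dict[str, str], sdist: dict[str, str]) -> list[str]:
    """Return wheel paths whose exact content appears nowhere in the sdist.

    Generated *.dist-info files are skipped (an assumption of this sketch).
    """
    sdist_digests = set(sdist.values())
    return sorted(
        path
        for path, digest in wheel.items()
        if ".dist-info/" not in path and digest not in sdist_digests
    )
```

Anything this returns would still need a human look: compiled extensions and files generated at build time will legitimately differ from the sdist.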
It seems like you would benefit the most from the core metadata being expanded upon to record source code provenance a bit more (or as a separate file like the direct URL recording spec). Is that a fair assessment?
An in-package indicator as found in pbr.json, an adaptation of PEP 610, or others like it (e.g. .cargo_vcs_info.json) is certainly an option. I think my intuition would be that, if present at all, it belongs alongside a more complete record of the steps to reproduce like buildinfo rather than just being included as a lone hint to the package user.
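To make the "adaptation of PEP 610" option a bit more concrete, here is a minimal sketch of a record borrowing the url and vcs_info fields from PEP 610’s direct_url.json. The function name is hypothetical, and nothing here is an agreed-upon format; where such a record would live (in the package, the API, or a transparency log) is exactly the open question.

```python
import json


def make_source_record(repo_url: str, commit_id: str, vcs: str = "git") -> str:
    """Serialize a PEP 610-shaped record of the repo and commit a build came from.

    Hypothetical helper for illustration; only the field names are taken
    from PEP 610.
    """
    record = {
        "url": repo_url,
        "vcs_info": {"vcs": vcs, "commit_id": commit_id},
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder URL and commit, not a real project.
print(make_source_record("https://example.com/project.git", "0" * 40))
```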
I’d probably favor keeping this data in the API (or even in an entirely separate data store like a transparency log à la Sigstore) over changing the internal package format just yet.
Regardless of the mechanics, though, the overall goal would be incorporating this source info into the package metadata such that users are better able to understand the code they use. I’m sure there are many ways of doing so and those with background on Python packaging’s recent history are well-placed to make suggestions (I’d love to hear them!).
If so, are you thinking of starting a discussion to submit a PEP?
I don’t think the path forward is clear enough for a PEP just yet, but that is the goal. Getting some form of rebuilder integration agreed upon in the community would be a great milestone for the next few months!