I’m proposing that we create and distribute Software Bill of Materials (SBOM) documents alongside Python releases. This is not a change to CPython itself, but it would affect core developers who update these dependencies. It would also add more visibility into the dependencies (and thus vulnerabilities) bundled in CPython artifacts (such as expat, libb2, etc.), which may lead to more demands from consumers for security releases.
There are some things that can be done to alleviate these issues, such as providing Vulnerability Exploitability eXchange (VEX) documents when a reported vulnerability in a dependency is a false positive or doesn’t actually affect CPython, and creating working relationships with our dependencies to get information about vulnerabilities ahead of publication.
I’ve created a GitHub issue for tracking the implementation itself. I wanted to create this Discourse topic to discuss reservations and answer questions (if any) about my proposal.
I’m happy to make all the changes required to implement this proposal. I’m also happy to be the reviewer for all SBOM-related PRs while I’m the Security Developer-in-Residence.
Great question! It would be an SBOM for each artifact, but I consider there to be three distinct categories of artifact (source, Windows installer, and macOS installer). The Windows installers might have more complexity than the others, so I’ll need to work with @steve.dower to confirm the details there. For the two types of source artifacts we provide (.tgz and .tar.xz), the only difference would be the name of the top-level element (i.e. the tarball name).
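To make the source-artifact case concrete, here’s a rough sketch of what I mean by “the only difference is the top-level name”. The field names follow the SPDX 2.3 JSON layout, but this is deliberately not a complete or valid SPDX document (no creation info, namespace, licenses, or versions), and `build_source_sbom`, the `SPDXRef-PACKAGE-*` identifiers, and the `Python-X.Y.Z` names are placeholders I made up for illustration, not the actual implementation.

```python
# Illustrative only: a stripped-down, SPDX-2.3-shaped dict, not a valid SBOM.
BUNDLED = ["expat", "libb2"]  # examples of bundled dependencies named above


def build_source_sbom(tarball_name, bundled):
    """Build one SBOM-like document for a single source tarball."""
    top_id = "SPDXRef-PACKAGE-cpython"  # hypothetical identifier
    # The top-level element is named after the tarball it describes.
    packages = [{"SPDXID": top_id, "name": tarball_name}]
    relationships = []
    for dep in bundled:
        dep_id = f"SPDXRef-PACKAGE-{dep}"
        packages.append({"SPDXID": dep_id, "name": dep})
        # Record that the tarball contains the bundled dependency.
        relationships.append(
            {
                "spdxElementId": top_id,
                "relationshipType": "CONTAINS",
                "relatedSpdxElement": dep_id,
            }
        )
    return {
        "spdxVersion": "SPDX-2.3",
        "SPDXID": "SPDXRef-DOCUMENT",
        "packages": packages,
        "relationships": relationships,
    }


# Placeholder tarball names; the real ones carry the release version.
tgz = build_source_sbom("Python-X.Y.Z.tgz", BUNDLED)
txz = build_source_sbom("Python-X.Y.Z.tar.xz", BUNDLED)
```

Everything except the first package’s `name` is identical between `tgz` and `txz`, which is why I’d treat the two source tarballs as one category.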
The “SBOM” that I’ve proposed via PR here only covers the dependencies checked in to the python/cpython repository and will be used as a base for the actual SBOMs that would be distributed.
Haven’t looked at the details yet, but I’m in favour of this.
I’m especially in favour of using whatever metadata we have to indicate that known vulnerabilities are not impactful. That’s a massive issue (specifically, people not understanding that “known vulnerability” doesn’t necessarily mean an actual problem or risk), and it would be great for us to lead in showing that there’s a way to handle that.
One SBOM per artifact seems like a good level, even if there’s some redundancy (e.g. I expect it’d be generated once-per-platform during the Windows build and used for all the different packages for that platform, even though not all the packages include all the files). I don’t think we need to include them in the packages themselves, but can just publish them at predictable URLs. But I’m not quite sure where the tooling has landed around this yet - certainly nothing I’ve encountered expects to find them in the actual install files.
Thank you for working on this. The overview looks really nice …
… but how can we add what is probably the most important OS to this scheme, namely Linux?
Now, I know that we’re not responsible for the SBOMs of the Python builds provided by Linux distributions, but since many PyPI wheels are built against the manylinux images, it would be useful to bring SBOMs for those images into the picture.
And perhaps even provide entry points for the SBOMs of the generated wheels themselves.
This is something I’m figuring out how to do right now. I agree we’d want to have this when we start publishing SBOMs (though I suspect uptake of SBOMs themselves might be slow, so maybe we have some time in between?).
I know that specifically Grype has support for ingesting VEX statements (other tools may too) in order to show true affectedness of vulnerabilities for components in an SBOM and they want to make VEX statement usage automatic rather than manual to increase usage. I posted my proposed architecture and that seemed to have a positive response, so I am hopeful we’ll be able to ship with this at least for some tooling?
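Roughly, the effect of a VEX statement on a scan result can be sketched like this. This is not how Grype or any other tool implements it; the data shapes are simplified, `suppress_not_affected` is a name I made up, and the `CVE-0000-*` identifiers are placeholders rather than real vulnerabilities. The only thing taken from VEX itself is the `not_affected` status value.

```python
def suppress_not_affected(findings, vex_statements):
    """Drop findings that a VEX statement marks as not affecting a component."""
    # Collect every (vulnerability, component) pair declared "not_affected".
    not_affected = {
        (stmt["vulnerability"], product)
        for stmt in vex_statements
        if stmt["status"] == "not_affected"
        for product in stmt["products"]
    }
    return [
        finding
        for finding in findings
        if (finding["vulnerability"], finding["component"]) not in not_affected
    ]


# Hypothetical scanner output for two components listed in an SBOM.
findings = [
    {"vulnerability": "CVE-0000-0001", "component": "expat"},
    {"vulnerability": "CVE-0000-0002", "component": "libb2"},
]

# Hypothetical VEX statement: the first finding doesn't apply to our usage.
vex_statements = [
    {
        "vulnerability": "CVE-0000-0001",
        "products": ["expat"],
        "status": "not_affected",
    }
]

remaining = suppress_not_affected(findings, vex_statements)
```

Here `remaining` contains only the libb2 finding, which is the “true affectedness” view I’d like consumers to end up with.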
I’ve heard a few folks asking whether we could add Linux builds. I’m not opposed to it, so if the team adds support I’ll happily add SBOMs for those binaries.
I haven’t spent a ton of time thinking about manylinux images yet, so if you have insights here I would be happy to hear them! I definitely want tooling to be able to gather “SBOM” metadata from manylinux’s environment so it can be added to packages which compile with or bundle dependencies from those images.
They have a Python tool for generating the SBOMs on what appears to be a per-package basis. The page also refers to a git-based notary service, but I’m not deep enough into all this to be able to tell whether we need something like that as well.
One issue I can see with going down this rabbit hole is the rather frequent changes to the SBOMs of those images (e.g. due to security fixes), so I guess there’s a versioning challenge to be solved (possibly using git hashes).
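As a sketch of what hash-based versioning could look like: the function below computes the same object ID that git assigns to a blob (SHA-1 over a `blob <size>\0` header plus the content), applied to a canonical JSON serialization of an SBOM. The names `git_blob_hash` and `sbom_version_id` and the canonicalization choice (sorted keys, compact separators) are my own assumptions, not an existing scheme used by manylinux or anyone else.

```python
import hashlib
import json


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def sbom_version_id(sbom: dict) -> str:
    """Derive a content-based version ID for an SBOM (hypothetical scheme)."""
    # Serialize deterministically so key order doesn't change the ID.
    canonical = json.dumps(sbom, sort_keys=True, separators=(",", ":"))
    return git_blob_hash(canonical.encode("utf-8"))


# Placeholder SBOM contents for two states of the same image.
before = {"packages": [{"name": "expat", "versionInfo": "old"}]}
after = {"packages": [{"name": "expat", "versionInfo": "new"}]}

before_id = sbom_version_id(before)
after_id = sbom_version_id(after)
```

A security fix in the image changes the SBOM content and therefore the ID, so a wheel could record which exact image SBOM it was built against.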