I’m proposing creating and distributing Software Bill of Materials (SBOM) documents alongside Python releases. This is not a change to CPython itself; however, it would affect core developers who update bundled dependencies, and it would add more visibility to the dependencies used in CPython artifacts (such as expat, libb2, etc.) and thus to their vulnerabilities, which may lead to more demands from consumers for security releases.
There are some things that can be done to alleviate these issues, such as providing Vulnerability Exploitability eXchange (VEX) documents when a dependency has a false positive or is not vulnerable as we use it, and creating working relationships with our dependencies to get information about vulnerabilities ahead of publication.
I’ve created a GitHub issue for tracking the implementation itself. I wanted to create this Discourse topic to discuss reservations and answer questions (if any) about my proposal.
I’m happy to make all the changes required to implement this proposal. I’m also happy to be the reviewer for all SBOM related PRs while I’m the Security Developer-in-Residence.
Great question! It would be an SBOM for each artifact, but I consider there to be three distinct categories of artifact (source, Windows installer, and macOS installer). The Windows installers might have more complexity than the others; I’ll need to work with @steve.dower to confirm the details there. For the two types of source artifacts we provide (.tgz and .tar.xz), the only difference would be the name of the top-level element (i.e. the tarball name).
The “SBOM” that I’ve proposed via PR here only covers the dependencies checked in to the python/cpython repository and will be used as a base for the actual SBOMs that would be distributed.
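As a rough sketch of how a base document could be turned into per-artifact SBOMs where only the top-level element differs: the field names below follow the SPDX 2.3 JSON layout, but the helper `sbom_for_artifact`, the `SPDXRef-PACKAGE-artifact` identifier, and the artifact file names are hypothetical and are not taken from the actual PR.

```python
import copy

# Hypothetical, heavily trimmed base SBOM listing only vendored dependencies.
BASE_SBOM = {
    "spdxVersion": "SPDX-2.3",
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [
        {"SPDXID": "SPDXRef-PACKAGE-expat", "name": "expat"},
        {"SPDXID": "SPDXRef-PACKAGE-libb2", "name": "libb2"},
    ],
    "relationships": [],
}


def sbom_for_artifact(base, artifact_name):
    """Return a copy of `base` with a top-level package named after the artifact."""
    sbom = copy.deepcopy(base)  # never mutate the shared base document
    root_id = "SPDXRef-PACKAGE-artifact"  # hypothetical identifier
    dependency_ids = [pkg["SPDXID"] for pkg in sbom["packages"]]

    sbom["name"] = artifact_name
    sbom["packages"].insert(0, {"SPDXID": root_id, "name": artifact_name})

    # The document describes the artifact, and the artifact contains each dependency.
    sbom["relationships"].append({
        "spdxElementId": "SPDXRef-DOCUMENT",
        "relationshipType": "DESCRIBES",
        "relatedSpdxElement": root_id,
    })
    for dep_id in dependency_ids:
        sbom["relationships"].append({
            "spdxElementId": root_id,
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": dep_id,
        })
    return sbom


# Illustrative file names only.
tgz_sbom = sbom_for_artifact(BASE_SBOM, "Python-3.13.0.tgz")
xz_sbom = sbom_for_artifact(BASE_SBOM, "Python-3.13.0.tar.xz")
```

With this shape, the two source SBOMs share every dependency entry and differ only in the name of the top-level package.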
Haven’t looked at the details yet, but I’m in favour of this.
I’m especially in favour of using whatever metadata we have to indicate that known vulnerabilities are not impactful. That’s a massive issue (specifically, people not understanding that “known vulnerability” doesn’t necessarily mean an actual problem or risk), and it would be great for us to lead in showing that there’s a way to handle that.
One SBOM per artifact seems like a good level, even if there’s some redundancy (e.g. I expect it’d be generated once-per-platform during the Windows build and used for all the different packages for that platform, even though not all the packages include all the files). I don’t think we need to include them in the packages themselves, but can just publish them at predictable URLs. But I’m not quite sure where the tooling has landed around this yet - certainly nothing I’ve encountered expects to find them in the actual install files.
Thank you for working on this. The overview looks really nice …
… but how can we add the probably most important OS to this scheme, which is Linux?
Now, I know that we’re not responsible for the SBOMs of Linux distribution provider builds of Python, but since many PyPI wheels are built against the manylinux images, it would be useful to bring SBOMs from/for those images into the picture.
And perhaps even provide entry points for the SBOMs of the generated wheels themselves.
This is something I’m figuring out how to do right now. I agree we’d want to have this when we start publishing SBOMs (though I suspect uptake of SBOMs themselves might be slow, so maybe we have some time in between?).
I know that specifically Grype has support for ingesting VEX statements (other tools may too) in order to show true affectedness of vulnerabilities for components in an SBOM and they want to make VEX statement usage automatic rather than manual to increase usage. I posted my proposed architecture and that seemed to have a positive response, so I am hopeful we’ll be able to ship with this at least for some tooling?
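To make the idea concrete, here is a minimal sketch of what "ingesting VEX statements" amounts to in principle: scanner findings are dropped when a VEX statement marks that vulnerability as not affecting that component. This is not how Grype implements it. The flat dictionaries are a simplification, the `not_affected` status value mirrors the one used in VEX formats such as OpenVEX, and the CVE identifiers are made-up placeholders.

```python
def actionable_findings(findings, vex_statements):
    """Return the findings not dismissed by a 'not_affected' VEX statement."""
    dismissed = {
        (statement["vulnerability"], statement["product"])
        for statement in vex_statements
        if statement["status"] == "not_affected"
    }
    return [
        finding
        for finding in findings
        if (finding["vulnerability"], finding["component"]) not in dismissed
    ]


# Placeholder data: the CVE identifiers below are not real.
findings = [
    {"vulnerability": "CVE-0000-0001", "component": "expat"},
    {"vulnerability": "CVE-0000-0002", "component": "libb2"},
]
vex_statements = [
    {"vulnerability": "CVE-0000-0001", "product": "expat", "status": "not_affected"},
]

remaining = actionable_findings(findings, vex_statements)
```

Here only the libb2 finding remains, which is the outcome a consumer would act on.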
There is guidance for SBOM naming: this guide was put together by the OpenSSF SBOM Everywhere SIG (I was a contributor) and applies to releases which are “flat” lists of artifacts (like the ones on Download Python | Python.org).
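For a flat list of artifacts, one predictable scheme is to publish each SBOM next to its artifact under the artifact's file name plus a format suffix. The sketch below assumes that scheme; the `.spdx.json` suffix, the helper names, and the base URL are illustrative assumptions rather than something quoted from the guide.

```python
def sbom_name(artifact_name, suffix=".spdx.json"):
    """Derive the SBOM file name from the artifact file name (assumed convention)."""
    return f"{artifact_name}{suffix}"


def sbom_url(base_url, artifact_name, suffix=".spdx.json"):
    """Build the URL where the SBOM would sit alongside its artifact."""
    return f"{base_url.rstrip('/')}/{sbom_name(artifact_name, suffix)}"


# Hypothetical download location and artifact name.
example = sbom_url(
    "https://downloads.example.invalid/3.13.0/",
    "Python-3.13.0.tar.xz",
)
```

Because the SBOM location is derived purely from the artifact name, tooling can find it without the SBOM being embedded in the installer.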
I’ve heard a few folks asking whether we could add Linux builds. I am not opposed to it, so if the team adds support I’ll happily add SBOMs for those binaries.
I haven’t spent a ton of time thinking about manylinux images yet, so if you have insights here I would be happy to hear them! I definitely want tooling to be able to gather “SBOM” metadata from manylinux’s environment so it can be added to packages which compile with/bundle dependencies from those images.
They have a Python tool for generating the SBOMs on what appears to be a per-package basis. The page also refers to a git-based notary service, but I’m not deep enough into all this to be able to tell whether we need something like that as well.
One issue I can see with going down this rabbit hole is rather frequent changes to the SBOMs of those images (e.g. due to security fixes), so I guess there’s a versioning challenge to be solved (possibly using git hashes).
The initial SBOM for vendored dependencies in the CPython source code along with tooling to ensure the SBOM is kept up-to-date with changes to dependencies has been merged (thanks @hugovk for reviews!)
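One way such an up-to-date check can work in principle is to record a checksum for every vendored file in the SBOM and fail when the file on disk no longer matches. The sketch below shows that idea only; it is not the merged tooling, and the helper names and the demo file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_files(recorded, root):
    """List files that are missing or whose checksum differs from the recorded one.

    `recorded` maps a path relative to `root` to the checksum stored in the SBOM.
    """
    stale = []
    for relative_path, expected in sorted(recorded.items()):
        path = root / relative_path
        if not path.is_file() or sha256_of(path) != expected:
            stale.append(relative_path)
    return stale


# Demo in a throwaway directory with a made-up vendored file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    vendored = root / "vendored_lib.c"
    vendored.write_text("int answer = 42;\n")
    recorded = {"vendored_lib.c": sha256_of(vendored)}

    before_update = stale_files(recorded, root)  # file matches the SBOM
    vendored.write_text("int answer = 43;\n")    # simulate a dependency update
    after_update = stale_files(recorded, root)   # SBOM is now out of date
```

A check like this would run in CI so that a dependency update without a matching SBOM update is caught at review time.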
I’ve created several other tracking issues which may be of interest to folks:
Looking towards future information sharing, would it make sense to add either an informational PEP or specific pages in the dev guide that specifically explain the “Why?” of adding the SBOM artifacts?
This feels like something worth mentioning in the 3.13 What’s New doc given the regulatory interest in SBOMs, and while the GH issue and this thread are plenty for explaining the change to core devs, it doesn’t take long for the full set of changes to end up spreading out over multiple issues and PRs, potentially leaving folks that want to know more at release time getting lost in the practical details of implementing the change.
With the help of @hugovk I’ve published a new resource to the CPython Developer Guide on Software Bill-of-Materials and the tooling around it with processes to add, update, and remove dependencies in CPython’s source tree.
I’ve done some thinking and I believe that Software Bill-of-Materials documents could be created for release streams of CPython that already exist and made public in CPython patch releases, here is my thinking:
Software Bill-of-Materials isn’t a CPython feature, it’s more analogous to the Sigstore signatures of CPython, being only an artifact that’s “released” alongside the actual CPython releases. Sigstore signatures were introduced in a patch version of CPython with separate announcements and documentation outside of the CPython release and docs.
Releasing only a single SBOM for the latest version of CPython would vastly limit its utility as we know that many users will be using older CPython versions for some time. Given we have the ability to create these artifacts it makes sense to do so for those users.
Security releases are just as likely to receive updates to bundled dependencies due to vulnerabilities, and users would benefit from knowing when to upgrade based on that information (and when they don’t need to upgrade because a dependency has a CVE but isn’t vulnerable in our usage). That is the primary use case for SBOMs with VEX, so we want to spare users from having to ask for security releases that are unnecessary.
Release tooling needs to be able to handle more than one release stream anyway; having that running live instead of hypothetically will help with integration.
It’s likely that everything will be in place to deliver a Software Bill-of-Materials for CPython in the first quarter of 2024. Having to wait artificially until October for 3.13.0 to be released doesn’t feel necessary for this sort of development.
Thus my proposal is to backport the existing Software Bill-of-Materials infrastructure and tooling that currently exists in the CPython repository to security-fix branches. I’ll run this proposal by release managers.
How much needs to be in the CPython repository, vs say the release tools repository? It seems like the tooling could probably live outside of the repo (though it would obviously need to be given a commit from the main repository to do its job).
Do you mean “expected” as in we think they should? Or as in we think they’re going to need an SBOM and will reuse our (Seth’s) work since we seem to be ahead of the pack for open-source projects?
Given SBOMs are basically the expected way for products to comply with both US and EU (proposed) rules, I’d expect everyone is going to want them at some point. If we’re not going to take the legal burden on ourselves (and I’d argue that we shouldn’t), then we definitely want third-party redistributors to use SBOM tooling.