Thanks everyone for the reviews!
I’ll start off by saying that this PEP isn’t trying to push SBOMs onto any or all projects. It is meant to add a place to record information that the build system (primarily) and people manually annotating (secondarily) have about the Python package archive being built, so that this build information can be used later for the variety of use-cases detailed in the PEP. I’ve tried to keep the vibe of this being an optional feature throughout the PEP.
I’ve also read a handful of comments about the wording and structure of the PEP. The structure came from PEP 639, which IMO is a similar PEP because it is primarily about specifying file(s) from pyproject.toml, core metadata, etc. for a Python package. I will definitely address any confusion in how the PEP is specified. This is my first packaging PEP, so any feedback that makes the doc easier to read and implement is welcome!
Now for specifics:
Is the Sbom-File field necessary?
I included this field because PEP 639 also includes a field specifying a file. I am likely wrong, but I thought this would be necessary to specify the locations of SBOM files inside of source distributions.
The PEP is over-specifying / under-specifying SBOM format / content
Replying to @steve.dower @dustin @mgorny @woodruffw:
Given the large number of tools for building/repairing package archives before publication, I opted to treat SBOMs inside archives as opaque and independent from each other, placing the burden of “merging” them together afterwards on consumers. This avoids tools stepping on each other’s toes when attempting to record data into an SBOM.
Given the above, I didn’t see selecting a single standard as critical. I am open to refactoring the PEP to select a single SBOM standard if that’s desirable. I think this would be an important thing to do if there genuinely is a use-case for intermediary tools modifying SBOM documents produced by other tools while a Python package archive is being built. Does such a use-case exist?
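As a rough illustration of the "opaque and independent" model, here is a minimal sketch of what consumer-side merging could look like. It rests on assumptions that are not part of this post: the `.dist-info/sboms/` location, the function names, and the choice to merge only by CycloneDX component name are all hypothetical, picked to keep the example short.

```python
from __future__ import annotations

import io
import json
import zipfile

# Assumption: SBOM documents live under this path fragment inside the archive.
SBOM_DIR_MARKER = ".dist-info/sboms/"


def collect_sboms(archive_bytes: bytes) -> dict[str, dict]:
    """Return every JSON SBOM object in the archive, keyed by its archive path.

    Each document is loaded on its own; nothing here modifies or combines them.
    """
    documents: dict[str, dict] = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if SBOM_DIR_MARKER in name and name.endswith(".json"):
                document = json.loads(archive.read(name))
                if isinstance(document, dict):
                    documents[name] = document
    return documents


def merged_component_names(documents: dict[str, dict]) -> set[str]:
    """Hypothetical consumer-side "merge": the union of component names.

    Only the top-level CycloneDX ``components`` array is read; documents in
    other standards simply contribute nothing.
    """
    names: set[str] = set()
    for document in documents.values():
        for component in document.get("components", []):
            if isinstance(component, dict) and "name" in component:
                names.add(component["name"])
    return names
```

The point of the sketch is that each producing tool only ever writes its own file, and the consumer decides how (or whether) to combine them.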
If I were forced to select a single SBOM standard in this moment, I would select CycloneDX due to its simplicity. SPDX 3 chose to use JSON-LD, which is not very ergonomic to write by hand.
Separately from the above, I opted to require JSON. Many SBOM standards support other formats, but by and large most producers provide JSON, and requiring it makes it easier for package indices to check the content of SBOMs for standards that the index understands.
Should PyPI enforce this PEP? How deeply should PyPI inspect SBOM documents?
I’ll defer to PyPI maintainers on this, but I’m in the same vein as @dustin that anything not being checked will make future work tougher when folks try to use the data encoded into SBOM documents.
From experience, SBOM standards have simple markers to detect which standard is in use for a given document, each standard has a handful of required fields, and a few fields need to be set a certain way for tools to automatically recognize “what” is being referenced by the SBOM (in our case, the Python package). If those three things are set, then whatever other data is encoded in the SBOM gets a free ride to being included correctly for the Python package. I can update the language to “MAY”, or make the justification for PyPI checking SBOMs clearer?
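To make the "simple markers" point concrete, here is a minimal sketch of how a JSON SBOM could be classified. It uses the top-level `bomFormat` and `specVersion` keys of CycloneDX JSON and the `spdxVersion` key of SPDX 2.x JSON, and reads the subject of a CycloneDX document from `metadata.component.purl`. The function names are hypothetical, SPDX 3 (JSON-LD) is deliberately left out, and this is not a description of what PyPI does or would do.

```python
from __future__ import annotations

import json


def detect_sbom_standard(raw: str | bytes) -> str | None:
    """Guess the SBOM standard of a JSON document from its marker fields.

    Returns "CycloneDX", "SPDX" (2.x JSON only), or None if unrecognized.
    """
    try:
        document = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(document, dict):
        return None
    if document.get("bomFormat") == "CycloneDX" and "specVersion" in document:
        return "CycloneDX"
    spdx_version = document.get("spdxVersion")
    if isinstance(spdx_version, str) and spdx_version.startswith("SPDX-"):
        return "SPDX"
    return None


def cyclonedx_subject_purl(document: dict) -> str | None:
    """Return the package URL of the component a CycloneDX document describes."""
    metadata = document.get("metadata")
    if not isinstance(metadata, dict):
        return None
    component = metadata.get("component")
    if not isinstance(component, dict):
        return None
    purl = component.get("purl")
    return purl if isinstance(purl, str) else None
```

An index could, for example, compare the returned package URL against the name and version of the upload, and leave every other field in the document untouched.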
Also, I am ++ on having an informational PEP about SBOM data for Python packages. That is already my plan, see the psf/sboms-for-python-packages repository on GitHub (Software Bill-of-Materials documents for Python packages).
I believe the inclusion of SBOMs into Python packages would enable a package to specify its VEX data stream via an external reference. There are already tracking issues (1, 2) on vulnerability scanning tools automatically detecting and using VEX data streams from an SBOM document this way.
I included an example using Pillow and a forked copy of auditwheel in the references for the PEP; the file is available for download. I want to work on creating a few more examples, and Maturin was already on my list of build backends to build an example with.
Indeed, I’ll fix this copy-paste issue.