PEP 770: Improving measurability of Python packages with Software Bill-of-Materials

Hello all and happy new year! I’m sharing PEP 770 for your review. This PEP specifies how to include Software Bill-of-Materials (SBOM) documents in Python packages both for manual and automated annotation of software included inside package archives. Big thank-you to @brettcannon for sponsoring the PEP and @ksurma for all the work on specifying file globs in PEP 639. This PEP was previously discussed in this thread.

Please take a look: PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials | peps.python.org

9 Likes

Thanks for the PEP!

Overall I think it seems fine, though I feel that the Core Metadata specification is quite over-specified (perhaps to the point where it can’t actually be used?):

  • That file MUST be included in the distribution archive at the specified path relative to the root license directory.

This might mean “root SBOM directory”, which was defined inline a couple of lines earlier (which I missed the first time, so it might be worth a more explicit definition if you’re going to refer to it).

  • That file MUST be installed with the project at that same relative path.

Perhaps rephrase this to “Installers must install this file …” to emphasize that all I need to do is include the file, and not figure out how to make someone else do their job properly. (I assume installer maintainers will recognise that this just means “keep doing what you’re already doing” and not “do work to make sure it extracts properly”. In other words, this is a no-op requirement for everyone involved.)

packaging tools MUST reproduce the directory structure under which the source files are located relative to the project root

This isn’t relevant to core metadata. What we need specified here is that the file path listed in METADATA includes the directory structure that appears in the distribution package. The tools that are going to reproduce that structure need this reminder in the later section about source metadata.

SBOM document contents MUST be UTF-8 encoded JSON

SBOM document contents MUST use an SBOM standard

Why? Seriously, at this point, why do we care? Is PyPI supposed to check that the SBOM referenced by METADATA is following a standard and reject the package from being uploaded?[1] Given we aren’t defining the standard ourselves (or specifying it in METADATA), then it’s really up to the final consumer to figure out what the format is. From a core metadata point of view, we just need the info to tell a consumer where they should be looking. I would strike these two bullet points completely.

The “primary” component being described in included SBOM documents MUST be the Python package.

SBOM documents MUST include metadata for the timestamp when the SBOM document was created.

SBOM documents SHOULD include metadata describing the tool creating the SBOM document.

Again, this is important for being a well-behaved project with regards to someone consuming your SBOMs, but isn’t something we can specify in core metadata. Perhaps these should move to a new Background section covering how your SBOM is likely to be used and how you can play well?

PyPI SHOULD validate that all specified files are present in the distribution archives,

Okay, just got up to this point (skimmed over it on the first read, I guess). I don’t like this at all - let’s leave PyPI out of it, and let people handle SBOM formats themselves. There are endless numbers of tools that can do this same thing for those who care about it, and you can suggest that package builders/SBOM generators should do it, but I think making PyPI the enforcer is overbearing.


I’m skipping over the source metadata section, because I neither use nor care about it. I’m sure it’s fine :slight_smile:

The sdist specification will be updated to reflect that if the Metadata-Version is 2.5 or greater, the sdist MUST contain any SBOM files specified by the Sbom-File field in the PKG-INFO at their respective paths relative to the sdist

The sdist specification doesn’t actually have anywhere for this to go. PKG-INFO for metadata version 2.2 or later just refers to the core metadata spec, and I believe the same is true for the wheel spec. In either case, defining the metadata as “relative to the metadata file (either PKG-INFO or METADATA depending on context)” should get you what you want without touching the sdist or wheel spec.

It’s also a bit awkwardly worded compared to the later ones - try something like “if the metadata version is 2.5 or later, any Sbom-File fields must only contain relative paths from the metadata file’s directory to an SBOM file included in the package”. This avoids the “must contain SBOM” wording, which is scary for people who don’t currently use SBOMs (at least until they finish reading and parse it all, but they’re already feeling scared :wink: ).
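To make the suggested wording concrete, here is a minimal sketch of a check that a build backend or linter could run over PKG-INFO or METADATA text. It assumes the draft `Sbom-File` field name and the 2.5 version threshold quoted above; the function name `sbom_file_problems` and the sample metadata are hypothetical and not taken from the PEP.

```python
from email.parser import HeaderParser
from pathlib import PurePosixPath


def sbom_file_problems(metadata_text: str) -> list[str]:
    """Describe each Sbom-File value that is not a plain relative path."""
    headers = HeaderParser().parsestr(metadata_text)
    version_text = headers.get("Metadata-Version", "0")
    version = tuple(int(part) for part in version_text.split("."))
    if version < (2, 5):
        # Older metadata versions give the field no standard meaning.
        return []
    problems = []
    for value in headers.get_all("Sbom-File", []):
        path = PurePosixPath(value)
        if path.is_absolute():
            problems.append(f"{value!r} is an absolute path")
        elif ".." in path.parts:
            problems.append(f"{value!r} points outside the metadata directory")
    return problems


# Hypothetical metadata: one acceptable entry and two bad ones.
example = """\
Metadata-Version: 2.5
Name: example
Version: 1.0
Sbom-File: sboms/example.cdx.json
Sbom-File: ../outside.json
Sbom-File: /etc/sbom.json
"""
problems = sbom_file_problems(example)
for problem in problems:
    print(problem)
```

The check only looks at the shape of the paths. Whether the referenced files are actually present in the archive would be a separate step.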

the .dist-info directory MUST contain an sboms subdirectory, which MUST contain the files

Why? “Relative to the metadata file” is enough, and it keeps things simpler if we don’t have one case with an extra subdirectory (I’d expect in most cases everyone’s going to put them in a subdirectory in their sources anyway, so they’ll just end up nested deeper).

What is probably needed here is clarification that SBOM files should be included in RECORD.
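For illustration, a rough sketch of what “included in RECORD” amounts to in practice. RECORD is a CSV file of path, hash and size, with the hash written as `sha256=` followed by the URL-safe base64 digest without padding. The helper names and the example paths below are hypothetical.

```python
import base64
import csv
import hashlib
import io


def record_row(path: str, data: bytes) -> list[str]:
    """Build a RECORD row (path, hash, size) for one installed file."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return [path, f"sha256={encoded}", str(len(data))]


def missing_from_record(record_text: str, expected_paths: list[str]) -> list[str]:
    """Return the expected paths that have no row in RECORD."""
    recorded = {row[0] for row in csv.reader(io.StringIO(record_text)) if row}
    return [path for path in expected_paths if path not in recorded]


# Hypothetical wheel contents; the SBOM row is appended in a second step.
sbom_path = "example-1.0.dist-info/sboms/example.cdx.json"
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(record_row("example/__init__.py", b""))
writer.writerow(["example-1.0.dist-info/RECORD", "", ""])
record_without_sbom = out.getvalue()
writer.writerow(record_row(sbom_path, b"{}"))
record_with_sbom = out.getvalue()

missing_before = missing_from_record(record_without_sbom, [sbom_path])
missing_after = missing_from_record(record_with_sbom, [sbom_path])
print(missing_before)
print(missing_after)
```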

There are no backwards compatibility concerns for this PEP.

Well, you’re claiming some directory names. It probably doesn’t need to be stated, but the reason you’re increasing the metadata version is to avoid compatibility concerns - anyone currently using the Sbom-File field for something different will be affected, but they can choose whether to opt into the new meaning. (And if you want to argue that they shouldn’t be using it, then I’ll argue that means we don’t need to change the version :wink: I don’t particularly care who wins that argument)

How can a project specify an SBOM file that is conditional? Under what circumstances would an SBOM document be conditional?

I assume you mean in the source metadata? Probably at this stage you add text along the lines of “build tools may choose to use or ignore the sbom-files specification if requested by the user” and don’t worry about it.

We don’t have conditional core metadata. Once it’s in PKG-INFO or METADATA, it’s basically gotta stay there.


Another question occurs to me that I don’t recall seeing answered in the PEP: are SBOMs allowed to differ between separate wheels for the same release (I sure hope so!), and are they allowed to change between what’s in the sdist and what goes into a wheel?


  1. Okay, just saw that yes, this is your intent. I disagree - more later. ↩︎

1 Like

Thanks for producing this!

One immediate question came to mind. Why do we need a metadata field for this? Could we not simply reserve a directory (say sbom) under the .dist-info directory in the wheel and installed distribution, and say that all files in that directory must be SBOM files relating to the project? The PEP suggests that there are projects using that directory already, without a standard existing. If that’s an issue, we could choose another directory name[1].

We could still have a field in the [project] section of pyproject.toml, as that will tell build backends what to put in that directory, but once the project is built, the SBOM data will be identifiable by its mere presence in the distribution.
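A minimal sketch of what “identifiable by its mere presence” could look like for a consumer, assuming a reserved subdirectory of .dist-info. The directory name is a parameter because it is not settled here (`sboms` is only a placeholder default), and the function name and in-memory wheel are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath


def find_sbom_files(wheel: zipfile.ZipFile, dirname: str = "sboms") -> list[str]:
    """List files stored under <name>.dist-info/<dirname>/ in a wheel."""
    found = []
    for name in wheel.namelist():
        if name.endswith("/"):
            continue  # skip explicit directory entries
        parts = PurePosixPath(name).parts
        if len(parts) >= 3 and parts[0].endswith(".dist-info") and parts[1] == dirname:
            found.append(name)
    return sorted(found)


# Build a tiny in-memory archive standing in for a real wheel.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example/__init__.py", "")
    zf.writestr("example-1.0.dist-info/METADATA", "Metadata-Version: 2.4\nName: example\nVersion: 1.0\n")
    zf.writestr("example-1.0.dist-info/sboms/example.cdx.json", "{}")

with zipfile.ZipFile(buffer) as zf:
    sbom_names = find_sbom_files(zf)
print(sbom_names)
```

No metadata parsing is needed in this variant; the trade-off is that the directory name has to be reserved by the standard.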

Regarding conditional SBOM files, presumably a build backend could mark the sbom-files field as dynamic, and use a backend-specific way of specifying conditional files? There’s no need to cover it in the standard, as it’s presumably niche enough that a backend-specific solution would be sufficient.


  1. We really should get around to reserving all dist-info directories for future standards (except maybe a tool one), but that’s a separate matter… ↩︎

1 Like

One other thought. The “How to Teach This” section glosses over how to inform project authors about SBOM data. There are a number of issues there:

  1. Do we need to do anything about project authors with no interest in publishing SBOM data? Is it OK to just say “that’s fine, they can leave it out”?
  2. How do we teach project authors who (maybe reluctantly) are willing to include SBOM data, how to create it?
  3. How do we guide users who need SBOM data, when it comes to asking projects to provide it? I can foresee a lot of “please add SBOM data” issues, or maybe if we’re lucky, PRs, being submitted to projects. Users need to understand that projects are under no obligation to provide that data, and projects need guidance on how to advise people who need the data, if the project isn’t willing to supply it.

To be honest, I think the technical issues around the PEP are trivial compared with the education and mediation aspects. Maybe I’m biased, as I no longer have any involvement in the sort of corporate environments where I assume SBOMs are important. But I think it’s something the PEP should cover in more detail than it currently does.

FWIW, I’m very much in favor of PyPI becoming more opinionated on what it accepts, not less. We have multiple examples of PyPI’s looseness or lack of opinion making features unusable at scale (e.g. PGP signatures being worse than useless) or requiring painfully long deprecation periods to undo once we realize the downsides (e.g. PEP 625 support taking more than two years).

If PyPI permits an endless number of SBOM formats here, this very likely runs the risk of being extremely hard to use for end users across a large enough set of projects and effectively makes this functionality useless.

7 Likes

I don’t see anything here about VEX. While this could be a later addition, it’s worth pointing out that VEX documents are meant to be attached to existing distributions as vulnerabilities are discovered and analyzed.

For those unaware, VEX is a powerful tool for attaching analysis of vulnerabilities to a distribution. It can even record that, although you have a vulnerable transitive dependency, the vulnerable code path isn’t in the built distribution and so does not apply.

I don’t see the facilities to utilize this here (the ability to append files and append to metadata). While it could reasonably be a later addition, it would be easier to convince my day job, which wants both SBOMs and VEX, that it’s worth participating in this rather than continuing to generate our own in parallel to the packages we use, if VEX is explicitly pointed out as something that has to be added to the specification separately.

These were always an index feature, though (not restricted to PyPI, but also the functionality you’re referring to wasn’t at all useful without index support). If SBOMs were going to be a parallel upload, managed and made available by the index, then sure. But they’re not - they’re just additional content inside a package.

Since PyPI can’t require an SBOM be included, then one possibility is “no SBOM” which is exactly as hard to use as an unknown format would be.

Allowing multiple SBOM formats inside the package is a great way to let people play nicely with whoever is asking for their SBOMs. But adding artificial limitations on what can go into their package - and worse, policing their distribution ability based on it - isn’t going to help here.

Maybe I’m biased because I’m heavily involved in those corporate environments, but we wouldn’t use SBOMs out of a wheel or sdist from PyPI at all. If we care enough about a library that we’re going to ship it to a customer who demands SBOMs, then we’ll likely have forked and built it from source ourselves, generating our own SBOM in the process. That’s the bare minimum responsibility we can take for looking after our customers and not putting the burden directly back on the OSS contributors - by treating open source like it’s open source, not like really cheap software developers.

But that’s irrelevant to my earlier feedback. That was all from the POV of a regular project maintainer who happens to be maintaining a build backend.

3 Likes

Yes. It’s bonus metadata that projects can choose to provide when it applies to them (which is only projects that vendor something, whether that’s in source or compiled in).

But that assumes that SBOMs will only be produced by volunteer open source projects. As you pointed out, our workplace might produce SBOMs since we participate in corporate open source. It could also be used internally by a company such as ours, as you suggested, by adding it to a wheel or such. In all of those cases having a standard at least makes sure that people can share some tooling, not have to guess where to put SBOMs, etc.

But yes, the community will very much need to understand this is purely a corporate security thing and people should only do it if they are motivated and want to, else let people asking for it do it themselves at the point of ingestion.

4 Likes

How can a project specify an SBOM file that is conditional? Under what circumstances would an SBOM document be conditional?

I was going to ask the “what about platform specifics?” question until I saw that it was unanswered. In which case, I can give an example:

PyInstaller links against zlib:

  1. Usually we dynamically link against the system installation [1].
  2. On Windows, we’re forced to statically link.
  3. Very occasionally, people are forced to statically link on Linux platforms that don’t allow dynamic linking [2].

So we’d put zlib in the SBOM but only include the SBOM in cases 2 and 3 (although I wouldn’t lose any sleep if case 3 was too dynamic to be supportable).
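A rough sketch of how a backend might express that condition if the sbom-files value were computed at build time. This is not PyInstaller’s actual build logic; the function names, the `static_link_requested` flag and the `sboms/zlib.cdx.json` path are all hypothetical.

```python
def zlib_is_bundled(target_os: str, static_link_requested: bool = False) -> bool:
    """True when zlib ends up inside the built artifact.

    Case 2: Windows builds always link zlib statically.
    Case 3: other platforms link statically only when explicitly asked to.
    Case 1 (dynamic linking against the system zlib) bundles nothing.
    """
    return target_os == "windows" or static_link_requested


def sbom_files_for_build(target_os: str, static_link_requested: bool = False) -> list[str]:
    """Return the SBOM paths to ship for this particular build."""
    if zlib_is_bundled(target_os, static_link_requested):
        return ["sboms/zlib.cdx.json"]  # hypothetical path
    return []


print(sbom_files_for_build("linux"))
print(sbom_files_for_build("windows"))
print(sbom_files_for_build("linux", static_link_requested=True))
```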


  1. We even deliberately boycott auditwheel to prevent it from vendoring zlib ↩︎

  2. OpenWRT uses sstrip on its system libraries which removes so much debug information that ld can’t link anymore ↩︎

I’m not opposed to the standard - I’m opposed to PyPI being the place where the standard is enforced. I don’t believe this standard needs to be enforced anywhere mechanically, and we can rely on producers and consumers to sort out whether an SBOM meets their needs or not (as you describe).

1 Like

Some minor technicalities:

  1. If we are including an explicit list of SBOM paths, I think we should also include a Content-Type for every file (much like we have for Description) that would indicate the specific file format.
  2. If we are not forcing a specific SBOM standard, then I don’t think we should be enforcing JSON format.

1 Like

As a potential producer of SBOMs I would certainly like tools or services like PyPI to tell me if I’ve screwed something up (and/or issue warnings/suggestions based on these SBOMs’ contents).

5 Likes

I agree. Maybe it would make more sense for cibuildwheel and the various repair tools to handle this rather than PyPI though (assuming that is the case you are thinking of). It would be the repair tools that know what libs have been bundled and would presumably fill out the SBOM and then perhaps there could be some CIBW_CHECK_SBOM step to make sure all files are accounted for.

2 Likes

This seems reasonable to me. PyPI doesn’t check manylinux compliance, for example, it relies on publishers using something like auditwheel to make their manylinux wheels, and then the upload API takes their word for it.

That said, we’ll presumably end up with some kind of check-sbom library that handles checking whether the SBOM metadata in an artifact is valid, and once that library exists it would be good if the PEP allowed PyPI to enforce that check on upload. I’d use a “MAY” clause rather than a “SHOULD” or “MUST”, though.

3 Likes

Not everyone uses cibuildwheel. If that’s incorporated into auditwheel and delvewheel, then why not? However, a dedicated tool may make better sense.

Edit: Alyssa said it better than me :slight_smile:

It doesn’t, but note that this is mostly due to lack of resources and not because we’re fundamentally opposed to it. There is a feature request to add it here, and this is another example of PyPI’s lack of validation causing issues and confusion for consumers.

4 Likes

Thanks for opening this @sethmlarson!

Overall, I think this is a good idea. My main remarks:

  • I agree with @dustin about wanting to see a more opinionated, rather than less opinionated, PEP here. I agree that stronger opinions make it easier for PyPI to enforce data quality/data invariants, which in turn makes features that add more metadata to Python distributions more valuable/easier for consumers to use.
    • To wit: I think this PEP should at least consider (and possibly reject if you come to the opposite conclusion from me!) specifying a single SBOM format. My (weak) preference would be for having this be CycloneDX format, since it’s what pip-audit already supports and the library/data models for CycloneDX are already mature within Python. This would also eliminate the need for the PEP to specify things like “UTF-8 only and JSON,” since that’s transitively closed by accepting only CycloneDX JSON.
    • As a knock-on to the above: does Python packaging need to support multiple SBOMs per distribution, if we’re only supporting a single format?
  • Concurring with @pf_moore: is the Sbom-File metadata field necessary? My read of the PEP’s language is that SBOMs within installed distributions invariably end up within .dist-info/sboms/, which means that they’re uniquely identified without needing to appear as new metadata fields.

2 Likes

To tack on: I think this PEP would benefit from supplying a non-trivial concrete example, such as PyCA Cryptography:

  1. Both Python (runtime) and Rust/native (build-time) dependencies should appear in the SBOM
  2. PyCA Cryptography statically links to its own build of OpenSSL in the binary wheel case, which results in some nuance (wheels uploaded to PyPI are tied to a specific OpenSSL version, while sdist builds are tied to whatever local version of OpenSSL was built against)

Cases like the above are also likely ones where the maintainers will not want to maintain a hand-written SBOM, but instead have the build backend (e.g. maturin) potentially do it for them.

5 Likes

Thanks everyone for the reviews!

I’ll start off by saying that this PEP isn’t trying to push SBOMs onto any/all projects. It is meant to add a place to record information that primarily the build system (and secondarily, people manually annotating) has about the Python package archive being built, so that the build information can be used later for the variety of use-cases detailed in the PEP. I’ve tried to keep the vibe of this being an optional feature throughout the PEP.

Also, I’ve read a handful of comments about the wording and structure of the PEP. The structure came from PEP 639, which IMO is a similar PEP because it is primarily about specifying file(s) from pyproject.toml, core metadata, etc. for a Python package. I will definitely address any confusion in how the PEP is specified. This is my first packaging PEP, so whatever makes the doc easier to read and implement is good feedback to give!

Now for specifics:

Is the Sbom-File field necessary?

@pf_moore @woodruffw

I included this field because PEP 639 also includes a field specifying a file. I am likely wrong, but I thought this would be necessary to specify the locations of SBOM files inside of source distributions.

The PEP is over-specifying / under-specifying SBOM format / content

Replying to @steve.dower @dustin @mgorny @woodruffw:

Given the large number of tools for building/repairing package archives before publication, I opted to treat SBOMs inside archives as opaque and independent from each other, and instead place the burden of “merging” them together afterwards on consumers. This would avoid tools stepping on each other’s toes when attempting to record data into an SBOM.

Given the above, I didn’t see selecting a single standard as critical. I am open to refactoring the PEP to select a single SBOM standard if that’s desirable. I think this would be an important thing to do if there genuinely is a use-case for intermediary tools modifying SBOM documents produced by other tools while a Python package archive is being built. Does such a use-case exist?

If I were forced to select a single SBOM standard in this moment, I would select CycloneDX due to simplicity. SPDX 3 chose to use JSON-LD, which is not very ergonomic to write by hand.

Separately from the above, I opted to require JSON: many SBOM standards support other formats, but by and large most producers provide JSON, and requiring it would make the work of package indices easier when checking the content of SBOMs for standards that the index understands.

Should PyPI enforce this PEP? How deeply should PyPI inspect SBOM documents?

I’ll defer to PyPI maintainers on this, but I’m of the same mind as @dustin: anything not being checked will make future work tougher when folks try to use the data encoded into SBOM documents.

From experience, SBOM standards have simple markers to detect which standard is in use for a given document, each standard has a handful of required fields, and for tools to automatically recognize “what” is being referenced by the SBOM (in our case, the Python package), a few fields must be set a certain way. If those three things are set, then whatever other data is encoded in the SBOM will get a free ride to being included correctly for the Python package. Should I update the language to “MAY”, or make the justification for PyPI checking SBOMs clearer?
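As a sketch of how light such a check could be, the following looks only at the top-level markers and a couple of fields. It assumes CycloneDX JSON (identified by `bomFormat`, with the primary component under `metadata.component`) and SPDX 2.x JSON (identified by `spdxVersion`, with the timestamp under `creationInfo.created`). The function name `describe_sbom` and the sample document are hypothetical, and this is not a full validator for either standard.

```python
import json


def describe_sbom(text: str) -> dict:
    """Report which SBOM standard a JSON document appears to use."""
    doc = json.loads(text)
    if doc.get("bomFormat") == "CycloneDX":
        metadata = doc.get("metadata", {})
        component = metadata.get("component", {})
        return {
            "standard": "CycloneDX",
            "version": doc.get("specVersion"),
            "timestamp": metadata.get("timestamp"),
            "primary": component.get("name"),
        }
    if "spdxVersion" in doc:
        return {
            "standard": "SPDX",
            "version": doc["spdxVersion"],
            "timestamp": doc.get("creationInfo", {}).get("created"),
            # Finding the described package needs more lookups; omitted here.
            "primary": None,
        }
    return {"standard": None, "version": None, "timestamp": None, "primary": None}


# Hypothetical minimal CycloneDX document for a package named "example".
sample = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "metadata": {
        "timestamp": "2025-01-01T00:00:00Z",
        "component": {"type": "library", "name": "example", "version": "1.0"},
    },
})
summary = describe_sbom(sample)
print(summary)
```

An index or linter could compare `primary` against the package name from the core metadata and reject or warn when `standard` or `timestamp` is missing.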

Also, I am ++ on having an informational PEP about SBOM data for Python packages, that is already my plan: GitHub - psf/sboms-for-python-packages: Software Bill-of-Materials documents for Python packages

I believe the inclusion of SBOMs into Python packages would enable a package to specify its VEX data stream via an external reference. There are already tracking issues (1, 2) on vulnerability scanning tools automatically detecting and using VEX data streams from an SBOM document this way.

I included an example using Pillow and a forked copy of auditwheel in the references for the PEP; the file is available for download. I want to work on creating a few more examples, and Maturin was already on my list of build backends to build an example with.

Indeed, I’ll fix this copy-paste issue.

3 Likes