These two mechanisms could be considered separately but because they both would potentially result in a Metadata-Version rev (one for adding project.sbom-files/Sbom-File and another for reserving the .dist-info/sboms directory) I think it’s best to consider them together?
Thinking in terms of the standards, I don’t see a problem with build backends and/or tools like auditwheel adding files to the sboms directory in a wheel - that’s no different than generating .so files or similar. As long as those SBOMs aren’t listed in the metadata[1] we’re fine.
Looking at it the other way around, though, there’s really no need for metadata at all here. Tools can simply look at what’s in the sboms directory. For source trees, there might be an issue, but given that the build process can add dependencies, is it even useful to look at the SBOM of a bunch of source code? If it is, then maybe we should simply standardise a way to list static sbom files in pyproject.toml, without getting metadata involved. Something like
[additional-files]
sboms = [...]
I’ve put it in an “additional files” section for future expansion, but it could just as easily be a top-level key. Putting it in [project] just unnecessarily (IMO) ties it to the project metadata. (Of course, you could say the same about license files, and maybe we missed a trick not doing the same with that - but we already had license metadata and we wanted the license expression in the metadata in any case, so I don’t think the situations are equivalent).
[1] because that’s when the whole static/dynamic issue arises
Yeah, any mechanism that lets a user tell a build backend “hey, you’re going to need to copy these files to .dist-info/sboms so they get detected properly” would work fine. I don’t know the full range of levers available to make that happen. My design closely followed project.license-files because it came from an approved PEP and the logic is already implemented in some build backends.
Thanks for the detailed example @sethmlarson, that looks good and clear to me.
I think I like this idea. It fits conceptually, and it avoids having to bump the core metadata version, so projects could start using it immediately without a long rollout process. PEP 725 uses a similar approach with a top-level [external] table, and it allowed things to work without changes to pip or other tools.
We also already had tools putting this stuff in the metadata, so there was precedent. (I could also just be trying to make myself feel better due to not thinking of this for the license PEP before I approved it.)
I personally think a new table is better than a top-level key. My reasoning is I think of any top-level key that isn’t a table as applying to the file itself, not data stored by the file. I personally like starting with a generic table like [additional-files] as I could see that being expanded upon later while having an [SBOM] table for a single SBOM.files key seems like overkill. Obviously this could be too limiting long-term, but I think we also have to balance ease of use here and not drown pyproject.toml with a dozen tables if we can help it (people can end up with enough tables thanks to [tool] today as it is).
@sethmlarson can you think of any need for other keys related to this that would make a generic table like [additional-files] a bad idea?
I’d be happy with the [additional-files] approach! I’m assuming we’d adopt the same definition of a list of file globs for the keys in this table?
I don’t think there is a need for other keys in this table for SBOMs, in this case we just want the files to end up in a certain place and the SBOMs themselves are the place where the actual information is encoded.
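A rough sketch of what that could look like on the build backend side, assuming the globs are resolved relative to the project root. This uses pathlib globbing as a stand-in; the license-files glob rules are stricter than Path.glob, so treat the matching here as an approximation. The function name copy_sboms and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_sboms(source_root: Path, patterns: list[str], dist_info: Path) -> list[Path]:
    """Copy files matching the patterns into <dist_info>/sboms and return the copies."""
    target_dir = dist_info / "sboms"
    copied = []
    for pattern in patterns:
        for match in sorted(source_root.glob(pattern)):
            if match.is_file():
                target_dir.mkdir(parents=True, exist_ok=True)
                # Assumption: files are flattened by name. A real backend
                # would need a rule for handling name collisions.
                copied.append(Path(shutil.copy2(match, target_dir / match.name)))
    return copied


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sboms").mkdir()
    (root / "sboms" / "example.cdx.json").write_text("{}")
    dist_info = root / "build" / "example-1.0.dist-info"
    result = copy_sboms(root, ["sboms/*.json"], dist_info)
    print([p.relative_to(root).as_posix() for p in result])
```

The backend does not need to understand the SBOM format at all; it only moves files to the agreed location.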
How specific do you want to be in this PEP? I can see two obvious approaches:
[additional-files] is simply a new top-level key, and future standards can use it however they like (or ignore it and invent their own mechanism).
[additional-files] is reserved for use by standards to define data that will end up in the .dist-info directory, specifically. This avoids any need to start worrying about whether it’s going to work for specifying files that will become part of the installed package itself (which is a much more complex question).
Good question. I assumed the keys would also need to be standardized, so that’s a bit of a guard rail so people don’t toss random files into random places. Is there any reason not to start with the assumption that [additional-files] will only pertain to .dist-info but can be changed later if we find it’s too strict? Or do we even need to define this upfront, or can we all just have shared expectations for the purpose of the table so we know what we are bringing into the pyproject.toml spec? Otherwise we can just avoid being general and have an [sbom] table if people are too worried about planning for a future that might not be.
My personal preference would be for this PEP to specify that [additional-files] is a reserved table, for specifying files that will end up in the .dist-info directory. The only defined subkey at the moment is sboms, all other keys are reserved for use by future PEPs.
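As an illustration of that rule, a validator might look roughly like the sketch below. It assumes sboms is a list of strings and that every other key is rejected until a future PEP defines it; the helper name and error wording are invented here, not taken from the PEP text.

```python
# Keys currently defined for [additional-files] under this suggestion.
DEFINED_KEYS = {"sboms"}


def check_additional_files(table: dict) -> list[str]:
    """Return a list of problems found in an [additional-files] table."""
    errors = []
    for key, value in table.items():
        if key not in DEFINED_KEYS:
            # Everything except 'sboms' is reserved for future PEPs.
            errors.append(f"{key!r} is reserved for future PEPs")
        elif not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            errors.append(f"{key!r} must be a list of strings")
    return errors


print(check_additional_files({"sboms": ["bom.json"], "signatures": []}))
```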
I think it’s worth being explicit in the spec, if only so that people have somewhere to refer to rather than having to hunt down this discussion later.
I’m -1 on a [sbom] table - as you already pointed out, we don’t want to start a trend of proliferating top-level tables in pyproject.toml.
This is also my preference, I can create a PR which defines how tools are expected to find SBOMs in wheels and the [additional-files] table. Thanks all for the discussion
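For the discovery side, a consumer could find SBOMs purely by location, without reading METADATA. The following is a minimal sketch using only the standard library; the wheel contents are built in memory and the file names are invented for the example.

```python
import io
import zipfile


def find_sboms(wheel: zipfile.ZipFile) -> list[str]:
    """Return archive paths of files stored under <name>.dist-info/sboms/."""
    names = []
    for name in wheel.namelist():
        parts = name.split("/")
        if (
            len(parts) >= 3
            and parts[0].endswith(".dist-info")
            and parts[1] == "sboms"
            and not name.endswith("/")  # skip directory entries
        ):
            names.append(name)
    return sorted(names)


# Build a tiny in-memory archive that mimics a wheel layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example/__init__.py", "")
    zf.writestr("example-1.0.dist-info/METADATA", "Metadata-Version: 2.3\nName: example\nVersion: 1.0\n")
    zf.writestr("example-1.0.dist-info/sboms/bom.cdx.json", "{}")

with zipfile.ZipFile(buffer) as zf:
    print(find_sboms(zf))
```

Because the lookup depends only on the directory name, files added later by tools such as auditwheel would be picked up the same way as files declared in pyproject.toml.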
Okay, I’ve put together a pull request which captures everything we’ve been discussing:
New registry of reserved subdirectory names under .dist-info (along with backwards compatibility testing of existing subdirectories)
Addition of [additional-files] table to pyproject.toml and the optional sboms key.
Removed the Sbom-File metadata field, added to “Rejected Ideas” with justification.
Because a lot of the above are net-new mechanisms for packaging, I would appreciate a thorough review of the language I’m using to specify them. I took my best shot at getting something to look at quickly. Happy to incorporate any feedback!
I didn’t attempt to define future [additional-files] keys very tightly, only that it’s a table for putting files into specific places in the archive based on the key. If we want to define it more rigidly that’s fine; in the current draft I’m leaning on the definition of the sole sboms key.
Anyway, I believe this proposal constitutes a significant scope change and is not aligned with the original objectives.
The initial scope was to add SBOM to packages. However, the proposed changes introduce the addition of arbitrary data to packages, which significantly broadens the scope. This new capability/topic deviates from the original focus and introduces complexities that will progress at a different pace than the initial scope. It appears that this change could be an attempt to shift the focus or slow down the progress by diverting from the original scope.
I strongly oppose this and the proposed changes. If there is a substantial interest in including arbitrary data in packages, a separate PEP should be initiated.
We’d not found a workable way to add SBOM data to packages, precisely because there wasn’t a way to add arbitrary data to packages. The new additional-files key is necessary to enable adding SBOM data. It could have been made into a separate PEP, but then the SBOM PEP would have been delayed waiting for that to be approved before SBOMs could build on it.
If you have a different suggestion for adding SBOMs without the additional-files mechanism, feel free to propose it, but you need to solve the issues that were raised (particularly those around dynamic vs static metadata) with the original approach of putting something in metadata.
The main concern seems to be this. So to be clear: the new approach with [additional-files] will be way faster to roll out. PEP 639 still isn’t usable by most projects today, because we had to wait for support in PyPI, the packaging library, then twine and a release of that, folding packaging changes into pip and a release of that, then build backend updates and releases of those, and then finally regular Python packages can bump to the latest build backend version with support and start adding the PEP 639 metadata. And then still there’s a concern of older versions of tools floating around and not understanding new versions of packages that started using license/license-files and hence containing core metadata 2.4.
With SBOM support as a separate directory and no core metadata version bump, the process is: (1) add build backend support and release that, (2) Python packages update to new version of the build backend they use and can start shipping SBOMs if they want.
From my review comment on the PEP, posting here as requested by @pf_moore & @sethmlarson. (We may want to have this discussion in a different topic, to avoid cluttering the SBOM discussion, so posts may need splitting.)
[additional-files] feels somewhat out of place to me as a top-level table name (“additional files for what?”).
From Seth’s PR, it seems that this is explicitly meant for files to be included in a distribution package.[1] pyproject.toml, though, contains metadata for use in source trees and during development, rather than purely metadata for use in distribution packages. The obvious example beyond the [tool] table is PEP 735’s [dependency-groups], which is explicitly specified to have no impact on or inclusion in source or binary distributions.
Sorry to raise such a question on naming, but it might be nice to have a name for this top-level table which better describes the built/source distribution use-case, rather than the (very!) generic [additional-files]. The best I can come up with is [distribution-files], but this isn’t great!
[1] though I’m unclear if this would ever go into PKG-INFO
This is great, I’ve wanted something like this at work to ship signing metadata! Thoughts:
I’m strongly opposed to a field outside [project]. My preference would be [project.metadata-files].
The wheel builder in Hatchling has an extra-metadata option that allows one to ship arbitrary files under .dist-info/extra_metadata. Can we please standardize something like that for folks to use? It would correspond to the extra key under the [project.metadata-files] table.
edit: @steve.dower am I hallucinating or did you also mention something in the past about a desire to ship files related to security like signatures?
Can you explain why? The [project] table is (per PEP 621) specifically for storing core metadata, and this is not core metadata. Specifically, I suggested something outside of [project] so that we’d avoid the problems arising from how the dynamic field is defined. Personally (speaking as the person who proposed a new key) I’m strongly opposed to putting it under [project] as that invalidates most of the benefits I saw in having a new key.
Maybe we could call the new table [distribution] with the meaning that this is for data that will be added to any distribution built from this project? I don’t know if you’d want an extra level - something like [distribution.metadata-files.sbom] or if that’s nested too deep (it matches what you’re proposing under [project], but I’m not sure there’s anything else we’d add here beyond metadata files…).
Signing data sounds like a great idea for another use of this feature. Let’s make sure whatever we agree on will support that (although it’s important to keep the focus for this PEP on SBOMs, and not get sidetracked too far into defining a general mechanism).