PEP 770: Improving measurability of Python packages with Software Bill-of-Materials

A [distribution.metadata-files] table sounds great! I was trying to maintain the relationship between TOML fields and metadata in an artifact, and a top level [distribution] table is indeed much better.

Any reason to not have a tool directory whose own subdirectories are scoped to the build tool, e.g. .dist-info/tool/hatchling? Then you can put whatever you want in there?

My concern with that name is the files have nothing to do with core metadata. Granted, most users will probably not make that connection, but it’s where my brain went. If we wanted to stick with the .dist-info connection then distribution.info-files also makes sense, as well as distribution.info or even distribution.info.files if you really want to scope to file-related keys because “Namespaces are one honking great idea”.

The reason would be to allow users distributing whatever auxiliary package metadata they want using any backend.

I think the latter assessment is correct and is the more important consideration. Everything under the .dist-info directory is, necessarily and technically, metadata.

edit: I also think that plays nicely with “core metadata” because if you remove “core” and ask what that means, in my view the answer would be everything in that directory.

edit 2: That is actually stated explicitly by the wheel spec:

  1. {distribution}-{version}.dist-info/ contains metadata.

I agree, tool names do not belong in directory names like that. It should be possible to move backends without changing installed file locations. Something like extra seems fine.

It’s currently the case and probably a good convention to keep, but it’s not necessary for any technical reason.

As an example of non-metadata that could be put there, one proposal I’ve seen for addressing the problem of where to install header files in a way that they become discoverable (xref problem description) is to use .dist-info/include, so build tools can be taught to look for them in a consistent place. There are probably better solutions for the header problem specifically, but not all other files that need a place to go inside a Python environment and outside the import tree of the package are metadata.

So to keep the options open, best not to use “metadata” as part of the directory name. It’s either superfluous (if everything is metadata by convention anyway) or limiting. I kinda liked [additional-files]; something like [distribution.info], [distribution.info-files] or [dist-info.files] all sounds reasonable as well.

As I mentioned above, this is not a convention but part of the wheel spec:

{distribution}-{version}.dist-info/ contains metadata.

Therefore I think it’s unwise to, for example, ship header files there. There is already a naming convention used by the spec to ship such non-importable files which is {distribution}-{version}.{key} e.g.:

{distribution}-{version}.data/ contains one subdirectory for each non-empty install scheme key not already covered, where the subdirectory name is an index into a dictionary of install paths (e.g. data, scripts, headers, purelib, platlib).

So one could easily add a new known {distribution}-{version}.include directory to the spec.

I also liked the name additional-files, it felt user-focused and that seems important for pyproject.toml.

Most of the naming conversation has been .dist-info-focused so I wanted to note that there’s utility beyond the files being in .dist-info: this information is also useful to tools that are inspecting source distributions for files of a certain category. The reason files are put into .dist-info/{directory} is so that tools beyond pyproject.toml can add more files as needed without having to change the core metadata of a package.
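As a rough illustration of how a tool might pick out files in a given .dist-info/{directory}, here is a minimal sketch. The helper name `files_in_dist_info_subdir` and the example project are hypothetical; the input is assumed to be a list of POSIX-style relative paths such as the first column of a RECORD file.

```python
from pathlib import PurePosixPath


def files_in_dist_info_subdir(record_paths, subdir):
    """Return the paths located under `*.dist-info/<subdir>/`.

    `record_paths` is any iterable of POSIX-style relative paths.
    """
    matches = []
    for raw in record_paths:
        parts = PurePosixPath(raw).parts
        # Expected shape: ("<name>-<version>.dist-info", "<subdir>", ...file parts)
        if len(parts) >= 3 and parts[0].endswith(".dist-info") and parts[1] == subdir:
            matches.append(raw)
    return matches


# Hypothetical contents of an installed project, for illustration only.
example_record = [
    "example/__init__.py",
    "example-1.0.dist-info/METADATA",
    "example-1.0.dist-info/RECORD",
    "example-1.0.dist-info/licenses/LICENSE",
    "example-1.0.dist-info/sboms/bom.cdx.json",
]

print(files_in_dist_info_subdir(example_record, "sboms"))
```

For an installed distribution, the paths could presumably be taken from `importlib.metadata.files()` (converting each entry with `str()`), keeping in mind that it returns `None` when no file list is recorded.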

Throwing a few names out there that I think are also fine?

  • metadata-files
  • additional-metadata

My only thought on extra-* is that the term is already used for something else in Python packaging, so I’d like to avoid adding more uses. My aversion to dist-* or distribution-* is that users don’t think about that word when they think about a Python package, IMO?

The spec already says to use {distribution}-{version}.data/headers/ for headers, no? In the bit that you quoted?

Otherwise, I’d standardise on an entrypoint rather than a directory name for things that are going to be discovered at runtime.


Which I guess also raises the question, why not {distribution}-{version}.data/sboms/? Is there a concern about all SBOMs for an environment being installed to the same location?

Oh wow, true! @rgommers It appears like what you mentioned has been supported for a long time.

It sounds like the difference between .dist-info/{dir} and .data/{dir} is that the data directory needs to merge all the contents into one directory? If so, I’m concerned about merging all files into one directory if we’re not allowed to perform an automatic transformation at install time: I suspect many projects would have conflicting SBOM names (bom.cdx.json). I’m assuming this is the same reason .data wasn’t chosen for license files, which are all named LICENSE, @brettcannon and @ksurma?

The same thing applies to headers. It relies on users including their files in a subdirectory in the data directory, which doesn’t seem any more burdensome than any of the other requirements (but as we know, I’m very much “anything goes” about this stuff, so if you wanted to require that SBOMs go in .data/sboms/{package_name}/ then I’d not complain :wink:).

My guess on licenses is that license data was previously in .dist-info, and so it continues to be in .dist-info, just moved from METADATA into a separate file.

You should read the wheel spec for details, but essentially the directory {distribution}-{version}.data/ in a wheel contains a set of subdirectories - {distribution}-{version}.data/include, for example. The names of these subdirectories are intended to correspond to install scheme keys (see the sysconfig documentation for details), and when installing the wheel, the contents of the directories are copied into the corresponding locations.

I don’t think this is suitable for SBOM data for a few reasons:

  1. There’s no sbom sysconfig path. One could be added for Python 3.14+, but it would be unsupported on older versions of Python, delaying SBOM support until Python 3.14+ is the default version people use.
  2. The .data/{dir} directories have no corresponding location in the installed package. The target location is a shared directory, and as you say this could result in name clashes when different projects use the same SBOM file name.
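The first point can be checked against whichever interpreter you have at hand. This is only a small sketch using the standard `sysconfig` module; the key name "sboms" is hypothetical and is used purely to show that no such install path exists today.

```python
import sysconfig

# Install-scheme keys known to the running interpreter.
path_names = sysconfig.get_path_names()
print(sorted(path_names))

# "sboms" is a hypothetical key, used here only to illustrate the point.
print("sboms" in path_names)

# By contrast, the existing "data" key resolves to a location, and that
# location is shared by every project installed in the environment.
print(sysconfig.get_path("data"))
```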

Conversely, .dist-info is a directory that’s present in both the wheel and the installed distribution, and it is project-specific, so there’s no risk of clashes. It’s reserved for “metadata” - the relevant point in the wheel spec says

{distribution}-{version}.dist-info/ contains metadata.

… but I think it’s reasonable to consider SBOM files as “metadata” in this sense.
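The clash argument can be illustrated with a toy model. This is a sketch under stated assumptions, not how any installer is actually implemented: it assumes a hypothetical `sboms` key under .data, treats `.data/<key>/` contents as merged into one shared `<key>/` location, and leaves every other path untouched. The projects "alpha" and "beta" are made up.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def install_target(wheel_path):
    """Map a path inside a wheel to a simplified install location.

    Toy model: `<name>-<ver>.data/<key>/<rest>` is merged into a shared
    `<key>/<rest>` location; every other path keeps its own name.
    """
    parts = PurePosixPath(wheel_path).parts
    if len(parts) >= 3 and parts[0].endswith(".data"):
        return "/".join(parts[1:])  # the project-specific prefix is dropped
    return wheel_path


def find_clashes(wheel_paths):
    """Return install targets that more than one wheel path maps to."""
    by_target = defaultdict(list)
    for path in wheel_paths:
        by_target[install_target(path)].append(path)
    return {target: srcs for target, srcs in by_target.items() if len(srcs) > 1}


# Two hypothetical projects that both name their SBOM bom.cdx.json.
via_data = [
    "alpha-1.0.data/sboms/bom.cdx.json",
    "beta-2.0.data/sboms/bom.cdx.json",
]
via_dist_info = [
    "alpha-1.0.dist-info/sboms/bom.cdx.json",
    "beta-2.0.dist-info/sboms/bom.cdx.json",
]

print(find_clashes(via_data))       # both files map to "sboms/bom.cdx.json"
print(find_clashes(via_dist_info))  # {}
```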

The point about include files is a bit of a distraction - I assume the reason @rgommers is suggesting that include files could go in .dist-info is because the situation around the sysconfig include location is a bit of a mess[1], so .data/include isn’t as useful as it could be.


  1. I don’t recall the details and it’s off-topic for this discussion, but I think there’s a bunch of special casing and exceptions that are done for “historical reasons”…

I thought we agreed to start defining them separately, and just use sysconfig as the initial set (until we needed to add more)?

Or did I imagine that? I don’t see any updates in the current spec, but I’m sure I remember discussing them.

I’m not aware of a PEP for that (and I’m pretty sure it would need a PEP). I think we’ve briefly discussed improving things in that area (maybe in the “wheel next generation” thread) but nothing concrete.

Thanks for clarifying all of that. I’ve been slowly incorporating the pieces we’re discussing into the open pull request for adding the new top-level table. I think the only two questions I have are:

  • Is the question of using .dist-info instead of .data settled? If so I can add a section summarizing the above as rationale.
  • What are we naming our shed? :slight_smile:

I think so.

As PEP author it’s your shed to paint, just list reasons as to why you chose the colour.

No, this is misunderstanding the situation - the headers key has existed for a long time but is strongly discouraged, as it may install to global locations like /usr/local/include. It’s not suitable for default usage. Very few packages actually use this; one is pybind11, which has a pybind11[global] variant (rarely seen in use in the wild) that lets the user opt in to installing headers this way. pybind11 documents the behavior with an “It’s not recommended if”. The explanation I linked to says as much: “This is technically possible with wheels, but recommended against because the install process may clobber system files.”

Anyway, it’s just one example, I did not mean to start a discussion about headers. The point was that there are going to be other needs for installing files that aren’t currently catered for.

This major caveat applies to the data dir in general: it’s possible to use it in wheels, but it is mostly useful for wheels that aren’t meant for distribution on PyPI. It is not suitable as a default for major projects like pybind11, NumPy, PyArrow, etc. because of potentially writing to system locations.

Yep, this (and your whole post) was spot on. It may apply to other types of files than only SBOMs (other example: what about pkg-config or .cmake files? They have nowhere to go either, and with a bit of handwaving they could even be considered metadata).

Okay folks, last update before I’m without internet for the weekend. I’ve decided to put forward [dist-info.files] as the new top-level table name. I’ve updated the pull request to reflect this; my justifications are below:

  • The name dist-info.files is easy to search for and “does what it says on the tin”. This will be useful to conceptually map the table and keys in pyproject.toml, the expected behavior of tools, and the registry of reserved dist-info subdirectory names into a cohesive story.
  • Avoiding the word “metadata” means we’re not painted into a corner about what type of files are used in the future, even if the identified use-cases today are all metadata.
  • Avoiding the word “extra” to not overload the existing term for optional dependencies.
  • No ambiguity about the “what”, which was the identified problem with using “additional-*”.

Of all the names, I also liked that this table name feels symmetrical for the source distribution lookup case too. For example, if a tool is looking for statically defined SBOMs in an sdist, it examines pyproject.toml for dist-info.files and the sboms key; this is a similar-looking story to searching in .dist-info and the sboms directory of a built distribution.

Thanks all for the great ideas and discussion, please let me know what you think or if you have reservations about my justifications/choices.

Only one question from me - this leaves open the possibility of [dist-info.something] for subkeys other than files. I can’t immediately think of anything else that might be needed here. So are we adding an unnecessary level? Conversely, does it matter?

I do like the readability of dist-info.files.sboms, so I don’t object to the naming, I’m just curious why you added the extra level. (Or is it simply that you weren’t aware that using a dot did add an extra level?)

I did know that adding a dot added an extra level. The reason was not wanting to paint “dist-info”, a potentially multi-faceted namespace in the future, into a corner where it is only used for a single mechanism (plus, like you said, dist-info.files.sboms seemed just fine).

This is my final attempt to advocate for different naming. I don’t consider it bikeshedding because this is meaningful for users.


I still think we should go with a top-level metadata-files.

This option is user-facing so when you say:

To me that doesn’t matter at all for users, exactly because of the reason you mentioned earlier in this thread:

This option is all about the user and the vast majority of users have no idea that wheels ship a directory suffixed by .dist-info. Also from a technical standpoint if we ever choose to add more metadata to source distributions, or invent new types of artifacts that require metadata, then we are stuck with that name there as well.

Please reconsider the naming so we can offer the best possible UX.
