Partially dynamic project metadata proposal (pre-PEP)

Some time ago, I proposed an idea here and at PyCon for a large dynamic-metadata PEP. It was basically three PEPs in one, and one or two of those PEPs could be implemented, at least in part, as a library (which is being worked on). However, there was one portion of that proposal that was fairly well received and requires a PEP to implement, so I wanted to write that up before this year’s PyCon packaging summit. It’s also extremely relevant, as several recent proposals (like PEP 770) and discussions would have been helped by it.

If this remains a good idea, this is the start of what I’d put in the PEP.

Partially dynamic project metadata

Need

In the core metadata specification originally set out in PEP 621, metadata can be specified in three ways. First, it can be listed in the [project] table. This makes it statically inferable, meaning any tool (not just the build backend) can reliably compute the value. Second, a field can be listed in the project.dynamic list, which allows the build backend to compute the value. Finally, a value can be missing from both the [project] table and the project.dynamic list, in which case the corresponding metadata is guaranteed to be empty.
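
For example (values illustrative):

[project]
name = "example-package"
version = "1.0"              # static: any tool can reliably read this value
dynamic = ["dependencies"]   # dynamic: only the build backend can compute this
# keywords appears in neither place, so it is guaranteed to be empty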

This system provided (at least) two benefits to Python packaging. The first is a standard specification that all major backends have now adopted, making teaching much easier: a single tutorial is now sufficient to cover the metadata portion of configuring any backend. Users can switch from a general-purpose backend to a specialized backend without changing their static metadata, and tooling like schema validation can verify the configuration and catch mistakes.

The second benefit is improved support for static tools that read the source files looking for metadata. This is useful for dependency chain analysis, such as creating “used by” and “uses” graphs. It is used by code quality tooling to detect the minimum supported version of Python. It is used by cibuildwheel to automatically avoid building wheels that are not supported. It is not used, however, to avoid wheel builds when the SDist is available; that was addressed by Metadata 2.2, which added a Dynamic field to the SDist metadata that lets a tool know whether the metadata can change when building a wheel - confusing the two mechanisms is an easy mistake to make due to the similarity of the names.
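
For example, the PKG-INFO of an SDist whose backend computes the dependency pins at wheel-build time would contain something like this (a sketch; the name and version are made up):

Metadata-Version: 2.2
Name: example-package
Version: 1.2.3
Dynamic: Requires-Dist

A resolver can trust every field here except Requires-Dist without building a wheel.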

Due to the rapidly increasing popularity of PEP 621 metadata, support from all major backends, and a rise in backends supporting complex compiled extensions, an issue with the restrictions applied in PEP 621 is becoming more apparent. In PEP 621, the metadata choice is all-or-nothing; metadata must be completely static, or listed in the dynamic field and completely absent from the static definition. For the most common use cases, this is fine; there is little benefit to setting the version statically if you are going to override it dynamically. If you are using a custom README processor to filter or modify the README for proper display, it’s not a big deal to have to specify the configuration in a custom tool.* section.

However, there is an entire class of metadata fields where advanced use cases would really benefit from a relaxation of this rule.

  • Pinning dependency requirements when building the wheel (see the recent thread; this proposal is more general than that one, and requires explicit opt-in, but would solve that issue)
  • Generating extra scripts from a build system (proposed for scikit-build-core)
  • Adding entry points dynamically (validate-pyproject-schema-store could have used this)
  • Adding dependencies or optional dependencies based on configuration (such as making an all dependency, or reading dependencies from dependency-groups, for example)
  • Adding classifiers; some backends can compute classifiers from other places and inject them (Poetry)
  • Adding license files to the wheel based on what libraries are linked in
  • Adding licenses based on vendored/linked code (setuptools might be able to use this?)
  • Adding SBOMs when building - PEP 770 had to remove the pyproject.toml field specifically because you want the build tool to add these, so the [project] table setting would be useless; you’d almost never be able to use it.

All of these use cases have a similar feature: they are adding something (possibly a narrower pin for the dependency case). With the exception of the recently added license field, they are lists or tables that need extending.

Today, you can implement these, but it requires providing a completely separate configuration for the non-extended portion in a tool-specific format, and static analysis tools lose the ability to detect anything.

For example, let’s say you want to allow the build backend (my-build-backend) to pin to the supported build of numpy. You could do this:

[project]
dynamic = ["dependencies"]

[tool.my-build-backend]
original-dependencies = ["numpy", "packaging"]
pin-to-build-versions = ["numpy"]

As you can see, static tooling can no longer tell that numpy and packaging are runtime dependencies, and, most importantly, the build backend had to duplicate the dependency table, making it harder for users to learn and read; the standardized place proposed by PEP 621 and adopted by all major build backends is lost.

PEP 621 includes the following statement:

In an earlier version of this PEP, tools were allowed to extend data for fields. For instance, build back-ends could take the version number and add a local version for when they built the wheel. Tools could also add more trove classifiers for things like the license or supported Python versions.

In the end, though, it was thought better to start out stricter and contemplate loosening how static the data could be considered based on real-world usage.

In this PEP, we are proposing a limited and explicit loosening of that restriction based on real-world needs.

Proposal

Any field that consists of a list, or a table with arbitrary entries, will now be allowed to be present in both the [project] table and the project.dynamic list. If a field is present in both places, then the build backend is allowed to extend the list or table with new entries, but not to remove entries or to modify entries in a way that causes them to be removed. For tables of arrays, a backend may either add a new table entry or extend an existing array. As a special case, the license field, when set to a string SPDX expression, can be extended logically as well. A sketch of what opting in looks like follows the field list below.

The fields that are arrays or tables with arbitrary entries are:

  • authors, maintainers: New author tables can be added to the list. Existing authors cannot be modified (list of tables with pre-defined keys).
  • classifiers: Classifiers can be added to the list.
  • dependencies: New dependencies can be added, including more tightly constrained versions of existing dependencies. Backends are allowed to merge duplicated items that differ only in constraints, as long as the result is logically identical to the original items plus the duplicates.
  • entry-points: Entry points can be added, to either new or existing groups. Existing entry points cannot be changed or removed.
  • keywords: Keywords can be added to the list.
  • license-files: Files can be added to the list.
  • license (string, special case): The license expression can be extended logically (for example, MIT could become MIT AND BSD-3-Clause). An existing license cannot be logically excluded.
  • optional-dependencies: A new extra can be added, or new items can be added to an existing extra.
  • scripts, gui-scripts: New scripts can be added. Existing ones cannot be changed or removed.
  • urls: New URLs can be added. Existing ones cannot be changed or removed.
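
For illustration, here is a minimal sketch of what opting in could look like (the backend configuration that performs the extension is omitted, since each backend defines its own tool table):

[project]
name = "example-package"
version = "1.0"
license = "MIT"
dynamic = ["optional-dependencies", "license"]

[project.optional-dependencies]
test = ["pytest"]

A conforming backend could produce a wheel whose license expression has been extended (say, to MIT AND BSD-3-Clause to cover vendored code) and whose extras contain test = ["pytest"] unchanged plus any extras it computes; it could not drop pytest from test or remove MIT from the expression.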

As a reminder, this is entirely opt-in, by listing the field in dynamic; without that, the metadata continues to be entirely static. And since the current workaround is to move all the metadata out of the standard field, it may even increase the availability of metadata for static tooling.

A backend SHOULD warn if a field is specified in both places and it does not know how to extend that field, to protect against possible user error. It should be noted, however, that mistakenly adding a field to the dynamic array is not a serious mistake, as it only limits the ability of a static tool to ensure completeness, so it is up to the discretion of the backend whether to raise an error instead.

Static analysis tools, when detecting that a field is both specified in the [project] table and listed in the project.dynamic array, MUST assume the field could be extended when the package is built.

The example given before would now look like this:

[project]
dependencies = ["numpy", "packaging"]
dynamic = ["dependencies"]

[tool.my-build-backend]
pin-to-build-versions = ["numpy"]
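
With this layout, static tooling can once again see that numpy and packaging are runtime dependencies, while the backend is only permitted to tighten the numpy pin (or add entries) when building the wheel.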

Further considerations

This does not affect any existing pyproject.toml files, since this combination was not allowed before.

7 Likes

I forgot the most recent use case that made me start thinking about this a week or so ago, but maybe that’s fine, because I can put a more detailed example here. I’ve been trying to move pybind11 over to static configuration. I ran into an issue: I’ve got something like this (I don’t have the template plugin implemented yet, because it ideally needs the topological sorting available from Python 3.9, but I’ve got a plan to add it):

[project]
name = "pybind11"
dynamic = ["version", "optional-dependencies"]

[tool.scikit-build.metadata.version]
provider = "scikit_build_core.metadata.regex"
input = "include/pybind11/detail/common.h"
regex = '''(?sx)
\#define \s+ PYBIND11_VERSION_MAJOR \s+ (?P<major>\d+) .*?
\#define \s+ PYBIND11_VERSION_MINOR \s+ (?P<minor>\d+) .*?
\#define \s+ PYBIND11_VERSION_PATCH \s+ (?P<patch>\S+)
'''
result = "{major}.{minor}.{patch}"

[tool.scikit-build.metadata.optional-dependencies]
provider = "scikit_build_core.metadata.template"
needs = ["version"]
result = { global = ["pybind11-global=={version}"] }

This is much better than the old templated setup.py.in we were using (you don’t want to know…), but the problem is that every other optional dependency we ever want to add (most of them just moved to dependency-groups, to be fair) would now have to go inside this plugin, and could not be added to [project]. Similar issues can occur for multi-part packages that need to pin exact versions in their dependencies array (probably more common in that direction, actually), which would require moving the entire dependencies array to some custom location.
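
Under this proposal, the same configuration could keep static extras in [project] while the plugin only appends the computed global extra. A sketch (the version provider is unchanged and omitted here; the docs extra is a hypothetical example of a static one):

[project]
name = "pybind11"
dynamic = ["version", "optional-dependencies"]

[project.optional-dependencies]
docs = ["sphinx"]   # hypothetical static extra that no longer has to move into the plugin

[tool.scikit-build.metadata.optional-dependencies]
provider = "scikit_build_core.metadata.template"
needs = ["version"]
result = { global = ["pybind11-global=={version}"] }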

Edit: implemented at feat: add template dynamic-metadata plugin and dynamic_metadata_needs by henryiii · Pull Request #1047 · scikit-build/scikit-build-core · GitHub, I’ll also push it to dynamic-metadata in the future.

6 Likes

Thanks for working on this @henryiii. I am fully supportive of something like this. I can see immediate use cases for about half the things on your list in projects I work on.

Your proposal here contains the same solution as Paul preferred in that thread (adding the field(s) being extended to dynamic = [...] in pyproject.toml). That thread was way too fast-moving for me to be able to weigh in, so let me say here that I think it’s indeed the right solution.

I’ll note that this used to be necessary for correctness when building against numpy<1.25. It no longer is for (almost?) all of numpy’s dependents, thanks to changes in how numpy exports its C API by default. The use case is still very real though - you may want this for PyTorch, for example, because it requires exact dependencies = ["torch==version_at_build_time"] pins for all packages that use its C++ API.
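
Under the proposal, that could look like the following sketch (reusing the hypothetical my-build-backend tool table from the first post), with the backend narrowing torch to the exact build-time version in the produced wheel:

[project]
dependencies = ["torch"]
dynamic = ["dependencies"]

[tool.my-build-backend]
pin-to-build-versions = ["torch"]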

At first I was thinking that this is exactly what is needed for e.g. python-flint, since it is MIT but the PyPI wheels have LGPL things bundled in. Thinking about it, though, that bundling isn’t done by the build backend but rather by auditwheel et al.

Is that sort of thing out of scope here?

Does it create a problem somehow if the description in the sdist of what is dynamic and what can be changed by the backend is different from what is actually changed in the wheels on PyPI?

I think it would be nice if the build backend could do the bundling, i.e. it seems generally problematic that “repairing” wheels is separate from standardized build interfaces like PEP 517. That is a separate topic though.

1 Like

If a field is not marked as dynamic in a sdist, the spec clearly says “then the value of the field in any wheel built from the sdist MUST match the value in the sdist”. When I wrote PEP 643, I didn’t even consider the possibility of tools reading an existing wheel and amending its metadata.

In principle, I’d say that amending metadata in a wheel that is not marked as dynamic in the sdist is a violation of the spec (as the amended wheel is still ultimately “built from the sdist”, just by a chain of tools rather than by a single tool). In practice, though, I can imagine that such a restriction would be problematic to enforce. Of course, if you need auditwheel to build a usable wheel for a package, I’m not sure what anyone would be doing installing from sdist (which is the only case where it matters whether a sdist’s metadata is static or not) so maybe the issue is actually irrelevant.

At some point I think we’re going to have to properly tackle the whole question of installing from source and static metadata, but now probably isn’t the time…

I think it makes sense, and it’s a pretty clean design.

At first, I had some reservations over the much wider scope than the “pinning dependencies” version, but I guess the alternatives are worse.

My primary use case is that, as a downstream packager, I need to look at things like license and dependencies, and obviously having static metadata — and especially in a well-defined, commonly used format — makes my job easier. The scope of this change implies that I will now have to account for the possibility that the metadata I’m seeing is not “final”, and that there could be dynamic additions to it, but I guess it’s just a matter of getting used to it. And in the end, people who needed this would go fully dynamic today (or skip relevant dependencies entirely), so it’s not really making things worse for us.

The metadata is static from the perspective of any build frontend. Unfortunately a naive build frontend produces binaries that can be used locally but must be “repaired” to be suitable for PyPI. The fact that PyPI can distribute binaries at all depends in large part on tools like auditwheel that do this.

Plenty of people still build/install from sdist. It just requires installing the non-Python dependencies and toolchain first. Also every redistributor like conda builds from the sdist but they don’t do the bundling which is only needed for PyPI. From their perspective the metadata is static.

The question is whether static vs dynamic refers to the wheel that pops out of a PEP 517 build or to the relationship between the sdist and the wheels on PyPI. Those two things are not the same because of e.g. auditwheel.

5 Likes

Thanks for the clarification - the point I’d missed was that building from sdist[1] gives you a wheel that’s suitable for the local machine, but may not be suitable for publishing.

That’s a good way of framing the question - the use case that PEP 643 was intended to address was the former.


  1. In particular when the project needs something like auditwheel to publish binaries. ↩︎

1 Like