Partially dynamic project metadata proposal (pre-PEP)

Some time ago, I proposed an idea here and at PyCon for a large dynamic-metadata PEP. It was basically three PEPs in one, and one or two of those PEPs could be implemented, at least in part, as a library (which is being worked on). However, there was one portion of that proposal that was fairly well received and requires a PEP to implement, so I wanted to write that up before this year’s PyCon packaging summit. It’s also extremely relevant, as several recent proposals (like PEP 770) and discussions would have been helped by it.

If this remains a good idea, this is the start of what I’d put in the PEP.

Partially dynamic project metadata

Need

In the core metadata specification originally set out in PEP 621, metadata can be specified in three ways. First, it can be listed in the [project] table. This makes it statically inferable, meaning any tool (not just the build backend) can reliably compute the value. Second, a field can be listed in the project.dynamic list, which allows the build backend to compute the value. Finally, a value can be missing from both the [project] table and the project.dynamic list, in which case the corresponding metadata is guaranteed to be empty.
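
For example (values illustrative):

[project]
name = "example-package"
version = "1.0"              # static: any tool can reliably read this value
dynamic = ["dependencies"]   # dynamic: only the build backend can compute this
# keywords appears in neither place, so it is guaranteed to be empty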

This system provided (at least) two benefits to Python packaging. The first is a standard specification that all major backends have now adopted, making teaching much easier: a single tutorial is now sufficient to cover the metadata portion of configuring any backend. Users can switch from a general-purpose backend to a specialized backend without changing their static metadata, and tooling like schema validation can verify the configuration and catch mistakes.

The second benefit is improved support for static tools that read the source files looking for metadata. This is useful for dependency chain analysis, such as creating “used by” and “uses” graphs. It is used by code quality tooling to detect the minimum supported version of Python. It is used by cibuildwheel to automatically avoid building wheels that are not supported. It is not used, however, to avoid wheel builds when the SDist is available; that was addressed by Metadata 2.2, which added a Dynamic field to the SDist metadata that lets a tool know whether the metadata can change when building a wheel - confusing the two mechanisms is an easy mistake to make due to the similarity of the names.
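
For example, the PKG-INFO of an SDist whose backend computes the dependency pins at wheel-build time would contain something like this (a sketch; the name and version are made up):

Metadata-Version: 2.2
Name: example-package
Version: 1.2.3
Dynamic: Requires-Dist

A resolver can trust every field here except Requires-Dist without building a wheel.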

Due to the rapidly increasing popularity of PEP 621 metadata, support from all major backends, and a rise in backends supporting complex compiled extensions, an issue with the restrictions applied in PEP 621 is becoming more apparent. In PEP 621, the metadata choice is all-or-nothing; metadata must be completely static, or listed in the dynamic field and completely absent from the static definition. For the most common use cases, this is fine; there is little benefit to setting the version statically if you are going to override it dynamically. If you are using a custom README processor to filter or modify the README for proper display, it’s not a big deal to have to specify the configuration in a custom tool.* section.

However, there is an entire class of metadata fields where advanced use cases would really benefit from a relaxation of this rule.

  • Pinning dependency requirements when building the wheel (see the recent thread; this proposal is more general than that one, and requires explicit opt-in, but would solve that issue)
  • Generating extra scripts from a build system (proposed for scikit-build-core)
  • Adding entry points dynamically (validate-pyproject-schema-store could have used this)
  • Adding dependencies or optional dependencies based on configuration (such as making an all dependency, or reading dependencies from dependency-groups, for example)
  • Adding classifiers; some backends can compute classifiers from other places and inject them (Poetry)
  • Adding license files to the wheel based on what libraries are linked in
  • Adding licenses based on vendored/linked code (setuptools might be able to use this?)
  • Adding SBOMs when building - PEP 770 had to remove the pyproject.toml field specifically because you want the build tool to add these, so the [project] table setting would be useless; you’d almost never be able to use it.

All of these use cases have a similar feature: they are adding something (possibly a narrower pin for the dependency case). With the exception of the recently added license field, they are lists or tables that need extending.

Today, you can implement these, but it requires providing a completely separate configuration for the non-extended portion in a tool-specific format, and static analysis tools lose the ability to detect anything.

For example, let’s say you want to allow the build backend (my-build-backend) to pin to the supported build of numpy. You could do this:

[project]
dynamic = ["dependencies"]

[tool.my-build-backend]
original-dependencies = ["numpy", "packaging"]
pin-to-build-versions = ["numpy"]

As you can see, static tooling can no longer tell that numpy and packaging are runtime dependencies, and, most importantly, the build backend had to duplicate the dependency table, making it harder for users to learn and read; the standardized place proposed by PEP 621 and adopted by all major build backends is lost.

PEP 621 includes the following statement:

In an earlier version of this PEP, tools were allowed to extend data for fields. For instance, build back-ends could take the version number and add a local version for when they built the wheel. Tools could also add more trove classifiers for things like the license or supported Python versions.

In the end, though, it was thought better to start out stricter and contemplate loosening how static the data could be considered based on real-world usage.

In this PEP, we are proposing a limited and explicit loosening of that restriction based on real-world needs.

Proposal

Any field that consists of a list, or a table with arbitrary entries, will now be allowed to be present in both the [project] table and the project.dynamic list. If a field is present in both places, then the build backend is allowed to extend the list or table with new entries, but not to remove entries or to modify entries in a way that causes them to be removed. For tables of arrays, a backend may either add a new table entry or extend an existing array. As a special case, the license field, when set to a string SPDX expression, can be extended logically as well. A sketch of what opting in looks like follows the field list below.

The fields that are arrays or tables with arbitrary entries are:

  • authors, maintainers: New author tables can be added to the list. Existing authors cannot be modified (list of tables with pre-defined keys).
  • classifiers: Classifiers can be added to the list.
  • dependencies: New dependencies can be added, including more tightly constrained versions of existing dependencies. Backends are allowed to merge duplicated items that differ only in constraints, as long as the result is logically identical to the original items plus the duplicates.
  • entry-points: Entry points can be added, to either new or existing groups. Existing entry points cannot be changed or removed.
  • keywords: Keywords can be added to the list.
  • license-files: Files can be added to the list.
  • license (string, special case): The license expression can be extended logically (for example, MIT could become MIT AND BSD-3-Clause). An existing license cannot be logically excluded.
  • optional-dependencies: A new extra can be added, or new items can be added to an existing extra.
  • scripts, gui-scripts: New scripts can be added. Existing ones cannot be changed or removed.
  • urls: New URLs can be added. Existing ones cannot be changed or removed.
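
For illustration, here is a minimal sketch of what opting in could look like (the backend configuration that performs the extension is omitted, since each backend defines its own tool table):

[project]
name = "example-package"
version = "1.0"
license = "MIT"
dynamic = ["optional-dependencies", "license"]

[project.optional-dependencies]
test = ["pytest"]

A conforming backend could produce a wheel whose license expression has been extended (say, to MIT AND BSD-3-Clause to cover vendored code) and whose extras contain test = ["pytest"] unchanged plus any extras it computes; it could not drop pytest from test or remove MIT from the expression.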

As a reminder, this is entirely opt-in, by listing the field in dynamic; without that, the metadata continues to be entirely static. And since the current workaround is to move all the metadata out of the standard field, it may even increase the availability of metadata for static tooling.

A backend SHOULD warn if a field is specified in both places and it does not know how to extend that field, to protect against possible user error. It should be noted, however, that mistakenly adding a field to the dynamic array is not a serious mistake, as it only limits the ability of a static tool to ensure completeness, so it is up to the discretion of the backend whether to raise an error instead.

Static analysis tools, when detecting that a field is both specified in the [project] table and listed in the project.dynamic array, MUST assume the field could be extended when the package is built.

The example given before would now look like this:

[project]
dependencies = ["numpy", "packaging"]
dynamic = ["dependencies"]

[tool.my-build-backend]
pin-to-build-versions = ["numpy"]
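
With this layout, static tooling can once again see that numpy and packaging are runtime dependencies, while the backend is only permitted to tighten the numpy pin (or add entries) when building the wheel.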

Further considerations

This does not affect any existing pyproject.toml files, since this combination was not allowed before.

7 Likes

I forgot the most recent use case that made me start thinking about this a week or so ago, but maybe that’s fine, because I can put a more detailed example here. I’ve been trying to move pybind11 over to static configuration. I ran into an issue: I’ve got something like this (I don’t have the template plugin implemented yet, because it ideally needs the topological sorting available from Python 3.9, but I’ve got a plan to add it):

[project]
name = "pybind11"
dynamic = ["version", "optional-dependencies"]

[tool.scikit-build.metadata.version]
provider = "scikit_build_core.metadata.regex"
input = "include/pybind11/detail/common.h"
regex = '''(?sx)
\#define \s+ PYBIND11_VERSION_MAJOR \s+ (?P<major>\d+) .*?
\#define \s+ PYBIND11_VERSION_MINOR \s+ (?P<minor>\d+) .*?
\#define \s+ PYBIND11_VERSION_PATCH \s+ (?P<patch>\S+)
'''
result = "{major}.{minor}.{patch}"

[tool.scikit-build.metadata.optional-dependencies]
provider = "scikit_build_core.metadata.template"
needs = ["version"]
result = { global = ["pybind11-global=={version}"] }

This is much better than the old templated setup.py.in we were using (you don’t want to know…), but the problem is that every other optional dependency we ever want to add (most of them just moved to dependency-groups, to be fair) would now have to go inside this plugin, and could not be added to [project]. Similar issues can occur for multi-part packages that need to pin exact versions in their dependencies array (probably more common in that direction, actually), which would require moving the entire dependencies array to some custom location.
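
Under this proposal, the same configuration could keep static extras in [project] while the plugin only appends the computed global extra. A sketch (the version provider is unchanged and omitted here; the docs extra is a hypothetical example of a static one):

[project]
name = "pybind11"
dynamic = ["version", "optional-dependencies"]

[project.optional-dependencies]
docs = ["sphinx"]   # hypothetical static extra that no longer has to move into the plugin

[tool.scikit-build.metadata.optional-dependencies]
provider = "scikit_build_core.metadata.template"
needs = ["version"]
result = { global = ["pybind11-global=={version}"] }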

Edit: implemented at feat: add template dynamic-metadata plugin and dynamic_metadata_needs by henryiii · Pull Request #1047 · scikit-build/scikit-build-core · GitHub, I’ll also push it to dynamic-metadata in the future.

6 Likes

Thanks for working on this @henryiii. I am fully supportive of something like this. I can see immediate use cases for about half the things on your list in projects I work on.

Your proposal here contains the same solution as Paul preferred in that thread (adding the field(s) being extended to dynamic = [...] in pyproject.toml). That thread was way too fast-moving for me to be able to weigh in, so let me say here that I think it’s indeed the right solution.

I’ll note that this used to be necessary for correctness when building against numpy<1.25. It no longer is for (almost?) all of numpy’s dependents, thanks to changes in how numpy exports its C API by default. The use case is still very real though - you may want this for PyTorch, for example, because it requires exact dependencies = ["torch==version_at_build_time"] pins for all packages that use its C++ API.
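
Under the proposal, that could look like the following sketch (reusing the hypothetical my-build-backend tool table from the first post), with the backend narrowing torch to the exact build-time version in the produced wheel:

[project]
dependencies = ["torch"]
dynamic = ["dependencies"]

[tool.my-build-backend]
pin-to-build-versions = ["torch"]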

At first I was thinking that this is exactly what is needed for e.g. python-flint, since it is MIT but the PyPI wheels have LGPL things bundled in. Thinking about it, though, that bundling isn’t done by the build backend but rather by auditwheel et al.

Is that sort of thing out of scope here?

Does it create a problem somehow if the description in the sdist of what is dynamic and what can be changed by the backend is different from what is actually changed in the wheels on PyPI?

I think it would be nice if the build backend could do the bundling, i.e. it seems generally problematic that “repairing” wheels is separate from standardized build interfaces like PEP 517. That is a separate topic though.

1 Like

If a field is not marked as dynamic in a sdist, the spec clearly says “then the value of the field in any wheel built from the sdist MUST match the value in the sdist”. When I wrote PEP 643, I didn’t even consider the possibility of tools reading an existing wheel and amending its metadata.

In principle, I’d say that amending metadata in a wheel that is not marked as dynamic in the sdist is a violation of the spec (as the amended wheel is still ultimately “built from the sdist”, just by a chain of tools rather than by a single tool). In practice, though, I can imagine that such a restriction would be problematic to enforce. Of course, if you need auditwheel to build a usable wheel for a package, I’m not sure what anyone would be doing installing from sdist (which is the only case where it matters whether a sdist’s metadata is static or not) so maybe the issue is actually irrelevant.

At some point I think we’re going to have to properly tackle the whole question of installing from source and static metadata, but now probably isn’t the time…

I think it makes sense, and it’s a pretty clean design.

At first, I had some reservations over the much wider scope than the “pinning dependencies” version, but I guess the alternatives are worse.

My primary use case is that, as a downstream packager, I need to look at things like license and dependencies, and obviously having static metadata — and especially in a well-defined, commonly used format — makes my job easier. The scope of this change implies that I will now have to account for the possibility that the metadata I’m seeing is not “final”, and that there could be dynamic additions to it, but I guess it’s just a matter of getting used to it. And in the end, people who needed this would go fully dynamic today (or skip relevant dependencies entirely), so it’s not really making things worse for us.

The metadata is static from the perspective of any build frontend. Unfortunately a naive build frontend produces binaries that can be used locally but must be “repaired” to be suitable for PyPI. The fact that PyPI can distribute binaries at all depends in large part on tools like auditwheel that do this.

Plenty of people still build/install from sdist. It just requires installing the non-Python dependencies and toolchain first. Also every redistributor like conda builds from the sdist but they don’t do the bundling which is only needed for PyPI. From their perspective the metadata is static.

The question is whether static vs dynamic refers to the wheel that pops out of a PEP 517 build or to the relationship between the sdist and the wheels on PyPI. Those two things are not the same because of e.g. auditwheel.

5 Likes

Thanks for the clarification - the point I’d missed was that building from sdist[1] gives you a wheel that’s suitable for the local machine, but may not be suitable for publishing.

That’s a good way of framing the question - the use case that PEP 643 was intended to address was the former.


  1. In particular when the project needs something like auditwheel to publish binaries. ↩︎

1 Like