Partly dynamic project metadata (e.g. adding dependency constraints based on the build environment)

In certain situations, particularly when working with native code, there is a need to add dependency constraints based on the build environment.

PEP 621 prohibits this, unless the field is marked as dynamic.

Data specified using this PEP is considered canonical. Tools CANNOT remove, add or change data that has been statically specified. Only when a field is marked as dynamic may a tool provide a “new” value.

And marking a field as dynamic means it can’t be present in the project table.

  • Build back-ends MUST raise an error if the metadata specifies a field statically as well as being listed in dynamic.

This is a bit problematic, and I think it could be argued that it partly defeats the purpose of the PEP. The PEP aims to allow metadata, such as dependencies, to be read from a standardized (static) place: pyproject.toml.
This feature is great, and enables tools such as GitHub’s repository dependency graph to exist. Requiring a project to stop providing its dependencies in project.dependencies just because some of its binary artifacts happen to depend on the version of one of its build dependencies, for example, unnecessarily breaks tooling support.

In the rejected ideas section, the PEP shares the motivation:

Allow tools to add/extend data

In an earlier version of this PEP, tools were allowed to extend data for fields. For instance, build back-ends could take the version number and add a local version for when they built the wheel. Tools could also add more trove classifiers for things like the license or supported Python versions.

In the end, though, it was thought better to start out stricter and contemplate loosening how static the data could be considered based on real-world usage.

In accordance with this, I think we should consider loosening the static metadata requirements as follows:

  • Tools MAY add additional Requires-Dist entries to artifact metadata, as long as they FURTHER CONSTRAIN existing entries.
    • E.g. when a Requires-Dist: numpy entry is present, the build backend may add a Requires-Dist: numpy == 2.* entry to the wheel metadata (see the sketch below).
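
To illustrate, a backend hook along these lines could apply the rule when producing wheel METADATA (a sketch only; how the backend detects the numpy version it built against is left abstract):

from email.message import Message

def add_further_constraint(metadata: Message, built_major: int) -> None:
    # Wheel METADATA uses email-style headers; the static entry from
    # pyproject.toml stays in place...
    assert "numpy" in (metadata.get_all("Requires-Dist") or [])
    # ...and a narrower entry is appended alongside it for this wheel only.
    metadata["Requires-Dist"] = f"numpy == {built_major}.*"

The resulting wheel METADATA would then carry both Requires-Dist: numpy and Requires-Dist: numpy == 2.*.
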
4 Likes

I think it’s clean, focused on a very specific problem and solves it with absolute minimal risk of side effects. I like it.

(read: it shouldn’t break my workflow of looking at pyproject.toml and assuming that if I found dependencies there, that list should generally be complete)

1 Like

How will this affect the metadata in the sdist? Would that be static or dynamic?

It should be static, I think. I cannot see any sensible reason for an sdist to need to use this exemption.

We can update the text to “(..) entries to binary artifact metadata (..)” to explicitly exclude sdists.

1 Like

But the definition of a static field in the metadata is that it must be the same in all wheels built from the sdist. So the sdist metadata can only be static if the build backend can guarantee that it will always add the same values in every wheel (which it can’t, in the example given where it depends on the build environment). But having data that is defined as static in pyproject.toml be dynamic in the sdist seems weird, and will probably break some people’s expectations.

1 Like

Yeah, IMHO this is the biggest problem with the dynamic spec. We should’ve allowed a default to be specified, with the dynamic marker meaning it may change between sdist and wheel. Possibly we can special-case Requires-Dist so that dynamic with existing values only allows appending items, not removing existing ones?

Ultimately though, it just means that backends can provide this feature already if they like. They just have to require (all) dependencies be listed in a tool table, store them in a custom metadata file in the sdist (because apparently we can’t put custom fields in PKG-INFO either), parse pyproject.toml again later, and insert them all into the METADATA for the wheel/install.

2 Likes

It’s worth noting that a backend can support

[project]
dynamic = ["dependencies"]

[tool.build_backend]
dependencies = ["a", "b", "c"] # Backend can alter this at build time

and append to that list at build time. Essentially just replace project.dependencies with tool.build_backend.dependencies.
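
At build time the backend could then do something like this (a sketch; extra_build_constraints() is a made-up stand-in for whatever environment inspection the backend performs):

import tomllib

def extra_build_constraints() -> list[str]:
    # Hypothetical: inspect the build environment and return extra pins,
    # e.g. ["numpy == 2.*"] when compiling against numpy 2.x.
    return []

def build_time_dependencies() -> list[str]:
    # Read the backend's own tool table instead of project.dependencies...
    with open("pyproject.toml", "rb") as f:
        cfg = tomllib.load(f)
    deps = list(cfg["tool"]["build_backend"]["dependencies"])
    # ...and append whatever the build environment dictates.
    return deps + extra_build_constraints()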

Yes, it’s a little less discoverable, because it’s a backend-specific option, but conversely it avoids complicating the rule “unless a field is marked as dynamic, what’s in the [project] table will be unchanged in all built distribution files” with special cases and exceptions.

So while there may be some value in the proposal, the status quo frankly isn’t particularly bad.

In either case, the behaviour of the sdist metadata also needs to be considered, as I mentioned earlier. If the backend can guarantee that it will always add the same data, the sdist metadata can be static, but (far more likely) if the backend wants to reserve the right to vary the dependencies at wheel build time, it just marks the dependencies as dynamic and we’re good. This does impact performance of resolution algorithms, which will need to invoke the build backend to determine the dependencies for the sdist. How big an issue this would be depends on whether this feature will be used in projects that need to be installed directly from source.

This is a bit too vague as it stands. For example, which of these counts as a further constraint of numpy?

  • numpy >= 2.0a1 - this allows pre-releases, where the original spec didn’t.
  • numpy; sys_platform == "win32" - this removes numpy as a dependency on non-Windows platforms.
  • numpy @ https://some.arbitrary.host/any/file/at/all/numpy.zip - could be anything, as long as it pretends to be numpy.
  • numpy < 2.0, > 3.0 - matches nothing, so is always going to fail.

IMO, getting the semantics right (i.e., both precise and useful) will be tricky, and a potential source of problems in future. At the very least, I’d like to see evidence of a build backend that has successfully implemented this proposal via the tool namespace before we consider standardising it.
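
To make the first bullet concrete, here’s what the packaging library does with it (a quick illustrative check, not part of the original proposal):

from packaging.specifiers import SpecifierSet

bare = SpecifierSet("")          # what a plain "numpy" entry allows
narrowed = SpecifierSet(">=2.0a1")

print("2.0a1" in bare)      # False: pre-releases are excluded by default
print("2.0a1" in narrowed)  # True: the pre-release operand opts them in

So numpy >= 2.0a1 admits versions that a plain numpy entry rejects, i.e. it is not purely a further constraint.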

2 Likes

There are two bits of information that I need from a package as a resolver and installer author. As a resolver author, we need to know what the dependencies of this package are in general (the range that a source distribution can support), and this information should be static [1]. As an installer author, there needs to be a way to tell which further constraints apply to a built wheel, so we can tell whether to use a cached wheel or to rebuild. While pip is more lenient on the resolver part (no universal locking), from my understanding of pip’s caching, the installer part applies to pip as well.


  1. Having to build in the resolver often causes trouble, from major performance regressions to building versions that fail to build but would have been rejected anyway, especially when building a series of source distributions that we’re backtracking through. It also causes problems when we have, for example, already built one version of torch and another version eventually gets selected ↩︎

1 Like

pip doesn’t care about dependency data as recorded in pyproject.toml. We only care about metadata in built artifacts (sdists and wheels). Having static metadata in sdists saves us an extremely costly build step that might only be needed for resolving, for multiple sdists that ultimately don’t get built as wheels or installed. Full disclosure - I’m not sure pip includes the optimisation to read sdist metadata yet (it’s quite new, and not yet exposed via the PyPI API), but I’m sure we will at some point.

We could optimise builds from source trees by reading pyproject.toml, but we don’t right now (and I’m not sure we have any plans to do so). But if we did want to, this proposal would stop us from being able to include such an optimisation, as dependency data that’s not marked as “dynamic” could no longer be relied on to be static.

I was thinking about a slightly different case here: Let’s say we build a source dist foo 1.2.3 into a wheel once and get METADATA with Requires-Dist: numpy>=1,<2. It could now be that we’re in a further constraints situation where the source distribution can be built against numpy>=1,<3, and the wheel becomes either Requires-Dist: numpy>=1,<2 or Requires-Dist: numpy>=2,<3 depending on the build environment (that’s at least how I understood this request in general - please correct me if I’m wrong). In this case, an installer that performs sdist-to-wheel build caching would need to know that the wheel it built for foo is only applicable when installing with numpy 1.x, and it needs to do a separate build for installing into a numpy 2.x venv. (Any design that solves this will likely touch pyproject.toml and PKG-INFO too, but the problem itself should happen without either of those two).
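
To sketch what I mean, a reuse check along these lines (illustrative only, not pip’s or uv’s actual caching logic):

from packaging.requirements import Requirement

def cached_wheel_usable(requires_dist: list[str],
                        installed: dict[str, str]) -> bool:
    # Reuse the cached wheel only if its (possibly narrowed) Requires-Dist
    # entries are still satisfiable against the target environment.
    # Markers and extras are ignored here to keep the sketch short.
    for entry in requires_dist:
        req = Requirement(entry)
        have = installed.get(req.name)
        if have is not None and have not in req.specifier:
            return False  # e.g. wheel pinned to numpy 1.x, venv has 2.x
    return True

With the example above, cached_wheel_usable(["numpy>=1,<2"], {"numpy": "2.1.0"}) is False, so the installer knows to rebuild foo for the numpy 2.x environment.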

I’m intentionally vague here about how or where this information should be as I don’t want to influence the specific design, I just want to share what I see as design constraints from an implementor’s perspective.

2 Likes

If you’re doing that, you should be reading the dependency metadata from the sdist and checking if it’s static. If it is, you can assume the wheel will have the same metadata. If it isn’t, then all bets are off and the wheel could have anything in it. There’s nothing in this proposal that adds a new “static but can be restricted” status to sdist metadata, so the proposal changes nothing in that regard.

It’s possible you’re using the pyproject.toml to determine the possible values of the wheel metadata. If you are, then this proposal is a clear degradation for you - at the moment, static dependency metadata is reliably going to be exactly the same in all wheels. With this proposal that will no longer be true.

So I don’t know how this proposal will ever be of benefit to you.

Could this be addressed by introducing a new sibling keyword to dynamic that means “the build backend may append to the static list of dependencies”?

This is arguably inelegant, but the narrowest version I can imagine would be

[project]
dependencies = ["numpy"]
dynamic_dependencies = "extend"

[tool.mybackend.extend_dependencies]
bigEndian = ["numpy>1"]
littleEndian = ["numpy<=1"]

That preserves the clear signal that the data is dynamic, but allows some of it to be discovered by external tools which only see pyproject.toml.

(I intentionally phrased it as “extend” to eliminate the question of what it means to further constrain dependencies.)

I’m not sure I understand the benefit of partially static dependency data over fully dynamic data though. I can maybe imagine a little, but because I don’t quite know what’s wanted, I can’t compare the proposal well against alternatives like the following strawman…

Invert the Problem

Does it work to take the opposite approach, and instead of declaring some portion of otherwise-static data as dynamic, declare some portion of otherwise-dynamic data as static?

Consider the following pyproject fragment:

[project]
dynamic = ["dependencies"]
dynamic-guarantees.dependencies = { includes = ["numpy"] }

That is, declare dependencies to be dynamic, and add a new table for guarantees which the build backend can be asked to provide. If the backend finds that built wheel metadata does not match the guarantees, it MUST fail the build.
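
A sketch of that check (invented names, not a real backend API):

from packaging.requirements import Requirement

def verify_guarantees(requires_dist: list[str], includes: list[str]) -> None:
    # Fail the build if a guaranteed name is absent from the built
    # wheel's Requires-Dist.
    names = {Requirement(entry).name for entry in requires_dist}
    missing = [name for name in includes if name not in names]
    if missing:
        raise RuntimeError(f"dynamic-guarantees violated: missing {missing}")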

Does this solve the same problem? Or has inverting the static/dynamic relationship lost something important?

1 Like

I think the main benefit is for when the dynamic part is entirely calculated by the backend. For example, starting with numpy as a dependency, and then when the wheel is compiled it gets converted to numpy >= <version used to build the wheel>.

The developer doesn’t actually specify the dynamic transform, other than to say “please pin these/all dependencies at build time”, and so from their POV it’s practically static, and they should get to specify it in the same place as everyone else gets to specify dependencies.
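
A sketch of that build-time transform (illustrative, not any particular backend’s implementation):

from importlib.metadata import version

def pin_at_build_time(names: list[str]) -> list[str]:
    # Narrow each static entry to at least the version present in the
    # build environment, e.g. "numpy" -> "numpy >= 2.1.0".
    return [f"{name} >= {version(name)}" for name in names]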

For something more directly chosen, such as the bigEndian/littleEndian example you gave, it’s less surprising that the dependencies are being specified somewhere else (in a tool table). But then, if you’ve got 19 dependencies that never change and 1 that does, why shouldn’t you get to specify that 19 are static and there will also be more [constraints] added at build time?

1 Like

But then, if you’ve got 19 dependencies that never change and 1 that does, why shouldn’t you get to specify that 19 are static and there will also be more [constraints] added at build time?

I’m still confused: which tools or use cases is this intended to benefit?

Installers and resolvers need to know all the dependencies; for which tools or use cases is it good enough to know only some of them?

And for those tools and use cases, are they requesting this, or is this being proposed in the hope that they would adopt it?

1 Like

The only example that’s been given so far is GitHub’s repository dependency graph, which apparently reads [project.dependencies]. I guess the “only allowed to constrain existing entries” restriction in the proposal means that partial information is good enough for that usage (as they likely only look at the project name, not the version limits).

I don’t personally find that example particularly compelling, though. And for that matter, I’m still waiting for a well-defined specification of what “constraining existing entries” would actually cover, so some of this is speculation.

1 Like

Hi there!

I’m definitely not qualified to weigh in with an opinion on this, but I wanted to mention a problem that @ngoldbaum and I faced.

We basically had an issue where cffi was specified in the build-system.requires section, but cffi doesn’t yet support free-threaded builds.
The package itself could be built just fine without cffi, and with very few modifications could support free-threading.
Nathan then solved the problem by adding some logic in the project’s setup.py.

For a project that relies solely on a pyproject.toml file, there’s currently no way to specify that some dependencies should be excluded when building against a free-threaded build, or likewise that some additional dependencies are required.

I understand you specified that the additional entries should constrain further and not arbitrarily change the dependencies. What would be the blocker for more dynamic requirements such as in this case?

FWIW, there’s a proposed PEP that would also fix this specific issue: PEP 780 – ABI features as environment markers | peps.python.org

2 Likes

Ah yes, good point! But I think maybe we should specialize it with a custom core metadata key, so that it doesn’t break the existing definitions of Requires-Dist and Dynamic, perhaps Requires-Dist-Dynamic.

That said, I am not sure it is worth the increase in complexity.

The existing definition ensures that a dynamic Requires-Dist is not specified statically at all, so the change I proposed wouldn’t have any impact on existing uses. They would continue to start empty and have items appended at wheel build time.

That specifier changes the constraint; it doesn’t strictly further it. We can define the meaning of “FURTHER” in the PEP: the set of versions allowed by the new spec must be a subset of those allowed by the original one.
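
A naive illustration of that rule (hypothetical helper; a complete subset test over all possible versions, markers included, is considerably harder):

from packaging.specifiers import SpecifierSet

def further_constrains(original: str, new: str, candidates: list[str]) -> bool:
    # Over a finite candidate list: the new spec may not admit any
    # version the original rejects.
    old_set, new_set = SpecifierSet(original), SpecifierSet(new)
    return all(v in old_set for v in candidates if v in new_set)

For example, further_constrains(">=1", ">=1,<2", ["1.0", "2.0"]) is True, while further_constrains(">=2", ">=1", ["1.0"]) is False.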

This is why in my example I recommended keeping the original spec, and adding a new one to extend it.

As a spec, it also changes the constraint: you’d only know what it resolves to after installing. You could specify that on top of numpy >= 2, for example, to get around it. Though resolvers/installers might need special handling to be able to resolve such cases.

That is a good point, though; it might be worth mentioning in the PEP.

Yeah, so there’s no way to further constrain it.

Sure. This is something we have discussed on meson-python, but I am not sure what the status is there. @rgommers would know better.

1 Like