Brainstorming: Eliminating Dynamic Metadata

I’m spinning this out of PEP 751: now with graphs! to avoid derailing the topic there too much.

Today the Python packaging ecosystem supports dynamic metadata. In theory it can be used with almost any key in pyproject.toml, but it most commonly causes problems with version, where it is also most commonly used. The reasons people like to use dynamic metadata are not too surprising, and on the surface it sounds really convenient!

Unfortunately, having metadata be dynamic causes challenges, and they are a pretty significant tax on the ecosystem. So in some sense it would be nice to be able to do the same thing, but without the metadata actually being dynamic.

So I was thinking we could have a discussion here about why dynamic metadata is used, and what the alternatives would be that are not actually dynamic, or at least not quite as dynamic.


Challenges:

  • Caching: caching of dynamic metadata is hard, because it’s not known what can invalidate it.

  • Build dependencies needed: in order to retrieve dynamic metadata you first need to install the dependencies necessary to run the code that emits that metadata. This can be a rather expensive step which can dramatically slow down package installation. In particular it punishes systems like uv that want to automatically re-sync the virtualenv constantly. The time spent resolving metadata dynamically for a single package can be slower than all other operations[1].

  • Unstable metadata: dynamic metadata often takes external state such as git commits or general git status into account, which means the metadata can change under the hood without the installer or resolver knowing about it. Yet that metadata is in fact frozen upon installation. This is particularly odd for editable installs, where you might have installed the package at version 0.0.1+deadbeef, but you are already multiple commits away from that. In the past that often meant that entrypoints no longer found the package, and you had to run pip install --editable again.
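To make the caching challenge concrete, here is a minimal sketch (all names hypothetical) of the kind of cache key an installer might compute for a source tree. Hashing pyproject.toml and the source files catches edits, but it cannot capture external state like a new git commit or tag, which is exactly why dynamic metadata is so hard to invalidate correctly:

```python
import hashlib
from pathlib import Path

def source_tree_cache_key(root: str) -> str:
    """Hash pyproject.toml plus all Python sources under a source tree.

    This invalidates the cache on file edits, but NOT when external
    state (e.g. a new git commit or tag) changes -- the gap that makes
    caching dynamic metadata unreliable.
    """
    h = hashlib.sha256()
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        if path.suffix in {".py", ".toml"} and path.is_file():
            # Include the relative path so renames also change the key.
            h.update(str(path.relative_to(root_path)).encode())
            h.update(path.read_bytes())
    return h.hexdigest()
```

A key like this is sufficient for fully static metadata, but for dynamic metadata the installer simply cannot know what else should feed into the hash.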


Uses of Dynamic Metadata:

  • version: pretty commonly people try to make version match the latest git tag, the current git revision hash, or similar. Part of the motivation here is also to sync up __version__ in a package with the installed metadata.

  • dependencies (and other keys with similar functionality): a motivating example for this I have seen is to make the attribute match a provided requirements.txt file. Sometimes it’s also just set as dynamic because for legacy reasons the requirements are still set in setup.py.

  • readme: this surprised me, but there is a package, hatch-fancy-pypi-readme, with some adoption, which apparently re-composes the readme from other inputs.

  • scripts, gui-scripts, entry-points: these are typically set when used with setuptools (eg: when setup.py is still in use)
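To illustrate the version case above: tools in the setuptools_scm family derive a PEP 440 version from `git describe` output. A minimal sketch of that kind of derivation (not the actual setuptools_scm algorithm, whose version scheme is configurable):

```python
import re

def version_from_describe(describe: str) -> str:
    """Turn `git describe --tags` output into a PEP 440-style version.

    "v1.2.3"            -> "1.2.3"                (exactly on a tag)
    "v1.2.3-4-gdeadbee" -> "1.2.3.dev4+gdeadbee"  (4 commits past the tag)
    """
    m = re.fullmatch(
        r"v?(?P<tag>[\d.]+)(?:-(?P<n>\d+)-g(?P<sha>[0-9a-f]+))?", describe
    )
    if m is None:
        raise ValueError(f"unrecognized describe output: {describe!r}")
    if m["n"] is None:
        return m["tag"]
    # Local version segment carries the commit hash, per PEP 440.
    return f"{m['tag']}.dev{m['n']}+g{m['sha']}"
```

The parsing itself is trivial; the trouble is that the input comes from repository state outside any file an installer could inspect or hash.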


Complicating factors:

  • non-installed packages: I don’t know how much of a problem this still is today, but one of the reasons people cannot use the common pattern __version__ = importlib.metadata.version("MyDist") is that they support importing a package without (editable or otherwise) installation.

When dynamic metadata is not an issue:

  • As more and more packages are published as wheels, dynamic metadata becomes less of an issue as the metadata is frozen within the wheel. It’s thus more of an issue for editable installs and sdist installations.

I intentionally don’t want to prime this topic with proposed solutions or alternatives and see if we can do some basic brainstorming here of what a world would look like that does not have dynamic metadata, or greatly cuts down on how much dynamic metadata is permitted.


  1. Motivating example: with the setuptools backend you can point your version at an attribute like package.__version__. Importing that package in some large applications can take seconds, if the root package itself decides to import parts of the system. Sometimes this involves shelling out to git or other tools to read version information. ↩︎
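For reference, this is roughly what the attribute-based setup looks like in pyproject.toml with setuptools (the package name is illustrative):

```toml
[project]
name = "my-package"
dynamic = ["version"]

[tool.setuptools.dynamic]
# setuptools imports my_package at build time just to read __version__,
# which is what can trigger the slow imports described in the footnote.
version = {attr = "my_package.__version__"}
```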

18 Likes

Wheels have static metadata, sdists (currently) allow metadata to be generated at build time.

If future sdists mandate static metadata, then there will simply emerge a proto-format that gets prebuilt into static sdists. The need for dynamic metadata doesn’t go away simply by declaring it inconvenient.

11 Likes

Focusing narrowly on version info, the motivating use case which was provided was pushing each commit to TestPyPI.

This seems satisfiable with some kind of “version smudge script” which could easily be packaged up for reuse.

That leaves the question of what the un-smudged version should be, since it presumably doesn’t matter for most development purposes.

The problem seems solvable today, with existing tools, but I have the vague sense that it’s missing some of the desired workflow characteristics of SCM-derived versions.

^ This use-case is supported by tools like bump2version, but they’re definitely a little clunkier than using a dynamic value. Basically it just does a find/replace on your code base, based on a config file. Not something I’d use but I’ve encountered it.

I’ll throw in another motivation for a dynamic version, which isn’t common but does come up: for a Rust extension package using maturin, the version is defined in Cargo.toml and the Python extension gets it from there[1]. In this case it’s not all that dynamic but it’s not literally in the file.


  1. I nudged them to declare this in the pyproject.toml so it followed the spec ↩︎

Sdists can have static metadata, and as build backends improve their support, it will become much easier for data to be static unless there’s a specific need for it to be dynamic. The biggest blocker here is with setuptools, where the ability to write plugins means that even if data looks static, it’s hard for the backend to guarantee that it is. But that’s a problem that should get fixed over time - the intention is clearly there to use static data whenever possible.

Assuming that’s the case, we’re left with a situation where metadata is dynamic because the user genuinely needs it to be dynamic. We can’t just wish those cases away, or declare them invalid. We have to understand why the need is there, and find a way of addressing the need using only static data. This may be possible, but it will need proper, focused research and outreach to projects that currently use dynamic metadata. A discussion here is unlikely to come to any usable conclusions without that sort of research.

I think it’s extremely important to remember here that the project version is not allowed to be dynamic in a sdist. Everyone makes a big deal about setuptools_scm and similar tools for getting the version from VCS, but they simply aren’t an issue for sdists.

The problem with versions only occurs when building from arbitrary source trees. This is a problem, certainly, but it’s arguably more of an issue for project workflow tools than for packaging tools. I’d be more than happy, for example, for someone to publish an installer that only supported installing from formally packaged projects (sdists and wheels).

I think it’s critical that we keep in mind the scenarios we’re concerned about here. Managing arbitrary source trees is a significant concern for workflow tools like uv, poetry, hatch or PDM. But there are plenty of other problems there as well - environment management, locking, editable installs, etc. However, there are plenty of other packaging tools which are not workflow managers, and which don’t have the same issues to deal with.

It’s worth remembering that almost nothing about source trees (as opposed to sdists) is standardised. A source tree can have its own in-tree build backend, that completely ignores pyproject.toml and generates all of the project metadata in completely custom ways (which may or may not be “dynamic” in any sense that might matter here). If we want to try to impose constraints on source trees, we can do so of course, but we should understand that (a) we’re starting from nothing, and (b) we’d have to make rules about what build backends are allowed to do, most of which setuptools will almost certainly violate…

10 Likes

If we want to try to impose constraints on source trees, we can do so of course, but we should understand that (a) we’re starting from nothing, and (b) we’d have to make rules about what build backends are allowed to do, most of which setuptools will almost certainly violate…

And (c) it’s probably going to get ignored anyway since there’s not a common community system these source trees get uploaded into and distributed through (unlike sdists and wheels), so the opportunities for gatekeeping are comparatively limited.

4 Likes

And the goal here is not really to come to conclusions, but to better understand the use cases, and also to better share the general concerns around dynamic metadata, and why it’s worth thinking about alternatives. Think of it as widening the Overton window on that issue :slight_smile:

The list that I presented in the initial post is absolutely not exhaustive for a start.

Things that are not allowed are still possible. You can absolutely today lie about the version that is declared on pypi for the sdist and have a different version be installed after the build. The version that is ultimately passed to setup() is the version that will end up in the final metadata. You can easily validate this today by downloading a setup.py based sdist, and change the version passed to setup() without changing any of the package info. pip will install the sdist, and register it under the new version:

$ pip freeze
boto @ file:///tmp/whl/boto-2.49.0.tar.gz#sha256=c66482f39abf7252044da0a2b93c2de2ac1784ab0ec611540e619c7de8715f35
$ pip list
Package    Version
---------- --------------
boto       2.49.0+changed
pip        24.3.1
setuptools 75.3.0

How often sdists do something dodgy with a version I do not know, but I think focusing on just the source tree builds is a bit of a distraction. For sure, lying about the version is not enforced, and it would be hard to enforce; that’s just not how any build process works today as far as I know.

I think specific workflow tools can enforce/encourage particular conventions. That’s how other language ecosystems work (cargo for rust, npm for javascript, …) But I think you’re right, people would likely just ignore any “standards” proposed more globally than that.

Sure, you can ignore any standard you like. But tools will not work as expected if you violate their assumptions (that you’re behaving according to standards).

I thought that if pip builds a sdist and gets a different version in the resulting metadata than it found from the sdist filename, it would fail with an error. I’d happily call the behaviour you demonstrated a bug, and accept a PR to fix it. Certainly if that hacked sdist gets passed to the resolver, it’ll cause no end of problems. See pip/src/pip/_internal/resolution/resolvelib/candidates.py at main · pypa/pip · GitHub
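The check being described amounts to something like the following sketch (simplified for illustration, not pip’s actual code: real sdist filenames also need name normalization per the packaging specs):

```python
def sdist_version(filename: str) -> str:
    """Extract the version from an sdist filename like 'boto-2.49.0.tar.gz'."""
    stem = filename.removesuffix(".tar.gz").removesuffix(".zip")
    # The component after the last hyphen in the stem is the version.
    _, _, version = stem.rpartition("-")
    return version

def check_built_version(sdist_filename: str, built_version: str) -> None:
    """Fail if the build produced a version other than the sdist claimed."""
    expected = sdist_version(sdist_filename)
    if built_version != expected:
        raise RuntimeError(
            f"sdist claimed version {expected} "
            f"but the build produced {built_version}"
        )
```

With a check like this, the hacked boto sdist above would be rejected as soon as the build produced 2.49.0+changed instead of 2.49.0.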

2 Likes

And the goal here is not really to come to conclusions, but to better understand the use cases, and also to better share the general concerns around dynamic metadata, and why it’s worth thinking about alternatives. Think of it as widening the Overton window on that issue :slight_smile:

The primary use case I have is this: I work on projects which experience a lot of parallel development and change approval. There are specific people who get to decide the point at which a release of a project is created, and what particular state in the version control system represents that release as well as what the associated version number for that release should be. Versioning decisions are separate from the file tree within the VCS, and are applied with tags that result in release artifact creation from the VCS file tree states the tags indicate.

Basically, any solution that decrees version numbers should be stored in the file tree of the VCS for these projects is a non-starter. For that matter, any Python packaging standards which mandate specific content in those projects’ VCSes seems overreaching and likely to be ignored entirely.

5 Likes

So you do not store any version information in your pyproject.toml or anywhere else? It’s all derived from dynamic discovery from the current repository checkout? How would you deal with that problem if you were using npm or something similar which does not permit dynamic versioning?

I would naively have assumed that a workflow-tool build --version=1.2.3 which automatically freezes the right version into pyproject.toml and the manifest would be a suitable solution for most version needs for publishing.
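Such a hypothetical `workflow-tool build --version=1.2.3` could do little more than rewrite the version field before invoking the build. A naive sketch (a real tool would use a TOML-aware editor rather than a regex, and would also handle a `dynamic = ["version"]` declaration):

```python
import re

def freeze_version(pyproject_text: str, version: str) -> str:
    """Replace the [project] version field with an explicit value.

    Naive: assumes a simple `version = "..."` line already exists;
    a real tool would parse and re-emit the TOML properly.
    """
    return re.sub(
        r'^version\s*=\s*".*?"',
        f'version = "{version}"',
        pyproject_text,
        count=1,
        flags=re.MULTILINE,
    )
```

The rewritten pyproject.toml would then only exist in the build’s working copy, not in version control.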

So you do not store any version information in your pyproject.toml or anywhere else? It’s all derived from dynamic discovery from the current repository checkout?

Correct.

How would you deal with that problem if you were using npm or something similar which does not permit dynamic versioning?

A release file tree is generated from the VCS file tree with appropriate edits applied, and that is archived (e.g. as an OpenPGP-signed tarball) and published to the usual locations.

I would naively have assumed that a workflow-tool build --version=1.2.3 which automatically freezes the right version into pyproject.toml and the manifest would be a suitable solution for most version needs for publishing.

Sure, that could work, but the resulting pyproject.toml wouldn’t be committed to the VCS, only used ephemerally.

2 Likes

Just wanted to post a real-world issue: because PyPI allows uploading wheels for the same version of a package that have different dependencies, this causes lots of problems for package managers like Poetry that use the JSON API and assume that all dependencies for a given version of a package are the same.

So this problem can even affect users who don’t consume sdists.

I think uv and PDM make a similar assumption and would struggle with this, but I haven’t tested this particular issue with those tools.

An example of a popular package that still suffers from this issue is Open3D (11k GitHub stars)

As a user of Python packaging, it would be nice if this was no longer a possible failure mode.

3 Likes

Just wanted to point out that this is entirely cacheable, and is a one-off cost. The fact that there are relatively few build backends out there also means that the cache is likely to be shareable between many packages.

To be blunt, that’s a bug in those package managers as the standards explicitly allow this behaviour. The last time I attempted to suggest that this assumption be made into a rule, there was serious pushback, at least in part from the scientific community where I got the impression varying metadata is common because not all reasons for needing different dependencies could be expressed via markers.

I understand the reasons for this assumption, and it’s an expedient and practical simplification. But it doesn’t always hold, as you have found out.

8 Likes

In this situation I would consider that the problem is incorrectly recording the metadata as static on installation. By definition version should be “dynamic” for an editable install since the whole point is that you are going to switch around between different versions without reinstalling the package.

3 Likes

There is no real alternative to recording the metadata statically: otherwise, how do you make the metadata available to the importlib.metadata infrastructure? The only alternative would be that when you call importlib.metadata.version("Foo"), it locates the setup.py/pyproject.toml and invokes the build system then and there in a new virtualenv to generate the version on the spot.

You can’t. This is a good indication that this metadata shouldn’t be available for editable installs, or at least that there should be a marker that it is an editable install and that any metadata could have changed since recorded.

2 Likes

Currently there is no way to record the metadata for an editable install as dynamic, but we are discussing how to change things so that everything works better, and the premise of this thread is that dynamic metadata should be eliminated. My point is that in the case of an editable install the version is fundamentally dynamic and we can’t just wish that away: this is literally the purpose of an editable install.

The problem you highlight is that tools don’t support dynamic metadata for editable installs and then incorrectly treat the metadata as static after installation. There is no proper solution to this problem that somehow eliminates the need for dynamic metadata.

Instead of “Eliminating Dynamic Metadata” (removing support for dynamic metadata now, would be a huge breaking change) does anyone have suggestions for “Improving support for static metadata”?

1 Like