Brainstorming: Eliminating Dynamic Metadata

I think distinguishing between installing, resolving (locking is similar), and building is good to do. I also think it’s important to acknowledge that all of the non-build activities require tools to act as build frontends.
Tools have to support sdists, source trees, and VCS installs.

The degree to which dependency data can be dynamic with undeclared bounds is an issue for locking. Because locking has to support sdists, and sdists can have dependencies which are unknowable – building a wheel may only tell you about the current platform – there’s no way to construct a fully accurate lockfile which works on all platforms with confidence. I believe all lockers today are compromising in this scenario in some way.
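To make the failure mode concrete, here's a minimal sketch (package names and pins are invented) of the kind of platform-conditional dependency logic an sdist build can run:

```python
import sys

# Sketch of platform-dependent dependencies (names and pins invented for
# illustration). Building a wheel from an sdist like this on Linux tells
# a locker nothing about what the same sdist requires on Windows, which
# is why cross-platform locking from sdists involves guesswork.
def requires_dist() -> list[str]:
    if sys.platform == "win32":
        return ["colorama>=0.4"]
    return ["uvloop>=0.17"]

print(requires_dist())
```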

I don’t want to see the dynamic platform detection features go away, but I would like to see some way of making sdists and source trees more friendly to the locking scenario.

2 Likes

The obvious answer is by declaring when they are using dynamic platform detection features. But that’s already present - sdists can declare when metadata is dynamic in the PKG-INFO file (and it’s assumed static unless explicitly marked as dynamic) and source trees can do the same in pyproject.toml.
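For reference, the sdist side of that declaration is the `Dynamic` header from Metadata-Version 2.2 (PEP 643). A minimal sketch of reading it, with invented PKG-INFO contents:

```python
from email.parser import HeaderParser

# Invented PKG-INFO contents for illustration; real sdists ship this
# file at their root. Per PEP 643, any field listed under "Dynamic" may
# differ in the built wheel; every other field can be trusted as static.
pkg_info = """\
Metadata-Version: 2.2
Name: example-project
Version: 1.0.0
Dynamic: Requires-Dist
"""

metadata = HeaderParser().parsestr(pkg_info)
dynamic_fields = metadata.get_all("Dynamic") or []
print(dynamic_fields)  # ['Requires-Dist']
```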

So I’m still unclear about what more people need. I’m not saying they don’t need more than that, I simply don’t understand what the use cases are that the current standards fail to cover. I’ll be honest, the complaints I’m seeing here sound more like people just wish developers wouldn’t use the ability to have dynamic metadata, rather than that current support is lacking. And you can’t legislate away what people want - they’ll simply find more complicated ways of doing what they need to, and complain that “packaging is hard”.

1 Like

I can’t speak for anyone else’s workflows, but my assumption here has always been that lockfiles are only as portable as the metadata used to generate them. If an sdist does different things on different platforms in a way that isn’t currently expressible in metadata, the lockfile ends up platform-specific.

The actually solvable differences (specific to sdists) I see between what we have and the JS system that the blog post Armin retrospectively mentioned describes as a model are:

  • Pulling that PKG-INFO file out of the sdist so that you don’t have to download the whole tarball just to read its metadata
  • Having a per-version (rather than per-artifact) metadata file. Perhaps this file could exist when the metadata is uniform across artifacts and use some placeholder when variable dependencies are in use (again assuming that solving the issue most of the time is better than doing nothing at all)

Neither of which feels like much of a win on the bandwidth-over-complexity scale.


On an unrelated note, I wonder if this conversation would make any more sense if it was split into separate topics for sdists and source trees? I still can’t tell who’s taking issue with what at the moment.

1 Like

PEP 658 covers that. It’s not just for wheel metadata. PyPI hasn’t implemented it for sdists yet, but that’s an implementation issue.
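For what it’s worth, the mechanics from PEP 658 are simple: when the index advertises metadata availability (the `core-metadata` key in the JSON Simple API, after the PEP 714 rename), a client fetches the file’s own URL with `.metadata` appended. A sketch, using a made-up artifact URL:

```python
def core_metadata_url(artifact_url: str) -> str:
    # Per PEP 658, the core metadata for an artifact (when the index
    # serves it) lives at the artifact's own URL plus ".metadata".
    return artifact_url + ".metadata"

# Hypothetical artifact URL, for illustration only.
print(core_metadata_url("https://example.org/packages/example-1.0.tar.gz"))
# https://example.org/packages/example-1.0.tar.gz.metadata
```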

That would be a reasonable thing to propose as a standard if someone wanted to do so. My feeling is that it would be difficult to define precisely (there are a huge number of edge cases to worry about) but if it could be done, that would be fine. Like you say, such a PEP would need to establish that the benefits justify it, though.

1 Like

I don’t really believe in my own proposal of per-version metadata files (installers would still need to support the per-artifact files, so: more code, more confusion, what’s the point); I was mostly just playing spot the difference.

I’m rather conscious that this is the conversation’s 106th comment, yet I still don’t think we’ve found anything realistically doable to improve the current situation.

I’m in a similar boat; I’ve got ideas but I don’t think they’re much good and so I’ve not been raising them.

  1. Declaring the dependency boundary for a package statically, even when the true dependencies are dynamic (so that when inspecting an sdist, a locker knows what dependencies it may request).
  2. Expecting more data to be static in source trees (a pip change) so that when doing a git VCS install, pip would only do a shallow clone (deep clone would be opt-in somehow, e.g. a parameter in the fragment portion).
  3. Adding more markers and marker detection requirements.

I think (1) “works” but is a marginal benefit after a lot of maintainer effort, (2) would probably cause a ton of complaints and only improves one very specific scenario (I’m not clear if it’s just a tool change or requires a spec change?), and (3) just sounds like a lot of work for another marginal gain.
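To make idea (1) concrete, here’s a sketch of what a statically declared dependency boundary might look like. The tool table and key names are entirely hypothetical, not part of any standard; the point is only the shape of the data:

```toml
[project]
name = "example"
dynamic = ["dependencies"]

# Hypothetical key, not a real standard: an outer bound on whatever the
# dynamic dependency computation may produce, so a locker inspecting the
# sdist knows the full universe of packages it could be asked to resolve.
[tool.hypothetical-locker]
dependency-bounds = [
    "uvloop>=0.17; sys_platform != 'win32'",
    "colorama>=0.4; sys_platform == 'win32'",
]
```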


The only improvement idea mentioned thus far in the thread that I believe in and would like to see is to expand the standards to cover some of the simple “source version from a file” usages.[1] This is a common use case, not tied up in VCS specifics, which is specified in a different non-standard way for each tool.
The version in these cases is “dynamic” only in a technical sense – it must be sourced using the dynamic metadata feature of pyproject.toml – when the data is in fact static.
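For illustration, the same “version lives in the package source” intent spelled two different ways today (the package paths are invented, and a real project would use only one build backend):

```toml
[project]
name = "example"
dynamic = ["version"]

# setuptools spelling
[tool.setuptools.dynamic]
version = {attr = "mypkg.__version__"}

# hatchling spelling (shown side by side only for comparison)
[tool.hatch.version]
path = "mypkg/__init__.py"
```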

I know this idea bears the criticism that users should “just” use importlib.metadata, but I think that’s a failure to meet users where their actual needs are. They want to pull version data from an attribute in a Python file, or from a text file in the root of their project named VERSION (which gets used for other things).
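A sketch of the two patterns described above, with made-up file names; this is roughly what each backend reimplements in its own configuration dialect:

```python
import re
from pathlib import Path

def version_from_text_file(path: str = "VERSION") -> str:
    # e.g. a VERSION file at the project root containing "1.2.3\n"
    return Path(path).read_text().strip()

def version_from_attribute(path: str, attr: str = "__version__") -> str:
    # Find `__version__ = "1.2.3"` in a Python source file by regex,
    # without importing the (possibly not-yet-installed) package.
    source = Path(path).read_text()
    match = re.search(
        rf'^{re.escape(attr)}\s*=\s*["\'](?P<v>[^"\']+)["\']',
        source,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError(f"no {attr} assignment found in {path}")
    return match.group("v")
```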

This feature would not cause any drastic sea-change, but IMO that’s a good thing.
If someone wants to pursue it, we should probably move to a new thread.


  1. I’m not a big fan of the regex phrasing, but I don’t have anything better to offer at present. ↩︎

2 Likes

I’ll use this opportunity to say something about what I perceive as a tension in programming language development and communities in general; I think it might help people see where “the other side” may be coming from. So this comment will be rather meta.

In programming languages there is an inherent tension between supporting the most modern and best ideas on one hand and stability on the other. Neither approach is inherently wrong, and they are usually good, or at least chosen, in different stages of the language’s lifetime.

For a lack of better names, let’s call the ends of the spectrum “enterprise” and “bleeding edge”.

To take an example from outside Python, I think Java was very modern, cool, and fun to use at the time it was first developed. But Java has definitely targeted the “enterprise” segment that values stability and backwards compatibility. This inherently comes at the cost of not being the most modern and cool language, and often not being the language of choice for new projects. This is fine, and likely the choice the developers of Java made, if they consciously realized it’s a trade-off between those two.[1]

That’s also why many languages have managed to stay in many ways more programmer-friendly and cool: either they came later and thus had a more modern starting point; they were more willing to change and thus not become a tool of choice for enterprise uses; or they didn’t even develop to a point where there would be enterprise users wishing for stability.

Python is, in my mind, not quite at the same end of the spectrum as Java. I have at times experienced new Python releases breaking old code, often for a very good reason, like the recent distutils removal. Yes, that is painful too. I find Python often provides a nice balance. Sufficient resistance to breaking things also tends to force people with an itch to scratch to explore solutions that don’t break anything or as much, but I think it has also been demonstrated that Python is willing to break things that are considered problematic enough for the whole (like distutils).

Now, I think advocacy or resistance to (reasonably) proposed ideas can perhaps be divided into a few components that get summed for the final attitude:

  1. The “general appetite for change”, i.e. the position on that spectrum;
  2. The “I don’t want this particular change” component, which comes into play when proposed changes break things you like (technically, or even socially, like “my users would demand type annotations”) and you don’t care about the benefits;
  3. Noise (randomness) caused by communication being inherently difficult especially once a language has developed to a point where different people use it so differently that it’s hard for them to even understand each other.

What I’d love is for people to be conscious about the trade-offs they are advocating for; perhaps it’s helpful to try to think where you personally believe Python stands in this, where you would personally prefer it to stand, and why someone has a different preference.

If people want to know where I stand on the issue of this thread, so they don’t need to speculate about my agenda based on the above, here it is, but hidden because it’s an aside: [2]


  1. And then every language has some users who have been using it forever and assert that every newly developed idea the language doesn’t serve is just different for no good reason, and “kids nowadays”… ↩︎

  2. I have felt the pain caused by dynamic metadata for tools like Poetry, looked closely enough to not see those as solvable without breaking compatibility, and given my role in largely developing new code and not maintaining old stuff, would love if Python managed to transition to fully static metadata. I’m also in many other camps that don’t always align with how Python does things; I love static typing and only started to use Python for larger programs when type annotations were developed, although I’ve used it since early Python 2. It’s still the language I use the most, which I think is a testament to the process having worked quite well historically. ↩︎