Partially dynamic project metadata proposal (pre-PEP)

In addition to these concerns (which I agree are valid), I’m concerned that this ignores the wider point: the proposal isn’t limited to requirements. We also need to consider how all other metadata should be combined, and whether special rules are needed elsewhere. And unless we want to leave a significant problem for future PEP authors, we need a reasonable definition of default behaviour that will work for any future metadata that may get added.

For example, what are the combination rules for license files and license expressions? Does PEP 725 need to add a new section discussing combination rules for external dependency metadata? How will author and maintainer values get merged/normalised (for example, if two author names have the same email address, or vice versa)?

If we go beyond a simple “the build backend may concatenate extra values onto the end of any multiple-use field in the core metadata”, I feel that we’re opening up a huge can of worms. And conversely, if simple concatenation isn’t sufficient, maybe this proposal needs reworking to focus more on specific use cases, and not try to be too general.

2 Likes

Or will we say to a user who gets an order that doesn’t work for them that they have to switch backends?

different backends will produce different requirements

Dynamic metadata is not something that is transferable between backends. You can’t move a project with a dynamic field between backends today, and you won’t be able to after this PEP. I’m working on a way to allow users to implement dynamic metadata that works on multiple backends, but that is not in this PEP.

I think it’s important to note the backend can be the user, not just “some tool” we have to control. Hatchling, scikit-build-core, and setuptools (I think PDM too) all allow users to implement dynamic metadata themselves. If we take functionality away from the users, we should have good reason to. PEP 517 makes it clear that the limitations are there to benefit static tooling, not to apply arbitrary constraints on backends to try to control them.

The most extreme version would be to simply allow anything if dynamic is specified. This gives users full power, but it makes static analysis useless. The goal of the PEP is to balance static analysis and user control.

Version a) guarantees that the static portion is present. If you see "torch" in the static dependencies, you can be sure that it is in the dependencies. This lets static analysis detect it, and gives the dynamic portion (which can literally be Python code implemented by the package author!) flexibility in determining how to present this in METADATA.

Version c) makes the arbitrary choice of append-only, which means that if ["torch", "torch>=1.12"] doesn’t actually work for you, you are forced to stop listing "torch" statically (the very thing that lets tooling tell you depend on torch) and go fully dynamic, as you have to today. A user or tool would not be allowed to do anything different even if that would fix a bug. I fail to see why a static tool would significantly benefit from knowing that "torch" comes first but can be followed by anything at all.
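A rough sketch of how I read the two options, using the torch example (illustrative Python lists only, not wording from the PEP draft):

```python
static = ["torch"]  # the static portion from pyproject.toml

# Option a): every static item must appear in the final metadata; beyond that,
# the backend (which may be user-implemented code) decides what to emit.
ok_a  = ["torch", "torch>=1.12"]  # fine: "torch" is present
ok_a2 = ["numpy", "torch"]        # also fine: extra items, at any position
bad_a = ["torch>=1.12"]           # not fine: the static "torch" has been dropped

# Option c): the result must be the static list with items appended at the end.
ok_c  = ["torch", "torch>=1.12"]  # the only permitted way to add a constraint
bad_c = ["numpy", "torch"]        # not fine: nothing may precede the static items
```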

My preferred mechanism would be this:

  • List items must be present and in the original static order; static items cannot be removed.
  • Backends are allowed to insert items (at any position).
  • As a special case for dependencies, backends are allowed to tighten constraints on an existing item.

That would best support one of the main use cases (constraining an existing dependency), and be general enough that special rules aren’t needed per field. (The dependency case actually spans two fields, dependencies and optional-dependencies.) It also doesn’t add a requirement that static fields come first, which I don’t see any benefit in.

Deduplication of static metadata is not allowed; within the limits above, it is up to the backend. You can’t remove an existing static item, but you can always choose not to add a duplicate item (that’s true even in the strictest version of the proposal, c)).
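As a sketch, the first two bullets amount to a subsequence check (illustrative, not normative; the dependency-specific third bullet isn’t modelled here):

```python
def respects_static(static, merged):
    """True if every static item appears in merged, in its original relative order."""
    it = iter(merged)
    return all(item in it for item in static)  # subsequence check

static = ["torch", "pandas"]

# String-level sketch; a real check would compare parsed requirement names.
respects_static(static, ["numpy", "torch", "pandas"])             # True: insertion anywhere
respects_static(static, ["torch", "scipy", "pandas", "torch<2"])  # True
respects_static(static, ["pandas", "torch"])                      # False: order changed
respects_static(static, ["torch<2", "pandas"])                    # False: "torch" removed
```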

Though if we really think c) is needed, I’m fine with that too - the folks requesting pinning might just need to verify they’d be okay with it: given the concerns above, it looks like you could introduce unfixable bugs in resolution if you don’t give the backend (== user) any flexibility in fixing them.

which I have a PR to fix

I’ll try to review that soon! If the main problem is bugs and specification issues, maybe we should fix those.

1 Like

Because a tool may choose to treat that order as significant; e.g. it may choose to download and resolve torch’s dependencies. And because of the nature of dependency resolution, if a tool does this, you could end up installing a different version of torch based on the order in which it was provided in the dependencies.

I don’t understand how the last two bullet points are consistent with the first one: if you take static ["torch", "pandas"] and dynamic ["torch<2"] and transform them into ["torch<2", "pandas"], you have both changed the order (a torch requirement was first and last in the combined list, and is now just first) and removed a static item ("torch").

I don’t know what you mean by “tighten” in the context of PEP 440-style version specifiers; I don’t believe it’s logically possible without further standards.

I don’t understand what you mean by “deduplication is not allowed” while “tightening” is; I don’t see how “tightening” doesn’t imply “deduplication”. If the static part is ["torch"] and the dynamic part is ["torch"], then a result of ["torch"] is what I mean by deduplicated.

I’m saying that should not be allowed as it will result in underspecified scenarios.

It’s “a” problem, but it’s not the “main” problem. The main problems are:

  1. PEP 440-style version specifiers do not preserve ordinality, are not transitive, and are “weird” when filtering a mix of final and pre-releases; therefore, applying standard mathematical transformations to “simplify” or “tighten” them may be logically incorrect.
  2. Tools, and in particular dependency resolvers, may use the order of requirements, the number of requirements, and the type of specifiers to make choices about how to resolve those dependencies.

Here’s a concrete example of point 1:

  • The static part is ["torch<2.0b2"]
  • The dynamic part is ["torch<2"]

Under the rules for handling pre-releases, <2.0b2 implies all pre-releases; <2 does not imply pre-releases (though they are allowed if no final or post version meets the requirement), and it excludes pre-releases of the same specified version.

So what requirement is the result of “restricting” or “tightening” or “simplifying”? Here are some possibilities and their problems:

  • torch<2: This is tighter than torch<2.0b2 (even though 2.0 is greater than 2.0b2), but it is not equivalent to the pair of requirements torch<2.0b2, torch<2, which implies all pre-releases. For example, both "1.0" and "1.0b1" would be selected by torch<2.0b2, torch<2, but torch<2 alone would not select "1.0b1". So you have created a semantically different specifier by “tightening”.

  • torch<2.0b2: This is looser than torch<2, because it allows versions like torch 2.0b1, which torch<2 explicitly excludes (not because it’s a pre-release in general, but because <2 excludes pre-releases of the same specified version).

  • torch<2.0dev0: This has the inverse problem to torch<2. It is equivalent to torch<2.0b2, torch<2, because it implies all pre-releases and does not allow pre-releases of the specified version; but it is looser than torch<2, because it would select both "1.0" and "1.0b1", while torch<2 alone would select only "1.0".

These are not all the issues with PEP 440-style version specifiers; if I’d picked different version numbers, different specifiers, or included post-releases, there would be additional nuances to work through.
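For what it’s worth, this is easy to check with the packaging library (a sketch only; as noted in the follow-up below, the multi-specifier pre-release behaviour is arguably an implementation detail of packaging/pip rather than something the spec pins down):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

combined = SpecifierSet("<2.0b2,<2")  # the static + dynamic requirements together
tightened = SpecifierSet("<2")        # the "tightened" candidate

for v in ("1.0", "1.0b1", "2.0b1"):
    print(v, Version(v) in combined, Version(v) in tightened)
# 1.0   -> True  True
# 1.0b1 -> True  False   (the combined set implies pre-releases; "<2" alone does not)
# 2.0b1 -> False False   ("<2" excludes pre-releases of the version it names)

# And "<2.0dev0" behaves like the combined set for these versions:
print(Version("1.0b1") in SpecifierSet("<2.0dev0"))  # True
```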

So unless there is a very rigorous definition of what “tighten” or “restrict” or “simplify” means, it will be left up to the interpretation of the implementor, and I don’t see how any two implementations are likely to produce the same result.

2 Likes

Small note: I’ve stated here that torch<2.0b2, torch<2 implies all pre-releases, but thinking on it, this is not well defined in the spec, as the spec makes no attempt to handle more than a single specifier. So this might technically be an implementation detail of packaging and pip.

But this makes trying to “restrict”, “tighten”, or “simplify”, more undefined, not less.

OK, but does that mean that the user has to implement the merging/deduplication algorithms that we’re talking about? I’m now more confused, rather than less - I thought that we were talking about a spec that had the backend’s “dynamic metadata” capabilities (whether that’s user plugins or backend code) generating one set of values, and another set coming from pyproject.toml, with the PEP defining how those two sets get merged.

What we’re trying to control (or rather, specify) is how those two sets of data get combined, isn’t it?

Hang on, we’re not taking anything away here. Specifying a metadata field both in pyproject.toml and in dynamic code isn’t possible at the moment, so there’s nothing to “take away”. If you mean that in the presence of static metadata, users no longer have total freedom to define the data via the dynamic features, then that’s the point - the static data provides guarantees. And anyway, the user is just as much in charge of the static data, so trying to frame this as “us” limiting “them” is pointless at best, and silly at worst.

I think that’s the absolute minimum requirement. Any proposal that didn’t provide that wouldn’t be viable, IMO.

I’m fine with (1) and (2). I don’t like (3) because there’s no clear definition of what it means to “tighten constraints”[1]. Intuitively, it seems like it’s obvious, but there are a lot of special cases that would need to be considered, and I wouldn’t want to leave it to backends to come up with their own interpretation. Honestly, I think that what it means to “tighten” a constraint is probably worth a PEP in its own right (especially if it’s combined with other open questions around specifier algebra).

One further thought. What would all of this mean in terms of the core metadata Dynamic item (from PEP 643)? If dependencies come from pyproject.toml and from the backend, should the resulting data in the sdist be marked as Dynamic or not? Do we leave that question to the backend? Because if it’s marked as dynamic, then the wheel could legitimately omit metadata that’s defined statically in pyproject.toml, as a result of the static nature of that data getting lost in the sdist.


  1. And also because I don’t like special-casing certain fields ↩︎

1 Like

I have a question and a suggestion.

The question: Why does simplifying and deduplicating dependency metadata matter at all? If the intention is that the result is semantically identical to the original, why not just leave it in the unsimplified form? Generally, dependency metadata is for tools to read, not people, so simplifying is a minor convenience at best. And if it’s not semantically identical, surely simplifying would be a breaking change and therefore unacceptable?

The suggestion:

Let’s drop the question of dependencies for now. We should focus on a PEP that defines how partially dynamic metadata will work in the general case, without any of the special cases, exceptions and complications around dependencies. To be honest, I think there could be enough to discuss in that case alone. Then, as a “phase 2” (which can either be in the same PEP or a separate one) we can discuss how to standardise the idea of “simplifying” a specifier set. If we manage to come to a consensus on both of these, we’ve achieved the goals of this thread, but if we only get part way, at least we’ll have something concrete from it.

2 Likes

You cannot resolve dependencies without computing the dynamic portion. You need the entire specification to resolve, because the two portions can overlap: the dynamic portion could constrain a package that is not constrained in the static metadata.
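For example (hypothetical values):

```python
static_deps  = ["torch"]          # from pyproject.toml
dynamic_deps = ["numpy==1.26.*"]  # hypothetically emitted by the backend at build time

# A resolver working from static_deps alone could legally pick numpy 2.x
# somewhere in the dependency tree; only the full list rules that out.
full_deps = static_deps + dynamic_deps
```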

I’m comparing variations in the proposal, not the PEP with the status quo. When we are limiting what plugins are allowed to do, which we need to do to benefit static tooling, I think we should have reasons to do so.

I think I addressed this in the PEP draft, but it is the same as a normal dynamic field; it’s up to the backend to specify it. In the Torch example we’ve been using, the pin is only added when building the wheel, and it’s based on the version of PyTorch you built with. My dynamic-metadata work has a way for plugins to specify which fields are Dynamic.

Oh, I see what you mean - that could allow the wheel build to legally replace it. I think I need a bit more text around this in the PEP; I think a backend should not remove anything statically defined in pyproject.toml.

Let’s drop (3). I’ll reword to just go with (1) and (2), which I think are enough to ensure this supports the dependency pinning use case, and if it doesn’t, it looks like it would need a further PEP anyway. I’m convinced from the above that “tightening” is not specifiable as things stand now.

I didn’t want to design us into a corner where backends could not possibly provide the specification the user needs. However, if the static and dynamic items together can’t resolve, I’d say that’s an issue with the specification, not with dynamic metadata, so let’s leave off narrowing. (Deduplication never mattered, as the dynamic portion could simply not add an item that is already in the static portion; instead of allowing it, we can just let backends handle that if they care.)

Edit: I’ve updated the text in my branch.

I understand that, but if one build backend outputs the core metadata by restricting, tightening, or simplifying the requirements into one form, and a different build backend outputs the requirements in a different form, then suddenly you can switch from hatchling to flit-core and end up installing completely different versions of your dependencies, and completely different transitive dependencies, when you run pip install ..

1 Like

See PEP 808: Partially Dynamic Metadata by henryiii · Pull Request #4598 · python/peps · GitHub!

2 Likes

I’ve added comments on the PR. There’s one major point I think needs to be raised here - I’m strongly against the special casing of the license field. As I said earlier in the thread, I think the feature should work purely in terms of the basic data structures (tables and arrays) and not include any semantics relating to individual fields.

If extending license metadata is useful, it should be handled by a separate PEP that allows license to be an array of strings (which I assume would semantically be the same as combining the array values using the AND operator). This PEP would then automatically support that form, with no need for special casing.
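Something along these lines (purely hypothetical, just to illustrate the AND combination):

```python
# Hypothetical: `license` as an array of SPDX expressions, combined with AND.
license_parts = ["MIT", "Apache-2.0"]
combined = " AND ".join(f"({part})" for part in license_parts)
print(combined)  # (MIT) AND (Apache-2.0)
```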

1 Like