Limitations of PEP 621?

I really like PEP 621. However, I think there are 2 restrictions which seem too limiting for some of our projects and could prevent adoption.

Those 2 points are complementary and can be discussed independently.

(1) A build back-end MUST raise an error if the metadata specifies the name in dynamic.

In some cases, we want the same project released under 2 different names. The main use-case for this is projects having a nightly version like:

  • tensorflow / tf-nightly
  • tensorflow_datasets / tfds-nightly
  • kubric / kubric-nightly

(2) Build back-ends MUST raise an error if the metadata specifies a field statically as well as being listed in dynamic.

I think there are valid use-cases where the info should be defined both statically and dynamically. For example:

  • Poetry automatically updates classifiers to append "License :: OSI Approved :: Apache" and "Programming Language :: Python :: 3"
  • Tools might want to update the static version with a dynamic extension (e.g. 1.0.0+{datetime.now().isoformat()}, 1.0.0+{git_hash})
  • Projects might want to dynamically extend the optional-dependencies. For example, automatically add an all = [] extra.
  • Similarly, in tensorflow_datasets, we could dynamically generate dataset-specific optional deps (tensorflow_datasets[<dataset_name>]), but still want to keep a static tensorflow_datasets[dev]
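For reference, the two rules quoted above amount to a small check on the parsed [project] table. The following is a minimal sketch, not any back-end's actual implementation; the function name `check_dynamic` and the use of `ValueError` are assumptions made for illustration.

```python
def check_dynamic(project: dict) -> None:
    """Raise ValueError if the [project] table violates the two rules above."""
    dynamic = project.get("dynamic", [])
    # Rule (1): `name` may never be listed in `dynamic`.
    if "name" in dynamic:
        raise ValueError("'name' must not be listed in 'dynamic'")
    # Rule (2): a field is either static or dynamic, never both.
    both = sorted(key for key in dynamic if key in project)
    if both:
        raise ValueError(f"fields given statically and listed in 'dynamic': {both}")


# A table that mixes a static `classifiers` value with a dynamic entry is rejected.
try:
    check_dynamic({
        "name": "etils",
        "classifiers": ["Intended Audience :: Developers"],
        "dynamic": ["classifiers", "version"],
    })
except ValueError as err:
    print(err)
```

Every use-case in the list above would trip rule (2) under this check, which is the behaviour the proposal wants to relax.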

I’m not sure if this is the right place to open this discussion, but I’m interested in what the PEP authors think.

As part of your CD to build your nightly versions you could update pyproject.toml before doing your build, thus not requiring the name be dynamic in the version of pyproject.toml checked into source control.
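Such a CD step can be quite small. Below is a minimal sketch, assuming the name is written as a plain `name = "..."` line directly inside the `[project]` table and that no array element starts a line with `[`. The helper `rename_project` is hypothetical; it edits the text rather than re-serializing the TOML so that comments and formatting are preserved.

```python
import re


def rename_project(pyproject_text: str, new_name: str) -> str:
    """Return the pyproject.toml text with the [project] name replaced."""
    out, in_project, replaced = [], False, False
    for line in pyproject_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [project] table itself
            # (and not a sub-table such as [project.urls]).
            in_project = stripped.startswith("[project]")
        elif in_project and not replaced and re.match(r"name\s*=", stripped):
            newline = "\n" if line.endswith("\n") else ""
            line = f'name = "{new_name}"{newline}'
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no name found in the [project] table")
    return "".join(out)


source = '[project]\nname = "kubric"\nversion = "1.0.0"\n'
print(rename_project(source, "kubric-nightly"))
```

The nightly job would write the result to pyproject.toml in its own checkout before invoking the build, leaving the file in source control untouched.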

You could specify the classifiers out-of-band (e.g. in a [tool] section) and then append to it as necessary.

That suggests then that your version number isn’t actually known statically and thus it is dynamic as it’s based on information provided at build time (see setuptools_scm).

Is maintaining that by hand too difficult? Or having part of your CI making sure that the all extra is up-to-date?

I would say you either validate/update that outside of the file or you leave it as dynamic.

A key part of PEP 621 is that anything which is specified statically can be directly put into PKG-INFO/METADATA w/o modification. What you’re suggesting breaks that promise. It also means that all of that metadata is at best a hint and can’t be relied on to match what will actually be used later on which is what we were explicitly trying to get away from with PEP 621.

The PEP does not prevent you from modifying the file before you use it, so you can always create tooling to do just that. But otherwise the suggestions above are not enough to suggest to me that we should change the spec to allow for amending/modifying something that is statically defined.

Thank you for your detailed answer.

A key part of PEP 621 is that anything which is specified statically can be directly put into PKG-INFO/METADATA w/o modification. What you’re suggesting breaks that promise. It also means that all of that metadata is at best a hint and can’t be relied on to match what will actually be used later on which is what we were explicitly trying to get away from with PEP 621.

I completely agree with keeping statically defined metadata.

My proposal doesn’t change this. Like today, everything not defined in dynamic = [] is static, everything defined in dynamic is provided by tools. This would be exactly the same before or after my proposal.

The user could simply give a hint/partial info of the dynamic fields.

I don’t think that’s unreasonable.

You could specify the classifiers out-of-band (e.g. in a [tool] section) and then append to it as necessary.

Of course, but then I feel this defeats the motivation of PEP 621 to:

Allow for more code sharing between build back-ends for the “boring parts” of a project’s metadata

I think

[project]
name = "etils"
version = "1.0.0"
classifiers = [
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "Topic :: Software Development :: Libraries :: Python Modules",
]
# Add license classifier & version scm suffix
dynamic = ["classifiers", "version"]

feels more standard than:

[project]
name = "etils"
dynamic = ["classifiers", "version"]

[tool.my_tool]
classifiers = [
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "Topic :: Software Development :: Libraries :: Python Modules",
]

From the tools’ perspective, there is no semantic difference between the 2 options (the dynamic field indicates which fields are provided by tools). I just feel the first one is more “generic”.

Is maintaining that by hand too difficult? Or having part of your CI making sure that the all extra is up-to-date?

Actively maintaining many Python projects is actually much more work than I wish it was, so I’m always looking to optimise my workflow as much as I can.

Having an [all] extra is a very popular feature expected/requested by many users. I feel it’s a little sad if the best Python can offer is to ask project maintainers to maintain it by hand. Here are some examples of [all] requested by users.

Don’t get me wrong. I’m really glad that PEP 621 exists and I really want to use it.
However, for complex projects, I think the spec should allow more flexible workflows.
As you pointed out, there are workarounds, but in the end those add more complexity vs keeping a plain setup.py (where I can dynamically set name, [all],… without having to hack around the limitations of PEP 621).

Perhaps, but then why even suggest you know what the dynamic fields should be when you’re admitting upfront that you don’t?

That problem does not need to be solved by pyproject.toml. You can always propose a PEP for an implicit all for installers to support.

If I know a field entirely:

[project]
my_field = 'data'

If I don’t know a field:

[project]
dynamic = ['my_field']

If I know a field partially:

[project]
my_field = 'data (part 1)'
dynamic = ['my_field']

I think defining a field partially when the data is partially known is better than nothing, and it feels quite intuitive to me.
I also think it’s better for readers (who get a hint of what the data is about while being warned that this data can be augmented).

I don’t think I’m being unreasonable here.

I think there are good reasons why this hasn’t been done/doesn’t exist yet (e.g. extras can be mutually incompatible).
However, for the projects that want this feature, I think Python should make it as easy as possible to opt in.
This was easy to automate with setup.py. But now, with PEP 621, this has to be done manually, which feels like a step backward.

Also [all] is just an example but there are other cases where users might want to define some deps statically and some deps dynamically.

FWIW, as of pip 21.2 (I think it’s 21.2), you can self-depend, so you don’t have to duplicate all of your dependencies in all:

[project]
name = "my-package"
optional-dependencies.all = ["my-package[foo]", "my-package[bar]"]

Yes, this is what I’m using already for example in: etils/pyproject.toml at 8d5981e4fcee1c4c7dc3c49cac3bbf3ef21fe46d · google/etils · GitHub

In my ideal world, rather than having to maintain the list manually, there would be some plugin/hook which would mutate the TomlDict to automatically add that info.

The most natural implementation would be something like:

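# Hypothetical API: `plugin`, `TomlDict` and `get_git_hash` do not exist today.
import itertools

@plugin
def add_deps_all_extras(pyproject: TomlDict) -> TomlDict:
    # Collect every requirement from the existing extras into an `all` extra.
    pyproject['optional-deps']['all'] = list(
        itertools.chain.from_iterable(pyproject['optional-deps'].values())
    )
    return pyproject

@plugin
def add_scm_suffix_to_version(pyproject: TomlDict) -> TomlDict:
    # Append the current git hash as a local version suffix.
    version = pyproject['version']
    pyproject['version'] = f'{version}+{get_git_hash()}'
    return pyproject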

Those implementations feel very natural, I think, but they only work if we allow PEP 621 fields to be mutated (a field listed both statically and in dynamic).
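To make the idea concrete, here is a self-contained variant of the two hooks that runs against a plain dict. It is only a sketch of the proposal: no such plugin mechanism exists in PEP 621, and `TomlDict`, `PLUGINS`, `plugin`, `make_scm_suffix_plugin` and `run_plugins` are all made up for illustration. It assumes the standard `project` / `optional-dependencies` keys, and the git hash is passed in rather than read from a repository.

```python
import itertools
from typing import Any, Callable

TomlDict = dict[str, Any]  # hypothetical alias for the parsed pyproject.toml
Plugin = Callable[[TomlDict], TomlDict]
PLUGINS: list[Plugin] = []  # hypothetical hook registry


def plugin(fn: Plugin) -> Plugin:
    """Register a hook (hypothetical decorator)."""
    PLUGINS.append(fn)
    return fn


@plugin
def add_deps_all_extras(pyproject: TomlDict) -> TomlDict:
    extras = pyproject["project"].setdefault("optional-dependencies", {})
    others = (reqs for name, reqs in extras.items() if name != "all")
    # De-duplicate while keeping first-seen order.
    extras["all"] = list(dict.fromkeys(itertools.chain.from_iterable(others)))
    return pyproject


def make_scm_suffix_plugin(git_hash: str) -> Plugin:
    """Build a hook that appends the given hash as a local version suffix."""
    def add_scm_suffix_to_version(pyproject: TomlDict) -> TomlDict:
        project = pyproject["project"]
        project["version"] = f"{project['version']}+{git_hash}"
        return pyproject
    return add_scm_suffix_to_version


def run_plugins(pyproject: TomlDict) -> TomlDict:
    """Apply every registered hook in registration order."""
    for hook in PLUGINS:
        pyproject = hook(pyproject)
    return pyproject


PLUGINS.append(make_scm_suffix_plugin("abc1234"))
result = run_plugins({
    "project": {
        "name": "etils",
        "version": "1.0.0",
        "optional-dependencies": {"dev": ["pytest"], "docs": ["sphinx"]},
    }
})
print(result["project"]["version"])
print(result["project"]["optional-dependencies"]["all"])
```

Both hooks read a statically written value and then overwrite it, which is exactly the combination the current spec rejects.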

It’s not a question of being unreasonable, I just personally disagree with your proposal.

Since PEP 621 has been accepted and is being used in the community you will need to write a PEP to propose your changes and get general buy-in from folks to support your PEP. I’m just one person so me disagreeing is very much not a showstopper, but I would advise you wait to see if at least a few people step forward in support before putting in the time to write a PEP.

FWIW, I’m in agreement with @brettcannon here. But I will say that my view is mostly theoretical, as I’ve not used PEP 621 much in practice. So views from people who have found this to be a significant practical issue would be useful if you want to try to write a PEP proposing the change you’re suggesting.

As a disclaimer, I’m not a packaging expert or one of the PEP 621 authors like the others here, just someone who has read and re-read it as part of my intersecting work on PEP 639.

The dynamic key and its semantics provide a guarantee that the source metadata specified in the project table of pyproject.toml can be relied upon by other tools, at and after package build time. At present, if a key is marked as dynamic, the field(s) it maps to can be modified in arbitrary ways by individual build tools, and thus you cannot

because the input data cannot be relied upon to not be modified in arbitrary ways by different build tools, and the handling is defined to be inherently bespoke and not shared between different build tools. While a key could still be specified under project rather than tool, it would no longer be static and would be tool-specific, and thus would break the project table’s current guarantee that anything specified there is static and not modified in arbitrary tool-specific ways.

That said, given good cause and a PEP, there might be a few mechanisms that could potentially allow most of the behavior you mention while still retaining many of the guarantees provided by PEP 621 and the benefits thereof. However, the later ones (which more specifically do what you’re asking) raise backward compatibility concerns and would come at a cost in tool and specification complexity.

First, the functionality could be provided via other mechanisms than including it in the static metadata, as suggested by others here.

Second, the guarantee could be relaxed very slightly to allow deviations from verbatim copying for one or more specific fields, provided the deviations are explicitly specified by a PEP, are suitably straightforward, and have a deterministic outcome (i.e. given a project source tree and project table contents, all tools following the specification will produce the same output every time). This is more or less the approach I specify for a couple of particular (new) cases in PEP 639, with clear rationale for why it needed to be taken. Note, however, that for PEP 621 keys/core metadata fields that aren’t newly defined (as both were in PEP 639), this creates potential backward compatibility issues, as older tooling not aware of the specification will produce different metadata in the built distribution, meaning the metadata isn’t truly “static” after all.

Third, specific fields (for example, classifiers) could be allowed to be both listed as dynamic and included under project, which would relax the guarantee in a particular defined way, so that the provided metadata can still be relied on to some extent by other tools, rather than the intended output being completely tool-dependent and non-interoperable. For a field like classifiers, you could specify, for example, that if classifiers is both included and marked as dynamic, individual classifiers can be added but not removed or modified. Existing tools should (per the spec) error out, since the same field is included in both dynamic and the project table, and that’s better than them silently handling the metadata incorrectly. However, if any tools rely on PEP 621’s existing guarantee that any key included under project must map statically to a metadata field, then they will silently behave incorrectly.
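The “added but not removed or modified” rule suggested for classifiers is easy to state as a check. Below is a minimal sketch, assuming a hypothetical function `check_add_only` that compares the static list from the project table with the list a tool finally writes to the metadata; nothing like this is specified by PEP 621 today.

```python
def check_add_only(static: list[str], final: list[str]) -> list[str]:
    """Return the classifiers a tool added; raise if any static one is missing."""
    missing = [c for c in static if c not in final]
    if missing:
        raise ValueError(f"static classifiers removed or modified: {missing}")
    return [c for c in final if c not in static]


static = ["Intended Audience :: Developers"]
final = static + ["Programming Language :: Python :: 3"]
print(check_add_only(static, final))
```

Under such a rule, other tools could still trust that every statically listed classifier appears in the built metadata, even though the full list is only known after the build.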

Finally, as a more radical form of the above, you could specify that if any key is both listed under dynamic and included under project, tools should only assume that it is the user input in a standardized form, and that the actual output metadata is arbitrary and tool-dependent. This would accomplish the limited goal of providing a standard user input format for project source metadata. However, since different tools could handle any such keys in arbitrary ways in the output fields, it would be no more than a hint to tools about what the user intends. Users switching tools may not expect the output metadata to silently change in arbitrary ways, or the input metadata to need modification, and the above concerns about tools currently relying on such fields still apply.

Not saying these are necessarily good or bad ideas, but hopefully they are helpful in giving you some different directions in which to take a potential proposal that achieves what you’re looking for while having a better chance of being considered.