Proposal for dynamic metadata plugins

henryiii · March 27, 2023, 8:47pm

We’ve been working on a proposal for standardized plugins providing dynamic metadata for build backends. Initial work was done in https://github.com/scikit-build/scikit-build-core/issues/230, and now moving it here. This includes two changes to metadata specification that comes from PEP 621 - adding support for a table in project.dynamic, and loosening the requirements a bit on combing dynamic and static metadata (I see https://discuss.python.org/t/relaxing-or-clarifying-pep-621-requirements-regarding-dynamic-dependencies discussion this and stating a new PEP would be needed!). I’m putting the initial draft inline.

Note that “rejected ideas” aren’t technically rejected yet, and if one of those is deemed better, the proposal can be modified to swap the proposed and rejected ideas.

Also it’s not quite in the right form for a PEP just yet, it needs an abstract / motivation / rational / specification, which all there, but not quite in the proper headings / order. Of course I think of that right after posting… But I’m assuming there will be discussion, suggestions, etc. leading to changes.

Dynamic metadata plugins

Need

In the core metadata specification originally set out in PEP 621 there is the possibility of marking fields as “dynamic”, allowing their values to be determined at build time rather than statically included in pyproject.toml. There are several popular packages which make use of this system, most notably setuptools_scm, which dynamically calculates a version string based on various properties from a project’s source control system, but also e.g. hatch-fancy-pypi-readme, which builds a readme out of user-defined fragments (like the latest version’s CHANGELOG). Most backends, including setuptools, PDM-backend, hatchling, and flit-core, also have built-in support for providing dynamic metadata from sources like reading files.

With the recent profusion of build-backends in the wake of PEPs 517 and 518, it is much more difficult for a user to keep using these kind of tools across their different projects because of the lack of a common interface. Each tool has been written to work with a particular backend, and can only be used with other backends by adding some kind of adapter layer. For example, setuptools_scm has already been wrapped into a hatchling plugin (hatch-vcs), and into scikit-build-core. Poetry also has a custom VCS versioning plugin (poetry-dynamic-versioning), and PDM has a built-in tool for it. However, these adapter layers are inconvenient to maintain (often being dependent on internal functions, for example), confusing to use, and result in a lot of duplication of both code and documentation.

We are proposing a unified interface that would allow metadata providing tools to implement a single function that build backends can call, and a standard format in which to return their metadata. Once a backend chooses to adopt this proposed mechanism, they will gain support for all plugins implementing it.

We are also proposing a modification to the project specification that has been requested by backend and plugin authors to loosen the requirements slightly on mixing dynamic and static metadata, enabling metadata plugins to be more easily adopted for some use cases.

Proposal

Implementing a metadata provider

Our suggestion is that metadata providers include a module (which could be the top level of the package, but need not be) which provides a function dynamic_metadata(fields, settings=None). The first argument is the list of fields requested of the plugin, and the second is the extra settings passed to the plugin configuration, possibly empty. This function will run in the same directory that build_wheel() runs in, the project root (to allow for finding other relevant files/folders like .git).

The function should return a dictionary matching the pyproject.toml structure, but only containing the metadata keys that have been requested. dynamic, of course, is not permitted in the result. Updating the pyproject_dict with this return value (and removing the corresponding keys from the original dynamic entry) should result in a valid pyproject_dict. The backend should only update the key corresponding to the one requested by the user. A backend is allowed (and recommended) to combine identical calls for multiple keys - for example, if a user sets “readme” and “license” with the same provider and arguments, the backend is only required to call the plugin once, and use the readme and license fields.

An optional hook^[1], get_requires_for_dynamic_metadata, allows providers to determine their requirements dynamically (depending on what is already available on the path, or unique to providing this plugin).

Here’s an example implementation:

def dynamic_metadata(
    fields: Sequence[str],
    settings: Mapping[str, Any],
) -> dict[str, dict[str, str | None]]:
    if settings:
        raise RuntimeError("Inline settings are not supported by this plugin")
    if fields != ["readme"]:
        raise RuntimeError("This plugin only supports dynamic 'readme'")

    from hatch_fancy_pypi_readme._builder import build_text
    from hatch_fancy_pypi_readme._config import load_and_validate_config

    with Path("pyproject.toml").open("rb") as f:
        pyproject_dict = tomllib.read(f)

    config = load_and_validate_config(
        pyproject_dict["tool"]["hatch"]["metadata"]["hooks"]["fancy-pypi-readme"]
    )

    return {
        "readme": {
            "content-type": config.content_type,
            "text": build_text(config.fragments, config.substitutions),
        }
    }


def get_requires_for_dynamic_metadata(
    settings: Mapping[str, Any] | None = None,
) -> list[str]:
    return ["hatch-fancy-pypi-readme"]

Using a metadata provider

For maximum flexibility, we propose specifying a 1:1 mapping between the dynamic metadata fields and the providers (specifically the module implementing the interface) which will supply them.

The existing dynamic specification will be expanded to support a table as well:

[project.dynamic]
version = {provider = "plugin.submodule"}                   # Plugin
readme = {provider = "local_module", provider-path = "scripts/meta"} # Local plugin
classifiers = {provider = "plugin.submodule", max="3.11"}   # Plugin with options
requires-python = {min = "3.8"}                             # Build-backend specific
dependencies = {}                                           # Identical to dynamic = ["dependences"]
optional-dependences.provider = "some_plugin"               # Shortcut for provider =

If project.dynamic is a table, a new provider="..." key will pull from a matching plugin with the hook outlined above. If provider-path ="..." is present as well, then the module is a local plugin in the provided local path (just like PEP 517’s local backend path). All other keys are passed through to the hook; it is suggested that a hook validate for unrecognized keys. If no keys are present, the backend should fall back on the same behavior a string entry would provide.

Many backends already have some dynamic metadata handling. If keys are present without provider=, then the behavior is backend defined. It is highly recommended that a backend produce an error if keys that it doesn’t expect are present when provider= is not given. Setuptools could simply its current tool.setuptools.dynamic support with this approach taking advantage of the ability to pass custom options through the field:

# Current
[project]
dynamic = ["version", "dependencies", "optional-dependencies"]

[tool.setuptools.dynamic]
version = {attr="mymod.__version__"}
dependencies = {file="requeriments.in"}
optional-dependencies.dev = {file="dev-requeriments.in"}
optional-dependencies.test = {file="test-requeriments.in"}


# After
[project.dynamic]
version = {attr="mymod.__version__"}
dependencies = {file="requeriments.in"}
optional-dependencies.dev = {file="dev-requeriments.in"}
optional-dependencies.test = {file="test-requeriments.in"}
# "provider = "setuptools.dynamic.version", etc. could be set but would be verbose

Another idea is a hypothetical regex based version discovery, which could look something like this if it was integrated into the backend:

[project.dynamic]
version = {location="src/package/version.txt", regex='Version\s*([\d.]+)'}

Or like this if it was a plugin:

[project.dynamic.version]
provider = "regex.searcher.version"
location = "src/package/version.txt"
regex = 'Version\s*([\d.]+)'

Using project.dynamic as a table keeps the specification succinct without adding extra fields, it avoids duplication, and it is handled by third party libraries that inspect the pyproject.toml exactly the same way (at least if they are written in Python). The downside is that it changes the existing specification, probably mostly breaking validation - however, this is most often done by the backend; a backend must already opt-into this proposal, so that is an acceptable change. pip and cibuildwheel, two non-backend tools that read pyproject.toml, are unaffected by this change.

Supporting metadata providers:

An implementation of this proposal already exists for the scikit-build-core backend and uses only standard library functions. Implementations could be left up to individual build backends to provide but if the proposal were to be adopted then would probably coalesce into a single common implementation. pyproject-metdata could hold such a helper implementation.

Proposed changes in the semantics of `project.dynamic`

PEP 621 explicitly forbids a field to be “partially” specified in a static way (i.e. by associating a value to project.<field> in pyproject.toml) and later listed in dynamic.

This complicates the mechanism for dynamically defining fields with complex/compound data structures, such as keywords, classifiers and optional-metadata and requires backends to implement “workarounds”. Examples of practices that were impacted by this restriction include:

whey’s re-implementation of classifiers in a tool subtable (dependencies too!)
the removal of the classifiers augmentation feature in pdm-backed.
setuptools restrictions on dynamic optional-dependencies

In this PEP, we propose to lift this restriction and change the semantics associated with pyproject.dynamic in the following manner:

When a metadata field is simultaneously assigned a value and included in pyproject.dynamic, tools should assume that its value is partially defined. The given static value corresponds to a subset of the value expected after the build process is complete. Backends and dynamic providers are allowed augment the metadata field during the build process.

The fields that are arrays or tables with arbitrary entries are urls, authors, maintainers, keywords, classifiers, dependencies, scripts, entry-points, gui-scripts, and optional-dependencies.

Examples & ideas:

Version computation from VCS
Building description out of parts of a readme & other files
Pulling metadata from another build system (like CMake, Meson, Cargo) - these can be “integrated” plugins; setuptools’s dynamic table could use this instead.
Automatic classifier computation
Local metadata scripts
Dependency addition (extras, or adding specific dependencies for specific wheels)

Current PEP 621 backends & dynamic metadata

Backend	Dynamic?	Config?	Plugins?
setuptools
hatchling
flit-core
pdm-backend
scikit-build-core			^[2]
meson-python
maturin
enscons
whey
trampolim

“Dynamic” indicates the tool supports at least one dynamic config option. “Config” indicates the tool has some tool-specific way to configure this option. “Plugins” refers to having a custom plugin ecosystem for these tools. Poetry has not yet adopted PEP 621, so is not listed above, but it does have dynamic metadata with custom configuration and plugins. This proposal will still help tools not using PEP 621, as they can still use the plugin API, just with custom configuration (but they are already using custom configuration for everything else, so that’s fine).

Rejected ideas

Notes on extra file generation

Some metadata plugins generate extra files (like a static version file). No special requirements are made on such plugins or backends handling them in this proposal; this is inline with PEP 517’s focus on metadata and lack of specifications file handling.

Config-settings

The config-settings dict could be passed to the plugin, but due to the fact there’s no standard configuration design for config-settings, you can’t have generally handle a specific config-settings item and be sure that no backend will also try to read it or reject it. There was also a design worry about adding this in setuptools, so it was removed (still present in the reference implementation, though).

Passing the pyproject.toml as a dict

This would add a little bit of complexity to the signature of the plugin, but would avoid reparsing the pyproject.toml for plugins that need to read it. Also would avoid an extra dependency on tomli for older Python versions. Custom inline settings alleviated the need for almost every plugin to read the pyproject.toml, so this was removed to keep backend implementations & signatures simpler.

Shortcut for just selecting a provider

To keep the most common use case simple[^4], passing a string is equivalent to passing the provider; version = "..." is treated like version = { provider = "..." }. This makes the backend implementation a bit more complex, but provides a simpler user experience for the most common expected usage. This is similar to the way to how keys like project.readme = and project.license = are treated today. This was rejected since adding .provider works in TOML and keeps it explicit.

New section

Instead of changing the dynamic metadata field to accept a table, instead there could be a new section:

dynamic = ["version"]

[dynamic-metadata]
version = {provider = "plugin_package.submodule"}

This is the current state of the reference implementation, using [tool.scikit-build.metadata] instead of [dynamic-metadata]. In this version, listing an item in dynamic-metadata should be treated as implicitly listing it in dynamic, though listing in both places can be done (primary for backward compatibility).

dynamic vs. dynamic-metadata could be confusing, as they do the same thing, and it actually makes parsing this harder for third-party tools, as now both project.dynamic and dynamic-metadata have to be combined to see what fields could be dynamic. The fact that dict keys and lists are handled the same way in Python provides a nice method to avoid this complication.

Alternative proposal: new array section

A completely different approach to specification could be taken using a new section and an array syntax^[3]:

dynamic = ["version"]

[[dynamic-metadata]]
provider = "plugin_package.submodule"
provider-path = "src"
provides = ["version"]

This has the benefit of not repeating the plugin if you are pulling multiple metadata items from it, and indicates that this is only going to be called once. It also has the benefit of allowing empty dynamic plugins, which has an interesting non-metadata use case, but is probably out of scope for the proposal. The main downside is that it’s harder to parse for the dynamic values by third party projects, as they have to loop over dynamic-metadata and join all provides lists to see what is dynamic. It’s also a lot more verbose, especially for the built-in plugin use case for tools like setuptools. (The current version of this suggestion listed above is much better than the original version we proposed, though!). This also would allow multiple plugins to provide the same metadata field, for better (maybe this could be used to allow combining lists or tables from multiple plugins) or worse (this has to be defined and properly handled).

This version could enable a couple of possible additions that were not possible in the current proposal. However, most users would not need these, and some of them are a bit out of scope - the current version is simpler for pyproject.toml authors and would address 95% of the plugin use cases.

Multiple plugins per field

The current proposal requires a metadata field be computed by one plugin; there’s no way to use multiple plugins for a single field (like classifiers). This is expected to be rare in practice, and can easily be worked around in the current proposal form by adding a local plugin that itself calls the plugins it wants to combine following the standard API proposed. “Merging” the metadata then would be arbitray, since it’s implemented by this local plugin, rather than having to be pre-defined here.

Empty plugins (for side effects)

A closely related but separate could be solved by this paradigm as well with some modifications. Several build tools (like cmake, ninja, patchelf, and swig) are actually system CLI tools that have optional pre-compiled binaries in the PyPI ecosystem. When compiling on systems that do not support binary wheels (a very common reason to compile!), such as WebAssembly, Android, FreeBSD, or ClearLinux, it is invalid to add these as dependencies. However, if the system versions of these dependencies are of a sufficient version, there’s no need to add them either. A PEP 517 backend has the ability to declare dynamic dependencies, so this can be (and currently is) handled by tools like scikit-build-core and meson-python in this way. However, it might also be useful to allow this logic to be delegated to a metadata provider, this would potentially allow greater sharing of core functionality in this area.

For example, if you specified “auto_cmake” as a provider, it could provide get_requires_for_dynamic_metadata to supply this functionality to any backend. This will likely best be covered by the “extensionlib” idea, rather than plugins, so this is not worth trying to address unless this array based syntax becomes the proposed syntax - then it would be worth evaluating to see if it’s worth trying to include.

Most plugins will likely not need to implement this hook, so it could be removed. But it is symmetric with PEP 517, fairly simple to implement, and “wrapper” plugins, like the first two example plugins, need it. It is expected that backends that want to provide similar wrapper plugins will find this useful to implement. ↩︎
In development, based on a version of proposal. ↩︎
Note, that unlike the proposed syntax, this probably should not repurpose project.metadata, since this would be much more likely to break existing parsing of this field by static tooling. (Static tooling often may not parse this field anyway, since it’s easier to check for a missing field - you only need to check the dynamic today if you care about “missing” version “specified elsewhere”.) ↩︎

pradyunsg · March 27, 2023, 9:27pm

I don’t have any major horse in this race, since IIUC, build-backend → tools-that-a-build-backend-uses interoperability^[1] so I don’t have much to add other than bikeshedding-like concerns (below).

Is this not a restricted version of https://hatch.pypa.io/latest/plugins/metadata-hook/reference/?
Why is get_requires_for_dynamic_metadata required? If it’s a plugin that’s pulled in by the build-backend based on configuration, it can have that as a dependency of itself and be returned by the build-backend via get_requires_for_build_sdist… right? IMO it’s likely better to have the user explicitly list those out in build-system.requires and avoid dynamicness around this (I don’t really understand the need for an ability to dynamically declare dependencies TBH)
I also see get_requires_for_dynamic_metadata_wheel mentioned in the post, but I assume that’s a typo?

I’d suggest avoiding the vocabulary of build-backend hooks for that (i.e. don’t say get_requires_for_...) to avoid confusion.
dynamic_metadata(...) would be better named something like resolve_dynamic_metadata or something, since not containing dynamic metadata is the whole point!

I do maintain a build-backend, since GitHub - pradyunsg/sphinx-theme-builder: Streamline the Sphinx theme development workflow (maintained, but extremely stable as of Jan 2023) is a thing, but it’s not got any dynamic behaviours and I won’t be adding any either. ↩︎

pf_moore · March 27, 2023, 9:59pm

I’d echo the point about get_requires_for_dynamic_metadata. The build frontend doesn’t interact directly with the plugin, all interactions happen via the backend. So the frontend will call the existing get_requires_for_build_* hooks, and the backend should be responsible for reading pyproject.toml and determining the needed plugins and returning them in its response. Any dependencies of the plugin will get pulled in automatically by the resolver.

Other than that, as a frontend maintainer, I have no particular opinions on this PEP (I’ll change that position if the PEP expects any frontend changes, but I don’t believe it should). My only views are essentially bikeshedding:

This seems like a rather complex proposal. Is it trying to over-generalise? Outside of dynamically generated versions, the use cases seem fairly niche. For example, what’s the use-case for in-tree plugins? And why can’t plugin options be specified using a [tool.plugin] setting? Is there any realistic likelihood that multiple plugins will have the same configuration (and that users will expect to be able to switch plugins without changing the config)?
Are there use cases for backends to dynamically add keywords or authors/maintainers to a static base set? Again, this feels like over-generalisation. The relaxation of the rule that metadata should be static or dynamic, but not both, feels like it might deserve to be an independent proposal. Otherwise, I fear that it might get “lost” in the discussion of the rest of the proposal.

henryiii · March 28, 2023, 12:17am

Yes, that’s an inspiration for this. That’s specific to Hatchling. I’d like to implement dynamic metadata plugins for scikit-build-core and meson-python without having to build a scikit-build-<plugin> and meson-python-<plugin> set of plugins. And other build backends either have their own ecosystem of plugins (poetry, setuptools), or integrate this functionality (pdm-backend, whey) or are missing this functionality (flit-core, trampolim, enscons).

We don’t have to keep it in the proposal - there’s a footnote mentioning it could be dropped, and I almost did, I just got some comments saying it was easy enough to add and potentially useful. It’s mainly there for two reasons:

The fact PEP 517 allows this has been critical in supporting cmake and meson builds - cmake and ninja cannot be a normal dependency since they are just wrappers for a CLI tool that might not be present - and the place you need binary builds on users systems are typically exactly the places wheels can’t be distributed! (WebAssembly, Android, ClearLinux, BSD’s, raspberry pi-sort of, etc) And, for example, a plugin reading metadata from CMake would need to do exactly the same thing - only request cmake if the cmake tool is not present.
It’s handy for a specific use case - wrapping a plugin, which is exactly what the reference implementation has to use it, since scikit-build-core can’t depend on every plugin it wraps, so instead it needs to only add the plugins it’s wrapping if they are requested.

However, both of these issues have been somewhat alleviated by the addition of backend-specific support (requested by setuptools). This means a CMake backend or meson backend could add support via this method, and not provide a general “get metadata from cmake/meson” plugin for other backends to use.

Yes, get_requires_for_dynamic_metadata_wheel is a typo. I like reusing simple terms reminiscent of PEP 517, as the return value is identical for the get_requires one, and dynamic_metadata is a function, not a constant, so computing the dynamic metadata seems to be exactly what I’d expect for it to do. But I don’t really care what these are called, if someone wants something else. I just think simple is most likely to be agreeable.

It started a lot simpler - setuptools is one of the main reasons it became more complicated, because they had an immediate use case. They want to write:

[project]
name = "my_proj"
author = {name = "John Doe"}

[dynamic-metadata]
version = {attr="mymod.__version__"}
dependencies = {file="requeriments.in"}
optional-dependencies.dev = {file="dev-requeriments.in"}
optional-dependencies.test = {file="test-requeriments.in"}

(Or, more verbosely, if you remove the support for build-backend specific behavior if provider is omitted)

[dynamic-metadata]
version = {provider="setuptools.dynamic.version", attr="mymod.__version__"}
dependencies = {provider="setuptools.dynamic.dependencies", file="requeriments.in"}
optional-dependencies.dev = {provider="setuptools.dynamic.dependencies", file="dev-requeriments.in"}
optional-dependencies.test = {provider="setuptools.dynamic.dependencies", file="test-requeriments.in"}

I think this answers several of your questions - this is an example of a single plugin with different settings, and an example of inline configuration that’s useful. The current proposal doesn’t discourage tool.<plugin> configuration, and plugins like hatch-fancy-pypi-readme would probably still require it. But being able to configure inline is useful, especially for some plugins that can provide multiple fields. The reasons for the list of fields in and dict out was also for a cmake style plugin - a single call is very expensive (multiple seconds or more), and it can generate a lot of fields at once. Again, this is really alleviated by the addition of backend-specific behavior, so maybe the proposal could be simplified to drop the recommended merging of identical calls (which also means the input fields list could be a string and the output could be the final value instead of a dict).

Also, local plugins were also immediately requested, and would be very useful, say for example in pybind11, pybind11 could have a regex that picks out the version from the C++ code - currently the version is specified in two places. Another use case is in the proposal - if you wanted to merge outputs from two plugins, you could do that with a local plugin (which is also why the proposal doesn’t support multiple plugins per field).

I do think more use cases will pop up if it becomes easier to write a plugin. And I’d say the readme plugin (have you ever used hatch-fancy-pypi-readme? It’s fantastic!) is a very valid major use case. Being able to remove unnecessary bits from your readme and add things like the first few entries of your changelog via static configuration in pyproject.toml is fantastic.

I think this was just a general rule (lists and tables can be added to), and authors/maintainers happen to be arbitrary lists. I’d say the simpler thing is to not arbitrarily restrict ones we think aren’t useful, but to just provide a general rule - lists and tables can be added to. If it’s not useful, people won’t use it. Though I could imagine a list being generated from an author list like CITATION, actually! Or a GitHub contributor list. The point of dynamic metadata is to allow metadata to be pulled from an alternate format, like Git, CITATION, meson/cmake/cargo/etc, .py files, or built from parts (like README). Currently, it’s tied to the build backend.

Very much could be a separate PEP - I couldn’t tell if two independent but related changes should be one PEP or two. But I think initial discussion can happen here. One reason the above isn’t in a polished final PEP form, it might be two PEPs.

CAM-Gerlach · March 28, 2023, 8:05am

At least in the final draft PEP, I would recommend being a little more specific and precise here, describing more explicitly the effect on each of the different value types (table, array, etc) and making it normatively clear that the per the specification, the final built core metadata field values for any dynamic fields MUST be (inclusive) supersets of those specified in the [project] table keys, with the existing values unmodified. Otherwise, without clear and unambiguous specified and implemented semantics, tools will not be able to rely even to the envisioned limited extent on the static values in the [project] table, making them useless.

On a broader note, I really hope this doesn’t encourage users to specify more keys as dynamic than strictly necessary and that they are already, as that would be a significant loss in the primary benefit of PEP 621/pyproject metadata. Alternatively, though, it would provide at least a modest net gain in such at least for many types of tools if users do end up adopting it for the keys they are already marking as dynamic anyway, particularly in light of the other interoperability benefits.

Also, as a lesson learned from interpreting PEP 621, please try to make sure you’re consistent, clear and precise with the language distinguishing pyproject source metadata keys from core metadata fields, which have different names, syntax and don’t all match up 1:1. This resulted in a lot of confusion and uncertainty over its interpretation specifically where the dynamic key was concerned, which we should avoid in the future.

konstin · March 28, 2023, 11:16am

From the perspective of maturin, i’d like to support e.g. transformations on the readme, but i think that can be done with static metadata, e.g.

[project]
readme = "Readme.md"
# other fields

[transforms]
readme = "fancy_markdown_generator.tranform"

The advantage is that you can still get the normal readme from static (and more importantly, without running arbitrary code) like you would on github, and when building for pypi the readme gets transformed.

For the version, i’d prefer agreement that pyproject.toml is the source of truth for the version and __version__ or vcs tags should be set from it (setting __version__ with importlib.metadata). This is already successfully working in other ecosystems such as npm and cargo.

Do you have examples where requires-python and classifiers would actually change during the build? For classifiers i see the whey case, but wouldn’t it make more sense anyway to parse license-key, platforms, python-implementations and python-versions from pyproject.toml than reading classifiers, or having whey edit the pyproject.toml?

https://discuss.python.org/t/pep-for-dynamic-metadata-plugins/25237/3: Are there use cases for backends to dynamically add keywords or authors/maintainers to a static base set? Again, this feels like over-generalisation. The relaxation of the rule that metadata should be static or dynamic, but not both, feels like it might deserve to be an independent proposal. Otherwise, I fear that it might get “lost” in the discussion of the rest of the proposal.

I currently see the requirement to transform readme files (github and pypi having slightly different markdown needs to not look broken), but i think it would be great to list out for which other fields this applies and why this cases can’t be solved without a new standard.

What would be interesting for me while we’re already specifying this would be a standard way for tools to add additional (generated) files to the distribution as this a common use case (but i can totally see if you say this is out of scope).

https://discuss.python.org/t/pep-for-dynamic-metadata-plugins/25237/5: On a broader note, I really hope this doesn’t encourage users to specify more keys as dynamic than strictly necessary and that they are already, as that would be a significant loss in the primary benefit of PEP 621/pyproject metadata. Alternatively, though, it would provide at least a modest net gain in such at least for many types of tools if users do end up adopting it for the keys they are already marking as dynamic anyway, particularly in light of the other interoperability benefits.

Very much agree with this. Besides the sdist → wheel transformation there is also a number of analysers and tools that read and process python packaging data which won’t execute arbitrary code and which would greatly benefit from static metadata.

This is also a big performance issue, reading a toml file takes nearly no time at all while setting up a new python process to call the provider is a noticeably overhead (and source of potential errors).

The first argument is the list of fields requested of the plugin, and the second is the extra settings passed to the plugin configuration, possibly empty.

Rejected Ideas

Config-settings

The config-settings dict could be passed to the plugin, but due to the fact there’s no standard configuration design for config-settings, you can’t have generally handle a specific config-settings item and be sure that no backend will also try to read it or reject it. There was also a design worry about adding this in setuptools, so it was removed (still present in the reference implementation, though).

Does plugin configuration mean the other fields from the toml dict next to provider, and if so are plugins allowed/supposed to read e.g. [hatch.hatch_fancy_pypi_readme], and if so, what is the type of settings? Please also consider allowing the user to pass in settings, the lack of a consistent interface for config_settings is probably the biggest we have with PEP 517 in maturin (there is e.g. no way for the user specify that they want a debug or a release build of a dependency at the moment).

For forward compatibility, is there a way to add arguments that the caller may or must pass to dynamic_metadata later without breaking everyone who implemented def dynamic_metadata(fields, settings=None)?

The function should return a dictionary matching the pyproject.toml structure, but only containing the metadata keys that have been requested.

Is there a designated output directory for newly generated files or are you supposed to pass all text inline as with the readme example?

An optional hook, get_requires_for_dynamic_metadata, allows providers to determine their requirements dynamically (depending on what is already available on the path, or unique to providing this plugin).

Would it be possible to specify this statically, either in or similar to optional-dependencies? In general, i think it should be spelled out clearer how the backend knows which version of which providers to install.

It is highly recommended that a backend produce an error if keys that it doesn’t expect are present when provider= is not given.

I’d change this to “A backend MUST error if it sees an unexpected key”.

Using project.dynamic as a table keeps the specification succinct without adding extra fields, it avoids duplication, and it is handled by third party libraries that inspect the pyproject.toml exactly the same way (at least if they are written in Python). The downside is that it changes the existing specification, probably mostly breaking validation - however, this is most often done by the backend; a backend must already opt-into this proposal, so that is an acceptable change. pip and cibuildwheel, two non-backend tools that read pyproject.toml, are unaffected by this change.

This is not an argument against doing this, but please acknowledge in this section that this can break tools parsing pyproject.toml for e.g. generating github-like dependency graphs or other tool and IDE integrations.

To keep the most common use case simple, passing a string is equivalent to passing the provider; version = “…” is treated like version = { provider = “…” }.

I’d like to avoid this, this will be confusing for users when version = "..." has completely different semantics depending on where in the pyproject.toml it is and it will also give bad error messages when a user accidentally enter a PEP 621 value instead of a provider. On the other hand seeing version = "..." and hinting the user “Did you mean: version = { provider = "..." }?” is easy to have.

When a metadata field is simultaneously assigned a value and included in pyproject.dynamic, tools should assume that its value is partially defined. The given static value corresponds to a subset of the value expected after the build process is complete. Backends and dynamic providers are allowed augment the metadata field during the build process.

This is sound implementation-wise, but what is the advantage here over defining partial data in backend specific fields?

This would add a little bit of complexity to the signature of the plugin, but would avoid reparsing the pyproject.toml for plugins that need to read it. Also would avoid an extra dependency on tomli for older Python versions. Custom inline settings alleviated the need for almost every plugin to read the pyproject.toml, so this was removed to keep backend implementations & signatures simpler.

Please add the working directory of the provider hook to the specification.

henryiii · March 28, 2023, 1:18pm

Sure, that can be done. Yes, the idea is very much to keep anything statically defined is reliable, with dynamic only indicating that it could be added to. I also would specifically only allow it for tables if they take arbitrary keys.

I don’t think this will cause users do pull out metadata that is already static, but will help with standardizing the current practices already seen in some backends, and will help users transitioning that are currently struggling because they specify custom readme transforms, extra-dependency summation, etc. in their setup.py’s. And it will help in transitioning between backends - you could switch from hatch to maturin/scikit-build-core/mason-python while keeping your setuptools_scm, hatch-fancy-pypi-readme, etc. practices. (Note: I pick those because you have no choice if you need to use those, that is, if you have a binary build with cargo/cmake/meson).

That’s exactly what this PEP would enable. You do not want to specify a readme in readme that is not the package readme, though! You’d put readme in dynamic, then you’d use a plugin, like this: scikit-build/pyproject.toml at 9dafff729a16568ae939ac0d6e8e08761e3e6f39 · scikit-build/scikit-build · GitHub Note that that’s an actual, real example, and was part of the setup.py → hatchling conversion for scikit-build (classic, the one started in 2014). The only difference is this PEP would enable all backends that support this PEP to use this plugin, rather than just hatchling. And the readme is entirely statically configured, no arbitrary Python running!

If you need the GitHub readme, just get README* from the sdist, exactly the way GitHub find’s READMEs too! The readme field in the pyproject.toml is for PyPI (package) descriptions. You don’t want to have it static but modified, especially by arbitrary Python, as in your [transforms]. Some analysis tools (like GitHub) aren’t even written in Python.

The plugins settings are exactly this:

[build-system]
requires = ["any-backend-that-supports-this", "konstins-readme-transform"]
build-backend = "any_backend_that_supports_this"

[project.dynamic.readme]
provider = "konstins-readme-transform"
file = "Readme.md"

would translate to the backend calling konstins-readme-transform.dynamic_metadata(["readme"], {"file": "Readme.md"}). This allows basically arbitrary communication, so I don’t think you need an extra field. Backends could, of course, set up privileged communication with their own special plugins - this is already done between frontends (such as hatch and flit) with their backends (though hatch is being fixed). I don’t think we need to add any provisions for that, it will happen anyway.

Most plugins don’t need this, and those that do may actually need the dynamic part. If this isn’t dropped, then I’d say it should remain similar to PEP 517’s method. If PEP 517 had chosen to make get_requires_for_build_* static, at least meson-python and scikit-build-core would be seriously crippled - meson Python dynamically injects patchelf and ninja only if needed, scikit-build-core does cmake and ninja.

Sure. What’s the best procedure for updates? Update the above text? Move this to a repo? Post it again and again in replies?

Okay, I thought I did basically say that, but I can add a bit more. GitHub’s (very newly added) support will likely break, because they are not using Python - most Python tooling should continue to work unless they add isinstance, which is only required of the backend. But I think we could coordinate with GitHub to make sure it handles table keys as well as lists.

I was okay to drop it too, but someone requested keeping it. What is a good method for seeing what people are in favor of? Is there a voting mechanic here? Oh, there is. Nice.

The advantage is that static parsers (that don’t want to because it’s expensive, or maybe can’t because they can’t run arbitrary Python) need to be able to rely on the data they are gathering from the pyproject.toml. Currently, there are only two options - fully statically specified (not in dynamic) or completely dynamic (not present and in dynamic). This proposal (well, maybe the second pep if it’s split into two) would enable a third option - present and also in dynamic. This data is only useful if you can make assumptions about it - the assumption we are providing is that it will not be modified or removed, but only augmented. This assumption is useful for many cases - if you are asking “what dependencies does this have”, knowing all the static dependencies is better than knowing none of them at all. Knowing all the static classifiers is better than knowing none at all. Etc. So it’s a net gain both for authors (since you don’t have to move all dependency or classifier specification to a custom field, as you have to do today with whey and setuptools), and for static tooling, since they now can see most of the details rather than none.

The working directory is just the same working directory as PEP 517, but stating that clearly, is that what you mean?

henryiii · March 28, 2023, 1:20pm

What are people thinking about splitting this into two separate PEPs? One for plugins, and one for additive dynamic metadata?

One PEP
Two PEPs

0 voters

henryiii · March 28, 2023, 1:24pm

Is the shortcut project.dynamic.version = {provider = "plugin"} → project.dynamic.version = "plugin" a good one to keep? Since you could write this project.dynamic.version.provider = "plugin", I’ll put my vote in for “no”.

No, don’t need the shortcut
Yes, keep the shortcut

0 voters

I’ve moved this to rejected ideas above, and put in an example with the .provider= shortcut.

dholth · March 28, 2023, 1:29pm

This reminds me of the setuptools egg_info.writers entry point, some of which seem to be listed at Wheelodex — egg_info.writers

Plugins write all files inside the egg-info folder, including famous ones like entry_points.txt

abravalheri · March 28, 2023, 2:40pm

Hi Paul, thank you very much for taking the time to read the document.

I think it is useful to think what happens if we don’t use a “centralized” approach, and instead have multiple parts of the pyproject.toml file that define dynamic behaviour.

For example, let’s imagine a setuptools user wants to take advantage of an hypothetical “fancy-pypi-readme”:

[project]
name = "myproj"
dynamic = ["version", "description", "optional-dependencies"]
# ^-- This part is required to inform frontend/tools, which can use optimizations

[dynamic-metadata]
description = {provider = "fancy_pypi_readme"}
# ^-- This part would be required for plugin interoperability

# --> unrelated stuff (regarding to build backend concerns)
[tool.mypy]
# ...

[tool.pytest.ini_options]
# ...
# <-- end of unrelated stuff

[tool.setuptools.dynamic]
version = {attr = "mymod.__version__"}
optional-dependencies.docs = {file = "docs/requirements.in"}
optional-dependencies.tests = {file = "tests/requirements.in"}
# ^-- This part  is backend specific and only makes sense
#      if `project.dynamic` is well configured

[tool.fancy_pypi_readme]
content-type = "text/markdown"
file = "AUTHORS.md"
# ^-- This part is plugin specific and only makes sense
#      if both `project.dynamic` and `dynamic-metadata` are well configured

Here we can see dynamic behaviour spreading across 4 distinct sections of configuration: project, dynamic-metadata, tool.setuptools.dynamic, tool.fancy_pypi_readme. Moreover we also see interdependencies between these sections (e.g. backend-specific sections can only operate if dynamic is set properly, plugin-specific sections can only operate if both dynamic and dynamic-metadata are set properly).

With that in mind, the opinion that I would like to convey is that allowing related information to be grouped together in a meaningful way makes a lot of sense. And this proposal offers a consistent and tidy way of organizing this information with a builtin extension mechanism.

I would argue that the proposal improves UI, which is something important considering that the main functionality of the project section in pyproject.toml file is the communication between user and build backend.

konstin · March 28, 2023, 3:40pm

You do not want to specify a readme in readme that is not the package readme, though! You’d put readme in dynamic, then you’d use a plugin, like this: scikit-build/pyproject.toml at 9dafff729a16568ae939ac0d6e8e08761e3e6f39 · scikit-build/scikit-build · GitHub

The difference i see with that example is that README.rst does exist and if i were a tool i’d like to know that, even if before uploading to pypi there’s a transform that makes it look different from the github version.

And the readme is entirely statically configured, no arbitrary Python running!

If i want to have the actual final readme i still need to run arbitrary python, which still has the performance problems, the implementation complexity and the error handling even if the provider has statical configuration, even if fancy-pypi-readme has toml configuration.

If you need the GitHub readme, just get README* from the sdist, exactly the way GitHub find’s READMEs too! The readme field in the pyproject.toml is for PyPI (package) descriptions. You don’t want to have it static but modified, especially by arbitrary Python, as in your [transforms]. Some analysis tools (like GitHub) aren’t even written in Python.

For me having a readme field is about not having to guess the location of the readme but having a specification where to find. There’s different ways to do this, e.g. cargo specifies “If no value is specified for this field, and a file named README.md, README.txt or README exists in the package root, then the name of that file will be used. You can suppress this behavior by setting this field to false”. I actually like this a lot because it means one key less to specify, but it relies on having a specified search order over looking for README* and hoping it returns exactly one file (otoh, saying there must a most one README* file in a source distribution which is then considered canonical would also a valid specification, but it kinda needs to be written down somewhere).

Most plugins don’t need this, and those that do may actually need the dynamic part. If this isn’t dropped, then I’d say it should remain similar to PEP 517’s method. If PEP 517 had chosen to make get_requires_for_build_* static, at least meson-python and scikit-build-core would be seriously crippled - meson Python dynamically injects patchelf and ninja only if needed, scikit-build-core does cmake and ninja.

Oh so the ~~frontend~~ backend logic would look like below and most tools would go with the fast, non-install path? Then this is a not a problem indeed.

if get_requires_for_dynamic_metadata := getattr(provider, "get_requires_for_dynamic_metadata", None):
    install(get_requires_for_dynamic_metadata(settings))

Sure. What’s the best procedure for updates? Update the above text? Move this to a repo? Post it again and again in replies?

I’d make a github repo to track changes (and maybe pull requests, depending on your preference) and then update the text in github and here.

Okay, I thought I did basically say that, but I can add a bit more. GitHub’s (very newly added) support will likely break, because they are not using Python - most Python tooling should continue to work unless they add isinstance, which is only required of the backend. But I think we could coordinate with GitHub to make sure it handles table keys as well as lists.

FWIW my use cases are deserializing into cattrs/pydantic as well as using pyproject.toml rust with serde which all three enforce a schema. But again, this not an argument against this change, just something that needs to be kept in mind.

The advantage is that static parsers (that don’t want to because it’s expensive, or maybe can’t because they can’t run arbitrary Python) need to be able to rely on the data they are gathering from the pyproject.toml. Currently, there are only two options - fully statically specified (not in dynamic) or completely dynamic (not present and in dynamic).

Sorry, i slightly misphrased my question: Who would need a value being defined in [project] already knowing that it is unreliable because it is also defined in [project.dynamic] (as opposed to not displaying the value in [project] at all and making the fragment part of [project.dynamic])?

The working directory is just the same working directory as PEP 517, but stating that clearly, is that what you mean?

Yes please, a sentence like “The working directory when invoking get_requires_for_dynamic_metadata and dynamic_metadata MUST be the directory containing the pyproject.toml” would be great.

henryiii · March 28, 2023, 3:42pm

A few more surveys:

Keep get_requires_for_dynamic_metadata?

Keep it
Drop it

0 voters

Should we keep the multiple requests per plugin? That is, fields: list[str] in and dict[str, Any] out, rather than doing one call per plugin? The original design was based on supporting a potential expensive “native tool” call to collect metadata from CMake/Cargo/Meson/whatever. And inline settings were not present. But now we have backend-specific behavior is provider= is missing, and inline settings, so this is a lot less necessary. It’s not that complex to implement, but it’s a bit messy - and as an emergency workaround if it was missing, a slow tool could cache some intermediate state - should we keep it?

Keep the multiple keys in/out
Make it always one call per metadata value

0 voters

abravalheri · March 28, 2023, 3:43pm

Hi @konstin, probably some of your questions were already answered by Henry, but these are some comments that I hope complement that.

Other than whey’s use case for requires-python and classifiers (Configuration - whey 0.1.1 documentation), one could set requires-python based on which versions the test suite runs (e.g. by deriving the information from tox.ini or .github/workflows). Some of the classifiers (e.g. Environment::GPU::*, Framework::*) can also be inferred in a similar way.

steve.dower · March 28, 2023, 3:44pm

konsti:

Oh so the frontend logic would look like below and most tools would go with the fast, non-install path? Then this is a not a problem indeed.
if get_requires_for_dynamic_metadata := getattr(provider, "get_requires_for_dynamic_metadata", None):
    install(get_requires_for_dynamic_metadata(settings))

I’d assumed the “frontend” here would be the backend’s get_requires_for_build API, and it would augment its own list of dynamic requirements with the requirements of any plugins that it’s using.

So the frontend already supports everything it needs to, and it’s on the backends to implement consistent support for plugins.

konstin · March 28, 2023, 3:48pm

Sorry, yes, i meant the caller, which in this case is the build backend. The terminology gets a bit confusing because now there’s a frontend calling a backend through the same kind of hooks that the backend then calls the provider, where the backend now is actually the caller

henryiii · March 28, 2023, 4:04pm

This is true for any [transform]. And you could actually teach a different language how to read hatch-fancy-pypi-readme’s config and compute the readme if you really wanted too, it’s a simple regex procedure - one benefit of having a few common shared plugins rather than having to handle every backend seperatly.

But in general, use the README.* search if you want the GitHub readme (as that’s exactly what GitHub does, GitHub does not allow the README name to be configured). And run the Python hook if you need the final PyPI readme and readme is listed in dynamic. The is how it works today, and it wouldn’t really change.

Yes, the actual implementation is pretty much exactly that, you can see it in scikit-build-core. Though I’m currently updating scikit-build-core to be more in line with the current form of the proposal.

I’ve already stated this several times. The value here is not unreliable, that’s the point of the restriction - it is just incomplete if it’s also in dynamic. If you are building a static dependency graph, having the static dependencies is a huge upgrade from having nothing at all because everything is in some tool-specific config. In general, there are lots of use cases for ‘give me as many as you can’. If you truly need the full value and it’s in dynamic, then you have to run the hook - that doesn’t change what it is today.

For example:

[project.optional-dependencies]
test = ["pytest>=6"]
lint = ["ruff"]

[project.dynamic.optional-dependencies]
provider = "opt-dep-merger"
dev = ["test", "lint"]    # This portion is entirely
all = true                # up to the plugin

A static tool like those checking for missing deps in Conda-forge or a dependency graph tool can still compute possible optional dependencies (pytest, ruff). It just can’t tell you there’s a dev extra that’s the combination of test and lint, or an all extra that combines everything). Only if you need to know every possible extra do you need this knowledge, and often you don’t.

pf_moore · March 28, 2023, 4:34pm

I’m sorry, this example confuses me. [dynamic-metadata] isn’t a valid pyproject.toml key. Did you mean that to be tool-specific? Or are you basing the example on one of the “rejected ideas” from the proposal? Either way, I don’t understand what the comment “This part would be required for plugin interoperability” means - what plugins do you imagine interoperating here?

If I assume you mean

[project.dynamic]
metadata = {plugin="fancy_pypi_readme"}

then there’s a weird confusion in the existing proposal, because you have project.dynamic specified twice, both as a list of strings and as a table with the key metadata and the value {plugin="fancy_pypi_readme"}. @henryiii what’s going on here? How should @abravalheri’s example actually be specified, if we assume no plugin options and a tool.fancy_pypi_readme key?

Also, given that even if we have pugin options, plugins can still support a tool.plugin key, why not simplify the proposal for now by omitting the options, and then have a follow-up proposal to add plugin options if real-world experience proves that a tool-specific section is too clumsy?

I think that the fundamental issue here is the dual-purpose use of dynamic that I flagged above. On the other hand, I don’t personally think there’s any big issue with grouping the information as “description comes from plugin X” as one piece of information, and “plugin X uses the following parameters” as a separate one. This is a matter of opinion, admittedly, but personally, I prefer to start with a simpler proposal and add complexity when we identify objective problems with the simpler approach.

I think there’s a philosophical question that the PEP exposes here - is the plugin part of the backend (and as such, the user is communicating with the backend) or is it an independent tool that the backend calls? I’m assuming it’s an independent tool, as my reading of the PEP is that it’s to allow plugins to be specified independently of the backend (so that there’s a single scm plugin rather than setuptools-scm, hatchling-scm, etc, etc). If the plugin is an independent tool, having a plugin-specific section in the pyproject.toml` is the obvious way to configure it, as far as I’m concerned.

pf_moore · March 28, 2023, 4:44pm

That sounds like a violation of the spirit of PEP 621, if not the actual specification. I would assume that

[project]
readme = "Readme.md"

with no dynamic would mean that if I write a tool that introspects pyproject.toml, I am allowed to assume that I can read the contents of Readme.md, and that is precisely the data that will end up in the metadata in the sdist and wheel. Allowing backends to write metadata that cannot be read statically without invoking the backend seems to be precisely what the dynamic list is intended to avoid.

@brettcannon as author of PEP 621 do you agree with my interpretation here? If so, I’d appreciate if someone could submit a clarification to the specification to make that explicit.

abravalheri · March 28, 2023, 5:02pm

In the example I tried to imagine what would be a “fully decentralized” approach.
It does consider one of the rejected ideas.

I believe that it would be the most conservative approach with the least effort in terms of standardisation, right? (I mean, it would not change the way dynamic works nowadays, it would only standardize the minimum necessary for the interoperability between backend and plugin and it would delegate both plugin and backend configurations to [tool.*]).

I apologize for the confusion, the example I gave was just hypothetical. By exaggerating on the level of “decentralization” we can have a genuine feeling about how a “centralized approach” is more convenient. For example, we can already start to see some value in grouping related information and avoiding duplication (e.g. the duplication of dynamic and the dynamic-metadata that is in the rejected ideas section).

I think this is a good opportunity to tidy up things in a way that is extensible and uniform (e.g. same mechanism for backends and plugins).

It is a slightly different topic, but from my experience in setuptools (with dynamic + tool.setuptools.dynamic), I would say users already think it is clumsy. The user focus is to describe how they way the dynamic information to be filled in, adding it to project.dynamic is only done because otherwise there is a build error.
Unifying the mechanism would improve this UI.

As far as I understood, the main responsible for calling the plugin is the backend.
My interpretation is that PEP describes to the user how to tell the backend the way they want the dynamic metadata to be filled in. When the user specifies provider=mymodule, they are telling the backend to call mymodule.dynamic_metadata(...). When the user passes other keys alongside provider, they are telling the backend to use those arguments in the call.
This is what makes the interoperability possible.

Proposal for dynamic metadata plugins

Dynamic metadata plugins

Need

Proposal

Implementing a metadata provider

Using a metadata provider

Supporting metadata providers:

Proposed changes in the semantics of project.dynamic

Examples & ideas:

Current PEP 621 backends & dynamic metadata

Rejected ideas

Notes on extra file generation

Config-settings

Passing the pyproject.toml as a dict

Shortcut for just selecting a provider

New section

Alternative proposal: new array section

Multiple plugins per field

Empty plugins (for side effects)

Proposed changes in the semantics of `project.dynamic`