PEP 621: Storing project metadata in pyproject.toml

Entries that I’m using in some of my projects’ setup.cfg that I didn’t see in the PEP:

  • package_data
  • data_files
  • platforms (not that I ever use anything other than any here)

I suspect platforms was discussed and it was decided that it wasn’t in use enough to be included, especially as it’s recommended to be used only when the platform isn’t in the (fairly comprehensive) trove classifiers. Perhaps that should be stated in the “rejected ideas” section. In any case, I think the build-backend has a pretty good idea as to which platforms support the packages it builds.

Both package_data and data_files are build options, which you can see are under [options] in setup.cfg. They’re specific to how the package is built, not to its metadata.
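To illustrate the split described above, here is a minimal sketch that reads a setup.cfg with the standard-library configparser and separates the build-option sections from the metadata section. The file contents (project name, file patterns, paths) are hypothetical and only there for illustration.

```python
import configparser

# Hypothetical setup.cfg content, made up for this example.
SETUP_CFG = """
[metadata]
name = example-pkg
version = 1.0.0
platforms = any

[options]
packages = find:

[options.package_data]
example_pkg = *.json

[options.data_files]
share/example = data/defaults.cfg
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# "options" and "options.*" sections describe how the package is built;
# the "metadata" section describes the project itself.
build_sections = [
    s for s in parser.sections() if s == "options" or s.startswith("options.")
]
metadata_keys = sorted(parser["metadata"])

print("build sections:", build_sections)
print("metadata keys:", metadata_keys)
```

In this layout package_data and data_files land in the build sections, while platforms sits next to name and version under [metadata].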


It actually wasn’t, but your reasoning is correct. No tools surveyed (except setuptools) support this field, as far as I know. The specification does not aim to cover and replace everything needed to generate Core Metadata, only those fields that are commonly used and have well-established static representations in the community. We can always add additional fields to the table if they become widespread and cause unnecessary migration overhead.

Please see “Specify files to include when building” heading under “Rejected Ideas”.

The fields you’ve mentioned don’t have a corresponding core metadata field, and are related to how the package is built - this is explicitly out of scope for this PEP.

I’d like to point out that, broadly speaking, I am in favor of this initiative (though I would guess I am the most skeptical of all the listed authors) — I agree with @bernatgabor’s point that this seems to be trying to go back to the bad old days of a single backend, but in practice there are just not that many ways to specify the package metadata (considering it all maps to the same static fields anyway), and this takes a decent amount of cognitive load off of backend designers.

That said, I want to point out a few concerns I have with the messaging around this PEP, because people are already hopelessly confused by the situation with pyproject.toml, and this looks likely to make it worse.

  1. People already think that pyproject.toml is some sort of replacement for setup.py and are constantly asking nonsensical questions like, “Should you use pyproject.toml or setup.py?” or “Can you achieve this with pyproject.toml?”. This proposal will make this 100x worse, because the word for “projects that use PEP 517/518 to specify build-system requirements” is still "pyproject.toml projects", and now you will also have metadata specified in pyproject.toml.

    It was already confusing enough to say, “The standardized pyproject.toml is about specifying information about your build system, and some build systems also keep their configuration there.” Now we have to go even more nuanced with, “There are two standardized tables in pyproject.toml, one is about specifying what build system you are using, and the other is a common format for package metadata that is used by different build systems. Many build tools also use pyproject.toml for their configuration about how to do the build (e.g. what to include in the package).”

  2. The Motivation section will confuse people even more, because it strongly implies that this is the solution to the problem that it’s difficult or impossible to get metadata without actually building the project, but this is not actually a good solution to that problem. In order for this to be a good solution to the problem, it would have to be widely adopted and most people would have to avoid any sort of dynamic metadata.

    An opt-in standard for the input to the core metadata files will make the easy path easy, but it won’t provide a more general solution as would be provided by standardizing the way the output of the “calculate core metadata files” gets included in sdists. People already don’t think terribly deeply about things like this, and I think the fact that the rhetoric in the Motivation section doesn’t match the actual PEP will make people even more confused about what this PEP does.

It is not encouraging that some of the only discussion of this I’m seeing on social media seems way off-the-mark in terms of what this will change: this post highlights that this “allows specifying multiple maintainers” (which implies that it actually changes something about the underlying metadata spec) and this post seems to think that this is a necessary step to allowing the use of pyproject.toml as “a complete alternative to setup.cfg/setup.py”.

Of course, I don’t know how to fix the problem of messaging, since Brett is pretty high-profile within the Python community and he’s been out there giving clear and concise explanations of this on podcasts and in blog posts, and it doesn’t seem to be penetrating (even among people like… tweeting about his clarifying posts). Still, it’s worth thinking about settling on a communication strategy and at least trying to address the ongoing problems we’re having communicating this information to the community.


Since @pganssle has expressed interest in discussing it further (and made an interesting suggestion for how we could tackle the Author/Maintainer situation), I’ve flagged this thread to ask the discuss.python.org moderators to split out the discussion on the Author/Maintainer fields into a dedicated thread.

@mods please keep this post in this thread, and I’ll edit it later to include the link to the new thread. :slight_smile:

Edit: The author/maintainer distinction problem it is!

I think this is 100% true, and giving separate names to pyproject.toml-based “build processes” as well as “configuration” is the way to go here. I really want us to improve the communication in this area.

I do think we shouldn’t have this discussion in this thread, and would like to suggest Name for pyproject.toml builds and PEPs as a better location for discussing the messaging and naming around pyproject.toml. :slight_smile:

Something I don’t see specified in the PEP is how to actually consume metadata, and how this PEP changes that.

So currently we have 3 types of “artifacts” in which we might want to introspect the metadata:

  • Wheels
  • Sdists
  • Arbitrary Directory

For terseness’ sake I’m going to speak only in a PEP 517 world, but the same ideas map to setuptools before it.

For wheels, I assume this PEP has no real effect: you still wouldn’t expect to see a pyproject.toml file inside a wheel, and you’d still be expected to parse the METADATA file as you do today.

However I’m unsure how this changes things for the other two cases… Currently for them you basically call an API which generates a METADATA file/structure and then parse that. With this PEP are we expecting consumers to start parsing pyproject.toml for metadata, and then fall back to the APIs and parsing a different format? Or are we expecting these to basically only be read directly by build tools, and consumers should still expect to only interact with the METADATA structure?
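For reference, this is roughly what "parse the METADATA file" looks like for a consumer today. The core metadata format uses email-style headers, so the standard-library email parser can read it. The METADATA text below is a hypothetical example, not taken from a real project.

```python
from email.parser import Parser

# Hypothetical METADATA content, as it might appear in a wheel's
# .dist-info directory or be produced by a build back-end hook.
METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0.0
Requires-Dist: requests>=2.0
Requires-Dist: attrs
"""

msg = Parser().parsestr(METADATA)

name = msg["Name"]
version = msg["Version"]
# Fields that may repeat are collected with get_all().
requires = msg.get_all("Requires-Dist", [])

print(name, version, requires)
```

The open question in the post is whether consumers should keep using only this format, or also read pyproject.toml directly.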


The section in the PEP saying

Finally, this PEP makes core metadata for projects statically defined. By being statically defined, metadata can be read more quickly and easily than if it were dynamically calculated. Tools which read and write metadata can do so without worrying about which build back-end the user specified the metadata for.

is intended to sanction this use¹. However, it explicitly wasn’t a goal to replace the existing mechanisms for extracting metadata, and that ambivalence comes through in the PEP.

Personally, I’d be strongly in favour of explicitly allowing this use (and exercising that right in pip, to extract name, version and dependency data as efficiently as possible). But I can’t speak for the other PEP authors over this.

¹ Although the presence of dynamic makes doing so a little more complex, as tools need to have a fallback mechanism in place.

I’m happy to tweak the motivation section as I personally don’t view it as that critical in how it’s written. Feel free to send a PR to change its wording or add a section that you feel better represents what you’re after.

No explicit expectations are being set, but that’s because I would argue sdists and arbitrary directories have no spec to begin with. :slight_smile: I don’t expect an sdist to have a METADATA file and I’m not aware of any PEP that says so either. If we decided to standardize sdists – which I hope we do someday – then it could be a possibility to say that pyproject.toml is expected to have [build-system] specified, but I personally wouldn’t go past that requirement for quite some time.

The original reason we didn’t do this was due to the setuptools-scm users wanting their version number to be dynamically calculated. But perhaps pip can strongly encourage folks via benchmarks or something that using this speeds up installs?

@bernatgabor @pganssle and anyone else who has had issue with the Motivation section, I have tried to simplify it via https://github.com/python/peps/pull/1469. It also says more about what the PEP isn’t.

As before I won’t merge it until another co-author approves it.

(Somewhat off-topic, so if this needs further response I suggest we open a new thread) Pip already tries very hard to avoid a build step when getting metadata. I’d see pyproject.toml as being most useful for dependency data (which isn’t available elsewhere until you build) and for “source trees” (which have no filename to parse).
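As an illustration of what "a filename to parse" means, here is a rough sketch for wheels, whose filenames follow the name-version(-build)-python-abi-platform.whl pattern. The helper name is hypothetical and this is not how pip itself is implemented. A source tree offers no such shortcut, and neither case yields dependency data.

```python
def name_and_version_from_wheel(filename):
    """Extract (name, version) from a wheel filename without opening the file."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five components, or six when the optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return parts[0], parts[1]


print(name_and_version_from_wheel("example_pkg-1.0.0-py3-none-any.whl"))
```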

Dynamic versions annoy me as the maintainer of a metadata consumer, but the benefits of getting rid of them are likely far more to do with code complexity than performance (sadly).


I have merged the simplified Motivation section thanks to Dustin and Pradyun’s reviews!

I think that people would simply not adopt this [project] table if it were not possible to specify at least the version dynamically at build time, and probably many other things. It would be a pretty serious regression, even if you ignore everyone who wants their git metadata to be the single source of the version number.

I would be somewhat surprised if this significantly improved the situation with regard to the ability to dynamically process metadata. It will probably be convenient when someone happens to use it and happens to not use any dynamic fields, but I think the real solution to the “static metadata” problem is to standardize metadata in sdists. Once setuptools and other build tools start rolling out changes that standardize metadata, huge swathes of the ecosystem will be opted in automatically, and it should be canonical. While I can imagine situations where “parse the metadata from pyproject.toml without a build” might be useful, I don’t think it will be a significant long-term or short-term solution (unless we get this done and then really drop the ball on metadata-in-sdist, or it gets super wide adoption immediately) for things like building a resolver.

I say this not to just try to shoot down the idea that we should be parsing metadata from this, but to try and frame the discussion a bit about what we should be optimizing for. With regard to “static metadata”, I think the ordering of our priorities should be:

  1. Pave the way for a future where sdists can contain useful and canonical static metadata files.
  2. Make it easy for back-ends to adopt this without any loss of functionality.
  3. Design it in such a way that tools can know when it’s safe to parse static metadata directly from the pyproject.toml.

I think the PEP as it stands does a good job of this: fields that are tool-provided are marked as such, and no existing backends care about the few fields that are not allowed to be provided dynamically (as far as I know).
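A sketch of the kind of consistency check this enables, working on an already-parsed [project] table. Two assumptions here are mine and not stated in this thread: that name is one of the fields that may not be dynamic, and that a field cannot be given statically and listed in dynamic at the same time. The function name is hypothetical.

```python
# Assumption for illustration: fields that may never be listed in "dynamic".
NEVER_DYNAMIC = {"name"}


def check_project_table(project):
    """Return a list of problems found in a parsed [project] table (a dict)."""
    errors = []
    dynamic = project.get("dynamic", [])
    for field in dynamic:
        if field in NEVER_DYNAMIC:
            errors.append(f"{field} must not be listed in dynamic")
        if field in project:
            errors.append(f"{field} is given statically and also listed in dynamic")
    return errors


ok = {"name": "example-pkg", "dynamic": ["version"]}
bad = {"name": "example-pkg", "version": "1.0.0", "dynamic": ["name", "version"]}

print(check_project_table(ok))
print(check_project_table(bad))
```

A table that passes this check tells a tool exactly which fields it can read directly and which ones require asking the back-end.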


I would be quite happy to start the discussion of standardizing sdists after this PEP is done as that is something I would like to see dealt with in some form. Maybe this will inform that work by saying anything in dynamic must be provided in some supplementary form. Maybe it doesn’t come into play at all and it simply acts as a way to encourage users and build back-ends to specify as much as they can statically upfront and it’s more useful for tools analyzing projects from their source code.

And so I’m happy to have that general goal of dealing with the sdist standardization in the back of our heads, but I don’t know if it will necessarily dictate how this PEP turns out.


This might be wrong, but I kind of feel like this PEP should probably be targeted specifically at build tools as the consumer of this static metadata for producing builds, and non build tools should still be expected to continue to go through the existing hooks.

I worry that the current PEP gives the impression that given an sdist, you should start reading from this file, and I think that’s the wrong path to go down. There are things that are OK to be dynamic when you’re in development or producing packages, but that should no longer be dynamic once you’re in an sdist. Version is a big one that comes to mind. This PEP doesn’t explicitly tell people to start doing that, but it might be good to explicitly call this out if people agree?


If you look in the Motivation section there’s only a single bullet point that doesn’t say “build back-end”.

I think both you and @pganssle have some notions about sdists which are not written down anywhere and thus have not been fully communicated. Sdists are obviously not fully static by definition, since there is no definition :wink:.

If it will make you and @pganssle more comfortable with this PEP then I am fine with explicitly stating in the Motivation that this is meant for people working from a source checkout, either for analysis purposes or for a build back-end to produce an artifact, at which point the build artifact’s metadata is considered canonical. In the eyes of this PEP, an sdist is a build artifact and not equivalent to a source checkout.


@dstufft @pganssle and co-authors: I opened https://github.com/python/peps/pull/1474 to clarify that what this PEP proposes shouldn’t be considered the metadata for an sdist. As usual I won’t merge until I have co-author sign-off.

And I got the sign-off, so the change in the Motivation section has been made.


It seems like a lot of information at the top of a file. Can it be put at the end of the file as standard? In ‘normal’ writing you would put acknowledgements, references etc. at the end to save clutter.