PEP 621: Storing project metadata in pyproject.toml

bernatgabor · June 24, 2020, 9:34am

I actually find this to be one of the big advantages of different build backends: they are free to specify their configuration however they wish. This means that they can do it in the simplest possible way for the scope they are targeting. If all build backends use the same configuration and generate the same wheel, what’s the point of having different build backends? Wouldn’t that just move us back to when distutils/setuptools was the only thing?

Wouldn’t the build backend’s documentation detail this? You can attach documentation URLs on PyPI, and I’d expect this to be the first thing in that documentation. How different is this from us writing this PEP, and then people having to find this PEP to figure things out?

This sounds very much like wishing back the days we had only setuptools and there was one place things could have been.

The argument in PEP 517 for moving away from just setuptools was exactly that not tying down build backends allows innovation: people can now use approaches other than the one we just came up with. PEP 517 and PEP 621 seem to work in opposite directions here. Sure, now people don’t need to learn build backends, but build backends are now much more restricted in coming up with better ways to define their build parameters, which removes motivation for creating and maintaining a build backend. We’re vendor-locking all build backends to pyproject.toml.

IMHO this is the biggest plus side, and the PEP should probably start with this. Forcing everyone to use pyproject.toml, while it makes teaching easier, restricts innovation from build backends, so that’s more a negative than a plus.

How does this work with the escape hatch of dynamic definition? Is it allowed to provide the metadata dynamically (e.g. where dynamic means read from setup.cfg and return the value)?

How would this work with setuptools attr: a.version? IMHO it would be nice to address that use case directly. For example, should it not be usable at all, or is this a dynamic provide?

I personally find authors always a bit redundant. People tend to move on from projects, and new people step in to maintain them. As such, the only up-to-date and useful list is that of maintainers, and so I would like to use maintainers instead of authors. I know authors is specified in core, but maybe we should take this opportunity to migrate to maintainers instead. The PyPI web page seems to think similarly (see the setuptools page on PyPI), as the page puts a bigger focus on maintainers than on authors.

We should give some guidance on when the build backend may fill in things, and when not. E.g. if I manually provide Programming Language :: Python :: entries the backend probably should not extend this; however, if I leave this part out entirely the build backend may add as much as it can deduce. This is just to preserve sanity here, so that I don’t specify something in the config and then find that the generated wheel contains something totally different.
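One possible reading of that guidance, sketched in Python. The function name `backfill_classifiers` and the rule itself (stop extending as soon as the user lists any `Programming Language :: Python ::` entry) are assumptions made for illustration; the PEP does not specify this behaviour.

```python
PY_PREFIX = "Programming Language :: Python ::"


def backfill_classifiers(user_classifiers, deduced):
    """Return the classifiers a backend would write to the wheel.

    Hypothetical policy: if the user already listed any Python-version
    classifier, trust it and add nothing; otherwise append the deduced ones.
    """
    if any(c.startswith(PY_PREFIX) for c in user_classifiers):
        return list(user_classifiers)
    extra = [c for c in deduced if c not in user_classifiers]
    return list(user_classifiers) + extra


# Classifiers the backend could work out on its own (made-up values).
deduced = [
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
]

# The user said something about Python versions: the backend stays out of it.
explicit = backfill_classifiers(["Programming Language :: Python :: 3.8"], deduced)

# The user said nothing about Python versions: the backend may fill the gap.
implicit = backfill_classifiers(["License :: OSI Approved :: MIT License"], deduced)
```

With a rule like this, what ends up in the wheel never contradicts what was written in the config; it can only add to a part the user left blank.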

Should we provide a recommended core set of URL keys here? Just to give some standardization for PyPI. Should it be home vs homepage, or doc page vs documentation, to name some. I’d expect us to define core keys for the home, documentation, VCS and issue-reporting URLs.

uranusjr · June 24, 2020, 11:13am

Good observation! This was overlooked when I read the proposal. I would say no, since such metadata is otherwise invalid, so the existence of the unrecognised filename suffix implies the build tool should be consulted. But no matter what the decision is, it should be written down.

To be clear, build backends still can specify things however they want, if they choose to. I don’t believe the intention is to say “you must use these fields to specify metadata” (at least mine certainly is not), but more like “this is a way to write those metadata that you can reasonably expect other tools to understand”.

Yes, but in that case you should list the fields you want to generate in dynamic.

Tools are also expected to dynamically generate the metadata if the [project] table is completely missing. (This is not in the PEP, but probably should be included for clarity.)
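A minimal sketch of how a tool might apply these two rules. The helper name `metadata_source` and the three return labels are hypothetical, and treating a field that is neither given nor listed in `dynamic` as simply "unset" is an assumption. The dicts stand in for an already-parsed pyproject.toml.

```python
def metadata_source(pyproject, field):
    """Say where a tool should look for `field`.

    `pyproject` is the parsed pyproject.toml as a dict.
    Returns "static", "backend" or "unset" (labels invented for this sketch).
    """
    project = pyproject.get("project")
    if project is None:
        # No [project] table at all: everything is generated by the backend.
        return "backend"
    if field in project.get("dynamic", []):
        # Listed in `dynamic`: the backend computes it at build time.
        return "backend"
    if field in project:
        return "static"
    # Neither given nor declared dynamic (assumption: treat as not provided).
    return "unset"


# Parsed form of:
#   [project]
#   name = "spam"
#   dynamic = ["version"]
with_dynamic = {"project": {"name": "spam", "dynamic": ["version"]}}

# A pyproject.toml that only has a [build-system] table.
only_build_system = {"build-system": {"requires": ["setuptools"]}}

sources = {
    "name": metadata_source(with_dynamic, "name"),
    "version": metadata_source(with_dynamic, "version"),
    "description": metadata_source(with_dynamic, "description"),
    "no table": metadata_source(only_build_system, "name"),
}
```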

This was actually discussed extensively, and the conclusion was something more like “keep Authors and drop Maintainers, but make Authors semantically mean Maintainers” because tools in other ecosystems seem to suggest people don’t care about the difference anyway, and Authors is a more familiar term than Maintainers to many.

This was also discussed, but IIRC the consensus was to leave this out of this PEP because this should be discussed separately. I believe pypa/warehouse#5947 is the place for this.

pf_moore · June 24, 2020, 12:09pm

To be fair, though, it is intended (at least I believe it is) that users can expect to write metadata in this form and not have to rewrite it when changing tools. That means there will be pressure on backends to support this format, even if only as one alternative.

Personally, I can see @bernatgabor’s point here. But we’re only saying how the user enters the metadata - the question of what metadata is allowed, and how it’s validated, is not part of this PEP. So there’s not that much innovation being blocked here (and dynamic acts as an escape hatch in any case).

bernatgabor · June 24, 2020, 12:31pm

Would be nice to address this in the PEP.

Definitely needs to be included then.

dholth · June 24, 2020, 2:50pm

I think this is useful. Like 75% of the other tools, enscons also takes basically the setuptools setup() arguments in pyproject.toml under its own [tool.enscons] table. These are converted directly into the METADATA file. (Is there a PEP-to-METADATA implementation for us to use?) The divergence from setuptools happens in the build system, which determines how files, not metadata, wind up in the package. Those two processes appear to be totally independent of each other.
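To make the "converted directly into the METADATA file" step concrete, here is a rough sketch using only the standard library. The `FIELD_MAP` table and the `to_metadata` helper are made up for illustration and cover just a few fields; this is not how enscons or any other backend actually implements it.

```python
from email.message import Message

# Hypothetical, deliberately tiny mapping from setup()-style keys to
# core metadata header names.
FIELD_MAP = {
    "name": "Name",
    "version": "Version",
    "description": "Summary",
    "install_requires": "Requires-Dist",
}


def to_metadata(config):
    """Render a setup()-style dict as METADATA-like header text."""
    msg = Message()
    msg["Metadata-Version"] = "2.1"
    for key, header in FIELD_MAP.items():
        value = config.get(key)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            # Assigning to the same header again appends another line,
            # which is how multiple-use fields are written.
            msg[header] = item
    return msg.as_string()


# Made-up project configuration, as it might look after parsing a tool table.
config = {
    "name": "spam",
    "version": "1.0",
    "install_requires": ["requests", "attrs>=19"],
}
text = to_metadata(config)
```

Nothing in this step needs to know which files go into the package, which is the independence between metadata and build described above.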

brettcannon · June 25, 2020, 12:02am

I can’t find such a case in the PEP.

I don’t think so. I’ll update the PEP.

This is covered by:

Tools MAY support alternative content-types which they can
transform to a content-type as supported by the `core metadata`_.

Everyone is going to have a different opinion as to what the most important reason for this PEP to exist is. I don’t think arguing about the order is worth it as long as the key motivations for everyone are somehow captured.

Yes, it would be a dynamic provide.

As for an example, it’s just specifying dynamic = ["version"] and then however setuptools chooses to let people specify where the version comes from, so I’m not sure what the benefit would be in tossing in such an example.

What the PEP is doing is what you’re suggesting, but standardizing on “author” instead of “maintainer”. I’ll call that out.

This is actually already a compromise of even allowing tools to do this as at least one person wants to just ditch all the trove classifiers we said to backfill. So we purposefully made it weak and underspecified as the assumption is the importance of the relevant classifiers will actually go away in the future.

That’s a PyPI question about which I have a year-old issue.

I’ve opened “Clarify points brought up from public consultation” (Pull Request #1465 in python/peps) to address the above and will merge it once one other co-author approves the PR.

dustin · June 25, 2020, 12:19am

@pradyunsg already fixed it: “PEP 621: Cosmetic updates” (Pull Request #1459 in python/peps).

brettcannon · June 26, 2020, 12:56am

10 posts were split to a new topic: The author/maintainer distinction problem

sersorrel · June 25, 2020, 12:30am

brettcannon:

sersorrel:

Per the core metadata specification, “The only legal value [for charset ] is UTF-8.” Would it be worth removing this sentence and altering the following two to something like

This is covered by:
Tools MAY support alternative content-types which they can
transform to a content-type as supported by the `core metadata`_.

Ah, I misread, I thought there was a readme.charset field.

nschloe · June 25, 2020, 8:35am

Entries that I’m using in some of my projects’ setup.cfg that I didn’t see in the PEP:

package_data
data_files
platforms (not that I ever use anything other than any here)

EpicWink · June 25, 2020, 8:44am

I suspect platforms was discussed and it was decided that it wasn’t in use enough to be included, especially as it’s recommended to be used only when the platform isn’t in the (fairly comprehensive) trove classifiers. Perhaps that should be stated in the “rejected ideas” section. In any case, I think the build-backend has a pretty good idea as to which platforms support the packages it builds.

Both package_data and data_files are build options, which you can see are under [options] in setup.cfg. They’re specific to how the package is built, not to its metadata.

uranusjr · June 25, 2020, 10:17am

It actually wasn’t, but your reasoning is correct. No tools surveyed (except setuptools) support this field, as far as I know. The specification does not aim to cover and replace everything needed to generate Core Metadata, only those fields that are commonly used and have well-established static representations in the community. We can always add additional fields to the table if they become widespread and cause unnecessary migration overhead.

pradyunsg · June 25, 2020, 11:33am

Please see “Specify files to include when building” heading under “Rejected Ideas”.

The fields you’ve mentioned don’t have a corresponding core metadata field, and are related to how the package is built - this is explicitly out of scope for this PEP.

pganssle · June 25, 2020, 1:43pm

I’d like to point out that, broadly speaking, I am in favor of this initiative (though I would guess I am the most skeptical of all the listed authors) — I agree with @bernatgabor’s point that this seems to be trying to go back to the bad old days of a single backend, but in practice there are just not that many ways to specify the package metadata (considering it all maps to the same static fields anyway), and this takes a decent amount of cognitive load off of backend designers.

That said, I want to point out a few concerns I have with the messaging around this PEP, because people are already hopelessly confused by the situation with pyproject.toml, and this looks likely to make it worse.

People already think that pyproject.toml is some sort of replacement for setup.py and are constantly asking nonsensical questions like, “Should you use pyproject.toml or setup.py?” or “Can you achieve this with pyproject.toml?”. This proposal will make this 100x worse, because the word for “projects that use PEP 517/518 to specify build-system requirements” is still “pyproject.toml projects”, and now you will also have metadata specified in pyproject.toml.

It was already confusing enough to say, “The standardized pyproject.toml is about specifying information about your build system, and some build systems also keep their configuration there.” Now we have to go even more nuanced with, “There are two standardized tables in pyproject.toml, one is about specifying what build system you are using, and the other is a common format for package metadata that is used by different build systems. Many build tools also use pyproject.toml for their configuration about how to do the build (e.g. what to include in the package).”

The Motivation section will confuse people even more, because it strongly implies that this is the solution to the problem that it’s difficult or impossible to get metadata without actually building the project, but this is not actually a good solution to that problem. In order for this to be a good solution to the problem, it would have to be widely adopted and most people would have to avoid any sort of dynamic metadata.

An opt-in standard for the input to the core metadata files will make the easy path easy, but it won’t provide a more general solution as would be provided by standardizing the way the output of the “calculate core metadata files” gets included in sdists. People already don’t think terribly deeply about things like this, and I think the fact that the rhetoric in the Motivation section doesn’t match the actual PEP will make people even more confused about what this PEP does.

It is not encouraging that some of the only discussion of this I’m seeing on social media seems way off-the-mark in terms of what this will change: this post highlights that this “allows specifying multiple maintainers” (which implies that it actually changes something about the underlying metadata spec) and this post seems to think that this is a necessary step to allowing the use of pyproject.toml as “a complete alternative to setup.cfg/setup.py”.

Of course, I don’t know how to fix the problem of messaging, since Brett is pretty high-profile within the Python community and he’s been out there giving clear and concise explanations of this on podcasts and in blog posts, and it doesn’t seem to be penetrating (even among people like… tweeting about his clarifying posts). Still, it’s worth thinking about settling on a communication strategy and at least trying to address the ongoing problems we’re having communicating this information to the community.

pradyunsg · June 25, 2020, 3:32pm

Since @pganssle has expressed interest in discussing it further (and made an interesting suggestion for how we could tackle the Author/Maintainer situation), I’ve flagged this thread to ask the discuss.python.org moderators to split out the discussion on the Author/Maintainer fields into a dedicated thread.

@mods please keep this post in this thread, and I’ll edit it later to include the link to the new thread.

Edit: The author/maintainer distinction problem it is!

pradyunsg · June 25, 2020, 3:38pm

I think this is 100% true, and giving separate names to pyproject.toml-based “build processes” as well as “configuration” is the way to go here. I really want us to improve the communication in this area.

I do think we shouldn’t have this discussion in this thread, and would like to suggest “Name for pyproject.toml builds and PEPs” as a better location for discussing the messaging and naming around pyproject.toml.

dstufft · June 25, 2020, 5:10pm

Something I don’t see specified in the PEP is how to actually consume metadata, and how this PEP changes that.

So currently we have 3 types of “artifacts” in which we might want to introspect the metadata:

Wheels
Sdists
Arbitrary Directory

For terseness’ sake I’m going to speak only in a PEP 517 world, but the same ideas map to setuptools before it.

For wheels, I assume this PEP has no real effect on it, you wouldn’t expect to see a pyproject.toml file inside a wheel still, and you’d still be expected to parse the METADATA file as you do today.
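For reference, parsing a METADATA file today can be as small as the sketch below, since the format is a set of email-style headers. The sample text and the project name are made up, and real consumers may well use a dedicated library instead of the standard library `email` module.

```python
from email import message_from_string

# A shortened, made-up METADATA file, as it might appear in a wheel's
# .dist-info directory.
METADATA_TEXT = """\
Metadata-Version: 2.1
Name: spam
Version: 1.0
Requires-Dist: requests
Requires-Dist: attrs>=19

A long description goes here.
"""

msg = message_from_string(METADATA_TEXT)

name = msg["Name"]
version = msg["Version"]
# Fields that may appear several times are collected with get_all().
requires = msg.get_all("Requires-Dist")
```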

However I’m unsure how this changes things for the other two cases… Currently for them you basically call an API which generates a METADATA file/structure and then parse that. With this PEP are we expecting consumers to start parsing pyproject.toml for metadata, and then fall back to the APIs and parsing a different format? Or are we expecting these to basically only be read directly by build tools, and consumers should still expect to only interact with the METADATA structure?

pf_moore · June 25, 2020, 7:11pm

The section in the PEP saying

Finally, this PEP makes core metadata for projects statically defined. By being statically defined, metadata can be read more quickly and easily than if it were dynamically calculated. Tools which read and write metadata can do so without worrying about which build back-end the user specified the metadata for.

is intended to sanction this use¹. However, it explicitly wasn’t a goal to replace the existing mechanisms for extracting metadata, and that ambivalence comes through in the PEP.

Personally, I’d be strongly in favour of explicitly allowing this use (and exercising that right in pip, to extract name, version and dependency data as efficiently as possible). But I can’t speak for the other PEP authors over this.

¹ Although the presence of dynamic makes doing so a little more complex, as tools need to have a fallback mechanism in place.
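A sketch of what such a fast path with a fallback could look like. `get_metadata` and the `fallback` callable are hypothetical names; `fallback` stands in for whatever existing mechanism asks the build back-end for metadata, and nothing here is pip’s actual implementation.

```python
def get_metadata(pyproject, wanted, fallback):
    """Return (values, source) for the wanted fields.

    The static [project] table is used only when it exists and none of the
    wanted fields is listed in `dynamic`; otherwise the (hypothetical)
    `fallback` callable is asked instead.
    """
    project = pyproject.get("project")
    if project is not None:
        dynamic = set(project.get("dynamic", []))
        if not dynamic & set(wanted):
            return {field: project.get(field) for field in wanted}, "static"
    return fallback(wanted), "backend"


def slow_backend_lookup(wanted):
    # Stand-in for running the build back-end; returns placeholder values.
    return {field: "<computed by back-end>" for field in wanted}


# Parsed pyproject.toml contents (made-up projects).
fully_static = {
    "project": {"name": "spam", "version": "1.0", "dependencies": ["requests"]}
}
scm_versioned = {"project": {"name": "spam", "dynamic": ["version"]}}

fast = get_metadata(
    fully_static, ["name", "version", "dependencies"], slow_backend_lookup
)
slow = get_metadata(scm_versioned, ["name", "version"], slow_backend_lookup)
```

The second project takes the slow path for every wanted field as soon as one of them is dynamic, which is the extra complexity the footnote refers to.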

brettcannon · June 26, 2020, 12:54am

I’m happy to tweak the motivation section as I personally don’t view it as that critical in how it’s written. Feel free to send a PR to change its wording or add a section that you feel better represents what you’re after.

No explicit expectations are being set, but that’s because I would argue sdists and arbitrary directories have no spec to begin with. I don’t expect an sdist to have a METADATA file and I’m not aware of any PEP that says so either. If we decided to standardize sdists – which I hope we do someday – then it could be a possibility to say that pyproject.toml is expected to have [build-system] specified, but I personally wouldn’t go past that requirement for quite some time.

The original reason we didn’t do this was due to the setuptools-scm users wanting their version number to be dynamically calculated. But perhaps pip can strongly encourage folks via benchmarks or something that using this speeds up installs?

brettcannon · June 26, 2020, 1:29am

@bernatgabor @pganssle and anyone else who has had an issue with the Motivation section, I have tried to simplify it via https://github.com/python/peps/pull/1469. It also says more about what the PEP isn’t.

As before I won’t merge it until another co-author approves it.