PEP 621: Storing project metadata in pyproject.toml

I like specifying dependencies as the empty string extra name “” for requires-dist and with a non-empty string extra name for optional dependencies, but this has not caught on.
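
To make that idea concrete, here is a minimal sketch, assuming a hypothetical input shape: a mapping from extra name to requirement strings, where the empty-string key holds the unconditional dependencies. The function name and the mapping are illustrative only and are not part of the PEP; the `extra == "..."` marker is the form used for Requires-Dist entries in the core metadata.

```python
def to_requires_dist(deps_by_extra):
    """Flatten {extra_name: [requirement, ...]} into Requires-Dist values.

    The "" key holds unconditional dependencies (a hypothetical convention,
    not something the PEP defines).
    """
    lines = []
    for extra, requirements in deps_by_extra.items():
        for req in requirements:
            if extra == "":
                # No extra: the requirement is emitted unchanged.
                lines.append(req)
            elif ";" in req:
                # The requirement already has a marker: combine the two.
                name, marker = (part.strip() for part in req.split(";", 1))
                lines.append(f'{name}; ({marker}) and extra == "{extra}"')
            else:
                lines.append(f'{req}; extra == "{extra}"')
    return lines
```

With this shape, required and optional dependencies go through one code path, which is the "fewer, more general concepts" appeal of the approach.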

For the same reason, having fewer, more general concepts, I’m surprised scripts are special cased. But there’s a strong argument that they are in fact special compared to every other use of entry points.

Does this format intend to support directly executable .py or .sh scripts?

The format is intended to map directly to core metadata, so given that there’s no special case for executable .py or .sh scripts in the core metadata, then equally there’s no special provision for it in this PEP.

(That argument can be generalised to any case where people want to know “does this PEP support X” - show us where X is supported in the core metadata specs, and that’s essentially your answer :slightly_smiling_face:)


I’m really excited to see this PEP written up!

Under “Entry points”, “MUST not” should probably be “MUST NOT”; this is the only PEP that uses mixed capitalisation for that phrase (9 use all-caps, 44 use all-lowercase).

If a tool recognizes more extensions than this PEP, they MAY infer the content-type for the user.

Does this depend on "readme.content-type" being included in dynamic? (More generally: is listing non-top-level fields in dynamic allowed?)

Tools MAY support other encodings if they choose to.

Per the core metadata specification, “The only legal value [for charset] is UTF-8.” Would it be worth removing this sentence and altering the following two to something like:

Tools MAY support alternative content-types or charsets which they can transform to a format supported by the core metadata [1]. Otherwise tools MUST raise an error for unsupported content-types or charsets.
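
A rough sketch of the behaviour being discussed, assuming a tool infers the content-type from the readme's filename suffix and raises an error for anything it does not recognise. The function and parameter names are hypothetical; only the `.md` and `.rst` mappings are the ones the PEP names, and `extra_types` stands in for whatever additional suffixes a particular tool chooses to support.

```python
from pathlib import PurePath

# Suffix-to-content-type table; these two entries are the ones the PEP names.
KNOWN_README_TYPES = {
    ".md": "text/markdown",
    ".rst": "text/x-rst",
}


def infer_readme_content_type(filename, extra_types=None):
    """Return the content-type for a readme path, or raise ValueError.

    `extra_types` is a hypothetical hook for tool-specific suffixes.
    """
    suffix = PurePath(filename).suffix.lower()  # compare case-insensitively
    table = {**KNOWN_README_TYPES, **(extra_types or {})}
    try:
        return table[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer content-type for {filename!r}"
        ) from None
```

A tool that recognises more suffixes would pass them in explicitly, for example `infer_readme_content_type("README.txt", {".txt": "text/plain"})`.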

Interesting! I do think that there are packages that do both – indicate their license as full text as well as just-the-name.

There are several packages where the license field includes the text of the whole license instead of just a few letters indicating the given license.

FWIW, solving the “Python package licenses are a mess” problem with SPDX-based license declarations is out of scope for this PEP – that’s what “PEP 639: Improving license clarity with better package metadata - #74 by pradyunsg” is about. That said, as noted in the PEP (and reiterated by @brettcannon here), that is something that we’ve accommodated – by keeping the string value of the license attribute “open”, so that it can be mapped to the SPDX-based license metadata field, if/when that gets formalized.

On the scripts, I agree that those are standardised in the output metadata, but I don’t think that necessarily implies they have to be standard in the input metadata. They’re saying far more about “how to install” than “what is being installed”. But at the same time, 99% of the “how” is “what pip does”, so perhaps that’s good enough to canonicalise for all packagers (rather than just backend developers)?

My understanding of the license field is that it’s meant as a fallback for when a classifier is not sufficient. So it ought to remain free text. Though I’d prefer to reference a file/URL rather than have it be the full text - particularly for licenses that include third party acknowledgements (such as Python itself). Those get very long.

I actually find this to be one of the big advantages of different build backends: they are free to specify their configuration however they wish. This means that they can do it in the simplest possible way for the scope they are targeting. If all build backends use the same configuration and generate the same wheel, what’s the point of having different build backends? Wouldn’t that just move us back to when distutils/setuptools was the only thing?

Wouldn’t the build backend’s documentation detail this? You can attach documentation URLs on PyPI, and I’d expect this to be the first thing in that documentation. How different is this from us writing this PEP, and then people having to find this PEP to figure things out?

This sounds very much like wishing back the days we had only setuptools and there was one place things could have been.

The argument in PEP 517 for moving away from just setuptools was exactly that not tying down build backends allows innovation: people can now use approaches other than the one we just came up with. PEP 517 and PEP 621 seem to work in opposite directions here. Sure, now people don’t need to learn build backends, but build backends are now much more restricted in coming up with better ways to define their build parameters, which removes motivation for creating and maintaining a build backend. We’re vendor-locking all build backends to pyproject.toml.

IMHO this is the biggest plus side, and the PEP should probably start with this. Forcing everyone to use pyproject.toml, while it makes teaching easier, restricts innovation from build backends, so that’s more a negative than a plus.

How does this work with the escape hatch of dynamic definition? Is it allowed to provide the metadata dynamically (e.g. where dynamic means read from setup.cfg and return the value)?

How would this work with setuptools’ attr: a.version? IMHO it would be nice to address that use case directly. For example, should the static field not be usable for it, because this is a dynamic provide?

I personally find authors always a bit redundant. People tend to move on from projects, and new people step in to maintain them. As such, the only up-to-date and useful list is that of maintainers, so I would like to use maintainers instead of authors. I know authors is specified in the core metadata, but maybe we should take this opportunity to migrate to maintainers instead. The PyPI web page seems to agree (see setuptools · PyPI), as the page puts a bigger focus on maintainers than on authors.

We should give some guidance on when the build backend may fill things in, and when not. E.g. if I manually provide Programming Language :: Python :: entries, the backend probably should not extend them; however, if I leave this part out entirely, the build backend may add as much as it can deduce. This is just to preserve sanity: I should not specify something in the config and then find that the generated wheel contains something else entirely.
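
One possible policy along those lines, as a sketch only: the backend adds inferred Programming Language :: Python :: classifiers only when the user declared none from that family. The function name and the policy itself are assumptions made for illustration; the PEP does not mandate this behaviour.

```python
PYTHON_PREFIX = "Programming Language :: Python ::"


def backfill_classifiers(declared, inferred):
    """Return the classifiers to write into the built metadata.

    Hypothetical policy: if the user declared any classifier in the
    Python-version family, leave the list alone; otherwise append the
    backend's inferred classifiers from that family.
    """
    declared = list(declared)
    if any(c.startswith(PYTHON_PREFIX) for c in declared):
        # The user was explicit, so the backend does not extend the list.
        return declared
    extra = [
        c for c in inferred
        if c.startswith(PYTHON_PREFIX) and c not in declared
    ]
    return declared + extra
```

The point of the design is that anything the user wrote by hand comes out of the build unchanged.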

Should we provide a recommended core set of URL keys here? Just to give some standardization for PyPI. Should it be home vs homepage, or doc page vs documentation, to name some? I’d expect us to define core keys for the home, documentation, VCS and issue-reporting URLs.

Good observation! This was overlooked when I read the proposal. I would say no, since such metadata is otherwise invalid, so the existence of the unrecognised filename suffix implies the build tool should be consulted. But no matter what the decision is, it should be written down.

To be clear, build backends can still specify things however they want, if they choose to. I don’t believe the intention is to say “you must use these fields to specify metadata” (at least mine certainly is not), but more like “this is a way to write those metadata that you can reasonably expect other tools to understand”.

Yes, but in which case you should list the fields you want to generate in dynamic.

Tools are also expected to dynamically generate the metadata if the [project] table is completely missing. (This is not in the PEP, but probably should be included for clarity.)
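
A minimal sketch of how a backend might enforce this, assuming `project` is the already-parsed [project] table as a dict (or None when the table is missing) and `generated` holds the values the backend computed itself. The function name and the exact error conditions are assumptions drawn from this discussion, not quoted from the PEP.

```python
def merge_metadata(project, generated):
    """Combine static [project] values with backend-generated ones.

    Assumed rules, based on the discussion above:
      * a missing table means everything is generated dynamically;
      * a field may not be both static and listed in `dynamic`;
      * the backend may only generate fields listed in `dynamic`.
    """
    if project is None:
        return dict(generated)

    dynamic = set(project.get("dynamic", []))
    static = {k: v for k, v in project.items() if k != "dynamic"}

    both = dynamic.intersection(static)
    if both:
        raise ValueError(f"fields both static and dynamic: {sorted(both)}")

    undeclared = set(generated) - dynamic
    if undeclared:
        raise ValueError(f"fields not listed in dynamic: {sorted(undeclared)}")

    return {**static, **generated}
```

For the setuptools case raised earlier, the table would carry `dynamic = ["version"]` and the backend would supply `{"version": ...}` by whatever mechanism it documents.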

This was actually discussed extensively, and the conclusion was something more like “keep Authors and drop Maintainers, but make Authors semantically mean Maintainers” because tools in other ecosystems seem to suggest people don’t care about the difference anyway, and Authors is a more familiar term than Maintainers to many.

This was also discussed, but IIRC the consensus was to leave this out of this PEP because it should be discussed separately. I believe pypa/warehouse#5947 is the place for this.

To be fair, though, it is intended (at least I believe it is) that users can expect to write metadata in this form and not have to rewrite it when changing tools. That means there will be pressure on backends to support this format, even if only as one alternative.

Personally, I can see @bernatgabor’s point here. But we’re only saying how the user enters the metadata - the question of what metadata is allowed, and how it’s validated, is not part of this PEP. So there’s not that much innovation being blocked here (and dynamic acts as an escape hatch in any case).

Would be nice to address this in the PEP.

Definitely needs to be included then.

I think this is useful. Like 75% of the other tools, enscons also takes basically the setuptools setup() arguments in pyproject.toml, under its own [tool.enscons] table. These are converted directly into the METADATA file. (Is there a PEP-to-METADATA implementation for us to use?) The divergence from setuptools happens in the build system, which determines how files, not metadata, wind up in the package. Those two processes appear to be totally independent of each other.
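
As a rough illustration of what such a conversion could look like (not an existing implementation), here is a sketch that renders a few fields of an already-parsed table into METADATA-style headers using the standard library. The key names `description`, `requires-python` and `dependencies` are assumed to follow the [project] table naming, and only a handful of fields are handled.

```python
from email.message import Message


def render_metadata(project):
    """Render a small subset of a parsed [project] table as METADATA text.

    Sketch only: covers a few fields and does no validation.
    """
    msg = Message()
    msg["Metadata-Version"] = "2.1"
    msg["Name"] = project["name"]
    msg["Version"] = project["version"]
    if "description" in project:
        msg["Summary"] = project["description"]
    if "requires-python" in project:
        msg["Requires-Python"] = project["requires-python"]
    for dep in project.get("dependencies", []):
        # Assigning the same header repeatedly appends another line.
        msg["Requires-Dist"] = dep
    return msg.as_string()
```

Nothing here touches which files end up in the wheel, which matches the observation that metadata generation and the file-selection side of the build are independent.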

I can’t find such a case in the PEP.

I don’t think so. I’ll update the PEP.

This is covered by:

Tools MAY support alternative content-types which they can
transform to a content-type as supported by the `core metadata`_.

Everyone is going to have a different opinion as to what the most important reason for this PEP to exist is. I don’t think arguing about the order is worth it as long as the key motivations for everyone are somehow captured.

Yes, it would be a dynamic provide.

As for an example, it’s just specifying dynamic = ["version"] plus however setuptools chooses to let people specify the way to get the version, so I’m not sure what the benefit would be in tossing in such an example.

What the PEP is doing is what you’re suggesting, but standardizing on “author” instead of “maintainer”. I’ll call that out.

This is actually already a compromise of even allowing tools to do this as at least one person wants to just ditch all the trove classifiers we said to backfill. So we purposefully made it weak and underspecified as the assumption is the importance of the relevant classifiers will actually go away in the future.

That’s a PyPI question, and one I have a year-old issue about. :grin:

I’ve opened Clarify points brought up from public consultation by brettcannon · Pull Request #1465 · python/peps · GitHub to address the above and will merge it once one other co-author approves the PR.

@pradyunsg already fixed it: PEP 621: Cosmetic updates by pradyunsg · Pull Request #1459 · python/peps · GitHub


10 posts were split to a new topic: The author/maintainer distinction problem

Ah, I misread, I thought there was a readme.charset field.

Entries that I’m using in some of my projects’ setup.cfg that I didn’t see in the PEP:

  • package_data
  • data_files
  • platforms (not that I ever use anything other than any here)

I suspect platforms was discussed and it was decided that it wasn’t in use enough to be included, especially as it’s recommended to be used only when the platform isn’t in the (fairly comprehensive) trove classifiers. Perhaps that should be stated in the “rejected ideas” section. In any case, I think the build-backend has a pretty good idea as to which platforms support the packages it builds.

Both package_data and data_files are build options, which you can see are under [options] in setup.cfg. They’re specific to how the package is built, not to its metadata.


It actually wasn’t, but your reasoning is correct. No tools surveyed (except setuptools) support this field, as far as I know. The specification does not aim to cover and replace everything needed to generate Core Metadata, only those fields that are commonly used and have well-established static representations in the community. We can always add additional fields to the table if they become widespread and cause unnecessary migration overhead.

Please see “Specify files to include when building” heading under “Rejected Ideas”.

The fields you’ve mentioned don’t have a corresponding core metadata field, and are related to how the package is built - this is explicitly out of scope for this PEP.

I’d like to point out that, broadly speaking, I am in favor of this initiative (though I would guess I am the most skeptical of all the listed authors) — I agree with @bernatgabor’s point that this seems to be trying to go back to the bad old days of a single backend, but in practice there are just not that many ways to specify the package metadata (considering it all maps to the same static fields anyway), and this takes a decent amount of cognitive load off of backend designers.

That said, I want to point out a few concerns I have with the messaging around this PEP, because people are already hopelessly confused by the situation with pyproject.toml, and this looks likely to make it worse.

  1. People already think that pyproject.toml is some sort of replacement for setup.py and are constantly asking nonsensical questions like, “Should you use pyproject.toml or setup.py?” or “Can you achieve this with pyproject.toml?”. This proposal will make this 100x worse, because the word for “projects that use PEP 517/518 to specify build-system requirements” is still "pyproject.toml projects", and now you will also have metadata specified in pyproject.toml.

    It was already confusing enough to say, “The standardized pyproject.toml is about specifying information about your build system, and some build systems also keep their configuration there.” Now we have to go even more nuanced with, “There are two standardized tables in pyproject.toml, one is about specifying what build system you are using, and the other is a common format for package metadata that is used by different build systems. Many build tools also use pyproject.toml for their configuration about how to do the build (e.g. what to include in the package).”

  2. The Motivation section will confuse people even more, because it strongly implies that this is the solution to the problem that it’s difficult or impossible to get metadata without actually building the project, but this is not actually a good solution to that problem. In order for this to be a good solution to the problem, it would have to be widely adopted and most people would have to avoid any sort of dynamic metadata.

    An opt-in standard for the input to the core metadata files will make the easy path easy, but it won’t provide a more general solution as would be provided by standardizing the way the output of the “calculate core metadata files” gets included in sdists. People already don’t think terribly deeply about things like this, and I think the fact that the rhetoric in the Motivation section doesn’t match the actual PEP will make people even more confused about what this PEP does.

It is not encouraging that some of the only discussion of this I’m seeing on social media seems way off-the-mark in terms of what this will change: this post highlights that this “allows specifying multiple maintainers” (which implies that it actually changes something about the underlying metadata spec) and this post seems to think that this is a necessary step to allowing the use of pyproject.toml as “a complete alternative to setup.cfg/setup.py”.

Of course, I don’t know how to fix the problem of messaging, since Brett is pretty high-profile within the Python community and he’s been out there giving clear and concise explanations of this on podcasts and in blog posts, and it doesn’t seem to be penetrating (even among people like… tweeting about his clarifying posts). Still, it’s worth thinking about settling on a communication strategy and at least trying to address the ongoing problems we’re having communicating this information to the community.


Since @pganssle has expressed interest in discussing it further (and made an interesting suggestion for how we could tackle the Author/Maintainer situation), I’ve flagged this thread to ask the discuss.python.org moderators to split out the discussion on the Author/Maintainer fields into a dedicated thread.

@mods please keep this post in this thread, and I’ll edit it later to include the link to the new thread. :slight_smile:

Edit: The author/maintainer distinction problem it is!