PEP 621: Storing project metadata in pyproject.toml

I share the concerns other people have mentioned that this invites people to consume static metadata and ignore the possibility of it being dynamically generated. Yes, that’s technically against the spec, but given the broad confusion @pganssle pointed out, there’s an excellent chance people won’t know that. And it’s such an attractive shortcut - never mind about hooks, environment setup, installing build system dependencies; just read the information from a static file and you’re done!

So I can imagine that anything that 90% of projects specify statically becomes effectively mandatory as lots of smaller tools rely on it. Version would probably be safe from this because quite a lot of projects want to read it from somewhere else, but pretty much any other field could be affected.

I don’t know if this is necessarily bad overall. Many things would be easier with static metadata. But I think there are cases where you need to determine e.g. more specific runtime dependencies from the build step, and I don’t know how to ensure that when it’s probably only 0.1% of packages.


One meta-comment on the PEP structure: it’s using the deprecated style where it tries to use the PEP itself as a living specification.

Instead, the PEP should point to a PR that adds the proposed specification to https://packaging.python.org/specifications/ for tooling developers to use, while the PEP itself focuses on the meta-commentary role: explaining not only what’s included in the specification (and why), but also what you’ve deliberately chosen to leave out of it.

Trying to have one document that both provides a readable specification for tooling developers and also provides the rationale for why that specification is the way it is for the benefit of reviewers forces trade-offs that we don’t actually need to make anymore.


Since I didn’t mention it earlier, I should note that I’m broadly in favour of this idea, but I share the concern raised by others that there are practical issues we need to work through to avoid encouraging the creation of artifact analysis tools that adopt an introspection approach that is simple, easy, and wrong.

I think the main way to tackle this would be for the PEP to explicitly allow build backends to mutate pyproject.toml when creating the sdist. I’m less worried about it for source directories, as there’s a simple self-selection process:

  • for use within a project, established projects simply won’t adopt tools that don’t support the metadata input format that they use, while fans of a particular tool are likely to be willing to adapt their metadata input practices to conform to its limitations
  • for broad analysis across multiple projects, tools already have to deal with all kinds of malformed input, so their authors aren’t likely to be tempted by attractive shortcuts when specs clearly spell out why the shortcut isn’t enough to cover the general case

In this initial iteration of the PEP, that could take the form of the following statement:

When build tools are constructing an sdist from a source directory they MUST delete the [project] table (if present) from pyproject.toml. A future PEP will cover a standardised mechanism that allows inclusion of static project metadata in an sdist when that metadata will be identical across all wheels and local package installations derived from the sdist.

As my current expectation is that any such future PEP would allow sdists to include metadata in a format that looks more like wheel and installation DB metadata, requiring build tools to delete the [project] table eliminates the potential for that table to become an attractive nuisance to authors of code that looks at sdists rather than source directories.

If we change our mind about that later, “build tools don’t need to delete the [project] table from pyproject.toml any more” is a much more manageable policy change than “ouch, there are all these already published sdists with confusing [project] tables that it’s now too late for us to do anything about”.


I don’t think it is; the PEP explicitly describes this use-case.

Finally, this PEP is meant for (…) those doing analysis on a source checkout.

Given this is a supported use-case, I think with Brett’s changes it is now clear enough that the fields are dynamic. So the question we need to ask here is if this should be a supported use-case? I think maybe we should remove it as it gives the wrong impression. The usefulness of this is pretty low anyway, you can only rely on name being present.

name being required is already a tradeoff; backends like mesonpep517 would probably just want to fetch this information from an external source. It is not a big deal, but it would be nice to have :smiley:. If we remove the requirement, we support that use-case too, and we mitigate the problem of external tools relying on static metadata.

People are gonna use this for static metadata either way; I think it’s better not to validate their use-case. It would still be possible, since the PEP cannot ban this practice, but it can make clear that it was not designed for it.

Exactly.

If we choose to keep this maybe we should add a reference to prepare_metadata_for_build_wheel, something like “this PEP can be used for this but what you probably really want to do is use prepare_metadata_for_build_wheel”.


Well, actually, sorry. I just re-read your reply and realized you were talking about tools explicitly ignoring that the metadata is dynamic. My reply was about the use-case of consuming static metadata.

I think the point still stands, but I am not sure if other people will agree.

Note that prepare_metadata_for_build_wheel is extremely high cost (you need to set up an environment with the specified build requirements, then call the backend taking care to do so in a subprocess). Even with a library to do this all for you, the runtime cost is substantial.
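For anyone who hasn’t called the hook directly, here is a minimal sketch of just the final step. It assumes the build requirements are already installed in the current environment and, unlike a real frontend, calls the backend in-process rather than in a subprocess, so it leaves out most of the cost described above. `load_backend` and `prepare_metadata` are hypothetical names; the hook name and the `module:object` backend string format come from PEP 517.

```python
import importlib
import os
from pathlib import Path
from typing import Optional


def load_backend(build_backend: str):
    """Resolve a PEP 517 'module.path:object.path' string to a backend object."""
    module_path, _, object_path = build_backend.partition(":")
    backend = importlib.import_module(module_path)
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend


def prepare_metadata(build_backend: str, source_dir: str,
                     metadata_dir: str) -> Optional[Path]:
    """Call prepare_metadata_for_build_wheel and return the .dist-info path.

    Simplification: runs in-process and assumes build requirements are
    installed. Returns None if the backend does not provide the hook,
    which is optional; a frontend would then fall back to building a wheel.
    """
    backend = load_backend(build_backend)
    hook = getattr(backend, "prepare_metadata_for_build_wheel", None)
    if hook is None:
        return None
    previous = os.getcwd()
    os.chdir(source_dir)  # hooks are run with the source tree as the cwd
    try:
        dist_info = hook(metadata_dir)  # returns the directory's basename
    finally:
        os.chdir(previous)
    return Path(metadata_dir) / dist_info
```

Everything this sketch skips (creating an isolated environment, installing `build-system.requires`, spawning the subprocess) is where the runtime cost comes from.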

Yes, it’s the “official” way to get metadata, but we need to consider practical needs here. We need a standard for getting static metadata from a sdist, and until that’s available, people will have to make awkward trade-offs. And for source trees, people will always have to make those awkward trade-offs - we’re never likely to have a static metadata standard for source trees other than this one.

I’m coming more and more to the conclusion that what we should have done was standardise sdist metadata before working on this PEP. But hopefully the work on this PEP will make the sdist metadata PEP easier to deliver, so it’s not like anything was wasted.

Nope. What they really want is to use the sdist metadata. But without access to Guido’s time machine, they’ll have to make do for now :slightly_smiling_face:

This is only true for the first call… In practice most build-requires are just setuptools/wheel, so tools can mostly reuse build environments when they are querying multiple projects one after another.
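The reuse idea can be sketched as a cache keyed by the set of build requirements. This is an assumption-heavy illustration: `BuildEnvCache` and the `create_env` callback are hypothetical, and comparing requirement strings after stripping whitespace is a naive stand-in for proper requirement normalisation.

```python
from typing import Callable, Dict, FrozenSet, List


class BuildEnvCache:
    """Reuse one build environment per distinct set of build requirements.

    Hypothetical sketch: create_env stands in for whatever actually builds
    an environment and returns a handle to it (for example, a path).
    """

    def __init__(self, create_env: Callable[[FrozenSet[str]], str]):
        self._create_env = create_env
        self._envs: Dict[FrozenSet[str], str] = {}

    def get(self, requires: List[str]) -> str:
        # Naive key: order-insensitive, whitespace-stripped requirement strings.
        key = frozenset(req.strip() for req in requires)
        if key not in self._envs:
            self._envs[key] = self._create_env(key)  # the expensive step
        return self._envs[key]
```

With this, the setup cost is paid once per distinct requirement set rather than once per project, which only helps when a tool processes many projects in one run.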

I understand that, but as far as I can tell, standardizing a way to define metadata in sdists is something we want to do and is planned. Do you think it makes sense to present this (edit: this = tools reading static metadata) as an alternative in the PEP itself? The PEP is something that will stay there forever; years from now people will still be looking at it and basing their decisions on it. If we do propose this as an alternative and validate the use-case in the PEP, then after we standardize sdist metadata people will still read it and think it’s a reasonable approach.

I understand that prepare_metadata_for_build_wheel is not optimal for some situations, but it works and I would consider it to be an acceptable semi-temporary solution. If for some reason that is incompatible with people’s needs, they can still read the static metadata in pyproject.toml, but this wouldn’t be recommended in the PEP. IMO it would be reasonable to add a reference to prepare_metadata_for_build_wheel.

I don’t think my proposal was very clear, so I will put it in a simpler way. It might also be useful for people who don’t want to read the full backlog.

I propose two things:

  • Remove the reference to tools using the static metadata. I am not saying the PEP should prohibit this usage, but it should not suggest or reference it, as that is not its goal.
  • Optionally allow name to be dynamic. As far as I can tell, this field is only required because of the “tools reading static metadata” use-case. If that use-case is dropped from the PEP, the requirement can be removed, and we can just say that all metadata can be dynamic.

In practical terms, this would be basically the same as the current PEP minus the “tools can read static metadata” suggestion (can still be done, it is just not suggested or the intended use of the PEP).

Overall I think that would save us quite a bit of trouble in the future.

We can fake that, though :wink: . This PEP isn’t accepted yet, so if someone has the capacity to start that discussion and drive that PEP I’m fine with putting this PEP on hold until an sdist PEP is accepted. If people are hoping I will drive an sdist PEP then they will quite possibly be waiting until August for that to even start based on other things going on in my open source life right now (those of you who know what I’m alluding to will understand, those who don’t should be glad that they don’t).

But do know that I will not be killing this PEP if we choose to tackle the sdist problem now; the pain point that this PEP is trying to solve does not go away with sdists being standardized.


No, I definitely wasn’t thinking that. I was mostly just noting (and maybe being a little sad about) the fact that in open source we get things in the order that people want to do them, which isn’t always necessarily the best order when looked at globally.

I don’t have the bandwidth myself to push a sdist metadata PEP at the moment, so while I might be glad if someone else did, I’m not going to presume :slightly_smiling_face:


To be honest, I was originally going to start work on an sdist PEP after PEP 621 was settled, along with a potential second packaging PEP that I have had preliminary discussions with someone about. So I actually already have an outline in my head of how I would standardize sdists, but I know for a fact my ideas will be somewhat controversial.

That means if someone doesn’t want to see my opinions getting pushed at the speed of open source (and if you have listened to things I have said over the years you can piece together my thoughts on this topic), you have some time to try to beat me to a PEP. :wink:

+1 on the PEP. It is an answer to the question “why are flit, poetry, others using tool-specific sections to write info in the same format”, and has all of these pros:

  • does not try to change author/maintainer meaning from current specs
  • does not favor one build backend, or prevent new ones from being invented
  • uses the string DSL for requirements defined in previous PEPs and already used by tools
  • does not change sdists

The primary goal is to bring a common format to build backends, and the PEP can also enable tools that do static analysis of source directories to find some or all metadata info from a unique file and format, without changing tools that inspect distributions.

The scope is narrow and it’s a consolidation for build backends awaited by their users. Good work!


I’m in the process of implementing this for Hatch.

For “Improving license clarity with better package metadata”, what do we imagine the 3rd/final license key being called? Or would license be a string rather than a table at that point?

License will be a string (with an expression syntax). I am not sure we have an agreement on making yet another new key, but I need to check the latest and submit the PEP for that :slight_smile:

I’m new to Python PEPs but I really like this idea. How long can it take until it’s approved/rejected?

Years, although in this PEP’s case I suspect we will have a decision by the end of the year because I won’t let it drag on. :grin:

We could propose a PEP to reduce PEP discussion time :exploding_head:

Yea, I think that’s the intended course – making it simple to have the “best” format for license declaration.

IMO the only reason that intent is not written/expressed in the PEP is because the licenses PEP isn’t submitted to reference it. :slight_smile:

Here it is: https://www.python.org/dev/peps/pep-0639/

Future discussions should go to PEP 621: round 3.