PEP 621: Storing project metadata in pyproject.toml

It seems like a lot of information at the top of a file; can it be put at the end of the file as standard? In ‘normal’ writing you would put acknowledgements, references, etc. at the end to save clutter.

Hi, I finally got a chance to properly look over this PEP. I am a bit skeptical.

First of all, I agree it would be great to have a common standardized section to specify core metadata. I do, however, worry about the implications this will have due to the “escape hatch” mechanism.

My main worry is that external tools will start relying on this metadata to get information about the project, or something similar. That would not work well due to the dynamic mechanism. This might not be the intent of the PEP, but I think it is a very possible side effect.

Actually, there is one sentence that seems to recommend this.

Finally, this PEP is meant for (…) those doing analysis on a source checkout.

But I am not sure my interpretation is correct.

There is no way you can rely on this metadata. Any mechanism that makes use of it will just implement heuristics, which obviously cannot be relied on and will turn out to be a long-term issue (to clarify: it might work well in the short term, with a small sample size, but as you scale it, the bad design will start to show). I think this will be a very attractive option to people getting started designing their tools, and that’s what worries me. We can’t police every tool and let authors know this is a bad idea.

Although I agree it would be good to have a standardized place for the core metadata, I do not see a need for it. And I am not sure it is worth the possible issues it introduces.

A possible way to mitigate the metadata being misused by external tools would be to introduce PEP 517-like hooks to fetch it. This would also allow the backend to do all the required normalizations; e.g. right now the only field you can rely on is name, and it needs to be normalized as per PEP 503.
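For reference, the PEP 503 normalization mentioned above fits in a single regular expression: runs of `-`, `_` and `.` collapse to one `-`, and the result is lowercased. A minimal Python sketch (the helper name `normalize_name` and the sample project name are illustrative only):

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    # Collapse every run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("Friendly.Bard_Project"))  # friendly-bard-project
```

This is the only transformation a tool reading `name` straight from `pyproject.toml` would need before comparing it with names from an index.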

My goal with this reply is not to oppose the PEP but rather to question whether this is indeed the best step forward. I do not believe there is a right solution, at least not at this moment. This might work out well, or not; we will only see the results in the future. With all that said, I would advise you to be careful when proceeding and to make sure you think everything through.

You 100% can rely on the data. At least, in the sense that if a field is not mentioned in dynamic you know the final value, and if it is you know that only the backend can tell you. People writing tools that don’t take that into account are simply not following the spec, and yes, that may happen, but we shouldn’t worry too much about that.

PEP 517 already has this hook, it’s prepare_metadata_for_build_wheel. People wanting to reliably introspect metadata should use that - at least until there’s a standardised sdist format, after which introspecting the sdist is another option.
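As a rough illustration of the hook's contract: `prepare_metadata_for_build_wheel(metadata_directory, config_settings=None)` writes a `.dist-info` directory into `metadata_directory` and returns its basename, and the `METADATA` file inside uses email-header syntax. The sketch below calls the hook in-process on an already imported backend object. `metadata_via_hook` is a hypothetical name; a real frontend would instead install the build requirements into an isolated environment and run the hook in a subprocess (libraries such as `pyproject_hooks` exist for the subprocess part).

```python
import email.parser
import os
import tempfile
from pathlib import Path


def metadata_via_hook(backend, source_dir: str) -> dict:
    """Return core metadata as {header: [values]} using the PEP 517 hook.

    Simplified sketch: no isolated environment, no subprocess.
    """
    hook = getattr(backend, "prepare_metadata_for_build_wheel", None)
    if hook is None:
        # The hook is optional; a PEP 517 frontend would fall back to build_wheel.
        raise NotImplementedError("backend has no prepare_metadata_for_build_wheel")
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as metadata_dir:
        os.chdir(source_dir)  # hooks run with the source tree as working directory
        try:
            dist_info = hook(metadata_dir)  # basename of the .dist-info directory
        finally:
            os.chdir(old_cwd)
        text = (Path(metadata_dir) / dist_info / "METADATA").read_text(encoding="utf-8")
    message = email.parser.Parser().parsestr(text)
    # Headers such as Requires-Dist may repeat, so keep every value.
    return {key: message.get_all(key) for key in set(message.keys())}
```

Even in this stripped-down form, the cost discussed later in the thread is visible: the backend has to be importable, and its build requirements installed, before any metadata comes back.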

As we’ve tried to indicate in the rationale, this PEP is defining how users write the metadata, so that backends can read it in a uniform manner. We acknowledge that people might use the data for other reasons, but it’s not the core purpose of the spec. The rationale section was recently revised to try to make that clearer - I don’t know if you’re reading the latest version, but if not check to see if that addresses your concerns at all.

Yeah, of course, what I meant is that you cannot rely on the data being there.

Ah, right! I had forgotten that it could be used to achieve the same :sweat_smile:

Yes, I am looking at the latest version. I am not opposed to how the current text is written, but I would like it to be clearer that most fields can be dynamic. If someone does not read the PEP in its entirety, I think it’s easy for them to miss that. Again, the current text is fine; I would just like it to be a little clearer on this :smile:

This is structured how all PEPs are structured.

I tried to clarify this in the commit “PEP 621: clearly specify that metadata specified is static, but it's …” (python/peps@7e4d254 on GitHub). (I didn’t bother co-authors on this since it was just a clarification of something the PEP already said.)

Due to how long this topic has already gotten, I have started PEP 621: how to specify dependencies? to explicitly discuss the one open issue in the PEP.

Are you talking about the PEP, or the pyproject.toml file?

Thanks! This addresses my concerns :grin:

I just committed a change where we reintroduce maintainers so there can be a separate discussion and potential PEP to deal with what the true differences between Author and Maintainer are (if any).

I share the concerns other people have mentioned that this invites people to consume static metadata and ignore the possibility of it being dynamically generated. Yes, that’s technically against the spec, but given the broad confusion @pganssle pointed out, there’s an excellent chance people won’t know that. And it’s such an attractive shortcut - never mind about hooks, environment setup, installing build system dependencies; just read the information from a static file and you’re done!

So I can imagine that anything that 90% of projects specify statically becomes effectively mandatory as lots of smaller tools rely on it. Version would probably be safe from this because quite a lot of projects want to read it from somewhere else, but pretty much any other field could be affected.

I don’t know if this is necessarily bad overall. Many things would be easier with static metadata. But I think there are cases where you need to determine e.g. more specific runtime dependencies from the build step, and I don’t know how to ensure that when it’s probably only 0.1% of packages.

One meta-comment on the PEP structure: it’s using the deprecated style where it tries to use the PEP itself as a living specification.

Instead of doing that, the PEP should point to a PR that adds the proposed specification for tooling developers to use to https://packaging.python.org/specifications/, while the PEP focuses on the meta-commentary role of explaining not only what’s included in the specification (and why), but also what you’ve deliberately chosen to leave out of the specification.

Trying to have one document that both provides a readable specification for tooling developers and also provides the rationale for why that specification is the way it is for the benefit of reviewers forces trade-offs that we don’t actually need to make anymore.

Since I didn’t mention it earlier, I should note that I’m broadly in favour of this idea, but I share the concern raised by others that there are practical issues we need to work through to avoid encouraging the creation of artifact analysis tools that adopt an introspection approach that is simple, easy, and wrong.

I think the main way to tackle this would be for the PEP to explicitly allow build backends to mutate pyproject.toml when creating the sdist. I’m less worried about it for source directories, as there’s a simple self-selection process:

  • for use within a project, established projects simply won’t adopt tools that don’t support the metadata input format that they use, while fans of a particular tool are likely to be willing to adapt their metadata input practices to conform to its limitations
  • for broad analysis across multiple projects, tools already have to deal with all kinds of malformed input, so their authors aren’t likely to be tempted by attractive shortcuts when specs clearly spell out why the shortcut isn’t enough to cover the general case

In this initial iteration of the PEP, that could take the form of the following statement:

When build tools are constructing an sdist from a source directory they MUST delete the [project] table (if present) from pyproject.toml. A future PEP will cover a standardised mechanism that allows inclusion of static project metadata in an sdist when that metadata will be identical across all wheels and local package installations derived from the sdist.

As my current expectation is that any such future PEP would allow sdists to include metadata in a format that looks more like wheel and installation DB metadata, requiring build tools to delete the [project] table eliminates the potential for that table to become an attractive nuisance to authors of code that looks at sdists rather than source directories.

If we change our mind about that later, “build tools don’t need to delete the [project] table from pyproject.toml any more” is a much more manageable policy change than “ouch, there are all these already published sdists with confusing [project] tables that it’s now too late for us to do anything about”.

I don’t think it is; the PEP explicitly describes this use-case.

Finally, this PEP is meant for (…) those doing analysis on a source checkout.

Given this is a supported use-case, I think that with Brett’s changes it is now clear enough that the fields can be dynamic. So the question we need to ask here is whether this should be a supported use-case at all. I think maybe we should remove it, as it gives the wrong impression. Its usefulness is pretty low anyway: you can only rely on name being present.

name being required is already a trade-off; backends like mesonpep517 would probably just want to fetch this information from an external source. It is not a big deal, but it would be nice to have :smiley:. If we remove the requirement, we would support this use-case and mitigate the problem of people relying on static metadata for external tools.

People are gonna use this for static metadata either way; I think it’s better not to validate their use-case. It would still be possible, since the PEP cannot ban this practice, but the PEP can make it clear that it was not designed for it.

Exactly.

If we choose to keep this maybe we should add a reference to prepare_metadata_for_build_wheel, something like “this PEP can be used for this but what you probably really want to do is use prepare_metadata_for_build_wheel”.

Well, actually, sorry. I just re-read your reply and realized you were talking about tools explicitly ignoring that the metadata is dynamic. My reply is about the use-case of consuming static metadata.

I think the point still stands, but I am not sure if other people will agree.

Note that prepare_metadata_for_build_wheel is extremely high cost (you need to set up an environment with the specified build requirements, then call the backend taking care to do so in a subprocess). Even with a library to do this all for you, the runtime cost is substantial.

Yes, it’s the “official” way to get metadata, but we need to consider practical needs here. We need a standard for getting static metadata from a sdist, and until that’s available, people will have to make awkward trade-offs. And for source trees, people will always have to make those awkward trade-offs - we’re never likely to have a static metadata standard for source trees other than this one.

I’m coming more and more to the conclusion that what we should have done was standardise sdist metadata before working on this PEP. But hopefully the work on this PEP will make the sdist metadata PEP easier to deliver, so it’s not like anything was wasted.

Nope. What they really want is to use the sdist metadata. But without access to Guido’s time machine, they’ll have to make do for now :slightly_smiling_face:

This is only true for the first call… And in practice most build-requires are just setuptools/wheel, so most tools can reuse build environments when they are querying multiple projects one after another.
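The environment reuse described here amounts to caching build environments by their set of build requirements. A minimal sketch: `BuildEnvCache` and the `create_env` callback are hypothetical stand-ins for whatever actually creates an isolated environment, the cache key only ignores ordering and surrounding whitespace (a real tool might normalize requirements with the `packaging` library), and it assumes the hooks do not modify the shared environment.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class BuildEnvCache:
    """Create one build environment per distinct set of build requirements."""

    def __init__(self, create_env: Callable[[FrozenSet[str]], object]):
        self._create_env = create_env  # hypothetical: e.g. make a venv, install requirements
        self._envs: Dict[FrozenSet[str], object] = {}

    def get(self, requires: Iterable[str]) -> object:
        # Order and surrounding whitespace do not affect the cache key.
        key = frozenset(req.strip() for req in requires)
        if key not in self._envs:
            self._envs[key] = self._create_env(key)
        return self._envs[key]


created = []


def fake_create_env(requires):
    created.append(requires)
    return f"env-{len(created)}"


cache = BuildEnvCache(fake_create_env)
first = cache.get(["setuptools", "wheel"])
second = cache.get(["wheel", "setuptools"])  # reuses the first environment
third = cache.get(["flit_core>=3.2"])
print(first, second, third)  # env-1 env-1 env-2
```

With most projects declaring the same setuptools/wheel requirements, only the first query in a batch pays the environment-creation cost.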

I understand that, but as far as I can tell, standardizing a way to define metadata in sdists is something we want to do and is planned. Do you think it makes sense to present this (edit: this = tools reading static metadata) as an alternative in the PEP itself? The PEP will stay there forever; years from now, people will still be looking at it and basing their decisions on it. If we propose this as an alternative and validate the use-case in the PEP, people will still read it after we standardize sdist metadata and think it’s a reasonable approach.

I understand that prepare_metadata_for_build_wheel is not optimal for some situations, but it works and I would consider it to be an acceptable semi-temporary solution. If for some reason that is incompatible with people’s needs, they can still read the static metadata in pyproject.toml, but this wouldn’t be recommended in the PEP. IMO it would be reasonable to add a reference to prepare_metadata_for_build_wheel.

I don’t think my proposal was very clear, so I will put it in a simpler way. It might also be useful for people who don’t want to read the full backlog.

I propose two things:

  • Remove the suggestion that tools use the static metadata. I am not saying the PEP should prohibit this usage, but it should not suggest or reference it, as that is not its goal.
  • Optionally allow name to be dynamic. As far as I can tell, this field is only required because of the “tools reading static metadata” use-case. If that is dropped from the PEP this field can be removed, and we can just say that all metadata can be dynamic.
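The second proposal could look like this in a backend's validation step. `validate_project` and its `allow_dynamic_name` flag are hypothetical: with the flag off it models the current requirement that `name` be given statically, and with it on it models the proposal. The first check mirrors PEP 621's requirement that a field must not be both given statically and listed in `dynamic`.

```python
def validate_project(project: dict, allow_dynamic_name: bool = False) -> list[str]:
    """Return a list of problems found in a parsed [project] table."""
    problems = []
    dynamic = set(project.get("dynamic", []))
    # A field is either given statically or listed in dynamic, never both.
    for field in sorted(dynamic.intersection(project)):
        problems.append(f"{field!r} is given statically and also listed in dynamic")
    if "name" in dynamic:
        if not allow_dynamic_name:
            problems.append("'name' must not be listed in dynamic")
    elif "name" not in project:
        problems.append("'name' is required")
    return problems


print(validate_project({"dynamic": ["name"]}))                           # ["'name' must not be listed in dynamic"]
print(validate_project({"dynamic": ["name"]}, allow_dynamic_name=True))  # []
```

Under the proposal, the only change for backends is that the `name` check becomes a choice rather than a hard rule.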

In practical terms, this would be basically the same as the current PEP minus the “tools can read static metadata” suggestion (can still be done, it is just not suggested or the intended use of the PEP).

Overall I think that would solve us quite a bit of trouble in the future.

We can fake that, though :wink: . This PEP isn’t accepted yet, so if someone has the capacity to start that discussion and drive that PEP I’m fine with putting this PEP on hold until an sdist PEP is accepted. If people are hoping I will drive an sdist PEP then they will quite possibly be waiting until August for that to even start based on other things going on in my open source life right now (those of you who know what I’m alluding to will understand, those who don’t should be glad that they don’t).

But do know that I will not be killing this PEP if we choose to tackle the sdist problem now; the pain point that this PEP is trying to solve does not go away with sdists being standardized.

No, I definitely wasn’t thinking that. I was mostly just noting (and maybe being a little sad about) the fact that in open source we get things in the order that people want to do them, which isn’t always necessarily the best order when looked at globally.

I don’t have the bandwidth myself to push a sdist metadata PEP at the moment, so while I might be glad if someone else did, I’m not going to presume :slightly_smiling_face:

To be honest, I was originally going to start work on an sdist PEP after PEP 621 was settled, and after a potential second packaging PEP that I have had preliminary discussions with someone about. So I actually already have an outline in my head of how I would standardize sdists, but I know for a fact that my ideas will be somewhat controversial.

That means if someone doesn’t want to see my opinions getting pushed at the speed of open source (and if you have listened to things I have said over the years you can piece together my thoughts on this topic), you have some time to try to beat me to a PEP. :wink: