PEP 621: round 3

A separate reply because it’s at least partially unrelated to my other criticism:

Assuming we indeed want to specify that some fields must be static in the source distribution (whether through the idea I criticized in my earlier post where backends modify pyproject.toml or by requiring them to be static at input), we should list those fields. I don’t know which fields PyPI exposes offhand, and if it ever changes in the future that would create uncertainty about whether the rule applies to a potentially evolving set of fields, or just the ones that were exposed in PyPI at the time that the PEP was passed.

1 Like

I don’t think it’s a “backdoor” thing as the PEP now says it outright. Plus @pf_moore specifically likes/wants it. :grin:

Potentially, yes. If that’s not clear I’m happy to try and clarify the wording if you have suggestions.

But we don’t have that in sdists right now either. This is one of those things where I think getting consensus may be hard since everyone is going to have a personal preference with no outright winner, especially as there isn’t clear precedent for metadata in sdists.

Why is that wrong specifically? If you look at the fields listed as being static, you will see they are either already static in order to make the sdist file name, or are static because they need to be communicated to PyPI when uploading:

  1. name is necessary for the file name
  2. version is necessary for the file name
  3. description goes to PyPI
  4. readme goes to PyPI
  5. requires-python goes to PyPI
  6. license being dynamic seems rather odd for an sdist since you wouldn’t know what the license is otherwise
  7. authors and maintainers go to PyPI
  8. keywords goes to PyPI
  9. classifiers goes to PyPI
  10. urls goes to PyPI

Those are the only fields with guidance on what to place in sdists, and tools already have this information when making an sdist anyway, for one of the two reasons mentioned above.
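
As a rough illustration of the rule being discussed, a tool inspecting an sdist could check that none of the fields listed above are still marked dynamic. This is only a sketch: `project` is assumed to be the already-parsed `[project]` table as a plain dict, and the function name is made up for the example.

```python
# Fields the list above describes as static in an sdist's pyproject.toml.
STATIC_IN_SDIST = frozenset({
    "name", "version", "description", "readme", "requires-python",
    "license", "authors", "maintainers", "keywords", "classifiers", "urls",
})


def still_dynamic(project):
    """Return the listed fields that the table still marks as dynamic."""
    return sorted(STATIC_IN_SDIST & set(project.get("dynamic", [])))


# Hypothetical tables: one as written by the user, one as found in an sdist.
source_tree = {"name": "spam", "dynamic": ["version", "dependencies"]}
sdist = {"name": "spam", "version": "1.0", "dynamic": ["dependencies"]}

print(still_dynamic(source_tree))  # ['version']
print(still_dynamic(sdist))        # []
```

Note that `dependencies` is not in the list, so it can stay dynamic in the sdist without tripping the check.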

So is the concern more about having sdist build tools write or update pyproject.toml, or am I missing something more subtle about having these specific fields be static in an sdist?

If other people object we can entertain doing that, but I’m still going to push for the same thing and we have not exactly made headway on the sdist conversations, so I’m not sure what pushing that off does. To me, an sdist is a post-source artifact, not a pre-wheel artifact, and that suggests to me keeping the metadata in the more human-readable format. It also means wheel-building tools potentially have a little less to do, because more of the data that tools already calculated once is provided upfront.

Just the ones that are in the PEP. Uploading any more details would mean new core metadata fields anyway which is a whole standards process on its own.

1 Like

Note in particular that even if we do ultimately standardise a different file (say METADATA) as the sdist metadata file, it will still be required that pyproject.toml and that file will have to contain the same data, as there’s no way it makes sense to have two files with different data. So what’s the harm in fixing the data now rather than later? As @brettcannon says, it’s all data that has to be known by the time we write a sdist anyway, so we’re not blocking any sort of actual flexibility, we’re just recording data that is otherwise currently getting lost in a non-standardised location.

And to go back to the point about this being a “backdoor”, I certainly don’t intend it to be unclear that we expect pyproject.toml in sdists to contain all of the data that is fixed, not just whatever the user specified in the original source tree. On the contrary, I consider that with its previous wording, which basically said that the metadata shouldn’t be considered canonical, the PEP didn’t offer sufficient benefits to be worth standardising, so I would have rejected it in that form anyway.

1 Like

I don’t strongly object to the changes, but agree with Paul that this was extremely unexpected.

My concerns:

  1. We should try to immediately get TOML support in the standard library. Until then, we should recommend a library so installs don’t pull in more than one of toml, tomlkit, etc.

  2. We should be much more explicit regarding the new sdist requirements. For example, backends would have to remove the dynamic field after each entry is made static, right?

  3. That is contrary to what most people say here: Purpose of an sdist

    Question: does pip treat an sdist as something that is installed directly or does it build a wheel from it first? It has been a while since I looked at the code.

1 Like

All the more reason to not go from a PEP where we have consensus (how to specify metadata for build backends) to a PEP where we are very far from consensus (how to standardize sdists).

Well, for one thing, I really don’t like that we’re requiring backends to not only read but also emit TOML files now. Must we preserve comments and whitespace? Is there a facility for doing such a thing?

We’re also conflating inputs (pyproject.toml) with outputs (METADATA), and creating two different ways to specify the static metadata.

And this is and must be opt-in, so it’s not even a standardized place to look for metadata!

If we want standardized sdists, we should work on the PEP for standardized sdists. Many of our discussions to this point would have gone dramatically differently, in my opinion, if we had been designing a standardized place for build backends to store metadata.

Fine, we should reject the PEP then. It’s a bad design for a standardized mechanism for storing static metadata, and if the benefits as an input system aren’t good enough for it to be worth standardizing, we should just reject it.

2 Likes


To reply to your points:

  1. That’s a separate issue. TOML has been used in packaging since PEP 518, so there’s nothing new here.
  2. The rules don’t change. Removing a field from dynamic is how you make it static. I don’t understand what you think is implicit here?
  3. I don’t really understand “post-source” vs “pre-wheel”. As far as I am concerned, sdists are built artifacts (created by backends from source trees) that are used to build wheels. The standards offer no way to install a sdist directly, so the only standards-compliant way of installing from a sdist is via a wheel. Pip does have legacy code that uses setuptools-specific mechanisms to install sdists directly, but (a) that’s backend-specific for setuptools, and (b) we are phasing that out.
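
To make point 2 concrete, here is a minimal sketch of what "removing a field from dynamic" could look like inside a backend. The function name and the `computed` mapping are assumptions for illustration, and dropping the `dynamic` key entirely once it is empty is a choice made in this sketch, not something the discussion settles.

```python
def make_static(project, computed):
    """Return a copy of the table with computed values filled in.

    `computed` maps field names to values the backend worked out while
    building the sdist. Fields not present in `computed` stay dynamic.
    """
    result = dict(project)
    remaining = []
    for field in project.get("dynamic", []):
        if field in computed:
            result[field] = computed[field]
        else:
            remaining.append(field)
    if remaining:
        result["dynamic"] = remaining
    else:
        # Nothing left to compute later, so the key is dropped (sketch choice).
        result.pop("dynamic", None)
    return result


source = {"name": "spam", "dynamic": ["version", "dependencies"]}
in_sdist = make_static(source, {"version": "1.0"})
print(in_sdist)
# {'name': 'spam', 'dynamic': ['dependencies'], 'version': '1.0'}
```

The input table is left unmodified, which keeps the user-written data and the sdist data clearly separate in the backend's own code.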

That’s a fair comment - this is a new process. But I’m not sure it’s as big an issue as you’re suggesting. I’m happy to debate the implications, but I don’t see this as an immediate showstopper.

If the consensus is that the changes @brettcannon has made are unacceptable, and we go back to the previous version, then yes, the PEP will get rejected. But I don’t think we have consensus yet.

1 Like

I agree with @pganssle that it feels wrong to require the backend to modify pyproject.toml. I’m not against the backend needing to fill out metadata (I think it is a good idea for the reasons mentioned above, e.g. avoiding inconsistencies between METADATA and pyproject.toml), but TOML is not a particularly good format to modify in-place for later user consumption, at least with tools currently available in Python. Existing libraries tend to discard user formatting (the best available is TOMLKit, but it has many other problems). This would be problematic since pyproject.toml is ultimately a user-facing file, and users would be unhappy if they crack open an sdist for development and find pyproject.toml garbled.

Would it be better if, instead of back-filling the information in-place, we make backends fill it into a separate table? This can be easier to do without needing to rewrite hand-written data (e.g. serialise the table separately and append the string at the end of the file).
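
A minimal sketch of the append idea, to show why it sidesteps the formatting problem. Everything here is hypothetical: the table name `tool.example-backend.resolved` is invented, and the serialiser only handles plain strings and lists of strings by leaning on `json.dumps`, whose output for ordinary text also reads as a TOML basic string or array.

```python
import json


def render_table(name, values):
    """Serialise a flat table of strings and string lists as TOML text."""
    lines = [f"[{name}]"]
    for key, value in values.items():
        # json.dumps output doubles as TOML here for ordinary text;
        # this is not a general-purpose TOML writer.
        lines.append(f"{key} = {json.dumps(value, ensure_ascii=False)}")
    return "\n".join(lines) + "\n"


def append_table(original_text, name, values):
    """Append the generated table, leaving the original text untouched."""
    separator = "" if original_text.endswith("\n") else "\n"
    return original_text + separator + "\n" + render_table(name, values)


text = '[project]\nname = "spam"  # hand-written comment\ndynamic = ["version"]\n'
updated = append_table(text, "tool.example-backend.resolved", {"version": "1.0"})
print(updated)
```

The user's comments and whitespace survive because the original text is never parsed and re-emitted. The sketch assumes the appended table does not already exist in the file; defining the same table twice would make the TOML invalid.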

1 Like

If we were to split out the "backends must update pyproject.toml" question for a moment and just focus on the question of what the proposal states must be canonical and fixed in a sdist, then is there any problem with the list? Because we’re going to get blocked on the “sdist standardisation” debate again if we’re not careful here, and I’d like to avoid that. On the other hand, one of my biggest complaints with the original PEP 621 is that it specified so little as required that it was essentially of no use to general consumers.

Another option would be to make all of the fields (with the exception of version) that round 3 states are required post-sdist not allowed to be dynamic in all situations. Version is a special case, and sdist consumers get that from the filename anyway, so I’m OK with treating that differently.

1 Like

Attempting to standardise sdists should trigger that debate :slight_smile:

If this proposal needs sdists to contain certain metadata beyond what is provided by the original source repository, then it’s trying to standardise sdists.

2 Likes

This is exactly what has happened with the expansion of the PEP into a mechanism for standardizing the metadata in an sdist. We in fact had basically this same discussion here.

I think this idea is too much in that it changes the nature of pyproject.toml in an undesirable way and it is not enough, in that it’s not going to be a very effective way of speeding anything up or improving the way tools can scrape metadata from the ecosystem. Even if we pass this as is, today, I don’t think a significant fraction of the ecosystem will have adopted it within 2 years. I imagine adoption will be particularly slow among big and fundamental projects that everyone depends on — which are likely to be more conservative with their build systems for various reasons. If we were to go with a “standardize the status quo” approach, we could upgrade huge swathes of the ecosystem overnight by implementing standardized sdists in setuptools.

Also, to circle back to the rejection reason:

To be clear, my suggestion was that PEP 621 data is canonical, when specified. The difference would be that pyproject.toml would remain purely an input file, and tools that need to access dynamic fields from pyproject.toml would still need to resolve them by a call to prepare_metadata_for_build_wheel.

Is this something other backend authors agree on? I think the main benefit of something like this is that you can have a unified tutorial for how to specify a lot of what goes into your package, but I suppose it’s fair to say that since this doesn’t specify enough to actually build a package (doesn’t specify the contents), then it wouldn’t generate terribly useful tutorials (other than possibly a nice “how to specify dependencies” tutorial, which could have a decent amount of meat in it). The other benefit is that it allows people to write a standards-compliant PEP 621 → METADATA library for re-use by backend authors (though of course someone can just write one anyway and then “whatever that library does” becomes the standard).

If we’re pivoting, I say we pivot to an informational PEP intended to give backend authors a ready-made description of how to design a way to specify metadata in pyproject.toml.

1 Like

I’m not sure what you’re referring to here? That the previous version of the PEP didn’t offer sufficient benefits¹? When I was talking about benefits I was referring to consumers other than backends. Obviously such consumers can’t assume PEP 621, but if it is there, I’d rather it contained as much information as possible, otherwise what’s the point? If you’re focusing on backend authors, then that’s a different perspective - what are you seeing as benefits? @sdispater seemed fairly clear that (now we have PEP 508 strings for dependencies) Poetry won’t be adopting PEP 621 in the near term, so I doubt Poetry is relevant. Flit already has a syntax quite similar to PEP 621, and nowhere except pyproject.toml to get data from, so the only real change there is the section name. So who apart from setuptools are you looking at for agreement?

Maybe I’m making the wrong assumptions about how setuptools would adopt this. I’m presuming that it will be an optional alternative to setup.cfg/setup.py for a long time yet, simply for compatibility reasons. So there’s not going to be much to push users to switch. So I expect a significant majority of user code won’t have PEP 621 metadata for a long time yet. If setuptools just copies the existing files over to the sdist, then that’s true for sdists too. If setuptools writes PEP 621 data into sdists, we get a significant, and much faster, migration of sdists to have reliable PEP 621 metadata right from the start. To me, that’s a massive benefit for people writing tools to introspect sdists.

What am I missing?

An informational PEP would require each backend to use its own [tool.xxx] namespace, and would just define the format. That has even less benefit for non-backend consumers. I’m inclined to simply say let @brettcannon post the proposal somewhere, then setuptools and flit can adopt it and no need for any sort of PEP. We can save the PEP process for when/if we want to use the reserved namespace and make it a formal standard rather than an informational one.

¹ Edit: On re-reading, maybe you’re asking whether the new wording in PEP 621 is something other backend authors agree on. In which case the answer is we’re still discussing it, so I don’t know yet. But in that case I’m certainly not just looking for opinions from backend authors, I’m asking all interested parties, including people wanting to introspect sdists.

2 Likes

Is there a downside to accepting the PEP as it was previously and drafting a new PEP for sdist 2.0? We need to agree on the name too anyway: PEP 625: File name of a Source Distribution

As someone closely following all of these threads for my Hatch rewrite, it’s super surprising that this PEP went from what appeared to be consensus to blocked.

2 Likes

I believe this answers my question — if the other backend authors who participated in the process don’t find PEP 621 useful, then it should be dead in the water.

Yes, that is the benefit I am talking about — if we standardize where metadata goes in an sdist, setuptools can do it quietly, which is why I was excited about standardizing sdist metadata. We should definitely have a conversation about the best way to standardize sdist metadata and come up with an approach that will work well so that we can realize this benefit.

I think I’m relatively convinced that we should withdraw PEP 621. IMO it will still be useful because it’s a pretty good design for a static metadata spec, and people can adopt it whether or not it is accepted.

1 Like

I find it very useful at least :slightly_frowning_face: . I’m sure setuptools and flit would adopt it shortly too.

As-is or under [tool.<NAME>]?

1 Like

It’s the backend authors who have to adapt, but other consumers (particularly PyPI) who get the benefit.

As a (part-time) backend author, I’d prefer the sdist standard to be based on the wheel metadata than the user-written metadata. That way I don’t need to deal with two different output formats (but I also deliberately designed my backend to treat sdist as just a partially in-place compiled source directory, including rewriting the pyproject.toml completely, so there’s not a lot that happens in the sdist->wheel step).

But provided I’m using a library to read/validate/write the pyproject.toml file, it’s no big deal. I’d rather not have to encode all of the transform logic between PEP 621 and METADATA though.

2 Likes

What is it you find useful about it, though? I was always dubious about the prospect that this is solving a major problem. Because this doesn’t specify how a package is built, it’s not like this makes your pyproject.toml file interoperable between different backends.

If you just adopt PEP 621 under tool.hatch or tool.hatch:project (to answer your question) you get basically all the benefits. We can even write a library that parses these things directly into some sort of intermediate object that is capable of writing METADATA files (you tell it what the root table is and it just works).
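
A rough sketch of the kind of library meant here, under stated assumptions: `document` is an already-parsed pyproject.toml as nested dicts, the root table is named by a dotted string whose keys contain no dots themselves, and only a handful of fields are handled. The function names are made up; the header names are the usual core metadata ones.

```python
def find_table(document, dotted_root):
    """Walk a parsed TOML document down to the table named by `dotted_root`."""
    table = document
    for part in dotted_root.split("."):
        table = table[part]
    return table


def to_metadata(table):
    """Render a few PEP 621-style fields as core metadata header lines."""
    lines = [
        "Metadata-Version: 2.1",
        f"Name: {table['name']}",
        f"Version: {table['version']}",
    ]
    if "description" in table:
        lines.append(f"Summary: {table['description']}")
    if "requires-python" in table:
        lines.append(f"Requires-Python: {table['requires-python']}")
    for requirement in table.get("dependencies", []):
        lines.append(f"Requires-Dist: {requirement}")
    return "\n".join(lines) + "\n"


# The same fields, stored under a tool-specific root instead of [project].
document = {
    "tool": {"hatch": {"name": "spam", "version": "1.0",
                       "dependencies": ["requests>=2"]}}
}
print(to_metadata(find_table(document, "tool.hatch")))
```

Passing `"project"` as the root would read the same fields from the standard table, which is the sense in which the transform logic does not care where the data lives.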

If we assume that, because it only covers metadata, the benefits for documentation, project templating and backend switching are marginal (fair), it seems that the only thing backends would be getting out of this would be the ability to put their metadata in the [project] table rather than a tool-specific table, which is not such a big deal.

There is one place where I could see the idea of standardizing metadata being useful, though, which is for tools that seek to scrape metadata directly from repositories rather than source distributions. E.g. dependabot or libraries.io or whatever. Even in a world where source distributions are standardized, that would allow tools like that to avoid unnecessary sdist builds when the project they are analyzing uses PEP 621. Without PEP 621, such tools would either need to always execute builds or just special-case tool.setuptools (and maybe add a parser for poetry and maybe flit), and not be compatible with more marginal backends.

I don’t know that we ever got any input from someone building tools like this, and I don’t know how much of an important use case they are.

I mostly like:

  1. the familiarity of fields granted by the future network effect of widespread use, allowing for a mostly interoperable pyproject.toml file. similar to [requests|httpx].[get|post|...]
  2. quite simply, I like using standards. it reduces uncertainty and avoids having to re-invent the wheel, whether that be naming, implementation, etc.
  3. as you mention, a default place to look for dependency scanners

1 Like

ahem :slightly_smiling_face:

I build adhoc tools like this a lot. But if you want the really big potential consumer of this data for me, pip is probably the one to look at. What pip most needs from a sdist is name and version (which we get from the filename) and dependencies. I imagine a bunch of dependency data will be deferred to wheel build time, but we could get a lot of benefit from reliable dependency metadata in sdists.

When resolving an install request, pip downloads distributions from the package index to get dependency information. That’s costly and it is a major performance hit for pip’s new resolver. For wheels, all we have to do is download and extract the metadata. For sdists, we have to download, unpack, set up an isolated build environment, and call the PEP 517 metadata hook. If we had reliable PEP 621 metadata in the sdist, we could check for non-dynamic dependency data, and if it’s there, bypass all of that. (And to be clear, the build cost isn’t something that’s going to get paid anyway: it’s quite possible we will discard a sdist because it doesn’t lead to a valid result.)
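
The shortcut described above could look roughly like this. It is a sketch, not pip's code: `project` is assumed to be the parsed `[project]` table from the sdist's pyproject.toml (or `None` if there is none), the function name is invented, and a missing, non-dynamic `dependencies` key is treated as "no dependencies".

```python
def static_dependencies(project):
    """Return the dependency list if it is static, else None.

    None means the caller has to fall back to the slow path, for example
    building metadata via the PEP 517 metadata hook.
    """
    if project is None:
        return None  # no [project] table, nothing to rely on
    if "dependencies" in project.get("dynamic", []):
        return None  # the backend will compute dependencies later
    return list(project.get("dependencies", []))


static = {"name": "spam", "version": "1.0", "dependencies": ["requests>=2"]}
deferred = {"name": "spam", "version": "1.0", "dynamic": ["dependencies"]}

print(static_dependencies(static))    # ['requests>=2']
print(static_dependencies(deferred))  # None
```

Only the `None` case would pay for unpacking, an isolated build environment, and the hook call.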

So I’m very much arguing that consumers are important from real-world requirements. And while I’d just as happily take a “standard sdist metadata” PEP, I’m frankly sick of getting bogged down in debate on that. The advantage of PEP 621 here is:

  1. It has a mechanism for saying “to be calculated later”.
  2. It defaults to static data - dynamic is explicitly opt-in.
  3. There’s no debate over the name of the flipping file, it’s pyproject.toml.

But if backends know the data, and don’t put it anywhere because we’re waiting for the mythical “standard sdist format”, I can’t use it in pip or anywhere else unless the user migrates their project to PEP 621. And waiting for users to adopt the new standard is probably just as slow as waiting for sdist standardisation.

So yes, in its new form, PEP 621 offers a significant benefit for pip. Probably other consumers as well, but I can’t speak for them. There’s a cost for backends in that they have to update pyproject.toml, but honestly I don’t think that’s a big chore.

I apologise if I assumed incorrectly that people realised the above. I’m pretty certain pip’s use case was presented during the initial PEP 621 discussions, but got dismissed for the same “we’re not standardising sdists” reasons. That’s one of the reasons I became less interested in PEP 621 - it was clearly only being targeted at being a common input format for backends. I was never involved as a backend developer, but as a consumer developer (IIRC, @pradyunsg and @uranusjr were in that category too¹). Fair enough, but we’ve now essentially failed to make it a common input format, at least to the extent that Poetry have said they are unlikely to adopt it for some time. So I was left wondering what’s left. When I realised that we could possibly bring back the benefits for pip, I suggested that to @brettcannon and he was willing to give that a go.

But without being a cross-backend format, and without being usable for tools that want to introspect source distributions, I’m not sure there’s enough left in PEP 621 to warrant standardising (as opposed to just the backends that want to have a common format getting together and agreeing one).

¹ And wasn’t @di involved for the Warehouse side? Surely Warehouse would be another case that would benefit from being able to read pyproject.toml for metadata known to be fixed across all distribution files for a project?

3 Likes

I don’t have much of a say on this, but I think we should be pragmatic here and just accept the current version to avoid further stagnation. It’s frankly a huge net win for backends and consumers alike.

Also, after a few years of gradual adoption by tools + projects, we’ll actually unlock the ability to accurately resolve dependencies for arbitrary platforms/environments!

2 Likes

That is not what I’m talking about. Consumers like pip need sdists with reliable metadata. I was talking about tools that parse repositories rather than any result of a build process.

There is no question that standardized sdist metadata would be useful for any number of use cases.

You can’t avoid the debate on it just because you repurposed a PEP that had broad agreement for a totally different purpose.

There’s a huge amount of debate over the name of the file, because pyproject.toml is a human-written file that changes the semantics of builds! It’s also a second core metadata spec when we already have a core metadata spec!

Slower. Which is why I said we should focus on sdist standardization if that’s what we want. It’s what I said when PEP 621 discussions started and you said, “it will take too long”, and now the result is that we designed something totally unsuitable as a standard store of sdist metadata because we never set out to design a store of sdist metadata.

Again, to be clear, no one is questioning the benefits of having a standardized store of metadata in sdists. I have already assumed that PEP 621 is not that, and was asking about people reading pyproject.toml in repositories. They would plausibly still get some value out of this even without the new pyproject.toml-rewriting scheme.

1 Like