PEP 621: round 3

ahem :slightly_smiling_face:

I build adhoc tools like this a lot. But if you want the really big potential consumer of this data for me, pip is probably the one to look at. What pip most needs from a sdist is name and version (which we get from the filename) and dependencies. I imagine a bunch of dependency data will be deferred to wheel build time, but we could get a lot of benefit from reliable dependency metadata in sdists.

When resolving an install request, pip downloads distributions from the package index to get dependency information. That’s costly and it forms a major performance hit for pip’s new resolver. For wheels, all we have to do is download and extract the metadata. For sdists, we have to download, unpack, set up an isolated build environment, and call the PEP 517 metadata hook. If we had reliable PEP 621 metadata in the sdist, we could check for non-dynamic dependency data, and if it’s there, bypass all of that. (And to be clear, the build cost isn’t something that’s going to get paid anyway, it’s quite possible we will discard a sdist because it doesn’t lead to a valid result).

So I’m very much arguing, based on real-world requirements, that consumers are important. And while I’d just as happily take a “standard sdist metadata” PEP, I’m frankly sick of getting bogged down in debate on that. The advantage of PEP 621 here is:

  1. It has a mechanism for saying “to be calculated later”.
  2. It defaults to static data - dynamic is explicitly opt-in.
  3. There’s no debate over the name of the flipping file, it’s pyproject.toml.

But if backends know the data, and don’t put it anywhere because we’re waiting for the mythical “standard sdist format”, I can’t use it in pip or anywhere else unless the user migrates their project to PEP 621. And waiting for users to adopt the new standard is probably just as slow as waiting for sdist standardisation.

So yes, in its new form, PEP 621 offers a significant benefit for pip. Probably other consumers as well, but I can’t speak for them. There’s a cost for backends in that they have to update pyproject.toml, but honestly I don’t think that’s a big chore.

I apologise if I assumed incorrectly that people realised the above. I’m pretty certain pip’s use case was presented during the initial PEP 621 discussions, but got dismissed for the same “we’re not standardising sdists” reasons. That’s one of the reasons I became less interested in PEP 621 - it was clearly only being targeted at being a common input format for backends. I was never involved as a backend developer, but as a consumer developer (IIRC, @pradyunsg and @uranusjr were in that category too¹). Fair enough, but we’ve now essentially failed to make it a common input format, at least to the extent that Poetry have said they are unlikely to adopt it for some time. So I was left wondering what’s left. When I realised that we could possibly bring back the benefits for pip, I suggested that to @brettcannon and he was willing to give that a go.

But without being a cross-backend format, and without being usable for tools that want to introspect source distributions, I’m not sure there’s enough left in PEP 621 to warrant standardising (as opposed to just the backends that want to have a common format getting together and agreeing one).

¹ And wasn’t @di involved for the Warehouse side? Surely Warehouse would be another case that would benefit from being able to read pyproject.toml for metadata known to be fixed across all distribution files for a project?

I don’t have much of a say on this, but I think we should be pragmatic here and just accept the current version to avoid further stagnation. It’s frankly a huge net win for backends and consumers alike.

Also, after a few years of gradual adoption by tools + projects, we’ll actually unlock the ability to accurately resolve dependencies for arbitrary platforms/environments!

That is not what I’m talking about. Consumers like pip need sdists with reliable metadata. I was talking about tools that parse repositories rather than any result of a build process.

There is no question that standardized sdist metadata would be useful for any number of use cases.

You can’t avoid the debate on it just because you repurposed a PEP that had broad agreement for a totally different purpose.

There’s a huge amount of debate over the name of the file, because pyproject.toml is a human-written file that changes the semantics of builds! It’s also a second core metadata spec when we already have a core metadata spec!

Slower. Which is why I said we should focus on sdist standardization if that’s what we want. It’s what I said when PEP 621 discussions started and you said, “it will take too long”, and now the result is that we designed something totally unsuitable as a standard store of sdist metadata because we never set out to design a store of sdist metadata.

Again, to be clear, no one is questioning the benefits of having a standardized store of metadata in sdists. I have already assumed that PEP 621 is not that, and was asking about people reading pyproject.toml in repositories. They would plausibly still get some value out of this even without the new pyproject.toml-rewriting scheme.

The current version where backends need to re-write pyproject.toml? Strong -1 on that. If that happens, please take my name off the PEP. Obviously I wouldn’t be implementing this myself, so I can’t say whether or not it would be acceptable to other setuptools maintainers.

I also find the requirement for backends to update a user-provided, user-facing file (pyproject.toml) awkward.

And I’m not convinced it would actually be that useful:
the most obvious field that comes to mind as affected by this “requirement” is version (with a setuptools-scm backend), but in the case of an sdist the version is also present in the filename, so the benefit for pip is marginal.

Outside of this updating part, I’m in favour of making the provided information canonical, mainly as a way to inspect dependencies and/or license information of projects.
Plus I like the idea of standardising this “boring” part instead of every backend painting the shed slightly differently.

OK, before I make a decision to reject my own PEP, I want to summarize what I’m hearing to see if there’s any agreement somewhere among people.

From pip/@pf_moore, the benefit of PEP 621 is the static sdist metadata. But then setuptools/@pganssle is saying they don’t want to do it this way, period.

From setuptools/@pganssle, there’s benefit to help getting more people to write static metadata as a back-end input source. But then pip/@pf_moore doesn’t really care since they only come into the situation at sdists or later. That’s not an outright “don’t you dare do it”, it’s just inconsequential to pip and so Paul just isn’t interested at that point.

Everyone seems to agree there is benefit to source checkout analysis. There also seems to be general agreement that a standardized way to specify how to write all of this stuff out for users is beneficial.

So here’s my blunt, to-the-point question to @pganssle: if I remove the sdist idea, what does PEP 621 need to make it good enough for setuptools to adopt it? If the answer is that there isn’t anything that could make it acceptable, then the PEP is dead. If there is something (e.g. all the project info fields like keywords are required or something), then we see if flit and pipenv are on board as well and if one of them signs up I think that’s enough to keep the PEP (I’m also assuming Steve will get his library support :wink:). We might need Paul to still be the delegate on behalf of back-end developers and not pip, or we ask someone else to rule from the back-end perspective.

But I think this is it. Either I get setuptools buy-in or I’m rejecting PEP 621 and moving on.

I don’t think @pganssle here was saying setuptools is not interested in adopting PEP 621… just that he himself is not interested in doing so. It would be a major endeavor, though, to migrate from setup.py and setup.cfg, and given the dynamic nature of setup.py I’m not even sure how much of it could be automated :man_shrugging: That being said, setuptools doesn’t have too many maintainers nowadays, so I feel that if pip cares about making the change, they should probably contribute (at least the initial) PR.

I wasn’t expecting pipenv to get mentioned in the discussion, since pipenv is not currently involved in producing distributable Python packages, and PEP 621 as it currently stands is designed specifically for defining metadata for a distribution. So to make sure there’s no misunderstanding here: in what form would pipenv be expected to adopt the format, if we want to?

I’m fairly certain that was just a typo and was intended to say poetry.

I don’t want to unilaterally act on setuptools’ behalf (particularly since, especially lately, @jaraco has done something between the lion’s share and all of the work), but to the extent that my opinion is what sways things, I plan to advocate for implementing the configuration file spec from PEP 621 regardless of whether or not it’s accepted (modulo table name). I think it’s a Good Enough Design™ (especially considering that it was designed by committee…).

The only reason I’m suddenly down on this PEP is that the latest iteration changed it into an sdist standardization PEP.

If we remove the sdist standardization stuff, I can see some marginal benefits of PEP 621 being accepted, but I can also see Paul’s point that standardizing the input is not that important, since the interop benefits are kinda weak, and without the interop benefits, it doesn’t need to be a standard, since anyone interested can just adopt it or not.

It wasn’t a typo, it was me misremembering that pipenv doesn’t produce any binary artifacts.

Thanks, and I understand you can’t necessarily unilaterally speak for setuptools (unless @jaraco also chimes in :wink:).

@takluyver what do you think of the PEP?

I would have thought one of the biggest reasons for standardizing the specification of a project’s metadata is the benefit to users, both project authors and library users. If in 5 years’ time 98% of all Python projects had the same way of specifying author/dependencies/entry-points/etc, then less Python-literate visitors to repos would have a better time analysing a prospective package to include in their application. Was this ever the case with setup.py?

98% (made up number :-)) of projects already have a standardised way of specifying metadata: it’s setuptools. (Apologies to flit and Poetry if that 2% I left you is too small :slightly_smiling_face:).

What’s useful about PEP 621 is that it’s an introspectable standardised way.

My argument is that introspection is more often done on sdists than on original source trees, so having PEP 621 data be reliable only in source trees is relatively unimportant. @pganssle’s argument is that making PEP 621 data be as complete as possible in sdists subsumes sdist metadata standardisation.

If a file is considered reliable in source trees, why wouldn’t this same un-updated file also be reliable in an sdist?

To avoid derailing PEP 621, I’ve made a proposal here for taking the idea of dynamic and adding it to the core metadata for use in sdists.

If we can agree on that, I’m fine with abandoning the idea of backends writing to pyproject.toml. (If we can’t agree on that proposal, I may have to abandon backends writing to pyproject.toml anyway, but there’s less risk of PEP 621 being caught in the fallout if we can deal with sdist metadata separately).
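Here is one way a consumer might use a `dynamic`-style marker carried in core metadata. This is a sketch that assumes a `Dynamic` header of the kind later standardised in PEP 643 (core metadata version 2.2). The helper name `field_is_reliable` is hypothetical.

```python
from email.parser import HeaderParser

PKG_INFO = """\
Metadata-Version: 2.2
Name: example
Version: 1.0
Summary: An example project
Dynamic: Requires-Dist
"""


def field_is_reliable(pkg_info_text, field):
    """True if `field` in sdist metadata is promised to match built wheels."""
    msg = HeaderParser().parsestr(pkg_info_text)
    major, minor = (int(p) for p in msg["Metadata-Version"].split(".")[:2])
    if (major, minor) < (2, 2):
        # Older metadata has no Dynamic field, so assume anything may change.
        return False
    dynamic = {v.strip().lower() for v in msg.get_all("Dynamic", [])}
    # Field names in core metadata are case-insensitive.
    return field.lower() not in dynamic
```

With this approach the sdist carries its own "static unless marked" promise, so nothing in pyproject.toml has to be rewritten.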

It will. But my point is that the amount of data that’s reliable will be too small to be helpful, if it’s just what the user specified in the source tree. I don’t expect to see a rush to switch from setup.py/setup.cfg to pyproject.toml, unless setuptools aggressively deprecates the older files, which I assume is unlikely. So most projects will remain all-dynamic for quite some time.

Hmm… I feel like we should allow for pip to reliably depend on the information in pyproject.toml and not mandate that the file be updated as part of the build process (I’d missed that in my initial reading of this).

This is possible by basically saying something like “if the metadata is specified in the pyproject.toml file, the build tools MUST fail if they would generate metadata that does not match that declaration (changed dependencies and whatnot). Other tooling may use the information in this table, as long as they respect the dynamic key and invoke the build tool for getting those values”.

That way, if pip sees the project.dependencies key in a pyproject.toml file, it can reliably depend on it for sdists (because the sdist’s generated metadata would be tied to that user input) and it doesn’t introduce the quirks that are making people react as they are reacting right now.

Compared to iteration 2, the difference would then be “sdist metadata MUST match metadata specified by the user in the project table. Otherwise, the build backend MUST raise an error” replacing the sentences that were to the effect of “this metadata is not for anyone except the build backend”.

(yea, forgive my fairly rough phrasing here)
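The proposed MUST-fail rule could look roughly like this inside a backend. This is only a sketch. `check_against_declaration` is a hypothetical name, and the code assumes the backend can express what it computed as a dict keyed by `[project]` field names. A real backend would also need to normalise values, such as requirement strings, before comparing them.

```python
def check_against_declaration(project, generated):
    """Raise if metadata a backend generated contradicts static [project] values.

    `generated` is a hypothetical mapping of [project]-style field names to
    the values the backend actually computed for this build.
    """
    dynamic = set(project.get("dynamic", []))
    problems = []
    for field, value in generated.items():
        if field in dynamic:
            continue  # the user deferred this field to the backend
        if field not in project:
            problems.append(f"{field}: not declared and not listed in dynamic")
        elif project[field] != value:
            problems.append(f"{field}: declared {project[field]!r}, generated {value!r}")
    if problems:
        raise ValueError("metadata does not match pyproject.toml: " + "; ".join(problems))
```

If a backend enforces this at build time, a consumer that reads a non-dynamic field from pyproject.toml gets the same answer the build would have produced.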


How does that sound to folks here?

My position (for clarity) is that I agree that we shouldn’t try to update this TOML file as a way to communicate sdist metadata. OTOH, until we have a proper mechanism that’s portable, it’ll be nice to allow tooling to depend on these static declarations made through this mechanism that we’re introducing here.

This also ties in nicely with @pf_moore’s proposal over at Sdists (again): Metadata standardisation incremental update because it’ll then allow the sdist metadata file itself to use the same model as this file around the dynamic key.

As I said, I think the reality is that (setuptools) users won’t migrate to pyproject.toml fast enough for it to be of any practical use. But if we can get somewhere with getting this into the sdist metadata instead then that becomes a non-issue.

Yeah, I think there’s no controversy about “anything specified in PEP 621 is canonical”. Without the part where pyproject.toml is an output of the build process, I think we’re all in agreement about all the other details of PEP 621 and the only question remaining is whether or not PEP 621 as an opt-in way for backends to accept metadata is worth standardizing at all.

I also agree with Paul that PEP 621 itself probably won’t have an appreciable impact on pip's ability to read static metadata for quite some time to come, whereas an output file should scale nicely and quickly.

Any users adopting PEP 621 (either — if it is accepted — in the project table or — if it is rejected — in a tool.backend table) would presumably make it easier for more of the sdist metadata to be reliable, and of course to the extent that people adopt it, it will make it easier for tools to scrape the data when available (if that’s the best way to get your stuff counted for vanity metrics it might actually spur some adoption of the format :sweat_smile:).
