PEP 621: round 3

ahem :slightly_smiling_face:

I build ad hoc tools like this a lot. But if you want the really big potential consumer of this data for me, pip is probably the one to look at. What pip most needs from an sdist is name and version (which we get from the filename) and dependencies. I imagine a bunch of dependency data will be deferred to wheel build time, but we could get a lot of benefit from reliable dependency metadata in sdists.
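To illustrate the "name and version from the filename" part, here is a rough sketch. The function name `split_sdist_filename` is made up for this example, and it assumes the common `name-version.tar.gz` (or `.zip`) layout with no hyphen inside the version; it is not how pip actually does it.

```python
def split_sdist_filename(filename: str) -> tuple[str, str]:
    """Split 'name-version.tar.gz' (or .zip) into (name, version).

    Hypothetical helper for illustration only.
    """
    for ext in (".tar.gz", ".zip"):
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"not a recognised sdist filename: {filename!r}")

    # Assumption: the version contains no hyphen, so the last hyphen
    # separates the project name from the version.
    name, sep, version = stem.rpartition("-")
    if not (name and sep and version):
        raise ValueError(f"cannot find name and version in {filename!r}")
    return name, version
```

Dependencies have no such shortcut, which is why they are the interesting case.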

When resolving an install request, pip downloads distributions from the package index to get dependency information. That's costly, and it's a major performance hit for pip's new resolver. For wheels, all we have to do is download and extract the metadata. For sdists, we have to download, unpack, set up an isolated build environment, and call the PEP 517 metadata hook. If we had reliable PEP 621 metadata in the sdist, we could check for non-dynamic dependency data, and if it's there, bypass all of that. (And to be clear, the build cost isn't something that's going to get paid anyway: it's quite possible we will discard an sdist because it doesn't lead to a valid result.)
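A minimal sketch of that check, assuming the `pyproject.toml` from the sdist has already been parsed into a mapping. The function name `static_dependencies` is hypothetical, and treating a missing `dependencies` key as "unknown" is a deliberately conservative assumption rather than anything the PEP text in this thread settles.

```python
from typing import Any, Mapping, Optional


def static_dependencies(pyproject: Mapping[str, Any]) -> Optional[list[str]]:
    """Return the declared dependencies if they are static, else None.

    None means "fall back to the build backend": there is no [project]
    table, `dependencies` is listed under `dynamic`, or the key is absent.
    """
    project = pyproject.get("project")
    if project is None:
        return None  # project has not adopted the [project] table

    if "dependencies" in project.get("dynamic", []):
        return None  # explicitly deferred to build time

    deps = project.get("dependencies")
    if deps is None:
        return None  # assumption: absent means unknown, not "no dependencies"

    return list(deps)
```

On Python 3.11+ the mapping could come from `tomllib.loads(text)`. A resolver would only set up the isolated build environment when this returns `None`.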

So I'm very much arguing that consumers are important from real-world requirements. And while I'd just as happily take a "standard sdist metadata" PEP, I'm frankly sick of getting bogged down in debate on that. The advantages of PEP 621 here are:

  1. It has a mechanism for saying "to be calculated later".
  2. It defaults to static data - dynamic is explicitly opt-in.
  3. There's no debate over the name of the flipping file, it's pyproject.toml.

But if backends know the data, and don't put it anywhere because we're waiting for the mythical "standard sdist format", I can't use it in pip or anywhere else unless the user migrates their project to PEP 621. And waiting for users to adopt the new standard is probably just as slow as waiting for sdist standardisation.

So yes, in its new form, PEP 621 offers a significant benefit for pip. Probably other consumers as well, but I can't speak for them. There's a cost for backends in that they have to update pyproject.toml, but honestly I don't think that's a big chore.

I apologise if I assumed incorrectly that people realised the above. I'm pretty certain pip's use case was presented during the initial PEP 621 discussions, but got dismissed for the same "we're not standardising sdists" reasons. That's one of the reasons I became less interested in PEP 621 - it was clearly only being targeted at being a common input format for backends. I was never involved as a backend developer, but as a consumer developer (IIRC, @pradyunsg and @uranusjr were in that category too¹). Fair enough, but we've now essentially failed to make it a common input format, at least to the extent that Poetry have said they are unlikely to adopt it for some time. So I was left wondering what's left. When I realised that we could possibly bring back the benefits for pip, I suggested that to @brettcannon and he was willing to give that a go.

But without being a cross-backend format, and without being usable for tools that want to introspect source distributions, I'm not sure there's enough left in PEP 621 to warrant standardising (as opposed to just the backends that want to have a common format getting together and agreeing one).

¹ And wasn't @di involved for the Warehouse side? Surely Warehouse would be another case that would benefit from being able to read pyproject.toml for metadata known to be fixed across all distribution files for a project?


I don't have much of a say on this, but I think we should be pragmatic here and just accept the current version to avoid further stagnation. It's frankly a huge net win for backends and consumers alike.

Also, after a few years of gradual adoption by tools + projects, we'll actually unlock the ability to accurately resolve dependencies for arbitrary platforms/environments!


That is not what I'm talking about. Consumers like pip need sdists with reliable metadata. I was talking about tools that parse repositories rather than any result of a build process.

There is no question that standardized sdist metadata would be useful for any number of use cases.

You can't avoid the debate on it just because you repurposed a PEP that had broad agreement for a totally different purpose.

There's a huge amount of debate over the name of the file, because pyproject.toml is a human-written file that changes the semantics of builds! It's also a second core metadata spec when we already have a core metadata spec!

Slower. Which is why I said we should focus on sdist standardization if that's what we want. It's what I said when PEP 621 discussions started and you said, "it will take too long", and now the result is that we designed something totally unsuitable as a standard store of sdist metadata because we never set out to design a store of sdist metadata.

Again, to be clear, no one is questioning the benefits of having a standardized store of metadata in sdists. I have already assumed that PEP 621 is not that, and was asking about people reading pyproject.toml in repositories. They would plausibly still get some value out of this even without the new pyproject.toml-rewriting scheme.


The current version where backends need to re-write pyproject.toml? Strong -1 on that. If that happens, please take my name off the PEP. Obviously I wouldn't be implementing this myself, so I can't say whether or not it would be acceptable to other setuptools maintainers.


I also find the requirement for backends to update a user-provided/facing file (pyproject.toml) awkward.

And I'm not convinced it would actually be that useful:
the most obvious field that comes to mind as affected by this "requirement" is version (with a setuptools-scm backend), but in the case of an sdist the version would also be present in the filename, so the benefit for pip is marginal.

Outside of this updating part, I'm in favour of making the provided information canonical, mainly as a way to inspect dependencies and/or license information of projects.
Plus I like the idea of standardising this "boring" part instead of every backend painting the shed slightly differently.


OK, before I make a decision to reject my own PEP, I want to summarize what I'm hearing to see if there's any agreement somewhere among people.

From pip/@pf_moore, the benefit of PEP 621 is the static sdist metadata. But then setuptools/@pganssle is saying they don't want to do it this way, period.

From setuptools/@pganssle, there's benefit in getting more people to write static metadata as a back-end input source. But then pip/@pf_moore doesn't really care since they only come into the situation at sdists or later. That's not an outright "don't you dare do it", it's just inconsequential to pip and so Paul just isn't interested at that point.

Everyone seems to agree there is benefit to source checkout analysis. There also seems to be general agreement that a standardized way to specify how to write all of this stuff out for users is beneficial.

So here's my blunt, to-the-point question to @pganssle: if I remove the sdist idea, what does PEP 621 need to make it good enough for setuptools to adopt it? If the answer is there isn't anything that could make it acceptable then the PEP is dead. If there is something (e.g. all the project info fields like keywords are required or something), then we see if flit and pipenv are on board as well and if one of them signs up I think that's enough to keep the PEP (I'm also assuming Steve will get his library support :wink:). We might need Paul to still be the delegate on behalf of back-end developers and not pip, or we ask someone else to rule from the back-end perspective.

But I think this is it. Either I get setuptools buy-in or I'm rejecting PEP 621 and moving on.


I don't think @pganssle was saying here that setuptools is not interested in adopting PEP 621... just that he himself is not interested in doing so. It would be a major endeavor though to migrate from setup.py and setup.cfg, and given the dynamic nature of setup.py I'm not even sure how much of it could be automated :man_shrugging: That being said, setuptools doesn't have too many maintainers nowadays, so I feel if pip cares about making the change they should probably contribute (at least the initial) PR.


I wasn't expecting pipenv to get mentioned in the discussion, since pipenv is not currently involved in producing distributable Python packages, and PEP 621 as it currently stands is designed specifically for defining metadata for a distribution. So to make sure there's no misunderstanding here: in what form would pipenv be expected to adopt the format, if we want to?


I'm fairly certain that was just a typo and was intended to say poetry.


I don't want to unilaterally act on setuptools' behalf (particularly since, especially lately, @jaraco has done something between the lion's share and all of the work), but to the extent that my opinion is what sways things, I plan to advocate for implementing the configuration file spec from PEP 621 regardless of whether or not it's accepted (modulo table name). I think it's a Good Enough Design™ (especially considering that it was designed by committee...).

The only reason I'm suddenly down on this PEP is that the latest iteration changed it into an sdist standardization PEP.

If we remove the sdist standardization stuff, I can see some marginal benefits of PEP 621 being accepted, but I can also see Paul's point that standardizing the input is not that important, since the interop benefits are kinda weak, and without the interop benefits, it doesn't need to be a standard, since anyone interested can just adopt it or not.


It wasn't a typo, it was me misremembering that pipenv doesn't produce any binary artifacts.

Thanks, and I understand you can't necessarily unilaterally speak for setuptools (unless @jaraco also chimes in :wink:).

@takluyver what do you think of the PEP?


I would have thought one of the biggest reasons for standardizing the specification of a project's metadata is the benefit to users, both project authors and library users. If in 5 years' time 98% of all Python projects had the same way of specifying author/dependencies/entry-points/etc, then less Python-literate visitors to repos would have a better time analysing a prospective package to include in their application. Was this ever the case with setup.py?


98% (made up number :-)) of projects already have a standardised way of specifying metadata: it's setuptools. (Apologies to flit and Poetry if that 2% I left you is too small :slightly_smiling_face:).

What's useful about PEP 621 is that it's an introspectable standardised way.

My argument is that introspection is more often done on sdists than on original source trees, so having PEP 621 data be reliable only in source trees is relatively unimportant. @pganssle's argument is that making PEP 621 data be as complete as possible in sdists subsumes sdist metadata standardisation.


If a file is considered reliable in source trees, why wouldn't this same un-updated file also be reliable in an sdist?


To avoid derailing PEP 621, I've made a proposal here for taking the idea of dynamic and adding it to the core metadata for use in sdists.

If we can agree on that, I'm fine with abandoning the idea of backends writing to pyproject.toml. (If we can't agree on that proposal, I may have to abandon backends writing to pyproject.toml anyway, but there's less risk of PEP 621 being caught in the fallout if we can deal with sdist metadata separately).


It will. But my point is that the amount of data that's reliable will be too small to be helpful, if it's just what the user specified in the source tree. I don't expect to see a rush to switch from setup.py/setup.cfg to pyproject.toml, unless setuptools aggressively deprecates the older files, which I assume is unlikely. So most projects will remain all-dynamic for quite some time.

Hmm... I feel like we should allow for pip to reliably depend on the information in pyproject.toml and not mandate that the file be updated as part of the build process (I'd missed that in my initial reading of this).

This is possible by basically saying something like "if the metadata is specified in the pyproject.toml file, the build tools MUST fail if they would generate metadata that does not match that declaration (changed dependencies and whatnot). Other tooling may use the information in this table, as long as they respect the dynamic key and invoke the build tool for getting those values".

That way, if pip sees the project.dependencies key in a pyproject.toml file, it can reliably depend on it for sdists (because the sdist's generated metadata would be tied to that user input) and it doesn't introduce the quirks that are making people react as they are reacting right now.

Compared to iteration 2, the difference would then be "sdist metadata MUST match metadata specified by the user in the project table. Otherwise, the build backend MUST raise an error" replacing the sentences that were to the effect of "this metadata is not for anyone except the build backend".

(yea, forgive my fairly rough phrasing here)
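For concreteness, the "MUST match, otherwise raise an error" rule could look something like the sketch below on the backend side. Everything here is hypothetical: the names `MetadataMismatch` and `check_declared_metadata` are invented, and it assumes the generated metadata is available as a mapping with the same keys and value shapes as the project table, which real core metadata is not.

```python
from typing import Any, Mapping


class MetadataMismatch(Exception):
    """Hypothetical error for generated metadata that contradicts the declaration."""


def check_declared_metadata(
    project: Mapping[str, Any], generated: Mapping[str, Any]
) -> None:
    """Raise MetadataMismatch if a statically declared field was changed.

    Fields listed under `dynamic` are skipped, since the backend is
    allowed to compute those at build time.
    """
    dynamic = set(project.get("dynamic", []))
    for field, declared in project.items():
        if field == "dynamic" or field in dynamic:
            continue
        # Assumption: `generated` uses the same key names as the table.
        if generated.get(field) != declared:
            raise MetadataMismatch(
                f"{field!r}: declared {declared!r}, "
                f"generated {generated.get(field)!r}"
            )
```

With a check like this in the backend, a tool that reads a non-dynamic field from the table can trust it without invoking the build.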


How does that sound to folks here?

My position (for clarity) is that I agree that we shouldn't try to update this TOML file as a way to communicate sdist metadata. OTOH, until we have a proper mechanism that's portable, it'll be nice to allow tooling to depend on these static declarations made through this mechanism that we're introducing here.

This also ties in nicely with @pf_moore's proposal over at Sdists (again): Metadata standardisation incremental update because it'll then allow the sdist metadata file itself to use the same model as this file around the dynamic key.

As I said, I think the reality is that (setuptools) users won't migrate to pyproject.toml fast enough for it to be of any practical use. But if we can get somewhere with getting this into the sdist metadata instead then that becomes a non-issue.

Yeah, I think there's no controversy about "anything specified in PEP 621 is canonical". Without the part where pyproject.toml is an output of the build process, I think we're all in agreement about all the other details of PEP 621 and the only question remaining is whether or not PEP 621 as an opt-in way for backends to accept metadata is worth standardizing at all.

I also agree with Paul that PEP 621 itself probably won't have an appreciable impact on pip's ability to read static metadata for quite some time to come, whereas an output file should scale nicely and quickly.

Any users adopting PEP 621 (either in the project table, if it is accepted, or in a tool.backend table, if it is rejected) would presumably make it easier for more of the sdist metadata to be reliable, and of course to the extent that people adopt it, it will make it easier for tools to scrape the data when available (if that's the best way to get your stuff counted for vanity metrics it might actually spur some adoption of the format :sweat_smile:).
