PEP 621: round 3

ahem :slightly_smiling_face:

I build ad hoc tools like this a lot. But if you want the really big potential consumer of this data for me, pip is probably the one to look at. What pip most needs from an sdist is name and version (which we get from the filename) and dependencies. I imagine a bunch of dependency data will be deferred to wheel build time, but we could get a lot of benefit from reliable dependency metadata in sdists.

When resolving an install request, pip downloads distributions from the package index to get dependency information. That’s costly, and it is a major performance hit for pip’s new resolver. For wheels, all we have to do is download and extract the metadata. For sdists, we have to download, unpack, set up an isolated build environment, and call the PEP 517 metadata hook. If we had reliable PEP 621 metadata in the sdist, we could check for non-dynamic dependency data, and if it’s there, bypass all of that. (And to be clear, the build cost isn’t something we would end up paying anyway: it’s quite possible we will discard an sdist because it doesn’t lead to a valid result.)
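A rough sketch of that check, to make the idea concrete. This is not pip's code: the function name `static_dependencies` and the sample metadata are made up for illustration, and it assumes the `pyproject.toml` inside the sdist can be trusted to match what the backend would produce.

```python
try:
    import tomllib  # standard library from Python 3.11
except ImportError:  # older Pythons: assumes the tomli backport is installed
    import tomli as tomllib


def static_dependencies(pyproject_text):
    """Return the dependency list if it is declared statically, else None.

    None means "cannot tell without running the build backend".
    """
    project = tomllib.loads(pyproject_text).get("project")
    if project is None:
        # No PEP 621 table at all, so nothing can be read statically.
        return None
    if "dependencies" in project.get("dynamic", []):
        # The backend has said it will calculate these later.
        return None
    # Not marked dynamic: a missing key is read here as "no dependencies".
    return list(project.get("dependencies", []))


# Hypothetical sdist metadata with static dependencies.
STATIC_EXAMPLE = """
[project]
name = "example"
version = "1.0"
dependencies = ["requests>=2", "attrs"]
"""

# Hypothetical sdist metadata that defers dependencies to build time.
DYNAMIC_EXAMPLE = """
[project]
name = "example"
dynamic = ["version", "dependencies"]
"""

print(static_dependencies(STATIC_EXAMPLE))
print(static_dependencies(DYNAMIC_EXAMPLE))
```

When the function returns a list, a resolver could skip the isolated build environment and the PEP 517 hook entirely; when it returns `None`, it falls back to the existing, expensive path.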

So I’m very much arguing that consumers matter, based on real-world requirements. And while I’d just as happily take a “standard sdist metadata” PEP, I’m frankly sick of getting bogged down in debate on that. The advantages of PEP 621 here are:

  1. It has a mechanism for saying “to be calculated later”.
  2. It defaults to static data - dynamic is explicitly opt-in.
  3. There’s no debate over the name of the flipping file, it’s pyproject.toml.

But if backends know the data, and don’t put it anywhere because we’re waiting for the mythical “standard sdist format”, I can’t use it in pip or anywhere else unless the user migrates their project to PEP 621. And waiting for users to adopt the new standard is probably just as slow as waiting for sdist standardisation.

So yes, in its new form, PEP 621 offers a significant benefit for pip. Probably other consumers as well, but I can’t speak for them. There’s a cost for backends in that they have to update pyproject.toml, but honestly I don’t think that’s a big chore.

I apologise if I assumed incorrectly that people realised the above. I’m pretty certain pip’s use case was presented during the initial PEP 621 discussions, but got dismissed for the same “we’re not standardising sdists” reasons. That’s one of the reasons I became less interested in PEP 621 - it was clearly only being targeted at being a common input format for backends. I was never involved as a backend developer, but as a consumer developer (IIRC, @pradyunsg and @uranusjr were in that category too¹). Fair enough, but we’ve now essentially failed to make it a common input format, at least to the extent that Poetry have said they are unlikely to adopt it for some time. So I was left wondering what’s left. When I realised that we could possibly bring back the benefits for pip, I suggested that to @brettcannon and he was willing to give that a go.

But without being a cross-backend format, and without being usable for tools that want to introspect source distributions, I’m not sure there’s enough left in PEP 621 to warrant standardising (as opposed to just the backends that want to have a common format getting together and agreeing one).

¹ And wasn’t @di involved for the Warehouse side? Surely Warehouse would be another case that would benefit from being able to read pyproject.toml for metadata known to be fixed across all distribution files for a project?
