Nobody is following the metadata_directory promise in PEP 517

uranusjr · February 1, 2021, 10:34am

While exploring the possibility of adding an interface to prepare_metadata_for_build_wheel in pypa/build, it was raised that no PEP 517 backend (as in none at all) actually fulfills the promise made in PEP 517:

build_wheel
If the build frontend has previously called prepare_metadata_for_build_wheel and depends on the wheel resulting from this call to have metadata matching this earlier call, then it should provide the path to the created .dist-info directory as the metadata_directory argument. If this argument is provided, then build_wheel MUST produce a wheel with identical metadata. The directory passed in by the build frontend MUST be identical to the directory created by prepare_metadata_for_build_wheel, including any unrecognized files it created.
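To make the rule concrete, here is a minimal sketch of a toy backend that satisfies it by reusing the prepared directory verbatim instead of regenerating it. The project name, version and module contents are hypothetical placeholders. The WHEEL and RECORD handling covers only what a pure-Python wheel needs. A real backend would read its configuration from pyproject.toml or similar.

```python
import base64
import hashlib
import os
import tempfile
import zipfile

# Hypothetical project data for a toy backend; a real backend would read
# this from its configuration (pyproject.toml, setup.cfg, ...).
NAME, VERSION = "demo", "1.0"
DIST_INFO = f"{NAME}-{VERSION}.dist-info"
WHEEL_NAME = f"{NAME}-{VERSION}-py3-none-any.whl"


def _write_metadata(dist_info_path):
    """Generate METADATA and WHEEL into a .dist-info directory."""
    os.makedirs(dist_info_path, exist_ok=True)
    with open(os.path.join(dist_info_path, "METADATA"), "w", encoding="utf-8") as f:
        f.write(f"Metadata-Version: 2.1\nName: {NAME}\nVersion: {VERSION}\n")
    with open(os.path.join(dist_info_path, "WHEEL"), "w", encoding="utf-8") as f:
        f.write("Wheel-Version: 1.0\nGenerator: toy\n"
                "Root-Is-Purelib: true\nTag: py3-none-any\n")


def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
    _write_metadata(os.path.join(metadata_directory, DIST_INFO))
    return DIST_INFO


def _record_line(arcname, data):
    # RECORD entries use an unpadded urlsafe-base64 sha256 digest plus size.
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{arcname},sha256={digest.decode('ascii')},{len(data)}"


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    files = {f"{NAME}/__init__.py": b"VALUE = 42\n"}
    with tempfile.TemporaryDirectory() as scratch:
        if metadata_directory is None:
            # prepare_metadata was not called (or its result not passed on):
            # build_wheel still has to produce the metadata itself.
            metadata_directory = os.path.join(scratch, DIST_INFO)
            _write_metadata(metadata_directory)
        # Copy the directory verbatim, including files this backend does
        # not recognise, instead of regenerating the metadata.
        for root, _dirs, names in os.walk(metadata_directory):
            for entry in names:
                path = os.path.join(root, entry)
                rel = os.path.relpath(path, metadata_directory).replace(os.sep, "/")
                if rel != "RECORD":  # RECORD is rewritten below
                    with open(path, "rb") as f:
                        files[f"{DIST_INFO}/{rel}"] = f.read()
    record = [_record_line(name, data) for name, data in files.items()]
    record.append(f"{DIST_INFO}/RECORD,,")
    with zipfile.ZipFile(os.path.join(wheel_directory, WHEEL_NAME), "w") as zf:
        for arcname, data in files.items():
            zf.writestr(arcname, data)
        zf.writestr(f"{DIST_INFO}/RECORD", "\n".join(record) + "\n")
    return WHEEL_NAME
```

Copying rather than regenerating makes "identical metadata" true by construction. The cost is that the backend can no longer notice if its own output would have differed.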

Instead, all major PEP 517 backends ignore metadata_directory, regenerate the metadata from scratch in build_wheel, and make no effort whatsoever to verify that the generated metadata is identical.

The issue has been raised with setuptools, flit, and poetry. The Poetry devs did not respond to the issue (opened in May 2019). Setuptools actively generates non-identical metadata. Flit is the only project that has responded, and it does have a point for not using the argument: it is easier to always regenerate. Since flit relies solely on static metadata, it does not expect anything to go wrong except in extremely advanced and niche usages (a custom backend extending Flit’s PEP 517 interface), so always verifying the output feels wasteful.

So now we have a rule that nobody is following, and people using the interface (frontends and backend extensions) are left in a bad place. How can we improve the situation?

steve.dower · February 1, 2021, 10:45am

Popularise the minor backends instead?

Seriously, it was one line. I don’t know what to do if they don’t want to type it. pymsbuild/_build.py at 2c31968d4576a388701f50e8117187afef767d37 · zooba/pymsbuild · GitHub

bernatgabor · February 1, 2021, 10:48am

I consider these backend bugs that should be fixed by the maintainers. The frontends are free to raise an error and refuse to handle such backends.

pf_moore · February 1, 2021, 11:02am

I agree with just calling these backend bugs.

Frontends should work on the assumption that the backend follows the spec, and if that causes an issue, direct the user to yell at the backend. At a minimum, every backend should be able to compare the two sets of metadata and fail if they differ, so it’s not like this is complicated to implement. (Flit argues that it’s an unnecessary cost; I’d say it’s only unnecessary as long as no one can tell you’re not doing it.)

Edit: Note that I don’t have a problem with backend bugs not getting fixed promptly. It’s a fairly rare edge case, and all of these projects are volunteer based, so prioritising more significant issues is entirely reasonable. But that doesn’t mean it’s not a bug…
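The minimum suggested above could be as small as the following sketch. The backend regenerates its metadata as it does today, then compares it byte for byte against the prepared copy and refuses to continue on a mismatch. The function and exception names are hypothetical and not taken from any of the backends mentioned. The check covers only the files the backend writes itself. It does not handle the separate requirement to carry over unrecognised files.

```python
import os
import tempfile


class MetadataMismatchError(Exception):
    """Hypothetical error a backend could raise from build_wheel."""


def check_against_prepared(metadata_directory, generated):
    """Fail if regenerated .dist-info files differ from the prepared ones.

    ``generated`` maps file names inside the .dist-info (e.g. "METADATA")
    to the bytes build_wheel has just produced.
    """
    if metadata_directory is None:
        return  # the frontend passed no prepared metadata; nothing to check
    for filename, data in generated.items():
        path = os.path.join(metadata_directory, filename)
        if not os.path.isfile(path):
            raise MetadataMismatchError(f"{filename} missing from {metadata_directory}")
        with open(path, "rb") as f:
            if f.read() != data:
                raise MetadataMismatchError(
                    f"{filename} differs from the copy made by "
                    "prepare_metadata_for_build_wheel")


if __name__ == "__main__":
    prepared_bytes = b"Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n"
    with tempfile.TemporaryDirectory() as prepared:
        with open(os.path.join(prepared, "METADATA"), "wb") as f:
            f.write(prepared_bytes)
        # Same bytes: passes silently.
        check_against_prepared(prepared, {"METADATA": prepared_bytes})
        # A different version: the build fails loudly instead of shipping it.
        try:
            check_against_prepared(
                prepared, {"METADATA": prepared_bytes.replace(b"1.0", b"1.1")})
        except MetadataMismatchError as exc:
            print("refusing to build:", exc)
```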

uranusjr · February 1, 2021, 12:46pm

So it sounds like pypa/build (as a frontend) should stick to the standard and wait for backends to implement the expected behaviour in the future. I’ll bring this back to the discussion there, thanks.

pf_moore · February 1, 2021, 12:52pm

Out of curiosity, what was the pypa/build issue that triggered the question?

bernatgabor · February 1, 2021, 1:04pm

MrMino · February 1, 2021, 2:32pm

Out of curiosity, what was the pypa/build issue that triggered the question?

TL;DR: me not knowing whether the metadata hook is an optional call for the frontend or not, and not seeing it in the build API. See the thread linked above by @bernatgabor; most of my chatter there is not related to the issue at hand.

Start from this comment instead: Support for metadata hook · Issue #130 · pypa/build · GitHub

njs · February 1, 2021, 3:14pm

It sounds like flit is following the spec? The spec doesn’t say the backend is required to actively check for equivalence, it just says that the output has to be equivalent, however that’s accomplished.

On the other hand, the frontend would be entirely within its rights to check and raise an error if there’s a difference. And the backend is free to check too, if it wants. That might be a good idea for setuptools too, since it’s so hard for setuptools to know what user code and plugins are doing. The reason backends get the metadata_directory argument is to maximize their flexibility in implementing this rule: reusing the old metadata, comparing against it, etc.

pf_moore · February 1, 2021, 4:06pm

Agreed. “The way that the backend works means that there can’t be a difference” is an entirely valid approach.

Yes, but the frontend is also entirely within its rights to assume the backend follows the spec, and not check. That’s basically the point of a constraint like this, it allows frontends to avoid checks that the backend did what they are required to do.

Yes, the frontend can check, and if they do, and find a discrepancy, they can give a friendlier error - but the error is still “the backend has a bug and I can’t proceed, please file a bug with the backend”…
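On the frontend side, a check of this kind might look like the sketch below. After build_wheel returns, it reads METADATA out of the wheel, compares it with the prepared copy, and on a difference points the user at the backend. The function name, exception and message are hypothetical. This is not a claim about what pypa/build or pip actually do.

```python
import os
import tempfile
import zipfile


class BackendMetadataError(Exception):
    """Hypothetical frontend error for a backend that broke the PEP 517 rule."""


def verify_wheel_metadata(wheel_path, dist_info_path):
    """Compare METADATA in a built wheel with the prepared .dist-info copy."""
    dist_info_name = os.path.basename(os.path.normpath(dist_info_path))
    with open(os.path.join(dist_info_path, "METADATA"), "rb") as f:
        prepared = f.read()
    with zipfile.ZipFile(wheel_path) as zf:
        try:
            built = zf.read(f"{dist_info_name}/METADATA")
        except KeyError:  # no .dist-info of that name in the wheel at all
            built = None
    if built != prepared:
        raise BackendMetadataError(
            f"{os.path.basename(wheel_path)} does not contain the metadata "
            "returned by prepare_metadata_for_build_wheel. This is a bug in "
            "the build backend; please report it to its maintainers.")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dist_info = os.path.join(tmp, "demo-1.0.dist-info")
        os.makedirs(dist_info)
        with open(os.path.join(dist_info, "METADATA"), "wb") as f:
            f.write(b"Name: demo\nVersion: 1.0\nRequires-Dist: requests\n")
        wheel = os.path.join(tmp, "demo-1.0-py3-none-any.whl")
        # Simulate a backend that regenerated metadata and lost a dependency.
        with zipfile.ZipFile(wheel, "w") as zf:
            zf.writestr("demo-1.0.dist-info/METADATA", "Name: demo\nVersion: 1.0\n")
        try:
            verify_wheel_metadata(wheel, dist_info)
        except BackendMetadataError as exc:
            print(exc)
```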

dholth · February 1, 2021, 7:42pm

Yes, the requirement is OK, but the solution is ‘generate the same metadata every time’, not ‘copy the passed-in folder because the metadata might be different each time’.
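"Generate the same metadata every time" mostly comes down to making the output depend only on the inputs. Here is a minimal sketch, assuming a hypothetical render_metadata helper that takes the project name, version and dependencies. Dependencies are de-duplicated and sorted, so iterating a set in a different order cannot change the bytes. Nothing time- or machine-dependent goes in.

```python
def render_metadata(name, version, requires):
    """Render a METADATA file whose bytes depend only on the arguments.

    Dependencies are de-duplicated and sorted, so passing them in a
    different order (e.g. from a set) yields identical output, and no
    timestamps or environment details are included.
    """
    lines = ["Metadata-Version: 2.1", f"Name: {name}", f"Version: {version}"]
    lines += [f"Requires-Dist: {req}" for req in sorted(set(requires))]
    return ("\n".join(lines) + "\n").encode("utf-8")


first = render_metadata("demo", "1.0", {"requests>=2", "attrs"})
second = render_metadata("demo", "1.0", ["attrs", "requests>=2", "attrs"])
assert first == second  # same inputs, same bytes, regardless of ordering
```

Sorting is one choice among several. Preserving the user's declared order is equally deterministic, provided the input itself is ordered.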

bernatgabor · February 1, 2021, 9:26pm

Well, that’s an implementation detail. For some backends it might be cheaper to compare and generate a delta than to regenerate (e.g. where you need to compile some binary to find out the records).

wim · May 15, 2022, 7:49pm

Shirking the responsibility to look at metadata_directory also makes wrapping the hooks unreliable, unfortunately.

More info here: build meta wheel does not respect metadata_directory per PEP-517 · Issue #1825 · pypa/setuptools · GitHub

ofek · May 17, 2022, 10:24pm

As a backend maintainer, my view is that the main issue is prepare_metadata_for_build_wheel itself. It serves no purpose IMO and is bound to lead to this issue, see Feature request: hatchling should implement the prepare_metadata_for_build_wheel hook · Issue #128 · pypa/hatch · GitHub

If dynamic things happen during builds, the metadata will change unless you literally build the wheel inside prepare_metadata_for_build_wheel.

pf_moore · May 18, 2022, 9:49am

From what I recall, the point was that backends might be able to determine metadata quickly, but building the wheel may be costly. For example, setuptools where the metadata is accessible statically (from setup.cfg, maybe) but there’s a binary to build. If frontends can signal “we only need the metadata just now, and we may not ever need the wheel”, this could be a huge win. In practice, though, it’s not turned out to be as useful as we’d hoped (you could in fact argue that adding it was a pretty clear case of premature optimisation…)

But the standard is absolutely clear that prepare_metadata_for_build_wheel is optional, so it’s absolutely acceptable to simply not implement it.

bernatgabor · May 18, 2022, 1:25pm

I don’t think it’s premature optimization, especially not for C extensions, where building a wheel is more expensive. tox 4 uses this feature a lot.

ofek · May 18, 2022, 4:17pm

What does it use it for? The metadata will change in this case, btw.

bernatgabor · May 18, 2022, 5:19pm

To determine the dependencies of the run environment. We’re mostly only interested in Requires-Dist, which really should not change between prepare_metadata_for_build_wheel and build_wheel.
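For a use case like this, the frontend needs only the Requires-Dist fields from the prepared METADATA. Core metadata uses an email-header format, so the standard library's email parser can read it. The sketch below uses a hypothetical function name. It also ignores environment markers, which a real tool would evaluate (for example with the packaging library).

```python
from email.parser import Parser


def requires_dist(metadata_text):
    """Return the Requires-Dist values from the text of a METADATA file."""
    message = Parser().parsestr(metadata_text, headersonly=True)
    return message.get_all("Requires-Dist") or []


metadata = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Requires-Dist: requests>=2\n"
    'Requires-Dist: tomli; python_version < "3.11"\n'
    "\n"
    "Long description goes here.\n"
)
print(requires_dist(metadata))  # ['requests>=2', 'tomli; python_version < "3.11"']
```

When the .dist-info directory is on disk, `importlib.metadata.Distribution.at(path).requires` in the standard library exposes the same field.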

layday · May 19, 2022, 7:06am

It’s unclear what purpose passing the metadata directory produced by the prepare_metadata hook to build_wheel is intended to serve. Is it to allow modifying the metadata, or is it for the backend to be able to verify that the metadata haven’t changed in the intervening period? The few backends that support the directory argument seem to think it’s the former but several people here (myself included) are uneasy with backends producing wheels with potentially arbitrary metadata. What are the pros and cons of either approach and what are their practical applications?

As for @wim’s specific use case (or perhaps more generally), there exists an additional complication, which is that frontends are not required to invoke prepare_metadata prior to building a wheel or pass its output to build_wheel even if they do. Therefore, you would not be able to isolate the metadata transformation to prepare_metadata unless you controlled the frontend.

pf_moore · May 19, 2022, 10:35am

It’s so that backends can avoid doing a bunch of work twice.

Imagine the extreme case, where the backend implements prepare_metadata by building the wheel in the metadata directory and then unpacking just the .dist-info part. In that case, the backend can implement build_wheel by simply noting that a metadata directory was passed, and returning the already-built wheel stored in there. In reality, such a backend would be better just implementing build_wheel and not implementing prepare_metadata at all, but intermediate variants where the backend builds most of the wheel (maybe just not running C compilers on extension modules) are plausible.

Absolutely not, the PEP requires the metadata to be the same from both calls. (Edit: But I hadn’t considered wrappers when I said that; see below for the situation for wrappers.)

No, it’s so the backend doesn’t have to recreate the metadata at all; it can just reuse it. But the backend can do this if it wants (though by doing so, it would prohibit wrappers that change the data, which it may or may not want to do).

Correct. build_wheel must be prepared to do the whole job if prepare_metadata doesn’t get called, or if the metadata directory isn’t passed (a frontend that calls prepare_metadata but doesn’t pass the directory on to build_wheel is not technically broken, but it’s pretty stupid, as it’s deliberately blocking the backend from a potential optimisation).
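The frontend side of this, with prepare_metadata treated as optional and its result passed on, could be sketched as follows. It calls the hooks in-process on any object that exposes them. That is a simplification: real frontends such as pypa/build run the hooks in a separate process. The function name is hypothetical.

```python
import os


def build_via_hooks(backend, wheel_directory, metadata_parent=None):
    """Run PEP 517 wheel hooks on ``backend`` (normally the imported module).

    prepare_metadata_for_build_wheel is optional for the backend to define
    and for the frontend to call; if it is called, its result is passed on.
    Returns (wheel basename, .dist-info path or None).
    """
    dist_info = None
    prepare = getattr(backend, "prepare_metadata_for_build_wheel", None)
    if metadata_parent is not None and prepare is not None:
        dist_info = os.path.join(metadata_parent, prepare(metadata_parent, {}))
    # Passing dist_info lets the backend reuse it rather than redo the work;
    # build_wheel must still cope when it is None.
    wheel = backend.build_wheel(wheel_directory, {}, dist_info)
    return wheel, dist_info
```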

Absolutely. You cannot isolate anything to prepare_metadata, because the front end isn’t required to call it.

I don’t quite understand the setuptools-ext use case, because it seems as if setuptools-ext is assuming the front end is required to call prepare_metadata if it’s present, which isn’t true. So if that’s the case, I think the problem here is simply that there’s a bug in setuptools-ext.

I will say that we didn’t consider backend wrappers when defining prepare_metadata - the approach of a wrapper adding its own “stuff” to the metadata directory and expecting a wrapped backend to preserve it is a reasonable one, and requiring the backend to copy what’s in metadata_directory into the final wheel seems like a fair way of achieving that. But it’s not mandated by PEP 517 and it would need an update to the spec to require it (and, as this thread demonstrates, it would need existing backends to change so we’d need a way to detect which version of the spec a given backend conforms to).

On the other hand, preserving the metadata directory could be a private arrangement between setuptools-ext and setuptools. We don’t have a use case of a backend-agnostic wrapper needing this functionality yet, so forcing all backends to conform might be a little premature.