Nobody is following the metadata_directory promise in PEP 517

blink1073 · May 19, 2022, 2:09pm

Might another use case be to extract a dynamic version? Previously we’ve used `python setup.py --version`. Lately we’ve switched to `hatchling version`, but it would be nice to have a backend-agnostic way to fetch the version without building a wheel or installing.

pf_moore · May 19, 2022, 2:57pm

I don’t understand. To get the version, I can think of 3 possibilities:

1. Build an sdist using the build_sdist hook. While the sdist name isn’t standardised, it’s almost certainly {name}-{version}.tar.gz (see here). If you want a standardised approach, read the resulting file: it must have a single top-level directory called {name}-{version} (see here). You can even read the PKG-INFO file for other metadata (but note that unless the metadata is version 2.2 or greater, you cannot be sure wheels built from the sdist will have the same metadata).
2. Build a wheel using build_wheel. The filename and structure are standardised, so there’s no ambiguity at all.
3. Try prepare_metadata_for_build_wheel, but you have to be prepared to fall back to build_wheel if the backend doesn’t support it. So this isn’t a backend-agnostic way to avoid building a wheel, even though it might do so sometimes. If the prepare call does exist, then yes, you can rely on the metadata it returns.

OK, in the third case, all you can technically be sure of is that the version you get is the same one you’d get if you then built a wheel after passing the metadata directory to build_wheel. Pathological backends are possible, such as one that generates a random version number every time it’s called. But a tool that extracts a dynamic version always has that problem to consider, so I wouldn’t worry too much about it.
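As a rough illustration of the first two possibilities, the version can be read from the artifact names alone. This is a minimal sketch using only the standard library; the function names are made up for this example, and the sdist helper assumes a normalised version that contains no hyphen.

```python
import tarfile


def version_from_wheel_filename(filename):
    """Extract the version from a wheel filename.

    Wheel names look like {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    and '-' inside the name and version is escaped, so the second field is the version.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    fields = filename[: -len(".whl")].split("-")
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return fields[1]


def version_from_sdist(path):
    """Extract the version from the single top-level directory of an sdist."""
    with tarfile.open(path) as sdist:
        top_levels = {name.split("/", 1)[0] for name in sdist.getnames()}
    if len(top_levels) != 1:
        raise ValueError("expected exactly one top-level directory")
    # Assumption: the version contains no '-', so everything after the last
    # '-' is the version (the project name itself may contain hyphens).
    return top_levels.pop().rpartition("-")[2]
```

Both helpers still require the artifact to exist, so they do not avoid the cost of the build itself.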

layday · May 19, 2022, 3:33pm

There are several issues with this approach, beginning with the fact that the backend does not own the containing directory (the metadata directory is dist-info). The frontend is better positioned to cache prepared metadata and built wheels (and pep517 does in fact employ a similar trick :). I don’t know under what circumstances the backend would opt to build most of the wheel when asked for the metadata - that would be one exotic backend. Metadata generation is cheap, so if this was metadata_directory’s intended purpose, simply for the backend to avoid regenerating the metadata, it makes sense that people can’t make sense of it. It is such a minor thing that it doesn’t seem to justify the resulting complexity.

blink1073 · May 19, 2022, 4:26pm

As you said, the sdist doesn’t really help us because the name isn’t standardized, and it is not always cheap to build an sdist. The rest of what you wrote is repeating exactly what I said: we’d like to get the version from a metadata hook to avoid building a wheel if possible. Frankly your dismissive tone isn’t helpful (here and in many other places). I am going to take a break from participating in packaging discussions.

layday · May 19, 2022, 4:40pm

pypa/build contains a utility function which uses prepare_metadata to generate a project’s metadata:

    from build.util import project_wheel_metadata

    # The returned metadata is a message-like mapping, so fields are looked up by key.
    print(project_wheel_metadata('my_package')['Version'])

CAM-Gerlach · May 19, 2022, 6:32pm

It seems that’s completely undocumented for now, though; if it is considered public, it would have been very helpful when porting Pyroma to use modern packaging mechanisms.

However, it still ultimately relies on the prepare_metadata_for_build_wheel hook; using the documented public API, the solution I came up with was

    import email.policy
    import pathlib
    import tempfile
    import build

    # `path` is the project source directory
    with tempfile.TemporaryDirectory() as tempdir:
        metadata_dir = build.ProjectBuilder(str(path)).prepare("wheel", tempdir)
        with open(pathlib.Path(metadata_dir) / "METADATA", "rb") as metadata_file:
            metadata = email.message_from_binary_file(metadata_file, policy=email.policy.compat32)

layday · May 19, 2022, 6:41pm

I could’ve sworn that was in the docs, not sure what happened there. The latest doc build has it: API Documentation - build 1.0.3

Return the wheel metadata for a project.

Uses the prepare_metadata_for_build_wheel hook if available, otherwise build_wheel.
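The fallback described there can be sketched roughly as follows. This is not the implementation in pypa/build: `read_version` is a hypothetical helper, it calls the hooks in-process on an already imported backend object, and it skips the isolated build environment and subprocess that a real frontend would use.

```python
import email
import pathlib
import zipfile


def read_version(backend, workdir):
    """Return the Version field using whichever hook `backend` offers."""
    workdir = pathlib.Path(workdir)
    prepare = getattr(backend, "prepare_metadata_for_build_wheel", None)
    if prepare is not None:
        # Optional hook: returns the basename of the .dist-info directory it created.
        dist_info = prepare(str(workdir))
        text = (workdir / dist_info / "METADATA").read_text(encoding="utf-8")
    else:
        # Mandatory hook: returns the basename of the wheel it wrote.
        wheel_name = backend.build_wheel(str(workdir))
        with zipfile.ZipFile(workdir / wheel_name) as wheel:
            metadata_path = next(
                name
                for name in wheel.namelist()
                if name.endswith(".dist-info/METADATA")
            )
            text = wheel.read(metadata_path).decode("utf-8")
    # METADATA uses an email-header-like format, so the email parser can read it.
    return email.message_from_string(text)["Version"]
```

The caller cannot tell in advance which branch will run, which is why this approach only sometimes avoids a wheel build.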

CAM-Gerlach · May 19, 2022, 6:45pm

Ah, I see; I’d only checked the stable docs, which don’t have it. Thanks for the tip!

Unfortunately, the PackageMetadata return type isn’t documented (or if it is, it isn’t linked and I couldn’t easily find it elsewhere), so it’s unclear, other than by inspection and inference, what attributes we can rely on there, unless there’s something else I’m missing (which there might be).

layday · May 19, 2022, 6:50pm

The PackageMetadata type is from importlib.metadata. There’s a long-standing issue with importlib.metadata types not working with Sphinx.

pf_moore · May 19, 2022, 7:27pm

I apologise if I came across as dismissive. That wasn’t my intention (and it’s my error for not choosing my words better).

I was addressing your comments from the perspective of what the current spec guarantees. In that context, I agree with what I thought you were saying, which is that it doesn’t help your use case much (if at all). However, I didn’t consider that you may have been offering a use case to motivate a change to the current spec, in which case my reply would have been at best irrelevant (and as you say, would appear dismissive). I don’t personally have an opinion on changing the spec - as a build front end developer it’s sufficient for me, so I’m happy to leave discussions on what changes might be needed to others.

And if I’m still missing the point of what you were saying, then I apologise again. I’ll say no more at this point, as I’m clearly not helping the discussion much.

wim · May 19, 2022, 7:46pm

That’s not quite right: the setuptools-ext build_wheel hook could itself call the prepare_metadata_for_build_wheel hook first, and then pass along the metadata directory (which gets returned by the hook) to setuptools’ build_wheel hook. No cooperation from the build frontend is necessary.

It makes no such assumption. In fact, setuptools-ext doesn’t even bother to implement prepare_metadata_for_build_wheel, because there would have been no way to provide the resulting information to setuptools.build_meta. So it does the wheel rewrite instead, ugly and potentially expensive for big packages.

My main point here was that hook wrapping requires the backend to actually use the metadata_directory argument, which setuptools doesn’t, though I think the wording in PEP 517 seems to imply that it must:

If this argument is provided, then build_wheel MUST produce a wheel with identical metadata. The directory passed in by the build frontend MUST be identical to the directory created by prepare_metadata_for_build_wheel, including any unrecognized files it created.
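The wrapping pattern under discussion might look something like the sketch below. The names `make_wrapped_build_wheel`, `inner` and `amend_metadata` are hypothetical, and the approach only has an effect if the inner backend actually reuses the directory it is handed, which per this thread setuptools currently does not.

```python
import pathlib
import tempfile


def make_wrapped_build_wheel(inner, amend_metadata):
    """Build a build_wheel hook that wraps the hooks of backend `inner`."""

    def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
        # For simplicity this sketch ignores any metadata_directory supplied
        # by the frontend and always prepares a fresh one.
        with tempfile.TemporaryDirectory() as tmp:
            # Ask the inner backend for its metadata first.
            dist_info = inner.prepare_metadata_for_build_wheel(tmp, config_settings)
            prepared = pathlib.Path(tmp) / dist_info
            # Let the wrapper adjust the prepared .dist-info directory.
            amend_metadata(prepared)
            # Hand the directory back; the inner backend must honour it.
            return inner.build_wheel(wheel_directory, config_settings, str(prepared))

    return build_wheel
```

With a backend that regenerates its metadata and ignores the argument, the amendments are silently lost, which is the situation that forces the wheel rewrite mentioned above.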

pf_moore · May 19, 2022, 11:09pm

I think the key problem here is that PEP 517 assumed that only build frontends and build backends exist, and frontends call hooks and backends implement them. The idea that anything other than a build frontend might want to call the hooks (and in particular the idea of one backend wrapping another) was never considered^[1].

Ultimately, I think we’ll need an update to the spec to make it properly support backend wrappers. At best the existing spec is unclear on key points, at worst it allows backends to do things that make wrapping impractical.

The example @blink1073 gave is another use case that wasn’t considered - callers of the hook API that aren’t build frontends, and aren’t interested in building a wheel, except possibly as a means to an end. It’s possible such consumers would be better served by a new, optional hook that simply generated and returned the package metadata in something like the JSON-compatible dictionary format defined in PEP 566 - without any of the baggage of writing and subsequently parsing a dist-info directory.

[1] At least, not as far as I recall.

wim · May 20, 2022, 2:28am

It seems to me that it was considered, at least for in-tree backends:

Project-specific backends, typically consisting of a custom wrapper around a standard backend, where the wrapper is too project-specific to be worth distributing independently

layday · May 20, 2022, 5:25am

Good point, I hadn’t really thought of that. (Although I wonder if it wouldn’t be better for your specific use case to subclass egg_info or the metadata writer and register either one as an entry point with setuptools.)

pf_moore · May 20, 2022, 8:14am

Ah, I’d forgotten that. In-tree backends were an addition after the original PEP.

abravalheri · May 20, 2022, 8:41am

Hi @wim, it might take a while until setuptools attempts to re-use the metadata directory instead of recreating it. First we probably need to solve the problem of the cyclic build dependencies with wheel and absorb the bdist_wheel command in the process… That is something considered in the existing discussions in the setuptools tracker. Step by step we will arrive there. If you or anyone is interested in contributing towards that future, we welcome any contributors.^[1]

There are other things that you potentially could do right now to overcome this difficulty (e.g. overwriting/wrapping/extending the egg_info command via entry points), but I believe that would be more difficult than the existing wheel re-write, so probably not worth it…

Regarding the hook and the general backend responsibilities, as previously mentioned, something a backend could do is to verify the existing metadata is identical to the re-generated one and halt with an error. I don’t think setuptools will ever do that, but it illustrates that coordination between a backend wrapper and the backend itself would still be required.

[1] Maybe I am wrong here and we could solve that by tactically changing the arguments for calling bdist_wheel inside setuptools.build_meta, but I haven’t dug that deep to know yet…

ofek · May 21, 2022, 1:32pm

Another thing that might not have been said is that since all metadata either is generated or already exists on disk it’s wasteful to create this directory rather than putting everything directly in the archive.

They can, though; I’ve already added that ability for a feature request: Best practice to modify editable wheel · Issue #228 · pypa/hatch · GitHub

To be clear the artifacts are still reproducible but the logic occurs during the actual build.

pradyunsg · May 22, 2022, 1:54pm

I think there’s one improvement we can make here:

Allow prepare_metadata_for_build_wheel to return None. The implementation of this hook would then mean “I have the ability to pre-compute metadata in some cases” instead of “I can always pre-compute metadata in all cases”. When None is returned, that’d basically mean “I can’t pre-compute the metadata for this case” (“I” here refers to the build-backend, personified).

In such situations, it’s safe for the build frontends to delete the contents of the directory passed to this function, although it won’t be strictly required.
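On the frontend side, that proposal could be handled along these lines. This is only a sketch of the suggested behaviour: treating a None return value as "cannot pre-compute" is not part of PEP 517 as written, and `try_prepare_metadata` is a hypothetical helper name.

```python
import pathlib
import shutil


def try_prepare_metadata(backend, metadata_directory, config_settings=None):
    """Return the path of the prepared .dist-info directory, or None.

    None tells the caller to fall back to build_wheel, either because the
    hook is missing or because it returned None (the proposed extension).
    """
    hook = getattr(backend, "prepare_metadata_for_build_wheel", None)
    result = hook(metadata_directory, config_settings) if hook else None
    if result is None:
        # Optional cleanup: drop anything a partial attempt left behind.
        for entry in pathlib.Path(metadata_directory).iterdir():
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        return None
    return pathlib.Path(metadata_directory) / result
```

A missing hook and a None result end up on the same code path, so frontends that already implement the build_wheel fallback would need little extra logic.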

finswimmer · May 24, 2022, 12:42pm

Hey,

one of the Poetry maintainers here. I would be glad to implement the expected behavior when passing metadata_directory to build_wheel. My problem is that I’m not sure what the expected behavior is.

Must the backend check whether the provided metadata matches what the backend itself would create? Then the metadata_directory would be there for triggering a validation process.

If the backend need not compare the input with what it would create, then the metadata_directory argument would enable front-ends to manipulate the metadata that should be used in the package.

Those are two totally different goals. So what’s the expected one?

fin swimmer

bernatgabor · May 24, 2022, 1:03pm

I think checking that the input is valid is optional. A backend could do that and would defend against a malicious user passing in wrong things, but the backend could decide that it’s up to the user to only pass in a metadata directory that was previously generated by the service. And the service would need at that point to check if the previously generated metadata is still valid. It would also be valid for a build backend to ignore the metadata directory and generate it from scratch. The reason the metadata directory is passed in to the build is to offer the opportunity for the backend to reuse earlier builds, not that it must do so. The frontend should definitely not change the content of the metadata directory, and the backend is free to check that and raise if it detects tampering.
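The optional tamper check could be as simple as comparing the supplied directory with a freshly generated one. This is a minimal sketch, not taken from any existing backend; `snapshot` and `check_metadata_unchanged` are hypothetical names, and a byte-for-byte comparison is an assumption about what "identical" should mean.

```python
import pathlib


def snapshot(directory):
    """Map each file's relative path to its contents."""
    root = pathlib.Path(directory)
    return {
        path.relative_to(root).as_posix(): path.read_bytes()
        for path in root.rglob("*")
        if path.is_file()
    }


def check_metadata_unchanged(provided_dir, regenerated_dir):
    """Raise ValueError if the two metadata directories differ."""
    provided = snapshot(provided_dir)
    regenerated = snapshot(regenerated_dir)
    if provided != regenerated:
        # Report files that were added, removed, or modified.
        changed = sorted(
            name
            for name in provided.keys() | regenerated.keys()
            if provided.get(name) != regenerated.get(name)
        )
        raise ValueError(f"metadata_directory differs from regenerated metadata: {changed}")
```

A backend that runs such a check has to regenerate the metadata anyway, so the check trades the reuse benefit for protection against modified input.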