Should building wheels from sdists be recommended behavior?

Indeed. This works as designed, unless you want to create an sdist from an sdist, which doesn't make much sense.

For context: the main license file should exactly match a known license text (BSD-3 in our case) for GitHub and other tools to recognize the license. So the licenses of vendored code go in a LICENSES_bundled.txt in the git repo, and they get concatenated to the main license file at sdist creation time. Unfortunately, there's no other way to both have a complete license file in an sdist and keep license checkers happy.
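To make the license workflow concrete, here is a minimal sketch of the concatenation step. It assumes the main file is named LICENSE.txt (only LICENSES_bundled.txt is named above) and that the sdist contents are staged in a separate directory; the constant and function names are hypothetical, not the project's actual build code.

```python
from pathlib import Path

# Hypothetical names: only LICENSES_bundled.txt is named in the thread.
MAIN_LICENSE = "LICENSE.txt"
BUNDLED_LICENSES = "LICENSES_bundled.txt"


def write_sdist_license(repo_root: Path, sdist_staging_dir: Path) -> Path:
    """Write the sdist's license file: the pristine BSD-3 text plus vendored licenses.

    The repository copy of MAIN_LICENSE is left untouched, so GitHub and other
    license detectors still see an exact, recognizable license text.
    """
    main_text = (repo_root / MAIN_LICENSE).read_text(encoding="utf-8")
    # Raises FileNotFoundError if the bundled file is absent, for example if
    # the tree being packaged is itself an unpacked sdist that omits it.
    bundled_text = (repo_root / BUNDLED_LICENSES).read_text(encoding="utf-8")
    target = sdist_staging_dir / MAIN_LICENSE
    target.write_text(main_text.rstrip("\n") + "\n\n" + bundled_text, encoding="utf-8")
    return target
```

Writing into a staging directory rather than editing the repository file in place is what keeps the two goals (an exact license match in the repo, a complete license in the sdist) from conflicting.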


Thanks for clarifying this in such detail. Indeed, I was trying to create an sdist from an sdist, though I realize this is fairly nonsensical. When working with sources that could be any of a commit, a GitHub tag zip, or an sdist, it's just simpler to always turn them into an sdist first, and that does happen to work for most sdists. Anyway, I'm not pushing for that to be guaranteed to work.

I didn't mean to imply that a README file should be required. I was rather (poorly) trying to make the point that many setup.py files get their description with something like long_description = open('README').read(), and these will fail if the README is excluded from the sdist, because the file won't exist when the wheel is then built from it.
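For illustration, one defensive variant of that pattern (not something proposed in this thread) is sketched below; the helper name is hypothetical. It resolves the README relative to a given directory instead of the current working directory, and returns an empty description instead of raising FileNotFoundError when the file was left out of the sdist.

```python
from pathlib import Path


def read_long_description(project_dir, filename="README"):
    """Return the README text, or "" if the file is not present.

    Unlike open('README').read(), this does not raise FileNotFoundError when
    the README was excluded from the sdist, and when called with
    Path(__file__).parent it does not depend on the current working directory.
    """
    path = Path(project_dir) / filename
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return ""


# In a setup.py this might be used as (hypothetical usage):
#   setup(..., long_description=read_long_description(Path(__file__).parent))
```

The trade-off is that a silently empty description can hide a packaging mistake, so the more robust fix is still to include the README in the sdist.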

However, I think the “valid input” here could be read as covering these cases.

Anyway, I don't have a lot of specifics or stats on these sorts of projects. I mostly wanted to come here to give a +1, FWIW, to the originally recommended behavior. There are already projects, acting as repackagers, that do this, and I wouldn't say we can depend on this behavior now, although it does work for the large majority of projects. Having an official recommendation would be nice to reference when I do have to open an issue or a PR.


Heads up: I'm working on finishing the big migration of PEP 517, 518 and 660 to the PyPA specs page, and hope to have a draft PR up this coming week. While there appears to be a general consensus here, I will refrain from modifying the existing normative language around the build_sdist hook as stated in PEP 517 in that PR, but I can open a separate PR once it's merged to effect the change, subject to any further discussion, review and objections.

The current language for the build_sdist hook states:

Some build backends MAY have extra requirements for creating sdists, such as version control tools. However, some build frontends MAY prefer to make intermediate sdists when producing wheels, to ensure consistency.

We could implement the recommendation that package developer-facing frontends build the wheel from the sdist as follows:

Build frontends MAY prefer to build wheels from an unpacked intermediate sdist built by this hook; to ensure consistency, frontends building wheels for distribution rather than installation SHOULD do so. However, some build backends MAY have extra requirements for creating sdists, such as version control tools.

The paragraph goes on to state:

If the backend cannot produce an sdist because a dependency is missing, or for another well understood reason, it SHOULD raise an exception of a specific type which it makes available as UnsupportedOperation on the backend object. If the frontend gets this exception while building an sdist as an intermediate for a wheel, it SHOULD fall back to building a wheel directly.
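As a rough sketch of the frontend behavior this paragraph describes (intermediate sdist first, fallback only on the backend's UnsupportedOperation), assuming the backend is already available as a module-like object exposing the PEP 517 hooks. The function names are hypothetical; real frontends also run hooks in a subprocess, typically with an isolated build environment, which this sketch omits.

```python
import os
import tarfile
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily chdir into path; PEP 517 hooks run with the source tree as cwd."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def build_wheel_via_sdist(backend, source_dir, wheel_dir):
    """Build a wheel from an intermediate sdist, falling back to the source tree.

    Returns (wheel_basename, built_from_sdist).
    """
    source_dir = os.path.abspath(source_dir)
    wheel_dir = os.path.abspath(wheel_dir)  # must stay valid while cwd changes
    # An empty tuple matches no exception, so a backend without
    # UnsupportedOperation never triggers the fallback.
    unsupported = getattr(backend, "UnsupportedOperation", ())

    with tempfile.TemporaryDirectory() as tmp:
        sdist_dir = os.path.join(tmp, "sdist")
        os.mkdir(sdist_dir)
        try:
            with working_directory(source_dir):
                sdist_name = backend.build_sdist(sdist_dir)
        except unsupported:
            # Well-understood reason (e.g. a missing VCS tool): build directly.
            # Any other exception propagates; it is not retried with another hook.
            with working_directory(source_dir):
                return backend.build_wheel(wheel_dir), False

        unpack_dir = os.path.join(tmp, "unpacked")
        extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(os.path.join(sdist_dir, sdist_name)) as tf:
            tf.extractall(unpack_dir, **extract_kwargs)
        # An sdist unpacks to a single top-level {name}-{version} directory;
        # this unpacking raises ValueError if there is not exactly one entry.
        (top_level,) = os.listdir(unpack_dir)
        with working_directory(os.path.join(unpack_dir, top_level)):
            return backend.build_wheel(wheel_dir), True
```

The second element of the return value records which path was taken, which is the information a frontend would need to warn the user about the inconsistency discussed below.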

Should the recommendation in the second sentence be weakened to a MAY, or removed entirely, for package author-facing tools building artifacts for distribution? It would seem that at least one of these should be done, perhaps along with recommending that an appropriate author-visible warning be displayed, so the author is aware of the inconsistency. For example:

If the frontend gets this exception while building an sdist as an intermediate for a wheel, it SHOULD fall back to building a wheel directly, except in the case of building a wheel for distribution, in which it MAY do so but SHOULD issue an appropriate user-visible warning.

Thoughts? Are there any other changes that should be made to implement this recommendation?

Also, the language in the build_wheel hook states:

To ensure that wheels from different sources are built the same way, frontends MAY call build_sdist first, and then call build_wheel in the unpacked sdist. But if the backend indicates that it is missing some requirements for creating an sdist (see below) [presumably a reference to the above-mentioned paragraph], the frontend will fall back to calling build_wheel in the source directory.

This is inconsistent with the referenced paragraph, which only states that build frontends SHOULD (not "will", i.e. an implied MUST) fall back to calling build_wheel directly in the source tree.

For now, so that the specification is at least internally consistent, I've clarified it to read:

To ensure that wheels from different sources are built the same way, frontends MAY call build_sdist first, and then call build_wheel in the unpacked sdist. But if the backend indicates that it is missing some requirements for creating an sdist (see below), the frontend SHOULD fall back to calling build_wheel in the source directory.

Presumably, to implement these changes, the first MAY should be replaced with SHOULD for package author-facing frontends, and the second sentence modified according to whatever is decided on the fallback question discussed above. What I nominally propose is:

To ensure that wheels from different sources are built the same way, frontends MAY call build_sdist first, and then call build_wheel in the unpacked sdist; frontends building wheels for distribution rather than installation SHOULD do so. If the backend indicates that it is missing some requirements for creating an sdist (see below), the frontend SHOULD fall back to calling build_wheel in the source directory, except in the case of building a wheel for distribution, in which it MAY do so but SHOULD issue an appropriate user-visible warning.
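A minimal sketch of how a frontend might apply this proposed wording once build_sdist has raised the backend's UnsupportedOperation. The function name and the allow_distribution_fallback switch (standing in for the MAY) are hypothetical: installation builds keep the SHOULD fall back, while distribution builds either fall back with a user-visible warning or surface the original error.

```python
import warnings


def handle_unsupported_sdist(exc, for_distribution, allow_distribution_fallback=True):
    """Decide what to do after build_sdist raised the backend's UnsupportedOperation.

    Returns True if the caller should fall back to build_wheel in the source
    directory; otherwise re-raises exc.
    """
    if not for_distribution:
        # Building for installation: the frontend SHOULD fall back.
        return True
    if not allow_distribution_fallback:
        # Building for distribution: falling back is only a MAY, so a tool
        # can choose to stop and report the original error instead.
        raise exc
    # Falling back for a distribution build SHOULD come with a visible warning.
    warnings.warn(
        f"could not build an intermediate sdist ({exc}); building the wheel "
        "directly from the source tree, so it may differ from one built from the sdist",
        stacklevel=2,
    )
    return True
```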

+1 for removing it entirely; it makes little sense to me. If you get a build failure, it will typically fail again the second time around. The concrete example given is "because a dependency is missing", and clearly invoking a different hook isn't going to make that dependency materialize.

This was true for pip's retries as well (already removed): those just resulted in a painful infinite loop in many cases. Build failures should abort the process in a well-designed tool, with no retries using different hooks, versions or whatever.

Just to be clear, this is specifically referencing missing dependencies (such as version control tools) that are only required for building the sdist and not the wheel. Therefore, in this example, building the wheel directly from the source tree would succeed, while first building an sdist to build the wheel from that would fail. I’m not sure how common this is in practice, though.
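To make that scenario concrete, here is a sketch of the backend side, assuming a hypothetical backend that asks git which files belong in the sdist. It raises UnsupportedOperation (the name PEP 517 specifies for the backend object) only when the VCS tool is missing, which is exactly the case where building the wheel directly from the source tree could still succeed. NAME, VERSION and VCS_TOOL are placeholders.

```python
import os
import shutil
import subprocess
import tarfile

# Hypothetical project metadata and VCS tool for this sketch backend.
NAME, VERSION = "example", "0.1"
VCS_TOOL = "git"


class UnsupportedOperation(Exception):
    """Exposed on the backend so frontends can recognize an expected sdist failure."""


def build_sdist(sdist_directory, config_settings=None):
    # The VCS is only needed to decide which files belong in the sdist;
    # a build_wheel hook (not shown) would work from the source tree without it.
    if shutil.which(VCS_TOOL) is None:
        raise UnsupportedOperation(
            f"{VCS_TOOL!r} not found; it is required to build an sdist, not a wheel"
        )
    listing = subprocess.run(
        [VCS_TOOL, "ls-files", "-z"], check=True, capture_output=True, text=True
    ).stdout
    base = f"{NAME}-{VERSION}"
    filename = f"{base}.tar.gz"
    with tarfile.open(os.path.join(sdist_directory, filename), "w:gz") as tf:
        for path in filter(None, listing.split("\0")):
            tf.add(path, arcname=f"{base}/{path}")
    # A real sdist also needs a PKG-INFO file; omitted to keep the sketch short.
    return filename
```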