Should building wheels from sdists be recommended behavior?

In pypa/build issue #257, “Build wheel from sdist (optionally?)”, we are questioning whether building wheels from sdists, as opposed to from the local source tree directly, should be the recommended behavior.

My opinion is that it should. Building wheels, or other future distributions, from sdists ensures that things like files not checked into version control, caches, etc. don’t influence the built wheel. This avoids issues like accidentally including extra source files, and helps with reproducibility.
I believe this behavior should be recommended to users, and that package build tools should be encouraged to default to it.

Backends should still take steps to prevent such issues, but given we are entering the age of fully interchangeable backends that anyone can roll out, we need to accept that they might not, and try to mitigate issues there.


Well, the “recommended behaviour” if a backend doesn’t support build_sdist would clearly have to be to use build_wheel :slightly_smiling_face: But apart from that, I’d take the view that just like build isolation, building via sdist should be the preferred approach, but tools should provide an “escape hatch” to allow building directly.

One concern I would have is that building via sdist is going to be slower than building the wheel in-place (particularly on slower filesystems, or where things like virus checkers make file operations slow). I can imagine that building a simple pure-python wheel could take more than twice as long with build-via-sdist. I’d like to see some sort of review that this won’t be an issue.

This may not be important for a tool like build, but if we’re making this the recommended practice, we need to consider other tools. Note that pip is not planning on switching from build-in-place to build-via-sdist, but from build-via-copy to build-via-sdist, so the fact that pip intends to go to building via sdist isn’t a directly relevant data point here.

Also, as a procedural point, if you want to change PEP 517 to make this a formal part of the spec¹, it should be written up and approved in the same way as previous changes like support for in-tree backends. If you just want to get consensus here, but not give it formal PEP status, that’s fine.

¹ I just noticed, PEP 517 has never been moved to the PyPA specifications section of the Python Packaging User Guide. Someone should rectify that at some point.


Well, I think the tools should be able to expect that build_sdist is available, given that it is a mandatory hook, but like you said, tools should allow building directly.

Hum… The only extra operation would be unpacking the sdist. I did a quick search about unpacking files being slow on some Windows systems, and it seems to me that even those speeds aren’t low enough to have much impact here. Still, seeing proper testing would be great.

At this point, I just want to give an opportunity for everyone to raise issues with this approach, so that we can make a decision on how to proceed in pypa/build. Though, I do think it would be good to put this in PEP 517 if we reach consensus here.

I think building wheels from sdists makes a ton of sense. If we ever get to the point in the future where PyPI (or some other service) has the ability to automatically build wheels for a given release, the “input” would almost definitely be a source distribution.


Sorry, I was a bit quick in my reading. What I was referring to was this comment in the PEP:

If the backend cannot produce an sdist because a dependency is missing, or for another well understood reason, it should raise an exception of a specific type which it makes available as UnsupportedOperation on the backend object. If the frontend gets this exception while building an sdist as an intermediate for a wheel, it should fall back to building a wheel directly.
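A minimal sketch of how a frontend might detect that case, for illustration only. It assumes `backend` is an already-imported object exposing the PEP 517 hooks (real frontends typically call hooks in a subprocess), and `try_build_sdist` is a hypothetical helper name, not part of any tool.

```python
def try_build_sdist(backend, sdist_dir, config_settings=None):
    """Return the sdist file name, or None if the backend says it cannot build one.

    Any other exception is treated as a genuine build failure and re-raised.
    """
    # The backend may expose its own exception type under this attribute name.
    unsupported = getattr(backend, "UnsupportedOperation", None)
    try:
        return backend.build_sdist(sdist_dir, config_settings)
    except Exception as exc:
        if unsupported is not None and isinstance(exc, unsupported):
            # Caller should fall back to calling build_wheel on the source tree.
            return None
        raise
```

A frontend that gets `None` back would then call `build_wheel` directly on the source tree, as the quoted text describes.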

There’s also building the sdist itself. For something like setuptools, that’s potentially expensive as it runs arbitrary code.

And I’ve been on machines where virus checking runs on any unpack of an archive and is stupidly expensive. I’m not saying it’s necessarily a problem, but it is something I like to mention, because in my experience Unix users are often surprised by how costly some things can be on Windows.


My biggest concern here is that this would imply that the build requirements for the sdist need to be compatible with the build requirements for the wheel, otherwise we’d need two separate isolated environments to build a wheel. This could be a reasonable assumption to make, but as things stand today it is not mandated by the PEP. Also, the frontend now needs to acquire and install both of these dependency sets, which would be a lot more overhead than just a copy/extract.


I think this is a good recommendation for building artifacts to distribute – in particular, building wheels to upload to PyPI. Basically it makes sure that distributed sdists and distributed wheels match each other.

If you’re a front-end like pip, then different considerations apply.


I agree. It’s either going to be an sdist or something similar, e.g. a git checkout or git archive of a tag. For security and compliance it would make even more sense to start with a signed git tag and build all artifacts from the tag. This makes it easier for users and vendors to verify the provenance of code (assuming reproducible builds for tarballs, zipfiles, pure wheels and binary wheels).


I agree. Any recommended build/publish workflow should probably be:

  • build sdist once
  • build wheels for each platform from the sdist
  • upload everything

But for tools doing the installs, leave it up to the build backend. If it can recognise how to build a wheel from the source directory, then let it. If it can’t, presumably it can recognise how to build an sdist, so it is totally capable of doing that.

In terms of specification, I think that just means not specifying whether build_wheel will “always” get anything at all. I’d hate to think that pip install . would start showing an error saying “you need to do pip wheel . first”, and I wouldn’t want pip to have to figure out the logic to ask the backend whether it needs an sdist, or detect a failure, or whatever.

Just say that the backend needs to work on a directory containing the pyproject.toml and “the rest of the sources”, and let backends figure it out (which may mean forcing their users to keep sources laid out in a certain way if they want it to work, which is totally fine in my view).


Yes, please ignore what I said :man_facepalming:.

That is a fair point, and I think build frontends should be able to decide what to do in those cases, but we could still recommend this practice.

I don’t agree that it would. Wheels should still be able to have different build requirements than sdists. I understand that some might jump to that conclusion, but it isn’t really said anywhere, and we could preemptively clarify this.


Yeah, but if they can have different requirements you must always construct two isolated environments to build a wheel. One for the sdist, and one for the wheel.


That’s what PEP 517 already says and no-one (as far as I know) is suggesting that we change that. The question here is whether frontends should call build_wheel directly on the source tree, or whether they should call build_sdist, unpack the sdist into a temporary location, then call build_wheel in that location, and then tidy up.

  • For building files for distribution, it makes sense to build the wheel(s) from the sdist.
  • For installers, where building the wheel is going to be a step in an install, I think it makes sense to leave the decision to the installer. There are trade-offs that individual tools should decide on for themselves.
  • For libraries, building a wheel should not imply building the sdist. Tools using the library can make that choice. A convenience function that builds both the sdist and the wheel, using the sdist to build the wheel, may be useful, but it should not be the default.

For other consumers, I don’t really have an opinion.
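For concreteness, the build-via-sdist sequence in question (call build_sdist, unpack into a temporary location, call build_wheel there, tidy up) could be sketched roughly as below. This is only an illustration: it assumes `backend` is an in-process object exposing the PEP 517 hooks, ignores build isolation and requirement installation entirely, and `build_wheel_via_sdist` is a made-up name.

```python
import os
import tarfile
import tempfile


def build_wheel_via_sdist(backend, source_dir, wheel_dir, config_settings=None):
    """Build an sdist from source_dir, unpack it, and build the wheel from the unpacked tree."""
    wheel_dir = os.path.abspath(wheel_dir)
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        sdist_dir = os.path.join(tmp, "sdist")
        unpack_dir = os.path.join(tmp, "unpacked")
        os.makedirs(sdist_dir)
        os.makedirs(unpack_dir)
        try:
            # Hooks run with the source tree as the working directory.
            os.chdir(source_dir)
            sdist_name = backend.build_sdist(sdist_dir, config_settings)

            # Use the "data" extraction filter where this Python provides it.
            extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            with tarfile.open(os.path.join(sdist_dir, sdist_name)) as archive:
                archive.extractall(unpack_dir, **extract_kwargs)

            # Assumes the sdist unpacks to exactly one top-level directory;
            # this unpacking raises ValueError otherwise.
            (top_level,) = os.listdir(unpack_dir)
            os.chdir(os.path.join(unpack_dir, top_level))
            return backend.build_wheel(wheel_dir, config_settings, None)
        finally:
            # Leave the temporary tree before it is deleted.
            os.chdir(original_cwd)
```

The wheel is written to `wheel_dir`, and only files that made it into the sdist are visible to `build_wheel`, which is the point of the approach.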

As things stand, there is no requirement that the build requirements must be the same. And adding such a requirement would definitely be a (backward incompatible) change to the standard. But I’d expect a good build tool to compare the return values of get_requires_for_build_sdist and get_requires_for_build_wheel (the only places a difference can arise) and if they are the same (which I’d expect them to be in the vast majority of cases) to re-use the environment.

(Note: I have no idea whether build implements environment re-use like this).
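A rough sketch of the comparison I have in mind, assuming an in-process `backend` object; the function names are hypothetical, and the string comparison is deliberately naive (a real tool would probably normalise the requirement strings properly).

```python
def requires_for(backend, hook_name, config_settings=None):
    """Call an optional get_requires_for_build_* hook; a missing hook means no extra requirements."""
    hook = getattr(backend, hook_name, None)
    return list(hook(config_settings)) if hook is not None else []


def can_reuse_build_env(backend, config_settings=None):
    """True if the sdist and wheel hooks report the same extra build requirements."""
    sdist_reqs = requires_for(backend, "get_requires_for_build_sdist", config_settings)
    wheel_reqs = requires_for(backend, "get_requires_for_build_wheel", config_settings)
    # Naive comparison: order-insensitive, whitespace-stripped strings only.
    return {r.strip() for r in sdist_reqs} == {r.strip() for r in wheel_reqs}
```

If this returns True, the environment set up for build_sdist could be kept for build_wheel; otherwise a second isolated environment is needed.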


Which I answered earlier with a (somewhat) clear “no”. Frontends should not do anything different than what the spec says.

As a best practice, though, we should definitely recommend that package publishers build their wheels from their sdists. Otherwise, how else will they know that their sdist works? :slight_smile: All of my binary builds go to sdist first, then I use the sdist on the matrix of machines I need to build binaries on.

([Extra] Am I detecting some more “we don’t want users complaining to pip about a broken backend” concerns here? Unfortunately, I still don’t think there’s going to be any way around that, and you’ll just have to keep redirecting lost and confused users to the right place. That’s the eternal curse of being the frontend…)


OK. My confusion was that you said “leave it to the backend”, by which I presume you mean “call build_wheel”. But the backend never builds wheels via the sdist (OK, they might, but none do in reality) so I wasn’t clear how that was relevant. But yes, if I read your comment as “no, just call build_wheel” I get what you’re saying now.

I didn’t mean there to be any, no. Pip isn’t affected by this discussion, as we already have a roadmap for what we’re doing here. So what gets decided here won’t make much difference to us.


Speaking as someone who works on a project that does just this, I would very much welcome the proposal as an official recommendation. At the very least it would be nice to have a strong official guideline that a published sdist, when unpacked, can actually be used as a build source. Some projects aggressively minify their sdists (for example by removing their setup.py), making this impossible.


Really? And they publish those as sdists? I’d have thought that it was obvious enough that this isn’t acceptable that it didn’t need to be stated explicitly :slightly_frowning_face:

I would 100% support saying that an sdist must contain either a setup.py or a pyproject.toml, and when unpacked must be a valid input for either setuptools (old-style setup.py-only sdist) or PEP 517 (pyproject.toml based). I’d be happy to have that added to PEP 517 as a clarification of intent - I don’t think unbuildable archives being published as sdists is in anyone’s interests.
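As an illustration of the first half of that rule only, a check along these lines could flag archives with no build entry point. It is a sketch under assumptions: a tar-based sdist with a single top-level directory, and a hypothetical function name. It says nothing about whether the unpacked tree actually builds.

```python
import tarfile


def sdist_has_build_entry_point(sdist_path):
    """True if setup.py or pyproject.toml sits directly under the archive's single top-level directory."""
    with tarfile.open(sdist_path) as archive:
        parts = [name.strip("/").split("/") for name in archive.getnames()]
    top_levels = {p[0] for p in parts if p[0]}
    if len(top_levels) != 1:
        # Not laid out as "<name>-<version>/..."; treat as not buildable.
        return False
    return any(len(p) == 2 and p[1] in ("setup.py", "pyproject.toml") for p in parts)
```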


Let’s see… the last example that comes to mind is numpy, which excludes one of its license files, but always attempts to read that license file when running setup.py. So it’s not enough just to require that setup.py and pyproject.toml exist in the sdist; any other files they try to access must also be in the distribution.
Hmm, actually reopening the numpy setup.py, it looks like maybe this license file is only read when creating specifically an sdist. Maybe it is a foolish requirement that an unpacked sdist must be able to create another (presumably identical) sdist, but without it, downstream packagers lose some of the consistency that recommending wheels always be built from sdists is meant to provide.

Nevertheless, some GitHub searching shows that there are in fact some projects that exclude either setup.py or pyproject.toml, not counting those that might exclude e.g. README.


The requirement should be that sdists can build the other distributions, such as wheels, which includes having either a setup.py or pyproject.toml.


Did you reach out to numpy devs to verify whether this is a deliberate decision? From your description it sounds very possible that it’s an oversight, especially since a license file is probably the one thing project maintainers would want to include in an sdist (so they don’t get sued etc.)


It’s hard to do much here without specifics. Have you reported this as an issue to those projects? “It’s not possible to build this project from the sdist” seems like a reasonable bug report. But anything that doesn’t prohibit building from the sdist is a different matter, and is more a quality of build issue than an actual bug - I certainly wouldn’t consider omitting a README to be a matter for a “strong official guideline”…
