Building sdist in place for local directories

pip currently copies the entire directory that it’s been requested to install with something like pip install . to a temporary location (minus .nox and .tox in the root).
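For context, the copy step looks roughly like this. This is a hypothetical sketch, not pip’s actual code — the function name and the exact ignore logic are invented; the real implementation lives in pip’s internals.

```python
# Rough sketch of the "copy the whole source tree" step, assuming only
# .nox and .tox at the ROOT of the tree are skipped. Names are invented.
import shutil
import tempfile
from pathlib import Path


def copy_source_tree(source: str) -> str:
    """Copy `source` to a fresh temp dir, skipping .nox/.tox in the root."""
    target = Path(tempfile.mkdtemp(prefix="pip-build-")) / "src"

    def ignore(directory, names):
        # Only skip .nox/.tox when they sit at the top of the tree;
        # nested directories with those names are copied as-is.
        if Path(directory) == Path(source):
            return {n for n in names if n in {".nox", ".tox"}}
        return set()

    shutil.copytree(source, target, ignore=ignore)
    return str(target)
```

Note that everything else — build artifacts, test data, anything not in the sdist — gets copied too, which is where the slow-build complaints come from.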

There’s been a lot of discussion on pip’s issue tracker about building an sdist in place, then unpacking that sdist in a temporary directory and proceeding from there.

Are there any major blocking technical concerns to doing this, beyond just implementing it? The implementation work is likely gonna be tricky but hey, let’s figure out if there’s any reason to not do this first.

My concern is that this would imply at least one more subprocess call, which on some platforms (Windows :thinking:) can still be costly compared to building the wheel in place. Why are we building the wheel off-site, exactly?

I believe this should be fine. Let’s go for it.

2 Likes

I’m thinking in terms of failure modes, which is why I think we should go via sdist – failing always is better than failing sometimes. With a faulty backend, for example, an install could work when done with pip install . but fail with pip install name, which gets the sdist built from that directory and uploaded to PyPI.

This kind of failure can become tricky/frustrating for the user to debug and I strongly prefer to not have that happen.

I guess it comes down to whether pip should trust backends to always do the right thing. I’m not too comfortable with that, given that it’s not guaranteed by PEP 517 and it’s super easy to mess up anyway. PEP 517 also says:

To ensure that wheels from different sources are built the same way, frontends may call build_sdist first, and then call build_wheel in the unpacked sdist.
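That recommended flow can be sketched as follows. The backend here is a toy stand-in (real ones are e.g. setuptools.build_meta and do actual builds); only the hook names and signatures follow PEP 517, and the archive naming is simplified for illustration.

```python
# Sketch of the frontend flow PEP 517 suggests: call build_sdist first,
# unpack it, then call build_wheel inside the unpacked tree. ToyBackend
# is a stand-in, not a real build backend.
import os
import tarfile
import tempfile
import zipfile
from pathlib import Path


class ToyBackend:
    """Minimal stand-in implementing the two PEP 517 hooks this needs."""

    def build_sdist(self, sdist_directory, config_settings=None):
        # Archive the current directory's .py files under name-version/.
        name = "pkg-1.0"
        path = Path(sdist_directory) / (name + ".tar.gz")
        with tarfile.open(path, "w:gz") as tar:
            for f in Path(".").glob("*.py"):
                tar.add(f, arcname=name + "/" + f.name)
        return path.name

    def build_wheel(self, wheel_directory, config_settings=None,
                    metadata_directory=None):
        # Zip the current directory's .py files into a (fake) wheel.
        path = Path(wheel_directory) / "pkg-1.0-py3-none-any.whl"
        with zipfile.ZipFile(path, "w") as whl:
            for f in Path(".").glob("*.py"):
                whl.write(f, arcname="pkg/" + f.name)
        return path.name


def build_via_sdist(backend, source_dir, out_dir):
    """tree -> sdist -> unpack -> wheel, as the PEP suggests."""
    tmp = tempfile.mkdtemp()
    cwd = os.getcwd()
    os.chdir(source_dir)
    try:
        sdist = backend.build_sdist(tmp)
    finally:
        os.chdir(cwd)
    # Build the wheel from the unpacked sdist, so the wheel can only
    # contain what actually made it into the sdist.
    with tarfile.open(Path(tmp) / sdist) as tar:
        tar.extractall(tmp)
    os.chdir(Path(tmp) / sdist[: -len(".tar.gz")])
    try:
        return backend.build_wheel(out_dir)
    finally:
        os.chdir(cwd)
```

The point of the detour is visible in the last step: any file that the backend’s build_sdist leaves out (a missing entry in MANIFEST.in, say) is also absent from the wheel, so the tree -> wheel and sdist -> wheel paths can’t silently diverge.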

So, yea, I don’t trust that all backends would implement this in the manner we’d need, and I’m not willing to trade that off against users being exposed to the (IMO) confusing failure modes, especially since the benefit is a single saved subprocess call for installations from directories/VCS.

Anyway, going via sdist would be an improvement over status quo so I don’t think this concern is a blocker.

If someone really wants pip to skip that extra subprocess call, I’m open to being convinced otherwise after we’ve rolled this out into the wild, treating this as an optimization.

I think that always building via sdist reduces the number of variations in behaviour, as you say, and I think that’s a good thing. The extra subprocess call doesn’t bother me (as a Windows user!) - in fact, we get enough complaints about slow builds because we currently copy the whole source tree, even data that wouldn’t be part of the sdist, that I think the net effect would be faster builds on average.

I can imagine adding an option to build wheels in-place (no source copy, just call PEP 517 build_wheel on the source tree) if there was sufficient demand for it. But it would put all the responsibility for ensuring that the build gave the expected result on the backend (no unexpected results caused by stale build artifacts, etc). There was some talk of such a thing during the PEP 517 discussions (mostly for incremental builds, IIRC) but no-one has really pursued the idea since, so I suspect it’s not as important as it seemed at the time.

We don’t really have useful PEP 517 tooling for developer workflows yet, so I think everyone’s still calling setup.py or cmake or whatever directly. I think it would be nice to eventually have this kind of tooling, so that when you git clone a random Python project there’s a consistent interface to building it, and that would definitely want to support incremental builds. But I don’t know if pip wants to be that tool or not. (I think it would be nice if pip supported these use cases. But I also think pip should have an upload command :-).)

1 Like

But that’s a major regression introduced with PEP-517. That regression IMHO shouldn’t be used as a blanket justification for allowing more extra/unnecessary operations once we fix it.

How so? Maybe fewer variations within pip, but not fewer variations overall. While pip might always do tree -> sdist -> wheel, other tools might do both tree -> sdist -> wheel and tree -> wheel. In that case pip would silently hide all bugs on the tree -> wheel path.

How so? The backend still provides an artifact into the requested destination. There’s no change from the current status quo. The frontend is entitled to poke around in the built artifact to perform any subsequent sanity checks. That being said, expecting a backend to produce the requested artifact seems like a sane assumption.

I’m personally still not sure what benefit we get by copying the source folder to a temp folder and then building the wheel there, compared to just building the wheel in place.

One of the main reasons for this is that the existing tooling for iterative development is slow. People will not adopt something that slows them down significantly.

To make sure we’re all on the same page here, no one is opposed to doing in-place builds of a distribution. If you are, please holler and @-mention me in your post mentioning why.

As for the discussion about choosing which one we do: tree -> sdist -> wheel or tree -> wheel, I’ll let y’all hash out the details. [1]


[1] I don’t really have much to add beyond my previous comment and, well, I have exams going on – I should really study/practice how to do Discrete Cosine Transforms, by hand… because college. :grimacing:

No, when you do pip install path-to-source-tree/, then it has always made a copy of the entire source tree in temporary storage and then run setup.py bdist_wheel there. PEP 517 replaces the setup.py bdist_wheel, but it didn’t introduce the copy step.

I meant the way pip implemented PEP-517.

(sigh) I’m pretty sure you mean well but I feel like I should respond to this:

It’s a bit much to call not doing an in-place build a “major regression”. I think you’re saying we should’ve switched to in-place builds when implementing PEP 517; we didn’t because:

  • no one advocated for it at the time
    • I wanted us to, but I didn’t demand it from fellow volunteers.
  • no one volunteered to do the work.
  • status quo wins by default.

Functionally, for the PEP 517 implementation, pip swapped its calls to setup.py with the relevant PEP 517 calls. That in itself wasn’t easy to implement, so let’s not critique any of the work that’s been done till now, since I don’t think that’ll help our discussion here.


All that said, IIUC, @bernatgabor is in favor of doing in-place builds (and feels we should’ve done that from the start).

IMO that does not raise any concerns w.r.t. the original question I asked above, which I’ll rephrase to include more context+emphasis:

Are there any major blocking technical concerns to doing in-place sdist (or wheel) builds, for local directories, beyond just implementing it?

ALSO super quickly, @bernatgabor and I had a text chat elsewhere about this and here’s my summary of it: (they OK’d me to post a summary, but might want to correct me if I’m wrong below)

  • If tree → wheel differs in behavior to tree → sdist → wheel, there’s a decent chance that the PyPI uploads are broken, if the maintainer isn’t using tox / check-manifest. And if the sdist is broken but not the wheel, that’s even less likely to be found until later – since pip will default to wheel installs.
    • Note that the failure is not at build time – but rather when installing from sdist.
  • There are no guarantees that the result of build_sdist → unpack → build_wheel should be the same as build_wheel. It’s reasonable to expect it to be, but it’s not required by the PEP. Maybe we should modify the PEP to explicitly make this a guarantee that the backend must provide/maintain.

  • Every subprocess call involves significant overhead as the interpreter needs to initialize itself all over again
  • Doing the extra sdist step makes pip behave more like a distribution validation tool, rather than an installation one
    • check-manifest exists because setuptools is bad at this :slight_smile:
      • improve setuptools to be better at this
  • The backend should take ownership of its correct behaviour. It would be preferable to have a tool that allows users to do such validations, rather than pip doing it on all systems, all machines, all the time
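For what that hypothetical validation tool might check, here is a sketch: build a wheel both ways (straight from the tree, and via the sdist route) and compare their contents. The "wheels" here are just zip archives, as real wheels are; the function names are invented.

```python
# Hypothetical sketch of an out-of-band check a separate validation tool
# could run once per release, instead of pip paying the cost on every
# install: do the two build paths produce the same set of files?
import zipfile


def wheel_contents(wheel_path):
    """Sorted list of files inside a wheel (wheels are zip archives)."""
    with zipfile.ZipFile(wheel_path) as whl:
        return sorted(whl.namelist())


def wheels_match(direct_wheel, via_sdist_wheel):
    """True when tree -> wheel and tree -> sdist -> wheel agree file-for-file."""
    return wheel_contents(direct_wheel) == wheel_contents(via_sdist_wheel)
```

A mismatch here would point at exactly the class of bug discussed above: files present in the tree build but missing from the sdist (or vice versa).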
2 Likes

My bad. The implementation of PEP-517 was a major win, build-correctness wise. The regression it introduced was one of performance: doing pip install on a source tree was much faster before. I agree that going down the PEP-517/518 path is the right choice; we should just try to ensure, as much as possible, that we don’t trade correctness for speed, and with that come up with a good solution to remove this performance regression.

1 Like

Huh, this is the first I’ve heard of this performance regression. Is this well known? Is there a pip issue or something I could use to catch up?

Thinking about it, maybe the isolated build feature could be the culprit? That’s definitely a correctness/speed tradeoff that has a lot of room for optimization. It’s not the topic of this thread, though.

People interested in this matter may want to have a look at pip issue 3500 and in particular this comment and following discussion with @pf_moore.

That issue discusses problems with building out of tree other than performance.
Another potential issue is the case of read-only source trees.

One question I asked (not knowing why pip builds out-of-tree in the first place) was whether it is considered safer to build an sdist in place than to build a wheel in place (building an sdist does modify the source directory in some cases).

Another remark I made there was that today pip wheel . (in pep517 mode) and python -m pep517.build -b . can give different results, the latter building in place.

I thought it might be interesting to discuss whether PEP 517 needs to be more explicit about where the build takes place, to ensure predictable results independently of the frontend being used.