Proposal: Adding a persistent cache directory to PEP 517 hooks

pf_moore · May 8, 2020, 8:18am

Are we not at this point back to the “trust the backend” dilemma that we hit with PEP 517 itself? (Sorry if people find that phrase confrontational - I don’t mean it as such, it just encapsulates well for me the idea that was discussed at the time that the PEP shouldn’t try to mandate behaviour from the backend other than that it produces the required wheel).

I understand that backends might not be able to control all the tools they run, but I do think the PEP should be clear that if the wheel doesn’t get built correctly because the backend can’t respect the cache, it’s the backend’s problem to deal with that.

C/C++ standards have an “as if” concept that I think is useful here - the PEP can say “backends must act as if all generated artefacts get stored there”, meaning that they don’t have to do it, but they have to hide that fact from front ends. Maybe we should use that idea here? I’d certainly prefer it over wording that meant that front ends could supply a cache, but were then expected to deal with the possibility that the backend ignored it (which is basically what “SHOULD” says…)

bernatgabor · May 8, 2020, 6:30pm

I’m happy with making it “as if”. But there should be some mechanism to allow backends that can build out of the source tree to do so, and to allow the frontend to communicate this to the backend.

dstufft · May 8, 2020, 8:03pm

Yeah, sorry, I was speaking more to the discussion further up about pip somehow working around a mandated hook by passing in a temporary directory each time, or trying to shoehorn pip’s own cache into this.

pf_moore · May 8, 2020, 8:39pm

Just to clarify:

1. I don’t see any problem in general with PEP 517 adding a “cache directory” option.
2. I don’t think pip will be interested in using it (but that’s a discussion that should be had among the pip developers later).
3. I think callers should be allowed to not specify a cache directory.
4. I think backends should be required to behave as if everything is stored in the cache directory, if provided. I.e., front ends should be allowed to assume that if they supply a cache directory, they don’t have to write code to cover for it being ignored.

I’m not clear what the precise semantics of a cache directory are. I don’t really need to know the semantics from the backend’s perspective, but from the front end’s perspective, I would like to assume that if there’s a cache directory, then repeated builds using different cache directories would not interfere with each other. This would mean that front ends could safely do “in-place” builds. If that isn’t the case, I think it should be explicitly noted in the PEP, as I’m pretty sure people other than me will make that assumption.

Apart from the question of “in-place” builds, my assumption of what the cache means for front ends is “if you pass the same cache for two builds, the second build may be faster - but it’s your responsibility to ensure that exactly the same build is being requested; using the same cache for different builds is not permitted”. Is that accurate?

I’d like it if the semantics of not specifying a cache directory were defined as being the same as if the front end passed a temporary directory that was immediately deleted after the build. That makes “in-place” builds safe by default. If setting a build cache doesn’t mean that in-place builds are safe, I don’t care about this point, though.
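To make the “no directory means a throwaway directory” semantics concrete, here is a minimal sketch of how a backend might honour them. The `build_directory` argument name is borrowed from later in this thread; it is not part of PEP 517 today, and `resolve_build_dir` and the placeholder wheel are hypothetical, for illustration only.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def resolve_build_dir(build_directory=None):
    """Yield the directory where intermediate build artifacts should go.

    `build_directory` is a hypothetical hook argument (not in PEP 517).
    If the frontend passes None, behave as if it had passed a temporary
    directory that is deleted immediately after the build.
    """
    if build_directory is not None:
        path = Path(build_directory)
        path.mkdir(parents=True, exist_ok=True)
        yield path  # persistent: the frontend owns and manages it
    else:
        with tempfile.TemporaryDirectory(prefix="build-cache-") as tmp:
            yield Path(tmp)  # throwaway: removed when the block exits


def build_wheel(wheel_directory, config_settings=None,
                metadata_directory=None, build_directory=None):
    """Toy stand-in for a backend hook; it writes a placeholder, not a real wheel."""
    with resolve_build_dir(build_directory) as cache:
        stamp = cache / "compiled.stamp"
        if not stamp.exists():
            # A real backend would compile here; later builds that reuse
            # the same directory could skip this step.
            stamp.write_text("pretend object files live here\n")
        name = "demo-1-py3-none-any.whl"
        (Path(wheel_directory) / name).write_bytes(b"")
    return name
```

Nothing is ever written to the source tree in either branch, which is the property that would make in-place builds safe by default.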

bernatgabor · May 10, 2020, 9:15pm

I had the impression that pip issue #7555 (“Solving issues related to out-of-tree builds”, pypa/pip on GitHub) could use this.

Agreed.

Yes.

Wouldn’t this contradict how setuptools works now? It stores its build artifacts in-tree, so subsequent builds would not behave as if the artifacts had been deleted after the first build.

pf_moore · May 10, 2020, 9:37pm

I was meaning a persistent cache. In-tree builds (the issue linked) could pass a temporary directory and delete it immediately afterwards, certainly. But that’s why I’d prefer to say that if the frontend doesn’t specify a cache directory, the backend should do this anyway - I see no real use case for not isolating build artifacts unless the frontend is aware of it (and will therefore help manage the cache).

Sorry for not being more explicit.

Yes it would. What I’m not clear about is who actually finds setuptools’ current behaviour useful. Given that most (?) builds go via pip, and setuptools’ current behaviour is a problem for pip, what I’m trying to find out is who actually wants setuptools’ current behaviour.

But it’s not a huge deal. If we want to say that backends can store artifacts in-place when the frontend doesn’t specify a cache directory, then pip will just specify a throwaway directory and we’re fine.

bernatgabor · May 10, 2020, 11:10pm

Be that as it may, can we really introduce a breaking change for PEP 517, given we did not mandate any such behaviour until now?

pf_moore · May 11, 2020, 7:32am

I don’t see this as “breaking” for consumers, as it’s just defining something that was left undefined previously. For backends, it’s a new feature, and so “breaking” in the sense that the default for the new argument doesn’t match the behaviour that currently occurs with the version of the hook that has no cache directory argument. But generally I think of “breaking changes” as being from the consumer’s point of view, so I don’t feel this is a major issue.

I thought this was mandating a change to setuptools anyway, as there’s currently no means of setting a build cache at all. And of course, setuptools can continue behaving as it does now when not called via PEP 517; it’s just that PEP 517 builds will no longer pollute the source directory.

As I said, if anyone has a use for PEP 517 builds putting build artifacts in the project root, I’m happy to drop the idea. I’m not trying to argue strongly for it - I just think that “safe by default” is a better approach for frontends in general, and specifically for pip. (But even if we do add this, someone who really wants the existing behaviour could, of course, specify build_directory=project_root.)

pganssle · May 11, 2020, 1:11pm

Depending on exactly what you mean by setuptools’ current behavior and how pip plans to change, I think I have some examples where it is useful.

Taking as assumptions that the change to setuptools is that instead of putting build artifacts in the build root, setuptools will always put them in the cache directory, which if unspecified will be a setuptools-generated temporary directory, the options for pip are:

1. Always pass None, in which case no build artifacts persist after the build is complete.
2. Pass a build directory in the source root.
3. Create a persistent out-of-tree cache directory somewhere, to be managed by pip.

I think that if you do the first one, you’ll have the same problems that this thread started with — incremental builds are impossible and no C extensions of any size will choose to use PEP 517. The third one is more workable, because the artifacts continue to exist, but it’s way less discoverable that any sort of build cache exists, which could cause some serious bugs. The second one, an in-tree cache, is basically what setuptools does now. It’s annoying for reproducibility (as is the third option), but at least git clean will remove it.
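As a rough illustration of those three options, a frontend's choice could be reduced to something like the sketch below. `CachePolicy`, `choose_build_dir` and the directory layout are hypothetical names made up for this example; neither pip nor PEP 517 defines them.

```python
import enum
from pathlib import Path


class CachePolicy(enum.Enum):
    NONE = "none"            # option 1: pass None, nothing persists
    IN_TREE = "in-tree"      # option 2: a build directory in the source root
    OUT_OF_TREE = "managed"  # option 3: a frontend-managed directory elsewhere


def choose_build_dir(policy, source_root, managed_root):
    """Return the value a frontend would pass for the hypothetical hook argument."""
    source_root = Path(source_root)
    if policy is CachePolicy.NONE:
        return None
    if policy is CachePolicy.IN_TREE:
        # Discoverable, and removable with `git clean`.
        return source_root / "build"
    if policy is CachePolicy.OUT_OF_TREE:
        # Keyed only by the project directory name, which is an assumption
        # made for brevity; a real frontend would need a collision-free key.
        return Path(managed_root) / source_root.name
    raise ValueError(f"unknown policy: {policy!r}")
```

The out-of-tree branch shows the discoverability problem: nothing in the source tree tells the user that a cache exists or where it lives.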

One thing I’ll note is that changes to this may affect coverage in C extensions. gcov has some assumptions baked in about the locations of .o files, in that they need to exist and they need to be in the tree. The current advice I see out there is to use setup.py build_ext -i or to use an editable install (neither of which is going to be viable in a PEP 517-only world). I decided to be a guinea pig for this in the reference implementation for zoneinfo, and found that the only reasonable thing to do in the world of out-of-tree builds was to copy the build files out of the temporary directory in my setup.py. If tox has a persistent build cache this won’t be a problem (because those files only need to live as long as it takes to run gcov anyway), but it’s suggestive that there may be other issues at play here. For example, how will this affect debugging with gdb if pip always passes None?
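For anyone wondering what “copy the build files out of the temporary directory” amounts to, it is roughly the following. This is a simplified sketch rather than the actual zoneinfo code; `copy_build_artifacts` is a hypothetical helper, and the default suffix list is an assumption about which files the coverage tooling needs.

```python
import shutil
from pathlib import Path


def copy_build_artifacts(build_temp, dest_root, suffixes=(".o", ".gcno")):
    """Copy selected build outputs into the tree, keeping their relative layout.

    `suffixes` is an assumption; adjust it to whatever your tooling expects.
    """
    build_temp = Path(build_temp)
    dest_root = Path(dest_root)
    copied = []
    for src in sorted(build_temp.rglob("*")):
        if src.is_file() and src.suffix in suffixes:
            dest = dest_root / src.relative_to(build_temp)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied
```

A helper like this has to run before the temporary build directory is deleted, which is why it ends up inside setup.py rather than in the test runner.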

pf_moore · May 11, 2020, 1:43pm

Thanks @pganssle, yes it does sound like we’re talking at cross purposes here.

From my point of view, and I think @dstufft was saying more or less the same, pip won’t be trying to support incremental builds in terms of managing a cache for build artifacts - the cache management requirements don’t match pip’s current caches, and adding a whole cache infrastructure for this seems like a bad idea, given that pip doesn’t know enough about how builds work to make good choices. At best, pip might offer a way for end users to pass on a directory that the user manages - but I’d be cautious about that, as it seems like a recipe for pip to get bug reports caused by user mismanagement of the cache.

So I expect pip to not supply a cache directory, and my only interest in this feature from pip’s point of view is to specify “somewhere other than the source root” for build artefacts on the basis that this will allow us to do in-place builds safely.

It sounds like you have a different expectation, based around wanting to do incremental builds, but I don’t have a clear understanding of how you expect the front-end interface for that to work, so I can’t really comment on whether that’s something pip would support. If this is driven by a desire for some sort of front end feature, maybe we should restate the problem by describing what you’d like that feature to be, and then we can design the hook based on what pip needs if it’s to add that feature, rather than offering a hook design without a frontend feature that needs it?

[Edit: Re-reading that last sentence, it comes across as “you did this the wrong way around”. I didn’t mean that - all I meant was let’s try starting from the user interface end of the feature, and see if that ends up at the same place that we got to by starting from the backend capability. Sorry if I implied differently]

steve.dower · May 12, 2020, 8:35am

I think for Paul’s scenario (which is also often mine), the answer is just to run the backend directly.

While developing one of my projects right now, I use setup.py build_ext all the time, and then add some path manipulation into my tests to do it for me and get the results in the right place.
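The path manipulation is nothing clever. A minimal sketch, assuming the default setuptools layout where build_ext output lands under build/lib* in the project root (`add_build_lib_to_path` is a hypothetical helper name):

```python
import sys
from pathlib import Path


def add_build_lib_to_path(project_root):
    """Put the most recently modified build/lib* directory first on sys.path.

    Returns the chosen directory, or None if nothing has been built yet.
    """
    candidates = [p for p in Path(project_root).glob("build/lib*") if p.is_dir()]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    if str(newest) not in sys.path:
        sys.path.insert(0, str(newest))
    return newest
```

Something like this would be called from the test setup (a conftest.py, for instance) before the extension module is imported.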

For a release, I create the sdist directly and then use pip wheel to make my wheels (not I, but CI).

But I’m happy to ignore the prominent view that we should never touch the backend. Perhaps this goes along with the editable installs issue, and we really just need a “developer install” that can also do incremental builds and doesn’t keep cleaning up after itself. But at least stop making it sound like such a bad thing for developers to just use their backend during development.

(The other issue of when the target environment has different dependencies than what’s available for the build environment is also a developer scenario, and also resolved by bypassing the front end right now.)

pf_moore · May 12, 2020, 9:21am

+1 to this. The idea that people shouldn’t use the backend directly is very much what results in people expecting frontends like pip to support every possible workflow, and PEP 517 needing to support features that may not make sense in all backends (I doubt flit has any concept of incremental builds, for example).

pganssle · May 12, 2020, 3:02pm

Re-reading the original post, I realize I did not go into the reasoning for the incremental build support thing, which is something I consider a major issue. The problem is that pip’s transition to PEP 517 came bundled along with out-of-source builds, which means that for projects like matplotlib and pandas, “use PEP 517/518 for builds” became a non-starter, because every minor change was a fresh build, which can take minutes or longer. It’s a barrier to adoption for PEP 517.

Some libraries have gone ahead and adopted PEP 517, but it complicates their dev workflow, because in order to avoid the fresh build they have to just delete pyproject.toml (and as a result need to install all their build dependencies manually…).

So we need some solution to this problem, otherwise we won’t get people adopting modern build workflows. My assumption at the time was that pip was concerned about isolated and clean builds and would not want this incremental build behavior by default (hence the “copy-the-source-tree” approach). This cache directory is useful for allowing pip to provide users a mechanism to choose the kind of build they want (clean, incremental, etc).

I think the proposal is strong based on solving the tox problem alone, but I do think that if we want wider adoption of PEP 517, pip needs to support incremental builds.

This is fine as long as that backend isn’t setuptools, where we’re aiming to deprecate all front-end invocation of setup.py for a variety of reasons. One of the biggest reasons being that we want it to be possible for you to have dependencies in setup.py, and “directly invoke the backend” gets a lot more complicated in that scenario.

I’ll also say that I think that “install this in a way that is totally different from the way it’s installed in prod” is not a good idea for designing test workflows like the “artifacts must be preserved for coverage purposes” use case.

pf_moore · May 12, 2020, 4:31pm

Considering that in 20.1 we switched to in-tree builds, that’s clearly not the case any more.¹ Unfortunately, we had to revert because the artifacts setuptools leaves in the source tree are a problem for significant use cases. So in reality, we’d be 100% happy with setuptools working to provide clean in-tree build support so we could switch to that mode without hurting users.

My stance that “pip will just pass a throwaway directory” is specifically based on the fact that pip just wants in-tree builds to work, and we don’t want anything to do with getting in the way of whatever backends choose to do to make that happen. On the other hand, we also don’t want to get involved in managing incremental builds, so I wouldn’t expect pip to ever offer any sort of option to allow users to control such details (beyond the general approach of backend-specific config_options).

This probably causes some of the confusion, because you’re viewing the proposal as “enable incremental builds” whereas I’m interested in it for the (possibly incidental) benefit of making in-tree builds safe (or at least safer).

If there are features of setuptools (such as incremental builds) that are currently only available via direct setup.py invocation, then certainly it’s fine for setuptools to say that that interface will ultimately be deprecated. That’s a question entirely for setuptools to decide. But equally, it’s for setuptools to determine what people who use that functionality should do in future - there’s no reason to assume that the migration path for users will be to being able to access those features via the frontend.

We seem to be getting into a situation where there’s an assumption that pip will implement every feature that’s exposed by PEP 517, so that “writing an interop standard” is a proxy for “get pip to support this”. I don’t think that’s realistic, and we should be careful not to make that assumption. Whether pip implements any given (optional) PEP 517 feature will be a decision for pip to make, and “what if no frontend chooses to offer access to this feature” is a question that proposals need to address.

¹ It is true, though, that the pip developers have never really presented a clear view of what we prefer here. So it would be surprising if you did correctly judge pip’s preference, because I’m not sure we had one!

bernatgabor · May 12, 2020, 4:50pm

To me it feels like a good thing to take advantage of incremental builds, or at least to allow users to do so when they know they’ll be reinstalling the same folder over and over:

pip install .  # slow: takes ages because it needs to compile C
pip install .  # second run is now fast

If not by default, then at least via some manual and explicit interface such as --cache-build-folder /tmp/ok.

pganssle · May 12, 2020, 5:16pm

I am advocating two separate things here:

1. I think that we should add this feature to PEP 517, regardless of what pip does. At minimum tox could use it.
2. I think that pip should support incremental builds because people want to use pip to install things in their dev workflows and because pip currently supports incremental workflows in non-PEP 517 builds, it is a blocker for PEP 517 adoption.

Currently #2 is technically solved by using in-tree builds if that’s going to continue, but my impression was that the in-tree build change was going to be reverted in the next version. If in-tree builds are not reverted and we add this, then having pip pass a temporary directory for build artifacts would be a pretty significant reversion for some people.

The question of what pip does is important in the discussion of the interop standard insofar as we should take pip into account as a very important front-end, but I don’t think anything we’ve proposed to go into the PEP would mandate that pip or any other front-end do anything here.

pf_moore · May 12, 2020, 6:51pm

That’s a pip feature request and not really on-topic for here. I have some reservations that I mentioned above, but there’s no real point getting into too much detail until there’s a framework in place that pip could take advantage of if the feature request were deemed to be a good thing.

And yes, I know things get a bit circular at this point, if the justification for the hook is based on frontends using it…

That is correct, but basically because the build artifacts that setuptools leaves in the build directory make in-place builds a problem for more users than we’re comfortable with breaking. This issue is a good starting place if you’re interested in the details. If setuptools is able and willing to address some or all of the problem scenarios, then I’m pretty sure pip would switch back to in-place builds like a shot.

My understanding of the proposal is that even if it’s not the intended use, passing in a throwaway directory would force setuptools to put the problem build artifacts outside of the source tree, so we could go back to in-place builds safely. Doing so would unfortunately not enable incremental builds, but we’re already in that situation, so I’m not too worried about that. We can address that later.

There’s some really hairy code in pip in this area that I haven’t looked at in a while, so please assume that anything I say here could be wrong. And I don’t have the time right now to investigate, or look into the implications of what I’m saying here, but I’m baffled as to why we get such pushback over in-place builds if that’s what people get via the legacy install route. My assumption was that we did out-of-tree builds in all cases, but I can’t reconcile that with your statement that pip supports incremental builds in the legacy case.

Honestly, I’m really confused at this point as to what pip’s actual behaviour is. And if I, as a pip maintainer, am this confused, I’d certainly advocate extreme caution when it comes to making assumptions about what extra complexity pip can realistically sustain in this area.

pganssle · May 12, 2020, 7:23pm

Yeah, I’m aware of the issue.

I think I may have been mistaken on this point, and possibly mistaken as to whether incremental builds is a blocker for adoption of PEP 517. In this issue I seemed to think that adding a pyproject.toml was triggering the “build-out-of-source” behavior, but that may have been related to the particular version of pip I was using throwing exceptions on pip install -e (which does its builds in-tree).

I will say, though, that the idea that pip would deliberately not support the use case of incremental builds is a bit world-shifting to me. For the past few years I’ve been under the impression that we were moving towards pip as “the way to install Python packages”, but incremental builds are such a critical part of a compiled extension-based workflow that if pip is not interested in supporting that use case it can hardly be considered a reasonable recommendation. We’re definitely left in a terrible void where setuptools wants to stop being a front-end, but no front-ends exist that support this common workflow (possibly tox, I suppose, but the ergonomics of tox are not ideal for many common situations).

Certainly we can move the discussion of “will pip support incremental builds” to a ticket on pip, particularly if you think it has no bearing on the cache function.

uranusjr · May 12, 2020, 7:41pm

I tried the behaviour, and it seems to me @pf_moore’s remembering correctly:

$ ls -a
.  ..  setup.py

$ pip list
Package    Version
---------- -------
pip        20.0.2
setuptools 46.0.0

$ pip install .
Processing c:\users\uranusjr\downloads\testcase\package
Installing collected packages: package
    Running setup.py install for package ... done
Successfully installed package-1

$ ls -a
.  ..  setup.py

$ pip uninstall -qy package

$ pip install -q wheel

$ pip install .
Processing c:\users\uranusjr\downloads\testcase\package
Building wheels for collected packages: package
  Building wheel for package (setup.py) ... done
  Created wheel for package: filename=package-1-py3-none-any.whl size=957 sha256=36ce21cf5b304beb4ded83b7d74ce3e39ff3e4c128995a703b454489ce8a5977
  Stored in directory: C:\Users\uranusjr\AppData\Local\Temp\pip-ephem-wheel-cache-4i5r1yez\wheels\93\54\9a\90f29ce83c8ce35e731cf3b472d43fb67c94f0aa21cfea47fd
Successfully built package
Installing collected packages: package
Successfully installed package-1

$ ls -a
.  ..  setup.py

IOW I believe pip currently (before 20.1) always copies the source tree, even for non-PEP-517 projects.

pip install -e . does build in-place, and I think is what @pganssle is recalling that prevents projects switching to PEP 517. The editable flag is not supported by PEP 517, but they can’t use non-editable instead because that builds out-of-tree. So the only choice left is not adopting the PEP.
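In other words, the observed behaviour is consistent with something like the sketch below: copy the tree to a temporary location, build there, and throw the copy away. This is an illustration of the idea only, not pip's actual code; `build_in_copy` and the `build` callable are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def build_in_copy(source_root, build):
    """Run `build` against a temporary copy of the source tree.

    `build` is any callable taking the copied tree's path. Whatever it
    leaves behind is discarded along with the copy, so the original tree
    stays clean but nothing can be reused by the next build.
    """
    with tempfile.TemporaryDirectory(prefix="build-copy-") as tmp:
        work = Path(tmp) / "src"
        shutil.copytree(source_root, work)
        return build(work)
```

That is why `ls -a` shows only setup.py after each install above, and also why every non-editable install of a C extension starts from scratch.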

pf_moore · May 12, 2020, 7:54pm

I’ll just say at this point that I always considered “support incremental builds” to be a back-end responsibility that shouldn’t need any special consideration from the front end. So it’s not so much “deliberately not support” as “never expected to have to be involved”.

That’s pretty much what I understood the discussion way back when PEP 517 was being debated (the one that I characterise as “trust the backend”) to be about. But (a) it was a long time ago, and someone should probably go and review that discussion to make sure we’re not just going over old ground, and (b) I’m very tired, so I should not say anything more now, for fear of confusing things further…
