Proposal: Adding a persistent cache directory to PEP 517 hooks

The only way pip can solve this is to pass in a different cache directory per Python interpreter, and for the backend to adhere to that, because here you point at setuptools as the culprit:

But then you just said that setuptools cannot be expected to solve this, and this sentence puts the ball very much back in setuptools' court.

We can also solve it by just not caching anything between builds, which I think is the option both Paul and I prefer.

But that would be a setuptools feature request, and one that would be very unpopular for source tree builds, I'd guess. :man_shrugging: I get that it's less of a problem for the build frontend, but is it really what's best for the end user? That is what I think @pganssle is getting at, and I agree with him on that.

Incremental builds are not inherently best for the user. They are faster on rebuilds, but they also introduce subtle edge cases where things will break for users, particularly in strange, hard to debug ways. I could just as easily say that what’s best for end users is that the output of the build takes a little longer, but more consistently produces the correct artifact.

I tend to agree on the "little", but I define little as ~1s. In practice, for a C extension rebuild, "little" may mean a lot more.

Most C extensions in the Python ecosystem are relatively quick to build. The ones where it takes an appreciable amount of time are the exception, not the rule.

I’m confused. I don’t think I said that, but if I did, I’ll stick with “I’m confused”, because I guess I must be :slightly_smiling_face:

From a front end point of view, I still think that pip should just be calling build_wheel and getting a wheel back, and that’s it. Pip’s users have pointed out that we can’t just do that in the case of setuptools, and so we currently copy the build tree. We’d like to not do that (because copying the build tree triggers a whole load of other issues our users care about). But if setuptools doesn’t work in such a way that we can call build_wheel multiple times on the source directory the user supplies, then so be it. PEP 517 isn’t explicit on the backend’s responsibilities so we have to live with that.

The current proposal may or may not be linked to that issue. It’s designed to help with incremental builds, at least for setuptools (I don’t know if any other backends would benefit). I’m fine with that. The proposal doesn’t say anything about how front ends would use the interface, so to that extent “what pip will do” is irrelevant here. It’s possible that pip will choose to use the interface in a way that doesn’t make incremental builds any easier with pip (maybe as a result of trying to use it as a workaround for the issue of copying the source tree). If that’s a problem, then the proposal needs to make more demands on how front ends use it. It’s the converse of the problem we have with build_wheel not specifying enough restrictions on the back end, I guess. And I suspect we’ll hit the same problem, that we won’t be able to get consensus on over-constraining front ends.

I’m not sure this aspect of the discussion is particularly productive any more, though, so I suggest we let it drop.

I was referring to @dstufft under:

Ah, OK. Yes, it’s quite possible @dstufft knows more about this than I do :slightly_smiling_face: (But I suspect there’s still a middle ground where setuptools could make changes to reduce the chance of issues, if not eliminate them totally).

Anyway, I'll write up a PEP by next Monday on the need to add this cache interface, and we can go from there. :+1:

Can you link the PEP?

I just wanted to give a heads up that I actually took the effort to write up the PEP, and I will post it for public review later this week once I get the first batch of reviews on it.

Hi, I created a PEP draft. See here: "TBD: Persistent cache for packaging frontends" by gaborbernat (Pull Request #1976 in python/peps on GitHub). I'm also looking for someone who is willing to sponsor it and believes in the need and solution for this.

Some thoughts.

  1. This is not backward compatible. If a frontend supplies this argument, and the backend is an older one that isn’t aware of this new requirement, it will fail. The frontend has no way to know if this is the case. The original proposal suggested having an attribute on the backend, but we should probably look more closely at how we evolve PEP 517 - adding a whole bunch of feature attributes over time doesn’t seem very scalable.
  2. It’s not clear to me what the frontend requirements mean - “The build frontend must persist the content within the cache directory, such as subsequent calls can recall data saved in prior calls” vs “The folder must be empty during the first backend call”. How long must the frontend persist the data for, and based on that answer what counts as the “first” backend call?
  3. I think the “motivation” section needs improving. I’m not convinced that “the backend pollutes the source directory” is a compelling argument (for pip’s use cases, at least) - there are plenty of tools that add directories to the user’s source directory - tox, mypy, and coverage, for example. Why are build backends so special?
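
To make the backward compatibility concern in point 1 concrete, here is a rough sketch of how a frontend might guard the call today. The `cache_directory` keyword is a hypothetical name (the thread does not fix one), and signature introspection is only one possible workaround, not something PEP 517 or the draft prescribes. It also assumes the hook is called in-process, which real frontends usually avoid.

```python
import inspect


def supports_cache_dir(hook, param="cache_directory"):
    """Return True if the hook can accept the (hypothetical) cache keyword."""
    params = inspect.signature(hook).parameters
    if param in params:
        return True
    # A **kwargs catch-all would swallow the argument without failing.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_build_wheel(backend, wheel_directory, cache_directory, config_settings=None):
    """Call build_wheel, passing the cache directory only if the hook accepts it."""
    kwargs = {}
    if supports_cache_dir(backend.build_wheel):
        kwargs["cache_directory"] = cache_directory
    return backend.build_wheel(wheel_directory, config_settings, **kwargs)
```

With an older backend the frontend silently falls back to the plain PEP 517 call, so it still cannot tell whether a backend that accepts `**kwargs` actually honours the cache. That gap is what an explicit capability signal would have to close.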

For what it’s worth, from pip’s point of view I don’t see any clear benefits, and a reasonable amount of complexity in implementing the feature, so I’d expect that we’ll ignore this option.

Because pip copies the entire source directory to a temp folder, this is less important for pip. The idea here is a UX benefit: doing a package build does not pollute the current working directory with build backend cache files, and instead allows frontends to move them to a more hidden location that's managed by the frontend (for example, tox would move the build and .dist-info folders of setuptools into the .tox folder).
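
As a rough sketch of what "a location managed by the frontend" could look like, the snippet below picks a cache directory under a frontend-owned root and keys it by interpreter, so that builds for different Pythons do not share state. The `build-cache` layout and both function names are assumptions made for illustration; no frontend is known to use exactly this scheme.

```python
import sys
import sysconfig
from pathlib import Path


def interpreter_tag():
    """Build a short identifier for the running interpreter."""
    impl = sys.implementation.name
    version = "{}.{}".format(*sys.version_info[:2])
    platform = sysconfig.get_platform()
    return f"{impl}-{version}-{platform}"


def build_cache_dir(frontend_root, project_name):
    """Return (and create) a per-project, per-interpreter cache directory."""
    # Hypothetical layout: <frontend_root>/build-cache/<project>/<interpreter tag>
    root = Path(frontend_root) / "build-cache" / project_name / interpreter_tag()
    root.mkdir(parents=True, exist_ok=True)
    return root
```

A tox-like frontend might call `build_cache_dir(".tox", "myproject")` and hand the result to the backend, leaving the source tree itself untouched.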

We won’t be copying the source in future, but I still don’t see this as being useful for us. But :man_shrugging: if other frontends want to clean up the source directory, who am I to argue :slight_smile:

One thought I do have, what if someone uses tox to do their testing, and pip to install their project? If tox is using a different build cache, does that mean there’s a risk that behaviour could be different because pip and tox supply different cache directories? For example, testing with tox using a clean cache everything looks fine, but then pip wheel builds an invalid wheel because there’s out of date stuff in the cache pip uses - which will likely be the default of the source directory?

I'd imagine it's the responsibility of the build backend to ensure the cache it starts with is still up to date and valid. So in my book, such a use case would be classified as a build backend bug.
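
One way a backend could take on that responsibility is to keep a manifest of content hashes inside the cache and treat anything that no longer matches as stale. The sketch below is only an illustration: the `manifest.json` file name, the `*.c` pattern and the function names are assumptions, and setuptools does not necessarily work this way.

```python
import hashlib
import json
import sys
from pathlib import Path

MANIFEST = "manifest.json"  # hypothetical file name inside the cache


def file_digest(path):
    """Hash a file's content so edits are detected regardless of timestamps."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_sources(cache_dir, source_dir, pattern="*.c"):
    """Return source files that must be rebuilt, then record the new state."""
    cache_dir, source_dir = Path(cache_dir), Path(source_dir)
    manifest_path = cache_dir / MANIFEST
    try:
        previous = json.loads(manifest_path.read_text())
    except (OSError, ValueError):
        previous = {}
    # A missing, malformed, or other-interpreter manifest means nothing is trusted.
    if not isinstance(previous, dict) or previous.get("interpreter") != sys.version:
        previous = {}
    old_files = previous.get("files", {})
    new_files = {
        p.relative_to(source_dir).as_posix(): file_digest(p)
        for p in sorted(source_dir.rglob(pattern))
        if p.is_file()
    }
    stale = [name for name, digest in new_files.items() if old_files.get(name) != digest]
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps({"interpreter": sys.version, "files": new_files}))
    return stale
```

A real backend would presumably write the manifest only after the rebuild succeeds, and would also track headers and compiler flags. The point is just that the same check gives the same answer whether pip or tox supplied the directory.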

I also think it definitely won't be useful for pip. But it's very useful to frontends that build a project repeatedly, e.g. for development purposes. For example, a dev tool can run in the background listening for source code changes, and incrementally rebuild (+ reinstall) the project into an environment for testing, similar to how npm-watch and sphinx-autobuild work. This will work better for native modules than editables (because the extensions still need to be rebuilt), and is extra powerful when combined with other modification-watching tools like pytest-watch.
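
A minimal sketch of such a watcher, assuming simple mtime polling rather than OS file notifications. The `rebuild` callback is a placeholder for whatever the dev tool does on change (for instance calling the backend with a persistent cache directory), and `max_polls` and `sleep` exist only to make the loop easy to stop and test.

```python
import time
from pathlib import Path


def snapshot(source_dir, pattern="*.py"):
    """Map each matching file to its last-modified time in nanoseconds."""
    return {p: p.stat().st_mtime_ns for p in Path(source_dir).rglob(pattern) if p.is_file()}


def watch(source_dir, rebuild, interval=1.0, max_polls=None, sleep=time.sleep):
    """Poll source_dir and call rebuild(changed_paths) whenever files change."""
    seen = snapshot(source_dir)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        polls += 1
        current = snapshot(source_dir)
        # Added, removed, and modified files all count as changes.
        changed = sorted(
            p for p in current.keys() | seen.keys() if current.get(p) != seen.get(p)
        )
        if changed:
            rebuild(changed)
        seen = current
```

The watcher itself keeps no build state. Any speed-up on the second and later rebuilds would have to come from the backend reusing what it left in the cache directory.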

I could see a world where pip does not set this by default, but allows the user to set it via a flag, for all the automation workflows that use pip under the hood to build and install their packages :thinking:

I would hope those tools stop doing that altogether instead. pip is really not a good tool for that (way too slow to launch repeatedly, for one).
