Proposal: Adding a persistent cache directory to PEP 517 hooks

Currently, a PEP 517 front-end can specify a wheel_directory, and I believe the hooks are guaranteed to be executed from the repository root (though I think pip first copies your directory into a temporary directory, so the “repository root” is ephemeral anyway), but there is no standard way for front-ends to pass a non-ephemeral location other than the repo root to the backends.

One problem this causes is that setuptools will generate a bunch of build detritus directly into your repo root like build/ and/or lib/, which is not a great practice. I think this also makes it so that it is difficult for pip to cleanly implement any sort of incremental builds, because the only available location for the build detritus is the repo root.

There is an open issue in setuptools to allow moving the locations of these folders, but I am worried that this will end up with tox or other front-ends writing setuptools-specific code to pass these options in order to avoid polluting the local development environment. I think we can solve both this problem and the “allow incremental builds” problem by adding a new “persistent cache directory” which is created by the front-end and passed to the back-end. The idea would be that the front-end should create a directory where the backend can store expensive-to-create objects that should persist between builds. It is up to the front-end when this cache is cleared.

I think there are two options:

  1. We modify the hooks in the PEP 517 build interface to add an optional cache_directory parameter, like so:

    def build_wheel(wheel_directory, config_settings=None,
                    metadata_directory=None,
                    cache_directory=None):
        ...
    

    For backwards compatibility, I think backends that support this feature would maybe have to specify something like backend.__SUPPORTS_CACHE_DIRECTORY___ = True.

  2. We add a top-level configuration function that can be called prior to calling a build hook that would configure this or other global options, possibly like this:

    def set_backend_configuration(*, cache_directory=None, **kwargs):
        ...
    

    In this case frontends would just check for the existence of set_backend_configuration and pass options if and only if it exists. This version also has some built-in forwards compatibility by taking arbitrary keyword arguments that will be ignored if unsupported (possibly with a warning). We could also add a get_backend_configuration_options function so that frontends could have different behavior based on what the backend supports.
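
To make the two options concrete, here is a rough sketch of how a front-end might dispatch on either of them. Nothing here is specified anywhere: `call_build_wheel` is a hypothetical helper, the flag name in `SUPPORT_FLAG` is a stand-in for whatever name would eventually be chosen, and in practice only one of the two options would be adopted.

```python
# Stand-in name for the opt-in flag from option 1 (an assumption, not a spec).
SUPPORT_FLAG = "__SUPPORTS_CACHE_DIRECTORY__"


def call_build_wheel(backend, wheel_directory, cache_directory):
    """Hypothetical front-end helper: pass cache_directory only if supported."""
    # Option 2: a separate configuration hook, called if and only if it exists.
    configure = getattr(backend, "set_backend_configuration", None)
    if configure is not None:
        configure(cache_directory=cache_directory)
        return backend.build_wheel(wheel_directory)

    # Option 1: an extra keyword argument, gated on the backend's opt-in flag.
    if getattr(backend, SUPPORT_FLAG, False):
        return backend.build_wheel(
            wheel_directory, cache_directory=cache_directory
        )

    # Backend predates the feature: plain PEP 517 call, no cache directory.
    return backend.build_wheel(wheel_directory)
```

Either way, a backend that knows nothing about the cache directory is called exactly as it is today.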

We don’t have to specify what backends should interpret a missing cache_directory as, though I imagine setuptools would default to the repository root. I’m thinking that this will allow tox to specify a cache manager in some per-env directory under .tox and pip could easily grow a flag like --incremental-builds that turns off the isolated build behavior of copying the repository into a temporary directory.

I am OK with writing a new PEP since this may be beyond the scope of what we want included in PEP 517 itself, but I’d like to hear thoughts and criticisms before moving to the “draft a PEP” stage.

CC: @bernatgabor @jaraco.

As the initiator for this change I’m very much plus one on this. We’ve talked about this with @pganssle in person at the core dev sprint. I personally prefer adding the cache_directory=None to all PEP-517 interface end-points. All functions in there for now seem to not depend on another function being called beforehand, and I think we should keep that.

Sounds like a reasonable idea. There was a fairly long discussion during the PEP 517 debate about incremental builds, which might be worth hunting out (it would be in the distutils-sig archives).

From what I recall, part of that thread was linked to pip’s copying of the source directory, which makes it impossible for backends to persist build artifacts in the build directory - but not copying risks the possibility of inconsistent builds if stale build artifacts are picked up. The general view was (from what I recall) that front ends should trust backends to handle the staleness issue themselves, but it didn’t really matter that much, because pip wasn’t changed to build in place - and longer term, pip is likely to build via sdist rather than in place anyway.

This proposal offers an interesting alternative, and probably expands the options available to front ends. But I’m not that sure how many options we need - it’s not like there are many front ends, for better or worse. You mention tox - is tox doing direct PEP 517 builds these days? (If so, I’m encouraged in some ways, it’s not healthy if pip’s the only frontend around). I’m not sure how pip would expose something like this - pip’s UI is already very cluttered with options that are only needed in edge cases, and this seems like it could be another one.

Regarding PEPs, I’d say this would be best done as a standalone PEP extending PEP 517 (“Adding persistent build cache support to PEP 517 backends” seems like a good title) that would then result in an update to the build system interface page that should be linked from the packaging specifications reference (but isn’t yet :slightly_frowning_face: - hmm, PEP 592 should probably also be mentioned in the repository API page…)

tox definitely has its own PEP-517 build system, and tox 4 will keep this. We really only need one option, a cache directory that the backend can write additional content to. I would envision three folders for now in there in case of setuptools: dist, build and the x.egg-info.

We already have a config settings dict that gets passed into every backend hook. Maybe we just need to standardize a "cache_dir" key? That might simplify the backwards compatibility issues.
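
As a rough illustration of what that could look like on the backend side, here is a minimal sketch. The `"cache_dir"` key is the one suggested above, while `resolve_cache_dir` is a hypothetical helper and not part of any existing backend.

```python
def resolve_cache_dir(config_settings=None):
    """Return the front-end supplied cache directory, or None if absent.

    Hypothetical helper: the "cache_dir" key is not standardized anywhere.
    """
    # PEP 517 allows config_settings to be None, so normalise it first.
    settings = config_settings or {}
    return settings.get("cache_dir")
```

A backend receiving `None` here would fall back to whatever location it already uses.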

It’s not just about having the key, but writing it down and stating that backends must respect it (e.g. it’s fine to have the source tree read-only: as long as the cache dir is writable, a backend build should still succeed). It can be under the config settings, but that potentially raises backwards compatibility questions. :thinking:

I did consider this, and it’s a possibility, but I think it would be best to reserve the config_settings namespace for the backends themselves and not have it be a mix of standard “reserved” keys and arbitrary backend-dependent keys. It also means that we can’t continue to add things there as necessary, since the more PEP 517 backends exist, the less we can be sure that our new standard keyword wouldn’t conflict with an existing setting in a backend.

I also think that at least for this case, we should make it possible for backends to signify which post-PEP 517 features they support. For example, I could imagine that pip might want to use the “copy to a temporary directory” behavior for isolated builds with backends that don’t support a persistent cache directory, but simply pass the cache directory to backends that do support one.
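
A minimal sketch of that front-end decision, purely for illustration: `choose_build_location`, the `"cache_directory"` feature name and the `build-cache` subdirectory are all made up here and do not describe how pip actually works.

```python
import shutil
import tempfile
from pathlib import Path


def choose_build_location(source_dir, backend_features, cache_root):
    """Return (build_dir, cache_directory) for a hypothetical front-end."""
    if "cache_directory" in backend_features:
        # Backend can keep its artifacts out of the tree: build in place
        # and hand it a persistent directory owned by the front-end.
        cache_dir = Path(cache_root) / "build-cache"
        cache_dir.mkdir(parents=True, exist_ok=True)
        return Path(source_dir), cache_dir

    # Backend knows nothing about cache directories: fall back to building
    # from a throwaway copy of the source tree, with no persistent cache.
    tmp = Path(tempfile.mkdtemp(prefix="isolated-build-"))
    build_dir = tmp / "src"
    shutil.copytree(source_dir, build_dir)
    return build_dir, None
```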

For forward compatibility reasons, how about changing it like this:

    SUPPORTED_FEATURES = {
        'config_settings',
    }

    def build_wheel(wheel_directory, config_settings=None,
                    metadata_directory=None, **kwargs):
        cache_directory = kwargs.get('cache_directory', None)
        ...

(You could also spell this with an explicit cache_directory=None in the signature and it won’t make any difference). We’ll document in the spec that kwargs are reserved for future specification-standardized keywords, which will all come with an accompanying feature flag.
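
On the front-end side, this could look roughly like the sketch below. `call_hook` is a hypothetical helper, and it assumes that each feature flag has the same name as the keyword it enables, which is only one possible convention.

```python
def call_hook(backend, hook_name, *args, **standard_kwargs):
    """Call a backend hook, passing only the keywords the backend declares.

    Hypothetical helper: assumes feature flag names match keyword names.
    """
    # Backends without the attribute are treated as supporting nothing new.
    supported = getattr(backend, "SUPPORTED_FEATURES", frozenset())
    passed = {
        name: value
        for name, value in standard_kwargs.items()
        if name in supported
    }
    hook = getattr(backend, hook_name)
    return hook(*args, **passed)
```

With this shape, older backends never see a keyword they do not understand, and newer backends receive new standardized keywords without any change to the hook's positional signature.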

I think that this “no way to persist build artifacts” thing will be a real problem for adoption of PEP 517 by some of the bigger extension-based projects like scipy, matplotlib, numpy, pandas, etc. Some of these take a very long time to build, and slowing down a developer’s update-build-test cycle would be a huge drag on development.

I agree with you about the complicated UI and options and whatnot, but it’s a pretty complex tool. It might be worth talking to a UX designer about this sort of thing (though I think the bigger issue here is the combinatorial complexity of the test matrix - with a dozen binary flags you’re looking at 4096 different possible run states). I really do think pip needs this option, though, because the solution right now is "stop using pyproject.toml", which is not sustainable in the long term.
