PEP-517 - do not enforce fresh subprocess calls

Taking advantage of the fact that the PEP is still provisional, I propose removing this guarantee from it:

Frontends should call each hook in a fresh subprocess, so that backends are free to change process global state (such as environment variables or the working directory). A Python library will be provided which frontends can use to easily call hooks this way.

I feel this guarantee offers little benefit at a very high cost. The benefit only affects build backend maintainers, who no longer have to care about global state. The drawback, however, affects the entire Python user base, because every frontend-backend interaction now pays the full interpreter startup/teardown (plus imports) price. On a normal run, for example, tox needs to make the following 3 calls:

  • get_requires_for_build_sdist
  • prepare_metadata_for_build_wheel
  • build_wheel

This overhead (on my high-spec MacBook Pro) is around 50 ms, and things get much worse on a Windows machine, where starting subprocesses is even more expensive. This is an overhead that must be paid on every tox invocation. I think it would be beneficial to pay it just once rather than 3 times.
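For a rough sense of where the time goes, here is a minimal measurement sketch that times three fresh interpreters, each importing a backend, mirroring the three hook calls above (setuptools stands in for whatever backend applies; numbers will vary by machine):

```python
# Each PEP 517 call in a fresh subprocess pays interpreter startup plus the
# backend's imports. This just times that fixed cost, not the hooks themselves.
import subprocess
import sys
import time

start = time.perf_counter()
for _ in range(3):  # one per hook call in the tox example above
    subprocess.run(
        [sys.executable, "-c", "import setuptools"],
        check=True,
        capture_output=True,
    )
elapsed = time.perf_counter() - start
print(f"3 fresh interpreters + backend import: {elapsed * 1000:.0f} ms")
```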

Now from the POV of the build backend, this seems not that expensive to handle: every PEP 517 hook can be wrapped so that it saves and restores the working directory (os.getcwd/os.chdir) and os.environ. That is a cheap one-time cost, and most backends don't touch global state anyway. At the very least, I think we should allow build backends to opt out of the requirement for a fresh subprocess on every call.
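Concretely, such a wrapper could look like this (a minimal sketch; the decorator name is made up):

```python
# Snapshot and restore the process-global state a hook might touch, so the
# hook stays safe to call repeatedly in one long-lived process.
import functools
import os


def preserve_global_state(hook):
    @functools.wraps(hook)
    def wrapper(*args, **kwargs):
        cwd = os.getcwd()
        env = dict(os.environ)
        try:
            return hook(*args, **kwargs)
        finally:
            os.chdir(cwd)           # undo any chdir the hook performed
            os.environ.clear()      # undo any environment mutation
            os.environ.update(env)
    return wrapper


@preserve_global_state
def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    ...  # the backend's real implementation
```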

PS. I did a POC of an interface that does not require fresh subprocess calls (https://github.com/gaborbernat/tox/blob/p-impr/src/tox/util/pep517/backend.py#L43-L74), and, excluding some rare edge cases, it actually works fairly well.

I think, as far as guarantees go, it would suffice to require that the PEP 517 invocation machinery (the part that invokes the backend method in the subprocess) does not itself alter global state.

8 Likes

Strong +1 from me.

This was one of the discussions I wanted to bring up at an earlier date, but life got in the way. While frontends would need some library updates to take advantage of this, and maybe some build backends would need updates too, the performance benefits here would be worth the additional cost of enforcing correctness on the build backends.

2 Likes

I disagree. Most backends will likely impact sys.modules.

What should frontends do when they need to build wheels for 2 different packages requiring 2 different versions of setuptools?

@bernatgabor This doesn’t necessarily change the overall point, but I think you only have to do 2 calls, not 3? prepare_metadata_for_build_wheel is only provided as an optimization for when frontends want to get the wheel metadata but don’t want to pay the cost of building the full wheel; the work it does is a strict subset of the work that build_wheel does.
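For illustration, a frontend that is committed to building the wheel anyway can pull the metadata out of the finished artifact instead of making the extra hook call; this sketch (the helper name is made up) relies only on the fact that a wheel is a zip archive containing a *.dist-info/METADATA file:

```python
import zipfile


def metadata_from_wheel(wheel_path):
    # Find and return the METADATA file embedded in the built wheel.
    with zipfile.ZipFile(wheel_path) as whl:
        for name in whl.namelist():
            if name.endswith(".dist-info/METADATA"):
                return whl.read(name).decode("utf-8")
    raise ValueError(f"no METADATA found in {wheel_path}")
```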

Whenever the build requirements are not the same for two packages, the frontend will always provision two separate isolated build environments. In that case you can have two parallel backend processes running, with which the frontend might interact.

True, you can optimize away one of the calls, but only if you're bound to build a wheel afterwards. Sometimes, though, you might want to get the metadata before you build the wheel, or you will build an sdist instead. Imagine for example that you're building an sdist rather than the wheel, but you still want the install-requires metadata.

At the moment, the section you quote is non-normative, so it’s perfectly OK (in principle) for frontends to not use a subprocess call. However, in doing so they are relying on the backend not “messing things up”. If we want to make it easier for frontends, then we need to make additional restrictions on backends, and we have to be careful to define what those are. Saying that “backends must not mess with global process state” is insufficient, because what constitutes “global process state” isn’t well-defined - in the past pip has cared about stdio data that’s not passed through Python’s IO mechanisms (consider the output of a C compiler called without IO redirection), and whether the process creates extra threads, as two further examples.

One alternative approach, which would work perfectly well with the current PEP 517 design, would be to implement one persistent, dedicated subprocess for each isolated build environment, and have that subprocess communicate with the backend via hook calls and with the frontend via a dedicated IPC API. The persistent subprocess can do the work of preserving the process state around API calls. That’s basically what the current pep517 library does, except that it doesn’t use a persistent subprocess, but instead uses a subprocess per call.
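To make the idea concrete, here is a minimal sketch of such a persistent worker. The JSON-lines wire format and all names are invented for illustration; this is not the pep517 library's actual protocol:

```python
# worker.py -- one long-lived process per isolated build environment,
# receiving hook calls as JSON lines on stdin and replying on stdout.
import importlib
import json
import os
import sys


def main():
    backend = importlib.import_module(sys.argv[1])  # e.g. setuptools.build_meta
    ipc = sys.stdout
    sys.stdout = sys.stderr  # keep the backend's own prints off the IPC channel
    for line in sys.stdin:
        request = json.loads(line)
        cwd, env = os.getcwd(), dict(os.environ)  # snapshot global state
        try:
            hook = getattr(backend, request["hook"])
            result = hook(*request.get("args", []), **request.get("kwargs", {}))
            response = {"ok": True, "result": result}
        except Exception as exc:  # report failures instead of dying
            response = {"ok": False, "error": repr(exc)}
        finally:
            os.chdir(cwd)  # restore global state between calls
            os.environ.clear()
            os.environ.update(env)
        print(json.dumps(response), file=ipc, flush=True)


if __name__ == "__main__":
    main()
```

The frontend side would then start one worker per isolated build environment and issue many calls over its lifetime, for example:

```python
import json
import subprocess
import sys

# In practice this would be the build environment's interpreter, not sys.executable.
proc = subprocess.Popen(
    [sys.executable, "worker.py", "setuptools.build_meta"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
proc.stdin.write(json.dumps({"hook": "build_wheel", "args": ["dist"]}) + "\n")
proc.stdin.flush()
print(json.loads(proc.stdout.readline()))
```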

To be honest, I think your requirement would be better handled by “someone” enhancing the pep517 library to work like this, rather than getting bogged down in debates over how to modify the standard.

(I’m somewhat interested in doing something like this myself, but my time is sufficiently limited that waiting for me to deliver anything more than a prototype might be ill-advised :slightly_frowning_face:)

Edit: But I should say that, like @pradyunsg, I'm a strong +1 on doing something to reduce the current process creation cost of using PEP 517.

1 Like

This is what I did in my POC, if you check the link in my initial post. I have it working and will likely release it as a competing library to pep517 (mostly because it has a totally different API; my implementation also returns stdout/stderr).

How so? It seems written as a guarantee. I want to avoid the case where a backend breaks and the bug report is closed with "the frontend is violating PEP 517 guarantees".

My bad, I misread the proposal :confounded:

This would be cool. I'm not sure about supporting multiple runs per process in enscons. Is it working well for you with setuptools? (Don't we have to trap a system exit call to get results from setup()?)

It works for me most of the time, though I did find some bug tickets on the setuptools issue tracker for certain cases; it seems not to have such trap calls for now. For example https://github.com/pypa/setuptools/issues/1800. Hence my wanting to reach an agreement on this here: I'd like to call on setuptools to improve here, rather than allow them to close the issue with "PEP 517 entry points must be called in fresh subprocesses".

Have you tried starting a persistent subprocess and forking it to run the individual pep517 tasks?
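Something along these lines, perhaps (a POSIX-only sketch; a real version would also need a pipe to ship the hook's return value back to the parent):

```python
import os

import setuptools.build_meta as backend  # imported once, inherited by children


def run_hook_in_fork(hook_name, *args, **kwargs):
    pid = os.fork()  # os.fork does not exist on Windows
    if pid == 0:  # child: run the hook, then exit; global-state damage dies here
        try:
            getattr(backend, hook_name)(*args, **kwargs)
        except Exception:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)  # parent: wait for the child
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```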

I did not. I'm not sure what fork performance is like, especially on Windows, and perhaps we can avoid the need altogether.

If the main motivation here is to reduce subprocess startup overhead on Windows, then fork() doesn’t help much :slight_smile:

Obviously Windows doesn't do fork.

Obviously I'm looking for solutions that perform similarly well on all platforms. :relaxed: Windows is not a second-class citizen, please :pleading_face:

Remind me, does tox install everything every time? Couldn't it skip ahead to building the wheel and backtrack if there was an error, calling the backend only once?

You can do that if you're installing a wheel; if you're testing an sdist you cannot, and that's also a valid use case for tox. Besides, what if you want a daemon in the background that automatically rebuilds on file changes and installs? In that case there's no point forcing a backend startup every time.

Suggestion: tweak PEP 517 to say that IF get_requires_for_build_sdist returns [], THEN the frontend is allowed to reuse the same process to immediately call prepare_metadata_for_build_wheel or build_wheel.
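From the frontend's side that rule could look like this (a sketch; caller, install, and fresh_caller are hypothetical frontend plumbing, not a real API):

```python
def build_with_reuse(caller, wheel_directory, install, fresh_caller):
    requires = caller.get_requires_for_build_sdist()
    if requires == []:
        # Nothing new to install, so it is safe to stay in the same process.
        return caller.build_wheel(wheel_directory)
    install(requires)  # new packages change the environment...
    return fresh_caller().build_wheel(wheel_directory)  # ...so restart first
```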

The majority of projects don't have build requirements, and for the ones that do you need to go and install some packages anyway before you continue, so restarting the interpreter is probably a good idea.

1 Like

What about backend daemons for continuous builds? Can we just allow keeping the same process for as long as the frontend wants? It's up to the frontend to ensure the backend has its dependencies, and a backend process restart is not always needed for that.
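For example, something like this polling loop (a sketch; the caller object standing in for a persistent backend process is hypothetical, and a real daemon would use a proper file-watching library):

```python
import time
from pathlib import Path


def newest_mtime(source_dir):
    # Most recent modification time across the Python sources.
    return max(p.stat().st_mtime for p in Path(source_dir).rglob("*.py"))


def watch_and_build(caller, source_dir, wheel_directory, poll_seconds=1.0):
    last = newest_mtime(source_dir)
    while True:
        time.sleep(poll_seconds)
        current = newest_mtime(source_dir)
        if current > last:  # something changed: rebuild in the same process
            last = current
            caller.build_wheel(wheel_directory)
```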

You’ve tried it out. When does the persistent process fail and in what way?

If it's installing an sdist, does it fall all the way back to setup.py install without caching the results?