PEP 517 Backend bootstrapping

njs · February 2, 2019, 12:30am

I’m worried that this part does add finicky edge cases… at a minimum, you need to say what the frontend should do if importing the backend mutates sys.path so that sys.path[0] is no longer the bootstrap location. And what if the backend does want to keep that directory on sys.path? For example, lots of large packages eventually start doing lazy imports, which could be a mess if directories are randomly disappearing from sys.path. Removing things from sys.path is a really rare thing in Python in general. Most packages are going to assume it never happens, and may react in weird ways.
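To make the lazy-import concern concrete, here is a minimal sketch. It assumes a frontend that puts the bootstrap directory on sys.path, imports the backend, and then removes the directory again. The module names `demo_backend` and `demo_backend_helper` are made up for illustration and are not part of any real backend or of the proposal.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def lazy_import_survives_path_removal() -> bool:
    """Return True if a backend's lazy import still works after the
    directory it was loaded from has been removed from sys.path."""
    with tempfile.TemporaryDirectory() as bootstrap_dir:
        root = Path(bootstrap_dir)
        # Hypothetical backend that imports its helper lazily, inside the hook.
        (root / "demo_backend.py").write_text(
            "def build_wheel():\n"
            "    import demo_backend_helper  # lazy import\n"
            "    return demo_backend_helper.NAME\n"
        )
        (root / "demo_backend_helper.py").write_text("NAME = 'demo.whl'\n")

        importlib.invalidate_caches()
        sys.path.insert(0, bootstrap_dir)
        try:
            backend = importlib.import_module("demo_backend")
        finally:
            # Assumed frontend behaviour: drop the directory after the import.
            sys.path.remove(bootstrap_dir)

        try:
            backend.build_wheel()  # triggers the lazy import
            return True
        except ModuleNotFoundError:
            return False
        finally:
            # Clean up so the demo leaves no trace in this process.
            sys.modules.pop("demo_backend", None)
            sys.modules.pop("demo_backend_helper", None)


if __name__ == "__main__":
    print(lazy_import_survives_path_removal())
```

The initial import succeeds, but the hook fails later, at a point far away from the code that changed sys.path, which is what makes this kind of breakage hard to diagnose.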

In software, it’s usually better to do things in the most boring and unsurprising way if at all possible, which means setting up sys.path once at the beginning of your program, and then leaving it alone.
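As a rough sketch of the "boring" alternative, the loader below adds the backend's directory to sys.path once and never takes it away. The `location` parameter is a hypothetical stand-in for something like build-backend-location, and `steady_backend` and `steady_helper` are invented names. It also assumes the hooks run in a short-lived process, so the extra sys.path entry does not outlive the build.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_backend(name, location=None):
    """Import a build backend, optionally from an in-tree directory.

    `location` is hypothetical; it models a build-backend-location setting.
    """
    if location is not None and location not in sys.path:
        sys.path.insert(0, location)  # set once, then left alone
    importlib.invalidate_caches()
    return importlib.import_module(name)


def demo():
    with tempfile.TemporaryDirectory() as backend_dir:
        root = Path(backend_dir)
        (root / "steady_backend.py").write_text(
            "def build_wheel():\n"
            "    import steady_helper  # lazy import\n"
            "    return steady_helper.NAME\n"
        )
        (root / "steady_helper.py").write_text("NAME = 'demo.whl'\n")

        backend = load_backend("steady_backend", backend_dir)
        try:
            # The directory is still on sys.path, so the lazy import works.
            return backend.build_wheel()
        finally:
            # Cleanup for this demo only; a real hook process would just exit.
            sys.path.remove(backend_dir)
            sys.modules.pop("steady_backend", None)
            sys.modules.pop("steady_helper", None)


if __name__ == "__main__":
    print(demo())
```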

You’re right, it doesn’t have to be nice. If there was no nice option, we would pick a non-nice option. But, that doesn’t mean we should intentionally pick the ugliest option! We should pick the nicest of the options available to us.

We want to encourage lots of build backends to be developed. The nicer we can make the experience of developing a build backend, the better the packaging ecosystem will be for everyone.

I agree that your "<backend>" hack is easy to implement, but compared to build-backend-location it’s much harder to explain or remember or use – it’s full of arbitrary quirks. As the Zen says: “If the implementation is hard to explain, it’s a bad idea.”

And I still haven’t seen any example of a problem that people are worried will be caused by build-backend-location? I get that you’re worried that if we implement it, then people might use it, and you think they shouldn’t do that, or that it would be ugly or something. But I don’t know what bad consequences you’re afraid will happen if people use it, beyond “if people do an ugly thing, it will be ugly”. Do you have any concrete examples of a problem that it could cause?

A lot of people (especially at large companies) are wary of using pre-built wheels, because they want to keep reliable records of which source code they’re running in production, so that they can do things like audits (manual or automated). Using pre-built wheels can break this chain. For example, the npm folks recently dealt with a really severe compromise of the event-stream package, and part of how the malicious code was hidden was that it was only injected into the “pre-minified” package on npm, not the “source” package.

Of course there are ways to work around this: a company that wants to bootstrap their internal Python builds could add some sort of manual exceptions for the “root” packages, or build them by hand. Or they could do some recursive bootstrap thing, like C compilers do. These approaches all involve significant manual hassle and introduce a lot of room for errors, and neither of those tends to make security auditors happy. (Rust relies on recursive bootstrapping for its compiler, and it’s been such an issue that people have actually built an entire second compiler for the sole purpose of simplifying the bootstrap chain and checking for “trusting trust” attacks; there is a Reddit thread discussing it.)

Also, this hassle scales with the number of different build backends in use. We want to encourage people to design new build backends. We definitely don’t want to end up in a situation where companies refuse to use any package that doesn’t use one of a small set of pre-selected build-backends that they’ve already set up special bootstrap hacks for.

So… you’re right that in principle we could get away without having any special support for bootstrapping in PEP 517. But given that it’s simple to add, has minimal downsides, and will significantly simplify a lot of people’s lives, I think having some special support is a good idea.