PEP 517 Backend bootstrapping

Time to tickle the sleeping dragon. :wink: What follows is a summary of the state as I understand it:

There haven’t been any posts disagreeing with @dstufft’s proposal a few days ago. There is some discussion over whether the specification should say projects MUST support being built purely from sdist (avoiding build cycles), or if this is merely a SHOULD. @njs points out that we can’t realistically enforce it, so the practical difference is limited.

I see on the discussion of setuptools PR 1674 that @pganssle feels somewhat disillusioned with this thread. Perhaps it’s never a true packaging discussion until it’s over 100 messages long and we’re all fed up with it, but Paul, I hope you’ll rejoin the conversation. We build stronger tools when we consider more perspectives, even if we can’t satisfy everyone.

As I understand Paul’s position, he is concerned that an easy way to put the source directory on sys.path will be cargo-culted to let setup.py import the source of the package, a headache which setuptools is trying to get away from. His preferred options are to force installing wheels for build dependencies, letting people who insist on ‘from source’ builds figure out bootstrapping themselves, or to have a special sentinel value of build-backend= which tells frontends to use a specially named file in the source directory as a build backend.

Paul, please correct me if I’ve misrepresented your position.

I think we have basically agreed that we want to make some provision for bootstrapping within PEP 517, so the ‘force build-deps as wheels’ option is off the table (discussion starting here convinced me). That leaves us with two basic options:

  1. A way to add one or more directories to sys.path before finding the backend (Donald’s proposal and variants thereof)
  2. A special build-backend name which tells the frontend to use a well-known filename in the source tree such as _build_backend.py (Paul’s proposal from some time ago). Please delay any bikeshedding about the specific names to use.

I’d ask all of us who have a preference to seriously consider the other option. Can we live with it?

Paul: I know I’m rehashing points already covered, but people using setup.py will have at least two other ways to get the source directory on sys.path: adding it in setup.py itself, and explicitly declaring the setuptools legacy backend. People who find their setup is broken and don’t want to think about fixing the issue are going to find ways to work around it. Is it so bad if there’s a third way?

(I tried to come up with some compromises that would let us add directories to sys.path only for finding the backend, but I’ve since been convinced that these are not worth the extra complexity)

People who prefer option 1 (including me): does option 2 have serious problems, or just cosmetic ones? I think pretty much everyone in the discussion besides Paul is in this camp, so if we can’t reach agreement we could presumably overrule his concerns, but he has put a lot of thought and effort into trying to get away from the problematic import-yourself-in-setup.py pattern, and he clearly believes that this option undermines that. If we decide to specify option 2, I know I’d go away and implement it for Flit happily enough.

:arrow_forward: Finally, I hope we can wrap this discussion up without returning to the hailstorm of messages it was for a while. I know I’m as bad as anyone in this, but I’m going to make an effort to read, think, and reply slowly. I hope you’ll join me. Exhausting people so they drop out of the conversation is not a healthy way to reach consensus. :slightly_smiling_face:

I think the two options are largely isomorphic with regard to the capabilities provided. If you have direct ability to add directories to sys.path and to set a backend, you can mimic option 2 in option 1 by doing:

[build-system]
backend = "_build_backend"
python_path = ["."]

Likewise if you pick option 2, you can recreate option 1 by doing:

# _build_backend.py: extend sys.path, then re-export the real backend's hooks
import sys

sys.path = ["paths"] + sys.path

from real_backend import api_methods

So I think ultimately it doesn’t matter in terms of what is possible. The only real difference is in what the UX of doing so is, which is largely “cosmetic”, but I think does matter to some degree.
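To make the frontend side of option 1 concrete, here is a minimal sketch of what loading a backend with extra path entries might look like. The helper name `load_backend` and its signature are made up for illustration; this is not pip's code, and a real frontend would run it inside the isolated subprocess that calls the hooks rather than in its own process.

```python
import importlib
import os
import sys


def load_backend(source_dir, backend_name, python_path=()):
    """Import a backend after prepending in-tree directories to sys.path.

    Hypothetical helper: name and signature are illustrative only.
    """
    # Interpret every entry relative to the root of the source tree.
    extra = [os.path.abspath(os.path.join(source_dir, p)) for p in python_path]
    # Prepend, so in-tree modules win over anything already installed.
    sys.path[:0] = extra
    importlib.invalidate_caches()
    return importlib.import_module(backend_name)
```

With the example configuration above, the call would be something like `load_backend(source_dir, "_build_backend", ["."])`.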

I think I’m in the same place as Donald. I think cosmetic problems are real problems, because UX matters, and in Python we care about beauty :slight_smile:. But the problems I see are all restricted to the UX, not limitations on what’s technically possible.

I’m okay with @dstufft’s original proposal and agree with the recent comments. Also, for the reasons @dstufft mentioned, I don’t expect it will be hard for pip to detect cycles.

Regarding whether build-backends SHOULD or MUST support building from zero, unless I misunderstand, my instinct is that this should be SHOULD. Like, if someone wants to do that for their personal project, shouldn’t we allow that? (Of course, for setuptools we want this to be MUST.)

On this one, though, I’m wondering why we wouldn’t want this to be a MUST:

Unless I’m misunderstanding, shouldn’t building from source mean that it wouldn’t be possible with a cyclic dependency, or am I confusing “building from source” with --no-binary :all:?

I think pretty much the only difference will be semantics. It will really only matter in cases where wheels are not available (either because they were not published, or because someone used --no-binary to deselect them), and in either case, whether we say MUST or SHOULD, pip is going to fail here. The only real difference is whether the messaging would be “this is a bug, please report it to the build backend” (MUST) or “this is due to design decisions of the build backend, please provide feedback to them” (SHOULD).

I think that’s pretty much literally the entire practical difference of doing it, so it’s possible that using SHOULD is a bit of a footgun in the spec, but it’s one that can easily exist by people just ignoring the spec too.

You’re correct. I think this is another one of those semantic things. Technically if you don’t check for cyclic dependencies you’ll just recurse forever trying to build the cyclic dependency chain because nothing will break the graph. SHOULD basically just allows for frontends to possibly be lazy at first and/or if someone comes up with a novel approach then it maybe allows it?

But ultimately I think this one is basically entirely down to semantics too; I think every serious frontend is going to do cycle checking and reject cycles in the build backends.
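As a rough illustration of the cycle checking being discussed, here is a depth-first sketch over a build-requirements graph. The function name, the dict-based graph, and the project names in the usage note are all invented for the example; a real frontend discovers requirements lazily as it unpacks each sdist, so its bookkeeping would look different.

```python
def find_cycle(build_requires, root):
    """Return a list of names forming a build cycle reachable from root, or None.

    build_requires maps a project name to the names it needs at build time.
    """
    path = []        # chain of projects currently being "built"
    on_path = set()  # same contents as path, for fast membership tests
    done = set()     # projects already shown to be cycle-free

    def visit(name):
        if name in on_path:
            # We came back to a project that is still being built: a cycle.
            return path[path.index(name):] + [name]
        if name in done:
            return None
        path.append(name)
        on_path.add(name)
        for dep in build_requires.get(name, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(name)
        done.add(name)
        return None

    return visit(root)
```

For a made-up graph such as `{"app": ["backend-a"], "backend-a": ["backend-b"], "backend-b": ["backend-a"]}`, calling `find_cycle(graph, "app")` returns the chain `backend-a`, `backend-b`, `backend-a`, which a frontend could put straight into its error message.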

I’m still uneasy about allowing users to specify arbitrary paths on the Python path. Why allow them to misplace it when 99% of the time this is going to be "."? Instead, I would prefer this to be a simple binary flag. If the self-building mode is set, we automatically add the root path to the Python path and trigger the mechanism currently in place. If users want to add additional paths (sub-folders?), they can do so themselves at the start of their build script.

Granted, this leans more towards @pganssle’s proposal. However, I think it would make things less error-prone. I’m not entirely on board with the magic syntax of <>, so I would propose we add a simple self_building key, false by default, that when set to true triggers the above mechanism.

Note this would be trivial to add to pip too; it’s basically just: if this is true, set the Python path to the extracted source distribution. Everything else can stay as it is now.

Cycle checking should be trivial and I think should be mandated.

Except for the times when it’s ["src"] or [".", "vendored"] :slight_smile: So probably somewhat less than 99% I think.

I really don’t think there’s a serious danger that people creating their own build backends will be so confused and incompetent that they can’t be trusted to set up a python path. And if they do mess it up somehow, then they’ll notice really quickly, since a mistake will make their package completely unusable.

We worry a lot about protecting users from themselves, but that’s for mistakes that are easy to make, or are tempting hacks that will bite them later, or that are good for individual users in the short term but that create ecosystem-wide technical debt. Do you see any way allowing arbitrary paths inside the source tree would risk one of those things?

Yeah, but those packages can just do sys.path.insert(0, "src") and sys.path.insert(0, "vendored") if they really want, and it’s one less thing frontends need to validate.
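For a sense of scale, the validation in question could look roughly like the sketch below. It assumes the rule would be "entries must be relative and must stay inside the source tree", which is my reading of the thread rather than agreed wording, and `validate_python_path` is a hypothetical name.

```python
import os


def validate_python_path(source_dir, entries):
    """Return absolute paths for entries, rejecting anything outside source_dir."""
    root = os.path.realpath(source_dir)
    resolved = []
    for entry in entries:
        if os.path.isabs(entry):
            raise ValueError(f"python-path entry must be relative: {entry!r}")
        # realpath collapses ".." segments and follows symlinks.
        full = os.path.realpath(os.path.join(root, entry))
        # commonpath equals root only if full is root itself or lies beneath it.
        if os.path.commonpath([root, full]) != root:
            raise ValueError(f"python-path entry escapes the source tree: {entry!r}")
        resolved.append(full)
    return resolved
```

Entries like "." or "src" pass, while "../elsewhere" or an absolute path raise ValueError.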

@takluyver Thanks for taking the time to put this summary together. I fully intended to do so myself, but I have been ill, and unable to find the time to do so.

Hopefully, by the end of this week, I should be able to find some time to review the discussion. Like you, I really hope that @pganssle contributes his thoughts here, as I think he has some good points about the need to guide people away from difficult-to-support practices while still providing backward compatibility, but I do feel that option 1 is where we’re heading in terms of consensus - I’d just prefer that it were possible to address Paul’s concerns rather than simply accept that he’s outvoted.

@bernatgabor It’s clear that the different options proposed all ultimately allow the same capabilities, because the code that runs can smooth over any differences. But I agree with @njs - this doesn’t seem like a particularly appealing or likely footgun. It’s easy to get it right (except for Paul’s concern, which also applies to your proposal), and there’s no obvious reason to set a wrong value.

I mean, I’m very open on this, but it feels to me that we pushed back on adding a Python path to PEP 517 initially, and now somehow everyone is like, you know, forget it, let’s just do it. I myself still think that all build backends should be satisfied as wheels and be done with it. However, if the majority wants source-only builds for backends too, my angle is: let’s make it self-documenting and hard to misuse.

Once we add this capability, the setuptools legacy backend is moot, and we have two ways to do the same thing. People will start using this magic flag also when they want to do some fancy thing for their own build, not just for self-building backends.

There’s definitely a certain amount of appeal to this (it’s what PEP 517 originally assumed, implicitly) but I’ve yet to see anyone explain how we make that work.

Specifically, we need to consider:

  1. Frontends like pip that have a “use only sources” option - these will need to make an exception for backends. So how do they know what’s a backend?
  2. Not all build requirements are backends, and we can’t just say “all build requirements must be satisfied by wheels” because we know there are use cases where that isn’t acceptable.

Are you able to tighten up your proposal to the point where it covers these points, and could be a 3rd option?

Well, my proposal actually would be that all build requirements must be wheels, so that’s cut short.

I see the whole thing as a risk of that :blush: (primarily the technical debt side, as we start seeing more arbitrary code in builds again that doesn’t get updates like a separately maintained backend would - I’d be much more comfortable if we scanned the custom backend for “import distutils” :wink: )

That said, if we do it this way, my custom stuff would go in a “build” or “scripts” directory, to keep random importable files out of the top of my repo. Since nobody has even mentioned those, let alone proposed them as the one-true-path, I’m very much in favour of letting me type exactly the path I want.

I think I’ve missed or forgotten something. What use cases do we know of that would accept build backends as wheels, but not other build dependencies? The cases I remember will either reject all prebuilt wheels and figure out ‘manual’ bootstrapping if necessary, or might be persuaded to accept pure-Python wheels as source.

If we decide not to add any bootstrap mechanism and to rely on wheels for build dependencies, my view is that the only reasonable approach is to apply that to all build dependencies. If you allow some build dependencies to be wheels, anything other than get-them-all-as-wheels is added complexity. Although if we’re catering for use cases that might accept pure Python wheels as source, maybe that means you can’t have extension modules as build dependencies.

The ‘just use wheels’ option seems less and less of a ‘just’. We implicitly picked it before by not specifying anything else, maybe because we were all exhausted by the PEP 517 discussion. But I think figuring out what it actually means and implementing it may be more work than any of the various self-bootstrapping options people have proposed.

[Aside: Maybe we should figure out more convenient options to control package selection in pip, so it’s not trying to bootstrap everything from source when the user doesn’t really need that. I’ve started a separate thread for that.]

None that I know of. My comment was more intended as a clarification that “build backends must be wheels” isn’t a practical restriction we can make. It’s either “build requirements must be wheels” or “we have to allow everything to be sdists” and nothing much in between makes much sense.

I’d been planning to put together a summary and proposed way forward this weekend, but illness has meant that I might not get to it for a little while longer. What I was planning on saying on this subject was something along the lines of:

Build frontends MAY require that all build requirements are available as wheels (and hence do not require a build step of their own to install), but this is not expected to be the norm, as such a frontend would not support the “build everything from source” use case. Build backends MUST NOT require such a frontend, and MUST ensure that they can be built in an environment where pre-built wheels are not available.

Projects MUST NOT have cycles in their build requirements, and frontends SHOULD fail with an error if they encounter such cycles (however they are not required to detect all cases where a cycle occurs - if satisfying a build requirement with a wheel avoids hitting a cycle, it is acceptable for the frontend to not report any issue).

Those are the framing assumptions I was planning to include - the need for self-hosting basically comes out of these, so with these up front, we’re in a good place to define how we want to provide a solution for that problem.

Thanks Paul, and sorry to hear that you’ve been ill. I hope you’re recovering!

I’m not sure how useful it is to specify that. If the norm allows installing build deps from sdists, then it won’t be obvious to project authors whether all their build dependencies are available as wheels. If another frontend requires build-deps as wheels, it will fail to build some valid projects. Maybe there’s a use where someone can ensure that a private index is populated with wheels of the necessary projects?

As you say, I think we’re converging on a need for some system of self-hosting backends. Does anyone want to make a concrete case for ‘build dependencies must be wheels’ and completely rule out self-hosting?

If we are ready to define a self-hosting mechanism, I think we have three proposals now (again, let’s postpone bikeshedding over the exact names):

  1. Extending sys.path: python-path = ['.', 'vendored'] (preferred, I think, by @dstufft, @pf_moore, @njs, @steve.dower and myself, plus @cjerdonek is okay with it)
  2. Special backend token build-backend = '<bootstrap>' referring to a well known file like _build_backend.py (@pganssle’s proposal).
  3. Boolean self_building = true flag, which adds the root of the source tree to sys.path (@bernatgabor’s proposal). This is deliberately a less flexible version of option 1.
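To show how close the three spellings are, here is a small sketch that maps each one onto the same pair of values: a backend module name and a list of extra path entries. The key names are the placeholders used in this thread, not settled spec, and `resolve_bootstrap` is a hypothetical helper operating on an already-parsed [build-system] table.

```python
def resolve_bootstrap(build_system):
    """Map any of the three proposed spellings to (backend_module, extra_paths)."""
    backend = build_system.get("build-backend")
    if backend == "<bootstrap>":
        # Option 2: a sentinel meaning "use the well-known in-tree file".
        return "_build_backend", ["."]
    if build_system.get("self_building", False):
        # Option 3: a boolean that always adds the source root.
        return backend, ["."]
    # Option 1: an explicit list, or nothing at all for ordinary projects.
    return backend, list(build_system.get("python-path", []))
```

Whichever spelling is chosen, what the frontend does afterwards (extend sys.path, import the module, call the hooks) would be the same.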

All three options have technically the same capabilities, so it comes down to elegance, clarity, and the potential for mistakes. I think we’ve all now got fairly set preferences, and we don’t seem to be effectively persuading one another. So I look forward to hearing your proposed way forwards! :wink:

I’m not convinced this is true yet - practicality beats purity would suggest that “rely on something that’s already published” is okay, and I think everyone who may have a real need to do the initial bootstrap has said we’ll figure the first step out. And if backends include a script to use themselves directly to self-bootstrap, rather than go via pip, then that’s fine (and helpful, from my POV).

So I think the fourth option is “build requirements must come from wheels”, which would mean this discussion is done and we can all go argue on pip’s issue tracker that --no-binary :all: doesn’t apply to build requirements instead :wink:

I would be strongly against pip ignoring the --no-binary option for build dependencies, so I don’t see that option as workable at all.

Then you’ll need another option for “build the packages and dependencies I requested from source, but don’t bother building setuptools from source yet again” anyway. I don’t think anyone really expects multiple nested isolated builds (given nobody really expects the first layer of isolated build yet).