PEP 517 Backend bootstrapping

Yes, but again only the bottom of the stack needs to be self-bootstrapping. intreehooks is a perfect example of what I’m talking about because it’s basically one backend that is self-bootstrapping that allows every other backend to get the semantics they want for their own backend. You are talking about allowing people to break the build isolation for specific backends in the configuration file. I am suggesting that PEP 517 should always be calling the hooks without manipulation of the PYTHONPATH, which should be the sole domain of the backend.

If you want to understand the part about the semantics, I suggest reading the many times that I explained it earlier. I have no idea why you are so insistent on making it so the front end’s search path for the backend leaks into the backend’s import path, but that has always been my objection. I see no reason why it needs to be a critical feature of the spec and if you want an option for it, you can super easily implement a backend that does that (in fact that’s literally what I’m planning on doing with setuptools).

I read this discussion as “should intreehooks be made part of the PEP”, and I feel the answer might be yes. It is not a must, and things can be done in other ways, but I feel it is most intuitive to be able to do what intreehooks currently does within the built-in section. I would certainly be rolling my eyes to read that I need another backend to build my own backend if I were new to this topic.

Because you’re proposing import semantics that are not normal Python semantics - the closest that exists is the isolated mode option, which disables the normal behaviour of putting the script directory on sys.path (or the current directory for -c, -m, stdin execution, and the interactive REPL), but in isolated mode, nothing gets imported from the directory that gets omitted from sys.path.

So @njs and I are both objecting to the “This one package can be imported from this directory, but nothing else can” idea from a “Let Python be Python” perspective, not anything to do with packaging specifically.

So I’m generally fine with pypa/pyproject-hooks pull request #42 (“Proof of Concept: Bootstrap backend from specified directory” by takluyver) from an interface specification perspective (it’s very similar to the PEP517_SYS_PATH_0 workaround I added in pypa/pip pull request #6210, “Issue #6163: Temporary workaround for legacy setup.py files”), but the implicit sys.path.pop(0) is spectacularly weird from a backend execution environment perspective.

Instead, I’d prefer that frontends provide self-bootstrapping backends with normal Python import semantics (similar to the way PYTHONPATH already works), and let backends decide for themselves whether or not to do sys.path.pop(0) before running any project-provided code. In the case of setuptools specifically, the way that would end up looking would presumably be:

  • setuptools.build_meta_legacy would make sure that the source directory was on sys.path, but otherwise not care
  • setuptools.build_meta would make sure that any source directory subpaths are not on sys.path, and direct folks to use their own sys.path manipulation instead
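
To make the two behaviours concrete, here is a minimal sketch of what each backend might do to its own import path. The helper names (`legacy_path`, `strict_path`, `_is_within`) are hypothetical and are not part of setuptools or PEP 517; the functions take an explicit list so they can be tried without touching the real sys.path.

```python
import os
import sys


def _is_within(path, root):
    """Return True if `path` is `root` itself or a directory below it."""
    path = os.path.abspath(path or os.curdir)  # "" on sys.path means the cwd
    root = os.path.abspath(root)
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. different drives on Windows
        return False


def legacy_path(source_dir, path=None):
    """Hypothetical 'legacy' behaviour: make sure source_dir is importable."""
    path = sys.path if path is None else path
    target = os.path.abspath(source_dir)
    if not any(os.path.abspath(p or os.curdir) == target for p in path):
        path.insert(0, source_dir)
    return path


def strict_path(source_dir, path=None):
    """Hypothetical 'strict' behaviour: drop source_dir and all its subpaths."""
    path = sys.path if path is None else path
    path[:] = [p for p in path if not _is_within(p, source_dir)]
    return path
```

The point of the sketch is only that either policy is a few lines inside the backend, so the frontend would not need to do any popping on the backend's behalf.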

I have no idea why you think this is an option, but it’s not realistic. If the PEP 517 path semantics are broken by this change, we’ll simply have to live with it. Backends will have no simple way to tell whether an entry in sys.path was added there because the frontend thinks that it needs to be bootstrapped or for some other reason. The only reasonable place to enforce this is in the frontend.

There are lots of options proposed here that are well within “normal” path semantics (including the __build_backend__.py one) that don’t allow manipulation of the backend’s PYTHONPATH as part of the project configuration. You won’t be selling me on anything that allows a project to manipulate an arbitrary (non-bootstrapped) backend’s sys.path in the configuration file to solve what is essentially a non-problem anyway. pip can just make it so --no-binary :all: doesn’t work for build backends, or do some cycle detection and fall back to a wheel in the case where a project is trying to bootstrap itself - all without any changes to the PEP. If everyone’s so gung ho on making it easier to write backends that can build themselves, the easiest way to do that is to make it so that they can simply declare a dependency on their existing wheels - then they don’t need to maintain any sort of complicated bootstrapping operation in their source tree, they just have to ship a wheel once.

Note that since setuptools declares a build-time dependency on wheel, which itself is built by setuptools, I’m not even sure the in-tree bootstrapping will work without vendoring wheel. This problem is neatly sidestepped by either disabling --no-binary :all: while satisfying the build requirements or disabling it to break cycles. The only downside to the wheels approach is that it will be harder for some superstitious companies to build their entire tree from source, and frankly that’s not a use case we need to subsidize.

Can I suggest that we put this discussion on hold for a couple of days and then return to it with cooler heads? The messages are flying thick and fast, and we’re getting “I can’t see why you’d think…” kind of posts - the atmosphere is on the way to getting confrontational, and we’re not going to decide anything this way.

I’d like to take the (rest of the) weekend to think through the options, read what other people have written, and see if any new ideas or compromises come to mind. I don’t think this is an urgent one to solve now, and pip/setuptools have more pressing problems to resolve (edit: I’m thinking of https://github.com/pypa/pip/issues/6163, for instance).


For the specific case of Red Hat, there’s already a multi-step bootstrap for new Python stacks in order to get initial bootstrapping binary RPMs built without docs (to break the circular dependency on sphinx et al), without tests (to break circular dependencies on pytest et al), and without ensurepip (to break CPython’s own circular dependency on setuptools and wheel). Only once that core library set has been constructed does it then go back and rebuild them all properly, using their limited-but-good-enough-to-self-host bootstrap variants to build the fully functional versions. (For the curious: rpm-list-builder/python37.yaml at python37 · hroncok/rpm-list-builder · GitHub )

I’d expect any organisation with a “we build everything from source” policy that extends all the way down to core language ecosystem tooling to have a comparable technical capability, and if they adopt the policy without investing in the capability, I agree that’s their problem, not ours.

That means I’m a fan of the idea of making the enhancement here a way for projects to indicate that build requirements must be resolved from the wheel archive, and not built from source, even if a frontend has been asked to build everything from source.

That then makes the key remaining PEP 517 question on the bootstrapping topic the following: should we specify exact conditions under which a frontend MUST consider a build dependency unresolvable and fall back to installing from an available wheel archive?

In addition to promoting consistency across frontends, the benefit I see to our doing that is that projects like pip that need to modify the behaviour of options like --no-binary to still allow binary build dependencies will be able to point to the relevant section in PEP 517 as the rationale, rather than having to defend themselves to their users on a case-by-case basis.

The first clause I think we should add is something like:

Frontends MAY be implemented such that all declared build dependencies are installed from binary wheel archives and never implicitly built from source.

That lowers the barrier to building compliant frontends (those frontends just won’t be able to build some projects, for the same reason pip 18 couldn’t build them).

And then the second clause would be something like:

Frontends that allow for declared build dependencies to be implicitly built from source MUST fall back to instead installing from a binary wheel archive if the project’s own name is listed in build-system.requires (regardless of any other settings that have been given to the frontend).

Given such regression terminating behaviour in pip, both setuptools and wheel would terminate the current infinite regress, since they’re implicitly added to build-system.requires if there’s no build-backend set.
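
As a rough illustration of the second clause, a frontend could detect the self-reference with a check like the one below. This is a sketch only: `must_use_wheel` and `requirement_name` are made-up names, the requirement parsing is deliberately simplified (a real frontend would use a proper requirement parser), and the name normalisation follows the usual lowercase, collapse-separators rule.

```python
import re


def normalize(name):
    """Normalise a project name: lowercase, runs of -, _ and . become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a requirement string (simplified)."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def must_use_wheel(project_name, build_requires):
    """True if the project lists itself in its own build-system.requires."""
    names = {requirement_name(req) for req in build_requires}
    return normalize(project_name) in names


# Hypothetical usage: the values pip implicitly assumes when no backend is set.
print(must_use_wheel("setuptools", ["setuptools", "wheel"]))
print(must_use_wheel("flit", ["setuptools", "wheel"]))
```

Under this sketch both setuptools and wheel would be flagged once they declare `["setuptools", "wheel"]`, while an ordinary project using them would not.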


[Reordered this to put it after the wheel related discussion, as I now doubt we’re going to go in the direction of explicit self-bootstrapping, and instead rely on binary archives to terminate regressions (the same way RPM et al already do).]

I still don’t understand why you view this approach that way. The only code that runs in the backend process before the backend’s own code is frontend wrapper code, and even a modified PEP 517 would still tell frontends to keep in-tree paths out of sys.path unless bootstrap-backend-location had been set. Thus even without any help from the front end, backends would be able to infer the use of the setting (or a functional equivalent like the intreehooks meta-backend) from the presence of in-tree paths in sys.path.

It would also be possible to make the PYTHON_BOOTSTRAP_BACKEND_LOCATION environment variable a defined part of the PEP rather than a pep517 library implementation detail.

That said, I think we can put aside that entire digression, as I agree with you that having the ability to force the use of wheels for build dependencies is a more promising direction to resolve bootstrapping issues.


I think this is sufficient, but I’d like to head off at the pass the potential objection that there could be cycles of length > 1. For example, if setuptools were to depend on intreehooks, and intreehooks were to depend on setuptools, you’d still get an infinite regress.

Practically speaking, I think we can just not worry about it, because again very few build backends will actually need to build themselves and the ones that do are almost certainly going to be incredibly conservative with their declared dependencies because the nature of the tool makes adding dependencies a pretty painful experience. If front-ends want to do longer-range cycle detection that is an option they have, but I see no reason to mandate it as part of the spec.

Yeah, agreed - at the spec level, we’re aiming to make it possible for well-behaved backends to let frontends know where to stop, and “Include the project’s own name in build-system.requires” should suffice for that.

Frontends may still want to be more paranoid about dependency cycle detection, but it would be a general “add arbitrary length cycle detection to handle misconfigured backends that haven’t flagged themselves as part of a bootstrap cycle” feature that a robust implementation might want, rather than the bare minimum that the interoperability spec requires.
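
For what the optional “arbitrary length” detection could look like, here is a small depth-first search over a build dependency graph. It is an illustration rather than anything a frontend is known to implement: the graph is a plain dict that the frontend would have to populate itself, and `find_cycle` is a hypothetical name.

```python
def find_cycle(graph, start):
    """Return a list of names forming a cycle reachable from `start`, or None.

    `graph` maps a project name to the names in its build-system.requires.
    """
    path = []        # current chain of projects being resolved
    on_path = set()  # same content as `path`, for fast membership tests
    done = set()     # projects already fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            # Close the loop: slice from the first occurrence of `node`.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for dep in graph.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


# The hypothetical two-project cycle mentioned earlier in the thread.
graph = {"setuptools": ["intreehooks"], "intreehooks": ["setuptools"]}
print(find_cycle(graph, "setuptools"))
```

A frontend that found such a cycle could then fall back to wheels for every project on it, which is the same outcome the self-reference rule gives for cycles of length one.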

I don’t think I understand why we’d say some things can only be installed from wheels when any of the options first listed allows breaking that cycle anyway. While I have preferences among the options, going the route of having to provide wheels for these even if you’re building from source seems like it makes the UX worse, not better.

I also don’t think it’s any business of the spec what frontends are willing to install or not. It should be perfectly fine to have a front end that only supports wheels or only supports sdists as desired.

My preferred option doesn’t provide a general option that I feel will be an attractive nuisance. That being said, I think even the generalized python-path config is better than just sort of throwing our hands up and saying we just can’t build some things from source.


@dstufft I’m fine with the bootstrap-backend-location approach I proposed above (and that @takluyver wrote a rough draft implementation of), but @pganssle doesn’t seem willing to use it for setuptools without a rider requiring weird dynamic sys.path manipulation in frontend implementations (which I’m opposed to while wearing my “import system co-maintainer” hat), and if setuptools won’t use it, then there’s no point in adding it to the spec.

So bootstrap cycle detection and breaking is a compromise that setuptools would be willing to rely on.

My objection is not to setuptools using it, it’s a perfectly fine mechanism for setuptools. If I were to refuse to use the mechanism when it exists, it would be out of spite, and even if that were in my nature (and hopefully it is not), I am just one of three setuptools maintainers, and I have not consulted the other two about this.

My only objection to any of the path-manipulation mechanisms (other than the fact that they are largely way over-designed) is that it makes it possible for the front-end to manipulate the semantics of the backend as a side-effect of the implementation.

As @dstufft says, I think this thing will be an attractive nuisance, and whether or not setuptools even uses the thing, we’ll probably bear the brunt of the support burden on it, since most of the edge cases will be in setup.py invocations.

It’s probably fine to leave it out of the spec entirely, but one thing to note is that this would mean that the backends can unilaterally choose to break one version of this or the other, all while remaining in compliance with the spec. If setuptools chooses to add setuptools to the build-system.requires, practically speaking it means that no one can write a front-end that only builds from source without hard-coding in setuptools as special. Might as well add it to the spec at that point.

+1. I had said that I would look at making a decision this weekend, but I don’t think at this point that we’re anywhere near consensus. There’s no rush on this, so I think a break to let things settle is the right thing to do. So, for clarity, I will now definitely not make a final decision this weekend.

Right now, setuptools can build its wheels for distribution using setup.py bdist_wheel and (thanks to pypa/pip issue #6222, “Install with --no-binary ignores PEP 517 build system”) pip users who use --no-binary :all: have no access to PEP 517 anyway. So let’s not give this issue a false sense of urgency.

One thing I will say is that, having thought it through, I don’t think the “backends must be supplied as wheels” option is enforceable (unfortunately). The problem is that there’s nothing in pyproject.toml to say whether an entry in build-system.requires is a backend, or a simple build requirement. Consider

[build-system]
requires = ["flit", "mystery_package"]
build-backend = "flit.buildapi"

How can the frontend tell where the backend will be imported from? There’s no rule that says module flit is bound to come from package flit. If the frontend can’t tell that a requirement is supplying a backend, it has no way to decide whether to insist on a wheel. (We know from pip’s PEP 518 implementation that insisting that all build requirements are available as wheels is a non-starter.)

Note: sorry for being grumpy last night (especially to @pganssle). I’d gone past the point where I really should have just gone to bed and refrained from further comments until after I’d had a chance to sleep on things.

This problem is why my draft addition to the PEP 517 text focused on detecting self-reliant components rather than detecting backends in general: the frontend knows which component it is currently attempting to install, so it can tell when that same name also appears in build-system.requires. So you’d get scenarios like the following:

  • installation from wheels allowed, setuptools is found as a wheel, frontend never even tries to build it
  • source builds requested, setuptools sdist is found as a build dependency, setuptools self-reference in build-system.requires is detected, setuptools is found as a wheel instead, frontend prints a warning, and uses the wheel rather than building from source
  • source builds requested, setuptools sdist is found as a build dependency, setuptools self-reference is detected, setuptools is NOT found as a wheel, frontend errors out saying it needs a wheel file (or a non-isolated build) to break the declared build dependency cycle
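
The three scenarios above can be written down as a small decision function. This is only a sketch of the proposed rule, not pip’s behaviour: the function name, the string return values and the `BuildCycleError` exception are all invented for illustration, and the branches for cases the list does not mention (no self-reference, or wheels allowed but none available) are assumptions.

```python
import warnings


class BuildCycleError(RuntimeError):
    """Hypothetical error: a declared build cycle cannot be broken."""


def resolve_build_dependency(self_referential, source_builds_requested,
                             wheel_available):
    """Decide how to obtain one build dependency: 'wheel' or 'sdist'."""
    if not source_builds_requested:
        # Scenario 1: wheels are allowed, so use one and never build.
        # Falling back to the sdist when no wheel exists is an assumption.
        return "wheel" if wheel_available else "sdist"
    if not self_referential:
        # Assumption: ordinary dependencies honour the source-build request.
        return "sdist"
    if wheel_available:
        # Scenario 2: break the declared cycle with a wheel, but say so.
        warnings.warn("using a wheel to break a declared build dependency cycle")
        return "wheel"
    # Scenario 3: nothing available to terminate the regress.
    raise BuildCycleError(
        "a wheel file (or a non-isolated build) is needed to break the "
        "declared build dependency cycle"
    )
```

Keeping the decision to three booleans makes it clear that the frontend only needs information it already has at the point where it is resolving a single build requirement.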

Importantly, this model should work even if setuptools declares additional build requirements, as long as those dependencies are also marked as being self-reliant. Specifically, both setuptools and wheel would set build-system.requires = ["setuptools", "wheel"] to reflect the environment that pip is implicitly giving them today.

With the “self-references are always resolved as wheels” approach, a frontend can resolve this mutual interdependency, as it will be able to detect that both setuptools and wheel are marked as self-reliant, and then install them both from wheel files.

By contrast, to get the in-tree hooks approach to work as the foundation, either the two projects have to merge into a single package, or else one of them has to vendor the other in order to make itself truly self-hosting.

Sounds like maybe we need to understand better what the actual problem is :-).

I have questions!

  1. Is it important to be able to bootstrap Python environments without relying on wheels? There’s some argument that this is important, but we don’t really know for sure. How can we learn more?

    • One crude measure: what percentage of setuptools downloads use sdists vs wheels? I checked PyPI downloads for January 2019, and it looks like ~8% of setuptools downloads are sdists. To me this suggests that a fair number of people are using --no-binary :all:, though all the usual caveats apply.

    • Hey @barry, you know something about big companies who worry about provenance for their Python packages. Can you give us any insight?

  2. If wheel-free bootstrapping is important, then how complicated is it currently? Upthread, Nick linked to the elaborate bootstrap process that RH uses. It’s very different than my impression, which was that right now you can pretty much do unpack python.tar.gz, ./configure && make && make install, python -m pip install --no-binary :all: <whatever> and you’re off to the races. I think this might be because of RH having complex requirements around packaging the docs etc. that most companies don’t care about? But I’m not sure. (Maybe another good question for @barry to weigh in on?)

  3. If wheel-free bootstrapping is important, then how complicated will it be, if we don’t extend PEP 517?

    • I’m guessing it involves manually downloading setuptools and some other packages, running some commands by hand, maybe with hand-tweaked PYTHONPATH etc. to get wheels, and then putting those wheels in a location where pip can find them, and then making sure to always do --no-binary :all: --only-binary setuptools,wheel,...? Or… something like that? We should write it down in full detail instead of guessing.

    • How much more complicated will it be as new backends are developed?

  4. If wheel-free bootstrapping is important, then how complicated will it be, if we extend PEP 517 so that certain packages can force satisfying build-requirements out of wheels, regardless of the --no-binary :all: setting?

    • I guess it looks like the previous, except now you don’t have to manually specify --only-binary setuptools,wheel,... all the time? We should still write it down because I’m probably missing details.

    • How much more complicated will it be as new backends are developed? Will @barry have to hard-code anything new when a new backend is released? Will there be pressure to stick to setuptools as the One True Backend?

  5. If wheel-free bootstrapping is important, then how complicated will it be, if we extend PEP 517 so that certain packages can add an in-tree backend to sys.path?

    • As someone pointed out up-thread, this probably would also require that “root” build backend packages be self-contained, i.e. can build themselves in an empty environment with no other packages in it. Right now setuptools and flit both have external dependencies.

      • @takluyver: in some ways flit seems very attractive as a potential “root” build system… it’s much simpler than setuptools, and in principle setuptools and wheel could easily use flit as their build system :-). But it does have a number of external dependencies currently. Do you think it’s viable for flit to bootstrap itself without external dependencies?
  6. Are there other changes that would make this kind of use case easier? One thing that occurs to me is that in all of these proposals, if you use --no-binary :all:, that means that every time you run pip install <package that uses setuptools>, you’ll end up rebuilding setuptools from source, which seems a bit silly. Maybe we need a way to tell pip not to use binaries from pypi, but that using binaries that it finds in my-built-wheels/ is fine? What kind of tooling are people using to handle this right now?


I think it’s realistic with a bit of work. The dependencies are:

  • requests: used to get the list of valid classifiers to check against, but we can skip that check or use urllib.
  • docutils: used to check long description if it’s written in rst. Skip the check.
  • intreehooks: only needed because of the limitation of PEP 517. python3 -m flit can bootstrap itself without this.
  • zipfile36: backport of the zipfile module from Python 3.6. If this is an issue before Python 3.5 EOL, it would be possible, if a bit inelegant, to have a fallback implementation for older Pythons.
  • pytoml: this is the trickiest one. pyproject.toml seems to be on its way to becoming the standard place to put metadata and tool configuration. Maybe a suitable TOML parser should be added to the standard library? If not, maybe the root build system has to vendor it.

In fact, I wonder if it makes sense anyway to separate the core of Flit that builds packages from the tooling that it incorporates for validation, uploading, etc. Then the core part could be used as a minimal backend to bootstrap other packages.

I’m not committing to this idea yet; there’s some extra complexity in the core (e.g. to handle reproducible builds), but I’ll think about it.


Oh yeah, the part where flit requires Python 3 probably makes it a non-starter for packaging wheel and setuptools in the next year or two.

I doubt that will ever be a realistic option for any number of reasons, but I don’t think it’s productive to discuss what will be “the” root package in the stack anyway, which is completely against the spirit of PEP 517 / 518.

I took @njs to be asking about the feasibility of having at least one leaf to start with (“a” potential root build system). Nothing would prevent there from being more with that ability.

Possibly the more important question is whose responsibility is it to maintain this if they want it? If we put it in the spec, every frontend and backend needs to support it, which means we’re adding a burden to a bunch of OSS maintainers simply to satisfy an arbitrary requirement some corporations have. I think it’s fair to say that the spec allows root packages to build themselves from their own wheels and if some company wants to build wheels from source then they can work out a bootstrap mechanism.

I’m also curious how these companies build clang and gcc and other compilers that are built with an earlier version of themselves. For something like setuptools, the “binary artifact” is literally a zip file with the source code in it - the use case of “we want to see the source code of what we’re building” is hardly compelling for someone who would be willing to accept an actual compiled binary as a “trust root” for gcc or clang.