PEP 517 Backend bootstrapping

I think this is sufficient, but I’d like to head off at the pass the potential objection that there could be cycles of length > 1. For example, if setuptools were to depend on intreehooks, and intreehooks were to depend on setuptools, you’d still get an infinite regress.

Practically speaking, I think we can just not worry about it, because again very few build backends will actually need to build themselves and the ones that do are almost certainly going to be incredibly conservative with their declared dependencies because the nature of the tool makes adding dependencies a pretty painful experience. If front-ends want to do longer-range cycle detection that is an option they have, but I see no reason to mandate it as part of the spec.

Yeah, agreed - at the spec level, we’re aiming to make it possible for well-behaved backends to let frontends know where to stop, and “Include the project’s own name in build-system.requires” should suffice for that.

Frontends may still want to be more paranoid about dependency cycle detection, but it would be a general “add arbitrary length cycle detection to handle misconfigured backends that haven’t flagged themselves as part of a bootstrap cycle” feature that a robust implementation might want, rather than the bare minimum that the interoperability spec requires.
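For anyone curious what the "arbitrary length cycle detection" could look like in a frontend, here is a minimal sketch. It assumes the frontend has already collected each project's build-system.requires into a plain dict of names. The function name `find_build_cycle` and the sample data are made up for illustration and are not part of any spec or of pip.

```python
def find_build_cycle(build_requires, start):
    """Return the first build-dependency cycle reachable from `start`, or None.

    `build_requires` maps a project name to the list of project names in its
    build-system.requires (hypothetical, pre-collected data).
    """
    path = []        # current chain of projects being built
    on_path = set()  # same contents as `path`, for fast membership tests
    done = set()     # projects already proven cycle-free

    def visit(name):
        if name in on_path:
            # We came back to a project we are still trying to build.
            return path[path.index(name):] + [name]
        if name in done:
            return None
        on_path.add(name)
        path.append(name)
        for dep in build_requires.get(name, ()):
            cycle = visit(dep)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.discard(name)
        done.add(name)
        return None

    return visit(start)


# Hypothetical metadata reproducing the length-2 cycle described above.
requires = {
    "setuptools": ["intreehooks"],
    "intreehooks": ["setuptools"],
}
cycle = find_build_cycle(requires, "setuptools")
```

A self-reference is just the length-1 case of the same check, so a frontend that implements this gets the spec-mandated behaviour for free.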

I don’t think I understand why we’d say some things can only be installed from wheels when any of the options first listed allows breaking that cycle anyway. While I have preferences among the options, having to provide wheels for these even if you’re building from source seems like it makes the UX worse, not better.

I also don’t think it’s any business of the spec what frontends are willing to install or not. It should be perfectly fine to have a front end that only supports wheels or only supports sdists as desired.

My preferred option doesn’t provide a general option that I feel like will be an attractive nuisance. That being said, I think even the generalized python-path config is better than just sort of throwing our hands up and saying we just can’t build some things from source.


@dstufft I’m fine with the bootstrap-backend-location approach I proposed above (and @takluyver wrote a rough draft implementation of), but @pganssle doesn’t seem willing to use it for setuptools without a rider requiring weird dynamic sys.path manipulation in frontend implementations (which I’m opposed to while wearing my “import system co-maintainer” hat), and if setuptools won’t use it, then there’s no point in adding it to the spec.

So bootstrap cycle detection and breaking is a compromise that setuptools would be willing to rely on.

My objection is not to setuptools using it, it’s a perfectly fine mechanism for setuptools. If I were to refuse to use the mechanism when it exists, it would be out of spite, and even if that were in my nature (and hopefully it is not), I am just one of three setuptools maintainers, and I have not consulted the other two about this.

My only objection to any of the path-manipulation mechanisms (other than the fact that they are largely way over-designed) is that it makes it possible for the front-end to manipulate the semantics of the backend as a side-effect of the implementation.

As @dstufft says, I think this thing will be an attractive nuisance, and whether or not setuptools even uses the thing, we’ll probably bear the brunt of the support burden on it, since most of the edge cases will be in setup.py invocations.

It’s probably fine to leave it out of the spec entirely, but one thing to note is that this would mean that the backends can unilaterally choose to break one version of this or the other, all while remaining in compliance with the spec. If setuptools chooses to add setuptools to the build-system.requires, practically speaking it means that no one can write a front-end that only builds from source without hard-coding in setuptools as special. Might as well add it to the spec at that point.

+1. I had said that I would look at making a decision this weekend, but I don’t think at this point that we’re anywhere near consensus. There’s no rush on this, so I think a break to let things settle is the right thing to do. So, for clarity, I will now definitely not make a final decision this weekend.

Right now, setuptools can build its wheels for distribution using setup.py bdist_wheel and (thanks to pypa/pip issue #6222, “Install with --no-binary ignores PEP 517 build system”), pip users who use --no-binary :all: have no access to PEP 517 anyway. So let’s not give this issue a false sense of urgency.

One thing I will say is that, having thought it through, I don’t think the “backends must be supplied as wheels” option is enforceable (unfortunately). The problem is that there’s nothing in pyproject.toml to say whether an entry in build-system.requires is a backend, or a simple build requirement. Consider

[build-system]
requires = ["flit", "mystery_package"]
build-backend = "flit.buildapi"

How can the frontend tell where the backend will be imported from? There’s no rule that says module flit is bound to come from package flit. If the frontend can’t tell that a requirement is supplying a backend, it has no way to decide whether to insist on a wheel. (We know from pip’s PEP 518 implementation that insisting that all build requirements are available as wheels is a non-starter.)
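To make that concrete, here is a rough sketch of everything a frontend can extract from the table above without installing anything. The helper name `split_backend` is made up. The only thing taken from PEP 517 is that build-backend is an import path with an optional `:object` suffix.

```python
def split_backend(build_backend):
    """Split a PEP 517 build-backend string into (module path, object path)."""
    module_path, _, object_path = build_backend.partition(":")
    return module_path, object_path or None


# The [build-system] table from the example, as a plain dict.
build_system = {
    "requires": ["flit", "mystery_package"],
    "build-backend": "flit.buildapi",
}

module_path, object_path = split_backend(build_system["build-backend"])
top_level = module_path.split(".")[0]  # the module the frontend will import

# Nothing in the table ties the import name to a distribution name, so before
# installation every requirement is an equally valid candidate provider.
candidates = list(build_system["requires"])
```

So the frontend ends up knowing it will import `flit`, and that either of two distributions might be the one that ships it.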

Note: sorry for being grumpy last night (especially to @pganssle). I’d gone past the point where I really should have just gone to bed and refrained from further comments until after I’d had a chance to sleep on things.

This problem is why my draft addition to the PEP 517 text focused on detecting self-reliant components rather than detecting backends in general: the frontend knows which component it is currently attempting to install, so it can tell when that same name also appears in build-system.requires. So you’d get scenarios like the following:

  • installation from wheels allowed, setuptools is found as a wheel, frontend never even tries to build it
  • source builds requested, setuptools sdist is found as a build dependency, setuptools self-reference in build-system.requires is detected, setuptools is found as a wheel instead, frontend prints a warning, and uses the wheel rather than building from source
  • source builds requested, setuptools sdist is found as a build dependency, setuptools self-reference is detected, setuptools is NOT found as a wheel, frontend errors out saying it needs a wheel file (or a non-isolated build) to break the declared build dependency cycle
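As a sanity check on those three scenarios, here is a small sketch of the decision a frontend would be making. Everything in it is illustrative: the function name, the argument names, and the returned labels are invented, and it assumes requirement strings are bare, already-normalised project names.

```python
def resolve_build_dependency(name, declared_requires, source_only, wheel_available):
    """Decide how to obtain build dependency `name`.

    declared_requires: build-system.requires from the sdist of `name`
    source_only:       True if the user asked for source builds
    wheel_available:   True if a wheel for `name` can be found
    """
    if not source_only and wheel_available:
        # Scenario 1: wheels are allowed and one exists, never try to build.
        return "install-wheel"
    if name not in declared_requires:
        # Not flagged as part of a bootstrap cycle: ordinary source build.
        return "build-from-sdist"
    if wheel_available:
        # Scenario 2: self-reference detected, fall back to the wheel.
        return "install-wheel-with-warning"
    # Scenario 3: self-reference detected and nothing to break the cycle with.
    return "error-need-wheel-or-non-isolated-build"


setuptools_requires = ["setuptools", "wheel"]
outcome = resolve_build_dependency(
    "setuptools", setuptools_requires, source_only=True, wheel_available=True
)
```

Note that the backend is never identified as such; the only input is the name of the thing currently being installed and its own declared requirements.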

Importantly, this model should work even if setuptools declares additional build requirements, as long as those dependencies are also marked as being self-reliant. Specifically, both setuptools and wheel would set build-system.requires = ["setuptools", "wheel"] to reflect the environment that pip is implicitly giving them today.

With the “self-references are always resolved as wheels” approach, a frontend can resolve this mutual interdependency, as it will be able to detect that both setuptools and wheel are marked as self-reliant, and then install them both from wheel files.
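Here is a sketch of how that resolution could play out for a source build of some third project. The name `plan_source_build`, the `example-project` entry, and the metadata dict are all hypothetical; the setuptools and wheel entries mirror the requires list suggested above.

```python
def plan_source_build(target, build_requires):
    """Partition the build closure of `target` into (from_wheel, from_source).

    A project that lists itself in its own build-system.requires is treated
    as self-reliant: it is taken from a wheel and its build requirements are
    not explored any further.
    """
    from_wheel, from_source = set(), set()
    pending = [target]
    while pending:
        name = pending.pop()
        if name in from_wheel or name in from_source:
            continue
        requires = build_requires.get(name, [])
        if name in requires:
            from_wheel.add(name)      # self-reference: stop here
        else:
            from_source.add(name)     # ordinary project: build it
            pending.extend(requires)  # and everything it needs to build
    return from_wheel, from_source


build_requires = {
    "setuptools": ["setuptools", "wheel"],
    "wheel": ["setuptools", "wheel"],
    "example-project": ["setuptools", "wheel"],  # hypothetical project
}
from_wheel, from_source = plan_source_build("example-project", build_requires)
```

Because neither setuptools nor wheel is ever built, the fact that they require each other never turns into a regress.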

By contrast, to get the in-tree hooks approach to work as the foundation, either the two projects have to merge into a single package, or else one of them has to vendor the other in order to make itself truly self-hosting.

Sounds like maybe we need to understand better what the actual problem is :-).

I have questions!

  1. Is it important to be able to bootstrap Python environments without relying on wheels? There’s some argument that this is important, but we don’t really know for sure. How can we learn more?

    • One crude measure: what percentage of setuptools downloads use sdists vs wheels? I checked PyPI downloads for January 2019, and it looks like ~8% of setuptools downloads are sdists. To me this suggests that a fair number of people are using --no-binary :all:, though all the usual caveats apply.

    • Hey @barry, you know something about big companies who worry about provenance for their Python packages. Can you give us any insight?

  2. If wheel-free bootstrapping is important, then how complicated is it currently? Upthread, Nick linked to the elaborate bootstrap process that RH uses. It’s very different than my impression, which was that right now you can pretty much do unpack python.tar.gz, ./configure && make && make install, python -m pip install --no-binary :all: <whatever> and you’re off to the races. I think this might be because of RH having complex requirements around packaging the docs etc. that most companies don’t care about? But I’m not sure. (Maybe another good question for @barry to weigh in on?)

  3. If wheel-free bootstrapping is important, then how complicated will it be, if we don’t extend PEP 517?

    • I’m guessing it involves manually downloading setuptools and some other packages, running some commands by hand, maybe with hand-tweaked PYTHONPATH etc. to get wheels, and then putting those wheels in a location where pip can find them, and then making sure to always do --no-binary :all: --only-binary setuptools,wheel,...? Or… something like that? We should write it down in full detail instead of guessing.

    • How much more complicated will it be as new backends are developed?

  4. If wheel-free bootstrapping is important, then how complicated will it be, if we extend PEP 517 so that certain packages can force satisfying build-requirements out of wheels, regardless of the --no-binary :all: setting?

    • I guess it looks like the previous, except now you don’t have to manually specify --only-binary setuptools,wheel,... all the time? We should still write it down because I’m probably missing details.

    • How much more complicated will it be as new backends are developed? Will @barry have to hard-code anything new when a new backend is released? Will there be pressure to stick to setuptools as the One True Backend?

  5. If wheel-free bootstrapping is important, then how complicated will it be, if we extend PEP 517 so that certain packages can add an in-tree backend to sys.path?

    • As someone pointed out up-thread, this probably would also require that “root” build backend packages be self-contained, i.e. can build themselves in an empty environment with no other packages in it. Right now setuptools and flit both have external dependencies.

      • @takluyver: in some ways flit seems very attractive as a potential “root” build system… it’s much simpler than setuptools, and in principle setuptools and wheel could easily use flit as their build system :-). But it does have a number of external dependencies currently. Do you think it’s viable for flit to bootstrap itself without external dependencies?
  6. Are there other changes that would make this kind of use case easier? One thing that occurs to me is that in all of these proposals, if you use --no-binary :all:, that means that every time you run pip install <package that uses setuptools>, you’ll end up rebuilding setuptools from source, which seems a bit silly. Maybe we need a way to tell pip not to use binaries from pypi, but that using binaries that it finds in my-built-wheels/ is fine? What kind of tooling are people using to handle this right now?


I think it’s realistic with a bit of work. The dependencies are:

  • requests: used to get the list of valid classifiers to check against, but we can skip that check or use urllib.
  • docutils: used to check long description if it’s written in rst. Skip the check.
  • intreehooks: only needed because of the limitation of PEP 517. python3 -m flit can bootstrap itself without this.
  • zipfile36: backport of the zipfile module from Python 3.6. If this is an issue before Python 3.5 EOL, it would be possible, if a bit inelegant, to have a fallback implementation for older Pythons.
  • pytoml: this is the trickiest one. pyproject.toml seems to be on its way to becoming the standard place to put metadata and tool configuration. Maybe a suitable TOML parser should be added to the standard library? If not, maybe the root build system has to vendor it.

In fact, I wonder if it makes sense anyway to separate the core of Flit that builds packages from the tooling that it incorporates for validation, uploading, etc. Then the core part could be used as a minimal backend to bootstrap other packages.

I’m not committing to this idea yet; there’s some extra complexity in the core (e.g. to handle reproducible builds), but I’ll think about it.


Oh yeah, the part where flit requires Python 3 probably makes it a non-starter for packaging wheel and setuptools in the next year or two.

I doubt that will ever be a realistic option for any number of reasons, but I don’t think it’s productive to discuss what will be “the” root package in the stack anyway, which is completely against the spirit of PEP 517 / 518.

I took @njs to be asking about the feasibility of having at least one leaf to start with (“a” potential root build system). Nothing would prevent there from being more with that ability.

Possibly the more important question is whose responsibility is it to maintain this if they want it? If we put it in the spec, every frontend and backend needs to support it, which means we’re adding a burden to a bunch of OSS maintainers simply to satisfy an arbitrary requirement some corporations have. I think it’s fair to say that the spec allows root packages to build themselves from their own wheels and if some company wants to build wheels from source then they can work out a bootstrap mechanism.

I’m also curious how these companies build clang and gcc and other compilers that are built with an earlier version of themselves. For something like setuptools, the “binary artifact” is literally a zip file with the source code in it - the use case of “we want to see the source code of what we’re building” is hardly compelling for someone who would be willing to accept an actual compiled binary as a “trust root” for gcc or clang.

Note that this discussion came from this pip issue comment, “it will not be possible for people to use pip install --no-binary :all: for any project that depends on setuptools (or presumably any PEP 517 backend provider)”.

In all the discussion, one thing we haven’t really covered properly is the option to say “pip install --no-binary :all: is just broken in the context of PEP 517, there’s no issue here, backends must be made available as wheels and frontends that block loading backends from wheels may break when building from source”.

People wanting to build everything from source should have a bigger infrastructure. For a start, where did they get pip from (the pip that comes when you build Python from source is a wheel distributed with Python)?

Pip has some significant issues to address in that case, but it’s a valid response. And one I’m (as a pip developer) not completely opposed to.

That big list is actually for Fedora. (It’s more complicated than what Red Hat does, even though the “big corporation” issues are left out.) Also, it’s for bootstrapping Python itself – a lot of the complexity comes from GDB integration, single-source py2/py3 libraries, docs, and tests for everything involved. (Projects tend to be very liberal with their test dependencies.) I don’t think it’s too relevant here, really.

Don’t design for big corporations; those do have the resources to do their own thing. And they won’t tell you their actual requirements, anyway. That’s why Red Hat sponsors Fedora – which also attempts to do things right as a distro :)

Now, in Fedora, we often don’t use sdists, going directly for Git releases. (Sdists usually don’t contain tests, and often even omit licenses.) And we definitely don’t download anything from PyPI – all dependencies are installed and all sources are copied to the build system before the build. (We can’t realistically trust all of PyPI, npm, Crates, GitHub, Hackage, CPAN, …)

I don’t see a problem with Python-only wheels in principle – they’re just as auditable as sdists.


Agreed with the requirements here. For the few places where I deal with this kind of stuff, we prefer to have the entire source repo in our own infrastructure (for resiliency/patching) and build from that. Under PEP 517 we’d want everything to come from our internal index servers, and if that means manually bootstrapping the backend wheels (up to and including manually creating a wheel with 7zip) then we’ll just do it and move on with life.

Don’t design the automatic/magic commands for corporations, we have humans :)


I still don’t see a good reason to break the idea of installing the entire dependency set from source, the “corporation” thing is a bit misleading IMO. There are many OSS projects that have a need to build everything from source as well. setuptools, flit, etc are not nearly as special as gcc/clang/etc are to where they need to be special cased all over the place because it’s literally impossible to build without them.

I don’t think anyone’s suggesting “breaking” the idea as such. But PEP 517 was designed without considering self-hosting backends, and that wasn’t 100% an oversight (there was a definite expectation that backends would be provided as wheels).

PEP 517 as it stands doesn’t make the “build everything from source” idea impossible. The straightforward approach is to build your backends (setuptools, etc) first and host them somewhere as wheels, then build everything else from source using those wheels.

The problem is that pip’s users expect a simple --no-binary :all: to work for “build everything from source”, and it doesn’t under PEP 517. This discussion is essentially about how we modify the PEP to rescue --no-binary :all:. But the (legitimate) question has been raised: should we even try to rescue that, or should people wanting to “build everything from source” be taking a more nuanced view (in which case something as simple as --no-binary <except files in directory XXX> would be sufficient)?

To clarify, if we assume that someone wants to do pip install --no-binary :all: numpy (note, I’m assuming that because an actual example of a use case is somewhat elusive - if anyone can point to a real world use case, that would help the discussion), then it’s entirely fair that they expect numpy and its dependencies to be built from source. But do they really expect the build tools (setuptools, et al) to be built from source? If so, why are they OK with using pip? Shouldn’t they build that from source too? I’m genuinely unclear - particularly given that a pure-python wheel essentially is source, there’s nothing there that isn’t a straight copy from the original source code archive.

I’m very concerned that we’re trying to design things in a vacuum here. We have assumptions of requirements, and partially fleshed out examples of use cases. @steve.dower and @encukou have both said that what we do here won’t matter to them, as they build from source in a different way anyway. So who, precisely, has a problem that needs fixing here? Can we get anyone who’s raised an issue on the pip tracker relating to this, to explain their use case for us? Or put a call out (maybe on distutils-sig) for a use case?


If they’re asking for everything to be built from source… then why wouldn’t they expect the build tools to also be built from source? It seems odd and counterproductive to me that an option that says “build everything from source” would actually mean “build everything from source, unless it was self referential, then YOLO use a wheel”.

This isn’t even necessarily about pip here either. It should be entirely possible to have a tool other than pip consume a PEP 517 enabled package and be able to appropriately build it. If we’re just saying “well you have to figure out how to build a wheel for a self referential backend… somehow” then we’re basically opting all of those tools out of the generic-ness of PEP 517.

I don’t think this just affects people who want to build from source for a matter of policy. I think it also affects actually developing a backend. For instance, setuptools already has workarounds to enable it to bootstrap itself in development. Its tox.ini uses usedevelop = True for the tests which works around the fact that tox can’t build a wheel of setuptools (it wasn’t added for that reason, it predates 517, but it has the same effect) and their docs job explicitly calls their bootstrap.py script in order to bootstrap the installed setuptools to test their documentation.

As I see it, if we don’t make it possible to do it, we’re basically making the job of implementing a backend harder, because they’ll need to either rely on another backend to package themselves (as flit effectively does via setuptools in intreehooks) or have some bootstrapping logic. Since backends are generally going to need to either use another backend or implement the bootstrapping anyway… we might as well expose it to make it easier for everyone involved.