PEP 517 Backend bootstrapping

I don’t know what level of paranoia needs to be assumed here, but it is possible to tell from the filename whether a wheel contains anything other than pure Python code. And conversely, sdists can contain DLLs if people want to be sufficiently tricky.

Can I suggest that debating whether wheels are or are not binary artifacts is not particularly relevant, and is probably counterproductive?

It’s not about paranoia or provably-DLL-free builds, it’s about setting up an automatable process to maximize the chances that if you have some issue with some piece of software you’re using, you can reliably find the source code for the version you’re using, fix it, and deploy it.

The “process” part is crucial. It’s very easy to write an automated system that checks “sdist: ok, wheel: not ok”, and that gives you 99.9% of the possible value. If you start making exceptions, you get into the weeds really quickly: which wheels exactly are “close enough” to source to count? Where do you maintain the list of exceptions, and how do you justify it to auditors? Do you have to train your users on how to edit wheels by hand (?!), and design a special process for injecting new wheels-without-sdists into the deploy pipeline, one that never gets exercised, so you’ll only discover it’s broken when your business is on fire…?

AFAICT it’s a very practical set of concerns. Being able to deploy from source when you have to is basic good engineering.


Ones with only .pyXY tags (no cpXY ones). Check the filename, and you’re done.

Obviously the tools you’re using need to do this, and there’s no --no-binary-except-pure-python-wheels flag in pip (yet!), but we’re not talking about that here; we’re talking about whether it’s OK to use wheels when locating backends (or build requirements, because “what’s a backend” is non-trivial to determine). And frontends could do that relatively easily.
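For illustration, a filename-only check could be as small as the sketch below (a purely hypothetical helper, not an existing pip feature; wheel filenames end in pytag-abitag-platformtag.whl, so the tags can be read straight off the name):

# Hypothetical sketch: does this wheel *filename* claim to be pure Python?
# Wheel filenames are name-version(-build)?-pytag-abitag-platform.whl.
def looks_pure_python(wheel_filename: str) -> bool:
    stem = wheel_filename[:-len(".whl")]
    interpreter, abi, platform = stem.split("-")[-3:]
    # pyXY-none-any tags make no claim to contain compiled code;
    # cpXY / abi3 / platform-specific tags do.
    return (
        all(part.startswith("py") for part in interpreter.split("."))
        and abi == "none"
        and platform == "any"
    )

print(looks_pure_python("requests-2.21.0-py2.py3-none-any.whl"))            # True
print(looks_pure_python("numpy-1.16.2-cp37-cp37m-manylinux1_x86_64.whl"))   # False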

Well… no :slightly_smiling_face:. There are wheels on PyPI right now that are tagged py2.py3 and contain compiled DLLs. This is correct, b/c the cpXY tags describe Python C API usage, and these wheels don’t use the Python C API; they access the DLLs through ctypes/cffi (example).

You could probably come up with a better heuristic based on the architecture flags, but it’s never going to be a fully reliable guide… heck, in one project I’m working on right now, the end goal is that large portions of the Python code in popular py2.py3-none-any wheels will be generated when the wheel is built.

And anyway, even if you could reliably determine which wheels are secretly sdists in disguise, we still wouldn’t have tooling or workflows for hand-editing wheels.

OK, sorry. .pyXY-none-any. Or we could update the wheel spec to have a way to explicitly say “contains no built artifacts”.

This is why debating whether wheels are binaries or not is futile :wink:

Going back to an earlier point you made

… and that’s where (something like) --no-binary :all: makes sense.

Is anyone here (apart from @pganssle, who has made his position pretty clear) arguing for a solution that requires frontends to use pre-built wheels at some point in the PEP 517 build chain, in order to support self-hosting backends?

If not, then can we discard that option1, and return to looking at ways to allow self-hosting backends to express how they should be built? (And to keep the discussion concrete, can I suggest that we also accompany each proposal with a description of how setuptools would be specified in that proposal?)

1 I’m not ignoring @pganssle’s points, I just don’t think endlessly rehashing the discussion in an attempt to persuade one person is a good use of our time. Better to explore alternative options.

To put my money where my mouth is, with @dstufft’s proposal for a python-path key, I think setuptools would need the following:

pyproject.toml:

[build-system]
requires = []
python-path = "."
build-backend = "selfhost"

It would also need to vendor wheel (see below).

The requires value is empty because it’s not possible to require wheel without introducing a circular dependency in a case where only source is available. That’s also why setuptools needs to vendor wheel. However, setuptools would not need to ship the vendored wheel code in its own wheel - having the setuptools wheel depend on wheel would be fine. The selfhost.py file would simply add the local copies of setuptools and wheel to sys.path and redirect to the normal setuptools backend.
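As a rough sketch of what that wrapper might look like (the module layout and the “vendored” directory name are my assumptions here, not actual setuptools code):

# selfhost.py -- illustrative sketch only. Assumes the sdist root contains
# the setuptools package and a "vendored" directory holding wheel.
import os
import sys

_here = os.path.dirname(os.path.abspath(__file__))
# Make the local copies of setuptools and wheel importable.
sys.path.insert(0, _here)
sys.path.insert(0, os.path.join(_here, "vendored"))

# Redirect to the normal setuptools PEP 517 backend.
from setuptools.build_meta import (
    get_requires_for_build_sdist,
    get_requires_for_build_wheel,
    prepare_metadata_for_build_wheel,
    build_sdist,
    build_wheel,
)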

It would also be possible for the build-backend to be setuptools.build_meta, if the local copies of both setuptools and wheel are visible from python-path. But personally, I think that having a selfhost backend wrapper is a bit clearer.

I think this pattern would be fairly common for self-hosting backends - all dependencies would need to be vendored (or would need to be built with a different backend, although this introduces risk should they change their build process, as noted above) and the self-hosting would be done by a thin wrapper or directly calling the local copy of the code, as above.

If anyone can describe a simpler way of setuptools expressing its build under this proposal, or write up how it would be done under one of the other proposals on the table, then please do.

If we’re going this way, I’d vote for tweaking it slightly to make python-path a list, like $PYTHONPATH is. If we think a common pattern is for a selfhost.py that does nothing except add extra entries to sys.path and then redirect to the real backend, then we might as well let you write both paths directly in pyproject.toml:

[build-system]
requires = []
python-path = [".", "vendored"]
build-backend = "setuptools.build_meta"

If projects want to use the selfhost.py version because they think it’s clearer, then they’d still have that option.
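For what it’s worth, the frontend side of a python-path list should also stay small. Something along these lines (an illustrative sketch of the proposed semantics, not actual pip code; a real frontend would do this inside the subprocess that runs the hooks):

# Hypothetical sketch: resolve a backend given the proposed python-path list.
import importlib
import os
import sys

def load_backend(source_dir, build_system):
    # Prepend each python-path entry, keeping the order given in pyproject.toml.
    for entry in reversed(build_system.get("python-path", [])):
        sys.path.insert(0, os.path.join(source_dir, entry))
    # PEP 517 build-backend strings are "module.path" or "module.path:object.path".
    module_name, _, object_path = build_system["build-backend"].partition(":")
    backend = importlib.import_module(module_name)
    for attr in object_path.split(".") if object_path else []:
        backend = getattr(backend, attr)
    return backend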

I’m not categorically arguing for it, but I’d rather not discard it yet. I think this discussion is conflating two separate questions:

  1. Should PEP 517 include a specified way for a backend to self-host (similar to using intreehooks, but without an extra package)? In other words, if you want to make a wheel of Flit, can you use the same copy of the code both to run and to put in the package? I think we are mostly agreed that self-hosting somehow should be allowed.
  2. How do we handle build dependency cycles? E.g. a hypothetical scenario where you need requests to build Flit, and you need Flit to build requests. I imagine it would be horrible for the frontend to automatically derive some joint build process to get both of them built from source, so the remaining options are either to error out or to use a pre-built wheel to break the cycle.

Maybe we can separate these questions out more explicitly and take them one at a time. I think we need a solution for 1, and I don’t think pre-built wheels are part of it, but they’re still an option I’d consider for 2.


For dependency cycles, I propose that we take a non-prescriptive approach in the PEP, discussing the issue and offering options, but not trying to make anything mandatory. How about wording like the following?

Build requirements SHOULD NOT include cycles. However, cycles only cause an issue if prebuilt wheels are not available to break the cycle, so in many cases cyclic dependencies will not cause a problem - frontends SHOULD ensure that they break cycles using prebuilt wheels when this is possible. If unbreakable cycles do exist, front ends MAY take one or more of the following approaches:

  1. They may treat that situation as causing “undefined behaviour” and make no guarantees about what will happen.
  2. They may report the situation as an error, and abort the build.
  3. They may ignore any user-supplied options that prohibit the use of prebuilt wheels, and break the cycle using such wheels. If they do this, they SHOULD warn the user that a prebuilt wheel has been used.
  4. They may take some other action to handle the situation.

Front ends SHOULD describe in their documentation how they handle build requirement cycles.

Major backends like setuptools will then need to take care not to have cycles (which probably implies vendoring dependencies) but smaller, more specialised backends are free to have dependencies and just note that they may not work if a frontend encounters a cycle as a result.

Any “let’s not prescribe this” solution runs into the “pip as de-facto standard” question. If pip can’t handle dependency cycles, there’s not much point building a backend which has one. If pip can, then there’s a good chance some backends will have dependency cycles, and any alternative frontend will be under pressure to handle them. So there’ll be a standard, either in the PEP or in pip. :wink:

I think I’m inclined to explicitly forbid dependency cycles for now. There’s more than enough else to decide, specify and implement, without adding complexity to handle cyclic dependencies. We can revisit it later if the limitation proves to be a problem.

I tend to see this the other way round. If we don’t say front ends have to handle cycles, backends can’t assume they will work. But if pip does handle them (in some cases) and backends are willing to do so, they can delay the point where they have to deal with the problem by accepting a pip-specific solution. But when a new frontend comes along, they can dump the issue on the backend for not being PEP-conformant.

Personally, I suspect pip will end up going with making cycles an error (assuming we can detect them), maybe also ignoring --no-binary with a warning message in cases where that helps (which won’t be all cases).

That works, too, if everyone is OK with that. (I personally don’t mind, as I never use --no-binary or any similar approaches that trigger this issue).

But to be clear, by “forbid” do you just mean “dependency cycles MUST NOT exist”, and frontends are allowed to assume that without checking, or do you mean “frontends MUST reject cycles with an error”?

I’d say that build dependency cycles must not exist, and encourage but not require frontends to check for and reject such cycles.

If they’re implicitly allowed because pip doesn’t check for them and few people restrict pip to building everything from source, I think there’s a real chance we’ll end up with packages that can’t easily be installed by people who do care about building from source.


See @dstufft’s post earlier for how pip would probably detect cycles, but I doubt our approach would detect a cycle if it’s broken by using a wheel. So I’d describe what pip’s likely to implement as “failing with an informative error” rather than “checking for cycles”. Given that build requirements can only be determined after a sdist is downloaded, I doubt any frontend will pre-check for cycles.

Importantly, we can’t detect a cycle if we’re using wheels, since wheels don’t have build dependencies and it’s only build dependencies that matter (including cycles that involve runtime dependencies of build dependencies). Cycles that only involve wheels’ runtime dependencies and no build dependencies are fine.

Good point, I hadn’t thought through the mechanics of it properly. In that case, as @pf_moore says, I think it makes sense to fail with an informative error if a cycle does crop up when pip is trying to build everything from sdists.
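In other words, the check amounts to “fail when you actually hit it”. A minimal sketch of that idea (the helper names below are placeholders, not pip internals):

class BuildCycleError(Exception):
    pass

def build_requirement(name, in_progress=()):
    if wheel_is_available(name):                     # placeholder helper
        # A prebuilt wheel has no build requirements, so any cycle it is
        # part of never becomes visible from here.
        return use_wheel(name)                       # placeholder helper
    if name in in_progress:
        chain = " -> ".join(list(in_progress) + [name])
        raise BuildCycleError("build requirement cycle detected: " + chain)
    for req in get_build_requirements(name):         # placeholder helper
        build_requirement(req, in_progress=(*in_progress, name))
    return build_wheel_from_sdist(name)              # placeholder helper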

Also, FWIW:

[build-system]
requires = []
python-path = [".", "vendored"]
build-backend = "setuptools.build_meta"

Is how I would envision it working for setuptools, if setuptools chooses to just bundle wheel as a build dependency rather than exploring the possibility of merging the two projects or otherwise working out how to build at least itself with zero dependencies.

This is pretty close to how flit already bootstraps itself, so it shouldn’t be an issue there (although it currently falls under the implicit “danger” of a self-hosting build backend having build dependencies, which would become a cycle if, say, requests switched to flit).

I think those are the only two build backends in major use today.

(For the benefit of others on this thread, there has been some discussion over on https://github.com/pypa/setuptools/pull/1674)

@pganssle based on what we’re discussing over on that setuptools PR, would you be OK with something like the configuration @dstufft shows above (with a vendored copy of wheel in setuptools’ sdist)?

My feeling is:

  • At the policy/spec level, we should encourage packages to support being built purely from sdists. I don’t have a strong feeling on SHOULD vs MUST, and I’m not sure it matters that much.

  • By default, pip should error out if it runs into a loop/forkbomb, just because that’s kind of necessary to avoid DoSing people, and it sounds like the folks using --no-binary :all: don’t really want anything cleverer than that. Pip doesn’t need to go to extra trouble to check for loops, just detect any that it actually hits.

  • If a user does hit a cycle, and they decide they want to resolve it by just using a wheel, then it’d be nice if pip had a convenient UI for that. Maybe it already does – can you do --no-binary :all: --allow-binary flit?

  • When pip does hit a cycle, it’d be nice if the error message told you which packages were involved, and noted that using --allow-binary <those packages> might help. Not a hard requirement, but a quality-of-life feature.

MUST means it’s a bug if a project doesn’t support it. SHOULD means it’s recommended but not a bug if they don’t.

It’s spelled --only-binary instead of --allow-binary, but yes.
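So, if I’m reading pip’s current option handling right, that would be something along the lines of:

pip install --no-binary :all: --only-binary flit <package>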

Unless the project maintainers disagree with our spec, in which case they’ll do whatever they want :-).

I guess I’m +0 on MUST, if only because any popular build backend is going to be forced to cope with this sooner or later by its users, so we might as well warn them up front.