PEP 517 Backend bootstrapping

My question is still: why go to all this trouble? Can’t we just state that all build backends (or, even simpler, all requires elements) are to be fulfilled via wheels, and that it’s up to the build backend to publish wheels (as build frontends would ignore all source distributions)? I would argue this guarantees we’re not stuck in an endless PEP 517 loop, and builds would be much faster. Am I missing something here?

In some contexts, such as downstream packaging, people want to build everything ‘from source’, avoiding pre-built wheels.

This is partly just a perception thing; for most straightforward Python packages, the wheel is just as much source as the sdist. And there’s nothing to stop you having a build step before creating the sdist (e.g. Jupyter does this to bundle CSS and JavaScript). But the assumption for things like conda and Linux distros is that you build from sdists.

(And obviously there are plenty of packages where the wheel really is a built artefact, including anything that compiles an extension module)


I know that the PEP 517 backend isn’t putting the source tree on sys.path, but I thought that was a bug, not an intentional change.

Doing this intentionally seems like a poorly motivated and user-hostile change. Users are already constantly pissed off at us for breaking stuff for (what they perceive to be) no reason… no need to give them more evidence.

I don’t think we can say that for requires in general, as we got a lot of pushback on that in pip from people whose setup.py depended on things that weren’t available as wheels. But I think it’s a valid question to ask for backends.

There are two key points, as I understand it:

  1. People who want to “build everything from source” (often via --no-binary :all: in pip). I’m not sure I have much sympathy with that requirement, but others may differ. And nobody’s ever really articulated why they need to do that.
  2. Building the backends themselves. I’m not entirely sure that backends need to generate their wheels using pip and the PEP 517 machinery, but I’m not a backend writer so I’m not really qualified to comment. This does seem to be something that is only affecting setuptools (at the moment) though - flit seems to have successfully worked around it.

The package itself gets built from source, that I understand. They want to repackage, so you want to build using your own platform’s dependencies - usually that is a good test that your dependent packages are correctly built.

I don’t think at all, though, that the build tools for this repackaging also need to be provided as source. Until I hear multiple people coming up with some solid use cases, I’m tempted to say pip should use wheels only for build requires dependencies.

All Linux distributions repackage Python packages, much as they do for C applications. The build tools, though, are provided as binaries: e.g. when they compile git they don’t provide clang as source and start by building clang itself first (with clang, at which point you’re back in this same infinite loop). Similarly, I find providing build backends as wheels for such builds perfectly acceptable, and if anything it follows a well-established practice.

If they really want this, then why not let them build their own custom frontend that solves this problem exactly as their use case demands? In the meantime we can watch, and learn important lessons to get the pip implementation right. I’m kinda reluctant to commit to a solution for a niche use case that can be heavily abused by people in ways we don’t want.

There’s a lot to be said for this argument. To put it another way, I don’t really see any urgency that we “fix” this problem right now. At the moment, it’s only affecting setuptools, and they can manage at least for now just using setup.py bdist_wheel directly.

The only problem for pip would be that we can’t tell whether a build-requirement is supplying the backend, or just a dependency, so we can’t enforce “must use wheels” (or disable --no-binary :all:) for backends only. So I think we’d have some work, but that’s a pip-only issue.

Does anyone have any good reasons why this needs addressing at the PEP level right now?

It will only look that way if you choose to support projects doing that in setuptools. The wording we’d need at the spec level to explicitly let setuptools (or any other backend) disallow using the option that way would be something like “Backend implementations MAY disallow the use of bootstrap-build-backend with any package other than themselves (e.g. by raising an exception at sdist creation time if they find the setting in pyproject.toml and the project metadata indicates this is a different project)”.

Doing it implicitly wasn’t intentional - it was the result of a miscommunication between pip devs and setuptools devs regarding the goals of setuptools.build_meta. Specifically, the setuptools devs designed setuptools.build_meta to align with PEP 517, so it doesn’t put the source tree path on sys.path implicitly. That’s fine for an explicitly opt-in backend, but it means it isn’t appropriate for use as an implicit fallback backend the way pip 19 is currently doing.

Hence the proposed addition of setuptools.build_meta_legacy in https://github.com/pypa/setuptools/pull/1652/ and the draft PR to use that in pip, pypa/pip pull request #6212 (“Fix #6163: Default to setuptools.build_meta:__legacy__”, by ncoghlan).

To be clear, flit worked around it by building itself with setuptools (or rather, indirectly by building intreehooks with setuptools). For obvious reasons this is not an option for setuptools.

The point of the setuptools.build_meta_legacy change is that it will make the use of setuptools.build_meta an opt-in change. I believe @uranusjr has other reasons why having the setup.py’s location assumed to be in sys.path can cause problems.

The reason I want to make the change in the setuptools.build_meta backend is that when setuptools.build_meta_legacy becomes the default PEP 517 backend, the semantics of setuptools.build_meta become opt-in, which means this is an incredibly rare opportunity for setuptools to make some adjustments to those semantics.

The reason for adjusting the default semantics (again, you can just get the old semantics by manipulating sys.path in your script) is that having the source tree in your path can break the isolation of the build environment in some respects (same reason PEP 517 explicitly says not to put it on the path for frontends), and is generally a source of bugs when someone is accidentally relying on features of the build environment that don’t exist in the deployed environment (mostly around testing, TBF).
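For anyone wondering what “manipulating sys.path in your script” could look like, here is a minimal sketch. The helper name add_source_root is made up for illustration; it is not a setuptools API, just one way a setup.py might put its own directory back on sys.path before importing in-tree helpers.

```python
import os
import sys


def add_source_root(setup_file):
    """Put the directory containing setup_file at the front of sys.path.

    Hypothetical helper, not part of setuptools.
    """
    root = os.path.dirname(os.path.abspath(setup_file))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# In a real setup.py this would be called as add_source_root(__file__)
# before importing any helper modules that live next to setup.py.
```

That keeps the decision in the project’s own script rather than in the backend or the frontend.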

This is needlessly complicated, though. It means setuptools has to parse the pyproject.toml file (which I’m not even sure is guaranteed to exist at build time by PEP 517) to detect whether this option is passed and what backend is being used, to no major benefit. The original PEP 517 chose to make it so that the frontends would not put the source root into sys.path, and the Option 3 proposal is suggesting adding the capability for users to opt-in to changing those semantics to solve a problem that can easily be solved in a simpler way.

I think the gap between Option 1 and Option 3 is less than you may think, though. If we imagine a version of pyproject.toml that looks like this (syntax doesn’t matter):

[build-system]
requires=["setuptools", "wheel"]
build-backend="{BOOTSTRAP[.]: setuptools.build_meta}"

Then we can implement it the same way that @njs suggested, with a final additional check that when we import setuptools.build_meta, the package we imported was found in the tree under the location specified by the “argument” to BOOTSTRAP. The spec is essentially as easy to explain as the original explanation: “For packages that build themselves using a backend that exists in the source tree, you may specify the module with the syntax {BOOTSTRAP[pth]: backend} where pth is a path relative to the source root and backend is the name of a module to import.”

Again, the syntax doesn’t really matter, we could just as easily do it with:

[build-system]
requires=["setuptools", "wheel"]
build-backend="setuptools.build_meta"
bootstrap-backend-location="."

It would still be a valid Option 1 if the requirement is that those two things specify the location of the backend precisely.

How about using bootstrap-backend-location=".", and specifying that frontends may check that the backend is loaded from the specified location. Frontends that want to keep things simple could just add the location to sys.path, but if pip does implement the check, then it won’t be a useful way to get the CWD on sys.path if you’re not implementing a backend.

That’s probably fine, but I don’t think making the check optional is necessary. It introduces unnecessary uncertainty for such a narrowly scoped feature, and requiring the check wouldn’t dramatically increase the complexity of the implementation.

I suspect there’s a fair bit of complexity to thoroughly checking ‘is this thing imported from here?’, once you get into weird cases with symlinks and unicode normalisation. If it’s optional for frontends, then a simple frontend can just sys.path.insert() it and assume that it’s loading from the correct place.

Even complex frontends could have an option to disable the check in case it goes wrong in corner cases.
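To make the “is this thing imported from here?” question concrete, here is a rough sketch of the simplest version of such a check. The function name loaded_from is hypothetical. It resolves symlinks with os.path.realpath, but it makes no attempt at the Unicode normalisation or case-insensitive filesystem corner cases, which is where the real complexity would be.

```python
import os


def loaded_from(module_file, location):
    """Return True if module_file lives somewhere under the directory location.

    Both paths go through os.path.realpath, so symlinks are resolved first.
    Unicode normalisation and case-insensitive filesystems are NOT handled.
    """
    module_path = os.path.realpath(module_file)
    root = os.path.realpath(location)
    try:
        return os.path.commonpath([module_path, root]) == root
    except ValueError:
        # Raised e.g. for paths on different drives on Windows.
        return False
```

A frontend would presumably call this with the imported backend module’s __file__ and the configured bootstrap location.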

Fair enough, though there are many versions of Option 1 where this is absolutely not a problem. For example, if we go with @dstufft’s preferred option of a “magic module” __build_backend__.py, then there’s no need to worry about the complexity of the implementation.

I would be in favor of restricting the way backends must implement the bootstrap mechanism as tightly as necessary to make it easy for frontends to implement this. Every front-end needs to implement this as part of the spec. Very few backends need to implement a bootstrap backend - could be that only setuptools implements it, intreehooks uses setuptools and everyone else uses intreehooks. For something this obscure, we can forbid edge cases rather than handling them.

It also introduces a weird separation of responsibility - the front end has to enforce this, but it’s the back end (and specifically setuptools) that wants to prevent that directory being on sys.path when setup.py runs.

I’m not sure I’d want to add (or maintain) that check in pip.

My concern with this is that it potentially makes life more difficult than it needs to be for backend authors to correctly bootstrap themselves (since absolute imports may not work properly depending on exactly how the frontend handles loading the in-tree backend without putting the location of the backend on sys.path), based on a concern that the feature may be misused for purposes other than those that are intended.

We should keep in mind that this is an ecosystem where the core language definition lets people write code like this:

>>> import builtins
>>> builtins.__dict__.clear()
>>> str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'str' is not defined

We rely on “Yes, we know you can do that, but please don’t - future you will thank you” a lot, and it’s one of the reasons folks enjoy using our tools.

That said, setuptools can still prohibit use of the hook by anyone other than itself without needing to read pyproject.toml:

  1. setuptools.__file__ indicates where setuptools was loaded from
  2. setuptools.build_meta runs code in-process before setup.py runs
  3. whether or not backend-bootstrap-location was specified can be inferred based on whether or not sys.path[0] is inside the current directory

So if setuptools.__file__ is outside the current directory, while sys.path[0] is inside it, then that indicates the scenario that @pganssle wants to prevent, and setuptools.build_meta can throw an exception advising users to manipulate sys.path in setup.py instead of setting backend-bootstrap-location.
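A sketch of that backend-side check, under the assumptions stated above. The function names are invented for illustration and nothing like this exists in setuptools today; the path comparison is the naive realpath-based one, so the usual caveats about odd filesystems apply.

```python
import os


def _inside(path, root):
    """True if path resolves to a location under root (symlinks resolved)."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:
        return False


def looks_like_bootstrap_misuse(backend_file, first_path_entry, source_root):
    """True when sys.path[0] is inside the source tree but the backend is not."""
    # An empty sys.path entry means "current directory" at import time.
    entry = first_path_entry or os.getcwd()
    return _inside(entry, source_root) and not _inside(backend_file, source_root)
```

A backend might call it as looks_like_bootstrap_misuse(setuptools.__file__, sys.path[0], os.getcwd()) and raise an error pointing users at setup.py when it returns True.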

And now I realise I misread what @takluyver was suggesting, and we’re actually pretty much suggesting the same thing, just enforced on either the backend or the frontend side. At the spec level, I still don’t think we’d need to mandate anything, just explicitly give both backends and frontends permission to implement such a check:

Both build backends and build frontends MAY report an error if the build backend is not actually loaded from the specified bootstrap location.

Simpler idea to implement: what if the pep517 module (which pip uses) adds the specified path to sys.path to import the backend, but removes it again before calling the hooks? That way the setup.py would be run when sys.path is back to normal.

It would place some minor restrictions on what the self-hosting backend could do, because running the hook couldn’t import something else from the CWD. But relative imports within the package would still work, and of course the backend can manipulate sys.path itself if it needs to.
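Roughly, the idea could be sketched like this. The function name import_backend is made up, and this is not how the pep517 module is actually written; it only illustrates adding the location for the duration of the import and taking it off again before any hook runs.

```python
import importlib
import sys


def import_backend(name, location):
    """Import module `name` with `location` temporarily first on sys.path."""
    sys.path.insert(0, location)
    try:
        return importlib.import_module(name)
    finally:
        # Restore sys.path before any hook is called.
        try:
            sys.path.remove(location)
        except ValueError:
            # The backend already removed the entry itself while importing.
            pass
```

Anything the backend imports lazily inside a hook would no longer see the location, which is the minor restriction mentioned above.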

This whole thing is a totally random digression. I was hoping we could come to a general agreement about narrowly scoping and then discuss the implementation details, but I am now firmly in the camp of the original option 2 - zero options for the bootstrapper. pyproject.toml looks like this (requires is not required to be empty):

[build-system]
requires=[]
build-backend="<bootstrap>"

At which point the front-end will select __build_backend__.py (located in the source root) as the build backend. It MAY add the source root to sys.path but is not required to do so.

I see no reason to complicate things any more than this. It’s incredibly simple to implement this and to explain how it works.
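For the case where a frontend chooses not to touch sys.path at all, loading the fixed file by path is also short. This is only a sketch: load_bootstrap_backend is a made-up name, and a backend loaded this way can’t do absolute imports of other in-tree modules unless it adjusts sys.path itself.

```python
import importlib.util
import os


def load_bootstrap_backend(source_root):
    """Load <source_root>/__build_backend__.py without modifying sys.path."""
    path = os.path.join(source_root, "__build_backend__.py")
    spec = importlib.util.spec_from_file_location("__build_backend__", path)
    module = importlib.util.module_from_spec(spec)
    # Runs the file's top-level code; raises FileNotFoundError if it is missing.
    spec.loader.exec_module(module)
    return module
```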

I don’t understand why you’re concerned about folks potentially using bootstrap-backend-location to manipulate sys.path, but aren’t concerned about them doing the same thing by writing a __build_backend__.py that looks like:

from setuptools.build_meta import *

Again, doing it this way seems to make things more difficult for self-bootstrapping backend implementors, and make bootstrapping diverge further from regular source builds, for no readily apparent reason.

For one thing, in the spec I mentioned above, this is not even guaranteed to work. But also, I am not trying to actively prevent people from being able to add '' to their sys.path, so I don’t care about that; I just want it to happen in the backend, not in the frontend. Among the problems with build-backend-location='.' just adding the source root to sys.path is that it’s way too broadly scoped.

We should 100% be optimizing for people implementing front-ends, not self-bootstrapping backends. Pretty much the only person who ever has to implement this is setuptools. Everyone else is opting in. On the other hand, every single front-end must implement this functionality.

There is zero reason to make this any more general than it absolutely needs to be. The proposal of a fixed __build_backend__.py can be simply implemented like this:

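import importlib
import sys

if build_backend_name == "<bootstrap>":
    sys.path.insert(0, '')    # make the source root importable
    build_backend = importlib.import_module('__build_backend__')
    sys.path.pop(0)    # Not required by the spec
else:
    # import_module returns the submodule itself for dotted names
    build_backend = importlib.import_module(build_backend_name)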

No fuss, no muss.

Frankly, if the implementation gets too complicated, the simplest thing is to change PEP 517 to say that backends must be built from a wheel and be done with it. --no-binary :all: would just have to be changed so that it doesn’t apply to the PEP 517 build requirements.