PEP 517 Backend bootstrapping

pganssle · January 30, 2019, 5:55pm

To be clear, and I agree with Donald and Paul here, the point of it not having to be easy is just that it has to be possible. This is not a normal thing that you’ll want to do, and as long as we’re not saying “you have to use setuptools”, I think we’re honoring the spirit of not picking a single winner for the backend.

I think the reason it doesn’t have to be particularly easy is that this is something that only PEP 517 backends will or ever should use, and only PEP 517 backends that want to build themselves. Even if we just add CWD to the PYTHONPATH as part of the frontend, writing a build bootstrap is fiddly and annoying. From a practical perspective, PEP 517 build backends will probably use either setuptools or flit to build themselves, or even intreehooks if that’s the semantics they want.

cjerdonek · January 30, 2019, 8:21pm

Can something be done to prevent or discourage this aside from making it more difficult to do (even if just a little more difficult)? Like, is it possible to detect if it’s being done for something other than the intended case or ensure that it’s only being done for intended use cases?

pganssle · January 30, 2019, 9:12pm

At this point I think we have a general outline of what the options are, and I think they boil down to something close to the original list I had:

1. A way to specify that a single module and/or file that exists in the source tree should be used as the backend.
2. A way to specify that the root of the source tree should be added to the PYTHONPATH when searching for the backend.
3. A way to add arbitrary paths to the front end’s PYTHONPATH.

I think most people didn’t like #3, so I think we probably want to go with either 1 or 2, then we can start working out the details of what that will look like.

We never really finished our PyPA governance discussions some time ago, so I’m not sure exactly how we plan to decide on this? We can fall back to either Paul’s authority as BDFL-delegate for Python or Dustin’s authority as BDFRN, but maybe we can start with just trying for a rough consensus?

I’ll start by saying I’m in favor of approach 1. I think it’s the most tightly scoped and most of the objections to it have been to the specific syntax and the implementation details, which we can work out if it achieves a consensus. Anyone prefer 2 to 1?

Edit: Just to note, in this new list, I’ve collapsed my original list’s Options 1 and 2 into two flavors of the current list’s option 1, and the current list’s option 2 is the original Option 3, and the current list’s option 3 is @njs’s Option 4. Hopefully that clarifies rather than horribly muddles the terminology.

dstufft · January 30, 2019, 10:13pm

I’d say that it falls under Paul’s purview.

pf_moore · January 30, 2019, 10:34pm

I agree, these seem like a good summary of the 3 options.

I’m also in favour of #1. In addition to the tightest scoping, it’s (IMO) the easiest to describe in terms of being a specialised mechanism focused on building self-hosting backends.

I’m happy to take this on. I propose that we allow the discussion to continue for a few days, and I’ll sum up and make a decision (assuming we have a reasonable consensus) at the weekend. I don’t think there’s enough controversy or complexity to need longer on this, and it’s worth getting it sorted so that setuptools can progress.

(Once we have the basic agreement, we can bikeshed over exact pyproject.toml key names to our hearts’ content - but I’m inclined to leave the final decision on that to whoever steps up to write the update to PEP 517, which I’ll do if no-one else wants to.)

Any objections to that plan?

pf_moore · January 30, 2019, 10:47pm

I’m not interested myself in preventing or discouraging use of this feature at all. And I certainly don’t want to make it more difficult to use - writing backends (and particularly self hosting ones) is tricky enough already, and I don’t want it to be harder.

What I want to do is to design the feature in such a way that it’s completely obvious to people that are using it that it’s for writing self-hosting backends. Using a suitably named file or pyproject.toml key is sufficient for that IMO.

If someone is writing a “normal” project, not a backend, then I’d hope that configuring

self-bootstrapping-backend = True

(or whatever) would be a sufficient warning sign that they are doing something they shouldn’t be…

cjerdonek · January 30, 2019, 11:28pm

Just to clarify, what I meant was – can something other than making it more difficult or obscure be done to discourage its use, but only for the purposes it wasn’t meant for? That way it can be made as easy as possible to use for its intended purpose, without also making it easy to misuse. (It seems weird and counter to things to deliberately make something harder or more obscure to use.) The kind of thing I had in mind would be a warning message if used for a wrong use case. But this assumes we have a way to distinguish the cases where it’s being used for its intended purpose from those where it’s not.

njs · January 30, 2019, 11:56pm

I think you’re both seriously overcomplicating this :-).

My proposal would be to do exactly what this code does:

import os
import sys

def norm_and_check(source_tree, requested):
    normed_source_tree = os.path.abspath(source_tree)
    if os.path.isabs(requested):
        raise ValueError("paths must be relative")
    # abspath() collapses any "foo/.././bar"-style components before the comparison.
    normed_requested = os.path.abspath(os.path.join(normed_source_tree, requested))
    if os.path.commonpath([normed_source_tree, normed_requested]) != normed_source_tree:
        raise ValueError("paths must be inside source tree")
    return normed_requested

# SOURCE_TREE and REQUESTED stand for the values the frontend would supply.
sys.path[:0] = [norm_and_check(SOURCE_TREE, requested) for requested in REQUESTED]

So: no absolute paths (not really necessary given the next restriction, but I threw it in as a bonus), paths have to end up inside the source tree even after accounting for any foo/.././bar-type silliness, and we accept exactly the same things that sys.path accepts, so any questions about zipfiles or whatever should be directed at them. We don’t have to worry about subtle security issues, because the very next step after this is “run arbitrary user-specified code”. If there are any bizarro edge cases that we didn’t think about that make a path non-portable, then that’s fine – most projects are just going to use "." or "somepath/", and in the unlikely event that some project does find some weird corner case that makes their packaging break on some system, then it’s like any other bug in their packaging: one of their users will complain and they’ll fix it.

I agree it’s not utterly trivial, but I’m pretty sure this is the only proposal here that can be implemented in ~10 lines of code, documented in ~2 sentences, and where if you encounter it in a pyproject.toml the meaning will be instantly obvious without even looking at the docs. KISS.
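To make the accepted and rejected cases concrete, here is a small sketch that runs the check against a throwaway directory. The function body is the same logic as the snippet above, repeated only so the example is self-contained; the helper `is_accepted` and the sample paths are illustrative assumptions, not part of any proposal.

```python
import os
import tempfile


def norm_and_check(source_tree, requested):
    # Same logic as the snippet in the post above.
    normed_source_tree = os.path.abspath(source_tree)
    if os.path.isabs(requested):
        raise ValueError("paths must be relative")
    normed_requested = os.path.abspath(os.path.join(normed_source_tree, requested))
    if os.path.commonpath([normed_source_tree, normed_requested]) != normed_source_tree:
        raise ValueError("paths must be inside source tree")
    return normed_requested


def is_accepted(source_tree, requested):
    # Hypothetical helper: turn the ValueError into a boolean for the demo.
    try:
        norm_and_check(source_tree, requested)
    except ValueError:
        return False
    return True


source_tree = tempfile.mkdtemp()  # stand-in for a real checkout
samples = [".", "src", "src/../pkg", "..", "src/../../elsewhere", os.path.abspath(os.sep)]
results = {path: is_accepted(source_tree, path) for path in samples}

for path, accepted in results.items():
    print(f"{path!r}: {'accepted' if accepted else 'rejected'}")
```

The first three paths stay inside the tree and are accepted; the two that climb out of it and the absolute path are rejected.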

ncoghlan · January 31, 2019, 8:15am

Yeah, I forgot about the option of delegating the details of the technical specification to “What os.path does” rather than having to spell them out explicitly - that does indeed make things a lot simpler when it comes to allowing paths inside the repo as config options, since the hard part has already been done for us.

I’m still wary of encouraging folks to add multiple arbitrary subdirectories to the execution environment of their build backend though - that’s really the sort of thing the build backend should be handling, and that degree of flexibility isn’t needed just to solve the bootstrapping problem.

That said, I also think there’s virtue in the conceptual simplicity of adding an extra directory to sys.path. Yes, there are alternative ways to load a module that don’t require that, but they all come with caveats and quirks as to what kinds of imports will actually work from the loaded module.

As such, my next iteration on my previous proposal would be to blend it with the “arbitrary additions to sys.path” idea, and allow a single additional entry, using a name that makes the intended use of the feature clear:

[build-system]
requires = []
build-backend = "setuptools.build_meta"
backend-bootstrap-location = "."

(Note: the location part of the name was inspired by the importlib documentation (“The implementation of import”, Python 3.12.1 documentation), since we adopted the “filesystem location” phrasing to avoid the ambiguity between “filesystem path” and “import path”)

Such a setting would be enough to allow setuptools and other projects to bootstrap themselves, and would mean that flit wouldn’t need the intreehooks helper any more. It’s also compatible with backends that use a src directory layout:

[build-system]
requires = []
build-backend = "some_backend.build_hooks"
backend-bootstrap-location = "src"

If a particular backend wanted to disallow the use of backend-bootstrap-location for projects other than itself (by finding that entry on sys.path and removing it before running any project supplied code), then that would be entirely OK. It would also be fine for backends to warn about the erroneous usage when building from source (that would be a decision for the backend authors to make).
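As a rough sketch of the "find that entry on sys.path and remove it" idea: the function name `drop_bootstrap_entry` is hypothetical, and how a backend would learn the configured location (for example by reading pyproject.toml itself) is left open here.

```python
import os
import sys


def drop_bootstrap_entry(bootstrap_location, path=None):
    """Remove every entry of `path` (default: sys.path) that points at bootstrap_location."""
    path = sys.path if path is None else path
    target = os.path.normcase(os.path.abspath(bootstrap_location))
    # Compare normalised absolute paths so "proj/." and "proj" count as the same entry.
    path[:] = [
        entry for entry in path
        if os.path.normcase(os.path.abspath(entry)) != target
    ]
    return path


# Illustrative paths only; nothing needs to exist on disk for the comparison.
demo_path = [
    os.path.join(os.sep, "work", "project"),
    os.path.join(os.sep, "venv", "site-packages"),
]
remaining = drop_bootstrap_entry(
    os.path.join(os.sep, "work", "project", "."), list(demo_path)
)
print(remaining)
```

A backend taking this route would call it before running any project-supplied code, so that only its own import benefits from the extra entry.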

pf_moore · January 31, 2019, 10:40am

You may well be right - the rest of your post makes sense to me. I still worry that getting the wording “PEP-level nitpicking resistant” may be a bit more tricky than you imply, but that’s details.

It would also be possible for the frontend to warn (or even error) if backend-bootstrap-location was specified, but build-backend was loaded from anywhere other than that location. That would block use as any sort of general “add the project directory to sys.path for setup.py” feature.
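A frontend-side check along those lines might look like the following sketch. The function name and the example paths are assumptions for illustration; a real frontend would pass the imported backend module's `__file__` and decide for itself whether to warn or error.

```python
import os


def backend_is_inside(backend_file, bootstrap_location):
    """Return True if backend_file lives under bootstrap_location."""
    backend_file = os.path.realpath(backend_file)
    bootstrap_location = os.path.realpath(bootstrap_location)
    try:
        common = os.path.commonpath([backend_file, bootstrap_location])
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
    return common == bootstrap_location


# Hypothetical paths, for illustration only.
source_tree = os.path.join(os.sep, "work", "setuptools")
in_tree = os.path.join(source_tree, "setuptools", "build_meta.py")
installed = os.path.join(os.sep, "venv", "site-packages", "setuptools", "build_meta.py")

in_tree_ok = backend_is_inside(in_tree, source_tree)
installed_ok = backend_is_inside(installed, source_tree)
print(in_tree_ok, installed_ok)
```

With this rule, a project that sets the bootstrap location but ends up importing an installed copy of its backend would be flagged.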

But we’re now back round to the whole “option 3 is more general” problem again. If we choose option 3, it also solves the issue that the setuptools “legacy backend” problem is intended to solve - we don’t need that backend, nor do we need changes to pip, we just tell users who need the project build directory in sys.path for their setup.py to specify that in pyproject.toml. Restricting the facility on a point of principle and as a result having to add extra backends and workarounds seems silly.

But setuptools explicitly want to discourage that practice. So allowing it via pyproject.toml undermines their position. Personally, I’m inclined to agree with the setuptools guys, but I think that if we decide that option 3 is reasonable, we should embrace it, and provide it in its full generality. But that would (in my view) require @pganssle, and the other setuptools devs, to explicitly accept it. So it’s not really me (or even @ncoghlan) you need to persuade here, it’s them.

ncoghlan · January 31, 2019, 11:15am

Anything that requires changes to pyproject.toml doesn’t solve the problem that setuptools.build_meta_legacy solves, which is to provide pip (and other PEP 517 frontends) with a default backend that mimics traditional setup.py execution as closely as possible in order to allow PEP 517 builds for existing packages to work. pip’s marker for using that mode is going to be “pyproject.toml exists, but build-system.build-backend is not set”, but another PEP 517 frontend might be even more aggressive and use it for all setup.py projects, not just those that also have a pyproject.toml file.

So nothing needs to change in PEP 517 to solve the “I’m a project that expects the setup.py directory to be on sys.path” problem, since backends can already solve that problem on their own (and then frontends can choose a suitably conservative backend as their default).

The only problem that backends can’t solve without assistance from frontends (and hence changes to PEP 517) is being self-hosting. And since frontends have to trust the metadata that projects provide to say “I’m a self-hosting build backend!”, if a project wants to claim to be such a thing when they’re actually not, I’d put that in the same category as monkeypatching 3rd party libraries: it will work, and sometimes you’ll have a sufficiently compelling reason to actually go ahead and do it, but it really isn’t considered a desirable practice (vendoring dependencies would be another practice that falls into a similar category - it’s usually not a good idea, but in some situations it’s better than the other available options).

pf_moore · January 31, 2019, 11:57am

True. But the issue that triggered the question “should the setuptools backend be semantically identical to running setup.py” was about having the project build directory on sys.path. As far as I know, there’s no other motivating use case (yet…) for needing a legacy backend in setuptools. And if we’d had option 3 in PEP 517, we’d have likely just said to use that, and not discussed the semantics of the setuptools backend at all.

It’s a bit of a digression, though. But I still do want @pganssle to approve if we’re to accept option 3. My personal preference remains option 1, with a not-limited-to-self-hosting-backends option 3 as a possible alternative (but one that would likely need a longer incubation period before approval, to understand the non-backend implications).

ncoghlan · January 31, 2019, 2:19pm

The key point that I left out of the message with the backend-bootstrap-location proposal in it, is that I started out intending to have that message be in support of Option 1 and suggest a possible syntax for that approach, but actually trying to spec out the details convinced me that @njs has a valid point regarding the relative ease of explaining Option 3.

The starting point for that exploration was the location of setuptools.build_meta in the setuptools repo: setuptools/setuptools/build_meta.py at main · pypa/setuptools · GitHub

With my proposed Option 3 spelling above, it’s fairly easy to follow how the bootstrapping is supposed to work: the repo root directory (".") gets added to the front of sys.path, then the setuptools.build_meta backend gets imported the same way it would for any other package that declared it as its build backend. This is also how @takluyver’s intreehooks helper backend for bootstrapping flit already works, so we have prior art for it being a viable bootstrapping option.
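The flow described here can be sketched in a few lines. This is only an illustration under stated assumptions: `load_backend` is a hypothetical helper, the package `demo_selfhosted_backend` is a stand-in created in a temporary directory, and a real frontend would typically do this in a separate process rather than in its own interpreter.

```python
import importlib
import os
import sys
import tempfile


def load_backend(source_tree, build_backend, bootstrap_location=None):
    """Import the object named by build_backend, optionally after a sys.path prepend."""
    if bootstrap_location is not None:
        location = os.path.abspath(os.path.join(source_tree, bootstrap_location))
        sys.path.insert(0, location)
    importlib.invalidate_caches()
    # PEP 517 allows "module.path" or "module.path:object.path".
    module_name, _, object_path = build_backend.partition(":")
    backend = importlib.import_module(module_name)
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend


# Build a tiny stand-in source tree containing a self-hosted backend package.
source_tree = tempfile.mkdtemp()
package_dir = os.path.join(source_tree, "demo_selfhosted_backend")
os.makedirs(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(package_dir, "build_meta.py"), "w") as f:
    f.write(
        "def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):\n"
        "    return 'demo-0.0-py3-none-any.whl'\n"
    )

backend = load_backend(source_tree, "demo_selfhosted_backend.build_meta", ".")
print(backend.__name__, backend.build_wheel("dist"))
```

Because the backend is imported under its normal dotted name, its relative and absolute imports behave exactly as they would for an installed copy.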

We also know from the open pip 19.0.x PRs that it isn’t that hard to add a sys.path[0] injection feature to the pep517 support library (I wouldn’t do it the way I did in those PRs as a public API, but that’s just a matter of replacing an environment variable with a command line argument and a magic prefix with a normal function parameter)

But what happens if we try to specify the location of the build backend directly? How would it be spelled in pyproject.toml? What are frontends actually supposed to do with that information? How would folks introspecting project metadata determine which build backend is actually being bootstrapped? (that one’s not a functional requirement, it would just be nice to have)

Suppose we denoted a bootstrapped backend this way:

[build-system]
requires=[]
bootstrap-build-backend="./setuptools/build_meta.py"

Should a frontend run that file as a script? Should it run it with runpy.run_path()? Should it import it as a module, using the appropriate version dependent incantation to do so?

What are the implications for how build_meta.py is written? Will explicit relative imports work? Will setuptools.* absolute imports work? Will __name__ be "__main__", or "build_meta" or "setuptools.build_meta"?
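The `__name__` question has a concrete answer for each loading strategy, which the following sketch demonstrates with a throwaway package (`demo_name_pkg` is a made-up name; the file only records the `__name__` it sees).

```python
import importlib
import os
import runpy
import sys
import tempfile

# Create demo_name_pkg/build_meta.py in a temporary "source tree".
tree = tempfile.mkdtemp()
pkg_dir = os.path.join(tree, "demo_name_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
backend_file = os.path.join(pkg_dir, "build_meta.py")
with open(backend_file, "w") as f:
    f.write("SEEN_NAME = __name__\n")

# 1. Run like a script: __name__ is whatever run_name says.
as_script = runpy.run_path(backend_file, run_name="__main__")["SEEN_NAME"]

# 2. runpy.run_path() with its default run_name.
as_run_path = runpy.run_path(backend_file)["SEEN_NAME"]

# 3. Import as part of its package, with the tree root on sys.path.
sys.path.insert(0, tree)
importlib.invalidate_caches()
as_module = importlib.import_module("demo_name_pkg.build_meta").SEEN_NAME
sys.path.remove(tree)

print(as_script, as_run_path, as_module)
```

Only the third form gives the module its package-qualified name, which is what explicit relative imports inside the file rely on.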

There are (somewhat) reasonable answers available to all of those questions, but I didn’t think any of them were as elegant as the idea of just switching to a slightly more constrained version of Option 3 such that you specified where in the tree the backend implementation could be found, specified build-backend as normal, and then the backend itself executed in just the same way as it would for any other project.

The other thing I realised is that the “What if non-backends use the backend bootstrapping option to add an extra in-tree path to their hook execution environment?” concern with Option 3 applies just as much to Option 1, as either way you can make a custom in-tree backend that includes from my_real_build_backend import * as one of its lines and runs arbitrary code (including sys.path adjustments) before and after that, and that’s something that intreehooks already allows with PEP 517 today.

pganssle · January 31, 2019, 2:47pm

I think the main difference between @ncoghlan’s version of Option 3 and Option 1 is that Option 1 requires the bootstrap backend to be located in the tree. The thing I’m worried about is someone creating a pyproject.toml that looks like this:

[build-system]
requires=["setuptools", "wheel"]
build-backend="setuptools.build_meta"
bootstrap-build-backend="."

For anyone other than setuptools, this would just look like an option to modify the PWD semantics of setuptools.build_meta, and I suspect people will figure out that it works this way and start abusing it, knowingly or unknowingly.

I actually don’t care if people add their source tree to their sys.path during the build process, I just want them to have to be explicit about it. The main problem I have with the option 3 proposal is that this is an arcane option that almost no one needs to know about. If we make it something that accidentally “fixes” the problem of importing from the source tree, the vast majority of people who use it will be using it the wrong way (since almost no one needs to actually use this). The consequences of this are not so dire, but I think it’s entirely avoidable.

I am a lot less worried about people writing custom build backends that are thin wrappers around setuptools.build_meta because frankly that’s a lot more work than just adding sys.path.insert(0, '') to your setup.py file, which is the preferred way to allow in-tree imports in setup.py.

leorochael · January 31, 2019, 8:05pm

This!

I just came here to point out that option 2 (Adding either CWD or project root to PYTHONPATH) would force self-bootstrapping backends to always put their files in the project root.

That would be unfortunate, IMO.

cjerdonek · January 31, 2019, 9:04pm

What if we only permit paths to be added when the requires list is empty? That way it could only be used for the leaves of the dependency graph, which is exactly the bootstrapping case, IIUC. Or are there other use cases where it would be needed?
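A minimal sketch of that rule, operating on the already-parsed `[build-system]` table. The key name `backend-bootstrap-location` is borrowed from the proposal earlier in the thread and the function name is hypothetical.

```python
def bootstrap_location_allowed(build_system):
    """Allow the bootstrap key only when `requires` is empty."""
    if "backend-bootstrap-location" not in build_system:
        return True  # nothing to police
    return len(build_system.get("requires", [])) == 0


# A self-hosting backend at a leaf of the dependency graph.
leaf = {
    "requires": [],
    "build-backend": "setuptools.build_meta",
    "backend-bootstrap-location": ".",
}
# A project that has build dependencies and also sets the key.
non_leaf = {
    "requires": ["setuptools", "wheel"],
    "build-backend": "setuptools.build_meta",
    "backend-bootstrap-location": ".",
}

print(bootstrap_location_allowed(leaf), bootstrap_location_allowed(non_leaf))
```

Note that under this rule any backend with a non-empty `requires` list would be unable to use the mechanism, even when building itself.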

bernatgabor · January 31, 2019, 9:49pm

Is this a valid use case? What if we say that the backend must resolve to a wheel (or at least its transitive dependencies must all resolve to wheels)? When is it not appropriate for the build backend to use a wheel? Just thinking about whether this edge case is a problem worth solving, given that we may inadvertently open Pandora’s box.

Otherwise I’m inclined for @pganssle option 1.

njs · February 1, 2019, 4:42am

Yeah, OK, on consideration I think I like this version better too :-).

For spec language, how about: “This option is a string, which must specify a relative path inside the source tree. The build frontend will interpret this path relative to the source tree root, convert it into an absolute path, and prepend it to sys.path before it attempts to import the build backend.”

For this particular feature, I don’t think we need to like, include BNF to specify the exact path syntax or anything. If there is some theoretical edge case where two build frontends have slightly different definitions of “absolute path”, then… meh, who’s going to notice or care?

I’m having a hard time understanding what you’re worried about here. Can you elaborate? Like, I get the general principle that people do weird stuff and that makes setuptools hard to maintain. But in this specific case, I’m having trouble seeing how it would cause any issues. Yeah, people sometimes stick stuff into sys.path when running their setup.py scripts. Yeah, they could use this as a weirder way to accomplish that same thing. It would be weird. But it seems unlikely to be a popular option, the semantics are simple and unambiguous, setuptools doesn’t particularly have to care, and it honestly seems barely noticeable compared to all the other weird stuff people do in setup.py scripts. What’s the harm?

Actually I guess I can think of one case why a project might want to use this, while using setuptools, without actually being setuptools: if they have a vendored version of setuptools in their source tree that they want to use. That’d be pretty unusual, but it seems like a reasonable thing to support.

I’m really confused about your goal here, and the word “fixes”. Obviously, setup.py scripts will always run with the source tree root on sys.path, right? Changing a basic thing like that, in software like setuptools that’s in the so-legacy-that-every-bug-is-a-feature-now phase of its lifecycle, is obviously a non-starter… right? So there’s nothing to fix?

pf_moore · February 1, 2019, 9:10am

Um… See pip 19.0 fails to install packages that import to-be-installed package from CWD · Issue #6163 · pypa/pip · GitHub and the corresponding setuptools issue Add PWD to sys.path as part of the PEP 517 build · Issue #1642 · pypa/setuptools · GitHub

Basically, setuptools want to do precisely that in the transition to the PEP 517 backend. If you weren’t aware of that, the discussion stemming from those two issues will probably help clarify some of the comments here…

takluyver · February 1, 2019, 10:19am

Flit has build dependencies and would use this bootstrap mechanism once it was available. I could probably make it possible to bootstrap with no dependencies, but I don’t think it’s necessary. It’s only an issue if any of the dependencies use Flit for packaging (and even then, only if you need to install everything from source). At present, Flit’s dependencies all use setuptools. I deliberately made the zipfile36 backport without Flit packaging because I wanted to use it for Flit.

Of course, this does mean that bootstrapping Flit, if you completely rule out installing anything from a wheel, relies on bootstrapping setuptools first. This is OK as long as somewhere there’s a ‘root’ of the build-dependency graph which doesn’t need anything else to bootstrap it.
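The "root of the build-dependency graph" condition can be checked mechanically. The sketch below uses a deliberately simplified, hypothetical graph based on the relationships mentioned above (it is not Flit's real dependency list) and the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified graph: each project maps to what it needs at build time
# when nothing may be installed from a wheel.
build_deps = {
    "setuptools": set(),          # self-hosting: needs nothing else to bootstrap
    "zipfile36": {"setuptools"},  # packaged with setuptools
    "flit": {"zipfile36"},        # depends on packages built with setuptools
}

# Roots are the projects that can be built with no other build dependency.
roots = sorted(name for name, deps in build_deps.items() if not deps)

# static_order() yields dependencies before dependents and raises
# graphlib.CycleError if the graph has no valid bootstrap order.
bootstrap_order = list(TopologicalSorter(build_deps).static_order())

print(roots, bootstrap_order)
```

If a dependency of Flit were itself packaged with Flit, the graph would contain a cycle and `static_order()` would fail, which is the situation the self-bootstrapping mechanism is meant to break.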