I’ve been doing a lot of thinking about this, and talking to a few folks who have a lot of experience with this type of problem in a variety of build systems with 10–20+ years of history behind them, trying to figure out how they’ve solved this problem in the general case, and where our solutions meaningfully differed and why.
Historically we’ve had setuptools acting as the only build backend, and due to the nature of how setuptools worked, any build code beyond what setuptools itself provided had to be developed inside of setup.py, with minimal ability to use third-party libraries to make reusable code. This led to a ton of custom one-off code that was poorly tested and poorly factored, and that ended up being cargo cult’d around from project to project, often slowly modified as time went on, so there were tons of copies of this code that were all slightly different, like mutating strains of a virus spreading throughout the packages in the Python ecosystem.
We developed PEP 517 with those experiences in mind, and thus we designed it to counter them. We knew one-off code had been a huge problem for us, so we purposely made it difficult to ever add this kind of one-off code to a package again. Obviously we didn’t make it impossible, or intreehooks wouldn’t exist, but we purposely made it as difficult as we reasonably could.
In thinking about all of this, part of me feels like maybe we (myself included) swung our mental pendulums too hard in the opposite direction and went to the other extreme, which isn’t inherently better, it’s just different. There are legitimate use cases for one-off code that our current system isn’t handling nearly as well as it could be.
For instance, one thing that this one-off code allows is novel composition of existing build systems that maybe only makes sense for a single package, or small wrappers over existing build systems that provide some level of customization that didn’t make sense in the original build backend but does for this one particular package. Now obviously one answer to that is that they can spin this code out into a custom library and depend on it. However, I think that encouraging people to make single-use libraries on PyPI is making the experience worse for users, not better. Obviously general-purpose backends absolutely should be distributed independently on PyPI, but I feel like things that are obviously tied to one specific package should not live there. We should encourage producing build backends as libraries as the default case, but provide tooling to support the in-tree case as well.
Obviously we didn’t completely block support for this today, because the intreehooks package exists, but I think that with some fairly simple additions to the spec we can support it natively rather than requiring a sort of meta backend.
Now of course, one of the use cases for a generic in-tree build backend facility would be to provide a way for backends that wish to build themselves to do so, without requiring a fiddly meta-bootstrapping process on the end user’s side of the equation. They would simply use whatever mechanism we implemented, and provide an implementation of themselves through it that is good enough to produce a wheel and nothing more.
In thinking through all of this, and talking it over with others, I think I’ve convinced myself that we should:
- Add a `python-path` key to the `build-system` table in `pyproject.toml`, with a restriction that the path must be relative and must resolve to a location inside the directory that contains the `pyproject.toml`.
- Update PEP 517 to state that build backends that are not packaged using another build backend SHOULD utilize the `python-path` key and ensure that they do not introduce a cyclic build graph (e.g. foo can’t build-depend on bar, which then build-depends on foo).
  - A key thing here is that we don’t prescribe how this should be done. I can think of two possible solutions:
    - Provide a bare-bones self-building hook with zero dependencies and add the path to that using the `python-path` key (recommended).
    - Bundle their build dependencies as part of their sdist, use the `python-path` key to add all of those build dependencies, and point their `build-backend` key to the “normal” import that they would otherwise use.
  - It might make more sense to make this a MUST; the SHOULD means that if a project is truly against putting in any effort to support automatically building from “zero”, they can just omit support for it completely and still comply with this PEP. However, the case of just vendoring your dependencies is fairly simple to implement, and it makes the experience a lot more consistent for end users, with fewer footguns and less chance of users randomly hitting either the frontend or the backend with reports saying “hey, this didn’t work”.
  - We should probably mention that frontends SHOULD reject cyclic dependency graphs when building from source.
- Update pip so that `--no-binary :all:` does not mean “disable PEP 517 and never use a wheel at all”, but rather means “don’t use any existing binaries, and produce a new wheel file locally to install from”.
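To make the first point concrete, a backend shipping an in-tree bootstrap hook might declare something like the following in its `pyproject.toml`. Note this is a sketch: the exact shape of the `python-path` value (single string vs. list of paths) isn’t pinned down above, and the `bootstrap_backend` module name and `_bootstrap` directory are purely illustrative:

```toml
[build-system]
# Proposed key: paths must be relative and must resolve inside the
# directory containing pyproject.toml. A list is assumed here.
python-path = ["_bootstrap"]
# Empty on purpose: the whole point is building from "zero".
requires = []
build-backend = "bootstrap_backend"
```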
I think that this solves not only the bootstrapping problem, but also brings our stance back to a more moderate position that makes it easier for projects that genuinely need one-off build code to have it, while still generally encouraging people to package truly generic build code as libraries on PyPI. Since the “one-off build code” solution even takes the same shape programmatically as the “library from PyPI” case, it provides a much simpler path for someone to start off with a one-off build backend and then, if it ends up growing into something more generally useful, extract it out into its own library and distribute it with minimal changes.
This also allows backends to weigh the trade-offs of how they might satisfy these constraints. If a project wants to expend minimal effort and does not believe that building from source is particularly important, it can simply dump all of its build dependencies into a `_vendor` directory in its sdist, at the cost of a larger sdist (but then, hopefully most people are installing from wheels anyways). If a project is willing to put in more effort, it can implement a minimal build system (either by making its actual build system able to operate in a minimal-dependency mode, or by adding a small one-off one).
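As a rough sketch of what a “small one-off” zero-dependency hook could look like, here is a stdlib-only module that implements just the `build_wheel` hook (enough to produce a wheel and nothing more, per the bootstrapping discussion above). The package name, version, and embedded module body are illustrative assumptions; a real hook would read the project’s actual source tree and metadata:

```python
# bootstrap_backend.py -- hypothetical bare-bones in-tree backend sketch.
import base64
import hashlib
import os
import zipfile

PKG = "mypkg"      # illustrative; a real hook would use the project's name
VERSION = "1.0"
# A real backend would collect the package's source files; a literal
# module body keeps this sketch self-contained.
MODULE_SOURCE = b"# placeholder package\n"


def _record_line(arcname, data):
    # RECORD rows are "path,sha256=<unpadded urlsafe-b64 digest>,<size>".
    digest = base64.urlsafe_b64encode(
        hashlib.sha256(data).digest()).rstrip(b"=").decode("ascii")
    return "{},sha256={},{}".format(arcname, digest, len(data))


def build_wheel(wheel_directory, config_settings=None,
                metadata_directory=None):
    """The one PEP 517 hook this sketch bothers to implement."""
    dist_info = "{}-{}.dist-info".format(PKG, VERSION)
    files = {
        "{}/__init__.py".format(PKG): MODULE_SOURCE,
        dist_info + "/METADATA": (
            "Metadata-Version: 2.1\nName: {}\nVersion: {}\n"
            .format(PKG, VERSION).encode("utf-8")),
        dist_info + "/WHEEL": (
            b"Wheel-Version: 1.0\nGenerator: bootstrap\n"
            b"Root-Is-Purelib: true\nTag: py3-none-any\n"),
    }
    # RECORD lists every file with its hash; RECORD itself gets no hash.
    record = [_record_line(p, d) for p, d in sorted(files.items())]
    record.append(dist_info + "/RECORD,,")
    files[dist_info + "/RECORD"] = ("\n".join(record) + "\n").encode("utf-8")

    wheel_name = "{}-{}-py3-none-any.whl".format(PKG, VERSION)
    with zipfile.ZipFile(os.path.join(wheel_directory, wheel_name), "w",
                         zipfile.ZIP_DEFLATED) as zf:
        for arcname, data in files.items():
            zf.writestr(arcname, data)
    # PEP 517 requires returning the basename of the wheel produced.
    return wheel_name
```

Everything here is stdlib, so the hook has no build dependencies of its own, which is precisely what makes it usable as the bottom of the bootstrap chain.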
Of course, if a build backend wants to side step this by just depending on another build backend to package themselves, that is completely reasonable as well.
I realize that this is a bit of a reversal from my earlier position, but it comes from talking to people with long-standing build systems, who pretty much universally said that sometimes you need some custom one-off logic specific to a package, and from realizing how crummy it would be if every project like that needed a custom build hook on PyPI. Once I accepted that idea, I realized we had two use cases where this feature would be useful, which led me to reverse my position and aim for the more general solution.