Two extensions/clarifications of PEP 517?

Background

I have two related extensions/clarifications I’m interested in making to PEP 517, although I am fully aware that there is much history here. Instead of putting a PEP forth (even for discussion) I’m interested in looking for possible support and general feedback. I really want this to be a collaboration instead of a flogging :smile:. If all seems well, I can author the PEP.

Both of these extensions/clarifications stem from PEP 517 acknowledging backend wrappers in the “In-tree Backends” section. (Unfortunately after this mention, they aren’t mentioned again, and the reader/implementor is left to guess to what degree they are supported.)

The specific use-case I’m hoping to unlock is an out-of-tree build backend wrapper. Such a wrapper is incredibly useful for a (usually corporate) monorepo of Python packages. Obvious use-cases include adding the “Private” classifier to avoid accidental upload, or enforcing dependency conventions/requirements. [1] The “out-of-tree” part here is mostly about allowing “updirs” in backend-path (however, once you allow that, you really are allowing any out-of-tree path). [Reminder that PEP 517 explicitly calls out that this is a useful situation, albeit for a single project]

Such a use-case could be supported by uploading our backend wrapper to an internal cheeseshop (or worse, PyPI). However, that’s a very sub-optimal solution and goes greatly against the spirit of monorepos and their tooling. [Another reminder about PEP 517 admitting this is useful :stuck_out_tongue: ]

Lastly, note I don’t think any part of this discussion should involve making it easier for build wrappers to exist (like this semi-related proposal).


1. Clarify/encode the responsibilities when it comes to custom wrappers

(This is kind of an extension of this discussion: Nobody is following the metadata_directory promise in PEP 517)

FWIW such wrappers exist in the wild as packages (out-of-tree) such as setuptools-ext · PyPI. Additionally at least one build backend (hatch) has “build hooks” which are related but not the same.

There’s some misunderstanding/disagreement of the role of “final” build backends when it comes to build wrappers (usually around wrappers altering metadata).

2. Clarify/encode the support of out-of-tree “backends”

(Related to the original in-tree backends discussion PEP 517 Backend bootstrapping)

It seems the consensus in the original discussion gravitated toward in-tree support, due to simplicity and a lens on bootstrapping backends. (Although I could be wrong, it’s a very long discussion.) AFAICT support for out-of-tree backends was never explicitly proposed, and therefore never discussed.


Open questions

1. Is there a way we can encode support for backend wrappers in a backwards-compatible way?

My gut says yes. It will likely involve wheel editing in certain cases, but that’s not that painful (example code). This likely looks something like clarifying the following quotes with the responsibilities of the parties in a world where wrappers exist:

If the build frontend has previously called prepare_metadata_for_build_wheel and depends on the wheel resulting from this call to have metadata matching this earlier call, then it should provide the path to the created .dist-info directory as the metadata_directory argument. If this argument is provided, then build_wheel MUST produce a wheel with identical metadata.

The hook MAY also create other files inside this directory, and a build frontend MUST preserve, but otherwise ignore, such files

If a build frontend needs this information and the method is not defined, it should call build_wheel and look at the resulting metadata directly.

I think this loosely looks like:

  • If build wrappers want to provide prepare_metadata_* hooks, they MUST respect that the wrapped backend might not support them. In that case, they follow the frontend route: create a wheel, modify it, and return the relevant adjusted metadata.
  • If a build backend receives a metadata_directory in build_wheel, it MUST respect it by using it verbatim in the resulting wheel (or error if doing so would be fundamentally incompatible with the backend). However, for backwards compatibility build wrappers SHOULD handle the case where backends don’t exhibit this behavior, and either error or re-adjust the wheel. (Depending on how nice they wish to be to their users, or how mean to their wrapped backends, if you consider who might receive the backlash of the error.)

2. Is there a technical reason not to support arbitrary backend-paths?

I totally agree with the decision to keep support limited and use the lens of build-backend bootstrapping. But without that lens (and fully acknowledging that there are potential valid use-cases that involve out-of-tree backends/wrappers) is there a technical reason to disallow out-of-tree paths?

To be clear, you could easily accomplish this in a spec-compliant way (AFAICT), while going squarely against the spirit of the PEP, with an “out-of-tree” build backend that just adjusts sys.path from a config and forwards the call.


  1. You could enforce such things without a backend wrapper, however that would require the enforcer to actually be run, and that’s not always the case when people are developing/iterating ↩︎

1 Like

Thanks for specifying this up front. It clarifies the ask significantly, and without it, the rest of your post comes across like a massive misinterpretation of the original proposals.

There’s no discussion of wrappers because from the POV of PEP 517, a “wrapper” is just a backend. That backend may delegate some or all of its work to another library that is also a PEP 517 backend, but it is deliberate delegation, rather than a feature. If the “wrapper” doesn’t look like a regular backend, it’ll cause an error.

Really the only thing the PEP could say on wrappers is “it is against copyright to use someone else’s backend as part of yours, so you’d better write it all from scratch” which is a silly thing to say :upside_down_face: Usually what people want here is a plugin for a specific backend, which is a backend feature, not a PEP 517 feature (and as I said, your scenario is not “usually”).

There’s also nothing about out-of-tree backends because they’re all “out of tree” - they come from PyPI (or another index). “In tree” only really occurred because we noticed that the mechanism (an importable name) combined with the default working directory would allow a backend to be importable from sources. After a bit of discussion, we decided there’d be no harm done, and it may even be useful, so we didn’t do anything to forbid it. But it’s not a feature, and so the question of “why doesn’t this other even-less-related thing exist” doesn’t make much sense.


Now, what we really want here is a way to inject processing before, during, and after wheel building, for a specific (set of) build(s), intended for use by the person building rather than for public redistribution (i.e. by the main project), regardless of which backend is going to be used by the project.

The obvious place to do something like this is in the build frontend, which is already chosen and configured by you, the builder, while the backend is chosen and configured by the project. So what I think you’re really looking for is additional extension points to a frontend, probably build · PyPI, or perhaps a custom tool based on it.

Have you considered these options? Would they not work for your scenario? Overriding package’s own build instructions (i.e. which backend to use) is certainly against the spirit of the PEP, but modifying the result via the frontend is fine. Just don’t expect upstream publishers to switch to your frontend for their public releases.

2 Likes

I’m assuming custom extension modules are involved based on who you are :wink: but if not you can use Hatchling for doing dynamic stuff during builds such as file modifications or metadata updates.

I plan to add built-in extension module building sometime in the fall or winter.

As @steve.dower mentioned, from a PEP 517 point of view, you’re talking about an out of tree backend (which happens to use another backend as a library), not something that’s explicitly a wrapper.

But the key issue here is that the PEP 517 model revolves around the idea that you can build a sdist, and then build a wheel from that sdist. You can also build a wheel direct from a source tree, but the two operations are presumed to be equivalent. The problem with an “out of tree” backend is that it won’t be present in the sdist, so it’s not possible to build from the sdist. So if you want your proposal to be viable you’ll need to explain how building a sdist would work. It’s not relevant whether you build from sdist as part of your monorepo workflow - doing so is required to work by the broader packaging workflow, and any acceptable packaging standard must address that.

More generally, if you want to see this proposal accepted, you need to ensure it works outside of the monorepo context. The security implications of a sdist (or source tree) that can pull arbitrary code from elsewhere in the user’s system need to be reviewed. The fact that this prohibits build frontends from copying the source tree into a temporary directory (something that pip actually did, until fairly recently) needs to be made explicit. And so on…

2 Likes

…Wait, it doesn’t do that any more? But isn’t that needed for build isolation?

It became the default in pip 21.3 - “In-tree builds are now the default”. Isolation is about the python environment (what packages are installed) not about protecting the build against left over stuff in the source tree (which is the backend’s responsibility).

Copying is still a valid strategy, but it’s slow (especially if there’s extra stuff in the source tree, like a development venv). Pip stopped doing it because of performance, not because it’s incorrect.

2 Likes

That isn’t what I want. I very much do want to customize the metadata packaged with my project, regardless of frontend.


Only in the sense that I want to support wrapping maturin (or any build backend, but especially maturin)


Ok, that’s not something I quite gathered from the PEP or from skimming the discussion. Can you help me find where in the PEP that’s laid out? The closest thing I could find was:

Some backends may have extra requirements for creating sdists, such as version control tools. However, some frontends may prefer to make intermediate sdists when producing wheels, to ensure consistency. If the backend cannot produce an sdist because a dependency is missing, or for another well understood reason, it should raise an exception of a specific type which it makes available as UnsupportedOperation on the backend object. If the frontend gets this exception while building an sdist as an intermediate for a wheel, it should fall back to building a wheel directly. The backend does not need to define this exception type if it would never raise it.

Which makes me think build_sdist is more of an “optional” hook, where you can just unconditionally raise UnsupportedOperation if you want. To be fair, the most common workflows, pip install and pip install -e (as well as the uv flavors), all seem to be OK with my backend not defining build_sdist at all.
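That reading of the quoted passage would make a minimal sketch look something like this (the stub build_wheel body and the returned filename are purely illustrative):

```python
# A backend that declares sdists unsupported. Per PEP 517, a frontend
# building an sdist as an intermediate step catches this exception (by
# type, via the UnsupportedOperation attribute on the backend object)
# and falls back to calling build_wheel directly.
class UnsupportedOperation(Exception):
    """Raised when this backend cannot produce an sdist."""


def build_sdist(sdist_directory, config_settings=None):
    raise UnsupportedOperation("this backend cannot build sdists")


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # ... build the wheel straight from the source tree ...
    return "demo-1.0-py3-none-any.whl"  # stub result for illustration
```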

So I guess question #3 is now: “is build_sdist actually required?” I could see how, if it’s defined, unpacking the result and calling build_wheel on it should produce similar results. But if it isn’t defined…?


Isn’t the next operation performed using the arbitrary code from the backend to perform the build? There’s nothing stopping build backends today from reading from arbitrary paths, so I don’t think this introduces any additional security concerns.

My apologies, it’s so uncommon that I’d forgotten that build_sdist is optional. But your proposal doesn’t require backends located outside of the source tree to forgo building sdists (and such a requirement would be unenforceable anyway), so my point remains.

The point is that when the build happens, the path may no longer even point to the backend code it’s supposed to. Again, please remember that there is nothing to stop people using the proposed feature outside of the monorepo context.

In that case, you want to build your own backend. I did.

The separation between “I am the project developer” and “I am the person building it” is pretty clear. If you are the former, choose a backend that does what you want. If you are the latter, choose a frontend that does what you want.

Either way, most of the questions/proposals in the original post don’t make much sense, which suggests the right answer is we need to be explaining the PEP 517 model better.

I think the main thing that might be worth clarifying is that PEP 517 is based on a model where the “source tree” is the highest level construct that we consider, and is self contained. And yes, this means that the model does not cater for monorepos.

We can qualify that - monorepos mostly seem to work OK, after all, if you just think of them as a bunch of source trees that are managed together - but IMO everyone would be served better by making that key fact explicit. The important thing is that monorepo support isn’t something that “just needs a couple of tweaks to work”, it’s a redesign of the underlying model.

2 Likes

You can also have a developer frontend for your backend. When using meson as a build system you can use meson-python as the build backend and spin as a developer frontend.

The three parts are:

  1. meson is the build system that does the actual building like running Cython, C compilers etc and works like configure/make.
  2. meson-python is the PEP 517 build backend that is invoked by PEP 517 frontends (pip et al) to build wheels or do editable installs.
  3. spin is the developer frontend so for development you do spin test, spin docs etc.

Here meson-python and spin are both just wrappers around the meson build where one provides the interface needed by PEP 517 frontends and the other provides a convenient interface for development. Both meson-python and spin just translate all commands into meson commands and are specific to the fact that meson is being used. In particular spin mostly does not use meson-python and instead runs meson commands directly because PEP 517 does not provide all of the things that are needed for routine development.

What you probably want in a monorepo scenario is something like spin that is designed for monorepos, knows the monorepo structure, and can work with all of the build systems used for each subproject. Hypothetically this could be a tool called monospin and then monospin build could ensure that the build backend from the monorepo is used when building a wheel etc.

It is possible that extensions of PEP 517 would be useful to facilitate making frontend tools like spin or monospin that are not tied to a particular build system in the way that spin is tied to meson and meson-python.

They might not make much sense to you, but stating they simply “don’t make much sense” is quite dismissive (and the others on the thread seem to be understanding it alright). If you would like to ask for clarification I’d be happy to oblige, but otherwise I’m left guessing which parts don’t make sense to you.

I do understand the model. In my OP I laid out that I want to wrap any existing backend that my projects use and augment the metadata. Whether it’s called a “backend” or a “backend wrapper” I’m on the “project developer” side of things. I do agree though that the spec is ambiguous on how backends should behave in a world with such “backend wrappers” (This post by @pf_moore does a decent job of summarizing how the model is ambiguous).

That makes sense, and probably shores up my own mental model w.r.t. PEP 517 (although admittedly disappointing for this use-case, I’d prefer explicit consistency over implicit assumptions).


As a corollary, because I do think the monorepo workflow is underserved, where would you stand on a PEP that codifies a bit of monorepo workflow-ing for frontends, akin to Cargo/rye’s workspaces?

One place I don’t think the spec is ambiguous, though, is that “in tree backends” are very clearly just like any other backend, but located in the source tree of the project. The fact that an “in tree backend” might call another backend as a library (i.e. act as a “backend wrapper” in whatever sense you care to use that term) doesn’t alter that fact.

And this is where I think your proposal doesn’t make sense in the context of PEP 517. I’ll give a couple of examples to try to make it clearer to you where the disconnect lies.

You’re reading way too much into what the PEP actually says here, which is

Project-specific backends, typically consisting of a custom wrapper around a standard backend

Note that the PEP describes these as Project-specific backends and only mentions wrapping as an implementation detail. And it explicitly says “project-specific”. While you could claim that I am reading too much into that term, I’d argue that the important point here is that it’s linked to the one project (i.e., the one pyproject.toml) and not shared.

Again, you’re using “wrapper”, which is nothing more than an implementation detail. So people like me read that statement as referring to “an out-of-tree build backend”. And that’s just a normal backend, served from a package index or wheelhouse, and installed by the build frontend when the build environment is set up.

You say that hosting your backend on an internal index is “a very sub-optimal solution and goes greatly against the spirit of monorepos and their tooling”. I can’t debate that with you as you haven’t explained the issue, you simply state it as self-evident. Maybe it is to you, but to people like myself who only have a vague understanding of what a monorepo is, much less what the “spirit” of the idea is or what specific tooling designs are required in order to be a “proper” monorepo, it’s simply an unjustified opinion.

This leads onto a much larger topic, that’s not really what we should be discussing here, but which is in a very real sense a prerequisite for this discussion, which is what exactly a monorepo is, how it works, and what requirements it places on tools. And leading on from that, how do existing packaging tools need to change in order to fit that design and those requirements.

So the way things look from my perspective is that we have two mechanisms at the moment:

  1. Write a normal backend (that maybe wraps a standard one) and host it on a local index.
  2. Write an in-tree backend, and copy it into every project that needs it (maybe via git submodules, or something similar, if you want to avoid manual copying).

You’ve dismissed both options as “not what you want” but rather than explain the constraints you’re working under so that we can see why those two options aren’t appropriate, you simply propose changing an existing mechanism. And the change you propose seems risky and problematic under the model that we are familiar with - which for better or worse is the model that PEP 517 was designed to support. I’m trying very hard not to say “our model is right and yours is wrong” - but given the relative popularity of the monorepo model vs the project directory model, I do think the PEP 517 model addresses the majority need here. Maybe there’s something we can do which addresses both models, but it won’t be by means of incremental changes based on incompletely understood constraints…

3 Likes

I have no opinion because as I said in the other message I just sent, I don’t know what constraints a monorepo imposes. But please don’t try to explain here, as I think “supporting monorepos” is a big enough topic to warrant its own thread / proposal.

Agreed. I was halfway through typing up a reply to the first quote when I remembered the second one. My reply was basically going to suggest pretty much that, so let me post it below anyway (perhaps spelling it out helps):


The constraint at pep-0517/#in-tree-build-backends that an in-tree backend needs to live inside the source tree is strong, but not that strong for a monorepo, I think. Perhaps something like this would satisfy the current need?

```toml
# in pyproject.toml
[build-system]
requires = []
build-backend = 'monobuild'
backend-path = ['.']
```

```python
# in monobuild.py:
# extend sys.path here first, if needed, so that `out_of_tree_location`
# (elsewhere in the monorepo) is importable
from out_of_tree_location import build_wheel
```

Something like that should adhere to the letter of what PEP 517 says, while only needing one .py file with a few lines of code to put into any Python package in the monorepo. That’s not all that much worse than backend-path = ['../out/of/tree/location/'].

Since it seems fairly straightforward to work around the issue with the approach above, maybe it’s not required to update PEP 517? On the other hand, I also don’t see a major problem with making such an update. E.g., amend it to say that packages published to PyPI must not contain an out-of-tree path in backend-path, but that it is allowed locally to support the monorepo use case better.

Something like that would still meet what I think is the intended goal as stated in the PEP (“The first restriction is to ensure that source trees remain self-contained”) for published source trees (e.g. an unpacked sdist). I think I agree that a PEP doesn’t really have much business prescribing how private VCS repos are to be arranged - a monorepo seems to not have been thought about here, and seems like something that ought to be supported.

1 Like

This is generally true of all packaging PEPs. They are for interoperability, not for correctness.

Which means you can do whatever you need to in order to make things work, just don’t expect others outside of your scope to play nice with it.

“We’re all consenting adults here” is the principle at work: because you can do it, you actually may do it. Putting rules/definitions/directions in a PEP takes away that principle and forces everyone else to do something for you.

1 Like

Even better, write a tiny in-tree build backend which copies the rest of itself (I think the OP called it the enforcer) into the build directory on every invocation, then uses that. This makes sdists self-contained, and minimises duplication between projects in the monorepo.

Specifically:

  • in build_sdist: you would always copy your wrapper then invoke its hook
  • in build_wheel*: you would copy your wrapper if missing then invoke its hook, then delete the copy

I don’t see a significant benefit, and enforcing that condition, while it’s not that difficult, seems like it would be annoyingly fiddly (as the index would have to unpack the sdist, find the pyproject.toml, and parse it - for every sdist upload, just in case).

I’m not going to fight over it, but given that there seem to be plenty of acceptable[1] approaches, making such a change doesn’t seem like the best use of people’s time.


  1. to me, at least - see my earlier comments about not knowing what constraints monorepos impose ↩︎

1 Like