Discuss PEP 662: Editable installs via virtual wheels

layday · June 8, 2021, 12:28pm

What is there to tell the frontend what to do? If the frontend must use ill-defined rules and heuristics that it will accumulate with time to handle the output of the backend then I don’t think that is a good solution for editable installation.

bernatgabor · June 8, 2021, 12:55pm

I’d argue with the ill-defined part, but accepting that, I need to say you always have ill-defined rules. The only question is where they live: can some part of them be on the frontend side, or do you mandate everything to be on the backend side? editable itself is a set of ill-defined rules. There’s no clear right and wrong thing to do there, and a lot of it is best effort, where @pf_moore is the BDFL on saying what’s acceptable or not (there have already been instances of such decisions, where Paul decided that a method/request is out of scope for the project, which as a maintainer he’s completely within his rights to do). editables is not a perfect solution; it has a set of limitations (e.g. namespace support, resource file support, extensibility, etc.) that another backend might not want to follow.

There’s no universally good solution here, just a set of trade-offs where some users might value some traits more than others, and might be willing to prioritize X over Y. This is where this PEP comes into the picture. It does not mandate an implementation but allows both the frontend and the backend to take ownership of the “how it works” part. PEP-660 only allows the backend to own that part.

PEP-660 moves the set of ill-defined rules onto the backend (currently a set of such lives in editables, but I’d imagine we’d have more of those as more backends adopt it and their use cases start clashing with what editables considers OK and in scope). This PEP still allows the backend to continue holding all the cards but gives the frontend a chance to take charge of some of these rules, for example using symlinks instead of always forcing the copy of files. I think this is why @pganssle was saying that the virtual wheel approach offers everything PEP-660 does and more. Backends that want PEP-660 can continue using PEP-660. Backends that don’t want/care to be in charge of generating import hooks can instead just return the files to the frontend and allow the frontend to do the magic.

takluyver · June 8, 2021, 1:01pm

I tend to agree with @layday that this kind of flexibility is not especially desirable. I think any spec should clearly lay out what the frontend and the backend are each expected to do, and minimise the overlap which could be done by either - in my rough sketch, the backend says what should be importable, and the frontend makes that happen.

Maybe we leave some wriggle room for clever tricks to achieve other ends, but I don’t think saying “this bit can be done by either the frontend or the backend” makes for a good spec.

bernatgabor · June 8, 2021, 1:11pm

And that’s what this PEP proposes, nothing more. However, note that the backend can say “hey frontend, make these files importable”, and those files need not contain the actual business logic; they can contain just some import hook logic that then manifests those files from the source tree at import time. There’s no way to require that the backend returns actual business logic source files: first because the frontend has no way to check it, and second because there’s no reason not to allow that flexibility for advanced use cases, e.g. when you’re merging source files during build or if you want dynamic logic to execute at import time (to filter some modules out of the source tree, or to compile your C extensions on demand if they are out of date, etc.).
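To make the import hook case concrete, here is a minimal sketch of a finder that resolves module names to files in the source tree at import time. The class name `SourceTreeFinder` and its `locations` argument are hypothetical; this illustrates the general technique with the standard `importlib` machinery and is not the code that editables or any backend actually ships.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class SourceTreeFinder(importlib.abc.MetaPathFinder):
    """Resolve selected module names to files that live in a source tree.

    `locations` (hypothetical) maps a top-level module name to the path
    of its .py file, or to a package's __init__.py.
    """

    def __init__(self, locations):
        self.locations = dict(locations)

    def find_spec(self, fullname, path=None, target=None):
        location = self.locations.get(fullname)
        if location is None:
            return None  # not ours: let the remaining finders try
        # For a path ending in __init__.py the resulting spec is a package
        # spec, so submodules are then found by the regular path finder.
        return importlib.util.spec_from_file_location(fullname, location)


def register(locations):
    """Install the finder ahead of the default ones and return it."""
    finder = SourceTreeFinder(locations)
    sys.meta_path.insert(0, finder)
    return finder
```

In a wheel that takes this route, the only files the frontend would copy are the hook module and something that calls `register(...)` at interpreter startup; the business logic itself stays in the source tree.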

pf_moore · June 8, 2021, 1:24pm

OK, I’ll give my PEP-delegate ruling here, since I was mentioned.

In my view, @takluyver is correct - a successful PEP will need to lay out clearly what the responsibilities of the front and back end are.

Currently, I believe PEP 660 is reasonably OK on this - the front end must install the provided wheel, and the backend must ensure that the wheel it provides, when installed, implements the “editable” semantics. Exactly what “editable” means is not well defined, and I’ve been pushing for PEP authors to take the opportunity to clarify this, but so far the community consensus seems to be “we don’t want to pin this down but we’ll know it when we see it”. To the extent that PEP 660 makes implementing editable semantics entirely the responsibility of the backend, I can live (reluctantly) with that.

The “virtual wheel” proposal, though, is falling short at the moment. It’s not at all clear what backends are expected to put into the list of files they provide - consumers have no way of knowing whether they are getting raw data about the project files, or pointers to code that implements an editable implementation. You can’t have it both ways - either the front end or the back end needs to be responsible for implementing the “editable install”, it’s not reasonable for the back end to supply the implementation and expect the front end to know not to use their own implementation but do a dumb copy. Conversely, it’s not at all clear what front ends have to do - are they expected to expose just importable modules, or headers and supporting files as well? Letting the backend maybe provide these files just confuses the issue - is the front end allowed to ignore them if it doesn’t implement editables in a way that supports them? And claiming that “editable” is vaguely defined doesn’t cut it in this case, as the “virtual wheel” proposal seems to share the responsibility for implementing editable semantics between the backend (which lists the files involved) and the frontend (which implements the mechanism). So there needs to be a shared definition of what editable means, and the PEP has to offer that.

OK. I’m out of time to respond further. But to be clear, this is my view as PEP delegate on what standard of clarity a successful PEP must include - it’s not directly a commentary on either of the proposals. I’ve supplemented it with my view on where the two proposals currently are as far as reaching that standard is concerned, but that’s intended just as guidance. Don’t feel that you have to respond to the points I’ve made - whatever is in the PEP that ultimately gets submitted for approval is all that matters here.

dholth · June 8, 2021, 1:29pm

I understand that the virtual wheel proposal expects a set of paths that you could pass to a wheel builder, so that the builder could create the archive. So 100% of the .py files etc.

a/
a/__init__.py
a/b
a/b/__init__.py
a/b/extension.so
example.dist-info/*

Then the front-end would be expected to:

For each directory in the mapping, create a real directory in purelib (site-packages).

For each file in the mapping, create a symlink to the source file placed in the real purelib directory.
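A minimal sketch of this simple-strict case, assuming the backend hands over a mapping from wheel-relative paths to absolute source paths. The function name `install_strict` and the shape of `mapping` are hypothetical and not taken from the PEP text.

```python
import os
from pathlib import Path
from typing import List, Mapping


def install_strict(mapping: Mapping[str, str], purelib: Path) -> List[Path]:
    """Create real directories and per-file symlinks under `purelib`.

    `mapping` (hypothetical shape): wheel-relative file path -> absolute
    path of the corresponding file in the source tree.
    """
    created = []
    for target_rel, source in sorted(mapping.items()):
        target = purelib / target_rel
        # Directories are real, so they are owned by the installation.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Only the listed files are linked back to the source tree.
        os.symlink(source, target)
        created.append(target)
    return created
```

Because only the listed files are linked, a file created in the source tree afterwards does not appear in the target directory until the project is reinstalled.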

If you’re using a build system where the mapping would be easy to produce (so not setuptools or enscons), then this simple-strict case seems easy enough.

But then you create a new file and wonder why it’s not importable. Or maybe you wonder why it’s importable when you run python from your src/ directory and not importable when you run python from elsewhere.

Either you are happy because you avoided a bug during distribution, or you look for a less strict option.

Less strict option A:

Installer finds the topmost target directory and symlinks it to the topmost source directory. Ignore all contained files.

What do we do when some of the files contained in the destination are not contained in the topmost source directory?

…

layday · June 8, 2021, 1:41pm

Well, several points.

First, a slight misinterpretation - what I am concerned with, assuming that the backend is allowed to bake editable-ness into its output, is not what editables or any other editable mechanism gets up to but how it communicates that to the frontend so that the frontend knows what to do with its output.

Second, a PEP like this one can in fact define what the rules that govern editable installation are. We could produce a spec that allows either the backend or the frontend to produce editable output, provided the means through which that is accomplished are part of the spec.

However, and third, I think we agree that, as a general rule, separation of concerns is a good thing. If we are not able to delineate the backend’s versus the frontend’s responsibilities, to me, that means that the problem space is not well understood and our aim should be to understand it better.

takluyver · June 8, 2021, 2:02pm

I’d allow & encourage the backend to specify a source directory (a package, i.e. the directory that contains __init__.py if it’s a normal package), rather than having the installer identify it from a list of files. Then the frontend just symlinks that.
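As a rough sketch of that directory-level variant, assuming the backend names a single package directory and the frontend links it as a whole. The function name `expose_package_dir` is hypothetical.

```python
import os
from pathlib import Path


def expose_package_dir(package_dir: Path, purelib: Path) -> Path:
    """Symlink an entire package directory into `purelib`.

    `package_dir` is the directory the backend declares as the package,
    for a normal package the one that contains __init__.py.
    """
    target = purelib / package_dir.name
    # target_is_directory only matters on Windows; it is ignored elsewhere.
    os.symlink(package_dir, target, target_is_directory=True)
    return target
```

With a directory-level link, files added to the package later are visible through the link without a reinstall, which is the behaviour `setup.py develop` users are used to.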

bernatgabor · June 8, 2021, 2:03pm

They’re responsible for exposing what the backend feeds to them. This includes Python modules, headers, and supporting files within paths. Note though that the importable modules are not necessarily the business logic; they can also just be the import hook stuff that editables provides.

I think you can. Generally, the frontend will expose as editable what it gets. Note, however, that because the backend can feed the frontend not business logic content but some import hook magic instead, the backend may take over. Note that in this case the frontend is still in charge of installing the import hook in editable mode, but the business logic is otherwise provided by the backend.

This is the point. Primarily the frontend is in charge, but the backend may take over by virtue of being a man in the middle and being able to generate import hooks. Note though only the frontend can create files in the target interpreter, the backend cannot (but can work around this via import hooks).

Not always purelib but also possibly platlib, include, data.

This is where the backend would be expected to document clearly the expectations for the user. Note, however, that if enscons/setuptools wants to achieve today’s status quo (and doesn’t want to get fancier), it can also just return /a/project -> "", which would essentially mean inserting the project root at the root of the target interpreter (either via a pth or symlink mechanism). After that, same as today, new files added would be automatically discovered. Changes impacting the dist-info folder would require a reinstall, same as today. The frontend can also go a third route and generate an import hook that does the editables magic to dynamically discover new files (this likely would be the solution that works all the time, but is more complicated to implement). Importantly, the frontend can expose multiple editable modes and not make the choice itself, but allow the user to choose. Think of it as: you, the developer, can do pip install -e . --mode=symlink|editables|pth, where e.g. pth would be the default.
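For the pth route, a minimal sketch of what a frontend could do when the backend returns the project root mapped to "". The function name `write_pth` and the file naming scheme are hypothetical; the only real mechanism relied on is that the standard `site` module adds each existing path listed in a .pth file to `sys.path`.

```python
from pathlib import Path


def write_pth(project_root: Path, purelib: Path, name: str) -> Path:
    """Write a .pth file that puts `project_root` on sys.path.

    The "<name>-editable.pth" file name is made up for this sketch.
    """
    pth_file = purelib / f"{name}-editable.pth"
    # One absolute path per line; `site` appends each existing path
    # to sys.path when it processes the directory at startup.
    pth_file.write_text(f"{project_root.resolve()}\n", encoding="utf-8")
    return pth_file
```

This reproduces today’s behaviour: everything under the project root becomes importable, including files added later, and anything that changes the dist-info content still needs a reinstall.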

Well, it depends. Note that moving the responsibility onto the backend alone means banning the use of symlinks… which are often a good and cheap solution. So doing this separation of concerns is not cheap.

dholth · June 8, 2021, 2:04pm

Does that work for namespace packages without __init__.py?

Yes @bernatgabor, I know it would include all wheel categories, but I’m trying to figure out if you understand how to implement it for purelib.

takluyver · June 8, 2021, 2:19pm

I think so: the backend would just switch to listing the contents of the namespace package - which may include subdirectories - instead of the package directory. By listing a directory, you’re essentially claiming that that directory is solely part of this distribution, and won’t be shared with any others.

It gets a bit complicated where you have a namespace package inside a regular package, e.g. if you want plugins to go into a foo.plugins namespace, but the parent foo package is just your own code. You would have to list the contents of foo rather than just foo itself in that case.
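One way to picture that listing rule, as a sketch only: the backend lists a directory as a single entry when it owns it entirely, and descends into it when it is, or contains, a namespace shared with other distributions. The function name `entries_to_list` and the `shared` parameter are hypothetical; how a backend knows which directories are shared is left open here.

```python
from pathlib import Path
from typing import List, Set


def entries_to_list(path: Path, shared: Set[Path]) -> List[Path]:
    """Return the entries a backend would list for `path`.

    `shared` (hypothetical) holds directories that other distributions
    may also install into, such as a foo/plugins namespace.
    """
    if path.is_file():
        return [path]
    touches_shared = any(path == s or path in s.parents for s in shared)
    if not touches_shared:
        # Solely ours: claim the whole directory with one entry.
        return [path]
    entries: List[Path] = []
    for child in sorted(path.iterdir()):
        entries.extend(entries_to_list(child, shared))
    return entries
```

For a regular package `foo` with a shared `foo/plugins` namespace, this yields the individual contents of `foo` and of `foo/plugins` rather than `foo` itself.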

pganssle · June 8, 2021, 2:25pm

I agree with @takluyver on this, and it’s basically what I’ve been proposing the whole time.

As I’ve mentioned in a few places, the main thing I disagree with in PEP 660 is the way it leaves it up to the backend to effectively do the installation. Whether the information communicated from the backend to the front-end is via a wheel file or a data structure that mimics a wheel is immaterial to me.

EpicWink · June 8, 2021, 2:28pm

Taking a step back, the purpose of a standard is to allow parties to make assumptions. If the backend makes one assumption about what an editable installation is (eg new files are not automatically importable), but the frontend makes another assumption (eg new files should automatically be importable), then forget about implementation, you have a conflict.

The three (that I can think of) solutions are:

1. allow front-ends to define an “editable” installation (eg what I interpret the virtual wheel idea to be)
2. allow back-ends to define an “editable” installation (eg PEP 660)
3. explicitly define an “editable” installation in a PEP

Things being ill-defined across boundaries, introspection, open-ended edge cases, does not a standard make

pganssle · June 8, 2021, 2:34pm

bernatgabor:

PEP-660 moves the set of ill-defined rules onto the backend (currently a set of such lives in editables, but I’d imagine we’d have more of those as more backends adopt it and their use cases start clashing with what editables considers OK and in scope). This PEP still allows the backend to continue holding all the cards but gives the frontend a chance to take charge of some of these rules, for example using symlinks instead of always forcing the copy of files. I think this is why @pganssle was saying that the virtual wheel approach offers everything PEP-660 does and more. Backends that want PEP-660 can continue using PEP-660. Backends that don’t want/care to be in charge of generating import hooks can instead just return the files to the frontend and allow the frontend to do the magic.

To be clear, in the other thread I was arguing that in the “virtual wheel” approach, the backend has the same amount of information it has in the PEP 660 approach, meaning that any problems that can be solved by the backend in PEP 660 can also be solved by the backend in a virtual wheel approach. I also will say that I think it’s a bad idea for backends to do these sorts of hacks for the same reason I didn’t like PEP 660 in the first place — it’s bad separation of concerns.

I would say that we should put some non-binding language into the PEP strongly discouraging the use of anything other than strict listings of what needs to be installed (and in fact strongly discouraging the use of “expose this directory” over a listing of the files included in the directory). In my mind, it’s OK to allow backends to put in weird hacks if they have some problem that really can only be solved by the backend, but that they should not be the norm and front-ends shouldn’t be expected to support them — if you use a weird hack it’s on you.

dholth · June 8, 2021, 2:44pm

@pganssle, who tells me he has muted me so perhaps he won’t see this reply, has been clear that he would like to change the expectations of setup.py develop users so that ‘automatically pick up new files in a module’ does not happen. Maybe Paul has made broken releases from his own editable distributions in the past because they were not strict enough? But we know that anyone who already likes setup.py develop would hate this feature.

Again this suggests to me that we need a different name for a new ‘preflight / same as the real install in 95% of cases instead of in only 90% of cases’ feature.

If the virtual wheel proposal knows exactly the collection of symlinks etc. that should be installed to effect an editable implementation, that would be better than simply asserting that the installer could “figure it out” given the mapping of source files to internal-to-wheel paths, and that the status quo that many people like is intolerable.

We don’t want setup.py to actually do the installation because we want pip to guarantee that uninstalls will work, and also because copying files is harder than it looks. But it’s 100% fine for the build system to determine which files are installed in the normal case.

pf_moore · June 8, 2021, 3:07pm

That sounds like you’re arguing that “editable” is in essence strictly defined as “exposes a fixed set of files, modifying the content of any of those files will be immediately visible to the Python environment, but adding new files or deleting files requires a reinstall, and directory structures are not exposed except by inference from the exposed filenames (so you cannot expose an empty directory, such as a placeholder namespace package)”

That’s fine, I am genuinely pleased that someone is attempting to address the issue of actually defining what an “editable install” means.

However, I have significant concerns that this is incompatible with the current de facto meaning of “editable installs”, as delivered by the setuptools setup.py develop. I think that this definition needs to be supplemented with a backward compatibility clarification and a migration plan, explaining how people relying on the existing semantics of setup.py develop should migrate to the new mechanism.

We’ve done this before, moving from dependency_links to PEP 508 direct URLs. That was a protracted and difficult, but ultimately successful, transition. I don’t see any reason why we couldn’t have a similar transition here, but IMO it’s the responsibility of the PEP authors to discuss the transition process if they are proposing that sort of definition. And it’s certainly not a foregone conclusion that such a transition is justified in this instance.

In addition, it’s not clear to me whether you’re suggesting that editable installs can include files that aren’t importable Python modules, like headers, data files, etc. Clarifying that would be important as well, because AFAIK no-one has any valid implementation strategy for exposing (for example) headers, apart from symlinks which are not universally available.

bernatgabor · June 8, 2021, 3:09pm

Yes. This is the primary goal of this PEP; however, note my next point below.

Because the backend is the one providing the information, there’s no way to not allow this. Even if the PEP explicitly disallows this, the frontend has no way to enforce or detect it. A backend is a man in the middle. It can always circumvent the frontend if it so wishes. I’d say as long as it does not create new files at runtime on the frontend side, we’re OK to allow it to do whatever it wishes.

I think we disagree enough on what editable is that this would never pass and would be way too restrictive.

So my goal here is to standardize 1 via this PEP, but allow 2 as an escape hatch for the backend.

pganssle · June 8, 2021, 3:21pm

This is not exactly what I’m saying, particularly not the “exposes a fixed set of files…” part.

I’m saying that it’s not up to the back-end to do the exposing, it’s up to the backend to tell the front-end what the package looks like. I originally wanted it to be exposing the fixed set of files, but very, very early on it became extremely clear that some people care a lot about the ability to get new files picked up without re-installing, so I’ve always expected this to be a possible installation mode.

My thinking on the matter is that an editable installation is one where at least the subset of files specified in the mapping are exposed to the interpreter for installation. Front-ends are free to expose more than that, including by simply adding each directory containing a file to expose to the path in the appropriate way (or symlinking in directories, or whatever).

It may be that there’s a more optimal way to communicate between backend and front-end to make the “install and pick up new files” experience smoother (e.g. allowing backends to specify an optional “directory structure hint”, possibly with inclusion and exclusion globs), but honestly in the short term I suspect that in the majority of cases a front-end whose implementation is to generate a .pth file or symlink or import hook that adds the directory common to all files as importable on the path is going to be good enough for all or nearly all of the current use cases for editable installs.
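A sketch of how a front-end might derive the directory to put on the path from a file mapping, under the assumption that each source path ends with its wheel-relative path, so that stripping the latter leaves the importable root. The function name `infer_roots` and the mapping shape are hypothetical, and POSIX-style paths are assumed for brevity.

```python
from pathlib import PurePosixPath
from typing import Mapping, Set


def infer_roots(mapping: Mapping[str, str]) -> Set[str]:
    """Return the source directories that must be importable.

    `mapping` (hypothetical shape): wheel-relative path -> absolute
    source path, both written with forward slashes.
    """
    roots = set()
    for target_rel, source in mapping.items():
        rel_parts = PurePosixPath(target_rel).parts
        src_parts = PurePosixPath(source).parts
        if not rel_parts or src_parts[-len(rel_parts):] != rel_parts:
            # The layout in the source tree differs from the layout in the
            # wheel, so a plain path entry cannot reproduce it.
            raise ValueError(f"{source!r} does not end with {target_rel!r}")
        roots.add(str(PurePosixPath(*src_parts[: -len(rel_parts)])))
    return roots
```

When every entry yields the same root, a single .pth line or symlink is enough; a `ValueError` marks the cases where the front-end would need a per-file strategy instead.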

pganssle · June 8, 2021, 3:26pm

Yeah, I think what @EpicWink and @layday are saying is that it doesn’t matter if the front-end can enforce it. In general these PEPs are about allowing the front-end and the back-end to make assumptions about what the other is doing. If the PEP says, “Backend you can’t generate a .pth file and try to do your own thing to generate an editable install” then the front-end is free to assume that the backend isn’t doing that, and any backend that does do that knows that they are out of spec.

I agree with this general sentiment, but I also think that occasionally backends will be in a position to address certain edge cases with some sort of hack of this nature, so I think it’s best to acknowledge that it’s possible and recommend avoiding such techniques unless there’s no other option, and assume that front-ends will support it on a best-effort kind of basis.

bernatgabor · June 8, 2021, 3:35pm

I don’t know of any other way, sadly. I think it’s better to propose some way to support it than not to support it at all, so I’d still go with that recommendation. If the frontend can’t satisfy the backend’s request, it should fail the installation. Furthermore, considering symlinks can be enabled on Windows, they might be considered universally available. A frontend can in this case fail with an error saying that editable mode requires working symlinks and pointing to instructions on how to enable them on Windows. This essentially makes the project require symlinks for editable mode if it provides include/data files, which I think is fine, and many project developers would be fine with it.
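A minimal sketch of that fail-early behaviour, assuming the frontend probes for symlink support in the target directory before installing. The function names and the error wording are hypothetical.

```python
import os
import tempfile


def symlinks_supported(directory: str) -> bool:
    """Return True if a symlink can be created inside `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        source = os.path.join(scratch, "source")
        link = os.path.join(scratch, "link")
        open(source, "w").close()
        try:
            os.symlink(source, link)
        except (OSError, NotImplementedError):
            # Typical on Windows when the user lacks the privilege
            # to create symlinks.
            return False
        return True


def require_symlinks(directory: str) -> None:
    """Fail the installation early if symlinks are unavailable."""
    if not symlinks_supported(directory):
        raise RuntimeError(
            "editable mode for this project requires working symlinks "
            f"in {directory}"
        )
```

A frontend would only need to call `require_symlinks` when the backend’s output includes include or data files that it cannot expose any other way.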