Standardising editable mode installs (runtime layout - not hooks!)

pf_moore · May 5, 2020, 9:38am

Once again, the discussion on editable mode seems to have started going in circles. We now have at least the following threads on the subject:

I’ve reached the point where I’m convinced that the problem is not about designing hooks to communicate between backends and frontends, but rather the more fundamental question of what “develop mode” or “editable mode” actually is. So I’m going to reboot things yet again (yes, I know, sorry!) to address that question first.

What is “editable mode” right now?

The first point, and the most important one IMO, is that users are not asking for a new feature here. What they want is for PEP 517 to support the same functionality that they had in the “legacy” mode.

In terms of implementation, that consists of a .pth file to expose the user’s source directory to sys.path, plus some extra machinery to expose metadata and to put built C extensions somewhere that is importable.
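As a rough illustration of that mechanism (a sketch, not pip's or setuptools' actual code), the snippet below writes a one-line .pth file into a temporary stand-in for site-packages and asks the standard library's site.addsitedir() to process it. This is the same .pth handling that real site-packages gets at interpreter startup. The foo-0.0.1.pth file name is made up for the example.

```python
import os
import site
import sys
import tempfile

# Temporary stand-ins for site-packages and for the user's source tree.
site_packages = tempfile.mkdtemp(prefix="site-packages-")
source_dir = tempfile.mkdtemp(prefix="src-")

# The core of a legacy editable install: a .pth file whose line is a path.
# (Hypothetical file name; site only cares about the .pth suffix.)
pth_file = os.path.join(site_packages, "foo-0.0.1.pth")
with open(pth_file, "w", encoding="utf-8") as f:
    f.write(source_dir + "\n")

# addsitedir() appends the directory itself, then processes each .pth file
# in it, appending every listed path that exists on disk to sys.path.
site.addsitedir(site_packages)

print(source_dir in sys.path)  # True
```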

What is wrong with that approach?

Honestly, it seems like not much. Users seem to like it. However, in the process of debating the implementation, the following issue was pointed out:

Specification of editable installation

Hmm… I did not realize this was the case, but it actually seems to be an implementation detail that leads to a pretty serious bug, because it seems that it is basically just adding the foo/ directory to the python path, which means that it is not correctly excluding packages, so if I have a src like this:
src
└── foo
    ├── __init__.py
    └── bar
        └── __init__.py
And my setup looks like this:
from setuptools import find_packages, setup

setup(
    name="foo",
    version="0.0.1",
    package_dir={"": "src"},
    packages=find_packages(where="src", exclude=["foo.bar"]),
)
Then when I do "pip install .", running python -c "import foo.bar" correctly throws an ImportError, but after "pip install -e ." I am able to import foo.bar successfully.
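The behaviour described in the quote can be reproduced without pip at all, assuming (as the quote says) that the legacy editable install boils down to putting src/ on the path. This sketch builds the example layout in a temporary directory and adds src to sys.path directly, standing in for the .pth line. foo.bar imports fine even though exclude=["foo.bar"] would have kept it out of a regular install.

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout: src/foo/__init__.py and src/foo/bar/__init__.py
src = os.path.join(tempfile.mkdtemp(), "src")
os.makedirs(os.path.join(src, "foo", "bar"))
for pkg in ("foo", os.path.join("foo", "bar")):
    open(os.path.join(src, pkg, "__init__.py"), "w").close()

# Stand-in for the legacy editable .pth line. (A .pth appends to sys.path;
# inserting at the front here just keeps any other "foo" from shadowing it.)
sys.path.insert(0, src)

# The exclude list only affects what gets copied into a wheel, so the plain
# directory-based import machinery still finds foo/bar/.
bar = importlib.import_module("foo.bar")
print(bar.__name__)  # foo.bar
```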

That came from here - Discourse doesn’t seem to be linking it properly.

It appears from what @pganssle said here that this wasn’t something that was reported by users, but was spotted as part of the discussion. So while he’s describing it as “a pretty serious bug”, it doesn’t appear to be bothering users particularly. It’s also specific to the setuptools exclude= feature - I don’t know if other backends have anything similar. And conversely the current behaviour means that people can add files to an editable install without needing a reinstall, which seems like something that people probably do.

What would a standardised runtime layout for editable mode look like?

Two basic approaches have been suggested:

A .pth file based approach, essentially like the current one.
Using symlinks to reference the user’s development tree from site-packages.

In both cases, we seem to be clear that the .egg-link mechanism for exposing metadata isn’t needed, and we should just put metadata in site-packages, the same as we do for non-editable installs.

The .pth based approach needs a bit of fleshing out (naming convention, project-version.pth seems reasonable) and someone needs to clarify how C extensions get exposed (they could be installed to site-packages as they need a rebuild/reinstall, or they could be exposed via the .pth file, I guess). I don’t have enough expertise with C extension development to comment, TBH. It also needs to discuss how it handles (or doesn’t handle) the “excluded files” issue above.

The symlink approach needs to describe what happens when a system doesn’t support symlinks (Windows 10 without developer mode enabled, or earlier Windows systems, and Unix on some filesystems). It also needs to discuss changes in semantics, like the fact that you need to reinstall if you add or remove files.
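One way to handle systems without symlink support is to probe for it before choosing a strategy. The following is a minimal sketch that assumes a .pth-based fallback; symlinks_supported is a hypothetical helper. Probing with a real link, rather than checking the OS version, covers both the Windows privilege case and Unix filesystems that refuse symlinks.

```python
import os
import tempfile

def symlinks_supported(directory: str) -> bool:
    """Return True if a symlink can actually be created inside `directory`.

    Hypothetical helper: it creates and removes a real link, because both
    process privileges (Windows) and the filesystem (e.g. FAT) matter.
    """
    target = os.path.join(directory, "symlink-probe-target")
    link = os.path.join(directory, "symlink-probe-link")
    open(target, "w").close()
    try:
        os.symlink(target, link)
    except (OSError, NotImplementedError):
        return False
    else:
        os.remove(link)  # removes the link, not the target
        return True
    finally:
        os.remove(target)  # always clean up the probe target

with tempfile.TemporaryDirectory() as site_packages:
    # Assumed policy: prefer symlinks when possible, else fall back to .pth.
    strategy = "symlinks" if symlinks_supported(site_packages) else "pth"
    print(strategy)
```

Note that this only answers "can I create a link here, now". It does not address the semantic difference that adding or removing source files requires a reinstall.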

There’s no reason a standard couldn’t allow both mechanisms (or even additional ones). But because of the semantic differences, users would need a way to determine which approach is in use for a project (this could simply be a metadata value, though - we don’t need to over-engineer it).

How does this help?

In my opinion, getting agreement on how we want editable installs to be implemented at runtime is the only way we’ll move forward with implementing editable mode support under PEP 517. What we’re seeing at the moment is a classic issue of implementation-defined behaviour, where the mechanism for editable installs isn’t standardised, and so trying to standardise things on top of it becomes impossible.

Having a standard runtime layout would stop people stressing about who implements the functionality¹, as it ensures that whoever implements it will do so in the standard manner.

Why not just finish standardising the hooks?

Because we’re stalled (again). And the discussions keep focusing on implementation, not on interface. But hooks are interface, so we get no conclusions by discussing implementation.

And even if we did standardise the hooks, we’d still have to sort out implementation. And leaving that as a front end implementation choice just ignores the issue - pip may be the only front end right now, but we want to implement a standardised approach. On the other hand, if people want it to be a front end choice, then we can handle that by just having everyone else say here that they are OK with the pip devs writing the standard!²

¹ Except for the more basic reasons like separation of concerns and maintainability.
² But don’t then scream if we choose an implementation with semantics that you don’t like

pf_moore · May 5, 2020, 9:50am

While the above summary is intended to be an objective statement of the problem, I’d also like to add my perspective as a frontend developer (and whatever other hats I happen to wear).

Personally, I think that the .pth file approach is perfectly fine. It matches the semantics that users know, and it’s relatively simple to implement. I’d strongly advocate for pip to use this as the default implementation method. I have a certain level of concern that a symlink-based approach could cause more support problems than we expect, particularly on Windows, and so I’d prefer pip not to use it, but I’d be OK with it being an option, I guess, if the other pip developers wanted to. I’d be strongly against any proposal that didn’t allow pip to implement a .pth approach.

I don’t believe that the “excluded files” issue is a major problem. It’s clearly a case where editable mode behaves differently than “normal” installs, but I’m completely fine with documenting that and telling people not to rely on editable-mode tests to validate production behaviour. Frankly, setuptools has a lot of ways for users to build stuff that does bizarre things (implementing custom command classes lets you do anything) and I don’t think there’s a hope of standards supporting all of that. Setuptools install lets you put files in /etc (and people did!). Wheels don’t allow that, and yet the world didn’t collapse when we switched to using wheels…

bernatgabor · May 5, 2020, 10:40am

I’d like to second @pf_moore and also advocate for the .pth choice. It’s the more backwards compatible path.

I view develop installs as a mechanism that maps the source tree files directly onto the Python path, rather than copying them into a wheel and then installing that.

Mapping an entire folder feels unpopular, as the backend then has no say in what is and isn’t available for import. That being said, there are mechanisms the backend could use to escape this, which could solve the bug pointed out by @pganssle. Off the top of my head, to exclude files the backend could:

install an import hook that disables discovering excluded packages,
use symlinks to create a proxy tree, that does not link in the excluded folder/files.
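Here is a minimal sketch of the first option, assuming the backend already knows the dotted names it wants hidden. It installs a finder at the front of sys.meta_path that raises ModuleNotFoundError for excluded names. Returning None instead would simply fall through to the regular path-based finder, which would find the directory anyway. ExcludingFinder is a made-up name; importlib.abc.MetaPathFinder and the find_spec() protocol are standard library. The demo adds src to sys.path directly in place of a .pth line.

```python
import importlib.abc
import os
import sys
import tempfile

class ExcludingFinder(importlib.abc.MetaPathFinder):
    """Hide excluded packages (and their submodules) from the import system."""

    def __init__(self, excluded):
        self.excluded = set(excluded)

    def find_spec(self, fullname, path=None, target=None):
        for name in self.excluded:
            if fullname == name or fullname.startswith(name + "."):
                # Raising (not returning None) stops the search here, so the
                # regular PathFinder never gets to see foo/bar/ on disk.
                raise ModuleNotFoundError(f"No module named {fullname!r}", name=fullname)
        return None  # not excluded: let the normal finders handle it

# Demo layout: src/foo/__init__.py and src/foo/bar/__init__.py
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "foo", "bar"))
for pkg in ("foo", os.path.join("foo", "bar")):
    open(os.path.join(src, pkg, "__init__.py"), "w").close()

sys.path.insert(0, src)  # what the .pth line would do
sys.meta_path.insert(0, ExcludingFinder({"foo.bar"}))

import foo  # still loaded straight from src, so edits need no reinstall

try:
    import foo.bar
except ModuleNotFoundError as exc:
    print("blocked:", exc.name)  # blocked: foo.bar
```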

So my proposal would be that a PEP 517 develop interface would:

return an absolute path to be injected onto the sys.path at interpreter startup (this is what setup.py develop does at the moment),
return a single Python file, a bootstrap script that will be run at interpreter startup and will allow the backend to change the interpreter setup enough to make any advanced use case work (handling excluded paths, merging files on import, etc).

This way backends (e.g. flit) that do not provide advanced features can just use the first part, while backends supporting more esoteric features can take advantage of the bootstrap script to make their use case work. Thoughts?
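The following is a hedged sketch of how a frontend might lay out both parts of that proposal in site-packages: the bootstrap module as a file, and a .pth file carrying the path line plus an import line. The site module executes .pth lines that begin with "import" followed by a space or tab, which is how the bootstrap would run at startup. The module name _editable_foo_bootstrap and the marker attribute are hypothetical, and site.addsitedir() stands in for interpreter startup.

```python
import os
import site
import sys
import tempfile

site_packages = tempfile.mkdtemp(prefix="site-packages-")
source_dir = tempfile.mkdtemp(prefix="src-")

# Part 2: a backend-supplied bootstrap script (hypothetical contents).
bootstrap = """
import sys
# A real backend would install its import hooks here; we leave a marker.
sys._editable_foo_bootstrapped = True
"""
with open(os.path.join(site_packages, "_editable_foo_bootstrap.py"), "w", encoding="utf-8") as f:
    f.write(bootstrap)

# Part 1: the path to expose, plus an "import" line that site will exec.
with open(os.path.join(site_packages, "foo-0.0.1.pth"), "w", encoding="utf-8") as f:
    f.write(source_dir + "\n")
    f.write("import _editable_foo_bootstrap\n")

# addsitedir() puts site_packages on sys.path first, so the import resolves.
site.addsitedir(site_packages)

print(source_dir in sys.path, getattr(sys, "_editable_foo_bootstrapped", False))
# True True
```

Backends without advanced needs would write only the path line; the frontend never needs to know what the bootstrap does.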

pganssle · May 5, 2020, 11:57am

I do not know if anyone has complained about this, but it would not be surprising if they hadn’t because this is something you’d only start to notice if you were using a src/ layout. Historically, most people have been using a “source-in-root” layout and work in the repo root, which puts the entire source tree on the path anyway.

For people in that situation, I believe the most common use case for editable installs is to expose the entry points (in which case new entry points won’t show up anyway).

The issue I’m describing is actually one of the main motivating factors behind the move to the src/ layout, which makes it much easier to avoid pitfalls that work from the repo root but won’t work in a real install. This is not a theoretical problem, I’ve found multiple real-life issues in deployed software with this (including in dateutil and pytype).

It’s also not unusual to see editable installs being used even in testing environments, and people frequently use -e when installing git repositories (though I’m not quite sure why this is), so there are many common use cases where this is likely to cause issues in production software when there’s a mismatch between the two install modes.

To be clear, it’s not specific to exclude, it’s specific to anything where there isn’t a 1-1 mapping between folders ↔ packages. setuptools.find_packages has include and exclude, but also you can specify an arbitrary list of files instead of using find_packages at all. Even without this, other backends may not package up the .pyc files that live in the local directory (though this is likely to be less of a concern, practically speaking).

If the front-ends aren’t given the required information (i.e. “this is all the stuff that needs to be installed”), then it’s impossible for a front-end to realistically solve this problem, which also means that it will not be possible for end users to decide on the behavior they want.

I think you’re right that if we’re forced to just return a list of folders, the best solution is to go with an import hook or a proxy tree, but that of course means that the decision as to whether to retain the old behavior by default for backwards compatibility reasons is out of the hands of pip. New-style editable install builds hitting setuptools will simply always return something with the new behavior.

I’ll also note that I think doing this buys us nothing, because the only argument against the “virtual wheel” approach is that it’s hard to separate “get a list of stuff to install” from “install the package” in distutils right now (hard enough that I wasn’t able to do it in a few somewhat distracted hours; I don’t think it’s intractable in any way). If we use these approaches to solve the issue, setuptools will still be blocked on getting a list of everything that needs to be proxied.

dholth · May 5, 2020, 12:15pm

As an aside, easy_install is able to add all the paths in a multi-line .pth file instead of many single-line ones.

bernatgabor · May 5, 2020, 12:38pm

All that the front end needs to know is what files need to be installed within the site-packages folder. In my proposal this would be the .pth file and the bootstrap scripts. It does not need to know what is not installed, aka the files inside your source tree, the actual modules. All of those being available to import is a side effect of installing the .pth file and the bootstrap script, which make them available.

Well, this is a setuptools problem, not a problem with develop install modes in general. Just because something is hard to achieve in setuptools/distutils should not block progress for other tools that don’t suffer from this issue (flit, poetry). In the meantime setuptools can provide the same functionality it has now. It’s buggy, but it works. It can then add the correct functionality later on, once it figures out the details.

This makes uninstall kinda hard, as the wheel RECORD structure does not allow specifying a section of a file. I prefer having separate .pth files for this reason, as @pf_moore proposed earlier.
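To make the uninstall problem concrete: a wheel's RECORD lists whole files, so a per-project .pth file can be removed like any other recorded file. Undoing one entry in a shared multi-line file such as easy-install.pth, by contrast, means rewriting that file in place. The helper below is a hypothetical sketch of that second, more fragile operation, and the paths are illustrative.

```python
import os
import tempfile

def remove_pth_entry(pth_path: str, entry: str) -> bool:
    """Remove one path line from a shared .pth file; True if it was present.

    Hypothetical helper: RECORD cannot express "this line of that file
    belongs to project X", so an uninstaller has to do this by hand.
    """
    with open(pth_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    kept = [line for line in lines if line.strip() != entry]
    if len(kept) == len(lines):
        return False  # nothing to remove
    with open(pth_path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in kept))
    return True

site_packages = tempfile.mkdtemp()
shared = os.path.join(site_packages, "easy-install.pth")
with open(shared, "w", encoding="utf-8") as f:
    f.write("/home/me/proj-a/src\n/home/me/proj-b/src\n")

# Uninstalling proj-a rewrites a file that proj-b also depends on...
print(remove_pth_entry(shared, "/home/me/proj-a/src"))  # True
# ...whereas a per-project file would simply be deleted, e.g.
# os.remove(os.path.join(site_packages, "proj_a-1.0.pth"))
```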

pganssle · May 5, 2020, 1:09pm

Sure, but I am not comfortable locking in to a buggy implementation when there’s every possibility that when someone has the time to take a crack at a non-buggy implementation we’ll find there were minor things we could have changed that would have made things much easier.

That is why I’ve said we shouldn’t standardize before we have a setuptools proof of concept, because if setuptools doesn’t support this, it creates a ton of additional pain, and setuptools support is the hardest thing to do that’s in the critical path.

bernatgabor · May 5, 2020, 1:17pm

Setuptools has already been given a year to do this. The fact that no one has managed or stepped up to do it yet is, I’d say, reason enough not to wait on setuptools any longer. There’s no real guarantee that if we stick to this position we won’t wait another year, and then another. We’re also pushing the pain caused by setuptools onto other tools like flit/poetry (distutils is effectively deprecated at this point, so I would not consider it for this question). IMHO the .pth + bootstrap script approach is flexible enough to allow any crazy use case to be implemented by the backend. It is also explicit enough for the front end to be very precise about what it needs to do, and to expose only the files it cares about: the files that need to be injected into the platlib/purelib folders. Just to reiterate: since the source files are not installed within the purelib/platlib folders at all, there’s no reason to put them into this virtual wheel. They’re not installed; they’re just available as a side effect of the other files that are installed (.pth, etc).

dholth · May 5, 2020, 1:46pm

In setup.py develop, C extensions work by doing an in-place build, which means that the compiled artifacts go under e.g. src/beaglevote/_beagleaccel.so (under egg_base). Otherwise the artifacts might just go into build/.

Setuptools could create and return a path to a tree of symlinks in build/ that excluded the tests. Could it move the tests out of the main setuptools/ package? These would be able to work under the current setup.py develop add-a-.pth file strategy and wouldn’t require extra communication with pip.
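Here is a sketch of the symlink-tree idea, assuming a platform where os.symlink() works without extra privileges. It mirrors the source directories under build/, links each file individually, and prunes excluded directories. build_symlink_tree is a hypothetical name, and a real version would also need to handle package data and stale links left behind by deleted files.

```python
import os
import tempfile

def build_symlink_tree(src_root, dest_root, excluded_dirs):
    """Mirror src_root into dest_root using one symlink per file.

    Directories whose path relative to src_root appears in excluded_dirs
    (e.g. "foo/tests") are skipped, so nothing under them is importable
    from dest_root.
    """
    src_root = os.path.abspath(src_root)
    excluded = {os.path.normpath(d) for d in excluded_dirs}
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        # Prune excluded directories in place so os.walk never enters them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(rel, d)) not in excluded
        ]
        target_dir = os.path.normpath(os.path.join(dest_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Absolute link targets: edits to existing files show up at once.
            os.symlink(os.path.join(dirpath, name), os.path.join(target_dir, name))

# Demo: src/foo/__init__.py is linked, src/foo/tests/ is left out.
src = tempfile.mkdtemp()
for d in ("foo", os.path.join("foo", "tests")):
    os.makedirs(os.path.join(src, d), exist_ok=True)
    open(os.path.join(src, d, "__init__.py"), "w").close()

build = os.path.join(tempfile.mkdtemp(), "build")
build_symlink_tree(src, build, [os.path.join("foo", "tests")])
print(sorted(os.listdir(os.path.join(build, "foo"))))  # ['__init__.py']
```

A .pth file pointing at build/ would then expose foo but not foo.tests. Adding a new source file still means rerunning the tree builder, which is the reinstall-on-new-files semantic change raised earlier for symlinks.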

pf_moore · May 5, 2020, 2:22pm

@uranusjr pointed out to me today in a separate conversation that the case he’s seen this is with package_data, where an editable install sees the package data, but it’s missing from the manifest, so the final wheel isn’t what got tested. That seems like a much more common scenario to me (package_data is in my experience more common than excludes). He also pointed out that he stopped using editable installs because of this - so I can see that the frequency of this issue might be masked by people just avoiding it.

Having said that, I still think that we don’t have to solve this bug right now, we just need to implement a mechanism that doesn’t stop us from fixing it.

I think you’re being optimistic if you expect front ends to give users a choice. I would certainly expect pip to just choose an implementation strategy and use it. Or did I misunderstand you?

This thread is not about the hooks, so it’s not a matter of backends being forced to do anything. What matters right now is how we decide to implement a mechanism for mapping part of a source tree into Python’s import machinery. I see a few suggestions here:

1. A set of symlinks in site-packages, pointing to individual files in the source.
2. A .pth file in site-packages, pointing to a directory in the source that has a symlink tree set up.
3. A .pth file in site-packages pointing to part of the source tree, and an import hook that excludes unwanted parts of that source tree (I’m not 100% clear how that hook gets activated - to be decided).

I’m discounting a plain .pth file as (a) it’s a subcategory of (3) and (b) it doesn’t by itself address the “package data/exclude” issue.

Note that this says nothing about backends communicating with frontends. All I care about in this thread is how we expose a set of files on sys.path.

Again, note that the question for here is “how do we lay stuff out in site-packages to point to files held in an external source tree?” We’ve solved the “don’t allow a buggy implementation” issue by conceding that just pointing to a single directory isn’t acceptable. We now need to agree how a solution that points at individual files would look.

Please let’s not get bogged down in the question of who does what work to implement that solution at this stage. This is a shared issue, and I’m trying very hard not to get sucked into debating what’s hard for pip and what’s hard for setuptools.

Whoa. OK, I just got sucked in. If you’re going to make that assertion, please provide a robust implementation of a symlink farm in pip, that works on Windows and on Python 3.6. I’m currently being very tolerant of suggestions that a .pth based solution is not the only option, precisely because I want to focus on discussing options and designs rather than implementations. But right now I’ll point out that I believe that any solution involving symlinks is flat-out impossible as a general resolution, because os.symlink() needs admin rights in Python 3.7 on Windows (even if Windows itself allows symlink creation without admin).

Please - we’re once again focusing on the frontend/backend separation. The purpose of this thread was to discuss how we make the imports work, not about hooks or who does what part of the process. We’ve already got way too many threads covering that.

pf_moore · May 5, 2020, 2:29pm

Let’s take front ends and back ends out of the equation, for a while. Here’s a challenge for someone.

Provide an implementation (any implementation you like!) of a program that takes a directory laid out as follows:

src
└── foo
    ├── __init__.py
    └── bar
        └── __init__.py

and places one or more files into a location on sys.path, resulting in import foo working, but import foo.bar failing. Editing foo\__init__.py should not need any changes to what’s in site-packages to be visible to Python.

The program needs to support Windows, Linux and Mac, and work on Python 3.6+. Those are pip’s supported platforms as of early 2021. If you want to support older versions than that, feel free to add Python 2.7 and 3.5.

Note that the remit here doesn’t involve any builds, or any backends or frontends. It’s purely about how to implement an “editable install”.

Once we have a proof of concept for how to actually implement this, we can debate whether the semantics it provides are what we expect users to want (I’m personally still bothered by the idea that adding foo\main.py as a result of refactoring some code out of __init__.py would break the installed version, but that’s a much more complex discussion, so let’s ignore it for now).

If multiple people want to propose different solutions, so much the better. But let’s work out what files we’re trying to put into the target before we carry on the argument about who tells who to do what…

pradyunsg · May 5, 2020, 4:21pm

Off topic, but I’m super excited about this.

bernatgabor · May 5, 2020, 4:46pm

Here you go: https://github.com/gaborbernat/pkg-include-exclude-poc

Running bootstrap_ed.py does the demonstration (Python 3.4+, though we could provide a similar implementation for Python 2.7 via PEP 302).

What we would need is for this section, https://github.com/gaborbernat/pkg-include-exclude-poc/blob/master/bootstrap_ed.py#L1-L29, to run at interpreter startup (e.g. triggered by a .pth file).

https://github.com/gaborbernat/pkg-include-exclude-poc/blob/master/bootstrap_ed.py#L5 adds the working directory to sys.path; this could also be done via a .pth file.

PS. Note how the implementation knows very little about paths. It only knows the modules to include/exclude. I believe this means setuptools would not need to construct the list of files needed, and could instead use the module names the user already passes in. The solution also allows similar handling of resources via importlib.resources.

pf_moore · May 5, 2020, 5:28pm

That’s excellent, thanks! So just to be 100% explicit, if the only thing I could do were to put some files into site-packages (the use case we’re looking at) I’d need to put this support code there somehow, and a .pth file that added the target directory and ran this file?

Ultimately, that means that a build frontend would have a small runtime support module that would need to be installed to provide editable support, plus a .pth file. Sounds reasonable to me.

Let’s see what the people with particular use cases think - would this do what they need? (Also, if someone wants to propose an alternative mechanism, feel free to do so!)

bernatgabor · May 5, 2020, 5:43pm

Basically. Also the support files are controlled by the backend allowing more advanced use cases if one needs it.

dholth · May 5, 2020, 5:56pm

You may have found a use for the strange first-line-of-.pth-can-contain-code feature.

I propose the alternative mechanism of not solving the problem since this problem doesn’t bother me very much. But this is a solution.

One use case was to avoid importing https://github.com/pypa/setuptools/tree/master/setuptools/tests by accident. Would this also prevent the tests from being imported during testing?

sbidoul · May 5, 2020, 5:59pm

Quoting myself from the other thread:

I can do that trivially with a symlink. Is that feasible with this kind of bootstrap code?

bernatgabor · May 5, 2020, 6:06pm

virtualenv already uses similar mechanisms, so if you’re using virtualenv you’re already using this. So far no one has reported any issues with it. @dholth note that this lets you solve your use case too, so I consider it the more generic solution.

The backend controls the import logic. So, assuming it allows you to install not at the root but deeper, you can most likely do it with a slight alteration of what happens inside the file finder.

pf_moore · May 5, 2020, 6:30pm

If you’re talking about the installer (backend or frontend) setting up such a symlink, I remain unclear how we expect to use symlinks in a solution that has to be portable to systems that don’t support symlinks. (I’m 100% OK with supporting symlinks that the developer has added in the sources, but I don’t see how we can expect any core editable install functionality to be dependent on symlinks).

dholth · May 5, 2020, 6:58pm

In the past, people towards the core-dev end of the spectrum have complained about .pth files putting paths at the beginning of sys.path instead of at the end. Which you prefer depends on your perspective.