Specification of editable installation

uranusjr · April 27, 2019, 6:29pm

This thread focuses on the long-term specification; the discussion of shorter-term solutions is covered in Pip 19.1 and installing in editable mode with pyproject.toml.

Editable install (before 19.1, at least) works by invoking setup.py develop. Here is how Setuptools makes it work, as I understand it (also briefly mentioned here):

Creates an .egg-info next to the package source (instead of in the target site-packages).
Adds an .egg-link in the target site-packages, containing paths to look for the distribution’s .egg-info.
The source’s parent directory is added to a .pth file maintained by Setuptools (easy-install.pth). The listed directories are injected into sys.path so the package is importable at runtime.
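A minimal sketch of the third step only, assuming a throwaway temporary directory stands in for site-packages and a made-up module name (`pth_demo_mod`) stands in for the project. It relies on `site.addsitedir()`, which processes `.pth` files in a directory; it does not reproduce the `.egg-info` or `.egg-link` parts.

```python
import importlib
import site
import tempfile
from pathlib import Path

# Stand-ins for a real environment: a fake site-packages and a fake checkout.
root = Path(tempfile.mkdtemp())
site_packages = root / "site-packages"
checkout = root / "checkout"
site_packages.mkdir()
checkout.mkdir()

# The "project" is a single module that lives only in the checkout.
(checkout / "pth_demo_mod.py").write_text("VALUE = 1\n")

# A .pth file lists directories, one per line, to be appended to sys.path.
(site_packages / "easy-install.pth").write_text(f"{checkout}\n")

# Process the .pth files found in the fake site-packages directory.
site.addsitedir(str(site_packages))

# The module is now importable even though nothing was copied.
module = importlib.import_module("pth_demo_mod")
print(module.VALUE, module.__file__)
```

The imported file is the one in the checkout, which is why edits to the source show up in the next interpreter session without reinstalling.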

There are some problems with this implementation, but I think people agree it is “good enough” as a baseline of expectation. Most of the theoretical problems can be worked around in practice without much hassle as well. IMO we should try to standardise the current behaviour (and tackling only the most obvious improvements in the process), and work from there.

What needs to be done to implement editable install with build isolation? Off the top of my head:

Should we continue to use .egg-info? I believe it is totally possible to switch to .dist-info instead. This would minimise work for backends to add support (they only need to implement one db format).
Is the egg-link mechanism required? Can we just put the .dist-info into site-packages instead?
How should pip inject the module for import? .pth is not very inspiring, but I don’t think there is a better way unless we propose a new mechanism to the core interpreter.

pganssle · April 27, 2019, 6:53pm

I think when developing a standard, we should probably specify as much as is necessary and nothing more, so I think we need to cast this less as a “how do we lock in the behavior” and more as a “what is the responsibility of the front-end, and what is the responsibility of the back-end?”

So the immediate questions I have are:

What guarantees is the front-end providing to the back-end about where the build will take place?
What does the final build artifact produced by the back-end look like?
Who handles the entry points?
Who controls the mechanism by which the project is made visible to Python?

I think #1 and #2 are pretty clear candidates for standardization, and the standard may look like an .egg-link or some sort of wheel full of symbolic links or something. For #4, I think that’s actually pretty firmly the responsibility of the front-end, which means that we can likely leave the mechanism out of the standard (which is to say that setuptools doesn’t need to generate a .pth file, and pip or whoever can decide if they want to use a .pth file or some other mechanism).

For #3, I think some parts need to be standardized (e.g. do the entry points get passed to the front-end as metadata and the front-end generates the scripts, or do the scripts get built by the backend and included in the build artifact), and others (like how to get them on the $PATH and $PYTHONPATH) do not.

uranusjr · April 27, 2019, 7:34pm

Personally I believe the editable install should resemble a “regular” one, to make life easier for a developer (otherwise they discover packaging problems too late; I hit quite a few bumps when I started developing Python packages). So for #3, I think the entry points should ideally be generated by the front end. That is how regular installation works. More generally, the role of front- and back-ends in the editable install should be as close to PEP 517 as reasonably possible; the front-end provides a directory for the back-end to build things in (probably metadata-only since the actual source is editable), and takes over from there.
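As a rough illustration of "the front end generates the entry points", here is a sketch that turns one console-script declaration into launcher text. The helper names (`parse_entry_point`, `render_script`) and the `foo.cli:main` target are hypothetical, and the sketch assumes the target is a plain function name; real front ends also deal with extras, dotted attributes and Windows launchers.

```python
import sys


def parse_entry_point(spec: str):
    """Split 'name = package.module:function' into its three parts."""
    name, _, target = spec.partition("=")
    module, _, function = target.partition(":")
    return name.strip(), module.strip(), function.strip()


def render_script(spec: str, python: str = sys.executable) -> str:
    """Return the text of a launcher script for one console entry point."""
    _, module, function = parse_entry_point(spec)
    return (
        f"#!{python}\n"
        "import sys\n"
        f"from {module} import {function}\n"
        "\n"
        'if __name__ == "__main__":\n'
        f"    sys.exit({function}())\n"
    )


# Hypothetical declaration, as it would appear under [console_scripts].
script = render_script("foo = foo.cli:main", python="/usr/bin/python3")
print(script)
```

Because the launcher only imports from the package, it keeps working as the editable source changes; only adding or renaming an entry point would need the script to be regenerated.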

sbidoul · April 28, 2019, 12:47pm

Thanks for opening this thread.

IMHO yes

For these does it then sound reasonable to ask the back-end to create a partial wheel containing (at least) .dist-info metadata? The front-end can then install it in a pretty standard way.
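To make the "metadata-only" idea concrete, here is a sketch of an unpacked `.dist-info` directory holding just enough for the standard library to read it back. The project name, version and entry point are made up, and a real wheel would also carry `WHEEL` and `RECORD` files inside a zip archive.

```python
import tempfile
from importlib.metadata import Distribution
from pathlib import Path

# Stand-in for the directory a front end would install into.
target = Path(tempfile.mkdtemp())
dist_info = target / "foo-0.0.1.dist-info"
dist_info.mkdir()

# Minimal core metadata (hypothetical project "foo").
(dist_info / "METADATA").write_text(
    "Metadata-Version: 2.1\nName: foo\nVersion: 0.0.1\n"
)
# Entry points travel as metadata; no scripts are generated here.
(dist_info / "entry_points.txt").write_text(
    "[console_scripts]\nfoo = foo.cli:main\n"
)

# Read the metadata back the way tooling would.
dist = Distribution.at(dist_info)
print(dist.metadata["Name"], dist.version)
print([(ep.group, ep.name, ep.value) for ep in dist.entry_points])
```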

That part sounds like the most complex one to specify. If it’s the front-end that is responsible for it:

how does the back-end communicate what to make visible?
where should these things be inserted? (one can imagine use cases where an editable install provides only part of a namespace, while not even providing the namespace root itself; or cases where the back-end provides a mix of namespace packages, modules and binaries)
how does the front-end achieve the insertion (symlink, .pth, something else)?

So for #4 we either need a very open-ended standard to cope with many different use cases, or somehow leave the responsibility entirely to back-ends. Or some middle ground: have the front-end cope with simple/obvious use cases, and have some sort of post-install hook to the back-end for complex cases.

Some additional considerations:

For use cases such as pip freeze to continue working with editable installs, the front-end will need to record the local directory that was originally requested for --editable (pip freeze currently discovers it through .egg-link).
We must not forget the uninstall part.
If we want to allow -e to continue working with setuptools as it does today, some sort of escape mechanism must be provided, as that mode would not create .dist-info at all (the post-install hook could do the job).

pganssle · April 28, 2019, 3:32pm

Well in my original response I was suggesting that none of the specifics about #4 need to be specified at all. I don’t think I’ll be able to understand the edge cases without starting in on an actual implementation for this, but my thinking is that the interface would look like this:

The build backend exposes an API such as build_editable, which returns package metadata and some kind of manifest that maps the contents of a virtual package to locations on disk.
When a frontend makes an editable installation, it will make that package visible to Python as if the package described in the manifest were installed as a normal package, in such a way that editing the mapped locations would update the installed package.

I think we can probably say:

Package metadata (version, etc) is generated at build time, and will not be updated dynamically.
New modules added to the package will not be automatically installed without a rebuild.
Entry point scripts will be generated at build time and will not be updated dynamically (though to the extent that they import from the package, their behavior will change dynamically).
Extension modules are generated at build time, and need not be updated dynamically.

We need to standardize in some detail the thing that build_editable returns, but I’m not sure we need to standardize what the frontend does with it. Whether it is accomplished with path manipulation, hard or symbolic links, etc, can probably be left up to the front-end.
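One way to picture this split, as a sketch only: the `EditableManifest` shape and the `install_with_symlinks` function are hypothetical, not part of any existing hook. The back end describes a virtual package as a mapping from installed paths to source files, and this particular front end happens to realise it with symbolic links (which may need extra privileges on Windows).

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class EditableManifest:
    """Hypothetical shape of what a build_editable hook could return."""
    metadata: dict
    # Installed path (relative to site-packages) -> real source file.
    files: dict = field(default_factory=dict)


def install_with_symlinks(manifest, site_packages):
    """One possible front-end strategy: link each mapped file into place."""
    created = []
    for virtual, source in manifest.files.items():
        link = Path(site_packages) / virtual
        link.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(source, link)
        created.append(link)
    return created


# A source tree with a sub-package that the manifest deliberately omits.
src = Path(tempfile.mkdtemp()) / "src"
(src / "foo" / "bar").mkdir(parents=True)
(src / "foo" / "__init__.py").write_text("VALUE = 1\n")
(src / "foo" / "bar" / "__init__.py").write_text("")

manifest = EditableManifest(
    metadata={"Name": "foo", "Version": "0.0.1"},
    files={"foo/__init__.py": src / "foo" / "__init__.py"},
)
site_packages = Path(tempfile.mkdtemp())
created = install_with_symlinks(manifest, site_packages)

# Edits to the source are visible through the link; foo/bar is not installed.
(src / "foo" / "__init__.py").write_text("VALUE = 2\n")
print((site_packages / "foo" / "__init__.py").read_text())
print((site_packages / "foo" / "bar").exists())
```

A different front end could consume the same manifest with a `.pth` file or an import hook instead, which is the point of leaving the mechanism unspecified.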

Can you clarify what you mean by this?

uranusjr · April 28, 2019, 3:52pm

So say I have a top-level package foo with foo/__init__.py, adding foo/bar.py does not make foo.bar available? Some users would be unhappy with this (since it works with the current implementation).

pganssle · April 28, 2019, 4:45pm

Hmm… I did not realize this was the case, but it seems to be an implementation detail that leads to a pretty serious bug: it is basically just adding the directory that contains foo/ to the Python path, which means that it is not correctly excluding packages. So if I have a src like this:

src
└── foo
    ├── __init__.py
    └── bar
        └── __init__.py

And my setup looks like this:

from setuptools import find_packages, setup

setup(
    name="foo",
    version="0.0.1",
    package_dir={"": "src"},
    # foo.bar is deliberately left out of the distribution
    packages=find_packages(where="src", exclude=["foo.bar"]),
)

Then when I do pip install ., python -c "import foo.bar" correctly throws an ImportError, but after pip install -e ., I am able to successfully import foo.bar. I certainly wouldn’t want the standard to prevent any front-ends from fixing this bug, though I don’t want to discount people who need this as part of their workflow.
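The effect can be reproduced without pip or setuptools. The sketch below assumes a temporary copy of the layout above and simulates the path-based editable install by putting `src` on `sys.path` by hand; the `declared_packages` list simply restates what the `setup()` call would select.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Recreate the layout: src/foo/__init__.py and src/foo/bar/__init__.py.
src = Path(tempfile.mkdtemp()) / "src"
(src / "foo" / "bar").mkdir(parents=True)
(src / "foo" / "__init__.py").write_text("")
(src / "foo" / "bar" / "__init__.py").write_text("")

# What the setup() call declares: foo.bar is excluded.
declared_packages = ["foo"]

# Simulate a path-based editable install: the whole src directory
# becomes importable, regardless of what was declared.
sys.path.insert(0, str(src))
importlib.invalidate_caches()

spec = importlib.util.find_spec("foo.bar")
leaked = spec is not None and "foo.bar" not in declared_packages
print("foo.bar importable despite exclude:", leaked)
```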

Perhaps we can have some MAY language, like “frontends may add top-level directories to the Python path, even if the result would expose modules that would not be installed by a regular installation process.” Possibly with some moderating SHOULD language to urge front-ends to make “editable installs require rebuild to add new modules” the default choice.

cjerdonek · April 28, 2019, 6:12pm

Couldn’t both of these be satisfied with an auto-build functionality that watches for changes to the appropriate directories?

pganssle · April 28, 2019, 6:55pm

In a world where new modules are only added after a re-build, auto-building doesn’t get you the same behavior, because right now, even without re-building, modules are included in an editable install that are not included in a normal install. Plus auto-build would change the current behavior pretty dramatically, since it would do things like re-build extension modules and update the metadata, both of which would probably have a bunch of other side-effects.

Anyway, I think we should put aside the possibility of an auto-build daemon at least for the moment, because it would be a pretty big expansion of scope, and would be very difficult to do in a reliable way.

sbidoul · April 28, 2019, 8:21pm

Given the trend in Pip 19.1 and installing in editable mode with pyproject.toml, I’m asking myself how, when we flip the switch of the new spec, -e can continue implying “setup.py develop” for the setuptools backend, while letting projects opt in to the new spec behaviour when they decide to.

sbidoul · April 28, 2019, 8:44pm

The exclude use case is interesting. It illustrates the difficulty of standardizing the back-end/front-end interface when it comes to specifying what to make available.

Leaving to back-ends the responsibility (or the possibility) to make the code visible in the target python environment could open the door to specialized back-ends providing “editable” features adapted to different development workflows, including some that we cannot foresee today. It would also reduce the complexity of the editable spec, as well as simplify front-end implementation.

pganssle · April 28, 2019, 8:54pm

It should not imply that, and that’s not really the goal of the standard, anyway. pip is not the only front-end to be concerned with, and setuptools is not the only backend.

Tools are not forbidden from being both backends and frontends, but I think it’s very clearly not the job of the backend to make the package visible to the system, since that is pretty much the definition of installation.

I also don’t think it would reduce the complexity of the editable spec for backends to usurp some of the front-end’s role; that sounds more complicated, not less. At the end of the day, I think we will need to say very little about how the frontend makes the package visible to the system, just the fact that it’s the frontend’s responsibility, and some general properties that an installed package should have (importable, etc).

sbidoul · April 28, 2019, 9:18pm

I would also prefer to keep the pure separation of concerns between back-ends that build and front-ends that install for editable too. From what I’ve read so far, I just doubt there is exhaustive enough knowledge of the editable use cases to specify such a pure interface today.

uranusjr · April 29, 2019, 3:44am

I mentioned the use case when it popped up in my head, but it does seem like a bug more than a feature with the way you put it, especially since I said in my opening post the editable install should resemble a regular one if possible.

It may be possible to move to require a rebuild to add new modules in the new standard. Most people (I believe) use editable with the expectation of converting the package to non-editable at some point. I think people will understand if we communicate clearly in the spec (PEP?) process that it was never the intention to allow it, and the new behaviour may even help prevent bugs in the conversion. Some might still get annoyed, but cue mandatory xkcd

cjerdonek · April 29, 2019, 5:15am

I think this is one of the things we should discuss and make a decision on then, along the lines @uranusjr said (making editable more similar to the regular case). With a new spec, would the goal be to standardize and replicate what we already have, or to add enough to support doing what we think is “right” (even if not done by the existing pip)?

The idea I had in mind was more of an incremental rebuild (only rebuild what is necessary). @njs mentioned a similar idea in a later comment in the other topic. I’ll quote from part of that comment here:

bernatgabor · April 29, 2019, 2:30pm

I would keep editable installs along the lines of: here’s the Python environment that was provisioned, here’s where you generate output, do what you want. Maybe only mandate the metadata generation, but make that extensible too.

Here’s a use case that would be solved with a different kind of editable install: imagine you want to break up a big module into multiple small modules (easier to develop, and separate). A potential solution would be to make the builder concatenate packages together into modules.

An editable install for such a module could just install a new importer (via a .pth file), with the new importer doing the concatenation on the fly, in memory, at import time. This could be extended for meta-files too: the importer could check whether some files need to be re-compiled and, if so, run the build before allowing the import to pass through. This would allow backends to do on-demand lazy builds without a third-party tool that needs to keep checking stuff in the background.
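A sketch of the importer idea, under several assumptions: the `ManifestFinder` class, the `before_import` callback and the module name `lazy_demo` are all hypothetical, the finder is registered directly instead of through a `.pth` file, and only plain single-file modules are handled (no packages, no concatenation).

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class ManifestFinder(importlib.abc.MetaPathFinder):
    """Resolve only the module names listed in a name -> file mapping."""

    def __init__(self, mapping, before_import=None):
        self.mapping = mapping
        # Optional hook where a back end could rebuild stale files.
        self.before_import = before_import

    def find_spec(self, fullname, path=None, target=None):
        source = self.mapping.get(fullname)
        if source is None:
            return None  # not ours; let the other finders try
        if self.before_import is not None:
            self.before_import(fullname)
        return importlib.util.spec_from_file_location(fullname, str(source))


# The source file lives somewhere that is not on sys.path.
workdir = Path(tempfile.mkdtemp())
source = workdir / "impl.py"
source.write_text("def greet():\n    return 'hello'\n")

calls = []
finder = ManifestFinder({"lazy_demo": source}, before_import=calls.append)
sys.meta_path.insert(0, finder)

module = importlib.import_module("lazy_demo")
print(module.greet(), calls)
```

Since only mapped names resolve, an importer like this would also avoid exposing modules that a regular install would leave out.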

xafer · April 29, 2019, 8:13pm

Another requirement that I did not see mentioned:
the frontend needs to be able to uninstall the editable installation without needing the backend (ideally “pip uninstall” should not have to download anything).

This means that whatever the backend does, the frontend should be able to undo it.

The RECORD file provides a way for the frontend to know what files to delete, but in the case of setuptools’ current mechanism and its use of easy-install.pth, this is currently hardcoded in pip’s uninstall code.
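A sketch of what a backend-free uninstall driven by RECORD could look like. The function name `uninstall_from_record` and the file names are made up for the demo; RECORD is read as CSV rows of path, hash and size, with paths relative to site-packages, and the sketch leaves empty directories behind.

```python
import csv
import tempfile
from pathlib import Path


def uninstall_from_record(site_packages, dist_info_name):
    """Delete every file listed in RECORD; no backend involved."""
    site_packages = Path(site_packages)
    record = site_packages / dist_info_name / "RECORD"
    with record.open(newline="") as handle:
        listed = [site_packages / row[0] for row in csv.reader(handle) if row]
    removed = []
    for path in listed:
        if path.is_file() or path.is_symlink():
            path.unlink()
            removed.append(path)
    return removed


# Fake environment containing one editable install of "foo".
site_packages = Path(tempfile.mkdtemp())
dist_info = site_packages / "foo-0.0.1.dist-info"
dist_info.mkdir()
(site_packages / "foo.pth").write_text("/path/to/checkout/src\n")
(dist_info / "METADATA").write_text("Name: foo\nVersion: 0.0.1\n")
(dist_info / "RECORD").write_text(
    "foo.pth,,\n"
    "foo-0.0.1.dist-info/METADATA,,\n"
    "foo-0.0.1.dist-info/RECORD,,\n"
)

removed = uninstall_from_record(site_packages, "foo-0.0.1.dist-info")
print(sorted(path.name for path in removed))
```

This only works if whatever made the project visible (here a per-project `.pth` file) is itself listed in RECORD, which is not the case for a shared easy-install.pth.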

jriddy · May 3, 2019, 3:51pm

Could we just have the frontend call the metadata hook before uninstall? I agree that pip uninstall shouldn’t download anything, but I don’t see why it shouldn’t be able to ask the backend to update the metadata.

I guess this would require that pip cache a wheel of the backend system to do this in isolation, so that’s a no go either. But it looks like having to run the editable install every time you add a pkg/module is going to be a requirement to support the uninstall use case. I suppose that’s unavoidable.

pganssle · May 3, 2019, 7:14pm

I think this is another good reason to use the separation I proposed above: The backend should not be modifying the system at all. It should create some files and/or return some data saying what the frontend should be installing, and then the frontend should do the install.

dholth · May 7, 2019, 7:45pm

Here’s what enscons does https://bitbucket.org/dholth/enscons/src/d4f3912829f31db8cf6c5fd5aeb82915a03be95b/enscons/setup.py#lines-4

It looks like the newest wheel removed wheel.paths though, so this is no longer working.