Third try on editable installs

pganssle · April 24, 2020, 6:41pm

What does “a ‘simple’ proposal” mean in this case? The one outlined in my link, or one that does not meet all the requirements I mentioned?

You are welcome to do what you want, but I don’t want you to feel that I’ve misled you into thinking that I’m suggesting that all we need is a proof of concept for any proposal. I will definitely not be in favor of any proposal that does not include a mechanism for front-ends to include only the files that would be installed in a non-editable install (e.g. one where only folders are listed or something of that nature).

dholth · April 24, 2020, 7:39pm

Of course I will implement one of my own proposals. This would solve the problem of “no hook for an editable install” without addressing the “editable installs aren’t 100% accurate” problem and without requiring a distutils refactor. Setuptools would be able to stop putting egg-links in site-packages because pip would take care of doing an equivalent job.

Flit’s implementation suggests that both kinds of editable installs can be useful. I would be happy to see a second or an extended hook for the “only the files that would be installed” feature as an option. I don’t think enscons or distutils will easily yield that feature. I understand if you feel that implementing the simpler option would kill the enhanced option.

steve.dower · April 25, 2020, 1:32pm

From my POV: pth is way more predictable and easier to clean up without destroying your source directory (plus you don’t suffer from arbitrary code execution if you don’t put any arbitrary code in there! I generally do editable installs manually with a .pth file and/or special modules with __path__ overrides)

And the way to detect whether symlinks will work on Windows is… call os.symlink and handle the OSError. Hardly worth a library. (Virtualenv is more complex because launching an executable via a symlink isn’t always the same as launching a copy would be, and that’s their primary need. For this, it ought to be fine, if a little more risky when it comes to deletion.)
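A minimal sketch of that "try it and handle the error" check. The function name `symlinks_supported` is made up for illustration. The extra exception types are an assumption to cover platforms where os.symlink is missing or unimplemented.

```python
import os
import tempfile


def symlinks_supported(directory=None):
    """Return True if a symlink can be created inside `directory`.

    A throwaway subdirectory is used so nothing is left behind.
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        open(target, "w").close()
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError, AttributeError):
            # Typical on Windows without the symlink privilege.
            return False
        return True
```

Passing the intended install location as `directory` tests the filesystem that will actually hold the links, which may differ from the default temp location.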

dholth · April 28, 2020, 4:36am

Prototype-quality code:

In setuptools we add a develop --no-install command that copies the existing develop command without the install bits. It returns { "src_root" : "." } (a relative path from pyproject.toml to where the .pth file should point. An absolute path would also work.)

Without the extra reformatting: https://github.com/pypa/setuptools/compare/master...dholth:redevelop?expand=1

In pip we go ahead and literally re-use the “install unpacked wheel” function. It already called the build backend’s “generate .dist-info metadata” hook. We call the new hook and put a .pth file.

The vendored pep517 is modified to add the new hook; it takes care of turning the relative path from the build backend into an absolute path.
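Roughly, the front-end side of this could look like the sketch below. This is not the code from the prototype branch. The helper names `resolve_src_root` and `write_pth` are hypothetical, and only the { "src_root": ... } return value comes from the description above.

```python
import os


def resolve_src_root(project_dir, hook_result):
    """Turn the backend's src_root into an absolute path.

    A relative src_root is taken relative to the directory holding
    pyproject.toml; os.path.join leaves an absolute src_root unchanged.
    """
    src_root = hook_result["src_root"]
    return os.path.abspath(os.path.join(project_dir, src_root))


def write_pth(site_packages, name, src_root):
    """Write <name>.pth into site_packages with a single path line."""
    pth_path = os.path.join(site_packages, name + ".pth")
    with open(pth_path, "w", encoding="utf-8") as f:
        f.write(src_root + "\n")
    return pth_path
```

For a project at /work/proj whose backend returns { "src_root": "." }, this would write one line, /work/proj, into the .pth file.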

bernatgabor · April 28, 2020, 6:32am

On a second scan, this kind of does what we agreed on, with the difference that it only allows injecting a single folder, as is, into the interpreter. There’s no way to filter, merge, or build on the fly the files that get exposed. Before we agree on this being a standard I’d like to address allowing those.

pf_moore · April 30, 2020, 9:41am

Yeah, this seems like a good start. The obvious example of something that needs to inject more than one folder is setuptools itself (which installs both setuptools and pkg_resources).

sbidoul · April 30, 2020, 10:57am

I have a use case where, for reasons, the source layout cannot match the installed layout. Typically in the source I have src/myplugin, and in target I want it to be installed in a specific namespace package, such as mytool/plugins/myplugin. A custom build backend can do that at build time, and it would be nice to have such capability for editable installs too.

dholth · April 30, 2020, 12:23pm

Here’s the commit that added the develop command to setuptools back in July 2005. https://github.com/pypa/setuptools/commit/e5eac13db392f851f15e014a1c20debb22da89b2 . It works about the same today but it also supports 2to3, a compiler that tries to convert Python 2 to Python 3. If you are using 2to3 it would point the .pth file at a build directory and you would have to rebuild to see your changes. pip install -e works by calling setup.py develop.

The prototype hook does the same thing as setuptools’ own develop command and is built with the old develop code. The difference is that pip creates the .pth file and copies the metadata into site-packages (instead of linking to metadata in the source checkout) and there is no .egg-link file. This would be less convenient if you had develop-installed your source into more than one environment and needed the metadata to update.

If this hook was used in setuptools it would return the parent directory of setuptools and pkg_resources. Both would be available after an editable install. Take a look in site-packages at the current develop-installed setuptools.egg-link and setuptools.pth or any other develop-installed package to see where it would point. Note it’s a file that adds a directory to sys.path, not an os symlink.
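To see the "adds a directory to sys.path, not a symlink" point in isolation, here is a small sketch that uses the standard library's site.addsitedir on a throwaway directory standing in for site-packages. The directory and file names are invented for the demo.

```python
import os
import site
import sys
import tempfile


def pth_puts_checkout_on_path():
    """Process a .pth file and report (checkout on sys.path, .pth is a symlink)."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = os.path.join(tmp, "checkout")
        fake_site = os.path.join(tmp, "site-packages")
        os.makedirs(checkout)
        os.makedirs(fake_site)
        pth = os.path.join(fake_site, "demo.pth")
        with open(pth, "w", encoding="utf-8") as f:
            f.write(checkout + "\n")

        saved = list(sys.path)
        try:
            # addsitedir reads every *.pth file in the directory and appends
            # each listed directory that exists to sys.path.
            site.addsitedir(fake_site)
            wanted = os.path.normcase(checkout)
            on_path = any(os.path.normcase(p) == wanted for p in sys.path)
            return on_path, os.path.islink(pth)
        finally:
            sys.path[:] = saved  # leave the interpreter as we found it
```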

I’ve probably been using the setup.py develop feature since 2009. I can keep the one or two packages I’m actually developing in a checkout and update them that way. Including in production. Those packages may never be installed non-editable. We just want the dependencies to be installed and for our own package to be on sys.path. In the same way that a Django project may be used without setup.py with the difference that we have one.

If you use the develop command or pip install -e to prepare packages for pypi you can make the mistake of forgetting to test the installed package. You might depend on packages in the root of your checkout like setup.py (one reason a src/ directory is recommended) or other files that might be left out of the install or distribution. I’ve made a couple of broken releases this way, making a second release to fix the problem.

I think this is the motivation for wanting a tree-of-symlinks feature as an option for a new editable install feature. You would still occasionally make mistakes related to the difference between an editable and regular install, but you would be more likely to catch problems with setup.py’s error-prone MANIFEST.in.

This is a different feature than the add a .pth file strategy develop has used. The new feature could be useful if you intend to distribute your package and if you might forget to test the 100% installed package. It is not needed if you just want your checkout to be on sys.path.

The 2to3 support in the current version of the develop command offers a hint. The prototype hook doesn’t have to return the root of the checkout or ./src and pip does not know whether it did. pip could pass a prefer_inplace=False flag to suggest the build system produce a tree of files or symlinks, say, in a ./build/ directory, that gets added to sys.path.

takluyver · May 2, 2020, 1:46pm

I’m happy to try to make a PoC in flit - this would be about the simplest case, so it’s a good starting point to look at the question. But I’m still a bit hazy as to what exactly I should start implementing.

The proposal from last year appears to list every file in the package individually, which seems to be an extra complication for all the practical ways I can think of to implement editable install, where we arrange to add an entire package to sys.path.

As a concrete example, I expect that if I add a new submodule in my package, that’s available in an editable install without needing to re-run an install step. That breaks if each file is individually symlinked into a new directory.

dholth · May 2, 2020, 3:55pm

Please enumerate when traditional setup.py develop falls short. This has been missing from the discussion. Are there some packages with a very strange layout that we should use as an example? When would using a src/ layout to separate setup.py / tests from the main files not be enough?

IMO the next relatively-convenient layer of complexity would be to (optionally?) include the equivalent of package_dir = { package_name : folder } and py_modules = [ … ] which is the one you always forget when you have a bare python file at the root. https://docs.python.org/3/distutils/setupscript.html#listing-whole-packages

How would the develop installer handle namespace packages? If there is only one ‘flit’ you could add a symlink to site-packages. But if someone comes along with a ‘flit.flap’ you would want to avoid having to revise the installation e.g. by putting a real ‘flit’ directory in site-packages and symlinking everything inside both develop-installed ‘flit’ directories to that directory. I would handle that by building a sensible per-develop-install tree somewhere and then putting a .pth file in site-packages.

steve.dower · May 3, 2020, 8:19am

If you want a real life project to try this on (which you probably don’t, at least not this one, but it is a real thing), the Azure SDK for Python and the Azure CLI projects use namespace packages extensively. And development is a pain as a result (you’ll see a mix of approaches due to people “fixing” problems and later realising their fix made it worse).

Built-in namespace packages should work fine with just sys.path additions, except for the times when they break completely. But that’s just how those things work - you (and everyone else) just have to be careful.
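As a sketch of the "just sys.path additions" case: two separate project roots each contribute a portion of the same namespace, and neither has an __init__.py at the namespace level. The names demo_ns_pkg, alpha and beta are invented for the demo.

```python
import importlib
import os
import sys
import tempfile


def demo_namespace_packages():
    """Import two portions of one namespace package from two sys.path entries."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = []
        for project, portion in (("proj_a", "alpha"), ("proj_b", "beta")):
            pkg = os.path.join(tmp, project, "demo_ns_pkg", portion)
            os.makedirs(pkg)
            # Only the leaf package gets an __init__.py; demo_ns_pkg has none.
            with open(os.path.join(pkg, "__init__.py"), "w") as f:
                f.write("NAME = %r\n" % portion)
            roots.append(os.path.join(tmp, project))

        saved = list(sys.path)
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            alpha = importlib.import_module("demo_ns_pkg.alpha")
            beta = importlib.import_module("demo_ns_pkg.beta")
            return alpha.NAME, beta.NAME
        finally:
            sys.path[:] = saved
            for name in list(sys.modules):
                if name == "demo_ns_pkg" or name.startswith("demo_ns_pkg."):
                    del sys.modules[name]
```

The "break completely" cases tend to show up when one contributor adds an __init__.py at the namespace level, which turns it into a regular package and hides the other portions.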

dholth · May 3, 2020, 9:51am

https://packaging.python.org/guides/packaging-namespace-packages/#creating-a-namespace-package

Three approaches; this documentation explains it pretty well. In setuptools you declare them in metadata; otherwise everyone coordinates what happens or doesn’t happen in __init__.py.

There is a fourth “approach” which is to avoid them and use _ instead of .

I’m more looking for examples of real life packages that can’t use setup.py develop apart from namespace package fragility.

pganssle · May 3, 2020, 7:47pm

The reason we explicitly chose to list each file independently is that one of the biggest use cases for editable installs is if you want the benefit of a src/ layout where the version of the library that Python sees is what will get installed, but you also want to make changes on the fly without a lot of extra installing.

We decided that a front-end doesn’t need to fix this bug (I personally see it as a huge bug and I think people add new files to their packages rarely compared to how often they want to change something in an existing file and have it updated), but if we don’t give front-ends the full list of files to install, they have no way to fix the bug.

It’s also not terribly complicated for front-ends who want to implement an install mode that just adds the root directories to the python path. You can use the set of parent directories, or you can try and be more clever about it (like using os.path.commonpath).
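A sketch of how a front-end might recover the directories to put on the path from a per-file listing. The (source, destination) pair format and the function name `source_roots` are assumptions for illustration, not part of any agreed specification. Paths are treated as POSIX-style to keep the example deterministic.

```python
import posixpath
from pathlib import PurePosixPath


def source_roots(entries):
    """Return the sorted set of directories that would need to be on sys.path.

    Each entry is (source, destination), where destination is the path
    relative to site-packages. The root is the source path with the
    destination suffix stripped off.
    """
    roots = set()
    for source, destination in entries:
        src = PurePosixPath(source)
        dst = PurePosixPath(destination)
        n = len(dst.parts)
        if src.parts[-n:] != dst.parts:
            # Source layout differs from installed layout; a plain
            # sys.path entry cannot express this file.
            raise ValueError("cannot map %s onto %s" % (source, destination))
        roots.add(str(PurePosixPath(*src.parts[:-n])))
    return sorted(roots)


entries = [
    ("/proj/src/pkg/__init__.py", "pkg/__init__.py"),
    ("/proj/src/pkg/sub/mod.py", "pkg/sub/mod.py"),
    ("/proj/src/other.py", "other.py"),
]
roots = source_roots(entries)
common = posixpath.commonpath([source for source, _ in entries])
```

For this listing both approaches give /proj/src. The commonpath shortcut only works when everything shares one root and at least one file sits directly in it, whereas the suffix-stripping version can return several roots or refuse a layout that cannot be expressed as path entries.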

If there are lots of complicated edge cases in finding the root directory but the backend has it easily accessible, I don’t think it’s a big deal to also provide the source roots as metadata in the “virtual wheel”.

pf_moore · May 3, 2020, 8:07pm

Please excuse me not having gone back and found all of this myself - the discussion is very sprawling and hard to follow - but what exactly is the “bug” you’re referring to here? And also, what is it a bug in? The current setup.py develop implementation in setuptools? I hope you’re not suggesting that a newly-added file being visible is a “bug”?

I don’t use editable mode much, myself, but I’m pretty certain that when I do, my expectation is that any changes I make to the source are immediately visible, without needing a “reinstall”. Otherwise what’s the point of an editable install? And yes, that would include adding a new file (think refactoring the code to split out a new submodule). Rare or not, I don’t expect to have to remember complicated rules about what changes don’t need a reinstall.

If that’s hard to implement, then so be it. Behaving how the user expects is the key here.

If people’s expectations of what is reasonable behaviour are this different, then I think we’re not even close to “all we need is a proof of concept implementation”

pganssle · May 3, 2020, 8:28pm

The thread is not that long, I strongly suggest reading it, it would save a lot of effort recapping everything.

No, newly-added files being visible is not a “bug”, but what is a bug is the fact that setup.py develop will expose things not part of the installed set of files. So for example, both dateutil and setuptools have a test module in the package source tree that is excluded from the binary / installed distribution.

If I do pip install -e . right now, importing dateutil.test.test_easter works, even though it wouldn’t work had I installed using pip install .

There’s not really any general way to solve this other than saying, “You need to build if something about the build configuration changes”, where “something about the build configuration” includes the list of files that are included in the built distribution. The suggestion that always comes up is “how about some sort of daemon”, but we determined this to be Out of Scope™.

This is already not true when you make changes to something that needs to be built like a compiled extension, or when you need to make any random change to your build configuration — anything about the build metadata, package discovery, etc.

Regardless, as we concluded in the previous thread, the solution was to make it possible for people to build a front-end that considers “added a new file” to be a change to the build configuration. You build a “virtual wheel” that says everything that’s going in the package and let the front-end decide what to do with that.

I personally will lobby hard for at least having an “editable-install-as-installed” for the same reason that I’m a big proponent of the src/ layout and for the same reason I test my packages as installed.

Admittedly, this may be very setuptools-specific. Other backends may not even allow you to configure which files are part of the installed set, and maybe there should be a flag to indicate that it’s safe to just dump the file root onto the path even in “install only built artifacts” mode.

No one said “all we need is a proof of concept implementation”. I said that the current blocker is a proof of concept implementation for setuptools, because if this doesn’t get implemented in setuptools it’s effectively dead in the water. There’s still discussion to be done after we see what the objections might be after the PoC for setuptools and pip (for example, how easy would it be for pip to get the old behavior from the new wheels?).

If you want to re-litigate this entire issue (apparently without reading the original discussion) feel free. This entire line of thinking was discussed and documented both in the original thread and during the in-person 2019 packaging summit and my estimation was that it had a rough consensus as the original direction to go in.

dholth · May 3, 2020, 9:01pm

That’s the circle. This feature of editable installs is either so harmful that it can’t be allowed in a hook, or it’s a thing that has bit me a handful of times in ten years with a simple src layout workaround, as I develop mostly never-pypi-distributed software.

On the contrary setuptools needs this less than any other build system because they are used to having setup.py, but for other systems any hook would be strictly an improvement.

The list of top-level packages would be a good middle ground for me since in enscons we really don’t know what will be installed until it is installed. Otherwise we’d send pip the output of os.walk to fake it.

EpicWink · May 3, 2020, 11:07pm

I had forgotten about the form of the data sent to the front-end, and had assumed that it would be the list of installed packages and sub-packages. This assumption came out of the fact that setuptools requires this list for the packages parameter.

I can see 3 options:

1. Add package_dir to path (current), which could mean adding unwanted packages, modules and scripts to path when not using a src/ layout
2. Install symbolic links to each of the top-level packages specified by the user. This allows adding of new modules and packages without requiring reinstallation, but will include unwanted sub-packages and modules, such as for testing
3. Install symbolic links for each file to be installed. This would require reinstallation on each new module or package added or removed

I’m guessing the possibility of letting the user (via the front-end) choose (I would say choose only between options 2 and 3), either through a parameter or a different (but similarly named) hook, has already been considered, and the argument is just over the default behaviour.

dholth · May 4, 2020, 1:07am

Option 4 is to make a “src” style layout by installing symbolic links to each of the top-level packages in a build directory, and to .pth-file link that build directory into site-packages. Could be done by pip or build system.
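A rough sketch of what option 4 could look like, assuming symlinks are available on the platform. The function name `build_link_tree` and the build directory layout are hypothetical. Writing the .pth file that points at the build directory is left out.

```python
import os


def build_link_tree(src_root, top_level, build_dir):
    """Symlink each top-level package or module from src_root into build_dir.

    Only the names in `top_level` are linked, so setup.py, tests/ and other
    files at the root of the checkout never end up on sys.path.
    """
    os.makedirs(build_dir, exist_ok=True)
    links = []
    for name in top_level:
        target = os.path.abspath(os.path.join(src_root, name))
        link = os.path.join(build_dir, name)
        # Raises FileExistsError if the link is already there; a real tool
        # would need a policy for refreshing stale links.
        os.symlink(target, link, target_is_directory=os.path.isdir(target))
        links.append(link)
    return links
```

New modules added inside a linked package show up without reinstalling, but a new top-level package still needs the tree to be rebuilt.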

Trying to work out,

The details of how the installer would build and maintain a tree of symlinks in site-packages or elsewhere,

Whether the .dist-info directory should be copied into site-packages or left in the project and linked with an .egg-link. (Is it still true that .pth doesn’t find *.dist-info on that new path?),

Whether the symlink-per-file approach is feasible. Known to require a distutils refactor for distutils/setuptools.

Last year we’d thrown up some ideas without trying to implement them, perhaps without thinking through all the details. Now I think those old ideas aren’t so practical.

With the “virtual wheel” idea we hadn’t worked out whether you could e.g. build all your files flat, in a single directory, and then tell the installer that each individual file went into a different subdirectory. Could you re-use the same __init__.py in all your subdirectories by repeating it on the left side of the (category, source, destination) tuple?

Suppose you’ve been using develop installs for the last fifteen years and you accidentally do the symlink-per-file install, which happens to break every time you add or remove a file. You spend a few hours wondering why your new file is missing because you forgot to reinstall. We would be inventing another layer of failure modes on top of the current “different than the real install” problem. IMO you would be better off doing a real reinstall each time you save.

I can’t be the only one who uses setup.py develop for packages that don’t care about emulating a “real” install because they will never be put up on pypi. We would never know if there are more of those than the pypi kind, by definition they aren’t published.

My prototype is modeled on the setup.py develop behavior, returns { src_dir } only (egg_base, where the current .egg-link points), has pip copy dist-info into site-packages and pip adds a .pth file to wherever src_dir points. The back end could choose whether src_dir is in-place or not.

I’d like to require that the package “make sense” in that you could install it with a single .pth file. You will find that setuptools’ legacy means people are doing this already.

I’d like to be able to do the add-package-dir-to-path or top-level-packages strategy in an extensible hook. When someone is able to work out a symlink-per-file feature they can add it to the hook. Would updating a hook be so difficult that adding the easy half of the feature means the other half can never be added?

pganssle · May 4, 2020, 1:35am

The only relevant question at this point is whether we should ban front-ends from doing a symlink-per-file style installation. If we’re going to say it’s a supported use case (and I think it should be), then we must go with the “virtual wheel” approach, at a minimum (if there’s other metadata along with that, like package roots, etc, that seems fine). Virtual wheel seems like the simplest possible standard to me, because it’s super easy to understand the responsibilities of everything involved and it doesn’t mandate any specific implementation details — you don’t need to use a .pth file or symlinks, you just need to have a specific behavior.

Uh, speak for yourself? I thought through them and looked into it. They’re very practical. Every build backend knows how to build a wheel. If instead of building a wheel the backend were to make a list of everything that was going to go in the wheel and where it would come from, you have a virtual wheel. It doesn’t introduce any new concepts.
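To make "a list of everything that was going to go in the wheel and where it would come from" concrete, here is an illustrative sketch. It is not the format from the proposal: the function name `virtual_wheel`, the exclude patterns, and the "purelib" category label are all assumptions. The (category, source, destination) shape is borrowed from the tuple mentioned earlier in the thread.

```python
import fnmatch
import os


def virtual_wheel(src_root, packages, exclude=()):
    """List (category, source, destination) for files that would be installed.

    `exclude` holds glob patterns matched against the destination path,
    e.g. "pkg/test/*" for a test package that is left out of the wheel.
    """
    entries = []
    for package in packages:
        for dirpath, dirnames, filenames in os.walk(os.path.join(src_root, package)):
            dirnames.sort()  # make the traversal order deterministic
            for filename in sorted(filenames):
                source = os.path.join(dirpath, filename)
                destination = os.path.relpath(source, src_root).replace(os.sep, "/")
                if any(fnmatch.fnmatch(destination, pat) for pat in exclude):
                    continue
                entries.append(("purelib", os.path.abspath(source), destination))
    return entries
```

A front-end given such a list could symlink each file, or collapse it back to a set of root directories for a .pth file; the list itself does not dictate either.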

I’m having trouble imagining a situation where someone spends a few hours puzzling over an ImportError instead of like, uninstalling and reinstalling (a common solution if you seem to have a messed up installation), or inspecting site-packages.

That said, importantly this is the kind of thing that fails loudly, whereas the version where you’ve accidentally included stuff that won’t get installed in a non-development install (which is to say that even stuff never deployed to PyPI will suffer from this) will silently fail and often times you won’t detect it until you hit production if your testing uses editable installs for some reason. If you have a missing file it will just fail and tell you exactly why it failed.

I don’t think weighing the UI trade-offs of different kinds of editable installs is necessary at this stage, since that should be up to the front-ends, but I will say that I think there are a lot of ways to make “I have to re-install because there’s a new file added to the installed set” less painful. I’d like to allow front-ends the option to experiment with them.

dholth · May 4, 2020, 2:47am

Setuptools produces a wheel by changing the install scheme to a build directory and then actually doing the install, which is just about the way that setuptools/distutils does anything. Setuptools / distutils isn’t very good at telling you what it is about to do, instead you generally have to do it to find out. For example enscons asks the distutils compiler to build_ext but overrides the last step just to get flags. Now add in custom command classes that don’t know about the ‘remember where the file came from’ refactor. I think it would be hard to get this right.