Specification of editable installation

pganssle · May 10, 2019, 3:47pm

To give an update on the status of this after the Packaging mini-summit and the sprints, we have a fairly simple proposal that @pradyunsg and I are planning to turn into a proof of concept implementation, then a draft PEP.

We will add two functions to the backend interface:

prepare_metadata_for_build_editable(source_directory, metadata_directory, config_settings=None): This is a required function that populates the release metadata (dist-info) files in the provided directory. It will likely always be the same as prepare_metadata_for_build_wheel.
build_editable(source_directory, metadata_directory, config_settings=None): This function will build a “virtual wheel” by preparing all files to be installed and returning a dictionary where keys are locations of files and directories relative to the package root and values are the absolute locations of those files on disk. This does not include the dist-info metadata, which is returned separately in the prepare_metadata_* step.

As an example of the “virtual wheel” format: for a package “foo” containing a single __init__.py, build_editable would return something like:

{
    "foo/": "/tmp/demo/src/foo/",
    "foo/__init__.py": "/tmp/demo/src/foo/__init__.py"
}

It would be the front end’s responsibility to make sure that at least the foo module is available to the system. Whether it accomplishes this with .pth files or symlinks or some other mechanism is up to the front-end and not part of the spec.
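A minimal sketch of one mechanism a front-end might pick, assuming the flat mapping format shown above and a platform where symlinks are available. The helper name `expose_with_symlinks` and the temporary directories are hypothetical; nothing here is part of the proposal.

```python
import os
import tempfile
from pathlib import Path


def expose_with_symlinks(mapping, site_dir):
    """Link every file of the virtual wheel into site_dir (hypothetical helper)."""
    created = []
    for rel_path, abs_path in mapping.items():
        target = Path(site_dir) / rel_path
        if rel_path.endswith("/"):
            # Directory entries become real directories, not links
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            os.symlink(abs_path, target)
            created.append(target)
    return created


# Demo in throwaway directories standing in for the source tree and site-packages
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src", "foo")
    src.mkdir(parents=True)
    (src / "__init__.py").write_text("VALUE = 1\n")
    site = Path(tmp, "site")
    mapping = {
        "foo/": str(src) + "/",
        "foo/__init__.py": str(src / "__init__.py"),
    }
    links = expose_with_symlinks(mapping, site)
    # An edit in the source tree is visible through the "installed" path
    (src / "__init__.py").write_text("VALUE = 2\n")
    seen = (site / "foo" / "__init__.py").read_text()
```

A .pth-based front-end would consume the same mapping differently, which is why the mechanism is left out of the spec.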

bernatgabor · May 10, 2019, 4:16pm

My only suggestion would be to keep the editable response extensible, just in case we need it in the future. Something like:

{ 
   "version": 1,
   "mapping" : { 
      "foo/": "/tmp/demo/src/foo/",
      "foo/__init__.py": "/tmp/demo/src/foo/__init__.py"
    }
}

Just in case we’ll need to add additional content down the line.
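A sketch of how a front-end might consume such a versioned response, assuming the "version" and "mapping" keys from the example above. The function name and the set of supported versions are hypothetical.

```python
SUPPORTED_VERSIONS = {1}  # assumption: only the format shown above exists


def read_editable_response(response):
    """Return the path mapping, refusing versions this front-end does not know."""
    version = response.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported editable response version: {version!r}")
    return dict(response["mapping"])


response = {
    "version": 1,
    "mapping": {
        "foo/": "/tmp/demo/src/foo/",
        "foo/__init__.py": "/tmp/demo/src/foo/__init__.py",
    },
}
mapping = read_editable_response(response)
```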

Then again I wonder if we need the mapping, couldn’t the frontend just pick it from the RECORDS.txt from the metadata?

pganssle · May 10, 2019, 4:26pm

I think the RECORD text file doesn’t include the location on disk, so that’s really the information that needs to be conveyed.
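For illustration, a sketch of how one RECORD row is built under the wheel format (installed path, hash, size). The helper name is hypothetical. The row carries no source location, which is the gap the mapping fills.

```python
import base64
import hashlib


def record_row(installed_path, data):
    """Build one RECORD row: path, urlsafe-base64 sha256 without padding, size."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{installed_path},sha256={encoded},{len(data)}"


# An empty __init__.py: the row names where the file is installed,
# not where it lives in the source tree
row = record_row("foo/__init__.py", b"")
```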

I think you’re right that using a more extensible format would be better.

uranusjr · May 10, 2019, 4:56pm

Where should the mapping be recorded? The dist-info already has a metadata version in metadata.json, so IMO there is no need to introduce a separate version number for the mapping format.

dstufft · May 10, 2019, 5:45pm

I’m still thinking through the implications of this proposal, but from a practical standpoint I don’t think we can make any new functions required since we already have backends that exist without that function. So frontends are going to have to cope in some way with that function not existing (likely in this case by erroring when attempting to do an editable install).

The key thing here is if it’s required we immediately invalidate all existing packages using a PEP 517 backend.

I do think we should strongly suggest that all backends implement it.

njs · May 10, 2019, 6:43pm

We are going to need some kind of RECORD somewhere, to write down which files were actually put into the environment and should be removed when uninstalling. So something that lists any pth files or symlinks, but not the actual source files.

What should that look like? Are installers expected to mutate the .dist-info/RECORD file inside the source tree to record this?

dholth · May 10, 2019, 7:16pm

I’ve thought about the abstract wheel model. If you were reading a wheel it would be something like

for each file in wheel: yield (category name, target path under category, file-like object or bytes of file contents)

e.g. ('purelib', 'wheel/bdist_wheel.py', open(f'{sourcedir}/wheel/bdist_wheel.py', 'r'))

If you were working with one of these you might open it with the dist name and version, and it could derive the full path for ('metadata', 'zip-safe', b'') for example. How to refer to the metadata and in what order, and how to represent empty directories would be questions. ‘copy zipfile where applicable’ would be part of the answer. It would probably automatically do RECORD for you.
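A rough sketch of that abstract model, assuming entries are (category, target path, contents) tuples. The function names and the version "0.33" are made up for illustration, and treating purelib and platlib alike is a simplification of the real wheel layout.

```python
import io


def archive_path(dist_name, version, category, target_path):
    """Derive the in-archive path for an entry from the dist name and version."""
    if category == "metadata":
        return f"{dist_name}-{version}.dist-info/{target_path}"
    if category in ("purelib", "platlib"):
        # Simplification: both library categories are placed at the archive root
        return target_path
    return f"{dist_name}-{version}.data/{category}/{target_path}"


def iter_entries(entries):
    """Yield (category, target path, file-like object) for each entry."""
    for category, target_path, contents in entries:
        stream = io.BytesIO(contents) if isinstance(contents, bytes) else contents
        yield category, target_path, stream


entries = [
    ("purelib", "wheel/bdist_wheel.py", b"# module source\n"),
    ("metadata", "zip-safe", b""),
    ("scripts", "wheel", b"#!python\n"),
]
paths = [archive_path("wheel", "0.33", c, p) for c, p, _ in iter_entries(entries)]
```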

Does RECORD perform the same function for an editable install? It would just include e.g. the .pth file and not the sources I’m editing which I’d rather keep on uninstall?

pganssle · May 10, 2019, 7:25pm

To be clear, both of these functions are required for editable install support, not for PEP 517 support. The idea is that this would be a new PEP. We will not retroactively break anything because currently no build backends have editable install support, and similarly they won’t have it until they implement these functions.

I don’t think it needs to be specified in the PEP what front-ends do if this is missing, but I think the sane thing to do is to detect the existence of such a function and, if it is missing, raise an exception indicating that the backend does not support editable installs.
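A sketch of that detection, assuming the backend is an object or module exposing hooks as attributes, as in PEP 517. The exception class and helper name are hypothetical.

```python
import types

REQUIRED_HOOKS = ("prepare_metadata_for_build_editable", "build_editable")


class EditableNotSupported(Exception):
    """Raised when a backend lacks the proposed editable hooks (hypothetical)."""


def get_editable_hooks(backend):
    """Return the two hooks, or fail clearly if the backend lacks either one."""
    missing = [n for n in REQUIRED_HOOKS if not callable(getattr(backend, n, None))]
    if missing:
        raise EditableNotSupported(
            "backend does not support editable installs; missing: "
            + ", ".join(missing)
        )
    return tuple(getattr(backend, n) for n in REQUIRED_HOOKS)


# A stand-in for an existing PEP 517 backend that only knows build_wheel
legacy_backend = types.SimpleNamespace(build_wheel=lambda *args, **kwargs: None)
```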

I think this is something we’ll get a better sense of while we’re building the PoC, but yeah I think either mutating the RECORD file as part of the installation or having the front-end write a new file like RECORD-EDITABLE would be fine solutions. In either case I think the installer has to write it.
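A sketch of the second option, assuming a one-path-per-line RECORD-EDITABLE file. That format, the helper names, and the "foo-1.0.dist-info" directory are invented for illustration; the real layout would be decided while building the PoC.

```python
import tempfile
from pathlib import Path


def write_editable_record(dist_info_dir, created_paths, name="RECORD-EDITABLE"):
    """Write one installer-created path per line (hypothetical format)."""
    record = Path(dist_info_dir) / name
    lines = sorted(str(path) for path in created_paths)
    record.write_text("".join(f"{line}\n" for line in lines))
    return record


def uninstall_editable(record):
    """Remove only what the installer created, then the record itself."""
    record = Path(record)
    for line in record.read_text().splitlines():
        Path(line).unlink()
    record.unlink()


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "src", "foo", "__init__.py")
    source.parent.mkdir(parents=True)
    source.write_text("")
    site = Path(tmp, "site")
    dist_info = site / "foo-1.0.dist-info"
    dist_info.mkdir(parents=True)
    pth = site / "foo.pth"
    pth.write_text(str(source.parent.parent) + "\n")

    record = write_editable_record(dist_info, [pth])
    uninstall_editable(record)
    pth_removed = not pth.exists()
    source_kept = source.exists()  # the files being edited are never listed
```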

pf_moore · May 10, 2019, 7:46pm

I think it’s important that while developing the PEP we think through the transition scenarios, and document acceptable options. The reason PEP 517 got into such a mess over editable installs was precisely because there was no documented description of what installers were allowed to do when faced with a request for an editable install and a backend that didn’t support them (i.e. all of them, at the moment).

The description doesn’t have to be laborious, but something simple like “installers MAY fail with an error if asked to install a project that uses a PEP 517 backend in editable mode, and the backend doesn’t provide PEP xxx editable install hooks” would (a) set expectations for users and (b) ensure that we thought about these things when designing the protocol. (Note: I’m not particularly advocating for that exact behaviour to be specified - that’s just an example for the purposes of discussion here).

We could, of course, say that it’s a decision for the individual installers, but as we’ve seen, what that does in practice is leave things in a position where the installer makes a decision, and then we get the feedback and objections and the decision needs to be revisited post-release. That’s not a good experience for users or for installer developers.

dholth · May 10, 2019, 8:02pm

I’m surprised you need something more than ‘list of directories to add to PYTHONPATH’. What will pip do with the extra information?

pganssle · May 12, 2019, 2:56pm

Preferably, front-ends would expose the list of files that comprise the package and only the list of files that comprise the package. It is not uncommon for there to be things in a directory other than that which is part of the package. See the earlier discussion in this thread.

pradyunsg · May 12, 2019, 8:44pm

@pganssle I think we should note the design down into a PEP, since PEPs are basically better than a discourse comment for discussion on a design.

Anyone willing to write such a PEP and any core dev willing to be a sponsor for it?

dholth · May 12, 2019, 10:35pm

My question about this format is whether the front end is allowed to put all the Python modules in random places:

"foo/": "/tmp/abc"
"foo/__init__.py": "/tmp/xyz/xyzzy.py"
"foo/bar.py": "/tmp/xyz/ajkfl.py"

Or for example it might put everything in a flat directory with random names for all the editable Python files. In that case pip would be responsible for assembling a coherent package somewhere using symlinks?

I would ask the front end to behave by making sure the editable files “make sense” underneath some base path, and always provide the category name and base folder for that category along with the included files. A typical set of paths used to install the foobar package looks like:

{'data': '/path',
 'headers': '/path/bin/../include/site/python3.7/foobar',
 'platlib': '/path/lib/python3.7/site-packages',
 'purelib': '/path/lib/python3.7/site-packages',
 'scripts': '/path/bin'}

The front end would just have to say "purelib" is in "/tmp/demo/src/" and includes "foo/__init__.py". Then the editable installer concatenates those paths to do whatever it has to do.
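A sketch of that concatenation, assuming a per-category layout with "base" and "files" keys. The key names and the function are hypothetical and not taken from any spec.

```python
import posixpath


def resolve_sources(layout):
    """Map (category, relative path) to the file's location in the source tree."""
    resolved = {}
    for category, info in layout.items():
        for rel_path in info["files"]:
            resolved[(category, rel_path)] = posixpath.join(info["base"], rel_path)
    return resolved


layout = {
    "purelib": {"base": "/tmp/demo/src/", "files": ["foo/__init__.py"]},
}
sources = resolve_sources(layout)
```

With this shape the files inside a category keep their relative arrangement, and only the category base moves.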

pf_moore · May 13, 2019, 8:11am

My immediate thought is that this would be a quality of implementation detail for front ends, not something the spec needs to mandate. But maybe some sort of consumer needs a guarantee in order to do some sort of introspection? (Something like pkg_resources comes to mind). On the other hand, such tools are already dealing with .pth based editable installs, so they are in effect used to dealing with implementation-defined layouts.

I’d be inclined to leave things unspecified, at least for now, to give frontends the chance to experiment with options. If a clear “best approach” emerges for an editable install layout, we can consider standardising that later.

pganssle · May 13, 2019, 1:42pm

My understanding of the agreement was that we would create the proof of concept implementation so that we have a better sense for what’s involved, then write a PEP. I was planning on writing the PEP.

pganssle · May 13, 2019, 1:44pm

The front-end is not deciding where the modules are, that’s the back-end’s job. The format you quoted is the way that the back-end communicates the mapping between the files as they would appear in a package, and the location of those files on disk. It is not a mechanism for the back-end to tell the front-end where the files should be installed.

dholth · May 13, 2019, 2:08pm

I do understand that what flit might say to pip is that the thing we want to be importable as “foo/bar.py” is on disk as “/usr/src/foo/bar.py”, but it does not say where foo/bar.py should be installed. The more complex situation only says “$purelib is here in the source tree; when installed I expect a file to be under $purelib/foo/bar.py” but it is still pip’s job to figure out where $purelib belongs when installed. This is the compromise: the abstract categories purelib, platlib, headers, … can be moved anywhere, but the files inside those categories can’t be rearranged.

My suggestion is that if the source is badly behaved, e.g. putting all the Python code in a single flat folder while expecting them to be installed as a directory tree, then it will not be possible for the installer to build an editable install without creating a well-behaved set of symlinks to the editable source code, and the setuptools strategy of doing editable installs with a .pth file will no longer work. Here, well-behaved means it is possible to use the code by adding a directory to $PYTHONPATH, or all importable code is relative to one or a couple of paths in the source tree.
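A sketch of a check for the simplest well-behaved case, assuming the flat mapping format from earlier in the thread and a single base directory (the "couple of paths" case is not handled). The function name is hypothetical.

```python
def single_base_path(mapping):
    """Return the one directory that could go on sys.path, or None.

    The mapping qualifies when every source location equals base + installed path.
    """
    bases = set()
    for rel_path, abs_path in mapping.items():
        if not abs_path.endswith(rel_path):
            return None
        base = abs_path[: len(abs_path) - len(rel_path)]
        if not base.endswith("/"):
            # The match must fall on a directory boundary
            return None
        bases.add(base)
    return bases.pop() if len(bases) == 1 else None


well_behaved = {
    "foo/": "/tmp/demo/src/foo/",
    "foo/__init__.py": "/tmp/demo/src/foo/__init__.py",
}
scrambled = {
    "foo/": "/tmp/abc",
    "foo/__init__.py": "/tmp/xyz/xyzzy.py",
    "foo/bar.py": "/tmp/xyz/ajkfl.py",
}
base_ok = single_base_path(well_behaved)
base_bad = single_base_path(scrambled)
```

When the check returns None, a .pth entry cannot reproduce the layout and the installer would have to assemble symlinks instead.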

As a bonus if the editable install hook produces the entire wheel data model, then you could do the real install from the same data structure without making an intermediate .whl archive.

sumanah · June 23, 2019, 9:57pm

Thanks to @btskinn and @crwilcox for note-taking during that minisummit. I’ve moved the notes about the future of editable installs from the notes Google Doc to the GitHub issue “editable mode is not supported” for better searchability.

@techalchemy @pradyunsg @pganssle as I understand it, you three are working on some combination of design discussion, PEP, and/or proof of concept, although it may not yet be certain who, which, in what sequence, and when. Maybe the 3 of you could work that out by email and let us know how/when we can help (testing, reading drafts, etc.)?

sumanah · January 8, 2020, 8:19pm

@techalchemy @pganssle @pradyunsg could you give us an update on this?

pganssle · January 9, 2020, 3:56pm

No change in status so far. Currently we have a generally good “big picture” idea of what this should look like:

backend produces a pseudo-wheel consisting of wheel metadata plus a mapping between the locations of the files as they should exist in the package and the locations that they exist on disk
front-end uses whatever mechanism it deems proper to expose those files to Python in such a way that they will update as the files change on disk

We need a proof of concept for that for both setuptools and pip. I firmly believe that any effort at standardizing before we have a proof of concept would be wasted: the standard will likely need to take into account factors that come up during the building and testing phases, and if it’s too onerous to build a proof of concept before the standard is written, it probably won’t get implemented after the standard is written either, so the standard would be dead in the water.

I am personally super swamped and have less OSS time than ever these days, and what time I do have is going into getting a viable time zone proposal done in time for Python 3.9. I would be happy to have someone else work on the proof of concept for setuptools.

When I did take a crack at it before, the major issue was that distutils (and by extension setuptools) doesn’t have a clean separation between “figure out all the stuff that needs to go into the package” and “put the stuff into the package”, it just sort of assembles the package as it goes. It was a bigger job than I could easily tackle in an afternoon to convert it over to doing it in two steps. If anyone has time to work on this, I can probably fit in some time for advice and likely for review as well. Otherwise I probably won’t have time until at least after PyCon.
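To illustrate the two-step separation in general terms, here is a sketch that is not setuptools code. The function names, the "*.py only" rule, and the demo tree are all assumptions. The first step yields the same kind of mapping an editable hook would return, and the second step consumes it.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def collect_files(src_root, package):
    """Step 1: decide what belongs in the package, without copying anything."""
    root = Path(src_root)
    return {
        path.relative_to(root).as_posix(): str(path)
        for path in sorted((root / package).rglob("*.py"))
    }


def write_archive(manifest, fileobj):
    """Step 2: put the collected files into an archive."""
    with zipfile.ZipFile(fileobj, "w") as archive:
        for rel_path, abs_path in manifest.items():
            archive.write(abs_path, rel_path)


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "src", "foo")
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "notes.txt").write_text("not part of the package\n")

    manifest = collect_files(Path(tmp, "src"), "foo")
    buffer = io.BytesIO()
    write_archive(manifest, buffer)
    archived = zipfile.ZipFile(buffer).namelist()
```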