Standardising editable mode installs (runtime layout - not hooks!)

sbidoul · May 5, 2020, 7:31pm

@pf_moore I guess what I’m after is freedom for custom back-ends to do what they find best to make the code importable. But I realize the freedom is there via the escape hatch that executable .pth files are.

So as long as executable .pth files don’t get deprecated, I’m now convinced that mechanism is sufficient. The back-end is in full control, and all the front-end has to do is install a regular wheel. Sounds nice and clean to me.

brettcannon · May 5, 2020, 7:32pm

To be upfront, the ability to execute arbitrary code in a .pth file may not last forever. It’s a massive security vulnerability and if I get my way then .pth files will be at least stripped down to only adding paths to sys.path and imports at startup.
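For readers less familiar with the mechanism under discussion, here is a minimal sketch of what `site` does with a .pth file: plain lines naming an existing directory are appended to sys.path, and lines starting with `import` are executed. The temporary directories, the file name `demo_editable.pth` and the `PTH_DEMO_RAN` variable are stand-ins invented for this illustration.

```python
import os
import site
import sys
import tempfile

# Stand-ins for a project's source tree and for site-packages (assumptions).
source_tree = os.path.abspath(tempfile.mkdtemp())
fake_site = os.path.abspath(tempfile.mkdtemp())

pth_lines = [
    # A plain line: appended to sys.path if the directory exists.
    source_tree,
    # A line starting with "import": executed as code.
    "import os; os.environ['PTH_DEMO_RAN'] = 'yes'",
]
pth_file = os.path.join(fake_site, "demo_editable.pth")
with open(pth_file, "w", encoding="utf-8") as f:
    f.write("\n".join(pth_lines) + "\n")

# site.addsitedir() adds the directory to sys.path and processes its .pth
# files, as happens for site-packages at interpreter startup.
site.addsitedir(fake_site)

print(source_tree in sys.path)
print(os.environ.get("PTH_DEMO_RAN"))
```

The second line is the "executable" part that the security concern refers to; the first line is the path-only behaviour that would remain under the stripped-down version.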

steve.dower · May 5, 2020, 7:33pm

I can code this up fairly easily, have done before, though I won’t have time tonight and who knows where the discussion will be at by tomorrow.

Essentially, I would install the top level folder and a custom __init__.py (even for single .py file packages) that sets its own __path__ to include the other modules (or in a more complex scenario, adds an import hook to handle the lookup) and then exec()'s the real __init__.py in its namespace.

One file, fully self contained, provided by the backend and installed just like any other wheel would be.
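A rough sketch of that idea, under stated assumptions: the package name `shim_demo_pkg`, the temporary directories and the exact shim text are invented here for illustration and are not taken from any existing backend. The shim lives in a stand-in for site-packages, points `__path__` at the source tree, then runs the real `__init__.py` in its own namespace.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical source tree containing shim_demo_pkg/{__init__.py, helpers.py}.
src_pkg = os.path.join(tempfile.mkdtemp(), "shim_demo_pkg")
os.mkdir(src_pkg)
with open(os.path.join(src_pkg, "__init__.py"), "w", encoding="utf-8") as f:
    f.write("GREETING = 'hello from the source tree'\n")
with open(os.path.join(src_pkg, "helpers.py"), "w", encoding="utf-8") as f:
    f.write("def double(x):\n    return 2 * x\n")

# Stand-in for site-packages; it only ever holds the generated shim.
fake_site = tempfile.mkdtemp()
os.mkdir(os.path.join(fake_site, "shim_demo_pkg"))
shim = textwrap.dedent(f"""\
    import os as _os
    _SRC = {src_pkg!r}
    # Submodule lookups go through __path__, so point it at the source tree.
    __path__ = [_SRC]
    # Run the real __init__.py inside this module's namespace.
    _real_init = _os.path.join(_SRC, "__init__.py")
    with open(_real_init, encoding="utf-8") as _f:
        exec(compile(_f.read(), _real_init, "exec"), globals())
""")
with open(os.path.join(fake_site, "shim_demo_pkg", "__init__.py"), "w",
          encoding="utf-8") as f:
    f.write(shim)

sys.path.insert(0, fake_site)
importlib.invalidate_caches()

import shim_demo_pkg
from shim_demo_pkg import helpers

print(shim_demo_pkg.GREETING)
print(helpers.double(21))
print(helpers.__file__)
```

Edits to files in the source tree show up on the next interpreter start, and new submodules are picked up without reinstalling, since only `__path__` is recorded in the shim.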

That said, installing both the .pth file and the optional filtering hook is just a little simpler, though it goes against the likely long-term plan to deprecate/disable arbitrary imports in pth files.

sbidoul · May 5, 2020, 7:41pm

If executable .pth are on the way out, then I’d put the optional develop hook proposal back on the table as I sketched there, with symlinks in wheel as second choice. Oops, sorry, back to the interface design.

I really think we must give freedom to back-end developers (generic or custom alike) to do what they find best for their users to make the code importable.

bernatgabor · May 5, 2020, 8:32pm

Will there be another mechanism provided that allows registering code to be run at startup? I know a lot of systems are relying on this at the moment. Also, technically all this can be done by using a sitecustomize file too; the pth file is just an implementation detail at the end of the day. Any mechanism that allows registering startup code for the interpreter will do just great. @steve.dower’s way also does the same as I understand it, just in a different way (not pth, but something else).

IMHO the general idea for develop installs would be to allow the backend to register some mechanism at startup of the interpreter that will materialize the modules on import (somehow) from the local file system. This might be done by the backend itself, or some shim code it provides. The simplest use case is when delegating the job to a custom file finder that knows where the source lives. The frontend then is responsible for registering this mechanism (simplest solution IMHO at the moment is via a pth file or sitecustomize.py). I consider how it achieves this registration an implementation detail, something we might change as core Python offers newer/safer mechanisms for this.
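As an illustration of the "custom file finder that knows where the source lives", here is a minimal sketch using the standard `importlib` machinery. The class name `SourceTreeFinder`, the module name `finder_demo_mod` and the temporary source tree are assumptions made for this example; how the finder gets registered at startup (pth file, sitecustomize.py, or something else) is deliberately left out.

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class SourceTreeFinder(importlib.abc.MetaPathFinder):
    """Resolve selected top-level names to files in a source tree."""

    def __init__(self, locations):
        # Maps a top-level module name to a .py file or a package directory.
        self._locations = dict(locations)

    def find_spec(self, fullname, path=None, target=None):
        location = self._locations.get(fullname)
        if location is None:
            return None  # Not one of ours; let the other finders try.
        if os.path.isdir(location):
            # A package: submodules are then found through its __path__.
            return importlib.util.spec_from_file_location(
                fullname,
                os.path.join(location, "__init__.py"),
                submodule_search_locations=[location],
            )
        return importlib.util.spec_from_file_location(fullname, location)


# Hypothetical source tree with a single module in it.
source_tree = tempfile.mkdtemp()
module_file = os.path.join(source_tree, "finder_demo_mod.py")
with open(module_file, "w", encoding="utf-8") as f:
    f.write("ANSWER = 42\n")

# Appending (rather than inserting at the front) lets the regular finders
# win when the same name is also installed normally.
sys.meta_path.append(SourceTreeFinder({"finder_demo_mod": module_file}))

import finder_demo_mod

print(finder_demo_mod.ANSWER)
print(finder_demo_mod.__file__)
```

Because the finder only answers for names it was told about, excluding a module is a matter of leaving it out of the mapping, which a bare sys.path entry cannot express.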

steve.dower · May 5, 2020, 8:39pm

This is the missing piece. But as it’d be a runtime feature, don’t expect it before 3.10 at the earliest (assuming someone is offended enough by arbitrary .pth imports to come up with an approach that satisfies those of us who don’t mind them so much).

bernatgabor · May 5, 2020, 9:25pm

No worries. I think in the meantime we could go ahead with the editable install PEP as long as we don’t hard-code pth, but rather go with the more opaque “register interpreter startup/initialization code for the backend”. We can start out with pth now and migrate to whatever better mechanism we come up with later, on the front-end implementation side.

bernatgabor · May 6, 2020, 12:24pm

In the spirit of striking the iron while hot, @pf_moore how can we move ahead on this? I feel like we addressed the concerns raised by people participating in this post until now.

pf_moore · May 6, 2020, 12:52pm

@bernatgabor agreed. I’m now comfortable that we can implement a solution in the front end that handles the backend providing a list of files to expose. I’m probably going to flesh out the suggestions here into a library for creating such an editable install, with the intention that it can be used in pip and any other tools that want to create one. It may well be worth backing this library up with a standard, but IMO that can be deferred until we see what the resulting installation looks like. (The layout may be something we can incorporate into the overall “editable install hooks” PEP).

Which then leads us back to the “how do we specify editable hooks” question. I’m now OK with @pganssle’s proposal, at least as far as implementing the frontend side of it is concerned. I don’t know whether the backend developers want to debate their side of things any further - @dholth has a proof of concept for setuptools, but it isn’t for the full proposal if I understand things correctly.

So next steps:

1. I consider this topic to have achieved its goal of defining a way of handling editable installs (I’m inclined to go with @steve.dower’s approach, because of the risk in relying on .pth files, but we know .pth files will work).
2. Backend developers need to do something similar, to either confirm that they can make the scheme in the proposal work, or settle on an alternative scheme (AIUI, @dholth’s prototype implements something different than the proposal, so requires either an agreement to change the proposal, or enhancing to support it).
3. Someone needs to actually write a PEP that gives us something to standardise.

bernatgabor · May 6, 2020, 1:00pm

I’m a bit puzzled. These two statements contradict each other as I understand. Is the expectation in your world now that:

the backend will enumerate files that must be mounted and where under the site-packages (aka practically create a virtual wheel) - in this case, the frontend will need to handle how to make that available - this would also have the downside that new files are not automatically added, and exclusion is harder too;
or what I was suggesting does not do this, instead just exposes enough to insert a path on sys.path and then register some interpreter initialization script that ensures it sets up enough importer hooks that the imports will resolve to the source tree files.

dholth · May 6, 2020, 3:55pm

Part 1: metadata and other things that might need to go into site-packages. To take care of “package==1.0 is installed” which is separate from making “import package” work.

So an existing metadata hook puts a .dist-info in a folder by itself.

If you put a .pth file in that folder too, pip might copy it into site-packages. Should pip create a .pth file for you?

I would rather put these in a folder and not have to zip it up here.

I would use relative paths wherever possible to cope with chroot.

Part 2: code that needs to be executed before the module is imported (importer hooks)

Aside: remember top_level.txt? https://setuptools.readthedocs.io/en/latest/formats.html#top-level-txt-conflict-management-metadata

pf_moore · May 6, 2020, 4:04pm

OK, what I’m saying is that your implementation and @steve.dower’s comments have confirmed that if we put some files into site-packages, we can expose files from the source directory. The .pth file approach does this, relying on the ability of a .pth file to execute code. This is slightly risky, because that ability isn’t encouraged, but it’s there. Alternatively, by putting a “dummy” version of the installed package with a __path__ attribute in site-packages, we can execute code at import time to do what’s needed.

The point for me is that it’s possible, and someone can now do the work of sorting out the details and ideally implementing a library to do the heavy lifting. I’m happy to do that, or someone else can, it’s not that critical.

The other point of this topic was to standardise the mechanism we use. I expect that to come out of implementing it; there are enough technical details that I don’t think it’s worth trying to standardise without a reference implementation.

Those questions are about the hooks proposal, as I see it. That debate’s ongoing, and is not what I intended this thread to be about (we already have 3 threads for that!) What I believe we’ve done here is confirm that the current glib statement that “the front end will expose the files” can actually be refined into a proper statement for the standard. Maybe that just means that we’ve done the step that @pganssle was proposing, of writing (or at least designing) a proof of concept front end implementation for his proposal. It’s not how I thought of it, but I’m OK with it fulfilling that role.

I never had any intention here of actually writing the “editable hooks” PEP, or even of progressing that debate. I made this a side topic precisely so that it had a much more clearly defined and smaller purpose.

Does that help reconcile the two statements that I made for you?

pf_moore · May 6, 2020, 4:07pm

I’m not quite sure what you mean here, but it sounds more like it’s about the protocol hooks would use, so I think it belongs on one of the other threads?

The front end needs to know what to expose. This topic confirms that the front end will be able to do so.

dholth · May 6, 2020, 4:26pm

@bernatgabor says

the backend will enumerate files that must be mounted and where under the site-packages (aka practically create a virtual wheel) - in this case, the frontend will need to handle how to make that available - this would also have the downside that new files are not automatically added, and exclusion is harder too;
or what I was suggesting does not do this, instead just exposes enough to insert a path on sys.path and then register some interpreter initialization script that ensures it sets up enough importer hooks that the imports will resolve to the source tree files.

But if the first part isn’t expected to include the entire package, e.g. only dist-info and a .pth, then that could work. Especially given .pth's most interesting feature. In other words no contradiction.

bernatgabor · May 6, 2020, 5:12pm

Why does the .dist-info need to be there? The distribution discovery runs over sys.path, so if something adds the in-source-tree .dist-info path, things will work without it.

I’m fairly certain that at the last PyCon people were nodding at adding all files explicitly… not just some shims to make things kinda work. For it to work fully (e.g. the ability to exclude modules, and other more advanced features), the backend needs to manage it with import hooks, so that lands us at my proposal.

dholth · May 6, 2020, 5:14pm

It used to be the case that pkg_resources had a separate path to find *.egg-info and *.dist-info and that a new .pth file would not change that, hence the .egg-link file. Haven’t checked lately.

It is helpful to have the *.dist-info/RECORD per target environment to support uninstalls without special logic.
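To make the uninstall point concrete, here is a small sketch that builds a throwaway .dist-info directory with a RECORD file and reads it back through `importlib.metadata`. The distribution name `demo-pkg`, the file names and the temporary directory standing in for site-packages are assumptions for illustration; hashes and sizes are left empty in RECORD to keep the example short.

```python
import importlib.metadata
import os
import tempfile

# Stand-in for the target environment's site-packages.
site_dir = tempfile.mkdtemp()
dist_info = os.path.join(site_dir, "demo_pkg-1.0.dist-info")
os.mkdir(dist_info)

with open(os.path.join(dist_info, "METADATA"), "w", encoding="utf-8") as f:
    f.write("Metadata-Version: 2.1\nName: demo-pkg\nVersion: 1.0\n")

# The only "code" installed for the editable project is a .pth file.
with open(os.path.join(site_dir, "demo_pkg.pth"), "w", encoding="utf-8") as f:
    f.write("/hypothetical/source/tree\n")

# RECORD lists every installed file as "path,hash,size".
record_lines = [
    "demo_pkg.pth,,",
    "demo_pkg-1.0.dist-info/METADATA,,",
    "demo_pkg-1.0.dist-info/RECORD,,",
]
with open(os.path.join(dist_info, "RECORD"), "w", encoding="utf-8") as f:
    f.write("\n".join(record_lines) + "\n")

# Discovery is driven by a search path, by default sys.path.
dists = list(importlib.metadata.distributions(path=[site_dir]))
dist = dists[0]

# An uninstaller needs nothing beyond the entries in RECORD.
recorded = sorted(str(p) for p in dist.files)
to_remove = [str(dist.locate_file(p)) for p in dist.files]

print(dist.metadata["Name"], dist.version)
print(recorded)
```

With RECORD present in each target environment, removing the editable install is the same loop over `to_remove` as for any other distribution.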

pip does know how to uninstall setup.py develop packages. I suppose it removes the .egg-link and updates the multi-line .pth file.

bernatgabor · May 6, 2020, 5:26pm

The uninstall case makes sense

pf_moore · May 6, 2020, 6:11pm

I just realised the implications of what you’re saying here. (It came to me when I thought “why am I looking at writing code for the frontend to build a wheel?”) Sorry if this was the point you were trying to make, and I was just unable to see it.

If I’m reading this right, we don’t need an “editable hook protocol” at all. If a backend can build a wheel which, when installed, will make a project available as editable, then all we actually need is a way for the frontend to say “build in editable mode”. And at a pinch, the PEP 517 config_settings argument can do that (in practice, we’d want to add some sort of official “editable mode” flag, so that there’s a common convention for all backends).
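A hedged sketch of what that could look like on the backend side, using the PEP 517 `build_wheel` signature. The `editable` key in config_settings is purely hypothetical (no such convention had been agreed), as are the project name `demo_pkg`, the `src` directory and the choice of a .pth file as the payload.

```python
import base64
import hashlib
import os
import tempfile
import zipfile


def _record_entry(name, data):
    # RECORD rows are "path,sha256=<urlsafe base64 without padding>,size".
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{name},sha256={encoded},{len(data)}"


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    settings = config_settings or {}
    if settings.get("editable") != "true":  # hypothetical flag
        raise NotImplementedError("this sketch only covers the editable case")

    name, version = "demo_pkg", "1.0"       # hypothetical project
    source_dir = os.path.abspath("src")     # hypothetical source location
    dist_info = f"{name}-{version}.dist-info"

    files = {
        # The whole "editable" payload: one sys.path entry.
        f"{name}.pth": (source_dir + "\n").encode("utf-8"),
        f"{dist_info}/METADATA": (
            f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
        ).encode("utf-8"),
        f"{dist_info}/WHEEL": (
            "Wheel-Version: 1.0\nGenerator: editable-sketch\n"
            "Root-Is-Purelib: true\nTag: py3-none-any\n"
        ).encode("utf-8"),
    }
    record = [_record_entry(path, data) for path, data in files.items()]
    record.append(f"{dist_info}/RECORD,,")
    files[f"{dist_info}/RECORD"] = ("\n".join(record) + "\n").encode("utf-8")

    wheel_name = f"{name}-{version}-py3-none-any.whl"
    with zipfile.ZipFile(os.path.join(wheel_directory, wheel_name), "w") as zf:
        for path, data in files.items():
            zf.writestr(path, data)
    return wheel_name  # PEP 517: return the basename of the built wheel


out_dir = tempfile.mkdtemp()
wheel_name = build_wheel(out_dir, config_settings={"editable": "true"})
with zipfile.ZipFile(os.path.join(out_dir, wheel_name)) as zf:
    contents = zf.namelist()
print(wheel_name)
print(contents)
```

The frontend then installs this file like any other wheel; everything specific to editable mode stays inside the backend's choice of payload.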

I’m sure we’ll get some pushback from backend developers about “why do we all have to reinvent editable mode?” But that’s what this topic can do - we can define a standard “editable mode layout”, provide a library to support building it, and backends can just reuse that, knowing they are following the standard approach and won’t be left maintaining a bunch of custom code.

steve.dower · May 6, 2020, 6:14pm

I’ve abandoned about three replies clarifying this, as I’m not really having the best week and am trying to minimise posting things on the internet right now

It wasn’t the main point, but it’s an implication, for sure. The challenge is enabling those backends that really want to use symlinks, as there’s no universal way to put those into a wheel (but then, it doesn’t have to be universal here anyway).

dholth · May 6, 2020, 6:24pm

Don’t build a wheel, just build a folder with a *.dist-info and any other files you need copied to site-packages i.e. purelib or platypuslib

Remember ninja mode module initialization?

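def _init_module():
    # Both names must be global: assigning a_name and deleting _init_module
    # would otherwise be treated as operations on local variables.
    global a_name, _init_module
    a_name = ...
    # Remove the initializer from the module namespace once it has run.
    del _init_module

_init_module()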

I love how creative some of the editable proposals are, even if it’s not stuff we’re used to doing a lot of like import hooks.