Discuss PEP 662: Editable installs via virtual wheels

pf_moore · June 4, 2021, 7:13pm

What about non-symlink implementations? Error or ignore? Is it even acceptable for a frontend to refuse to install a valid response from a backend? The point is that every type of implementation has to decide what to do when faced with a headers key in the returned data.

Actually, I just re-read the proposal again. There’s nothing in there that states that the front end is required to do anything with the SchemaPaths data that the hook returns. So it’s completely valid for a frontend to install metadata, dependencies, entry points, RECORD and direct_url.json, and nothing else.

Please tell me I missed something, or the proposal will get amended, because if that is intended to be a valid front end then IMO the proposal is just broken. The user asks for an editable install of foo, the backend reports exactly the needed files to expose to get such an install, and the front end does nothing, and yet is compliant with the spec?

I know we’re trying to leave the definition of what it means for an install to be “editable” as somewhat vague, but we have to have something here. If the backend says “this file must be exposed for the install to count as editable”, surely the frontend can’t be allowed to ignore that?

I’m not interested in this as a question of “which proposal is better”. I’m asking from the POV of “if I had to implement this PEP, what would I do?” So I don’t care about what PEP 660 does. What I’m asking is, if this proposal got accepted, does it tell me enough to let me implement the frontend side of the protocol? And I’m feeling like the answer is “no, there are too many unanswered questions”.

OK, if you want to look at it like that. The library is the key piece here, much like editables was one of the key pieces of the PEP 660 PoC (the other being integrating it into pip so people could use it).

I disagree, the library is the bridge here. A backend implementation for PEP 660 is “line up the information, call editables, assemble a wheel from the data and pass it back through the hook”. A frontend implementation for virtual wheels is “call the hook, collect user preferences somehow, pass the data returned from the hook and the preferences to the library, handle any response from the library”. There’s very little in common between a PEP 660 backend implementation and a “virtual wheel” frontend implementation beyond “they call the library to do the heavy lifting”.

I have no intention of making editables into a library for this proposal, so while someone is perfectly welcome to steal ideas from editables, I won’t be writing any sort of front end implementation/library for the virtual wheel proposal. Someone else can do that.

bernatgabor · June 4, 2021, 7:38pm

If I were implementing the frontend, I’d warn and fall back to a pth file, but it is also fine for the frontend to error. Definitely don’t ignore it; fail early.

Potentially. Every frontend can make their own decision; or frontend implementations pull together and delegate the logic to a common library (as does editables for PEP-660).

Did you see this paragraph at the end of the build_editable hook (GitHub - gaborbernat/peps at edit)?

The schema paths map from project source absolute paths to target directory relative paths. We allow backends to change the project layout from the project source directory to what the interpreter will see by using the mapping.

For example, if the backend returns "purelib": {"/me/project/src": ""}, this would mean: expose all files and modules within /me/project/src at the root of the purelib path within the target interpreter.

We probably should add a statement in the frontend requirements to state that the frontend must ensure these paths are exposed into the target interpreter.

I mean you can use either symlinks or pth files to satisfy the path mapping stated above. We could amend the PEP to spell it out more explicitly? But note that we purposefully left it out, so as not to mandate anything and to leave this as an implementation detail of the frontend. That being said, if you read this part (GitHub - gaborbernat/peps at edit) you can see concrete examples of what the backend and the frontend would do.
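To make the two options concrete, here is a minimal sketch of how a frontend might satisfy a purelib mapping with either symlinks or a pth file. The function name `expose_purelib`, its parameters, and the `__editable__<name>.pth` file name are all assumptions for illustration; the PEP draft does not define them.

```python
import os
from pathlib import Path


def expose_purelib(mapping, purelib, project_name, use_symlinks):
    """Expose each source directory at its target inside ``purelib``.

    ``mapping`` maps absolute source directories to purelib-relative targets,
    e.g. {"/me/project/src": ""}. Returns the list of paths created.
    All names here are hypothetical, not part of any specification.
    """
    purelib = Path(purelib)
    created = []
    pth_lines = []
    for source, target in mapping.items():
        source = Path(source)
        if use_symlinks:
            if target == "":
                # Link every top-level entry of the source dir into purelib.
                for entry in sorted(source.iterdir()):
                    link = purelib / entry.name
                    os.symlink(entry, link, target_is_directory=entry.is_dir())
                    created.append(link)
            else:
                link = purelib / target
                link.parent.mkdir(parents=True, exist_ok=True)
                os.symlink(source, link, target_is_directory=source.is_dir())
                created.append(link)
        elif target == "":
            # A .pth line can only add a whole directory to sys.path.
            pth_lines.append(str(source))
        else:
            raise NotImplementedError(
                f"cannot express {source} -> {target!r} with a .pth file"
            )
    if pth_lines:
        pth_file = purelib / f"__editable__{project_name}.pth"
        pth_file.write_text("\n".join(pth_lines) + "\n")
        created.append(pth_file)
    return created
```

In pth mode this sketch only handles the "expose at the root" case; a mapping that renames or relocates a directory would need symlinks or an import hook instead.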

pganssle · June 4, 2021, 8:15pm

“Collect user preferences somehow” is not a requirement of the frontend any more than it’s a requirement of the backend to read the config file. There is no requirement that any given frontend take any user configuration, and the user configuration could come in the form of “which front-end do you choose to use for this purpose” in the same way that user configuration in PEP 660 can come from “which backend do you choose to use for this purpose.” It doesn’t seem necessary to me that there’s a single library that does symlinks and editables and pth files and whatnot, though presumably there could be one.

An implementation for a front-end here could easily be, “Take the information from the virtual wheel, call editables, assemble a wheel from the data, then install the wheel.” I’m not sure why it would need to be any different — in a virtual wheel all the information that was going to be passed to editables is passed from the back-end to the front-end, which is free to then pass that directly to editables.

It seems unclear to me where the disconnect here is, because from my point of view it seems almost obvious that the implementation strategies from PEP 660 can be moved to the front-end with ease, and I’m not seeing any of the problems that people seem to think are there.

dholth · June 4, 2021, 8:37pm

The objection is that the virtual wheel proposal has a lot of edge cases that its authors did not appear to see. We understood some of the problems with a mapping technique and wrote PEP 660 instead. We considered several approaches while developing the PEP and came up with something practical. For example I was initially against using the literal wheel format but it turned out to work beautifully.

Problems with the alternate proposal:

Some backends, especially setuptools, do not know which files will be added to the wheel before actually doing it. (They copy into a build directory, and then zip that directory.) This is basically how other binary formats like RPM work as well. Since there won’t be a direct link between the wheel and the editable mapping, where’s the advantage?

The mapping could make no sense. What if a flat directory of Python files is mapped to an installed tree? What if the same file is mapped several times under different import paths?

How would I include a folder, exclude a single submodule, and allow anything else in my source directory, as permitted by the editables library?

The best feature of the alternative is that it breaks people’s expectations of an editable install by being very strict?
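The “same file mapped several times under different import paths” case can at least be detected mechanically. The following is a rough sketch, assuming POSIX-style paths and a plain source-to-target dictionary; the function name is hypothetical, and whether a frontend should reject, warn about, or accept such mappings is exactly what remains unspecified.

```python
from pathlib import PurePosixPath


def find_double_exposures(mapping):
    """Find sources that a parent entry already exposes at a different target.

    Returns sorted tuples of (source, target implied by parent, explicit target).
    """
    paths = {PurePosixPath(s): PurePosixPath(t) for s, t in mapping.items()}
    problems = []
    for inner, inner_target in paths.items():
        for outer, outer_target in paths.items():
            if outer in inner.parents:
                # Where the parent mapping would already place this source.
                implied = outer_target / inner.relative_to(outer)
                if implied != inner_target:
                    problems.append((str(inner), str(implied), str(inner_target)))
    return sorted(problems)
```

For example, mapping `/p/src` to the root and also `/p/src/a.py` to `b/a.py` would make `a.py` importable under two different paths, which this check reports.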

In PEP 660 the complexity is all in the backend, so there is no need to debug interactions between a complicated “generate the mapping” step in the backend and another complicated “use the mapping” step in the frontend. It gives control to the package author, who should be the primary user of the editable feature. We let the backend control what will be installed and pip does the copying, in the same way that we allow the backend to control what is installed in an ordinary install.

bernatgabor · June 4, 2021, 11:34pm

This is only true for now. @pganssle did try to fix this within setuptools, and it is definitely doable; it just needs someone to do the lifting. In the meantime setuptools can do whatever it does today and just expose the project root as is, which means setuptools can maintain its status quo while other backends advance.

There’s no wheel in this proposal; can you clarify what you’re referring to, and why there would need to be a link between them?

The build backend can do this transformation during their package build today, so why would this not make sense?

The backend can do this today for a wheel during the package build, so why would this be a problem in the case of editable installs?

In the same way that the backend today generates a custom importer that does this custom and dynamic logic, the frontend can use a similar technique to achieve this. This PEP does not disallow any of PEP-660.
The backend is still free to generate an import hook as a source file and forward that to the frontend to expose within the target interpreter. However, the backend is no longer limited to that option; it can also expose the source files directly by passing the list of files explicitly.

I did not follow what you were referring to here; could you clarify? Thanks! (But if the backend/frontend wishes, this PEP also allows them to be very strict.)

This is a trade-off. Sure, you have most of the complexity in the backend, but now the backend is extra complicated because it effectively must use import hooks and nothing else to achieve the editable install effect. With this PEP, for simple use cases the frontend can just use symlinks, which are much simpler to debug and understand. And the backend’s response is also fairly simple: it’s just a .dist-info folder and a mapping of mappings.

Note PEP-660 can hack in symlink support as pointed out:

However, this feels so hacky to me that I’d put it into the category of “works by chance” rather than by design and choice. With PEP-660, how would the user choose between symlink mode and register-a-new-importer mode?

The issue is that this is not a choice for the package author to make, but rather something that depends on the machine the developer is running. That machine might not support symlinks, so the user needs to be able to request the import mode via the frontend; or, if their platform does support symlinks, they might wish to explicitly request symlink mode for performance and simplicity reasons (assuming their frontend supports both). The argument of this PEP is that how the editable installation is achieved should be in the hands of the user via the frontend, not the build backend’s exclusive right.
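As a rough illustration of that user-facing choice, a frontend could probe for symlink support and honour an explicit request. This is only a sketch: the mode names "auto", "symlink" and "pth" and both function names are assumptions, not anything defined by either PEP.

```python
import os
import tempfile


def symlinks_supported(directory):
    """Return True if a symlink can be created inside ``directory``."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        open(target, "w").close()
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # Typical on Windows without the required privilege or setting.
            return False
        return True


def choose_mode(requested, directory):
    """Resolve a hypothetical user request to "symlink" or "pth"."""
    if requested not in {"auto", "symlink", "pth"}:
        raise ValueError(f"unknown editable mode: {requested!r}")
    if requested == "pth":
        return "pth"
    supported = symlinks_supported(directory)
    if requested == "symlink" and not supported:
        # The user asked explicitly, so fail early instead of silently degrading.
        raise RuntimeError("symlink mode requested but symlinks are unavailable")
    return "symlink" if supported else "pth"
```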

dholth · June 4, 2021, 11:58pm

We can probably agree that the lack of symlinks may be the most important weakness of PEP 660. It doesn’t look like we are going to agree on the frontend/backend split. I think that having the frontend decide how editables work will be more inconvenient than having per-build-system editable strategies.

Re. your reply, I have experience with both setuptools and pip internals. Setuptools is a pluggable system. Setuptools can do unexpected things, but I trust it to install (into the build directory that is later zipped into a wheel). Several of the hundreds of thousands of existing packages will have a weird install subclass that doesn’t work with refactored setuptools’ “where are the files coming from so I can generate a mapping” hook. Though I see you are content to continue to offer the .pth or “include a few named top level modules” strategy.

My own enscons also does not work in the way implied by the alternative hook - we do not really know what will be in the wheel until it is built and we do not maintain a mapping of source filesystem locations to installed locations. It could still provide the “named top level modules” by out of band metadata. Enscons has a PEP 660 implementation for Paul Moore’s editables==0.1 which I’ll update for 0.2.

I understood that one of the main advantages of this alternative proposal was that it would more closely match a normal install (which goes through a wheel) to an editable install. It will be difficult to achieve that with two build systems I’m familiar with - enscons and setuptools.

bernatgabor · June 5, 2021, 12:09am

Ultimately then I guess we’ll have to delegate the decision to the packaging BDFL or the steering council.

Most of the time, many backends will work with both modes, and I think having the choice of which one to use on the frontend side is valuable. Package authors generally don’t pick their packaging tool based on the editable modes it offers.

The intent is to offer editable functionality, and that can be achieved in multiple ways based on the package in question. Often in multiple ways. In this case, I think there’s value in placing the choice in the user’s hand rather than forcing it one way. As I’ve pointed out above this might allow the simple cases to use simple solutions, and the complex cases can still fall back to more complicated solutions.

It’s not about trust. It’s about the fact that I (and by extension the frontend), not the package author, know my platform best and what works best on it.

dholth · June 5, 2021, 12:18am

We’ve talked about this before. I’m hearing that the alternative proposal is also worried about being forced to use an undesirable editable mode. Surely you will be able to avoid using editable installs of packages that you don’t like.

bernatgabor · June 5, 2021, 12:20am

I can’t understand what you mean here.

dholth · June 5, 2021, 1:05am

How is it possible for anyone to care about how someone else’s package installs itself? When will you be required to install it in an editable way, instead of choosing to avoid the package or install it in the normal way, if for some reason the editable way is not to your liking?

dholth · June 5, 2021, 7:37pm

@bernatgabor

pip’s setup.py:

    package_dir={"": "src"},
    packages=find_packages(
        where="src",
        exclude=["contrib", "docs", "tests*", "tasks"],
    ),

In [11]: setuptools.find_packages(where='src',  exclude=["contrib", "docs", "tests*", "tasks"],)
Out[11]:
['pip',
 'pip._vendor',
 'pip._internal',
 'pip._vendor.pep517',
 'pip._vendor.msgpack',
 'pip._vendor.toml',
 'pip._vendor.progress',
 'pip._vendor.urllib3',
 'pip._vendor.chardet',

Should be pretty easy to pass this to Paul Moore’s editables library.

For this proposal, would I pass

{'src/pip': 'pip',
 'src/pip/_vendor': 'pip/_vendor',
 'src/pip/_internal': 'pip/_internal',
 'src/pip/_vendor/pep517': 'pip/_vendor/pep517',
 'src/pip/_vendor/msgpack': 'pip/_vendor/msgpack',
 'src/pip/_vendor/toml': 'pip/_vendor/toml',
 'src/pip/_vendor/progress': 'pip/_vendor/progress',
 'src/pip/_vendor/urllib3': 'pip/_vendor/urllib3',
 'src/pip/_vendor/chardet': 'pip/_vendor/chardet',
 'src/pip/_vendor/resolvelib': 'pip/_vendor/resolvelib',
 'src/pip/_vendor/html5lib': 'pip/_vendor/html5lib',
 'src/pip/_vendor/certifi': 'pip/_vendor/certifi',
 'src/pip/_vendor/packaging': 'pip/_vendor/packaging',
 'src/pip/_vendor/colorama': 'pip/_vendor/colorama',
 'src/pip/_vendor/requests': 'pip/_vendor/requests',

as the purelib key of the returned structure? Assume I’m not interested in exposing any non-purelib etc. parts of the package in the hook.

What should the installer place into site-packages? If it is run in no-symlinks mode? Or if it is run in symlinks mode?
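One possible reading, shown here only as a sketch, is that entries already implied by a parent directory entry are redundant, so an installer could collapse the mapping before deciding what to link. The function name and the `/checkout` prefix are hypothetical, and nothing in the proposal says an installer must or may do this.

```python
from pathlib import PurePosixPath


def collapse_mapping(mapping):
    """Drop entries that a parent directory entry already implies."""
    paths = {PurePosixPath(s): PurePosixPath(t) for s, t in mapping.items()}
    kept = {}
    for source, target in paths.items():
        implied_by_parent = any(
            parent in paths
            and paths[parent] / source.relative_to(parent) == target
            for parent in source.parents
        )
        if not implied_by_parent:
            kept[str(source)] = str(target)
    return kept


# Hypothetical absolute paths standing in for a pip checkout.
example = {
    "/checkout/src/pip": "pip",
    "/checkout/src/pip/_vendor": "pip/_vendor",
    "/checkout/src/pip/_internal": "pip/_internal",
}
collapsed = collapse_mapping(example)
```

Under this reading, symlink mode would need a single link for `pip`, but that still leaves open whether unlisted subpackages under `src/pip` are meant to be excluded.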

bernatgabor · June 5, 2021, 7:40pm

This PEP aims to support that so you can’t make that assumption. I’ll come back with a POC in a few weeks and we can have a better conversation.

dholth · June 5, 2021, 7:53pm

See why I have questions. So we symlink the src/pip directory into site-packages; we are done or we are putting the other symlinks into the checkout. Hopefully pip isn’t a namespace package. We create an [editables](https://python.org/pypi/editables) style hook - would be easy enough to only include the mentioned packages - are all un-mentioned packages required to be excluded? We mention a bunch of .py files - now we could create a tree of real directories and symlink all the mentioned .py files only? If we only mention packages should the installer discover all the .py and C extensions currently present in the checkout and link those? If I return just { "": "src" } will the installer create a .pth file?

I’m imagining, in the PEP, detailed rules for figuring out what the returned data structure should mean and how it is possible to link it based on whether X is under Y etc? I’m imagining this being impractical to specify.

bernatgabor · June 5, 2021, 9:52pm

Not at all. This would be akin to specifying the editables package as part of the spec within PEP-660. How the frontend implements the editable mode from these mappings is entirely up to the frontend and is considered an implementation detail. The POC will naturally demonstrate a few possible ways, the same way the editables library does for PEP-660. But these will not be part of the standard; they remain there just for illustration purposes, demonstrating that the various features we’re targeting can be achieved within the spec’s limitations and guarantees.

That’s invalid per the spec, the keys must be absolute paths.
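A frontend could enforce that rule with a small check along these lines. This sketch assumes POSIX-style paths (Windows paths would need `PureWindowsPath`), the function name is hypothetical, and the rejection of `..` in targets is an extra assumption rather than something the draft states.

```python
from pathlib import PurePosixPath


def validate_scheme_mapping(mapping):
    """Raise ValueError unless keys are absolute and values are relative."""
    for source, target in mapping.items():
        if not PurePosixPath(source).is_absolute():
            raise ValueError(f"source must be an absolute path: {source!r}")
        target_path = PurePosixPath(target)
        # Assumption: targets must not escape the scheme directory.
        if target_path.is_absolute() or ".." in target_path.parts:
            raise ValueError(
                f"target must stay inside the scheme directory: {target!r}"
            )
```

With this check, `{"/me/project/src": ""}` passes, while a relative key such as `"src/pip"` is rejected.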

dholth · June 5, 2021, 10:08pm

You get the idea - we could have

normal path mappings that “make sense” together and would be installed in the same shape

just the src/ folder like in my example but with the required absolute path

just all modules (directory names) under the src/ folder

every individual .py / .so / .dll / data-in-site-packages that you might want to access (which might imply intolerable strictness)

The virtual wheel doesn’t contain the right information to do a flexible editable install, so no one’s been able to work out on paper exactly what the installer should do with various such mappings.

takluyver · June 8, 2021, 10:47am

I like the idea of leaving the ‘how’ of editable installation up to the frontend. But I’m not sure that treating the five scheme paths the same way makes sense (apologies if this has come up already - even 55 messages is quite a bit to read).

As far as I know, editable installs to date are ‘editable’ (changes take effect without needing to reinstall) only for importable modules - the purelib & platlib directories. There’s a moderately good reason for this: .pth files provide a simple, cross-platform way to do this, but they are specific to Python imports. The other components (scripts, include, data) don’t really have an equivalent:

We can symlink them, but not on Windows. You also need care in some cases to symlink at the right level (e.g. not replacing the whole of /usr/share/doc with a symlink to your doc folder).
Altering environment variables like PATH works for some parts, but it’s limited/fiddly because the changes only affect the current process & descendants, whereas all processes see filesystem changes.
There’s the ultimate fallback of monitoring the source directory for changes and copying files across. But this is a fairly complex moving piece, and in a lot of real cases the simpler thing - just making Python imports editable - is perfectly sufficient.
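The .pth mechanism for importable modules can be demonstrated with the standard library alone. This sketch builds a throwaway “site-packages” directory and uses `site.addsitedir` to process a .pth file in it; the module and file names are made up for the demonstration.

```python
import importlib
import site
import sys
import tempfile
from pathlib import Path


def demo_pth_editable():
    """Show that a .pth file makes a source directory importable."""
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = Path(tmp) / "project" / "src"
        source_dir.mkdir(parents=True)
        (source_dir / "pth_demo_mod.py").write_text("VALUE = 'from source tree'\n")

        fake_site = Path(tmp) / "site-packages"
        fake_site.mkdir()
        # Each line of a .pth file naming an existing directory is added to sys.path.
        (fake_site / "demo.pth").write_text(f"{source_dir}\n")

        saved_path = list(sys.path)
        try:
            site.addsitedir(str(fake_site))
            module = importlib.import_module("pth_demo_mod")
            return module.VALUE
        finally:
            # Undo the changes so the demonstration has no lasting effect.
            sys.path[:] = saved_path
            sys.modules.pop("pth_demo_mod", None)
```

Nothing comparable exists for scripts, headers, or data files, which is why those components are the hard part.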

I’m inclined to say we should standardise editable installs as they are - only for importable modules - and accept that any other parts of the package will get installed as normal. That points towards something more like PEP 660, because the backend making a wheel is the normal installation path.

So I would propose a kind of hybrid between the two proposals: the backend makes a wheel, but instead of the wheel containing the details of how to make an editable install (PEP 660), it contains something like foo-1.0.dist-info/editable.json with a list of files/directories relative to the project root (like src/foo/) which should be made available on sys.path. As in this proposal, it would be up to the frontend how it did that.
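A frontend consuming such a file might look roughly like the sketch below. The file format (a bare JSON list of project-relative paths), the function name, and the check that entries stay inside the project root are all assumptions, since this is only an idea and not a written specification.

```python
import json
from pathlib import Path


def read_editable_paths(dist_info_dir, project_root):
    """Resolve entries of a hypothetical editable.json against the project root."""
    entries = json.loads((Path(dist_info_dir) / "editable.json").read_text())
    root = Path(project_root).resolve()
    resolved = []
    for entry in entries:
        path = (root / entry).resolve()
        # Assumption: entries must not point outside the project.
        if path != root and root not in path.parents:
            raise ValueError(f"{entry!r} points outside the project root")
        resolved.append(path)
    return resolved
```

How the frontend then makes those paths importable (pth file, symlink, or import hook) would remain its own decision.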

bernatgabor · June 8, 2021, 11:18am

Yes, you’re correct. I’m planning to make this one of the amendments, and I just haven’t got to it.

You can symlink on Windows. You need a new enough version of it and symlinks enabled. Support was added about five years ago, so as time goes by fewer and fewer people will have this issue.

Why normal? Couldn’t both data and includes be handled as symlinks (at least in the majority of cases)? Also, wouldn’t you be able to inject proxy scripts for the scripts part?

I don’t think an editable PEP should involve sys.path. At the end of the day, we don’t care whether something is on sys.path or not; we care that the import resolves, so we should make that the contract. It’s up to the frontend to decide how to ensure those file paths result in successful imports. Keeping it opaque like that, the frontend can do it via symlinks, registered custom importers, or any other mechanism a frontend can dream up.

Note how this does not reject PEP-660 but extends it, because ultimately the backend will always have the last word and can take control (as defined in PEP-660). This is because it can simply not expose the project files and instead pass along to the frontend files that register an import hook. There’s no way or reason to ban that.

takluyver · June 8, 2021, 11:36am

Sorry, I’ve got used to abbreviating this. I know that Windows has symlinks now, but the ‘enabled’ bit essentially kills it, as far as I’m concerned - as a software author, you can’t assume that symlinking works for arbitrary users on Windows, so general purpose, cross-platform tools can’t rely on making symlinks.

Sorry, that was lazy of me. I meant ‘make it importable’ and I forgot that ‘put it on sys.path’ is not synonymous.

You could, but the vast majority of scripts from Python packages are generated from entry points, so you’d have to regenerate them anyway. I don’t think it’s worth the extra complexity given that.

layday · June 8, 2021, 12:10pm

Practically, I can think of several ways in which this can go awry. Consider:

The backend gives the frontend pth files. The frontend decides to create pth files for the pth files. pth files do not make valid path entries. Or: the frontend calls dirname on the pth files and puts whatever’s contained in the folder where the pth files reside on path.
The backend gives the frontend a sitecustomize.py. The frontend creates its own sitecustomize.py. The backend’s sitecustomize.py has no effect.
The backend exposes a Python package using editables. The frontend exposes the backend’s proxy editables module using editables. The original package cannot be imported.

Conceptually, such an arrangement will be very hard to reason about, not just for the user, but also for backend and frontend implementers.

bernatgabor · June 8, 2021, 12:17pm

This would be a frontend bug.

Or just do the sensible thing as a frontend and, you know, symlink/copy the pth file?

I’d expect a frontend to either handle the merging of that file or raise an error.

Why would that be the case? The backend registers a variant of editables that exposes the project files. The frontend registers a variant of editables that exposes the backend’s proxy files.

Only if you’re assuming the front end will not take care and blindly do one thing no matter the circumstance. All of these cases can be avoided by doing a few checks in frontends.
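As a sketch of what “a few checks” could mean, a frontend might classify each backend-provided file before deciding how to expose it. The function name and the action labels "copy", "error" and "expose" are hypothetical, and merging a sitecustomize.py instead of erroring would be an equally valid choice.

```python
from pathlib import PurePosixPath


def plan_for_backend_file(source, existing_names):
    """Decide how to treat one backend-provided file; returns an action label."""
    name = PurePosixPath(source).name
    if name.endswith(".pth"):
        # A .pth file is already a path-configuration file: install it verbatim
        # instead of generating another .pth that points at it.
        return "copy"
    if name == "sitecustomize.py" and name in existing_names:
        # Two sitecustomize.py files cannot coexist; refuse rather than shadow one.
        return "error"
    return "expose"
```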