Pip freeze, vcs urls and pep 517 (feat. editable installs)

sbidoul · April 11, 2019, 5:38pm

Hello,

I’m posting here following a conversation on pip issue #609, where Chris Jerdonek suggested I bring the topic to this forum for a broader discussion.

My use case goes as follows:

it’s about applications (not libraries)
I add top level dependencies in setup.py install_requires
when I need to use some unreleased version of a dependency (top level or not), I add a VCS reference in requirements.in (eg, -e git+https://github.com/someorg/somelib@somebranch#egg=somelib).
pip install -r requirements.in -e .
pip freeze > requirements.txt, which pins exact commits of any vcs url I’ve put in requirements.in

I assume this workflow is fairly common?

Notice the editable (-e) option to install vcs dependencies, which is necessary so pip freeze works correctly, preserving the exact vcs url (including commit sha for git) that was installed. pip freeze works in the presence of editable installs because it uses egg-links to find the source and check whether it’s a vcs checkout. Given the modern state of things, relying on egg-link and egg-info is not really future proof I suppose.
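
To make the egg-link mechanism concrete, here is a minimal sketch of how a tool could discover editable installs by scanning a site-packages directory for `.egg-link` files. The name `find_egg_links` is made up for illustration and this is not pip’s actual code. It only assumes that the first line of an egg-link file is the path to the source directory, which is what `setup.py develop` writes.

```python
from pathlib import Path


def find_egg_links(site_dir):
    """Map project name -> source directory for each *.egg-link in site_dir.

    Hypothetical helper for illustration; not pip's implementation.
    """
    found = {}
    for link in sorted(Path(site_dir).glob("*.egg-link")):
        lines = link.read_text().splitlines()
        if lines and lines[0].strip():
            # The first line holds the path of the development checkout.
            found[link.stem] = Path(lines[0].strip())
    return found
```

A freeze-like tool could then look inside each returned directory to see whether it is a VCS checkout.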

My first question would be: is there a variant of the above scenario that would work today when vcs dependencies are using another pep517 backend than setuptools (eg flit)? I tried pip-tools which also relies on editable installs for vcs dependencies. A quick look at poetry hinted at a similar behavior.

Working without -e (ie implementing pip issue #609) is not trivial because it requires new metadata to store the VCS origin, and defining its semantics might not be easy.

So I tend to think implementing editable installs for pep517 would be a more rewarding approach (given the other use cases it enables).

From my initial research, I must say I find the approach that flit proposes for editable installs is quite elegant and simple (ie generate dist-info + symlink or pth extension). Could it be amenable to standardization with a reasonable effort? I humbly sketched a possible approach for a pep517 extension in https://github.com/pypa/pip/issues/609#issuecomment-478333485. I copy it here for completeness:

backend provides prepare_metadata_for_install_editable() and install_editable()
backend install_editable() has the choice to install with the .egg-link method (would be for setuptools only), or .dist-info + symlink, or .dist-info + any other mechanism it wants (.pth, etc)
when doing an editable install, frontends add a SRCLOCATION metadata to record the source path of the install (ie the local directory from where the install was requested)
pip uses the new SRCLOCATION metadata (and falls back to egg-link) to detect editable installs so pip freeze can continue working normally
when editable installs create a .dist-info, pip uninstall would work normally (no need to special case, since the RECORD would simply contain the symlinks or pth files to be removed)
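
The proposal above could be sketched roughly as follows. Everything here is an assumption made for illustration: `install_editable_sketch` is a hypothetical function rather than a real PEP 517 hook, `SRCLOCATION` is the file name from the proposal and not an existing standard, and the `.pth` approach is only one of the mechanisms a backend might pick.

```python
from pathlib import Path


def install_editable_sketch(src_dir, site_dir, name, version):
    """Write a .pth file plus a minimal .dist-info for an editable install.

    Hypothetical sketch of the proposal; not a real PEP 517 hook.
    """
    src = Path(src_dir).resolve()
    site = Path(site_dir)
    dist_info = site / f"{name}-{version}.dist-info"
    dist_info.mkdir(parents=True)

    # A .pth file makes Python add the source directory to sys.path.
    pth = site / f"{name}.pth"
    pth.write_text(f"{src}\n")

    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
    )
    # SRCLOCATION is the proposed (non-standard) file recording the source path.
    (dist_info / "SRCLOCATION").write_text(f"{src}\n")

    # RECORD lists every installed file, so uninstall needs no special case.
    # Hashes and sizes are left empty here for brevity.
    record = dist_info / "RECORD"
    installed = [pth, dist_info / "METADATA", dist_info / "SRCLOCATION", record]
    lines = [f"{p.relative_to(site).as_posix()},," for p in installed]
    record.write_text("\n".join(lines) + "\n")
    return dist_info
```

With this layout, a frontend would detect the editable install by the presence of `SRCLOCATION` and would uninstall it by deleting the files listed in `RECORD`.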

I’m willing to put some time and energy into this topic (eg elaborating the proposal, doing some implementation work in pip), but since this problem has been considered hard so far I guess there are probably many hidden complexities I’ve not identified.

Looking forward to your thoughts.

uranusjr · April 11, 2019, 9:28pm

My understanding (I could be wrong) is that editable was dropped intentionally in the current spec to push the good parts through the door faster, and the intention has always been to cover it some time. Thanks for putting time and effort into this! I hope we can figure this out soon.

njs · April 11, 2019, 10:39pm

Editable installs are really complicated, because:

there’s no real way to make them work 100% correctly, because you can’t guarantee that metadata/binaries/source files remain in sync with each other
so instead the goal is just to come up with enough hacks and kluges that they work “well enough”
but no-one is sure what “well enough” actually means, because there are so many subtly different use cases, and no-one fully understands how it works now

This makes it hard to make progress.

One thing that would help though is to better understand what exactly -e is doing now and why people are using it, and ideally disentangle some of those uses.

In this case, it sounds like you don’t actually care about the install being “editable” (i.e., you’re not going in and mucking around with the VCS checkout in your live environment); you’re just using -e as a magic flag that means “I want pip freeze to work correctly”.

Maybe we should rephrase this as: how can we make pip freeze work correctly for VCS installs that don’t use the -e flag? That seems like a natural thing to support, and it would let us make progress on your problem without getting stuck in the full editable installs morass.

cjerdonek · April 11, 2019, 11:26pm

Yes, I want to echo this. There are two different problems being mentioned here: (1) adding to PEP 517 support for editable mode, and (2) getting freeze to work better for VCS urls in general.

I think (2) has a similar issue to (1) though in that it’s not necessarily clear what the “right” url should be. I also think some of pip’s behavior around this is broken or at least not necessarily what we want, so I’d prefer if we don’t perpetuate those parts further. To provide one example, if someone installs a remote VCS url in editable mode, and then proceeds to “edit” the install by making a commit, then pip freeze’s output won’t necessarily be correct since the commit sha won’t necessarily exist in the remote repo. But maybe this will be easier in the non-editable case since we don’t really have to support the dependency being edited in that case.

pf_moore · April 12, 2019, 8:01am

I think there’s actually a fundamental design conflict here, that we’ve worked around in a lot of cases - but maybe we should be looking at facing it head on.

Pip treats projects as unique based on the combination of project name and version. The resolver, dependency checks, all that sort of thing work based on that pair of items. But VCS urls, and various other places where we let people point at “this precise bunch of source code, as it is right now”, don’t have that equivalency - it’s possible for the code to change but the version stay the same (and that’s inherent in the view that you’re installing “a specific bunch of source code”, rather than a version of a project).

Refactoring pip internally to make a clear distinction between these two types of project might help rationalise some of this behaviour (e.g. pip freeze), as well as giving us a better handle on what we need to make editable work more cleanly.

sbidoul · April 14, 2019, 12:08pm

Thanks for the feedback so far. How can we progress?

I totally agree -e being used for pip freeze to work with vcs urls can be seen as incidental. Should it be deprecated? I don’t know. It’s useful today, that’s for sure.

This is indeed the exact topic of pip#609 where I come from, and my first objective. As @cjerdonek wrote, there is no clear way to implement that yet, however. And since editable installs for pep517 are needed anyway, I figured going down the editable install path could be easier – at least it’s a path I can visualize – and a way to kill two birds with one stone, so to speak.

From the comments so far, I don’t yet really understand why extending pep517 for editable installs would be such a morass.

It’s true that editable installs have quirks. @njs mentioned the desynchronization of source and metadata. That one sounds easy to explain and understand, and there is probably no way to fix it anyway? Are there others?

How could we collect those use cases?

OTOH, from a pep517 point of view, is there a need to actually specify in complete details what editable installs must do, beyond the obvious use case of -e? Can’t it just say to backends (as I suggested in my proposal above), please install dist-info metadata for the source code in this directory, and make it so that this source code gets executed in place instead of being copied to site-packages? The exact means to achieve that could be left to backends, as long as they provide minimal dist-info metadata.

Anyway, if the conclusion is that it’s still premature to standardize pep517 editable installs, then I’m happy to put my effort on pip#609 (ie pip freeze without -e): the only thing to decide for that is where does pip record the vcs url of the install.

for git/hg when not using -e, it’s the url used to install, with any branch/tag/ref part replaced with the corresponding commit id.
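
A minimal sketch of that replacement, using the example URL from the first post. `pin_vcs_url` is a hypothetical helper, not a pip function, and the commit id below is a made-up placeholder. It treats everything after the last `@` in the URL path as the branch/tag/ref, which is an assumption that may not cover every URL form pip accepts.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_vcs_url(url, commit_id):
    """Return url with any @branch/tag/ref in its path replaced by @commit_id.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # An "@" in the host part (e.g. git@host) is not touched: only the path is.
    base = parts.path.rsplit("@", 1)[0]
    return urlunsplit(parts._replace(path=f"{base}@{commit_id}"))


print(pin_vcs_url(
    "git+https://github.com/someorg/somelib@somebranch#egg=somelib",
    "0123456789abcdef0123456789abcdef01234567",
))
```

The `#egg=` fragment is carried over unchanged, so the pinned URL can be used wherever the original one was.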

pf_moore · April 14, 2019, 4:40pm

When we developed PEP 517, there was so much debate over other aspects of the proposal, that there simply wasn’t energy left for dealing with editable installs. To be honest, I’ve no idea if they would be particularly hard to support - no-one has really looked into the issue yet.

What needs to happen is for someone (you?) to open up a discussion, draft a proposal, and then try to gain a consensus on a design. I’d suggest that it be framed as a new PEP. One key thing that would help the discussion would be to get some data on how backends other than setuptools would handle editable installs - it’s too easy otherwise to define a standard that simply replicates setuptools’ behaviour and which other tools can’t implement.

Another area where I can imagine some debate is around the fact that PEP 517 decoupled build and installation. The existing PEP 517 interfaces don’t include an “install” option; the process is intended to be that the backend gets asked to build a wheel, and then the frontend installs it (installing a wheel is defined in the wheel spec, and is pretty straightforward for frontends to implement). This allows front-ends to implement non-standard installs (like pip’s --target) without needing any co-operation from the backend. Editable installs will either have to design a similar mechanism, or the PEP will have to address the question of how the backend and frontend communicate where the project is to be installed.

njs · April 14, 2019, 6:49pm

From that thread:

So we’d need some sort of metadata in dist-info. Perhaps a VCSORIGIN field?

* it would be specified to be a frozen VCS reference (a git commit, not branch or tag)
* it would supersede the version for the purpose of pip freeze
* it would supersede the version as wheel cache entry key

This would need some tweaks probably, but the basic idea also seems PEPable, and would make a lot more sense to me anyway. (It’s bizarre that -e has this special meaning in requirements files that has nothing to do with editing anything.)

Maybe a piece of metadata to record the requirement that installed this package, if any? So it could also capture pinning information, etc.

cjerdonek · April 14, 2019, 11:18pm

My main advice would be to keep the scope narrower if possible rather than try to solve multiple things at once.

Here are some questions and things that come to mind when thinking about these topics, depending on the scope. Some relate to editable mode, some to pip freeze, some to installing from a VCS, and some to a combination. Note that I’m not asking anyone to answer these directly. They’re just things to consider when drafting a proposal and that could warrant answers depending on what the proposal covers:

Adding support for editable mode to PEP 517 in general would mean supporting both VCS urls and directories. If a directory comes from cloning a VCS url, would installing the directory in editable mode know about that, or would it only use the information that it’s a directory?
If a directory installed in editable mode is also a VCS repository but isn’t from a clone, would pip freeze reflect that it’s a repository (e.g. showing the commit hash)?
If a directory is installed in editable mode that has a remote VCS url associated with it, would re-installing that directory in editable mode reach out to the remote URL to check for updates (in-place update)? What if the directory has changes that would get overwritten?
If VCS source information is stored as metadata, would that information be stored as opaque strings (e.g. decipherable only by the front-end), or would the format of those strings also need to be standardized? For example, pip uses VCS urls that include a part for the subdirectory: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
If the form of the string is standardized, would it need to be standardized separately for each type of VCS, since each type of VCS can use different ways of representing a particular commit?
What if there is other information needed to reproduce a VCS download (e.g. whether to retrieve particular submodules) – would that be stored somewhere?
If a package is installed from a VCS url, would there be a requirement to get the commit ID and/or history, or could it just download that revision as a snapshot?
I raised a use case a few years ago in the context of editable mode. I think it would be worth knowing about when considering any new implementation: https://mail.python.org/pipermail/distutils-sig/2016-March/028478.html
Basically, if a directory is installed in editable mode, I think it would be useful if any metadata files / directories didn’t have to be stored in the directory itself.

A couple final comments:

To simplify any discussion, it seems like the behavior of pip freeze should be decided separately from how to store any metadata. (The pip freeze use case could still be used to inform what should be stored as metadata.)

Lastly, if a “vcs origin” is stored as metadata, I think the original requested information should be preserved (e.g. branch, tag, or ref), and not store only the commit ID that it resolves to. pip freeze could still have the behavior of outputting a commit ID though even if the original branch / tag / etc. information is preserved.

njs · April 15, 2019, 1:34am

I guess there are three pieces of information that might be useful to record:

1. Why was this package installed? What did the user actually request?
- Examples:
  - request for an exact version: pip install pkg==1.0.1
  - unconstrained request for the latest version: pip install pkg
  - request for package to match some range of versions: pip install pkg>=1.0
  - request for specific file or directory: pip install ./pkg-1.0.1.tar.gz, pip install .
  - implicitly by requesting some other package, that pulled this in as a dependency: pip install some-other-package
  - request for the head of a given git branch: pip install git+...
- Potential uses: retroactively constructing a Pipfile from a virtualenv. Detecting when a package was only installed to satisfy a dependency that’s no longer relevant, so the package can now be auto-removed. (This is a nice feature that some other systems support, like apt.)
2. What was actually installed?
- Examples: A file from PyPI, with a fixed name and hash? A file from somewhere else? If the user requested the HEAD of some git branch, then which exact revision did that turn out to be?
- Potential uses: pip freeze. Retroactively constructing a Pipfile.lock from a virtualenv. Trying to figure out what the heck you actually have in your environment.
3. What dependencies does this package satisfy?
- This is what the current Name: and Version: metadata tell us.

Right now pip doesn’t track (1) at all, and conflates (2) and (3) – basically assuming that any package can be uniquely identified and recreated given just the (name, version). (Except, apparently, for the one special case of pip freeze when looking at editable installs of VCS URLs.) As @pf_moore noted, this conflation has caused other problems in the past.

We don’t necessarily have to track all of this information. (I think apt autoremove just tracks a single bit for each package, “was this installed manually by the user or not?”.) But I think this is all the information that exists, so any operations have to be defined in terms of it somehow. And it’s certainly useful to distinguish these three things when thinking about how pip could work.
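
One way to picture the three pieces of information is as three separate parts of a per-package record. The class and field names below are invented for illustration and do not correspond to any existing metadata format; the URLs and commit ids are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstallRecord:
    """Hypothetical per-package record separating the three questions."""

    # (1) What the user asked for; None if pulled in only as a dependency.
    requested: Optional[str]
    # (2) What was actually installed (exact file, URL or VCS revision).
    installed: str
    # (3) What dependencies this satisfies: the (name, version) pair.
    name: str
    version: str

    def identity(self):
        return (self.name, self.version)


# Two installs of the same branch at different times (placeholder commit ids).
first = InstallRecord(
    requested="git+https://example.invalid/somelib@somebranch",
    installed="git+https://example.invalid/somelib@" + "a" * 40,
    name="somelib",
    version="1.0",
)
second = InstallRecord(
    requested=first.requested,
    installed="git+https://example.invalid/somelib@" + "b" * 40,
    name="somelib",
    version="1.0",
)

# Same (name, version), yet not the same code.
print(first.identity() == second.identity())  # True
print(first.installed == second.installed)    # False
```

Keeping (2) apart from (3) is what lets a freeze-like operation tell these two installs apart even though their name and version match.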

pf_moore · April 15, 2019, 8:40am

Note that this information is multi-valued (this is something that has affected other dependency tracking systems in my experience). For example:

1. User installs foo, which depends on bar. Bar gets installed with a “why was this installed?” saying “dependency of foo”.
2. User installs bar manually. Nothing happens, as bar is already present (but the metadata should be updated to say “also, user requested an install”).
3. User uninstalls foo. Uninstalling bar is now wrong, as the user wants it independently.

Note that after 2 but before 3, the user might want to say “I no longer want bar for itself”, so that at (3) it can be uninstalled. We don’t even have a way of requesting this in pip at the moment.

I’m not saying that all of this complexity needs to be addressed, but we should be aware if we’re only implementing a partial solution (and in that case we should definitely err on the side of not uninstalling things or otherwise making decisions based on the data unless we’re sure it’s OK to do so).
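
The foo/bar scenario can be modelled by keeping a set of reasons per package instead of a single flag. This is a toy model with hypothetical names (`Env`, `USER`), not a description of anything pip implements.

```python
USER = "<user>"  # marker meaning "explicitly requested by the user"


class Env:
    """Toy environment tracking why each package is installed."""

    def __init__(self):
        self.reasons = {}  # package name -> set of reasons

    def install(self, name, deps=(), reason=USER):
        # Record the reason even if the package is already present.
        self.reasons.setdefault(name, set()).add(reason)
        for dep in deps:
            self.install(dep, reason=name)

    def uninstall(self, name):
        """Remove name and return packages left with no remaining reason."""
        self.reasons.pop(name, None)
        candidates = []
        for pkg, why in self.reasons.items():
            why.discard(name)
            if not why:
                candidates.append(pkg)
        # Only report candidates; nothing else is removed automatically.
        return candidates


env = Env()
env.install("foo", deps=["bar"])  # bar is present only because of foo
env.install("bar")                # now the user also wants bar itself
print(env.uninstall("foo"))       # prints [] since bar still has a reason
```

Without the second `install` call, `uninstall("foo")` would report `bar` as a removal candidate, which is the difference a single "installed manually" bit cannot always capture once reasons change over time.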

sbidoul · April 15, 2019, 10:08am

Thanks a lot for all the valuable insights.

That’s interesting. Introducing an install concept in the pep517 interface that is all about building might therefore not be a good idea. Or it should then be a completely different interface, where the frontend does little more than passing through to setup.py develop or flit install --symlink.

That, with the recognition that pip freeze working with -e is incidental and not designed, are solid enough arguments to get me back to focusing on pip#609 only.

pip freeze > requirements.txt is indeed the use case I want to focus on for now, and sounds tractable to me. For this use case, the only missing piece is the VCS origin (with commit id only).

I think attempting to reconstruct a Pipfile from a virtualenv is an interesting but completely different use case that can be addressed independently. (BTW, a REQUESTED metadata entry is defined in pep 376, and is generated by flit install, but not by pip, although there are some mentions of it in pip’s source code, while pip generates a metadata file named top_level.txt)

I now think it should be opaque, ie specific to pip, since it is the (pip) frontend which knows how to interpret and download the VCS url, and therefore how to re-do the exact same download.

That is not supported in pip requirements today AFAIK, so this can be addressed later if needed.

Some packages need the history to compute their version (with setuptools_scm for instance), so the history should be downloaded by default. That part of pip would therefore not change. The VCS origin containing the commit id could be a key for pip’s wheel cache though. This would be good for performance and an answer to the frequent request to introduce support for git --depth in pip.

So the proposal would be pip specific, implementing pip#609 to enhance the behaviour of pip freeze in presence of vcs urls (no PEP required in that case I assume).

a new metadata file pip_vcs_origin.txt created by pip when installing a vcs url in non editable mode (behaviour of editable installs would not change)
containing a frozen VCS reference (ie the original url, with the branch/tag/ref part replaced by the commit id)
if present, it supersedes the version during pip freeze
(cherry on the cake) during installation from a VCS url, the wheel that is being built is cached using the content of pip_vcs_origin as cache key
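
A rough sketch of how such a file could feed pip freeze and a cache key. The file name `pip_vcs_origin.txt` comes from the proposal above; the helper names, the idea of storing the file in the `.dist-info` directory, and the use of SHA-256 for the cache key are assumptions made for illustration, not pip behaviour.

```python
import hashlib
from pathlib import Path

ORIGIN_FILE = "pip_vcs_origin.txt"  # name taken from the proposal


def freeze_line(dist_info_dir, name, version):
    """Return the frozen VCS URL if one was recorded, else name==version."""
    origin = Path(dist_info_dir) / ORIGIN_FILE
    if origin.is_file():
        url = origin.read_text().strip()
        if url:
            # The recorded VCS reference supersedes the version.
            return url
    return f"{name}=={version}"


def wheel_cache_key(frozen_url):
    """Derive a stable cache key from the frozen VCS reference."""
    return hashlib.sha256(frozen_url.encode("utf-8")).hexdigest()
```

Because the recorded URL contains a commit id rather than a branch name, the same key always refers to the same source, which is what would make it usable for caching a built wheel.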

If I do a pip PR in that direction, would it get traction?

cjerdonek · April 17, 2019, 3:08am

Where would this file be stored? Also, if information needs to be stored, does it have to be stored in a new file, or can it be stored using an existing mechanism?

Personally, I don’t think we should lose what branch or tag etc. was requested. The commit id can be obtained on the fly anyways, which is what pip does currently with editable installs.

This also makes me wonder if it would be sufficient simply to store a marker / flag with the meaning of, “this was installed from a VCS url” (kind of like with editable installs), because things like the remote VCS url and commit id can be obtained by inspecting the repository itself. This is also how pip does it with editable installs.
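
As a sketch of what “inspecting the repository itself” can look like for git, the helper below asks git for the remote URL and the current commit. `describe_checkout` is a hypothetical name and this is not pip’s code; it assumes a checkout is still on disk, that its remote is called `origin`, and that `git` is on the PATH. The `run` parameter exists only so the function can be exercised without a real repository.

```python
import subprocess


def describe_checkout(path, run=subprocess.check_output):
    """Return (remote_url, commit_id) for the git checkout at path.

    Hypothetical helper; `run` is injectable so it can be replaced in tests.
    """
    def git(*args):
        # "git -C <path>" runs the command as if started in that directory.
        return run(["git", "-C", str(path), *args], text=True).strip()

    remote_url = git("config", "--get", "remote.origin.url")
    commit_id = git("rev-parse", "HEAD")
    return remote_url, commit_id
```

This only works while the checkout exists, which is the case for editable installs but not for a regular install whose build directory has been discarded.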

In addition to the above, if this is going to be a pip-specific feature, it seems like there is value in having the underlying mechanism / code path be shared between editable and non-editable installs as much as possible. This would be both for ease of maintainability and also for consistency in behavior between the two cases for the end user.

njs · April 17, 2019, 3:30am

I’m not a pip maintainer so I can’t give you an official answer on what they’ll accept. But as a general rule, pip tries not to define its own proprietary metadata formats. There’s a general goal that it should be possible to replace pip with another compatible tool, just using PEPs.

Only for editable installs where the VCS checkout itself is added to sys.path. If we want to decouple VCS origin tracking from editable installs then we need to track that information for regular installs too, where the checkout isn’t available.

sbidoul · April 17, 2019, 7:20pm

I don’t see how this is possible. Or it would break the current behaviour of pip freeze of editable installs. As quirky as the current implementation can be seen, I don’t think we should break backward compatibility there.

I did some more digging and I found the source_url spec in the withdrawn pep426.

So the way to go would be extracting the source_url part of pep 426 into a new pep that proposes a new source_url.txt metadata extension? I could try.

sbidoul · April 18, 2019, 7:59am

OTOH, it seems all current installers delegate the installation of vcs url to pip.

So a discussion on standardizing that today could be premature, lacking concrete requirements for alternative implementations?

uranusjr · April 18, 2019, 9:33am

Personally (as a Pipenv maintainer) I think VCS installation is much too complicated to implement, and falls into the “nobody actually knows what to do, let’s just standardise whatever pip is doing right now” category. (If it should be standardised at all—personally I’d rather see it disappear, it’s such a pain to deal with, editable or not.)

pf_moore · April 18, 2019, 10:05am

I’d agree with that - although it’s used by a lot of users, so I don’t imagine it’ll ever actually happen

Back to reality, I think any work on this would have to start by documenting, in a “draft standard” style, pip’s current behaviour - pip’s current docs are great for using the feature, but not so good for understanding the details. And given that it needs to be done anyway, I don’t think the additional work of proposing that behaviour as “the standard any front end must follow” is unreasonable (the fact that pip is the only actual frontend isn’t new - it’s the same issue for every standard we’ve developed). If there’s any debate over details of pip’s current behaviour, then that is when the process gets more complicated, but on the other hand it’s something that’s probably needed anyway (whether for “writing a standard” or “tidying up pip’s behaviour” isn’t that important TBH).

sbidoul · April 18, 2019, 10:59am

IMHO, locking vcs dependencies is a very important feature for integrators.

Editable installs are a different matter. My understanding is that pipenv and other tools support editable because it’s the only workaround for freeze to work today. If frontends don’t support editable, it’s not a huge problem, as long as backends provide an alternative way to insert the code in sys.path.

@pf_moore, I can attempt to draft a PEP, documenting how pip interprets vcs urls, documenting the pip freeze use case, starting from the source_url section of pep426 to make it broad enough to support the pip install and pip freeze scenarios. I’ll look at how other languages (Ruby, …) deal with that topic.

That pep would not talk about editable installs at all: this can and should be addressed separately (as discussed above in this thread).

I understand that a pep drafted by muggles must have a sponsor? Is someone willing to sponsor this and guide me a little bit in the pep process?

cjerdonek · April 18, 2019, 11:59am

The VCS part of pip’s code base (along with freeze) is the part I’ve been working on and focusing on the most over the past couple years – gradually simplifying it, refactoring it, making it more malleable, etc. in order to be able to add new features and fix more bugs. This has been limited primarily by the availability of reviews from other pip maintainers. There is a lot that can (and IMO should) still be done to improve things. (Again, this is limited primarily by the availability of reviewers.) But things are in much better shape than they were before.

If pip’s behavior is going to be documented, I would rather that be done to inform the development of standards and make sure any standards can support pip’s behavior, rather than documenting things with the aim that other tools should implement things the same way. There is a lot I think we wouldn’t want to perpetuate.

Can the standard be limited just to the information that should be stored in the metadata and how that information should be stored, rather than e.g. what the urls should look like when installing and freezing?

(Meta note: I’m starting to become a little afraid of posting to this thread, because Discourse gave me a warning about posting too much the last time I posted something, even though it didn’t seem like I was posting much more than other people.)