Pip - new resolver: finding compatible package versions through backtracking, and how to save users disk space

This is inspired by this GH issue and by our work on helping users understand what’s happening when pip is backtracking.

With the new resolver, while pip is looking for a compatible set of dependency package versions to install, it (currently) needs to download each version in order to work out whether it is compatible.

As @pradyunsg, @pf_moore, and @uranusjr mention in 8713 above, pip currently has no way to know whether the package version it’s about to download is compatible.

Suggestions to include dependency information in the package metadata would take a lot of development time, even though that option would be preferred.

From the user’s perspective, even if they do not understand why pip needs to do this (and we can’t expect them to understand it, nor should they need to), they will understand that these packages are no longer required, so why does pip not clean up after itself? This is a reasonable request: pip currently does that after an install anyway.

As an interim measure, I’d like to suggest the following:

At the end of the (successful) installation process, I’d like pip to do one of these (from 1, most preferred, to 3, least preferred):

1. Prompt the user “Do you want me to remove the packages I downloaded in order to find the compatible one?” Y/N/Not sure

  • Y: deletes the packages pip had to download during that install process
  • N/Not sure: does not delete them
    Most preferred, as it informs the user of a potentially important decision. (If most people say “I don’t care about these packages, just delete them”, then option 3 below would be preferred.)

2. Print a message explaining how the user can remove those packages.

For example, pip prints a message such as:

To save your storage space, you can delete the interim package versions by typing: rm (-rf) /path/to/where/the/packages/are-stored/

Not ideal, but it still saves a user in this situation from losing disk space to packages that are no longer needed.

3. Automatically remove the “incompatible” package versions.

Not ideal either, as the system has done something the user may not want to happen.

Option 2 (essentially printing a message) would be the easiest to implement.
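The bookkeeping behind these options can be sketched roughly as follows. This is a minimal illustration, not pip code: it assumes the resolver records every downloaded (name, version) pair together with its file path, and the names `interim_downloads` and `cleanup_message` are hypothetical. It only identifies the interim files and builds an option 2 style message; it does not delete anything, and those files may also serve as cached downloads for later installs.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical record kept during resolution: (name, version) -> downloaded file.
Download = Tuple[str, str]


def interim_downloads(
    downloaded: Dict[Download, Path],
    resolved: Iterable[Download],
) -> List[Path]:
    """Return files that were downloaded but are not in the final install set."""
    # Simplified name matching: real normalization (PEP 503) also folds -, _ and .
    keep = {(name.lower(), version) for name, version in resolved}
    return sorted(
        path
        for (name, version), path in downloaded.items()
        if (name.lower(), version) not in keep
    )


def cleanup_message(paths: List[Path]) -> str:
    """Build an option 2 style message listing what the user could delete."""
    if not paths:
        return ""
    lines = ["To save your storage space, you can delete these interim package versions:"]
    lines += [f"  {path}" for path in paths]
    return "\n".join(lines)


# Example: pkg 2.0 was tried and rejected, pkg 1.0 and dep 3.1 were installed.
downloaded = {
    ("pkg", "2.0"): Path("/downloads/pkg-2.0.whl"),
    ("pkg", "1.0"): Path("/downloads/pkg-1.0.whl"),
    ("dep", "3.1"): Path("/downloads/dep-3.1.whl"),
}
print(cleanup_message(interim_downloads(downloaded, [("pkg", "1.0"), ("dep", "3.1")])))
```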

Thoughts?

One immediate thought - deleting the downloads means we’ll need to download them again next time. Keeping them in the cache speeds up future installs.

Why isn’t this just the same as any other part of pip’s cache management?

Can we say with 100% certainty that the user will need to download them again? As mentioned, I am looking at this from the human’s perspective, not from pip’s.

Because it addresses the situation where pip downloads multiple versions of large packages, as mentioned in that issue.

No. It’s like any cache, it’s paying a cost upfront for potential future savings. From a human perspective it should be transparent and just mean that “things are faster the second time”. But like any tradeoff, sometimes a user wants to intervene (“where did all my disk space go?”) so having a manual override (pip’s cache management commands) is important. We do have that, though.

I thought that was a user being concerned that we did the downloads at all. Looking at a load of older versions isn’t avoidable, it’s how the new resolver works (as explained in that issue). But caching means that we may be able to serve the downloads from cache and avoid the web requests. So I see removing the cache entries as being actually detrimental to the user’s concern, rather than the other way around.

There are possible solutions here. Regrettably, they mostly need more work than a “quick fix”. One possibility I’ve considered is caching (package name, version) -> dependencies metadata. That would potentially be a lot less data to retain than the full downloads. But again, like any cache, it only helps on subsequent runs. And it’s non-trivial to implement, so I won’t get to it in the next week or two, certainly.
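A rough sketch of the (package name, version) -> dependencies cache idea follows, assuming a JSON file on disk and plain requirement strings. `DependencyCache` and its methods are hypothetical and not part of pip. A resolver would consult `get()` first, download a distribution only on a miss, and then `put()` the dependencies it found, so later runs would need only this small file rather than the full downloads.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


class DependencyCache:
    """Hypothetical on-disk cache: (package name, version) -> dependency list.

    Only the small dependency metadata is stored, not the distribution itself.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self._data: Dict[str, List[str]] = {}
        if path.exists():
            self._data = json.loads(path.read_text(encoding="utf-8"))

    @staticmethod
    def _key(name: str, version: str) -> str:
        # JSON object keys must be strings, so encode the (name, version) pair.
        return f"{name.lower()}=={version}"

    def get(self, name: str, version: str) -> Optional[List[str]]:
        """Return cached requirement strings, or None on a cache miss."""
        return self._data.get(self._key(name, version))

    def put(self, name: str, version: str, requires: List[str]) -> None:
        """Record the dependencies and persist the whole cache to disk."""
        self._data[self._key(name, version)] = list(requires)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")
```

As with any cache, this only helps on subsequent runs: the first run still has to download each candidate to learn its dependencies.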


Prompting the user is a no-go, as plenty of pip runs are done through CI or in other batch/noninteractive situations.

Can we say with 100% certainty that the user will need to download them again?

There are plenty of situations where a user would be deleting & recreating the same virtualenv repeatedly. For example, if they’re using tox and they also like to keep their local repo checkout clean. Or even if they’re just using nox, which recreates all its virtualenvs on every run by default.

pip can be told to “be quiet” so that can be addressed.

There’s a lot of ifs in there!

@pradyunsg reminded me of the improvements needed for pip cache. That seems to be a better place to address these issues.


I think you will need to invert this and say that pip can be asked to prompt; making what you’re proposing the default would cause too much breakage.
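One way to express that inverted default, sketched in Python with hypothetical names: `opted_in` stands for some explicit option asking pip to prompt (pip has no such option today), and prompting additionally requires an interactive terminal and no quiet mode, so CI and other batch runs never block waiting for input.

```python
import sys
from typing import Callable, TextIO


def should_prompt(opted_in: bool, quiet: bool, stream: TextIO = sys.stdin) -> bool:
    """Prompt only if the user explicitly asked for it and a human can answer."""
    return bool(opted_in and not quiet and stream.isatty())


def confirm_cleanup(
    opted_in: bool,
    quiet: bool,
    stream: TextIO = sys.stdin,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True only if the user was prompted and explicitly answered yes."""
    if not should_prompt(opted_in, quiet, stream):
        return False  # default: keep the files and never block on input
    answer = ask("Remove the package versions downloaded while resolving? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Keeping "no prompt, no deletion" as the fallback means existing scripts behave exactly as before unless someone opts in.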