Where is my cached Pip wheel coming from?

I was trying to investigate some ways to automate venv creation and updating, when I noticed something odd on my system. If I create a new venv:

$ python -m venv example_venv
$ example_venv/bin/python -m pip install --upgrade pip
Collecting pip
  Using cached pip-23.3.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Uninstalling pip-20.0.2:
      Successfully uninstalled pip-20.0.2
Successfully installed pip-23.3.2

Wait, cached? But there doesn’t appear to be anything like that in the cache:

$ example_venv/bin/python -m pip cache list | grep pip

shows nothing. I know that my system Python uses a hacked ensurepip and that it’s installing wheels from /usr/share/python-wheels, which gets copied to share/python-wheels within the venv. But that only contains the original Pip wheel, not the one for the upgrade:

$ ls example_venv/share/python-wheels/ | grep pip
$ ls /usr/share/python-wheels | grep pip

Where/how else does Pip cache wheels? How am I avoiding a fresh download for each new venv?

You could brute-force search for the .whl: find /usr /var /home -name 'pip*.whl'?

I couldn’t find it anywhere that way (or with a bunch of variations). The closest I found were wheels for other versions in other virtual environments (especially ones that have virtualenv installed), and the .dist-info folder for the upgraded Pip, both in the new virtual environment and in a built-from-source 3.11 (where I upgraded it with sudo as part of my experiments; the system Python is 3.8, and I’m not touching it until it’s time to upgrade the OS entirely).

Use strace to see what files pip accesses?

It looks like it’s copying the wheel out of an extension-less file in the http subdirectory of the cache. That file appears to embed the length-counted wheel along with some HTTP metadata.
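One way to confirm which cache entry holds the wheel, without strace, is to scan the cache for files that contain a zip archive mentioning the package's .dist-info directory. This is only a rough sketch: the function name blobs_embedding and the marker string are made up for illustration, and it relies only on the facts that a wheel is a zip file and that zip member names are stored uncompressed.

```python
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"  # local-file header that starts each zip (and wheel) member


def blobs_embedding(cache_root, marker):
    """Yield (path, offset) for files that appear to embed a zip containing `marker`.

    `offset` is where the first zip header starts, i.e. how many bytes of
    other data (such as HTTP metadata) precede the archive in the file.
    """
    for path in sorted(Path(cache_root).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()  # assumes cache entries are small enough to read whole
        start = data.find(ZIP_MAGIC)
        if start != -1 and marker in data[start:]:
            yield path, start
```

On Linux the default cache is under ~/.cache/pip, so something like blobs_embedding(Path.home() / ".cache/pip/http", b"pip-23.3.2.dist-info/") should print the blob that the install above was served from, if the guess about the format holds.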

The cache directory also contains a http-v2, documented like:

Changed in version 23.3: A new cache format is now used, stored in a directory called http-v2 (see below for this directory’s location). Previously this cache was stored in a directory called http in the main cache directory. If you have completely switched to newer versions of pip, you may wish to delete the old directory.

I assume that when I got the 23.3.2 wheel the first time, it wasn’t from an environment running 23.3.1, therefore it was put in the old cache.

pip cache list, even on 23.3.2, apparently only shows wheels from the wheels subfolder of the cache (which are stored as actual .whl files), not anything from http or http-v2. On the other hand, it seems that the HTTP cache (either version; http-v2 apparently just separates out HTTP metadata from the actual downloaded files) might contain any kind of file previously downloaded from PyPI, including sdists, READMEs, JSON-formatted listings of available versions etc. It’s not clear how Pip knows what wheels are available in the cache, but obviously it does. (Based on the subfolder names, I assume this is using some kind of database library.)
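A lookup like this does not necessarily need a database. A minimal sketch of one scheme that would produce subfolder names like these, assuming each entry is named after a SHA-224 hash of the request URL and nested under five single-character directories taken from the start of that hash (the layout, the function name http_cache_path, and the example URL are all assumptions here, not something confirmed in this thread):

```python
import hashlib
from pathlib import Path


def http_cache_path(cache_dir, url):
    """Guess where an HTTP cache entry for `url` would live (layout is assumed)."""
    digest = hashlib.sha224(url.encode()).hexdigest()
    # Assumed layout: five one-character directories, then the full digest as the file name.
    return Path(cache_dir).joinpath(*digest[:5], digest)
```

Under that assumption, Pip would never need to list the cache: when it is about to request a URL, it hashes the URL and checks whether the corresponding file exists. That would also explain why nothing in the cache has a recognisable file name.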

My understanding from the documentation is that wheels only contains locally-built wheels (i.e., from sdists retrieved from PyPI), not wheels that were directly provided by PyPI; and those are only in http/http-v2 instead. But I would argue that pip cache list ought to list the downloaded wheels as well. After all, is it not the purpose of the command to indicate what is cached and can thus be installed without a download?
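To see how much is cached outside what pip cache list reports, the top-level subdirectories can be tallied directly. A small sketch (summarize_cache is a made-up helper; it only walks the directory tree and makes no assumption about the file format):

```python
from pathlib import Path


def summarize_cache(cache_dir):
    """Return {subdir_name: (file_count, total_bytes)} for each top-level cache subdirectory."""
    summary = {}
    for sub in sorted(Path(cache_dir).iterdir()):
        if not sub.is_dir():
            continue  # ignore stray files at the top level
        files = [f for f in sub.rglob("*") if f.is_file()]
        summary[sub.name] = (len(files), sum(f.stat().st_size for f in files))
    return summary
```

The directory to pass in can be obtained from python -m pip cache dir. Comparing the entry for wheels with those for http and http-v2 shows how much of the cache the list command leaves out.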

Shouldn’t you have to activate your venv before installing things?

I have noticed myself that installing things I used before into a new venv often mentions a cache. This seems very reasonable behaviour to me.

On my machine (Windows), the cached wheels are in C:\Users\Jeff\AppData\Local\pip\cache\wheels.

“Activating a venv” just manipulates some environment variables so that the prompt changes, PATH changes and a deactivate command exists. Installing to the venv can be done just as easily by running the venv’s Pip using the venv’s Python, which in turn can be done by explicitly specifying the path to that Python and using -m pip instead of running Pip directly. That’s how I’ve done it in the example.
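For anyone who wants to check this, here is a small sketch that creates a venv and runs its interpreter by explicit path, with no activation, then reports the sys.prefix that interpreter sees. The helper names are made up, and with_pip=False is used only to keep the demonstration quick.

```python
import os
import subprocess
import venv
from pathlib import Path


def make_env(env_dir):
    """Create a bare venv (no pip bootstrapped, to keep this fast)."""
    venv.EnvBuilder(with_pip=False).create(env_dir)


def venv_python(env_dir):
    """Return the path to the interpreter inside a venv (layout differs on Windows)."""
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def prefix_seen_by(env_dir):
    """Run the venv's interpreter directly, without activating, and return its sys.prefix."""
    result = subprocess.run(
        [str(venv_python(env_dir)), "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The reported prefix should be the venv directory itself, which is why running that interpreter with -m pip install installs into the venv regardless of what PATH says.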

I think you’ve missed the point of the thread. The cache directory apparently contains wheels (or at least, binary blobs that include wheels) in other sub-folders, and Pip knows how to use those wheels for installation; but pip cache list only shows what’s stored in the wheels subdirectory (i.e., wheels that were built locally, not downloaded).

I know, but I would not feel safe in treating what I did next as representative of behaviour within the venv until these were set. It is, however, all magic to me.

Possibly. But you seemed to expect the cache only to be in a sub-directory of the venv, while I find it to be in a user-specific place outside the venv, and (in pip cache list) to contain things I installed in other versions of Python, and a long time ago. YMMV.

I never activate venvs and they always work without surprises.

The only reason to activate, as far as I know, is to be able to type “python” or “pip” and have the venv versions run.


I too use venvs extensively but never “activate” them. Most Python-based tools I use for day-to-day work (and there are dozens) are pip-installed into individual venvs and then I maintain a symlink farm to their entrypoints from a ~/bin directory that’s added to my default $PATH at login. It works just fine, really.
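A rough sketch of how such a symlink farm could be maintained, assuming a POSIX layout where each venv keeps its scripts under bin/ (link_entrypoint is a made-up helper, not the poster's actual tooling):

```python
from pathlib import Path


def link_entrypoint(venv_dir, name, bin_dir):
    """Symlink bin_dir/name to the script `name` installed in venv_dir/bin."""
    target = Path(venv_dir) / "bin" / name
    if not target.exists():
        raise FileNotFoundError(target)
    bin_dir = Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    link = bin_dir / name
    if link.is_symlink():
        link.unlink()  # replace a stale link left by an earlier run
    link.symlink_to(target.resolve())
    return link
```

This works without activation because the scripts pip generates start with a shebang line naming the venv's own interpreter, so running the symlink still runs the tool inside its venv.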
