Why does Pip need an internet connection to install from a local cache?

Normally, installing Numpy in a new venv looks like this for me:

Collecting numpy
  Using cached numpy-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Using cached numpy-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.3 MB)
Installing collected packages: numpy
Successfully installed numpy-2.0.0

This happens far faster than my internet connection could possibly manage, so I’m confident it’s indeed using a local cache.

But if I’ve disabled my wifi, or it has dropped out (as seems to happen quite a bit these days), installation fails, like so:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a70e20>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66d35180>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a712d0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a71480>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a71630>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
ERROR: Could not find a version that satisfies the requirement numpy (from versions: none)
ERROR: No matching distribution found for numpy

So I presume that normally pip is doing some kind of dependency resolution, checking which versions of Numpy are available. But couldn’t it at least try to see whether any of the wheels in the local cache would work? That seems a lot more useful than retrying a connection that failed with “Name or service not known” (i.e., DNS resolution itself failed, so we never even reached the index).

Similarly, if my Internet is down and I’m trying to pip wheel . (or pip install -e .) a local project that is pure Python and has no dependencies, I don’t think I should need to disable build isolation just to make pip use the setuptools already installed in its own environment. My ~/.cache/pip is full of wheels that satisfy setuptools>=40.8.0 by now, I’m sure. Surely it could just use one of them? Or at least consider looking at the corresponding cached metadata?
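For reference, the workaround being complained about looks something like this (the flag does exist in current pip; whether the build then succeeds offline depends on the build dependencies already being installed in the environment):

```shell
# Skip the isolated build environment, so pip reuses the setuptools
# already installed in the venv instead of fetching one from the index:
pip install --no-build-isolation -e .

# The same flag works for building wheels:
pip wheel --no-build-isolation .
```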


I don’t know if there are other reasons, but here’s one: if a release of the same version were re-uploaded to PyPI with a newer wheel build tag, pip would be expected to pick it up.

(You can use --find-links to explicitly install from a directory of saved wheels.)


Right, I just figured that out through my own research. (I also figured out that I can tell it not to check PyPI at all.) But the cache isn’t a suitable directory: e.g.

pip install --no-index --find-links `pip cache dir` numpy

fails, with or without Internet. (The research was also rather frustrating, because it seems like everyone else out there is trying to avoid using the cache rather than force it! I know I’m not the only person who sometimes loses Internet access…)

FWIW, I did manage to build a functional “wheelhouse” by copying wheels out of the cache directory, which could look like:

$ mkdir wheelhouse && cd wheelhouse
$ find `pip cache dir` -name "*.whl" -type f -exec cp '{}' . \;

Not very pleasant, though. It also didn’t end up having everything I thought it would (i.e., things I’ve recently installed in other venvs).
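The same idea as a Python sketch (build_wheelhouse is a name I made up for this post). Note that it only picks up files that actually end in .whl, i.e. locally built wheels, which may be why the result was missing things:

```python
import shutil
from pathlib import Path


def build_wheelhouse(cache_dir: str, dest: str) -> list[str]:
    """Copy every *.whl found under cache_dir into dest; return the copied names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for whl in Path(cache_dir).rglob("*.whl"):
        # Only files with a real .whl name are usable by --find-links.
        shutil.copy2(whl, dest_path / whl.name)
        copied.append(whl.name)
    return sorted(copied)
```

Usage would be something like build_wheelhouse(subprocess.check_output(["pip", "cache", "dir"], text=True).strip(), "wheelhouse"), after which pip install --no-index --find-links wheelhouse numpy can work, assuming a matching wheel was actually in the cache.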


Because the requests to https://pypi.org/simple/<package_name> are intentionally never cached: pip/src/pip/_internal/index/collector.py at 4d09e3c004857b3ba766cb93cdea8e1fc3c3c677 · pypa/pip · GitHub

However, the package files themselves are cached, as you can see in the logs. They live in pip’s HTTP cache under content-addressed file names without a .whl extension, which is also why pointing --find-links at the cache directory doesn’t work.
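You can see the split with pip’s own cache subcommands (available since pip 20.1; the exact subdirectory layout varies between pip versions):

```shell
pip cache dir    # where the cache lives
pip cache list   # wheels pip *built* locally -- these keep their .whl names
# Downloaded files sit in the HTTP cache under hashed names with no
# .whl extension, so --find-links cannot recognize them as wheels:
ls "$(pip cache dir)"
```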


I guess I’ll turn this into a feature request on the issue tracker, then.


What I do is run a devpi instance on my laptop, which handles the caching of both PyPI and any other index servers that I access. It also works really well when building containers, where pip starts with no local cache (since it’s a fresh container).

Have you tried uv? I can’t say whether it works offline, but it has put a lot of effort into caching.

uv has an --offline flag, which I think (not sure here) might even be applied automatically as a fallback when you’re only using pinned dependencies that the cache already has.

This is specific to pins, though: uv pip install numpy will always want the network to check versions (maybe an explicit --offline flag will tolerate staleness). uv pip install numpy==2.0.0, where 2.0.0 is in the cache, is the case where it’s safe-ish to skip.
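For the record, the flag mentioned above is spelled like this (whether the pinned install actually succeeds offline depends on the wheel already being in uv’s cache):

```shell
# Fail fast instead of retrying the network when it is down;
# resolves entirely from uv's local cache:
uv pip install --offline numpy==2.0.0
```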


The feature request already exists, from 2020 :slight_smile:

It has 17 thumbs-up, as a matter of fact. Unsurprisingly, though, no progress has been made on the issue.


Pradyun also mentioned the security implications once, IIRC: what if a project in the dependency tree were yanked, removed, or found to be malicious? Pip needs to consult the current state of the index to react to such revocations.
