Why does Pip need an internet connection to install from a local cache?

Normally, installing Numpy in a new venv looks like this for me:

Collecting numpy
  Using cached numpy-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Using cached numpy-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.3 MB)
Installing collected packages: numpy
Successfully installed numpy-2.0.0

This happens far faster than my internet connection could possibly manage, so I’m confident it’s indeed using a local cache.

But if I’ve disabled my wifi, or it has dropped out (as seems to happen quite a bit these days), installation fails, like so:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a70e20>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66d35180>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a712d0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a71480>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8c66a71630>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/numpy/
ERROR: Could not find a version that satisfies the requirement numpy (from versions: none)
ERROR: No matching distribution found for numpy

So I presume that normally pip is doing some kind of dependency resolution, checking which versions of Numpy are available. But couldn’t it at least try to see whether any of the wheels in the local cache would work? That seems a lot more useful than retrying a connection that failed with “Name or service not known” (i.e., DNS resolution itself failed, so we never even reached the index).

Similarly, if my Internet is down and I’m trying to pip wheel . (or pip install -e .) a local project that is pure Python and has no dependencies, I don’t think I should need to disable build isolation just to make pip use the setuptools already installed in its own environment. My ~/.cache/pip is full of wheels that satisfy setuptools>=40.8.0 by now, I’m sure. Surely it could just use one of them? Or at least consider looking at the corresponding cached metadata?
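For reference, the workaround being complained about looks something like this (the flag does exist in current pip; whether the build then succeeds offline depends on the build dependencies already being installed in the environment):

```shell
# Skip the isolated build environment, so pip reuses the setuptools
# already installed in the venv instead of fetching one from the index:
pip install --no-build-isolation -e .

# The same flag works for building wheels:
pip wheel --no-build-isolation .
```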


I don’t know if there are other reasons, but here’s one: if a release of the same version were re-uploaded to PyPI with a newer wheel build tag, pip would be expected to pick it up.

(You can use --find-links to explicitly install from a directory of saved wheels.)


Right, I just figured that out through my own research. (I also figured out that I can tell it not to check PyPI at all.) But the cache isn’t a suitable directory: e.g.

pip install --no-index --find-links `pip cache dir` numpy

fails, with or without Internet. (The research was also rather frustrating, because it seems like everyone else out there is trying to avoid using the cache rather than force it! I know I’m not the only person who sometimes loses Internet access…)

FWIW, I did manage to build a functional “wheelhouse” by copying wheels out of the cache directory, which could look like:

$ mkdir wheelhouse && cd wheelhouse
$ find `pip cache dir` -name "*.whl" -type f -exec cp '{}' . \;

Not very pleasant, though. It also didn’t end up having everything I thought it would (i.e., things I’ve recently installed in other venvs).
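The same idea as a Python sketch (build_wheelhouse is a name I made up for this post). Note that it only picks up files that actually end in .whl, i.e. locally built wheels, which may be why the result was missing things:

```python
import shutil
from pathlib import Path


def build_wheelhouse(cache_dir: str, dest: str) -> list[str]:
    """Copy every *.whl found under cache_dir into dest; return the copied names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for whl in Path(cache_dir).rglob("*.whl"):
        # Only files with a real .whl name are usable by --find-links.
        shutil.copy2(whl, dest_path / whl.name)
        copied.append(whl.name)
    return sorted(copied)
```

Usage would be something like build_wheelhouse(subprocess.check_output(["pip", "cache", "dir"], text=True).strip(), "wheelhouse"), after which pip install --no-index --find-links wheelhouse numpy can work, assuming a matching wheel was actually in the cache.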


Because the requests to https://pypi.org/simple/<package_name> are intentionally never cached: pip/src/pip/_internal/index/collector.py at 4d09e3c004857b3ba766cb93cdea8e1fc3c3c677 · pypa/pip · GitHub

However, the package files themselves are cached, as you can see in the logs. They live in pip’s HTTP cache under content-addressed file names without a .whl extension, which is also why pointing --find-links at the cache directory doesn’t work.
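You can see the split with pip’s own cache subcommands (available since pip 20.1; the exact subdirectory layout varies between pip versions):

```shell
pip cache dir    # where the cache lives
pip cache list   # wheels pip *built* locally -- these keep their .whl names
# Downloaded files sit in the HTTP cache under hashed names with no
# .whl extension, so --find-links cannot recognize them as wheels:
ls "$(pip cache dir)"
```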


I guess I’ll turn this into a feature request on the issue tracker, then.


What I do is run a devpi instance on my laptop, which handles the caching of both PyPI and any other index servers that I access. It also works really well when building containers, where pip starts with no local cache (since it’s a fresh container).

Have you tried uv? I can’t say whether it works offline, but it has put a lot of effort into caching.

uv has an --offline flag, which I think (not sure here) might even be applied automatically as a fallback when you’re only using pinned dependencies that the cache already has.

This is specific to pins, though: uv pip install numpy will always want the network to check versions (maybe an explicit --offline flag will tolerate staleness). uv pip install numpy==2.0.0, where 2.0.0 is in the cache, is the case where it’s safe-ish to skip.
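For the record, the flag mentioned above is spelled like this (whether the pinned install actually succeeds offline depends on the wheel already being in uv’s cache):

```shell
# Fail fast instead of retrying the network when it is down;
# resolves entirely from uv's local cache:
uv pip install --offline numpy==2.0.0
```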


The feature request already exists, from 2020 :slight_smile:

It has 17 thumbs-up, as a matter of fact. Unsurprisingly, though, no progress has been made on the issue.


Pradyun also mentioned the security implications once, IIRC: what if a project in the dependency tree were yanked, removed, or found to be malicious? Pip needs to consult the current state of the index to react to such revocations.
