Why does pip reach out to indices when a local file is available?

When I run pip install foo==0.1.0 and pip finds /local/path/to/foo-0.1.0-py3-none-any.whl, why doesn’t pip install it by default? At least in my case, it still spends network traffic and time querying indices, when I would have expected the default behavior to identify the local wheel as the BestCandidateResult and short-circuit checking elsewhere.

Some context: r/learnpython/comments/nkv159/why_does_pip_search_indexes_when_a_local_wheel_is/ (sorry, new user 2 link limit)

Tox was taking literally hours to run my testing suite on my speedy new M1 Mac, even though the project is laughably small and has minimal dependencies; in debugging, it looks like my local devpi server is running extremely slowly for some reason, roughly 20 seconds per package.

Regardless, I was surprised that devpi mattered at all; I had wheeled my dependencies prior to running my test suite, and I could see with pip install -vvv that it was finding these wheels. However, it was still querying the indices I’ve set up in my pip.conf, and for some reason it was still downloading the wheels from my devpi server even though a local wheel was available (still trying to figure this out).

When a specific version is requested, isn’t finding a compatible wheel on the local filesystem about as good as it gets? I know I can set --no-index and get this behavior, but it seems odd that by default pip searches indices for a file that it has already found locally.

Would it be reasonable to refactor get_applicable_candidates + relevant logic into a generator (instead of the sorted greediness) that looked through local filesystems first, and if a specific version is requested and a matching candidate is found, short-circuits the subsequent network requests?

Just imagining something like:

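import typing as t

def get_candidates() -> t.Iterator[Candidate]:
    ...
    # Yield local candidates first so they can be checked before any network lookup.
    for candidate in local_candidates:
        yield candidate
    for candidate in remote_candidates:
        yield candidate

def find_best_candidate() -> Candidate:
    candidates = []
    for candidate in get_candidates():
        # Stop early when a pinned version is satisfied by a local file.
        if specific_version_requested() and matches_requested_version(candidate) and is_local(candidate):
            return candidate
        candidates.append(candidate)
    # Otherwise fall back to the existing "best by sort key" behavior.
    return max(candidates, key=_sort_key)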

There are two parts in your question:

  1. Should pip not reach the Internet at all if a local file satisfies the requirement?
  2. Should pip always install from a local file instead of a remote one if they are identical?

The answer to the first one is no, because pip does not know whether it’s the best file available. There is more to consider than versions, for example wheel platform tags (a platform-specific wheel is preferred over none-any because the former indicates platform-specific optimisations). pip can only determine that a file is indeed the best if you specify the file by URL, or after it has inspected every possible source. So short of specifying the package by a file:// URL, the only way to block network access is to remove all non-local indexes.
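For reference, removing all non-local indexes for a single run can be sketched as below. This is only an illustration: the helper name offline_install_command is made up for this example, and the wheel directory is the path from the original post. The --no-index and --find-links options are pip's own.

```python
import sys
from typing import List


def offline_install_command(requirement: str, wheel_dir: str) -> List[str]:
    """Build a pip command that only looks at a local directory of wheels.

    Hypothetical helper for illustration; it is not part of pip.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # do not query any package index
        "--find-links", wheel_dir,  # look for archives in this local directory
        requirement,
    ]


cmd = offline_install_command("foo==0.1.0", "/local/path/to")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With --no-index, pip fails if the directory does not contain a compatible file for every requirement, which is the trade-off for never touching the network.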

The answer to the second one is… maybe? Pip does not currently prioritise files from any source when the files (or rather the file names) are identical. pip is intentionally not promising an order for security and other reasons, but there is a deterministic internal ordering (pip has to choose one somehow), so yeah, it makes sense to order local files in front of remote ones as an implementation detail. I guess nobody has ever raised that issue before. Feel free to propose an enhancement PR on it.

Thanks for your time and response!

Why is that? It looks like the _sort_key returns:

return (
    has_allowed_hash, yank_value, binary_preference, candidate.version,
    pri, build_tag,
)

So wouldn’t a candidate that maximized the value of each of these by definition maximize (or at least be equal to) the max of the sorted call that is currently used?

Thanks for pointing out the priority for platform-specific tags, but I’m sure you can imagine using that as an example instead of my -none- example above. If there is no hash specified (or if the hash matches? I haven’t played with this yet), a platform specific wheel exists locally, and the version matches, why shouldn’t that short-circuit?

pip is intentionally not promising an order for security

That makes sense, thanks.

Yes, but the version and build tag components have no upper bound, so no such candidate exists. (Note that more than one version can match ==0.1.0 according to PEP 440.)
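To make the build tag point concrete, here is a minimal sketch using a heavily simplified sort key. The Candidate tuple, simplified_sort_key, short_circuit_pick, and both filenames are hypothetical stand-ins rather than pip's actual classes; the only thing it is meant to show is that a local file matching the pinned version can still lose to a remote file with a higher build tag.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    filename: str
    version: Tuple[int, ...]
    build_tag: tuple  # () when the wheel has no build tag, else (number, suffix)
    is_local: bool


def simplified_sort_key(c: Candidate) -> tuple:
    # Assumption: only version and build tag are modelled here;
    # the real key has more components.
    return (c.version, c.build_tag)


def short_circuit_pick(candidates: List[Candidate], wanted: Tuple[int, ...]) -> Candidate:
    # The proposed behavior: return the first local file with the wanted version.
    for c in candidates:
        if c.is_local and c.version == wanted:
            return c
    return max(candidates, key=simplified_sort_key)


local = Candidate("foo-0.1.0-py3-none-any.whl", (0, 1, 0), (), True)
remote = Candidate("foo-0.1.0-1-py3-none-any.whl", (0, 1, 0), (1, ""), False)

candidates = [local, remote]
best = max(candidates, key=simplified_sort_key)
early = short_circuit_pick(candidates, (0, 1, 0))

print(best.filename)   # the remote wheel with build tag 1
print(early.filename)  # the local wheel without a build tag
```

Both files have version 0.1.0, yet the full comparison and the short-circuit disagree, because there is always room for a higher build tag somewhere pip has not looked yet.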

Fair enough. I had been explicit that this only applies when a specific version is requested, but I didn’t realize that build tags are intended specifically for tie-breaking between files with an identical version.

That seems pretty damning for this idea.