Why doesn't `pip download` accept mixed binary/source-code downloads?

On some occasions, I have needed to prepare wheelhouses for distribution to environments that have neither internet access nor access to any PyPI-compatible index. As per pip’s documentation, a wheelhouse can be built using either pip wheel or pip download. Generally, I do these steps separately, because that makes it easier to distribute and use the result, and to distinguish the package itself from its dependencies. As most of my packages are pure Python, I’m mostly concerned with the dependencies themselves (transitive or otherwise).

If you issue a pip download with no platform tags, pip uses the current environment’s platform to determine which built distributions to download. It appears to prefer built distributions, and if none is found, a source distribution is downloaded instead. Then, all you need to do is run pip install with the list of all distributions. Following the documentation, here is an example:

PROJECT_ROOT="$(pwd)"
cd "${PROJECT_ROOT}"
mkdir -p dist/wheelhouse
pip3 download -d dist/wheelhouse .

Then, to install:

# Just an example. Could be done some other way as well (e.g. user/global install)
VENV_DIR="$(pwd)/venv"
python3 -m venv "${VENV_DIR}"
cd "${PROJECT_ROOT}/dist/wheelhouse"
"${VENV_DIR}/bin/pip3" install --no-deps --no-index --force-reinstall ./*

The problem is that, most (if not all) of the time, environments differ drastically from one another. Developers may be on Windows, Mac or Linux, and CI runners are mostly Linux. The final (app) execution environment may even be ARM-based. So, running pip download without platform tags is not enough. It’s not possible to target every environment, so for the most part I worry about the final environments (this means Linux, and x86_64/ARM).

For most of the projects I have worked on, we implemented some sort of “package release train,” with the intention of providing a specific list of packages at specific pinned versions. This is extremely useful when provisioning Lambda layers, EMR clusters and whatnot, and speeds up deployments a great deal, on top of addressing the earlier restriction of no internet access. For that, though, I need to specify the whole platform tag set. Updating the pip download call, this is what I imagine I could use, given that I know the final environment’s platform with certainty:

# Again, this is just an example.
# Why isn't this possible?
# More than one wheelhouse will be created, one for each combination of platform flags.
PROJECT_ROOT="$(pwd)"
cd "${PROJECT_ROOT}"
mkdir -p dist/wheelhouse
pip3 download \
  -d dist/wheelhouse \
  --implementation cp \
  --python-version 37 \
  --abi cp37m \
  --platform manylinux2014_x86_64 \
  .

Alas, this gives me an error:

ERROR: When restricting platform and interpreter constraints using --python-version, --platform, --abi, or --implementation, either --no-deps must be set, or --only-binary=:all: must be set and --no-binary must not be set (or must be set to :none:).

I would prefer to download only binary distributions, but the reality is that not all upstream dependencies provide one for the platforms I need. Also, most of the time, I can’t commit to either --only-binary=:all: or --no-binary=:all:, because some dependencies are provided as source-only, others as binary-only, etc.
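The error message names one other escape hatch besides --only-binary=:all:, namely --no-deps. A minimal sketch of how that could be used, assuming the complete dependency set (including transitive dependencies) is already pinned, as in a release train: build one `pip download --no-deps` invocation per pin. The helper name `build_download_commands` and the example pins are hypothetical, and this is untested against every kind of package; in particular, pip may still need to run an sdist's build backend on the host to prepare its metadata.

```python
import shlex


def build_download_commands(pins, dest, python_version, abi, platform,
                            implementation="cp"):
    """Return one `pip download --no-deps` argv list per pinned requirement.

    Assumption: `pins` already lists every package needed, because
    --no-deps means pip will not resolve dependencies for us.
    """
    common = [
        "pip3", "download", "--no-deps",
        "-d", dest,
        "--implementation", implementation,
        "--python-version", python_version,
        "--abi", abi,
        "--platform", platform,
    ]
    return [common + [pin] for pin in pins]


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    pins = ["somepkg==1.2.3", "otherpkg==4.5.6"]
    commands = build_download_commands(
        pins, "dist/wheelhouse", "37", "cp37m", "manylinux2014_x86_64"
    )
    for argv in commands:
        # Print the commands rather than running them.
        print(shlex.join(argv))
```

The commands are only printed here; running them (for example with `subprocess.run(argv, check=True)`) is left to the caller, since whether a given sdist downloads cleanly depends on the package.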

Regardless, why does it matter to pip that it downloads only binary distributions in this specific case? I imagine there’s absolutely no issue if I distribute both in the wheelhouse, as I won’t install them locally, and I know for sure I’ll be able to install the source distributions in the environment the wheelhouse is destined for. Even the documentation for pip install --find-links (pip documentation v22.2.2) hints at this kind of support:

If a URL or path to an html file, then parse for links to archives such as sdist (.tar.gz) or wheel (.whl) files. If a local path or file:// URL that’s a directory, then look for archives in the directory listing. Links to VCS project URLs are not supported.

So, my question is: Is there any way or plan to support the download of mixed source-binary distributions in the case of a platform-specific resolution, via pip download (or even pip wheel)?

Source distributions have no concept of a platform/abi/implementation, so specifying those things for a source distribution download makes no sense.
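To illustrate the point: a wheel's filename carries its Python, ABI and platform tags, while an sdist's filename carries none. A minimal sketch, using hypothetical filenames and a hypothetical helper `parse_wheel_tags` (real tools would typically use the `packaging` library instead of splitting strings by hand):

```python
def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) for a wheel filename.

    Returns None for anything that is not a .whl file, such as an sdist,
    because those filenames do not encode compatibility tags.
    """
    if not filename.endswith(".whl"):
        return None
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag
    # between version and python tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


if __name__ == "__main__":
    for name in (
        "example_pkg-1.0-cp37-cp37m-manylinux2014_x86_64.whl",
        "example_pkg-1.0-py3-none-any.whl",
        "example_pkg-1.0.tar.gz",
    ):
        print(name, "->", parse_wheel_tags(name))
```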

@pf_moore I know, but it shouldn’t matter anyway for this use case. All I want to do is prepare a wheelhouse, with source distributions if no built distribution is available for a specific package.

Pip is already capable of resolving to the source distribution when necessary, in the host platform; all we need to do is enable it to do the same for a cross-platform scenario – which I suppose shouldn’t be that difficult.

Maybe, but that doesn’t mean people won’t file issues about it. :wink: I suspect there’s a question of what do users expect? If you specify those flags, do users expect everything to work on e.g. an air-gapped system? In that case you can’t make that promise with an sdist as people quite possibly don’t have the build setup as appropriate. But if people just expect best effort then that aligns more with your desires.

I’m puzzled by your suggestion here. A wheelhouse (the clue’s in the name :slightly_smiling_face:) should only contain wheels and is platform-specific. Yes, you can use pip download (with the --only-binary :all: option) to download wheels, but including sdists is a very different matter. And contrary to what you seem to be doing, I don’t believe the pip documentation ever suggests including sdists in a wheelhouse. If nothing else, you’d need to include build backends in the wheelhouse to support building the sdists, and pip won’t automate that for you at all.

I can see why you might want to create a directory of mixed sdists and wheels in the way you suggest, but it’s not really an operation that’s directly supported by pip. You might find that pip install --dry-run --report will give you the information you need to script a solution, though - the report file is machine-readable JSON and gives URLs for everything pip would download.
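A minimal sketch of what such a script could start from, assuming the report was produced with something like `pip install --dry-run --ignore-installed --report report.json .` and that each entry under `install` has `download_info.url` plus `metadata.name` and `metadata.version` (my understanding of the report format; check it against your pip version). The function name `split_report` is hypothetical.

```python
import json


def split_report(report):
    """Split a pip installation report into wheel and non-wheel entries.

    Each entry is a (name, version, url) tuple. Anything whose URL does
    not end in .whl (sdists, but possibly also VCS or directory URLs)
    lands in the second list.
    """
    wheels, others = [], []
    for item in report.get("install", []):
        url = item["download_info"]["url"]
        meta = item["metadata"]
        entry = (meta["name"], meta["version"], url)
        # Ignore any fragment or query string when checking the suffix.
        path = url.split("#", 1)[0].split("?", 1)[0]
        if path.endswith(".whl"):
            wheels.append(entry)
        else:
            others.append(entry)
    return wheels, others


if __name__ == "__main__":
    with open("report.json", encoding="utf-8") as fh:
        wheels, others = split_report(json.load(fh))
    print("wheels:", *wheels, sep="\n  ")
    print("not wheels:", *others, sep="\n  ")
```

Note that the report describes what pip would install for the environment it runs in, so on its own it does not address the cross-platform part of the question.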

I suspect there’s a question of what do users expect?
(…) In that case you can’t make that promise with an sdist as people quite possibly don’t have the build setup as appropriate.

What I suggest doesn’t really change what those commands already do.

Expectations could be established, and clarification provided, in the documentation, just as is already done for wheelhouses and for the installation commands in general. Better yet would be to issue a warning at resolution/download time, either immediately (i.e. when using platform flags without --only-binary etc.), and/or when a source-only distribution is found for a dependency.

Or, if there’s still opposition to that at a conceptual level, a new flag could be introduced, to allow for download of “mixed-mode” distributions, and/or in substitution to --only-binary/--no-binary, e.g. --allow-sources.

Overall, changes in both expectation and code should be minimal, but the benefits would be immense.

If you specify those flags, do users expect everything to work on e.g. an air-gapped system?

Ultimately, the user is the one in control of the environment. That definition does not come from Pip, but from the user, so it’s not desirable to assume too much, IMHO. Again, in a normal installation, Pip already uses sdists when a bdist is not found, and it already expects a proper build environment is in place.

Back to the “partial” wheelhouse, the user need only be aware that there’s an sdist in there. This could be resolved with warnings, or by the users themselves (with a simple `find . -name '*.tar.gz'`, etc.).
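The same check can be sketched in Python for use in a packaging script. The helper name `find_sdists` and the suffix list are assumptions for illustration; the list covers common sdist archive extensions and is not exhaustive.

```python
from pathlib import Path

# Assumed set of sdist archive suffixes; extend as needed.
SDIST_SUFFIXES = (".tar.gz", ".zip", ".tar.bz2")


def find_sdists(wheelhouse):
    """Return the sorted names of files in `wheelhouse` that look like sdists."""
    return sorted(
        p.name
        for p in Path(wheelhouse).iterdir()
        if p.is_file() and p.name.endswith(SDIST_SUFFIXES)
    )


if __name__ == "__main__":
    for name in find_sdists("dist/wheelhouse"):
        print(f"warning: {name} is a source distribution and must be built on install")
```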

Pip’s job is to be a frontend, and manipulate/install packages. The user’s job is to provide a proper environment for installation. Trying to assume or control too much of the user’s side of things needlessly hurts users themselves, and makes perfectly valid use cases impossible (or hard) to implement.

But if people just expect best effort then that aligns more with your desires.

That’s what I expect of Pip, in general. No software is a silver bullet. :stuck_out_tongue:

I’m puzzled by your suggestion here. A wheelhouse (the clue’s in the name :slightly_smiling_face:) should only contain wheels and is platform-specific.

Can we introduce a new term, then? I suggest “partial wheelhouse”, or “disthouse”. Regardless: the command to generate them is the same (pip wheel/download), the command to install dists is the same (pip install), and the difference of house types is only in semantics, honestly.

I can see why you might want to create a directory of mixed sdists and wheels in the way you suggest, but it’s not really an operation that’s directly supported by pip.

Can we introduce said support, then? I’m sure it can be a valuable tool for users. It will greatly reduce the amount of boilerplate one needs to write, at least for those use cases I presented. What do you think?

You might find that pip install --dry-run --report will give you the information you need to script a solution, though - the report file is machine-readable JSON and gives URLs for everything pip would download.

Thanks for the suggestion, I didn’t think that was a possibility. I’ll investigate that!

But I still think this is a bit too much, and will require quite a lot of additional scripting. It could be made simpler, as I suggested.

IIUC, this whole discussion is about the following, but in a trenchcoat?

It is a tractable problem for sure, assuming we’re able to do the stuff discussed at “Resolution for TargetPython != current python evaluates markers against current python” (pypa/pip issue #10050; this is a link to a comment from the issue mentioned above, with relevant discussion after it).

IIUC, this whole discussion is about the following, but in a trenchcoat?

Well, it is not. (But please do correct me if I’m wrong in my understanding.)

If I understood correctly, the issue you linked proposes an overhaul of how Pip treats/understands compatibility tags and environment markers (as per PEP 508) internally, as well as a redesign of the related CLI flags. Since I touch on the topic of cross-platform resolution/tags, I do see how this issue could be related, especially with regard to how markers should be treated in the presence of --platform, etc. Not all environment properties can be represented or inferred from a compatibility tag or flags alone, and somewhere in the dependency chain there could be a dependency that checks a very specific environment property that is not available (e.g. platform_release, platform_version, etc.). Environment values could (and would) be different as well, and this would lead to a different resolution for the target than one done in a proper, native environment. In practice, though, it is highly uncommon to check anything that cannot be safely inferred from a compatibility tag, at least in my experience, although I admit I don’t have much experience yet with Python’s ecosystem.

But again, Pip can resolve dependencies in a cross-platform manner today, and it works reasonably well. I’m not asking for any changes regarding that. I just wanted the restriction on sdists to be removed. This does not require any changes to how Pip does resolution today – and it may even make resolution more flexible, when it would otherwise just fail.

Despite the issue you linked being tangentially related to what I asked, I’m very interested in the development of that proposal. I have many scenarios where such a thing would be useful (I’ve written quite a few “version lock” scripts already, I admit!).

Having to lock/pin versions for a specific target environment is something developers do themselves, or introduce to the project somehow, at least in general, e.g. when a new dependency is introduced. And not all developers have ready access to all the environments where the software will run. Version locking should not require a full-blown installation and pip freeze… Having a way to programmatically update some constraints file (e.g. one per compatibility tag), using just PyPA tools (pip, and even pip-tools, etc., i.e. without requiring a “project manager”) would be awesome. I would rather stay with setuptools (and some small companion scripts only when necessary), to be honest.

But again, this is not what I’m advocating for, in this discussion. But it IS a topic that interests me a lot. So, thanks for sharing! :slight_smile:

Feel free to create a PR for that, if you think it would be useful. I don’t promise that it will be accepted, as I still don’t completely understand how having an sdist in the downloaded set of files, but none of its build dependencies, would help. But that might be easier to thrash out when there’s working code to discuss.

Please understand that the issue here (at least for me) isn’t technical - yes, the behaviour you’re suggesting could be implemented. My concern is the scope question. While pip doesn’t have an easily defined scope, we do try to be careful to keep things focused. We’re only a small maintenance team, and we already have difficulty managing the wide range of use cases pip supports, so we tend to be cautious about accepting new functionality that doesn’t have a clearly understandable use case, or which appears to be only relevant for one user[1].


  1. Or a fairly small number of users.

Fair enough. I understand your point of view.

I’m not familiar with Pip’s codebase, but I’ll study it and try my hand at a PR, as soon as I get the time. Thanks for your input!