Bootstrapping a specific version of pip

Hi, I’m a developer for the Spack package manager. Spack is a general-purpose package manager, similar to Conda in the sense that it can build both Python and non-Python libraries. In the past, our Python library installation procedure was basically:

$ python setup.py build
$ python setup.py install --root=...

This made it easy to install libraries like setuptools/wheel/pip without having to rely on an existing pip installation. However, we were recently informed that direct invocation of setup.py is now deprecated.

In order to convert our Python build system to use pip instead, we first need to figure out how to bootstrap pip. I’ve read through pip’s installation instructions, but I don’t see a way to specify which version of pip gets installed with either ensurepip or get-pip.py.

Spack is designed for air-gapped systems without internet access, so we need to be able to download the appropriate source code ahead of time. We also need a stable checksum for any downloads. When looking at get-pip.py, I don’t see a version-specific URL, so I assume the checksum of this download changes after every new release?
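
A minimal sketch of the kind of check this implies, assuming SHA-256 and hypothetical helper names (`sha256_of`, `verify_download`). It is an illustration, not Spack's actual fetch code:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return the path if it matches the pinned checksum, else raise ValueError."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return Path(path)
```

The expected digest would be recorded once next to the pinned version, which only works if the file behind the URL never changes.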

TL;DR: what’s the recommended way to bootstrap pip if I need a specific version (reproducibility) and a stable download checksum (security)?

Can you use the releases from pip · PyPI?

Yes, I can download the source code directly. But how do I install that source code without running python setup.py install? There seem to be hints in the get-pip.py GitHub that suggest I can pass a directory containing the source code to install offline, but how do I download a stable version of get-pip.py whose checksum won’t change? The GitHub has tags for specific versions, but the README says:

You should not directly reference the files located in this repository and instead use the versions located at https://bootstrap.pypa.io/.

If you are OK with using a pre-built wheel, you can call pip from inside the wheel to install the wheel itself. Otherwise you’ll have to orchestrate setuptools and wheel in a manner similar to nixpkgs/default.nix at 21.11 · NixOS/nixpkgs · GitHub. get-pip isn’t intended for packagers.

This is news to me! Can you share more details?

~ $ python -m venv --without-pip venv
~ $ source venv/bin/activate.fish
(venv) ~ $ curl -O https://files.pythonhosted.org/packages/a4/6d/6463d49a933f547439d6b5b98b46af8742cc03ae83543e4d7688c2420f8b/pip-21.3.1-py3-none-any.whl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1683k  100 1683k    0     0  1899k      0 --:--:-- --:--:-- --:--:-- 1897k
(venv) ~ $ python pip-21.3.1-py3-none-any.whl/pip install pip-21.3.1-py3-none-any.whl
Processing ./pip-21.3.1-py3-none-any.whl
Installing collected packages: pip
Successfully installed pip-21.3.1
(venv) ~ $ python -m pip --version
pip 21.3.1 from /Users/me/venv/lib/python3.9/site-packages/pip (python 3.9)

AFAIK this is what get-pip does as well with some extra steps to install setuptools and wheel.
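
For scripting this, the same invocation can be sketched in Python. The helper names are hypothetical, and `--no-index` and `--no-deps` are additions (not part of the transcript above) to keep pip from contacting PyPI:

```python
import subprocess
import sys


def wheel_bootstrap_command(wheel_path, python=sys.executable):
    """Build the argv that runs pip from inside its own wheel to install that wheel."""
    return [
        python,
        f"{wheel_path}/pip",  # the pip package inside the wheel (a zip file)
        "install",
        "--no-index",  # never look at a package index (offline install)
        "--no-deps",   # install only the wheel that was passed in
        str(wheel_path),
    ]


def bootstrap_pip(wheel_path, python=sys.executable):
    """Run the bootstrap command; raises CalledProcessError if pip fails."""
    subprocess.run(wheel_bootstrap_command(wheel_path, python), check=True)
```

Calling `bootstrap_pip("pip-21.3.1-py3-none-any.whl")` from the target environment's interpreter should then be equivalent to the manual step.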

That wheel installation method is very interesting. We generally try to avoid wheels in Spack and always build from source, but since pip is pure-Python I think it would be okay. Does this method require that setuptools is already installed, or does pip vendor a copy of setuptools?

You could always unpack the pip wheel manually: PEP 427 -- The Wheel Binary Package Format 1.0 | Python.org

No. setuptools is a build tool, and since pip is already “built” into a wheel, there’s no need for setuptools. What it’s doing is executing the zip file so that it unpacks itself and copies the files to the proper locations.

Nope. All of pip’s vendored code can be found at pip/src/pip/_vendor at main · pypa/pip · GitHub .
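
To check what a given wheel actually vendors without installing it, something like the following sketch should work, since a wheel is a zip file. The function name is hypothetical, and the `pip/_vendor/` prefix assumes the layout of the pip wheel:

```python
import zipfile


def vendored_names(wheel_path, prefix="pip/_vendor/"):
    """Return the sorted top-level entries found under `prefix` inside a wheel."""
    names = set()
    with zipfile.ZipFile(wheel_path) as whl:
        for member in whl.namelist():
            if member.startswith(prefix):
                # Keep only the first path component below the prefix.
                top = member[len(prefix):].split("/", 1)[0]
                if top:
                    names.add(top)
    return sorted(names)
```

The result mixes package directories (such as `pkg_resources`) with single-file modules, so `"pkg_resources" in vendored_names(path)` is enough to answer the question for one wheel.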

Note that treating a wheel as a zipped application (which is what this is doing) is an accident of the fact that wheels are (currently) zipfiles. Future wheel formats may not support this. But it does work now, and as noted get-pip uses this method, so it’s unlikely to “just stop working”.

But of course the wheel format is explicitly designed to be easy to unpack “by hand”, so if all you want is to bootstrap pip from a specific wheel, that’s also an option.
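
A minimal sketch of unpacking “by hand”, assuming a pure-Python wheel whose contents all belong in one site-packages directory. The function name is hypothetical, and this deliberately skips what a real installer does beyond copying files (generating the `pip` console script from the entry points, handling a `.data` directory, updating `RECORD`):

```python
import zipfile
from pathlib import Path


def unpack_wheel(wheel_path, target_dir):
    """Extract a pure-Python wheel into `target_dir` and list what landed there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(wheel_path) as whl:
        # A wheel is a zip archive; for a pure-Python wheel the package and
        # its .dist-info directory sit at the archive root.
        whl.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

With the package on disk, pip should be runnable as `python -m pip` with `target_dir` on the import path, even though no `pip` executable was created.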

But pip uses setuptools at run-time…

It looks like they vendor pkg_resources (which comes from setuptools) so maybe that’s all that is needed? I’ll give this a shot and see if I can successfully install things with pip if setuptools is not installed.

You will be able to install wheels, definitely.

If you want to install from source, the first thing pip will do is build the source, which it will do by setting up an isolated environment, installing the build tools in there, and then doing the build. That step of installing the build tools is where pip might want to download setuptools - the build tool is specified by the project, and traditionally has always been setuptools, but alternatives like flit exist and are used by some projects.

If you want to build Python projects from source on an airgapped system, you will always have problems, as projects can require arbitrary packages to be installed for the build (and before PEP 518 was developed, the only way to know what was needed was by reading the docs). Without knowing how you handle this in your existing process, it’s hard to advise here.

To give an example, suppose package X ships just a sdist. To build that sdist, you need setuptools and wheel, because it uses a setup.py file. But in that setup.py file, it imports numpy. And it also requires Cython as part of the build process. It also uses setuptools-scm to build. How does your existing build system ensure that the right versions of setuptools, wheel, setuptools-scm, numpy and Cython are available for the build, without requiring those packages to be installed system-wide?

But all of this only happens on the build machine (where you build the Spack packages). It should not be needed at install time, at which point I’d expect that you should only be dealing with prebuilt binaries.

Great questions!

In Spack, like Conda, we encode a list of dependencies for a package. Spack has a concept of build-time dependencies (that need to be in the PYTHONPATH when building the package), and run-time dependencies (that need to be in the PYTHONPATH when importing the installed package). For things like setuptools and Cython, these tend to be build-only deps, while things like numpy tend to be both build- and run-time deps. Like Nix, all packages get installed to a unique hashed prefix, allowing us to do things like build numpy with multiple different compilers and BLAS/LAPACK libraries.

So basically, finding dependencies isn’t an issue (although it’s tedious for whoever writes the build recipe). The only issue is bootstrapping the minimum set of Python libraries (pip, setuptools, flit, poetry, etc.) needed to build other Python libraries.

OK, so you probably want --no-build-isolation. You can then manage all of the build dependencies yourself as you do at the moment.
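
As a rough sketch, the install step could then be driven like this. The function name and the choice to pass `--prefix` are assumptions for illustration; `--no-index` and `--no-deps` are added so that pip never tries to download or resolve anything itself:

```python
import sys


def offline_install_command(source_dir, prefix, python=sys.executable):
    """Build a pip argv that installs `source_dir` using only what is already importable."""
    return [
        python, "-m", "pip", "install",
        "--no-build-isolation",  # build with the current environment's tools
        "--no-index",            # never contact a package index
        "--no-deps",             # the package manager resolves dependencies
        "--prefix", str(prefix),
        str(source_dir),
    ]
```

The build backend (setuptools, wheel, and so on) has to be importable when this runs, because pip no longer sets up an isolated build environment for it.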

And to go back to your original question: no, pip doesn’t need anything extra installed, not even setuptools. You only need setuptools in the sense that it’s a build dependency; pip never uses it in any other way (apart from the vendored pkg_resources, which you’re already aware of and which is part of the pip wheel, so you’re covered there).

Each revision of get-pip.py for a pip release[1] has a corresponding tag: Tags · pypa/get-pip · GitHub

Honestly, I think that should be a much better way than anything described here so far (relying on a detail of the wheel spec, or on the detail that pip is built with setuptools [2]). The only situation where these tags might be off is for versions where we had bugs in the get-pip.py script that needed fixing.

I’d prefer we get issues for those cases, so that we can fix those tags for all users.

I’d like to elaborate on this: pip will work just fine without setuptools or wheel in the environment. What won’t work are certain projects that have been written with the assumption that setuptools will be installed in the environment pip is run in. This is such a common assumption that every supported mechanism to install pip will also install setuptools and wheel. To be clear, this is for situations where a project is being built via the “legacy” mechanisms, i.e. without any of the build environment isolation. The “modern” way of doing things via a pyproject.toml file has a better story for separating build and runtime environments, at the cost of needing to expose that complexity to users.

In Spack, if you’re certain that you’ll get the build-time dependency declarations correct (or want failures when they’re not), don’t give pip any runtime dependencies and make sure setuptools/wheel don’t end up in the build environment unless explicitly specified.


  1. At least, since I automated the release process for that. ↩︎

  2. Which will generate the script shims incorrectly, last I checked. ↩︎

Correct me if I am wrong, but get-pip will not install the copy of pip that it bundles. If you want to install a specific version of pip offline, how will get-pip help you with that?

If you are uncomfortable with treating the wheel as a zipapp, you can simply unpack it and run pip from disk - the net effect is the same - you just call pip to install the wheel you’ve just unpacked. To me, that’s the most straightforward way to install pip.

So I’ve dug into this a bit further. I don’t love the wheel approach, so I was trying to do what Nix is doing and build everything from source. In order to build and install pip from source, you need the source code for pip, setuptools, and wheel. In order to run pip install, wheel is also needed (unless there’s a way to tell pip to just install without building a wheel).

Nix (and get-pip) seem to take the approach of installing pip/setuptools/wheel at the same time to the same installation prefix. We try to avoid this in Spack because it prevents you from changing the version of one package without changing the others too.

I think what I’m going to try is:

  1. Install pip by adding the source code for pip/setuptools/wheel to the PYTHONPATH
  2. Install wheel using pip by adding the source code for wheel to the PYTHONPATH
  3. Install all other Python packages by adding a build dependency on pip and wheel (and any other deps)
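
The PYTHONPATH handling in these steps could be sketched as follows. The function name is hypothetical and the stage paths in the usage note are placeholders, not real Spack locations:

```python
import os


def env_with_pythonpath(paths, base_env=None):
    """Return a copy of the environment with `paths` prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    entries = [str(p) for p in paths]
    existing = env.get("PYTHONPATH")
    if existing:
        # Keep whatever was already there, but after the bootstrap sources.
        entries.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env
```

For step 1, the idea would be to pass something like `env_with_pythonpath(["/stage/pip/src", "/stage/setuptools", "/stage/wheel/src"])` as the `env` argument of `subprocess.run` when invoking `python -m pip install` on the pip source tree.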

Does that seem reasonable? I’m sure poetry/flit will need their own bootstrapping procedures, but I think these 3 steps should be sufficient for all setuptools-based packages. For example, setuptools itself can be built with pip/wheel without adding the source code for setuptools to the PYTHONPATH first.

You’re right, but note that any of those packages can change its build backend or build dependencies at any point, so while that approach might work today, it’s not guaranteed to work tomorrow.