How do wheels find their dependencies?

Hi Python community,

I’m trying to understand some things about installation from wheels.

[1] How does a wheel find the dependencies it needs to run? I’ve built a wheel which depends on C/Python libraries (which get listed in the wheel as requirements), but the final wheel does not contain the locations of the dependencies (I inspected the wheel with wheel-inspect).

[2] Following up on the above: are these dependencies linked to at run time after a wheel is installed, or during the installation phase?

[3] Is there any difference between installing a wheel (from a .whl file) using pip and using python setup.py install?

PS: I’m trying to understand this because I’m figuring out how the tensorflow build process works, so as to integrate it into the spack package manager. Bazel builds a wheel using some versions of C/Python libraries, viz. numpy/absl/…, which is then installed via python setup.py install in the current spack recipe for py-tensorflow. I’ve had to hack the spack compiler wrappers and remove automatic include paths from packages in the build dependencies, so I want to verify that everything works as I expect it to, and that the wheel that is built and installed links against the libraries we expect it to link to.

Thanks in advance for the help!

Details can be found via https://www.python.org/dev/peps/pep-0427/ and https://packaging.python.org/specifications/core-metadata/.

@brettcannon: Thanks for pointing out the documentation for the wheel format!

As per the above documentation, wheels carry a list of packages required for the wheel to run (along with version and platform tags). If I understand correctly, a wheel is a built package distribution format (as opposed to sharing the source code directly), and it’s the job of the installer (or package manager) to ensure that the dependencies are present when installing the wheel. What I would like to understand is how a package manager for Python like pip, or a build system like setuptools, does the job of dependency management.
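
For instance, since a wheel is just a zip archive, the requirement list can be read straight out of its metadata; a minimal sketch (the “mypkg” names are placeholders for whatever wheel is being inspected):

    # A wheel's core metadata lives at <name>-<version>.dist-info/METADATA
    # inside the archive. "mypkg" is a placeholder -- substitute your own wheel.
    import zipfile

    with zipfile.ZipFile("mypkg-1.0-py3-none-any.whl") as whl:
        meta = whl.read("mypkg-1.0.dist-info/METADATA").decode()

    for line in meta.splitlines():
        if line.startswith("Requires-Dist:"):
            print(line)  # one line per requirement, with version constraints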

Coming from a mindset of compiling C libraries, perhaps I have the wrong perspective when I look for the equivalents of static and dynamic objects (dependency resolution at compile time or at runtime?) and of a linker (how does pip/setuptools link a wheel to its dependencies and ensure that dependency versions satisfy the requirements set in the wheel metadata?). If this works differently for Python packages, is there a document that describes it?

Thanks again!

pip (and other package managers) implements a “dependency resolver” that discovers those entries at install time. Without getting into details (and the peculiar state this topic is in), it basically recursively downloads a package, looks inside to see what it needs, and downloads those packages in turn to look at them too.
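
A toy sketch of that recursive idea (not pip’s actual resolver; the dependency data below is hypothetical and hugely simplified):

    # Stand-in for "download the package and look inside its metadata".
    METADATA = {
        "scipy": ["numpy"],  # hypothetical, simplified dependency data
        "numpy": [],
    }

    def resolve(name, resolved=None):
        if resolved is None:
            resolved = set()
        if name not in resolved:
            resolved.add(name)
            for dep in METADATA[name]:  # what does this package need?
                resolve(dep, resolved)  # fetch and inspect those too
        return resolved

    print(resolve("scipy"))  # e.g. {'scipy', 'numpy'}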

For Setuptools it’s more complicated. Historically it also had a similar mechanism (Setuptools started out as both a build system and a package manager), but recent developments have deprecated that mechanism, making Setuptools more like cc: the caller is required to provide an environment and tell it how to build the project.

A package is installed like this (extremely simplified and generalised):

  1. pip fetches the package, looks at its dependencies, and makes sure they are all installed before continuing.
  2. pip provides Setuptools with what it needs to build the package, and asks it to build (see the sketch after this list).
  3. Setuptools hands the build result back to pip.
  4. pip installs the build result to the target location.
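
Concretely, the handshake in steps 2 and 3 happens through a small hook interface (PEP 517). A minimal sketch of driving the setuptools backend by hand, assuming it is run from inside a buildable project directory:

    import os
    import setuptools.build_meta as backend

    # Step 2: the caller (normally pip) asks what is needed for the build.
    print(backend.get_requires_for_build_wheel())

    # Step 3: the caller asks for a wheel; the backend returns its filename.
    os.makedirs("dist", exist_ok=True)
    print(backend.build_wheel(wheel_directory="dist"))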

To make a very rough analogy to C:

  1. Something like Conan, cpm, Buckaroo, etc. that helps you fetch the dependencies needed to build a project…
  2. Similar to CMake: collect environment information to call the compiler with the proper flags.
  3. Similar to cc: build the thing. There’s no linking step for Python source code, but this is where it would happen if needed (based on what step 2 tells it).
  4. Basically just copies things to the right location, like make install when you build manually.

Does this make sense?

p.s. Don’t think too hard about the details; there are fundamental differences between compiled and interpreted systems, so the analogy is far from perfect.


@uranusjr: Thanks a lot for explaining this in great detail! I have one last question: is this dependency resolution logged somewhere? When pip/setuptools decides that package A shall use numpy-version@directory and scipy-version@directory, does it save this somewhere (perhaps in some sort of log file, so that these libraries get imported at runtime)?

I’m guessing that it looks for requirements in PYTHONPATH before downloading and installing dependencies. Is this true?

It works the other way around. The Python interpreter looks at some pre-determined places (including PYTHONPATH) for packages; pip chooses one of them to put the packages it installs into, so the interpreter can locate them at runtime. It’s a bit like how DLLs work on Windows (if you’re familiar with that): a binary (e.g. python.exe) simply references the DLL by name (e.g. Python38.dll) without resolving it during linking, and the operating system looks at known places (e.g. the PATH environment variable) to find a matching file to load.
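
A quick way to see those pre-determined places from Python itself (note that site.getsitepackages() may behave differently in some virtual environments):

    import site
    import sys

    print(sys.path)                # directories searched on import, including PYTHONPATH entries
    print(site.getsitepackages())  # the default location(s) pip installs into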

pip does log the files it installs, so it knows what to do when you run pip uninstall. But that information is not used by the interpreter to resolve import statements.

pip does look for requirements in places (namely, the places it puts things it installs into) to determine whether a package is installed, but generally speaking PYTHONPATH is not one of them.
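
For example, a rough sketch of that lookup using pkg_resources (which ships with Setuptools):

    # Ask the question pip asks: is this distribution installed, and where?
    import pkg_resources

    dist = pkg_resources.get_distribution("scipy")
    print(dist.version)     # e.g. "1.4.1"
    print(dist.location)    # the site-packages directory it lives in
    print(dist.requires())  # its parsed requirements, e.g. numpy>=1.13.3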


Thanks for giving such a detailed response!

I tried installing a simple package (scipy, which has only one dependency, numpy) in an isolated conda env, and I see that pip behaves as I expect it to (i.e. it downloads missing dependencies from PyPI):

(py3test) [sajid@xrm-backup ~]$ pip install scipy
Collecting scipy
  Using cached https://files.pythonhosted.org/packages/dd/82/c1fe128f3526b128cfd185580ba40d01371c5d299fcf7f77968e22dfcc2e/scipy-1.4.1-cp37-cp37m-manylinux1_x86_64.whl
Collecting numpy>=1.13.3 (from scipy)
  Using cached https://files.pythonhosted.org/packages/63/0c/0261693cc3ad8e2b66e66dc2d2676a2cc17d3efb1c58a70db73754320e47/numpy-1.18.1-cp37-cp37m-manylinux1_x86_64.whl
Installing collected packages: numpy, scipy
Successfully installed numpy-1.18.1 scipy-1.4.1

Now, when I look at /py3test/lib/python3.7/site-packages/scipy-1.4.1.dist-info, I do see a RECORD file which lists all the files that were installed (.py and .pyc). But there is no information in the dist-info that logs the fact that pip used a numpy wheel (cached from PyPI) to satisfy the requirement for scipy. Is this stored somewhere?
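
For reference, here is a minimal sketch of reading what the dist-info does record, parsing the METADATA file directly (the path is copied from the environment above):

    # METADATA is in email-header format; Requires-Dist lines hold the
    # version constraints (e.g. "numpy (>=1.13.3)"), but not the concrete
    # wheel or directory that was used to satisfy them.
    import email.parser

    meta_path = "/py3test/lib/python3.7/site-packages/scipy-1.4.1.dist-info/METADATA"
    with open(meta_path) as f:
        meta = email.parser.Parser().parse(f)

    print(meta.get_all("Requires-Dist"))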