List dependencies of a package

Hi, I’m a developer for the Spack package manager. Spack is a general-purpose package manager, similar to Conda in the sense that it can build both Python and non-Python libraries. Like Conda, Spack package recipes explicitly list all build and run dependencies. Unlike Conda, Spack packages usually contain several different versions, each of which may have a different set of required dependencies. Obviously, this can be difficult to maintain, and we would like to find a way to automate this if possible.

Python packages can list their dependencies in one of several places:

  • pyproject.toml: usually just build dependencies are listed, unless tools like flit/poetry are being used
  • setup.cfg: only for projects that use the setuptools build system
  • setup.py: for packages that need to dynamically change their deps
  • requirements.txt: for packages that use pbr or load this file in their setup.py
  • any others?
  • possibly all of the above in a single project

In order to parse all of these, Spack would need parsers for TOML, INI, Python (not possible), TXT, and possibly more file formats. However, pip seems to be able to parse all of these, and tries to download dependencies if they are missing. Is there any pip API to get this list of packages? Ideally I would be able to get back a list like:

build_requires = ['setuptools>=32', 'wheel']
install_requires = ['numpy~=1.21.0', 'colorama; platform_system=="Windows"']
extras_require = {
    'foo': ['foo', 'bar'],
    'baz': ['baz==1.2.3'],
}

If I can get that level of information, I can translate that to the equivalent Spack terminology. I can then do this for every version of every Python package, making it easier to update things.

Note that this is not a question of detecting which packages a library depends on (tools like findimports are designed for this). I simply want a list of dependencies declared by the package developers in the above files. If those dependencies are wrong, I would consider that to be a bug and report it to those package developers.

P.S. I’m sure this is a common question, and I apologize if this is a duplicate. I saw many similar questions but haven’t yet found an identical post that answers my question.


any others?

Off the top of my head, Pipfile/Pipfile.lock is still a relatively common one, and it's important to note that setup.cfg uses a custom INI parser. There are also multiple formats for deps in pyproject.toml: build deps under [build-system], standardized PEP 621 runtime ("project") deps under [project], and deps for many of the non-setuptools build backends (Flit, Poetry, PDM, etc.), each in their own tool-dependent format, under [tool]. Plus, there are "extras", which are expressed differently in each format, and legacy setup-requires and test-requires equivalents in some formats. In short… what a mess! Hopefully, someday they'll all use PEP 621 project source metadata for runtime deps and [build-system] for build deps, but that future is still years away.
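To make the mess concrete, here's a minimal sketch (assuming tomllib from Python 3.11+, or tomli on older versions, and a pyproject.toml in the current directory) of just how many places static deps can hide in that one file:

# Hypothetical sketch: probe the common places deps can live in pyproject.toml.
# tomllib ships with Python 3.11+; on older versions, use "import tomli as tomllib".
import tomllib

with open("pyproject.toml", "rb") as f:
    data = tomllib.load(f)

# PEP 518: static build deps
build_deps = data.get("build-system", {}).get("requires", [])

# PEP 621: standardized runtime deps and extras (only if the project uses them)
project = data.get("project", {})
runtime_deps = project.get("dependencies", [])
extras = project.get("optional-dependencies", {})

# Tool-specific formats, e.g. Poetry keeps deps under [tool.poetry.dependencies]
poetry_deps = data.get("tool", {}).get("poetry", {}).get("dependencies", {})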

However, pip seems to be able to parse all of these, and tries to download dependencies if they are missing. Is there any pip API to get this list of packages?

Disclaimer: unlike many users on here, I'm pretty far from a packaging expert, so take this with a grain of salt, but the following is my high-level understanding of the situation.

Historically, this worked because all packages used a setup.py, which pip would call as part of its build/install process, and then Setuptools (or the legacy distutils) would actually do the dirty work (or they used a requirements.txt, which had to manually be fed to pip with -r).

Nowadays, the way this works (for non-editable installs) is defined by PEP 517. Under the modern nomenclature, Setuptools is what we would call a build backend, which does the dirty work of taking the project source tree and transforming it into a distribution package that pip knows how to install: either a semi-standardized sdist or, ultimately, a standardized wheel. pip, by contrast, is a build frontend, which directly interacts with the user and calls the backend. Other package managers with their own tool-specific dep formats, including Poetry, pipenv, Flit, PDM, etc., are all build backends, though some of them can act as frontends too.

Essentially, the backend takes care of handling its own tool-specific (or generic) dependency format and transforming it into standardized metadata in the distribution packages following the core metadata standard, which pip can then consume to install its dependencies, and so forth in the same manner. But in order to use that, the package needs to be rendered to a source distribution (sdist), and from there a built distribution (wheel), which then contains the deps under Requires-Dist keys in the RFC 822 (ish) format METADATA file under the .dist-info directory in the wheel archive.

So, to use the same method as pip and work with any modern Python package regardless of build system or format, what you could do is the following (steps 2–4 are sketched in code after the list):

  1. Use build, pip or another build frontend, or call the PEP 517 hooks directly, to build the project into a built (“binary”) distribution package for your target platform
  2. Extract the built wheel (it's just a ZIP) and locate the METADATA file
  3. Parse the file with Python’s email.parser in legacy compat32 mode (or write/use a parser that emulates it) and extract the Requires-Dist keys
  4. Parse the requirements with packaging’s Requirements format parser
  5. Rinse and repeat for each package and sub-dependency.
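As a rough illustration of steps 2–4 (a minimal sketch, assuming step 1 already produced a wheel, e.g. via python -m build --wheel, and that the packaging library is installed; the wheel filename here is hypothetical):

# Sketch of steps 2-4: read Requires-Dist out of an already-built wheel.
import zipfile
from email import policy
from email.parser import BytesParser
from packaging.requirements import Requirement

wheel_path = "dist/some_project-1.0-py3-none-any.whl"  # hypothetical wheel from step 1

with zipfile.ZipFile(wheel_path) as whl:
    # Step 2: the core metadata lives in <name>-<version>.dist-info/METADATA
    metadata_name = next(
        name for name in whl.namelist() if name.endswith(".dist-info/METADATA")
    )
    with whl.open(metadata_name) as f:
        # Step 3: METADATA is an RFC 822-ish file; compat32 is the legacy policy
        metadata = BytesParser(policy=policy.compat32).parse(f)

# Step 4: each Requires-Dist value is a PEP 508 requirement string
requires_dist = metadata.get_all("Requires-Dist") or []
runtime_deps = [Requirement(r) for r in requires_dist]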

As you can see, this is certainly non-trivial, and seems unlikely to fully meet your requirements above (no running Python). However, it is unfortunately the only reliable way to do so for arbitrary packages, using a variety of build tools and dep formats, particularly in the case of non-static metadata (setup.py) that can be arbitrary and must be executed to return an accurate result.

This is basically most of what Linux distros do in order to repackage Python projects into their own distro packages; conda has automated checks that I believe do either this or read the source, but canonical deps are specified in the conda recipe. pip-tools hooks pip at a low level to generate fully resolved standard requirements.txt files, but those contain all direct and indirect deps, resolved down to concrete versions, which is likely not what you need. While it is a lot of work, if it only needs to be done on CI services or at the packager's end, it is much less prohibitive than doing it for every package user.

The alternative, which might make more sense as a "best-effort" first guess or an additional check for human packagers, is reading as many source dependency specification formats as possible, using the appropriate parser for each file and packaging.requirements to parse the actual PEP 508 strings; you could also do basic static AST introspection, or even use regexes, on setup.py files to try to guess at the deps there (a rough sketch of the AST approach follows). But you'd have to be able to tolerate a significant amount of error: a substantial and, at least until PEP 621 adoption accelerates, perhaps growing number of projects keep their deps in tool-specific formats, and a hopefully shrinking number still use setup.py instead of static metadata.
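For what it's worth, here's a hedged sketch of what that static setup.py introspection might look like; it only handles the easy case where install_requires is a literal list, and will silently miss anything computed at runtime (the function name is mine, not from any existing tool):

# Best-effort sketch: statically guess install_requires from a setup.py
# whose arguments are literal lists. Dynamic values cannot be recovered this way.
import ast

def guess_install_requires(setup_py_path):
    with open(setup_py_path, encoding="utf-8") as f:
        tree = ast.parse(f.read())

    for node in ast.walk(tree):
        # Look for a call to setup(...) or setuptools.setup(...)
        if isinstance(node, ast.Call):
            func = node.func
            name = getattr(func, "id", None) or getattr(func, "attr", None)
            if name != "setup":
                continue
            for kw in node.keywords:
                if kw.arg == "install_requires":
                    try:
                        return ast.literal_eval(kw.value)  # only works for literals
                    except ValueError:
                        return None  # computed at runtime; give up
    return None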

So, it really depends on your goals. However, perhaps others might have better ideas on an approach that requires less novel effort on your end by hooking the internals of existing tools. Ultimately, though, something is going to need to do each of those steps in order to get from a project source tree to reliable package requirements for an arbitrary Python project, at least until the world adopts PEP 621 (or at the very least static requirements metadata in a constrained number of formats).

You cannot create a package out of a Pipfile and I don’t think anyone’s loading Pipfiles into their setup.py.

@CAM-Gerlach is mostly correct. However, we also have implicit and dynamic build dependencies. I would suggest extracting a package's build dependencies using build "out of phase" to configure Spack.

Initialise a project using build from source:

import build
import build.util

# Static build deps come straight from [build-system] requires in pyproject.toml
project = build.ProjectBuilder('project_path')
static_build_deps = project.build_system_requires
# Install these, then ask the backend for any dynamic build deps...
dynamic_build_deps = project.get_requires_for_build('wheel')
build_deps = static_build_deps | dynamic_build_deps

# Install the dynamic build deps, then prepare the wheel metadata to get
# the runtime deps (the Requires-Dist entries of the core metadata)...
wheel_metadata = build.util.project_wheel_metadata('project_path')
runtime_deps = wheel_metadata.get_all('Requires-Dist') or []

# Write build_deps and runtime_deps out to your build file

We use thoth-solver to obtain this type of package metadata. Essentially, it downloads the package and installs it into an environment, then extracts the package's relevant metadata, which can be specific to that environment. Naturally, this can involve building the package; that is expected in our case, as the tool is also capturing how the package behaves when one tries to simply install it using pip in the target environment. Maybe it could be helpful for you in some way.


Oh, you'd be surprised: a lot of people both are doing exactly that and actively want Pipenv to support the use case better by providing a one-to-one conversion API (which is of course impossible, and I got a lot of hate for repeatedly saying no).

Sorry for the Pipfile confusion; FWIW, I've never actually used Pipenv myself or worked on any major projects that did, so it slipped my mind that Pipfile/Pipfile.lock was intended as a format for specifying concrete deps for an application rather than abstract deps for a library (as I understand it).

Uh… well then :upside_down_face:

No, pip doesn’t parse any of these - it asks the backend to build a wheel and reads the metadata from there, which is what I’d recommend you do.

Python package metadata, which is where install requirements are recorded, is only available in built packages (i.e., wheels). No application or library can extract that metadata from the source code alone. Instead, tools work with wheels, or build wheels on the fly, and extract the metadata from there. In particular, pip builds all source distributions it has to work with in order to get the metadata. Pip doesn't have an API (it's an application, not a library), but even if it did, there wouldn't be a way to get requirements from an arbitrary source tree or distribution without building it.

Build requirements are obtained in a similar manner, by asking the backend (PEPs 517 and 518 have the details; basically, you parse pyproject.toml and then call the build backend to get any extra build requirements the package has specified in a backend-dependent way).
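As a rough sketch of that two-step dance (an assumption-laden example, not pip's actual code: it uses the pyproject_hooks library, tomllib from Python 3.11+, and assumes the static build requirements are already installed in the current environment):

# Sketch: PEP 518 static build deps from pyproject.toml, then the PEP 517 hook
# for any extra, backend-specific build deps.
import tomllib  # use tomli on Python < 3.11
import pyproject_hooks

with open("pyproject.toml", "rb") as f:
    build_system = tomllib.load(f).get("build-system", {})

static_build_deps = build_system.get("requires", [])  # PEP 518
# Fall back to the legacy setuptools backend if none is declared, as pip does
backend = build_system.get("build-backend", "setuptools.build_meta:__legacy__")

hooks = pyproject_hooks.BuildBackendHookCaller(
    source_dir=".",
    build_backend=backend,
    backend_path=build_system.get("backend-path"),
    runner=pyproject_hooks.default_subprocess_runner,
)
extra_build_deps = hooks.get_requires_for_build_wheel()  # PEP 517 hook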

Yes, this is inefficient. But it’s baked into Python packaging, and we have to work with the reality. Standards like PEP 643 – Metadata for Package Source Distributions and PEP 621 – Storing project metadata in pyproject.toml make it easier for tools to read metadata from source (643 for sdists, which are a specific standard format for source archives, and 621 for random source trees). Neither is in common use yet, though, and both allow for the project to declare metadata as “dynamic”, which means that you still have to do a build to get the value.
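For the PEP 621 side, at least the "dynamic" check can be done statically; a minimal sketch (assuming tomllib/tomli and a pyproject.toml that uses the [project] table):

# Sketch: PEP 621 runtime deps can only be read statically if they are not
# declared dynamic; otherwise a build is still required.
import tomllib  # use tomli on Python < 3.11

with open("pyproject.toml", "rb") as f:
    project = tomllib.load(f).get("project", {})

if "dependencies" in project.get("dynamic", []):
    runtime_deps = None  # must do a PEP 517 build to find out
else:
    runtime_deps = project.get("dependencies", [])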


Might be worth taking a peek at dephell. The author has marked the project as archived, but I’ve used it sparingly to do some conversions to/from a few different formats fairly successfully.

Hello all! I happen to be working right now on a pip resolve command, in metadata-only resolve with new `pip resolve` command! by cosmicexplorer · Pull Request #10748 · pypa/pip · GitHub, to make pip print the output of a resolve in a structured way (JSON). It appears to have clear maintainer buy-in and seems likely to be merged under an experimental flag soon!

Please feel free to weigh in on the pip PR on whether the current JSON output I've presented fits the needs of this goal. It would be significantly faster to execute than a normal pip download, since it doesn't download wheels and instead uses the sick Jedi mind trick I figured out last year, implemented in --use-feature=fast-deps, to examine only the metadata file of a remote wheel via a series of HTTP range requests (thereby avoiding the download of, e.g., gigabyte-sized wheels). I believe this performance can be improved even further.

I have also proposed this pip resolve command for use by the pex Python packaging tool; see this comment on their repo: Add support for json output to the Graph tool. · Issue #1137 · pantsbuild/pex · GitHub. pex is invoked as a subprocess by the Pants build tool.

The dist_info_metadata key is the implementation of PEP 658. For pip resolve -o output.json tensorboard && jq . output.json, the current JSON output format looks something like:

{
  "experimental": true,
  "python_version": "==3.10.1",
  "input_command_line_args": [
    "tensorboard"
  ],
  "resolution": {
    "tensorboard": {
      "requirement": "tensorboard==2.7.0",
      "download_info": {
        "url": "https://files.pythonhosted.org/packages/2d/eb/80f75ab480cfbd032442f06ec7c15ef88376c5ef7fd6f6bf2e0e03b47e31/tensorboard-2.7.0-py3-none-any.whl#sha256=239f78a4a8dff200ce585a030c787773a8c1184d5c159252f5f85bac4e3c3b38",
        "hash": {
          "name": "sha256",
          "value": "239f78a4a8dff200ce585a030c787773a8c1184d5c159252f5f85bac4e3c3b38"
        },
        "dist_info_metadata": null
      },
      "dependencies": {
        "google-auth-oauthlib": "google-auth-oauthlib<0.5,>=0.4.1",
        "setuptools": "setuptools>=41.0.0",
        "tensorboard-data-server": "tensorboard-data-server<0.7.0,>=0.6.0",
        "protobuf": "protobuf>=3.6.0",
        "numpy": "numpy>=1.12.0",
        "wheel": "wheel>=0.26",
        "grpcio": "grpcio>=1.24.3",
        "tensorboard-plugin-wit": "tensorboard-plugin-wit>=1.6.0",
        "absl-py": "absl-py>=0.4",
        "requests": "requests<3,>=2.21.0",
        "markdown": "markdown>=2.6.8",
        "google-auth": "google-auth<3,>=1.6.3",
        "werkzeug": "werkzeug>=0.11.15"
      },
      "requires_python": ">=3.6"
    },
    "absl-py": {
      "requirement": "absl-py==1.0.0",
      "download_info": {
        "url": "https://files.pythonhosted.org/packages/2c/03/e3e19d3faf430ede32e41221b294e37952e06acc96781c417ac25d4a0324/absl_py-1.0.0-py3-none-any.whl#sha256=84e6dcdc69c947d0c13e5457d056bd43cade4c2393dce00d684aedea77ddc2a3",
        "hash": {
          "name": "sha256",
          "value": "84e6dcdc69c947d0c13e5457d056bd43cade4c2393dce00d684aedea77ddc2a3"
        },
        "dist_info_metadata": null
      },
      "dependencies": {
        "six": "six"
      },
      "requires_python": ">=3.6"
    },
... (truncated)