Reduce waste for pip install

This question is, to some extent, not really specific to Python. I think a lot of waste happens when installing third-party packages. Say you want to use one class and a few methods from a particular package: pip install will fetch that package and, in turn, all of its dependencies. So you end up with a huge lock file (whether poetry.lock or uv.lock) containing dependencies of dependencies of dependencies.

Some of these, sure, are ultimately used by the methods you call in your code base, but I would suggest that most of them are not. In one case I remember installing a package for one single method, and that package in turn pulled in fastapi, greenlet, etc., even though the method I actually used never touches them (not even in its transitive closure).

I wonder whether these can be significantly reduced, to make the “builds” (of Python virtual environments) much “thinner”.

Here are some ideas:

  1. Within a Python project, the interpreter should be able to trace out, for every method, class, and constant, all the dependencies that are needed (a crude static approximation is sketched after this list).

  2. As a consequence of 1, any package manager will/should know that, in order to install a particular bundle of methods/classes, it only has to install a particular (in general strict) subset of the dependencies listed in the pyproject.toml file.

  3. When installing a package, it should be possible to choose which methods/classes one really needs, e.g.

    [project]
    name = "mypackage"
    version = "1.0.0"
    dependencies = [
       ...
       "otherpackage1>=3.0.1,exports=*",
       "otherpackage2>=3.0.1,exports=[SomeClass,SomeOtherClass,some_method1,some_method2]",
       ...
    ]
    

    (I am of course not suggesting that that be the syntax.)
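
As a rough illustration of idea 1, here is a minimal sketch that statically maps the names one function references back to the top-level imports that bind them, using only the stdlib ast module. The source string and the name some_method are invented for the example, and this is very much an approximation, not a real resolver:

    import ast
    import textwrap

    def imports_used_by(source: str, func_name: str) -> set[str]:
        tree = ast.parse(source)
        # Map each top-level imported name to the module that provides it.
        bound: dict[str, str] = {}
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    bound[alias.asname or alias.name.split(".")[0]] = alias.name
            elif isinstance(node, ast.ImportFrom) and node.module:
                for alias in node.names:
                    bound[alias.asname or alias.name] = node.module
        # Collect every bare name the target function references.
        used: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef) and node.name == func_name:
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Name):
                        used.add(inner.id)
        return {module for name, module in bound.items() if name in used}

    SOURCE = """
    import json
    import fastapi  # heavy import that some_method never touches

    def some_method(payload):
        return json.dumps(payload)
    """
    print(imports_used_by(textwrap.dedent(SOURCE), "some_method"))  # {'json'}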

Now consider the transitive closure. If every pyproject.toml file is set up in this conservative way, we end up with a better diet of dependencies. The enormous tree of unnecessary waste will be pruned down to just the sliver of things that a particular project/package actually needs and uses.

This is literally impossible because of the highly dynamic nature of Python.
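
For example (the module and environment variable names here are invented), a static analyser has no way to know which dependency these lines will pull in, because both the module and the attribute are plain runtime strings:

    import importlib
    import os

    # The module name is only known at runtime; with APP_BACKEND unset
    # this happens to import json, but it could be anything installed.
    backend = importlib.import_module(os.environ.get("APP_BACKEND", "json"))

    # Same problem one level down: the attribute is a runtime string too.
    serialize = getattr(backend, "dumps")
    print(serialize({"hello": "world"}))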

There are tools that attempt this kind of analysis as part of e.g. the Python-to-exe projects, but they are all only approximations.
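
In fact the standard library ships one such approximation: modulefinder, which statically scans a script (and everything it imports) for import statements. A minimal sketch, where app.py is a placeholder for your entry point:

    from modulefinder import ModuleFinder

    finder = ModuleFinder()
    finder.run_script("app.py")  # placeholder path to your entry point

    # Everything the static scan could prove gets imported...
    for name in sorted(finder.modules):
        print("found:", name)

    # ...and everything it could not resolve (often the dynamic imports).
    print("missed:", sorted(finder.badmodules))

The gap between those two lists is exactly why such tools end up needing manual hints for dynamically imported modules.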

4 Likes

A tree-shaking tool for Python would be interesting, but it’s a non-trivial problem. And it applies to applications in which a complete dependency graph can be formed, not to venvs, in which Python users expect to be able to import anything they like that is installed.

A lot of optimisation towards the same goal can be achieved simply by lazily loading any expensive dependencies, if you’re happy to keep the waste on disk and just win back the speed.
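
One common shape for that is a module-level __getattr__ (PEP 562), which defers an expensive import until the first time someone actually touches it. A sketch with invented names (mypackage with a heavy submodule _heavy):

    # mypackage/__init__.py  (hypothetical package layout)
    import importlib

    _LAZY = {"heavy": "mypackage._heavy"}  # attribute -> providing module

    def __getattr__(name):
        # PEP 562: only called when `name` isn't found the normal way,
        # so the expensive import is paid on first use, not at startup.
        if name in _LAZY:
            module = importlib.import_module(_LAZY[name])
            globals()[name] = module  # cache so we aren't called again
            return module
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

With this, import mypackage stays cheap and only callers of mypackage.heavy pay for the heavy import; the unused bytes do still sit in the venv, though.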

2 Likes

There are definitely projects that attempt to do this, such as:

As others have noted, these don’t necessarily work with arbitrary Python code - they may need additional hints so they can find connections between files that they would otherwise miss.

However, the other common aspect is that the pruning happens after the initial installation, as part of preparing artifacts for further distribution, since it requires full code introspection of the entire dependency tree to work out which parts aren't necessary for a given use case.
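
A crude dynamic version of that introspection can be sketched with the stdlib (Python 3.10+ for packages_distributions; myapp is a hypothetical entry module). It only sees imports that actually executed, so it under-reports what is needed, which is exactly the approximation problem mentioned above:

    import sys
    import importlib
    import importlib.metadata as md

    importlib.import_module("myapp")  # hypothetical entry point

    # Map top-level module names to the distributions providing them.
    providers = md.packages_distributions()

    loaded = {name.split(".")[0] for name in sys.modules}
    used = {dist for mod in loaded for dist in providers.get(mod, [])}
    installed = {dist.metadata["Name"] for dist in md.distributions()}

    print("candidates for pruning:", sorted(installed - used))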

4 Likes