Optimizing installs of many virtualenvs by symlinking packages

I’d like to be able to set up a large number (~100) of virtualenvs on a single system. Many of these virtualenvs will share the same dependencies at the same version, although some may need different versions. Suppose I have envs A, B and C all using requests 2.20.0 and env D using 2.22.0.

If I simply run pip installs for all of the envs, I’ll have three copies of requests 2.20.0 on my file system. Is there any standard pattern to avoid that wasted disk space? For example, the actual requests 2.20.0 could sit in some sort of cache, and each virtualenv could either symlink to that folder from its site-packages or reference it with a line in a .pth file.

Sort of the equivalent of pip install -e but against a local cache of packages.

Doing this would also make spinning up a new virtualenv that includes most of the same packages super fast as they wouldn’t need to be downloaded from anywhere, copied or installed.


FWIW, this is actually how Spack environments work.

Each package is installed in its own prefix with a unique hash, and packages “in” an environment are symlinked into a common prefix for that environment. There will be only as many installs of each package as you have unique configurations for it.
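To make that layout concrete, here is a minimal sketch of the same idea applied to the requests example from the original question. It is not Spack's actual implementation: the `store` directory, the `name-version` prefix naming and the `link_into_env` helper are all hypothetical, and a real tool would also have to handle metadata (for example `.dist-info` directories) so that pip can see the linked packages.

```python
import os
import tempfile
from pathlib import Path


def link_into_env(store: Path, env_site: Path, name: str, version: str) -> Path:
    """Symlink one package from a shared store into an env's site-packages.

    Hypothetical layout: <store>/<name>-<version>/<name> holds the real files.
    """
    source = store / f"{name}-{version}" / name
    link = env_site / name
    env_site.mkdir(parents=True, exist_ok=True)
    os.symlink(source, link, target_is_directory=True)
    return link


def demo_symlinks() -> dict:
    root = Path(tempfile.mkdtemp())
    store = root / "store"
    # One real install per unique version.
    for version in ("2.20.0", "2.22.0"):
        pkg = store / f"requests-{version}" / "requests"
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text(
            f"__version__ = '{version}'\n", encoding="utf-8"
        )
    # Envs A, B and C share 2.20.0; env D uses 2.22.0.
    wanted = {"A": "2.20.0", "B": "2.20.0", "C": "2.20.0", "D": "2.22.0"}
    resolved = {}
    for env, version in wanted.items():
        link = link_into_env(store, root / env / "site-packages", "requests", version)
        # Record which store prefix each env's link really points at.
        resolved[env] = link.resolve().parent.name
    return resolved


if __name__ == "__main__":
    print(demo_symlinks())
```

Only two copies of requests exist on disk here, however many environments link to them. Note that creating symlinks on Windows may require extra privileges.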

We also do the virtualenv trick, where we copy the python interpreter (and a few other things) into place so that python thinks it lives in the env, but everything else is just pointers back to the original packages. Practically speaking, that just means you can use pip in a spack env if you want to mix the two.

More on spack here. See also the tutorial on environments.

Also, FTR, Conda environments use hardlinking to automatically share packages that are installed across environments.
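For anyone unfamiliar with hard links, the sketch below shows the general mechanism only; it is not how Conda itself is implemented. The `hardlink_tree` helper and the `store` layout are hypothetical. Each environment gets its own directory tree, but every file in it is a second name for the same data on disk, so no extra space is used for file contents. Hard links only work within a single file system.

```python
import os
import tempfile
from pathlib import Path


def hardlink_tree(src: Path, dst: Path) -> None:
    """Recreate the directory tree of `src` under `dst`, hard-linking each file."""
    for dirpath, _dirnames, filenames in os.walk(src):
        target_dir = dst / Path(dirpath).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            os.link(Path(dirpath) / filename, target_dir / filename)


def demo_hardlinks() -> tuple:
    root = Path(tempfile.mkdtemp())
    # Hypothetical shared store holding the one real copy of the package.
    store_pkg = root / "store" / "requests-2.20.0"
    (store_pkg / "requests").mkdir(parents=True)
    original = store_pkg / "requests" / "__init__.py"
    original.write_text("__version__ = '2.20.0'\n", encoding="utf-8")
    # Link the package into two environments.
    for env in ("A", "B"):
        hardlink_tree(store_pkg, root / env / "site-packages")
    copy_a = root / "A" / "site-packages" / "requests" / "__init__.py"
    # samefile() is True when both names refer to the same underlying file;
    # st_nlink counts how many names the file has (store + A + B).
    return os.path.samefile(original, copy_a), original.stat().st_nlink


if __name__ == "__main__":
    print(demo_hardlinks())
```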


I am also interested in this. I believe the following discussion is related:

ftr - for personal reasons i haven’t worked any further on this

Thanks for the suggestions - Conda and Spack might be overkill for our use case and it sounds like there’s not an easy way to do this with pip.

There’s something I have been experimenting a bit with:

$ mkdir /pool
$ python3 -m venv /venv
$ /venv/bin/python3 -m pip install --target /pool pyramid
$ echo "/pool" > /venv/lib/python3.6/site-packages/pool.pth
$ /venv/bin/python3 -m pip list
$ /venv/bin/python3 -c "import pyramid"

See the “Site-specific configuration hook” chapter of Python’s documentation (the site module) for details on exactly what a *.pth file does. In short, it adds the pool directory to the interpreter’s sys.path list, so packages installed there can be imported. The same file can be added to as many site-packages directories as one wants, so the pool can be shared between virtual environments. This probably has a lot of limitations, but with some clever wrapping code around pip it could probably become somewhat useful, depending on the exact requirements.
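The effect of the .pth file can be reproduced without creating a real virtual environment. The sketch below is only an illustration under some assumptions: `link_pool`, `pooled_mod` and the temporary directory layout are made-up names, and `site.addsitedir()` is used to process a stand-in site-packages directory the same way the interpreter processes the real one at startup.

```python
import importlib
import site
import sys
import tempfile
from pathlib import Path


def link_pool(site_packages: Path, pool: Path, name: str = "pool.pth") -> Path:
    """Write a .pth file so that `pool` is added to sys.path for this site dir."""
    pth = site_packages / name
    pth.write_text(f"{pool}\n", encoding="utf-8")
    return pth


def demo_pth() -> str:
    root = Path(tempfile.mkdtemp())
    pool = root / "pool"
    site_packages = root / "venv-site-packages"  # stand-in for a venv's site-packages
    pool.mkdir()
    site_packages.mkdir()
    # A tiny module that lives only in the shared pool.
    (pool / "pooled_mod.py").write_text("VALUE = 'from pool'\n", encoding="utf-8")
    link_pool(site_packages, pool)
    # Process the directory and its .pth files, as happens at interpreter startup.
    site.addsitedir(str(site_packages))
    importlib.invalidate_caches()
    module = importlib.import_module("pooled_mod")
    return module.VALUE


if __name__ == "__main__":
    print(demo_pth())
    print([p for p in sys.path if p.endswith("pool")])
```

One limitation this makes visible: every environment that points at the same pool sees the same versions, so supporting env D with a different requests version would need one pool directory per package version rather than a single shared pool.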

Why “overkill”? Conda isn’t really more difficult to use than pip.