I’d like to be able to set up a large number (~100) of virtualenvs on a single system. Many of these virtualenvs will share the same dependencies at the same versions, although some may use different versions of those dependencies. Suppose I have envs A, B and C all using requests 2.20.0 and env D using 2.22.0.
If I simply run pip installs for all of the envs, I’ll have three copies of requests 2.20.0 on my file system. Is there any standard pattern to avoid that wasted disk space? For example, the actual requests 2.20.0 could sit in some sort of cache, and each virtualenv could either symlink to that folder from its site-packages or reference that folder with a line in a .pth file.
Sort of the equivalent of pip install -e but against a local cache of packages.
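To sketch roughly what I have in mind (the pool path is made up, and this ignores console scripts and compiled extensions):

```python
# Rough sketch of the idea, not working tooling: install each package version
# once into a shared pool, then symlink it into every env that needs it.
import pathlib
import subprocess
import sys
import sysconfig

POOL = pathlib.Path("/srv/pkg-pool/requests/2.20.0")  # hypothetical shared pool

# Install the package once into the pool (pip's --target installs into a directory).
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--target", str(POOL), "requests==2.20.0"],
    check=True,
)

# Inside each virtualenv, symlink the installed distribution into site-packages
# instead of installing a second copy.
site_packages = pathlib.Path(sysconfig.get_paths()["purelib"])
for entry in POOL.iterdir():  # e.g. requests/ and requests-2.20.0.dist-info/
    link = site_packages / entry.name
    if not link.exists():
        link.symlink_to(entry)
```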
Doing this would also make spinning up a new virtualenv that includes most of the same packages super fast as they wouldn’t need to be downloaded from anywhere, copied or installed.
Each package is installed in its own prefix with a unique hash, and packages “in” an environment are symlinked into a common prefix for that environment. There will only be as many installs of each package as you have unique configurations for it.
We also do the virtualenv trick, where we copy the python interpreter (and a few other things) into place so that python thinks it lives in the env, but everything else is just pointers back to the original packages. Practically speaking, that just means you can use pip in a spack env if you want to mix the two.
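A toy illustration of that layout (the paths, hash and file names below are invented, not Spack’s actual store format):

```python
# Toy illustration only, not Spack's real tooling: one hashed prefix per
# unique package build, and each environment is just a tree of symlinks
# (a "view") pointing back into that prefix.
import pathlib

store = pathlib.Path("/opt/store/requests-2.20.0-ab12cd34")  # one real install

for env in ("env-a", "env-b", "env-c"):  # three envs share the single copy
    view = pathlib.Path("/opt/envs") / env / "site-packages"
    view.mkdir(parents=True, exist_ok=True)
    for name in ("requests", "requests-2.20.0.dist-info"):
        link = view / name
        if not link.exists():
            link.symlink_to(store / name)
```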
More on spack here. See also the tutorial on environments.
See the “Site-specific configuration hook” chapter of Python’s documentation for the details of exactly what a *.pth file does. In short, it adds the location of the pool directory to the interpreter’s sys.path list, so packages installed in that directory can be imported. Such a file can be added to the site-packages of as many virtual environments as one wants, so the pool can be shared between them. This probably has a lot of limitations, but with some clever wrapping code around pip it could get somewhat useful, depending on the exact requirements.
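As a minimal, hypothetical sketch of that trick (the pool path and .pth file name are made up), something like this could be run inside each virtualenv that should see the pool:

```python
# Minimal sketch of the *.pth approach: the interpreter reads *.pth files in
# site-packages at startup and appends each listed directory to sys.path.
# The pool path and file name are assumptions, not an existing tool.
import os
import sysconfig

POOL = "/srv/pkg-pool"  # assumed to already contain the installed packages

site_packages = sysconfig.get_paths()["purelib"]  # this env's site-packages
with open(os.path.join(site_packages, "shared-pool.pth"), "w") as fh:
    fh.write(POOL + "\n")
```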
This leads to the OP’s scenario, where we have hundreds to thousands of virtualenvs per robot, most with identical dependencies. As a result, our deployment artifacts are very large.
Not much, no (mostly because it is not an annoyance for my use cases). Many things have changed in Python’s packaging ecosystem since 2020, so maybe there are fresh ideas now. You could start a new thread (linking to this one) to see if the state of the art has progressed in 4 years. I do not recall having seen anything new in this specific area recently.