Hey folks, I’ve been thinking a lot about how to solve a particular problem I’m having, and I’m noticing that other systems have a different solution. I’m wondering if it would be possible to implement something that allows installing and importing multiple versions of a package.
Challenge
Let me start with the use case. I'm working in a production environment where I may need to upgrade my dependencies at any point. I'm in a really precarious position where installing new dependencies is fine, but upgrading/downgrading is nearly impossible because the new versions may conflict with existing deployments. This is largely because Python's philosophy centers on creating virtualenvs. Specifically, I have a packaging and deploy process that separates configuration from the deployed code. Today the way we do this is:
1. Install any required dependencies to the entire production infrastructure
2. Deploy the package to the production infrastructure
3. Run tests on the deployed version
4. Symlink the "live" production version to the latest version just deployed
The challenge this pattern poses is installing new dependency versions that are not compatible with the old ones: every dependency needs to be compatible across all deployed versions of my application.
There are solutions to the current problem, each of which has its limitations. I could:
- Package all dependent code with the package. Install into a location in PYTHONPATH rather than site-packages
- Create a virtualenv as part of the package deploy process, cut out the symlink step (step 4) above, and just cut over to the new live version as soon as the deploy happens
- Install in some external system other than a python environment or PYTHONPATH (e.g. Docker image)
Anyway, I won't go into all the pros and cons of those solutions, but I wonder if there's a solution that mirrors the binary distribution pattern.
The deploy process above actually works great for binary dependencies. We can install e.g. libpng at multiple versions, place those versions on the PATH, and symlink the major/minor/patch names to the specific installation paths. Then any dependent has the freedom to load libpng by specifying any of:
- No version at all
- only the major version
- Any intermediate major/minor versions that are symlinked
- Full version, including whatever modifications we want to make (e.g. libpng9.9.9-tweak.2)
- Full absolute path
This works because *nix has a notion of PATH and a way to resolve binaries in order, and we take advantage of symlinking and semantic versioning to create a pattern. I believe this is a common pattern around the *nix world.
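As a toy illustration of that *nix resolution (not the Python proposal itself), the whole scheme amounts to walking PATH in order and following whatever symlink matches the requested name. The helper below is hypothetical, just to make the mechanism concrete:

```python
import os
from pathlib import Path

def resolve_binary(request: str) -> Path | None:
    """Return the first PATH entry matching e.g. 'libpng', 'libpng1.6', ...."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = Path(directory) / request
        if candidate.exists():
            # Follow the symlink chain (e.g. libpng1.6 -> libpng1.6.40)
            # to the actual installed file.
            return candidate.resolve()
    return None
```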
My thought is to have a similar pattern with PYTHONPATH. Can we allow multiple versions of a package with one name to be installed into a given site-packages directory (so that we can still have our virtualenvs and any future PEPs) and create a scheme where we import the most specifically correct version?
Having never written a PEP, I wanted to throw this out there and see what people think (and figure it out before I write a formal PEP). Here’s what I’m thinking:
New syntax for imports
The core feature is to be able to specify the desired version at import time (as opposed to installation time). My idea for modified syntax is:
```
import some_package version 1.2.3
from some_package version 1 import some_module
import some_package version 1.2 as sp
```
New functionality on imports
- Check if the package of the given name has already been imported
- If it has, check whether it's a compatible version to the one specified, and if it mismatches raise an `ImportError`
- If the package is not already imported, search the following directories in order:
  - each directory in PYTHONPATH
  - current interpreter site-packages
  - user site-packages
  - system site-packages
- For each search directory, find all installed packages matching the name
- Filter those packages for versions that satisfy the requested version
- If any package remains, import the highest-version package
- If no package remains, continue to the next directory in the search order
Not having read the implementation of imports, I'm hoping the second step and its sub-steps are pretty much the same as what currently exists.
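To make the intended lookup concrete, here is a rough sketch of it in today's Python. Everything here is an assumption for illustration: the on-disk naming scheme (each version in a `some_package-X.Y.Z/` directory), the `import_versioned` helper, and the use of the third-party `packaging` library for version matching; none of this is an existing API.

```python
import importlib.util
import re
import sys
from pathlib import Path

from packaging.specifiers import SpecifierSet  # third-party `packaging` library
from packaging.version import InvalidVersion, Version


def import_versioned(name: str, spec: str):
    """Import the highest installed version of `name` satisfying `spec`."""
    # If the package is already imported, it must be a compatible version.
    existing = sys.modules.get(name)
    if existing is not None:
        have = Version(getattr(existing, "__version__", "0"))
        if have not in SpecifierSet(spec):
            raise ImportError(f"{name} {have} already imported, but {spec} requested")
        return existing

    # Otherwise walk the search path in order (PYTHONPATH entries come
    # before the site-packages directories in sys.path).
    pattern = re.compile(rf"^{re.escape(name)}-(?P<ver>\d.*)$")
    for entry in sys.path:
        directory = Path(entry or ".")
        if not directory.is_dir():
            continue
        candidates = []
        for child in directory.glob(f"{name}-*"):
            match = pattern.match(child.name)
            if match is None:
                continue
            try:
                version = Version(match.group("ver"))
            except InvalidVersion:
                continue
            if version in SpecifierSet(spec):
                candidates.append((version, child))
        if candidates:
            # Import the highest matching version found in this directory.
            version, location = max(candidates, key=lambda pair: pair[0])
            module_spec = importlib.util.spec_from_file_location(
                name, location / name / "__init__.py"
            )
            module = importlib.util.module_from_spec(module_spec)
            sys.modules[name] = module
            module_spec.loader.exec_module(module)
            return module
        # No match in this directory: fall through to the next one.
    raise ImportError(f"no installed version of {name} satisfies {spec!r}")
```

Under this sketch, `import some_package version 1.2` would roughly correspond to `import_versioned("some_package", "~= 1.2.0")`.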
Version compatibility
I would specify the versioning to be, intuitively, as specific as written. For example:

- `version 1` should take any `1.*` version
- `version 1.1` should accept neither `1.0.*` versions nor `1.2.*` versions, but should accept any `1.1.*` version as well as `=== 1.1`
- similarly for arbitrary levels of depth
For this reason I would suggest one of the two following schemes:

- `version X(.Y)*` should map to `~= X(.Y)*.0` according to [PEP 440](https://peps.python.org/pep-0440/) (Version Identification and Dependency Specification)
- `version X` should map to `== X.*` according to PEP 440
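As a sanity check of the first scheme, here is how the mapping could be expressed with the third-party `packaging` library; `to_specifier` is a hypothetical helper, not part of any proposal:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

def to_specifier(requested: str) -> SpecifierSet:
    """Map the shorthand after `version` to a PEP 440 specifier (scheme 1)."""
    return SpecifierSet(f"~= {requested}.0")  # e.g. "1.1" -> "~= 1.1.0"

spec = to_specifier("1.1")
assert Version("1.1.4") in spec      # any 1.1.* is accepted
assert Version("1.2.0") not in spec  # 1.2.* is rejected
assert Version("1.0.9") not in spec  # 1.0.* is rejected
```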
pip
Obviously Python would need to be capable of understanding the new import statement, so any package supporting this feature should probably specify the supported Python version in its package metadata (e.g. via `Requires-Python`). Since such a package would lack backwards compatibility with previous interpreter versions, pip should refuse to install multiple versions of any package into a site-packages directory belonging to an interpreter below the specified version.
Since Python needs a new way of finding packages with a matching scheme, pip needs to be modified to install packages according to that scheme: whatever the scheme is for finding multiple versions of a package on disk, pip needs to understand it. The most obvious solution to me is to move existing packages into version-qualified directories so that several versions can sit side by side.
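For illustration only, such an on-disk scheme might look something like this (all names hypothetical):

```
site-packages/
    some_package-1.1.4/
        some_package/__init__.py
    some_package-1.2.3/
        some_package/__init__.py
```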
Furthermore, pip currently fails to install if conflicting versions are specified. An optional flag may be required, a la `pip install --allow-multiple`, with the fallback behavior being exactly what pip already does.
Finally, since different versions of a package would be imported under the same name, it should be an error to try to import multiple versions of the same package into one process.
Discussion
What do people think about the idea in general?
First and foremost I imagine this would reduce the need for having virtualenvs everywhere you have a script/package. It would not reduce the utility of creating a virtualenv to isolate your environment.
Secondly, I hope this helps decouple environment configuration from individual applications. One environment could legitimately support multiple systems.
Today Python does a really great job of enabling the reuse of existing modules during development. It's super easy to publish one simple package and depend on existing chunks of functionality. However, there is forever a tension between specificity of requirements (and inevitable dependency hell) on one side, and generality (and broken forwards compatibility) on the other. I would hope that by giving developers the ability to specify a version to their level of need, they can be as specific as necessary while letting the environment grow without conflicts. Furthermore, I hope it begins to encourage package maintainers to make major version bumps only when interfaces change, and to be clearer about interface guarantees.
Finally, to address the inevitable "just run it in Docker" comment: in my experience, creating a Docker image doesn't give me any more isolation than I get from a virtualenv. The only difference is that the environment ships inside the production package. IMHO it's a pattern that reinvents the whole notion of having a cross-platform interpreter. Docker containers are great for long-running services, like Flask web servers, where the startup overhead doesn't matter and where external configuration and state may be very important. But when running just one short-lived process as an entrypoint, startup time matters. From the configuration-vs-runtime perspective, it is actually equivalent to (1) zipping up a virtualenv, (2) packaging every dependency with my code, or (3) simply installing a virtualenv every time I distribute code. The proposed feature intends to offer an alternative approach that makes a production environment less brittle, rather than doubling down on the idea that an environment should never be shared.