Installing multiple versions of a package

Hey folks, I’ve been thinking a lot about how to solve a particular problem I’m having, and I’m noticing that other systems have a different solution. I’m wondering if it would be possible to implement something that allows installing and importing multiple versions of a package.

Challenge

Let me start with the use case. I’m working in a production environment where I may need to upgrade my dependencies at any point. I’m in a really precarious position where installing new dependencies is fine, but upgrading/downgrading is nearly impossible because the new versions may conflict with existing deployments. This is largely because python’s philosophy is built around creating virtualenvs. Specifically, I have a packaging and deploy process that separates configuration. Today the way we do this is:

  1. Install any required dependencies to the entire production infrastructure
  2. Deploy the package to the production infrastructure
  3. Run tests on the deployed version
  4. Symlink the “live” production version to the latest version just deployed

The challenge that this pattern poses is installing new dependencies that are not compatible with old dependencies. All packages need to be cross-compatible across versions of my application.

There are solutions to the current problem, each of which has its limitations. I could:

  • Package all dependent code with the package. Install into a location in PYTHONPATH rather than site-packages
  • Create virtualenv as part of the package deploy process. Cut out step 3 above and just cut over to the new live version as soon as deploy happens
  • Install in some external system other than a python environment or PYTHONPATH (e.g. Docker image)

Anyway, I won’t go into all the pros/cons of those solutions, but I wonder if there’s a solution that mirrors the binary distribution pattern.

The deploy process above actually works great for binary dependencies. We can install multiple versions of e.g. libpng, place those versions on the PATH, and symlink the major/minor/patch names to the specific installation paths. Then any dependent has the freedom to load libpng by specifying any of:

  • No version at all
  • Only the major version
  • Any intermediate major/minor versions that are symlinked
  • Full version, including whatever modifications we want to make (e.g. libpng9.9.9-tweak.2)
  • Full absolute path

This works because *nix has a notion of PATH and a way to resolve binaries in order, and we take advantage of symlinking and semantic versioning to create a pattern. I believe this is a common pattern around the *nix world.

My thought is to have a similar pattern with PYTHONPATH. Can we allow multiple packages of one name to be installed into a given site-packages directory (so that we can still have our virtualenvs and any future PEPs) and create a scheme where we import the most specific matching version?

Having never written a PEP, I wanted to throw this out there and see what people think (and figure it out before I write a formal PEP). Here’s what I’m thinking:

New syntax for imports

The core feature is to be able to specify the desired version at runtime (as opposed to installation time). My idea for modified syntax is:

import some_package version 1.2.3
from some_package version 1 import some_module
import some_package version 1.2 as sp

New functionality on imports

  1. Check if the package of the given name has already been imported
    1. If it has, check whether it’s compatible with the version specified, and raise an ImportError if it isn’t
  2. If the package is not already imported, search the following directories in order:
    1. each directory in PYTHONPATH
    2. current interpreter site-packages
    3. user site-packages
    4. system site-packages
  3. For each search directory it would find all packages matching the name
    1. Filter the packages for those that match the version constraint
    2. If any package remains, import the highest version package
    3. If no package remains, continue to the next directory in the import list

Not having read the implementation of imports, I’m hoping the second step and its sub-steps are pretty much the same as what currently exists.
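To make the lookup concrete, here is a rough sketch of how it could be prototyped today as a plain function, with no new syntax. The version-suffixed directory names (some_package-1.2.3) and the tuple constraint format are assumptions made purely for illustration; the proposal does not pin down the on-disk layout.

# Hypothetical sketch only: assumes versioned install directories named
# "<package>-<X.Y.Z>" sitting inside sys.path entries, which is not an
# existing convention, just one possible scheme.
import importlib.util
import os
import sys

def _matches(version, constraint):
    # "As specific as written": constraint ("1", "2") accepts any 1.2.*,
    # but not 1.1.* or 1.3.*; constraint=None accepts anything.
    return constraint is None or version.split(".")[:len(constraint)] == list(constraint)

def import_versioned(name, constraint=None):
    already = sys.modules.get(name)
    if already is not None:
        # Step 1: a previously imported copy must satisfy the constraint.
        if not _matches(getattr(already, "__version__", ""), constraint):
            raise ImportError(f"{name} already imported with an incompatible version")
        return already
    for entry in sys.path:  # Step 2: PYTHONPATH entries come before site-packages here
        if not os.path.isdir(entry):
            continue
        candidates = []
        for item in os.listdir(entry):
            pkg, sep, version = item.partition("-")
            if pkg == name and sep and _matches(version, constraint):
                key = tuple(int(p) for p in version.split(".") if p.isdigit())
                candidates.append((key, os.path.join(entry, item, "__init__.py")))
        if candidates:
            _, init_path = max(candidates)  # Step 3.2: highest matching version wins
            spec = importlib.util.spec_from_file_location(name, init_path)
            module = importlib.util.module_from_spec(spec)
            sys.modules[name] = module
            spec.loader.exec_module(module)
            return module
    raise ImportError(f"no installed version of {name!r} satisfies {constraint}")

# e.g. sp = import_versioned("some_package", constraint=("1", "2"))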

Version compatibility

I would specify the version matching to be, intuitively, as specific as written. For example,

  • version 1 should take any 1.* version.
  • version 1.1 should accept neither 1.0.* nor 1.2.* versions, but should accept any 1.1.* version as well as === 1.1
  • similarly for arbitrary levels of depth

For this reason I would suggest one of the two following schemes

pip

Obviously python would need to be capable of understanding the new import statement, so any package supporting this feature should probably specify the supported python version in their package metadata. Since the package would lack backwards compatibility with previous python interpreter versions, pip should fail to install multiple versions of any package into a site-packages directory belonging to an interpreter below a specified version.

Since python needs a new way of finding packages that follow a versioned scheme, pip needs to be modified to install packages according to that same scheme. Whatever the scheme is for finding multiple versions of a package on disk, pip needs to understand it. The most obvious solution to me is to move existing packages

Furthermore, pip currently fails to install if conflicting versions are specified. An optional flag may be required a la pip install --allow-multiple, with fallback behavior being exactly what pip already does.

Finally, since different versions of a package would be imported under the same name, it should be an error to try to import multiple versions of the same package at runtime.

Discussion

What do people think about the idea in general?

First and foremost I imagine this would reduce the need for having virtualenvs everywhere you have a script/package. It would not reduce the utility of creating a virtualenv to isolate your environment.

Secondly, I hope this helps decouple environment configuration from individual applications. One environment could legitimately support multiple systems.

Today python does a really great job of enabling the reuse of existing modules during development. It’s super easy to publish one simple package and depend on existing chunks of functionality. However, there is forever a tension between specificity of requirements and inevitable dependency hell, versus generality and broken forwards compatibility. I would hope that by providing developers with the ability to be specific about a version to their level of need, they can be as specific as necessary while letting the environment expand without conflicts. Furthermore, I hope it begins to encourage package maintainers to make major version bumps only when interfaces change, and to be clearer about interface guarantees.

Finally, to address the inevitable “just run it in a Docker” comment: in my experience creating a docker image doesn’t give me any more isolation than I already get with a virtualenv. The only difference is that the environment is in the production package. IMHO it’s a pattern that reinvents the whole notion of having a cross-platform interpreter. Docker containers are great for long-running services, like flask web servers, where the startup overhead doesn’t matter, and where external configuration and state may be very important. But when running just one short-lived process as an entrypoint, startup time matters. From the configuration vs runtime perspective, it is actually equivalent to (1) zipping up a virtualenv, or (2) packaging up every dependency with my code, or (3) simply installing a virtualenv every time I distribute code. The proposed feature intends to offer an alternative approach that tries to make a production environment less brittle, rather than doubling down on the idea that an environment should never be shared.


Four things. One, this is a massive ask. Changing the syntax and asking all packaging tools to change is not a small thing.

Two, you didn’t outline how you plan to actually make importing work. For instance, how are you going to separate the module versions in sys.modules to control for the versions? Have you tried prototyping this?

Three, do you know if your proposed syntax can be supported? Using * is ambiguous and I don’t know if the grammar could support it.

Four, is your version language expressive enough to cover all potential version constraints? If not, then you will probably need to justify why requirements files need to be able to specify something more specific than you can in an import.

I’m afraid the only way to even consider this is with a working proof-of-concept to understand the proposed semantics and that won’t be a simple thing to do. You could try to modify importlib as appropriate and just use a function to start to avoid having to make syntactic changes.
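A tiny illustration of the sys.modules question from point two: today the import cache is keyed only by the bare module name, so two versions cannot coexist without inventing some new keying scheme (the “@version” keys below are purely hypothetical).

# Minimal illustration: sys.modules has exactly one slot per module name,
# so a second version would need a different key.
import json
import sys

print(sys.modules["json"] is json)  # True: one canonical entry per name
# A multi-version importer would have to decide how to key these, e.g.:
# sys.modules["some_package@1.2.3"] = module_v1
# sys.modules["some_package@2.0.1"] = module_v2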

Thanks for the feedback Brett. I appreciate it. I could use a bit of help with understanding what the current implementation looks like and how to show a proof of concept.

There are a couple of points that I think bear clarification. I’ve responded to some of your individual comments below.

Apologies for another long one :innocent:

Well, actually my plan was not to change packaging tools at all. I’m kinda leaving it to developers to know that their package uses a syntax that isn’t backwards-compatible, and to declare it in their package metadata with whatever tools they’re currently using. This is basically the same as if you use the async keyword. In fact it should be less painful, because nobody was going to accidentally use extra syntax in their import statements with current versions of the interpreter.

I don’t. I’ve never read the parser and I have no idea what the implementation looks like. I’ve never even contributed to the python language. Do you have suggestions about where I can look to see if this is possible?

To be clear, I don’t mean to support * in the import statement itself. I meant that to be an example of how to interpret a version constraint in an import statement. I hoped the examples were clear, but maybe you can suggest what makes it seem ambiguous?

In order to keep things simpler, I was shooting for a subset of PEP-440 version constraints. I think it’s sufficient to cover 90% of use cases by saying “this import requires version 2 of this package”, or even “this import requires version 2.3.2.1”. Since this is a runtime syntax, the idea is that as a package developer I could disambiguate which installed dependency is mine vs another package’s. Today pip makes some best efforts, but definitely lets you have package versions that aren’t the ones you specified. If pip could install a second version and you could declare which one of the installed versions you prefer, then you may avoid some of those “s*$% I just installed another package and now my existing ones don’t work because of installed dependencies” or “I just installed this same package in another environment, why doesn’t it work in this environment?”

Mostly to keep things simple. I wanted to avoid the scenario where a developer is always duplicating all the packaging constraints into code. That’s bad. In general my hope is that it would distinguish between API-breaking changes in packages and implementation changes that need to be avoided. You could use API version 2 and you’ll know that the calling code will succeed. However, the package could be more specific if testing surfaces a bug with one particular version. The API is still callable, but the package needs to be more specific. Similarly with bugfix versions and security patches.

It would be really nice to be able to run two different flask applications from one environment. Imagine I could say “install flask app A from source X, and app B from source Y” and not have to worry about whether flask or werkzeug or whatever database you’re connecting to is different between the two applications. They just work. Today, if they conflict then you’ll immediately jump to “make a virtualenv”, and then you’ll have to invent your own solutions for how to launch each application by setting PYTHONPATH, or invoking python explicitly, or creating wrapper callers with a more specific shebang line. It’s not very nice to end users of applications. And the truth is most of us don’t apt-get install some_flask_app, because the boilerplate knowledge of “how to run a python application” without hitting these dependency issues adds mental load for end users.

Again, the packaging tools don’t lose any specificity here. The packager still holds the same burden of understanding what versions of their dependencies are compatible with their package, and specifying that with whatever tools they’re using. But consumers of a package gain the ability to have a package installed into an environment where some other version of a dependency may already be installed. It is also more useful if package developers are clear about major- and minor-version API changes. I’m not sure that numpy could benefit from this type of import statement as it is today, but I imagine that flask and werkzeug could.

Hopefully the motivation is more clear?

This gets to “how” and not “what”. I’m happy to implement a reference implementation, but I am afraid of making a proposal that is too specific to an implementation rather than a specification. Is there something I can do to make a proposal of “what” more palatable without implementing the solution in cpython?

I just started to look at importlib in cpython. I’ll see if I can understand it. Any help with getting started would be really great.

That’s very much going to depend on how you plan to implement this. And when I say “packaging tools”, I’m talking about what you’re going to require of pip when it installs something so that your implementation knows what version of a package is installed (if any). Your proposal will lead to reading that sort of information far more often than it is read today, and thus it could become a performance bottleneck.

We just landed a new parser, so my old suggestion would no longer hold. But …

" version X(.Y)* should map to ~= X(.Y)*.0" suggested you wanted that actual syntax to work, not that it was a regex to suggest potential valid version numbers. But if that is just a regex then your import _ version _ should be fine to parse.

But that very much assumes that SemVer is a universally used thing. For instance, attrs (on PyPI) uses CalVer. So does pip. So a new major version does not necessarily communicate a breaking change, just a new release at a new date (with all the bugfixes). I wrote a blog post on how SemVer isn’t always a good fit, especially for libraries.

Sure, this is the npm model. But that model also has its own drawbacks, such as not being able to force your code to use versions of packages that do not have some critical security vulnerability. There is very much a pro/con to Python’s current approach, but there is also one for the approach you’re advocating for. IOW people have made this suggestion many times over Python’s 30-year history, and there are reasons we have stayed with what we’ve got.

If you can be extremely clear about the semantics and how you expect Python to implement them, then sure. But at that point people will probably want a proof-of-concept to be able to play with. I’m not suggesting the PoC is the spec, but it is an implementation of the spec that people can try out, and it helps make sure the spec covers all cases.

But I will say I don’t think this has a chance of being accepted. This has been brought up many times in the past and the perceived benefits have never been enough to overcome the status-quo (both from momentum and the benefits the current approach has). But if you do choose to pursue it, then good luck!

Hmm, yes I see your point. It’s a strong argument for the environment to have more control over what gets used than the developer, especially for security fixes. After all, I know more about what my requirements are at the time I’m installing/running something than at the time I’m uploading it to pypi for literally anybody in the world to use.

Perhaps I’ve deceived myself into thinking OS package managers had a better solution. I can’t remember a single time I tried to apt-get install a package and had it tell me “no”. Perhaps it’s the fact that python is a programming language, and reusing code is a different pattern from installing applications.

Another argument against versioned imports is binary compatibility. JavaScript “solves” this by providing no object structure at all, but Python has a more rigid object model and would run into issues if e.g. library A 1.0 returns an object of type X, but library B expects an input of type X from A 2.0. This is a problem with system package managers as well, and their solution is to put the responsibility on library developers and packagers, forcing them to consider this issue when releasing new versions. Python package authors do not have such limitations, however, so it’d be very possible that a vast number of packages would have problems working together if we switched to allowing multiple versions of the same package in an environment.
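A small self-contained sketch of that type-identity problem: loading the same source twice, the way two installed versions would be loaded, produces same-named classes that are nonetheless distinct types.

# Illustrative only: the two loaded copies stand in for "A 1.0" and "A 2.0".
import importlib.util
import pathlib
import tempfile

SRC = "class DataFrame:\n    pass\n"

def load_copy(path, module_name):
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "lib_a.py")
    path.write_text(SRC)
    lib_v1 = load_copy(path, "lib_a_v1")  # stand-in for library A 1.0
    lib_v2 = load_copy(path, "lib_a_v2")  # stand-in for library A 2.0

obj = lib_v1.DataFrame()
print(isinstance(obj, lib_v2.DataFrame))  # False: same-named class, different type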

I would also say that ultimately, all these problems with the multi-version feature are entirely solvable, as evidenced by ecosystems that do this (C, JavaScript, and Rust, to name a few). But they had to develop tooling for it, and the question is whether it is worth the effort to do all that, when the current “flattening” approach works well enough most of the time.

I believe my proposed import constraints would prevent that issue rather than create it. Of course python is also dynamically typed, so nothing stops you from passing an object around without importing its class at all. But my proposal would prevent library B from importing A 2.0 if A 1.0 was already loaded. One package that depends on multiple versions of a library is still invalid.

This has been brought up before multiple times (here’s an easy-to-find recent instance: Allowing Multiple Versions of Same Python Package in PYTHONPATH - #9 by uranusjr).

I’ll note that easy_install does provide such functionality as well and… is on the path of deprecation because no one used it enough to justify maintaining that tool. :slight_smile:


You might not have encountered it, but it certainly happens a lot. The output line for users hitting dependency conflicts with apt is “The following packages have unmet dependencies:”. I’m certain that a web search will find more than a hundred thousand results for that term.

I’ll note that this happens even though Debian is curating/managing/patching the set of packages in the central repository. This is unlike Python, where anyone can upload packages to PyPI.

I suggest you look WAY back in the archives of various mailing lists for discussion of this approach. It was talked about a lot and rejected - with virtual environments being ultimately the solution.

In fact, before virtualenv was a thing, wxPython had its own version selection mechanism, called wxversion, and I’m pretty sure wxGTK had another, and as has been pointed out, setuptools included a more generic system. In the end, none of them caught on, and virtual environments have been pretty successful.

I think one reason that a C dynamic-linking-style approach didn’t “take” is that C programs are linked at compile time, then again at run time, whereas Python packages are imported only at run time. So you don’t know if your various packages’ requirements are met, or in conflict, until you run the code. It also means that requirements are specified in the code itself, rather than as metadata that can be checked at install time.

What I’m getting at is: your application requires package a and b. Package a and b both require package c, but with different (but overlapping) versions. So you really want to resolve all this at install time, not run time.

It would also get a lot harder to test, and document what’s been tested against.

Finally, I really don’t understand how virtual environments don’t solve your problem. Why do you feel the need to run two different apps in the same environment?

Environments are pretty lightweight. I mostly work with conda: a conda environment consists of links to the packages—so if you have two environments that only differ by one package, only the one is duplicated.

So what’s the downside?

-CHB


Might this approach be a good idea?

pip install pandas==1.1.3 --as=pandas_1_1_3

  • Which would create a folder with the name pandas_1_1_3 on the import path, rather than just pandas
  • The import syntax could then be import pandas_1_1_3 as pandas

FYI, this question has been discussed at great length by now, and it is generally concluded that there are too many hurdles to make this practically useful. See

for some of the most recent discussion.

This is basically just vendoring the import package, but exposed externally as a distribution package. This has a number of issues; listing a few off the top of my head:

  • A lot of details would need to be handled and worked out (e.g. distribution metadata), often with non-trivial or non-optimal solutions.
  • Non-relative imports within pandas itself would still import from whatever version of the package is installed under its standard import name, which would very likely break, unless there was some magical rewriting done (which could not handle all cases reliably)
  • Code wanting to use this version of the package would need to be rewritten too in order to use the new name
  • That code would also need full control over what that name is, or be manually rewritten by the user accordingly.
  • What would be done about Pandas’s own dependencies? Are they just installed in the regular site-packages? What if they are incompatible with anything else installed (which is a big reason for allowing multiple installs in the first place)? If not, how or where are they installed?

On top of the issues specific to this proposal, it also runs into the same fundamental issues discussed in the above-mentioned thread: if at any point the top-level code calling pandas, or any code that code calls (ad nauseam), needs or uses a different version of pandas (and if it didn’t, you wouldn’t need multiple version support in the first place), and anything in that code interacts with any of the objects (functions, classes, methods, dataframes, timestamps, etc.) from the other copy you’ve installed, you’re likely in all kinds of nasty trouble.

At that point, if you absolutely, positively cannot get it to work in separate virtual environments or with compatibility fixes, you may as well just vendor the packages you need (which is what this is doing anyway), which avoids many of these problems and allows you full control to reliably hack around the rest.

To jump back to your actual requirements, it sounds like you are trying to deploy a tool/service as a single unit, right? If so, have you considered doing what you suggest above, using a tool like pex? That is essentially a zipped-up virtualenv, and once added to PATH, it can be run as a tool by a human, or you can just set up a systemd unit (or similar) to kick off a service.

That might be a simpler mechanism to get the isolation you want (being able to upgrade a dist you depend on) without the overhead (docker daemon, etc) of a running container.

A touch of history:

Way back in the day, before pip, and virtualenv, and …

Python package versioning was problematic, and there were a number of ways folks tried to solve it. In particular, wxPython had wxversion, which worked something like this (from memory):

import wxversion
wxversion.select('1.2')
import wx  # a subsequent import of wx then picks up the selected installation

And I believe wxGTK, and probably other packages, had their own custom solution as well.

A number of us advocated for building something like this into Python as a standard system. However, there are a lot of issues with this, some of which have been brought up in this thread.

In the end it was the consensus in the community that individual package versioning was not the right solution, and rather, some sort of virtual environment was the right way to go.

So I really don’t think it’s worth bringing up again – trust the very smart people in the community (of years ago :slight_smile: ) – use an environment system to control the whole set of packages for an application instead:

Docker, conda, virtualenv, probably others, all work great.

I don’t think those tools actually address the underlying issue that often motivates folks to want this functionality. For example, a complicated data processing / ML library might have deps A, B, and C which all internally depend on numpy and pandas but make resolving specific versions impossible. This forces, say, C to be vendored with its own copies of numpy and pandas.

Scenarios like the above suggest there ought to be a mechanism to allow optional isolation of certain dependencies (in a transparent fashion?). Perhaps put another way, I suspect most programming languages would benefit greatly if they adopted a dependency resolution and import strategy that takes after the Nix philosophy. Just like I should be able to build two application binaries that depend on conflicting versions of a lib (by pulling in the specific lib versions as necessary), I should generally be able to import two packages into my codebase that “internally” use conflicting versions of the same package.

The forced vendoring scenario I describe above could probably be handled fairly easily if there were import semantics that could optionally check for vendored deps. For example, suppose in my hypothetical library deps A and B can both make use of numpy>=1.2 whereas dep C requires numpy<1.2. If authors of A, B and C could write an import statement such as

from _ import numpy as np

which resolved to either import numpy as np or import <package>.numpy as np, depending on whether there’s some version of numpy vendored in a namespace package. Then a package manager like pip could auto-vendor numpy by detecting the failed resolution of numpy and installing numpy==1.1.x into a namespace package C.numpy and numpy==1.2.y into the usual site-packages.
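As a rough sketch of how that fallback could work even without new syntax (the helper and the vendored package location are hypothetical):

# Hypothetical helper, not an existing API: prefer a vendored copy of a
# dependency if the enclosing package carries one, otherwise fall back to
# the normally installed version.
import importlib

def import_vendored_or_global(name, vendor_package=None):
    if vendor_package is not None:
        try:
            return importlib.import_module(f"{vendor_package}.{name}")
        except ImportError:
            pass
    return importlib.import_module(name)

# Inside dep C, which pip has (hypothetically) auto-vendored numpy<1.2 into:
np = import_vendored_or_global("numpy", vendor_package="C")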

I’m not making a specific suggestion for the exact syntax or anything, and understand a lot of the subtle issues this can cause, especially in languages like Python, but it certainly would solve a lot of pain points I’ve personally faced.
