Allowing Multiple Versions of Same Python Package in PYTHONPATH

TL;DR: I wanted to get feedback on a potential feature that may be added to nixpkgs, allowing multiple versions of the same Python package to be installed in the same PYTHONPATH. The approach is general, not specific to nixpkgs, and could be used by other package managers; the only Nix-specific part is the tooling for building these specialized packages. All of the materials and the demo are in this repo: https://github.com/costrouc/python-multiple-versions. Sorry, Discourse limited me to two links, so see the repo for the actual HTML links. I want to clarify that I definitely don't think this is a feature that should be regularly used or depended on. But for a package manager to provide a consistent set where all packages are "nearly" compatible, a trick like this is needed for nixpkgs and possibly others.

Demo of Multiple Python Versions

This is a self-contained demo of having multiple versions of a Python
package in the same PYTHONPATH. In this demo bizbaz requires flask==0.12.4 and foobar requires flask>=1.0. It requires Nix (sorry, no Windows support in Nix). The
idea is not Nix-specific, but it relies on the package manager/build
tooling to allow for multiple versions.

$ nix-shell
...
[nix-shell:~/p/python-multiple-versions]$ python
Python 3.7.4 (default, Jul  8 2019, 18:31:06) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import foobar; foobar.foobar()
I am using flask version 1.0.3
>>> import bizbaz; bizbaz.bizbaz()
I am using flask version 0.12.4
>>> quit()
$ echo $PYTHONPATH
...:/nix/store/f3j11lk2m8ddw2j2axvcdfc2al2bk98c-flask-0.12.4/lib/python3.7/site-packages:.../nix/store/wv42si07c8wd64ravd4va4kh4j7prwlk-python3.7-Flask-1.0.3/lib/python3.7/site-packages:...

Motivation

In nixpkgs we like to have a
single version of each package (preferably the latest), with all packages
compatible with one another. Often two packages are incompatible with
one another, but if the conflict is in a compiled library/binary we
have the luxury of rewriting the shared library path, allowing two
packages that use different versions of a dependency to coexist. In
Python this philosophy breaks down, because all packages are resolved
through the global PYTHONPATH: if a package does import flask, Python
searches the path and uses the first flask it finds.
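
To make the search behavior concrete, here is a small illustration (it assumes some version of Flask is importable in the current environment):

import sys

# PYTHONPATH entries end up on sys.path; `import flask` walks sys.path
# in order and binds the first flask package it finds.
print(sys.path[:3])

import flask
print(flask.__file__)  # whichever matching sys.path entry comes first wins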

For nixpkgs this is troublesome because it prevents all packages from
being compatible with one another.

Examples of the Issue

  1. jsonschema: jupyterlab_server requires jsonschema >= 3.0.1, while
    cfn-python-lint did not support jsonschema 3 until about a month
    ago, even though 3.0 was released in February!

  2. Some packages pin the version of a dependency such that other packages
    in the same PYTHONPATH cannot depend on the latest version. For
    example, apache-airflow pins pendulum == 1.4.4. That pendulum release
    is over 1.5 years old, and libraries.io reports that 400+ packages
    depend on pendulum. We cannot let a single package restrict the
    version available to every other package.

How does this work?

I wrote a tool, python-rewrite-imports, that helps make multiple versions possible. Let's say that package bizbaz wants an old version of flask (flask==0.12.4), but another package foobar requires the latest version (flask>=1.0). Normally these two packages would be incompatible. To make them coexist we:

  1. Build flask 0.12.4 and install it
  2. Use Rope to rewrite all of flask's internal imports of itself to flask_0_12_4_1pamldmw2y7g
  3. Rename the dist in site-packages and move the package directory to flask_0_12_4_1pamldmw2y7g
  4. Rewrite all imports of flask in bizbaz to flask_0_12_4_1pamldmw2y7g

Rewriting all imports is done with Rope, a robust Python refactoring tool.
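
To give a feel for the rewrite step, here is a minimal sketch using Rope's rename refactoring. This is illustrative only: the site-packages path is a placeholder, and the actual python-rewrite-imports tooling may differ in detail.

from rope.base.project import Project
from rope.refactor.rename import Rename

# Placeholder path to the tree containing the flask 0.12.4 build.
project = Project('/path/to/site-packages')

# Renaming the top-level package resource rewrites `import flask` and
# `from flask import ...` everywhere Rope can see inside the project.
flask_pkg = project.get_resource('flask')
changes = Rename(project, flask_pkg).get_changes('flask_0_12_4_1pamldmw2y7g')
project.do(changes)
project.close()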

Current Limitations

  • Supporting several versions of a package that builds C extensions
    looks harder than just rewriting imports.
  • Suppose package A requires C==1.0.0 and B requires
    C>=1.1. Let's say that package B calls a method in A with a data
    structure built from C>=1.1, and A then passes that data on to its
    own copy of C. This will probably not happen often.
  • Rope does not currently handle all rewrites in
    Python 3. Expressions within f-strings are the only example that I
    know of.
  • It is impossible for Rope to handle all import rewrites. For
    example: import flask; globals()[chr(102) + 'lask'].__version__

I believe that for the vast majority of packages requiring multiple
versions, these issues will be rare.


Links:

  • nixpkgs: a general package manager with 45k+ packages and 5k+ Python packages
  • Rope: a Python refactoring tool used in editors

Crosslink to discussion at Nixpkgs Discourse https://discourse.nixos.org/t/allowing-multiple-versions-of-python-package-in-pythonpath-nixpkgs/3849

Situations that will need to be covered as well:

  1. In a development environment where one works with plain Python source, one needs to be able to perform an import flask instead of import flask_0_12_4_1pamldmw2y7g. I suppose shims will be needed that map the plain name to a particular hashed import (see the sketch after this list).
  2. Similarly, dynamic imports will resolve to something like import flask (as already mentioned under "Current Limitations").
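
A development shim could be as simple as a plain flask.py placed on the PYTHONPATH that aliases the hash-suffixed build. This is only a hypothetical sketch (the suffixed name is illustrative), and submodule imports would need extra care:

# flask.py -- hypothetical development shim placed on the PYTHONPATH.
# It maps the plain name onto the hash-suffixed build so that source
# written as `import flask` keeps working during development.
import sys

import flask_0_12_4_1pamldmw2y7g as _flask

# Replace this shim with the real package object, so that `import flask`
# yields the suffixed build directly.
sys.modules[__name__] = _flask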

My worry is that this will cause tons of bugs to be filed on random packages because of the subtle breakage you're introducing, and package maintainers will have no idea what's going on. All those "current limitations" are things that real packages do.

If you do this, please print some kind of banner at startup and in exception tracebacks noting that this is a Nix-modified Python setup and that any bugs should be reported to Nix only.


Note that setuptools did something like this (but not nearly as invasive, as far as I know, since it didn't have multiple versions of a package active at the same time), and even that has since been mostly abandoned as a bad idea (as I understand it).

I agree with @njs, this should be considered extremely non-standard, and very definitely "use at your own risk". Package maintainers should not be expected to support this usage.


I consider it very worrying that module code actually gets changed, instead of replacing/hooking __import__.

I recall an experiment that enabled something comparable with the import hook mechanism, ensuring the hooks were correctly injected.

Unfortunately I forgot the name of the experiment.


I know Armin Ronacher wrote a POC before. Not sure if this is the same one as you have in mind (there are many similar experiments).


I am referring to something one of the Twisted developers experimented with, which could do multi-version imports without code changes; however, the changes to the import system were too invasive to have a future.

I was going to post something similar to python-ideas but just found this existing topic.

So, I had the same idea, but not related to nixpkgs.

With the growth of the packaging ecosystem comes an increasing risk of dependency version conflicts.

I like pip's description of the problem: let's say we have package tea which depends on boiling_water v1.0, and coffee which depends on boiling_water v2.0. If I create package afternoon_drink which depends on both coffee and tea, there is
no correct version of boiling_water to use.

One solution could be side-by-side installation of different versions of the same package.

In practice, this could take the form of an additional directory similar to the site-packages directory. One could imagine a versioned-packages directory with the following content:

versioned-packages
	+ boiling_water
		+ v1.0/
			+ ...
		+ v2.0/
			+ ...

The files composing my initial example are:

file boiling_water.py (v1.0)

class HotWater:
	pass

def heat_water():
	return HotWater()

file boiling_water.py (v2.0)

# notice how v2.0 of boiling_water is incompatible with v1.0
class BoilingWater:
	pass

def boil_water():
	return BoilingWater()

file tea.py

import boiling_water

class Tea:
	def __init__(self, hot_water):
		pass

def prepare_tea():
	# uses v1.0 of boiling_water
	hot_water = boiling_water.heat_water()
	tea = Tea(hot_water)
	return tea

file coffee.py

import boiling_water

class Coffee:
	def __init__(self, water):
		pass

def prepare_coffee():
	# uses v2.0 of boiling_water
	water = boiling_water.boil_water()
	coffee = Coffee(water)
	return coffee

file afternoon_drink.py

import tea
import coffee

def prepare_afternoon_drinks():
	my_tea = tea.prepare_tea()
	my_coffee = coffee.prepare_coffee()
	return [my_tea, my_coffee]		

The next step is to define how to import these versioned packages. I imagine something like this:

file tea.py (with versioned packages)

boiling_water = import_with_version('boiling_water', '1.0')

class Tea:
	def __init__(self, hot_water):
		pass

def prepare_tea():
	# uses v1.0 of boiling_water
	hot_water = boiling_water.heat_water()
	tea = Tea(hot_water)
	return tea

file coffee.py (with versioned packages)

boiling_water = import_with_version('boiling_water', '2.0')

class Coffee:
	def __init__(self, water):
		pass

def prepare_coffee():
	# uses v2.0 of boiling_water
	water = boiling_water.boil_water()
	coffee = Coffee(water)
	return coffee

The function import_with_version() would use the existing import machinery and add two functionalities:

  • the imported module would be searched for in versioned-packages, using the package name plus a version identifier, instead of in site-packages;
  • the resulting module would not be put into sys.modules but returned directly.
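
A minimal sketch of import_with_version() is possible with today's importlib, assuming single-file modules laid out as in the versioned-packages tree above (the directory location and naming are assumptions):

import importlib.util
from pathlib import Path

VERSIONED_PACKAGES = Path('versioned-packages')  # assumed location

def import_with_version(name, version):
    # Search versioned-packages/<name>/v<version>/ instead of site-packages.
    location = VERSIONED_PACKAGES / name / f'v{version}' / f'{name}.py'
    spec = importlib.util.spec_from_file_location(f'{name}-{version}', location)
    module = importlib.util.module_from_spec(spec)
    # Execute the module and return it directly, without registering it
    # in sys.modules.
    spec.loader.exec_module(module)
    return module

# e.g. boiling_water = import_with_version('boiling_water', '1.0')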

I believe this can solve the practical problem of conflicting package versions.

If I go one step further, one limitation of the solution above is the assumption that I can freely modify coffee.py and tea.py. Modifying dependencies is usually neither
possible nor desirable. A more evolved approach would probably look like this:

file afternoon_drink.py

with ctx_import_with_version('boiling_water', '1.0'):
	import tea

with ctx_import_with_version('boiling_water', '2.0'):
	import coffee

def prepare_afternoon_drinks():
	my_tea = tea.prepare_tea()
	my_coffee = coffee.prepare_coffee()
	return [my_tea, my_coffee]		

The context manager ctx_import_with_version() would have to hook into the import
machinery so that when tea.py performs its import boiling_water, it gets the v1.0
version, while coffee.py gets v2.0.

I am not that familiar with the Python import machinery, but from what I know it does not look especially difficult; a rough sketch follows below. There are probably many edge cases I am missing that would require clever treatment, but the general idea stands.
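
To make that concrete, here is a minimal sketch of ctx_import_with_version() built on a meta path finder, again assuming the versioned-packages layout and single-file modules from above. It only covers the module-level imports shown in tea.py and coffee.py; imports executed later, at call time, would resolve against whatever is active then.

import contextlib
import importlib.util
import sys
from pathlib import Path

VERSIONED_PACKAGES = Path('versioned-packages')  # assumed location

class _VersionedFinder:
    # Redirects a single package name to a specific versioned location
    # while installed on sys.meta_path.
    def __init__(self, name, version):
        self.name = name
        self.version = version

    def find_spec(self, fullname, path, target=None):
        if fullname != self.name:
            return None
        location = VERSIONED_PACKAGES / self.name / f'v{self.version}' / f'{self.name}.py'
        return importlib.util.spec_from_file_location(fullname, location)

@contextlib.contextmanager
def ctx_import_with_version(name, version):
    finder = _VersionedFinder(name, version)
    sys.meta_path.insert(0, finder)
    try:
        yield
    finally:
        sys.meta_path.remove(finder)
        # Drop the cached module so the next context can load another version.
        sys.modules.pop(name, None)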

I am curious to know what other people think of this approach?

Cheers,

Philippe

This can already be implemented using a third-party library, though I'd advise against it as it may lead to bugs when packages assume they are the only version and have global variables, etc.

If you have a very specific use case, this might be a reasonable thing to do, but it's probably a footgun for general users, so I am not seeing this becoming a thing anytime soon.

Agreed, and this was something that was in fact tried in the past, as part of the "egg" format introduced in early versions of setuptools. In general, the approach is considered to have failed, and it caused more issues than it solved.

If you're interested in finding out more, I'd suggest researching how eggs and pkg_resources did this. But like @FFY00 I'd be very surprised if there was any interest in this being a generally available feature.
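
For reference, the old mechanism looked roughly like this (a historical sketch: it required eggs installed in multi-version mode, e.g. with easy_install -m, and even then only one version of a given package could be active per process):

# Historical sketch: pkg_resources picked a version at runtime by adding
# the matching egg to sys.path before the package was first imported.
import pkg_resources

pkg_resources.require('Flask==0.12.4')
import flask  # now resolves to the 0.12.4 egg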


Thanks for the feedback. The number of packages has increased quite a lot since the introduction of the egg format, so I would expect the problem to be more prevalent today and more users to be interested in a workaround.

I have no specific use case myself, but I am surprised not to see this solution proposed more often. Package conflicts are a reality today.

I suppose a better solution is to encourage the upstream developers
of conflicting packages to improve compatibility. In the worst case
where compatibility cannot be achieved, an alias (e.g. pyreadline
vs pyreadline3) can be created.

I don't think downstream maintainers will be happy with multiple
versions of the same package; plus, single versioning makes it easier
to roll out bug fixes for every package using an affected library.

To explain a bit why this isn't as desirable as it might seem: imagine I am using package A which imports NumPy 1.21, alongside package B which uses NumPy 1.16. I call A.load_data(), which returns a NumPy array, then I pass it to B.process(arr). Either B thinks it hasn't got a NumPy array at all (isinstance(arr, numpy.ndarray) is False), or it has an array that may have a different layout than its NumPy library expects (different attribute and method names at the Python level, a different memory layout at the C level).

Being restricted to a single version of a library actually makes life much simpler: within a process, you can assume that a NumPy array is a NumPy array, and you can always pass it back to NumPy functions. The alternatives would be that either the NumPy developers need to plan for in-memory forwards and backwards compatibility, handling arrays created by both older and newer versions of NumPy, or that code using NumPy has to keep track of types like 'NumPy 1.21 array' and convert between different variants of the same thing.

I'm using NumPy as an example, but this would affect any library which defines its own classes and allows them to be used from outside its own code.
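
The class-identity half of this is easy to demonstrate without any version difference at all: loading the same source twice under different module names yields distinct class objects, so isinstance fails across the copies. A minimal sketch reusing the boiling_water.py (v2.0) file from earlier in the thread:

import importlib.util

def load_copy(alias, path='boiling_water.py'):
    # Load an independent copy of the module under a distinct name.
    spec = importlib.util.spec_from_file_location(alias, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

bw_a = load_copy('boiling_water_a')
bw_b = load_copy('boiling_water_b')

water = bw_a.boil_water()
# Two loads produce two distinct BoilingWater classes:
print(isinstance(water, bw_b.BoilingWater))  # False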


Your case makes sense, but it's the less common one. The problem I described is sub-dependency conflicts; being sub-dependencies, they are much less likely to be mixed together.

In an ideal world, yes. In practice, many packages have no maintainer; that's why they are likely to depend on an outdated version, which may eventually conflict with the latest version.

Look at how many packages are still only Python 2 compatible.

You are assuming that the person having the problem controls one of their direct dependencies and is able to patch it to rename a sub-dependency. That's a significant effort and is simply not possible in many situations.

The problem exists, irrespective of whether it makes downstream maintainers happy. In the example I gave, you are faced with the following choices:

  1. port package Tea to a new version of boiling_water
  2. port package Coffee to an old version of boiling_water
  3. give up because you have no resources for porting
  4. find a solution with multiversioning

Option 4 might be the only realistic and affordable solution in many real-world situations.

How so? I do not understand this statement.

How is it more realistic and affordable to implement a whole new mechanism in Python, as opposed to upgrading a package to a newer version of its dependencies? Is it because you're assuming that "someone else" would do the work in one case? The fact that you list "give up because you have no resources for porting" suggests that you probably also don't have the resources to implement multiversioning, so essentially you're hoping that someone else will solve the problem for you.

I'd expect that the cost would be far higher for (4), even allowing for the fact that such a solution would only be implemented once, whereas you might have to upgrade more than one package.

As I've already said, multiversioning solutions have already been tried and proved unworkable in practice. While someone might be willing to have another go at the problem, I'm pretty sure that things haven't changed enough to alter the outcome. But if you (or anyone else) want to try, no one is going to stop you. Just don't expect it to be easy…


I hope you don't feel as if your idea was immediately shot down. However, I think the issue here is that you're proposing a hypothetical solution to a hypothetical problem, without a substantial number of compelling real-world use cases to motivate the resources, complexity, and risk the solution entails. It faces a number of hypothetical and practical barriers to implementation, and a number of real-world efforts to solve it over the years, by some of Python's most knowledgeable and experienced developers, have been uniformly unsuccessful and have not attracted sufficient real-world interest to sustain them.

Particularly in the scientific Python ecosystem, at least from my experience as a user, developer, and maintainer, I'd say it's much more common than not that when two packages depend on one or more of the same packages deeper in the stack (NumPy, SciPy, pandas, Matplotlib, Cython, SymPy, xarray, Dask, etc.), there is some form of direct or indirect data interchange, or other cross-dependency, between at least one pair of them, if not most. In particular, NumPy arrays, pandas dataframes, and xarray objects are routinely exchanged, and code compiled with different Cython versions (if they actually merited a hard dependency non-overlap) may well be ABI-incompatible and cause a C-level hard crash.

As such, it seems likely that this would break as many packages as it fixes, and in ways that can be far harder to debug and recover from than a simple dependency conflict at installation, so this does not really seem to be a viable solution relative to other strategies. Of course, you're always welcome to work on a proof-of-concept implementation, but I would suggest looking into real-world examples of conflicts affecting popular packages before investing too much time and effort into an approach that may or may not be useful to more than a niche handful of developers. Best of luck!
