Crosslink to discussion at Nixpkgs Discourse https://discourse.nixos.org/t/allowing-multiple-versions-of-python-package-in-pythonpath-nixpkgs/3849
Situations that will need to be covered as well:
- In a development environment where one works with plain Python source, one needs to be able to perform an `import flask` instead of `import flask_0_12_4_1pamldmw2y7g`. I suppose shims are going to be needed that map to a certain hashed import.
- Similarly, dynamic imports will resolve to something like `import flask` (as already mentioned in “Current limitations”).
My worry is that this will cause tons of bugs to be filed on random packages because of subtle breakage you’re introducing, and package maintainers will have no idea what’s going on. All those “current limitations” are things that real packages do.
If you do this, please print some kind of banner at startup and in exception tracebacks noting that this is a nix-modified python setup and that any bugs should be reported to nix only.
Note that setuptools did something like this (but not nearly as invasive, as far as I know, as it didn’t have multiple versions of a package active at the same time), and even that has since been mostly abandoned as a bad idea (as I understand it).
I agree with @njs, this should be considered extremely non-standard, and very definitely “use at your own risk”. Package maintainers should not be expected to support this usage.
I consider it very worrying that instead of replacing/hooking `__import__`, the module code itself gets changed.
I recall an experiment that enabled something comparable via the import hook system, ensuring the hook mechanism was correctly interjected. Unfortunately I forgot the name of the experiment.
I know Armin Ronacher wrote a POC before. Not sure if this is the same one as you have in mind (there are many similar experiments).
I refer to something one of the Twisted developers experimented with, which could do multi-version imports without code changes; however, the changes to the import system were too invasive to have a future.
I was going to post something similar to python-ideas but just found this existing topic.
So, I had the same idea, but not related to nixpkg.
With the growth of the packaging ecosystem comes an increasing risk of dependency version conflicts.
I like pip’s description of the problem: let’s say we have package tea which depends on boiling_water v1.0, and coffee which depends on boiling_water v2.0. If I create a package afternoon_drink which depends on both coffee and tea, there is no correct version of boiling_water to use.
One solution could be side by side installation of different versions of the same package.
In practice, this could take the form of an additional directory similar to the site-packages directory. One could imagine a `versionned-packages` directory with the following content:

```
versionned-packages
└── boiling_water
    ├── v1.0/
    │   └── ...
    └── v2.0/
        └── ...
```
The files composing my initial example are:
file boiling_water.py (v1.0)

```python
class HotWater:
    pass

def heat_water():
    return HotWater()
```
file boiling_water.py (v2.0)

```python
# notice how v2.0 of boiling_water is incompatible with v1.0
class BoilingWater:
    pass

def boil_water():
    return BoilingWater()
```
file tea.py

```python
import boiling_water

class Tea:
    def __init__(self, hot_water):
        pass

def prepare_tea():
    # uses v1.0 of boiling_water
    hot_water = boiling_water.heat_water()
    tea = Tea(hot_water)
    return tea
```
file coffee.py

```python
import boiling_water

class Coffee:
    def __init__(self, water):
        pass

def prepare_coffee():
    # uses v2.0 of boiling_water
    water = boiling_water.boil_water()  # local renamed to avoid shadowing the module
    coffee = Coffee(water)
    return coffee
```
file afternoon_drink.py

```python
import tea
import coffee

def prepare_afternoon_drinks():
    my_tea = tea.prepare_tea()
    my_coffee = coffee.prepare_coffee()
    return [my_tea, my_coffee]
```
The next step is to define how to import these versioned packages. I imagine something like this:
file tea.py (with versioned packages)

```python
boiling_water = import_with_version('boiling_water', '1.0')

class Tea:
    def __init__(self, hot_water):
        pass

def prepare_tea():
    # uses v1.0 of boiling_water
    hot_water = boiling_water.heat_water()
    tea = Tea(hot_water)
    return tea
```
file coffee.py (with versioned packages)

```python
boiling_water = import_with_version('boiling_water', '2.0')

class Coffee:
    def __init__(self, water):
        pass

def prepare_coffee():
    # uses v2.0 of boiling_water
    water = boiling_water.boil_water()  # local renamed to avoid shadowing the module
    coffee = Coffee(water)
    return coffee
```
The function `import_with_version()` would use the existing import machinery and add two behaviours:
- the imported module would be searched for in `versionned_packages`, using the package name plus a version identifier instead of the package name alone
- the resulting module would not be put into `sys.modules` but returned directly
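A minimal sketch of what `import_with_version()` might look like using `importlib` (the directory layout and the `name-version` spec naming are assumptions taken from the proposal, and submodules/packages are ignored):

```python
import importlib.util
import os

def import_with_version(name, version, base_dir="versionned-packages"):
    """Hypothetical helper: load `name` from base_dir/name/vX.Y/name.py
    without registering it in sys.modules."""
    path = os.path.join(base_dir, name, f"v{version}", f"{name}.py")
    spec = importlib.util.spec_from_file_location(f"{name}-{version}", path)
    module = importlib.util.module_from_spec(spec)
    # deliberately NOT inserted into sys.modules, per the proposal:
    spec.loader.exec_module(module)
    return module
```

Because the module never enters `sys.modules`, two calls with different versions return two independent module objects, which is exactly the behaviour described above.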
I believe this can solve the practical problem of conflicting package versions.
If I go one step further, one limitation of the solution above is the assumption that I can freely modify coffee.py and tea.py. Modifying dependencies is usually neither possible nor desirable. A more evolved approach would probably look like this:
```python
with ctx_import_with_version('boiling_water', '1.0'):
    import tea

with ctx_import_with_version('boiling_water', '2.0'):
    import coffee

def prepare_afternoon_drinks():
    my_tea = tea.prepare_tea()
    my_coffee = coffee.prepare_coffee()
    return [my_tea, my_coffee]
```
The context manager `ctx_import_with_version()` would have to hook into the import machinery so that when tea.py performs its `import boiling_water`, it gets v1.0, while coffee.py gets v2.0.
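A rough sketch of how `ctx_import_with_version()` could hook the import machinery with a temporary `sys.meta_path` finder. This is an illustration only: the redirect logic and directory layout are assumptions, and it ignores real-world complications such as submodules, threads, and transitive imports cached while the context is active:

```python
import contextlib
import importlib.abc
import importlib.util
import os
import sys

class _VersionFinder(importlib.abc.MetaPathFinder):
    """Redirects one package name to a specific versioned source file."""
    def __init__(self, name, version, base_dir):
        self.name, self.version, self.base_dir = name, version, base_dir

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.name:
            return None  # not ours; let the normal machinery handle it
        file = os.path.join(self.base_dir, fullname,
                            f"v{self.version}", f"{fullname}.py")
        return importlib.util.spec_from_file_location(fullname, file)

@contextlib.contextmanager
def ctx_import_with_version(name, version, base_dir="versionned-packages"):
    finder = _VersionFinder(name, version, base_dir)
    sys.meta_path.insert(0, finder)   # take priority over the default finders
    try:
        yield
    finally:
        sys.meta_path.remove(finder)
        # drop the cached copy so the next context can load another version
        sys.modules.pop(name, None)
```

Note the cleanup step: because the import system caches in `sys.modules`, the context manager has to evict the entry on exit, which is precisely the kind of global-state fiddling the earlier replies warn about.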
I am not that familiar with the Python import machinery, but from what I know of it, this does not look especially difficult. There are probably many edge cases I am missing which would require clever treatment, but the general idea stands.
I am curious to know what other people think of this approach.
This can already be implemented using a 3rd-party library, though I’d advise against it, as it may lead to bugs when packages assume they are the only loaded version and rely on global variables, etc.
If you have a very specific use case, this might be a reasonable thing to do, but it’s probably a footgun for general users, so I am not seeing this becoming a thing anytime soon.
Agreed, and this was something that was in fact tried in the past, as part of the “egg” format introduced in early versions of setuptools. In general, the approach is considered to have failed, and it caused more issues than it solved.
If you’re interested in finding out more, I’d suggest researching how eggs and pkg_resources did this. But like @FFY00 I’d be very surprised if there was any interest in this being a generally available feature.
Thanks for the feedback. The number of packages has increased quite a lot since the introduction of the egg format. I would expect the problem to be more prevalent today, and so more users to be interested in a workaround.
I have no specific use case myself, but I am surprised not to see the solution being proposed a bit more. Package conflicts are a reality of today.
I suppose a better solution is to encourage the upstream developers
of conflicting packages to improve compatibility. In the worst case
where compatibility cannot be achieved, an alias (e.g. pyreadline
vs pyreadline3) can be created.
I don’t think downstream maintainers will be happy with multiple
versions of a same package, plus single versioning makes it easier
to roll out bug fixes for every package using an affected library.
To explain a bit why this isn’t as desirable as it might seem: imagine I am using package A which imports NumPy 1.21, alongside package B which uses NumPy 1.16. I call `A.load_data()` which returns a NumPy array, then I pass it to `B.process(arr)`. Either B thinks it hasn’t got a NumPy array at all (`isinstance(arr, numpy.ndarray)` is False), or it has an array that may have a different layout than its NumPy library expects (different attribute & method names at the Python level, different memory layout at the C level).
Being restricted to a single version of a library actually makes life much simpler: within a process, you can assume that a NumPy array is a NumPy array, and you can always pass it back to NumPy functions. The alternatives would be that either the NumPy developers need to plan for in-memory forwards & backwards compatibility - handling an array created by both older & newer versions of NumPy - or that code using NumPy has to keep track of types like ‘NumPy 1.21 array’ and convert between different variants of the same thing.
I’m using NumPy as an example, but this would affect any library which defines its own classes and allows them to be used from outside its own code.
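The class-identity problem can be demonstrated without installing two NumPy versions: merely loading the same module file twice as two separate module objects already breaks `isinstance`. The tiny module below is an invented stand-in for illustration:

```python
import importlib.util
import os
import tempfile

# write a tiny module defining one class
path = os.path.join(tempfile.mkdtemp(), "boiling_water.py")
with open(path, "w") as f:
    f.write("class HotWater:\n    pass\n")

def load_copy(alias):
    # load the same file under a distinct module name, bypassing sys.modules
    spec = importlib.util.spec_from_file_location(alias, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

v1 = load_copy("boiling_water_v1")
v2 = load_copy("boiling_water_v2")

obj = v1.HotWater()
print(isinstance(obj, v1.HotWater))  # True
print(isinstance(obj, v2.HotWater))  # False: identical source, distinct class
```

Each load produces a brand-new class object, so objects created by one copy fail `isinstance` checks against the other, even though the source code is byte-for-byte identical. With genuinely different versions the mismatch only gets worse.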
Your case makes sense, but it’s the less common one. The problem I described is sub-dependency conflicts. Being sub-dependencies, they are much less likely to be mixed together.
In an ideal world, yes. In practice, many packages have no maintainer, that’s why they are likely to depend on an outdated version, which may eventually conflict with the latest version.
Look how many packages are still only python2 compatible.
You are assuming that the person having the problem has control over one of their direct dependencies and is able to patch it to rename a sub-dependency. That’s a significant effort and is simply not possible in many situations.
The problem exists, irrespective of whether it makes downstream maintainers happy or not. In the example I gave, you are faced with the following choices:
- port package Tea to a new version of boiling_water
- port package Coffee to an old version of boiling_water
- give up because you have no porting effort capabilities
- find a solution with multiversioning
Option 4 might be the only realistic and affordable solution in many real-world situations.
How so? I do not understand this statement.
How is it more realistic and affordable to implement a whole new mechanism in Python, as opposed to upgrading a package to a newer version of its dependencies? Is it because you’re assuming that “someone else” would do the work in one case? The fact that you list “give up because you have no porting effort capabilities” suggests that you probably also don’t have the capability to implement multiversioning, so essentially you’re hoping that someone else will solve the problem for you.
I’d expect that the cost would be far higher for (4), even if you allow for the fact that such a solution would only be implemented once whereas you might have to upgrade more than one package.
As I’ve already said, multiversioning solutions have already been tried and proved unworkable in practice. While someone might be willing to have another go at the problem, I’m pretty sure that things haven’t changed enough to alter the outcome. But if you (or anyone else) wants to try, no-one is going to stop you. Just don’t expect it to be easy…
I hope you don’t feel as if your idea was immediately shot down. However, I think the issue here is you’re proposing a hypothetical solution to a hypothetical problem, without a substantial number of compelling real-world use cases that motivate the resources, complexity and risk the solution entails, in the face of a number of hypothetical and practical barriers to its implementation, and when there have been a number of real-world efforts to solve it over the years by some of Python’s most knowledgeable and experienced developers that have been uniformly unsuccessful and not merited sufficient real-world interest to sustain them.
Particularly in the scientific Python ecosystem, at least from my experience as a user, developer and maintainer, I’d say it’s much more common than not that when two packages depend on one or more of the same packages deeper in the stack (NumPy, SciPy, Pandas, Matplotlib, Cython, SymPy, xarray, Dask, etc.), there’s some form of direct or indirect data interchange, or other cross-dependency, between at least one pair of them, if not most. In particular, NumPy arrays, Pandas dataframes and xarray objects are routinely exchanged, and code compiled with different Cython versions (if they actually merited a hard dependency non-overlap) may well be ABI-incompatible and cause a C-level hard crash.
As such, since this seems likely to break as many packages as it fixes, and in ways that can be far harder to debug and recover from than a simple dependency conflict at installation time, it does not really seem to be a viable solution relative to other strategies. Of course, you’re always welcome to work on a proof-of-concept implementation, but I would suggest looking into real-world examples of conflicts affecting popular packages before investing too much time and effort into an approach that may or may not be useful to more than a niche handful of developers. Best of luck!
Thanks for the general feedback. I was really wondering why this has not been tried before, to a level where I would have heard of it.
Your explanation makes it very clear why it is a risky idea. I would expect that there are other cases where this works, but as you pointed out, breakage in practical packages is what’s needed to see whether any solution applies.
It has been tried before.
setuptools started out with multi version support, and, in some ways, that was the USP of it.
The tooling for this still exists in setuptools and is still documented at Multi-version installs — Python Packaging User Guide.
The thing is, this approach is somewhat incompatible with Python’s import mechanisms (there’s a single global import cache), causes subtle, difficult-to-reason-about failures (see the NumPy example above, and think about how you’d debug a failure there: remember, repr and type aren’t going to include versions), and has all the other problems mentioned already.
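That “single global import cache” is `sys.modules`: once a name is there, a later `import` of the same name never re-searches the filesystem, which is why two versions cannot coexist under one name. A small demonstration, using a fake planted module purely for illustration:

```python
import sys
import types

# plant a fake module in the cache before any real import happens
fake = types.ModuleType("boiling_water")
fake.__version__ = "1.0"
sys.modules["boiling_water"] = fake

import boiling_water  # satisfied straight from sys.modules; no finder runs
print(boiling_water.__version__)  # 1.0 -- the cached object, whatever it is
```

Every module in the process that writes `import boiling_water` receives this one cached object, so “tea gets v1.0 while coffee gets v2.0” requires subverting exactly this mechanism.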
This is so baked into setuptools that pip calls it with `--single-version-externally-managed` whenever it uses setuptools as an installation mechanism, to avoid hitting any of the multi-version support code paths.
This will likely also be unsupported by now, because there’s absolutely not enough interest to keep that working — and even if it does work for you, unless you want to step forward and start maintaining that code while being mindful of backwards compatibility, I don’t think anything is going to change here.