Package dependencies and virtual environments

(Not sure if this belongs in “help” or “packaging”.)

I am trying to understand workflows around virtual environments, prompted a little bit by the discussion around single-file scripts.

There seems to be a general understanding, at least among the cognoscenti here, that virtual environments are the way to go for pretty much all Python use — we shouldn’t “pollute” the main Python install with third-party packages.

I use Python for a significant fraction of my programming. I am a physicist, and most of that programming takes the form of Jupyter notebooks or medium-sized programs, with quite a few very short single-file scripts that each do a single job mixed in.

I find my workflow is a very bad fit for virtual environments. Creating a new venv for each new program or script seems quite heavyweight. Jupyter, in particular, does not play all that well with venvs. It works, but it’s several extra steps for each new environment, and as far as I can tell each one semi-permanently pollutes the list of available Jupyter kernels. Perhaps a fix for this part of the problem needs to come from the Jupyter development side?
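For reference, the extra steps I have in mind look roughly like this (a sketch only; the env and kernel names are illustrative):

```bash
# Create and activate a venv for one project
python3 -m venv ~/.venvs/myproject
source ~/.venvs/myproject/bin/activate

# Register the venv as a Jupyter kernel (needs ipykernel inside the venv)
pip install ipykernel
python -m ipykernel install --user --name myproject --display-name "Python (myproject)"

# The kernelspec outlives the venv unless it is removed by hand later:
jupyter kernelspec uninstall myproject
```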

Moreover, almost all of my work uses the same mix of packages. One recent counter-example is Numba, which usually pins numpy to a not-quite-latest version. So I’ve actually created a new venv — and Jupyter kernel — for just this, but this means I need to know ahead of time whether I’m going to need it, since sometimes there are advantages to using the latest version of numpy rather than whatever Numba is pinned to.
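Concretely, the dedicated environment amounts to something like this (a sketch; the names are illustrative, and pip resolves whatever numpy version Numba’s pin currently allows):

```bash
# A venv just for Numba work
python3 -m venv ~/.venvs/numba-env
source ~/.venvs/numba-env/bin/activate

# pip picks a numpy that satisfies Numba's upper bound, not necessarily the latest
pip install numba ipykernel

# ...and yet another kernel entry
python -m ipykernel install --user --name numba-env --display-name "Python (Numba)"
```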

I’m not completely sure where I’m going with this, but I suppose I wanted to clarify the underlying philosophy here. The Python Tutorial discusses virtual environments solely (?) as a solution to the problem of clashing dependencies. Is that why we should use them? Or do they have other advantages?

Slightly, or more than slightly, controversially, I feel that they are a bad solution to the dependency problem, if that really is their only raison d’être, although of course I am not sure that I have a better one. The most obvious other kind of proposal is a modification to import, which puts the burden on the calling code rather than the package itself (although within a distributed package it seems simple enough).

Thanks for your thoughts on the matter.

The other “motivation” for virtual environments is largely specific to a “system Python”, e.g. on Linux, where version conflicts can break core operating-system programs. That deficient system design seems to drive the push for virtual environments in all places, even when they are not needed, useful, or wanted. You can largely ignore it on, e.g., Windows, or when you install Python separately.


Indeed. I should have been clear that I am on macOS and I always install a separate Python (in my case with Homebrew, although I used to use the python.org package install) as my “main Python”, exactly to avoid using the system Python for anything at all…

It’s not a problem “for” Windows because Windows doesn’t implement large portions of its operating system in Python, and it doesn’t have a system-wide package manager that offers curated packages of Python-based applications which all share a common copy of the Python interpreter/runtime and import module namespace.

The problem arises more on GNU/Linux and some Unix derivatives because people see that central shared Python and say “ooh, I’d like to use that rather than just installing a separate Python environment, but, no, I need some packages or versions of packages which aren’t part of the curated set my distribution offers, so I’m going to cowboy in some packages of random things they didn’t plan for and just hope it works.” And often it does work, but sometimes it causes major damage, like if your car didn’t have a trunk, so instead of going to the trouble of getting a trailer to put things in, you decided to just stick them in empty spaces under the hood and hope for the best.

This really isn’t a “Linux problem”; it’s a “mixing package managers problem.” You can get into the same bad place with, say, Conda by trying to mix packages from conda-forge and PyPI with conflicting requirements. Conda solves it by giving you the ability to create separate environments for your different tasks. This is, conceptually, no different than using separate venvs for your different tasks, or separate Python installations for that matter.
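As a sketch (environment names and version pins are illustrative), that Conda-side isolation looks like:

```bash
# Two independent environments that can hold conflicting package sets
conda create -n latest-numpy python=3.12 numpy
conda create -n numba-work python=3.11 numba

# Switch per task; each env resolves its own dependencies
conda activate numba-work
```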

It can also be handy to combine these solutions. In my case, I compile half a dozen different versions of CPython in my home directory (I need to test that changes I’m making to software work with a range of minor Python versions, not all of which are supplied by my distro). I then use those different versions of CPython to create Python-version-specific venvs for different sets of software, so I can isolate their dependencies from one another to reduce unexpected interactions, make sure I have correctly specified the full set of dependencies for those applications, and so on.
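Roughly, and assuming a CPython source checkout (the paths here are illustrative), that combination looks like:

```bash
# Build one CPython version into a private prefix in $HOME
cd cpython-3.11
./configure --prefix="$HOME/pythons/3.11"
make && make install

# Use that interpreter to create a version-specific venv for one application
"$HOME/pythons/3.11/bin/python3" -m venv ~/.venvs/app-py311
~/.venvs/app-py311/bin/pip install -e ./the-app   # e.g. an editable checkout
```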
