I think it’s fine if there are several solutions to the problem.
But that top-voted answer basically “hides” the correct steps behind a link, and then gives the bad advice that’s repeated in the other answers.
I proposed a modification, but I no longer have an SO account, so I don’t know whether they are going to accept it.
The correct answer as of now is to use venvs, period. One could argue about whether this DX can be improved, but that doesn’t change the fact that users should at least see, in a very clear and visible way, what the “blessed” way of doing things is, the one that will never risk breaking their system.
Anyone with an SO account could help by improving that first answer so that the python -m venv step is more visible.
Flatpaks (or snaps, or AppImages) also introduce yet another trustworthiness vector: as with PyPI, anyone can upload to Flathub once an account is created. At least Flathub and Snapcraft do have a “verified” checkmark procedure - Linux Mint is notably taking the approach of hiding snaps that are not verified. Just one more thing for users to worry about, but I guess it’s not materially different from deciding whether you trust an install from PyPI.
Just an FYI, not making any value judgements (though I can confirm the irritating proliferation of slightly different versions of things, particularly across updates, as outdated versions are not pruned automatically. But that’s just an “implementation detail”, no?)
(Would it be possible to split the posts about bundling zipapps and/or standalones to a separate thread?)
Aside from that, their single-file options will unpack themselves into a temporary directory, which AFAIK is not something zipapp does.
FWIW, over the last few hours I figured out a proof of concept that patches open so that code using Path(__file__).parent / 'foo.txt' type tricks will work (with restrictions: it unzips to memory and creates a BytesIO or StringIO as appropriate, so read and write methods are available regardless of the “mode”, and writes won’t persist). I also worked out that something like pip install --target=_vendor/ --only-binary=:all: --platform=whatever -r requirements.txt can be used to set up a directory with dependencies for a given platform that could then be included in a zipapp, and of course sys.path can be hacked at startup to include the vendored libraries.
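Very roughly, those two pieces look like this (a quick sketch rather than the actual PoC: open_from_archive is just a placeholder name, and it assumes the .pyz is launched directly so that sys.argv[0] points at the archive):

```python
# Sketch only. Dependencies were vendored beforehand with something like:
#   pip install --target=_vendor/ --only-binary=:all: --platform=<plat> -r requirements.txt
# and the resulting _vendor/ directory was included in the zipapp.
import io
import os
import sys
import zipfile

# 1. Make the vendored packages importable. Inside a zipapp, the dirname of
#    __file__ is the archive itself, and zipimport can handle "archive/_vendor"
#    entries on sys.path.
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "_vendor"))

# 2. Read a data file that lives inside the archive, returning an in-memory
#    file object (reads work; writes won't persist, as noted above).
def open_from_archive(relpath, mode="r"):
    with zipfile.ZipFile(sys.argv[0]) as zf:  # assumes the .pyz was invoked directly
        data = zf.read(relpath)
    return io.BytesIO(data) if "b" in mode else io.StringIO(data.decode())
```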
I’d like to explore that route for the cases where a Python interpreter doesn’t need to be bundled but it should otherwise be turnkey. I think part of the reason zipapps aren’t popular is because the standard library zipapp doesn’t solve the “read a file from within the archive” problem and it’s annoying to have to modify the code to take those extra steps for when it gets zipapp’d.
I wrote this in just a couple of minutes and took a lot of shortcuts, which is necessary in that context, but maybe I took the wrong shortcuts, so feel free to improve it (either yourself, or, if you do not have an SO account, I can do it for you).
I believe it would be fine to have the downloadable artifacts directly on the GitHub repo. Is there such a thing as a .flatpak file that I could then install with flatpak install Downloads/spotdl.flatpak? And of course I just picked Flatpak semi-randomly, because that’s the one I use these days, but it could be something else.
I understand that the latest fashion is to target Python only at large and serious projects of a million lines minimum (see the switch to typing, the now-mandatory venvs, and many other expressions of this trend), but there are still some of us who use Python for ten-line scripts to administer our systems.
Before I manage to port all such scripts to shell scripts and abandon Python for my system administration altogether, could you wait a little bit so that I can finish the porting? Yes, it is a bother that I have to use completely silly options like --break-system-packages (and even worse, we have to explain to our customers why they have to use them), but it is better than maintaining hundreds of venvs, one for each ten-line script.
I think that the reason zipapps aren’t popular is that they don’t bundle a Python interpreter and so only provide an incomplete solution to app distribution. You can maybe partition the set of people who would want to install some app into three categories:
1. People who know how to install Python and run Python code and can happily unzip a source archive or check out the code from GitHub and figure out how to run it from the README. These people can also figure out how to use pip to install things.
2. People not in group 1 but who do know how to install Python and can download a zipapp and follow the instructions to run it.
3. People not in group 2, who generally do not know how to install Python and would not know how to follow the instructions to run a zipapp.
The vast majority of people are in group 3. A much smaller but important chunk of people are in group 1. The zipapp caters to group 2 but it’s a thin slice in the middle where the benefits of catering to them seem marginal from the perspective of anyone shipping an app.
The pyinstallers etc. try to make everything a single .exe file in the hope that this is usable for people in all groups (just download and double-click the .exe). This is not generally how app distribution should work, though.
The most useful thing for improving the distribution story for Python applications would be a tool that could do sort of what zipapp or pyinstaller do but actually turn the end result into something that resembles an installer for a non-Python application. You want an .msi installer for Windows, a .dmg for macOS, and maybe other things for the various unices. The result should be a self-contained application installer that does all the usual stuff: add an icon, add the app to the start menu, add an uninstaller, an updater, etc. These installers should definitely bundle a Python interpreter. The user should be able to install the application without ever needing to know that Python is any part of it, because it will just look like any generic application.
The “read a file from within the archive” problem is irrelevant if you just make a proper installer and install the files. End users in group 3 really don’t care about the application being in a single file so just extract the files from the archive at install time and the problem is solved.
Yes, squashing pyproject.toml into script metadata is a nice idea, but otherwise I don’t see the point. What’s so different about running those scripts from one shared venv versus running them from one ~/.local/lib/python*/site-packages/?
Small scripts probably don’t need packaging at all, just run them. If they have dependencies on other (non-stdlib) Python packages, and those dependencies can be satisfied by a system-delivered package, then there’s also no need to fuss. It’s not really that dire, is it? I have dozens of admin/tool type scripts sitting around that I just use without worrying about any of the stuff in this thread.
I, too, default to a “sandbox” venv that contains a few useful and common dependencies like NumPy, and generally only make per-project venvs if I need one for testing some build/deployment/etc. stuff. For my toolchain stuff like twine, I use pipx. Then if I consciously want installed applications (including self-developed ones) to share an environment, I can pipx inject --include-apps them together.
The shared venv is out of the way of scripts that the distro provides, and which the distro maintainers consider to be part of the operating system. Even if you only did user-level installs, something you installed could for example shadow a name that a system script wants to import, and cause that script to break (hence “break system packages”). And of course if you don’t want a --user install then you’ll need root privileges in order to unpack the files, and that means setup.py could run as root, too.
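To make the shadowing concrete, here is roughly what the search order looks like on a typical setup (illustrative only; the exact paths differ per distro and Python version, and the requests example is just hypothetical):

```python
# Show why a --user install can shadow a distro-provided module: the per-user
# site-packages directory is searched before the distro's dist-packages.
import site
import sys

print(site.getusersitepackages())  # e.g. ~/.local/lib/python3.X/site-packages
print(site.getsitepackages())      # e.g. ['/usr/lib/python3/dist-packages', ...]

# Because the user directory comes first on sys.path, a system script doing
# "import requests" would pick up whatever version was pip-installed with
# --user rather than the distro-tested one.
for entry in sys.path:
    print(entry)
```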
This can be quite serious - e.g. the system package manager itself may have Python components (I know this is true for Debian’s apt, and I think I saw in the PEP that it’s true of Fedora’s dnf too).
Python doesn’t come with Linux to save you the step of installing it, like Windows users have to do. It comes with Linux because parts of Linux (well, GNU, really) need it.
This is another example that shows virtual environments are too difficult and hard to teach, especially for beginners. When faced with multi-step instructions and new complications to the workflow (activating the venv, or running .venv/bin/python, or convincing their IDE to pick up the venv), or a simple option to make the error disappear (it turns out people aren’t scared to use --break-system-packages, because they don’t understand the implications), people will choose the latter.
How to fix this? Provide something simple that does the right thing out of the box. Something like PEP 582 would make it so — there would be no need to use venvs, pip install would default to installing in the current directory (or the nearest __pypackages__), and python would automatically pick those packages up[1]. pipx should also be available in the default install to handle installing applications that happen to be written in Python (in a central per-user location with entrypoints in ~/.local/bin).
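For illustration, the effect I’m after would be roughly the following happening automatically at interpreter startup (just a sketch of the wished-for behaviour, including the parent-directory search that the PEP itself refused; it is not how Python behaves today):

```python
# Rough sketch of the __pypackages__ idea from PEP 582, with the added
# parent-directory search. Purely illustrative.
import sys
from pathlib import Path

def find_pypackages(start):
    ver = f"{sys.version_info.major}.{sys.version_info.minor}"
    for directory in (start, *start.parents):
        candidate = directory / "__pypackages__" / ver / "lib"
        if candidate.is_dir():
            return candidate
    return None

libdir = find_pypackages(Path.cwd())
if libdir is not None:
    sys.path.insert(0, str(libdir))
```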
Virtual environments are never going to be easier or friendlier; people are going to keep breaking their systems if virtual environments are the only way to prevent such breakage.
These days, in many countries, the basics of programming are taught to everyone, specifically schoolchildren, as part of the basic curriculum. The simplicity of Python means it’s a popular language for teaching. Python the language is simple, but Python the ecosystem has a high barrier to entry, requiring an understanding of the command line, virtual environments, and other parts of the packaging ecosystem. Adding all that to the curriculum would take away time that could be spent on other things that would be more useful to students who don’t want to become professional Python developers[2].
It is possible for an expert in *nix and another programming language to be dropped into Python’s uniquely user-unfriendly ecosystem, and choose the easy way out (--break-system-packages) instead of trying to grok it, because they are using Python for just one small thing which requires Python, and want to get things done quickly and go back to their preferred language.
Something simpler than virtual environments would benefit beginners, but advanced users may find it better and easier than venvs as well. After all, this is Python, a language where I don’t need to change my numeric type when a number exceeds 2**31 - 1, where 1/2 == 0.5, and where there are countless other things that are simpler than in other languages, and nobody complains about coding in Python without needing to understand the specifics of memory management in C.
I’m fully aware that the PEP as written is not ideal and would need improvement to succeed, especially w/r/t the refusal to look for __pypackages__ in parent directories. ↩︎
Note that you can easily be a professional developer in some ecosystems and live without ever touching the command line. And the material that concerns Python packaging would not apply elsewhere, unlike the general programming concepts taught while teaching Python. ↩︎
This is certainly one possible way forward. But PEP 582 was the nearest we ever got to an alternative to virtual environments, and it failed. At a minimum, we’d need a new PEP which addressed the issues that resulted in the rejection of PEP 582, and I’m not aware of anyone working on anything.
I don’t understand where the line is. If someone can learn to use a terminal in the first place, and navigate around directories and run commands like pip, and to add --break-system-packages to a pip command, then why is it any harder to learn python -m venv .venv and .venv/bin/activate? What you call multi-step has two steps, as far as I can tell.
Virtual environments are, equally well, never going to be any less necessary for testing and development and any kind of ecosystem participation. But for those who are only learning to program, there absolutely is a clean “way to prevent such breakage”: installing and using a separate Python. Which, for Windows users (the majority, especially for those with lesser understanding of a command line), is spelled “installing and using Python, which has to be done anyway”.
And really, how could --break-system-packages be any clearer? (I’m reminded of the hdparm Linux utility, which can do some things with HDDs and SSDs that are extremely dangerous to data integrity and possibly even to the future use of the drive. There’s apparently a --please-destroy-my-drive flag required for some operations, not mentioned in the man pages but only in error messages - obviously I am not counseling anyone to rely on that! But my point is, it seems to work well enough.)
There’s a lot I find less-than-ideal about how Python environments (both virtual and “real”) work. But the way I’d see the whole mess fixed, takes a devil-may-care attitude to backwards compatibility - it simply couldn’t be applied to Python, which is why I’m explicitly designing it for my own language instead.
I agree. A hypothetical ensurepipx.py sounds more useful to me than the existing ensurepip.py. And any future tools that might be needed (like, say, twine) are much better dealt with this way. But people using pipx, especially, are going to need to learn about virtual environments eventually.
Okay, but I’m talking about people who want to be programmers, not simply people who want to learn how to write some code (or have it as a curriculum requirement). First off, curricula that are serious about this stuff should be allocating a separate slot to understanding how to use a computer generally - as a tool, not an appliance - because that is valuable in its own right. But more importantly: for the current problem under consideration to be relevant, we would have to be talking about “schoolchildren” who:
- are not using Windows
- are being expected to set up the environment, rather than e.g. having a school sysadmin do it on the school computers
- are being expected to use third-party libraries for something (if the goal is not to train future developers, then the assignments should be scaled back to something that the student can write fully)
And if they’re being taught to use an IDE, I maintain that this is more complex than teaching them to use the command line - and always will be, for every IDE. (Which is part of why I don’t use one.) Each one has its own menus and configuration and that knowledge is not transferable, and they do things behind your back (like creating virtual environments) that you might have to understand when they go wrong, even though you wouldn’t have otherwise.
… I genuinely don’t understand how this is comparable to the situation with virtual environments. Understanding how your platform (operating system, the command line, etc.) works is not the same as understanding how your hardware works.
By analogy: It used to be that car drivers were expected to be able to do basic maintenance themselves (they’re still expected to fill the gas tank, at least) rather than having a mechanic do everything. That’s different from having to understand how an internal combustion engine works, at the physical chemistry level. It’s nice that today we have cars with air bags that deploy automatically, and anti-lock braking, etc. None of that negates the fact that if you can’t change the oil yourself, you’re becoming dependent on others in a way that partially negates the point of owning a car and the sense that one actually owns and is responsible for it.
It is not just two steps, because the activate part needs to be done every time you open a terminal in order to use the packages that were installed. I find this annoying myself, which is why I have pyenv manage it automatically via .python-version, so that an environment is activated just by cd-ing into a directory.
We are likely talking about a diverse group of people but I expect that for most people who are using --break-system-packages the problem is not learning about virtual environments. Rather the problem is that virtual environments are not what they want. They want a global environment that does not need to be activated so that they can install things once and then have those installed things be always available. Unless a venv can behave like that then telling them to use a venv misunderstands what they are asking for.
Virtual environments should really be an optional feature to be used by people who actually want to have more than one environment.
I am a SUSE engineer working on Python packaging, so I have seen many bugs related to Python. In the six years I have been with SUSE, I have seen one (or maybe two) bugs where a Python library got in the way, so I am really not persuaded that it is as big a problem as everyone makes it out to be. Perhaps I am a bit more comfortable with it because zypper (the SUSE equivalent of Red Hat’s yum/dnf) is a C++ application and YaST (our configuration tool) is in Ruby.
If you do something as the root user, then you are supposed to be responsible for the consequences of your actions. There are so many ways to screw up your system if you are root that I don’t think this one rather complicated way of hurting yourself matters that much. Again, perhaps I am too tolerant about this, because neither zypper nor YaST can get hurt.
I am still suspicious that a venv is slightly too big a gun for this. I understand what you are saying, but wouldn’t something less drastic be sufficient (e.g., if sys.argv[0].startswith('/usr/'), then sys.path doesn’t include the user’s ~/.local libraries)?
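Spelled out, I mean something roughly like this (just a sketch of the idea, not a worked-out proposal):

```python
# sitecustomize.py-style sketch: keep the per-user site-packages off sys.path
# whenever the script being run is a system-installed one under /usr/.
import site
import sys

if sys.argv and sys.argv[0].startswith("/usr/"):
    user_site = site.getusersitepackages()
    sys.path[:] = [p for p in sys.path if p != user_site]
```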
shiv for example does unpack on first run. One of the reasons why unpacking is necessary is that if the zipapp contains shared library extension modules, they can only be imported from the file system. On *nix systems[1], extension modules are loaded via dlopen() and AFAIK there’s still no portable from-memory API for dlopen().
Back when I was helping to work on shiv, I wrote a small (<50 lines) custom importer that would extract the .so on demand into a temporary file and import it from there. You could eliminate the unpacking step, but it turned out to not really be any faster, especially on a warm start (shiv caches its unpacked directory).
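From memory, it was roughly along these lines (this is a reconstruction, not the original code; caching and temp-file cleanup are omitted):

```python
# Sketch of an on-demand extractor for extension modules inside a zipapp:
# extract the shared library to a real file so that dlopen() can load it.
import importlib.machinery
import importlib.util
import os
import sys
import tempfile
import zipfile


class ZipExtensionFinder:
    def __init__(self, archive_path):
        self.archive = zipfile.ZipFile(archive_path)
        self.names = set(self.archive.namelist())

    def find_spec(self, fullname, path=None, target=None):
        base = fullname.replace(".", "/")
        for suffix in importlib.machinery.EXTENSION_SUFFIXES:
            member = base + suffix
            if member in self.names:
                break
        else:
            return None
        # Extract to a temporary file and hand off to the normal extension loader.
        fd, tmp = tempfile.mkstemp(suffix=suffix)
        with os.fdopen(fd, "wb") as f:
            f.write(self.archive.read(member))
        loader = importlib.machinery.ExtensionFileLoader(fullname, tmp)
        return importlib.util.spec_from_file_location(fullname, tmp, loader=loader)


# Install ahead of the default finders so modules inside the archive win.
sys.meta_path.insert(0, ZipExtensionFinder(sys.argv[0]))
```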