Playing nice with external package managers

We’ll be happy to switch RPM macros to this standard! (I think I can speak for Fedora’s Python maintainers here, and I believe other RPM distros will join/follow.)

cc @hroncok

It’s not just macros – if we take what @ncoghlan said literally, it means that when you’re installing, say, the python-requests rpm, then rpm would need to first check for /usr/lib/python*/site-packages/requests*.dist-info/MANAGED-BY, and if it exists and contains some string besides rpm then it should refuse to install that rpm. So I think this would require patching the rpm tool itself to add a special case check? (Or are macros powerful enough to do that? I thought they were just for setting up the package, not for controlling whether it’s installed at all, but I’m definitely not an rpm expert.)

Yes, there will be resistance if we need to add a special case to RPM itself :(
At least on Fedora, though, pip installs into /usr/local/lib/ while RPM installs to /usr/lib/.

FWIW, it does degrade gracefully for these tools since the main issue for us has been pip modifying packages being managed by a different tool.

As long as this file is added by the distros, it’s an improvement on the status quo, regardless of whether other package managers get updated for this scheme (since pip would stop fiddling with packages that are managed by those tools). :slight_smile:

The in-person discussion included Matthias Klose on the Debian side, and Kale Franz on the conda side, and then I gave Petr a heads up for Fedora when I distributed the original set of notes.

Aside from conda, where my understanding is that the developers actively want folks to be able to install non-conda packages directly from PyPI without risking inadvertent upgrades of conda-managed packages, we’re not really expecting platform installers to respect MANAGED-BY - if the platform package manager provided the Python installation, then it can reasonably assume it has full control over that installation.

Instead, we’re mainly offering the platform tool developers the opportunity to populate the file and have the Python level platform independent tools refrain from breaking people’s Linux installations, even if a user does run ye olde “sudo pip install break-my-distro-please” command. (And even for conda, I’d expect them to end up offering their users a way to check their environment for installations that overlap with projects available from their conda channels and replace the PyPI version with the conda-managed version).

I don’t expect RPM and such to pay attention to the MANAGED-BY file. We basically have two classes of installs from the POV of Python packaging:

  • Managed by a tool that uses the relevant PEPs as their primary database of installed packages and related metadata.
  • Managed by an external tool that might happen to also emit the files that the relevant PEP databases use, because the Python-level tooling needs them to function properly (not just packaging tools, but other runtime tooling as well).

For the first case, we want all of these tools to largely be interoperable. If you install something with pip you should be able to then uninstall it with totally-not-pip. As long as all of the tools in this category use the same database for what they consider installed or not, and they all implement the relevant specs, these tools should largely be interchangeable.

For the second class, these tools are largely NOT interoperable, and it doesn’t even make sense for them to be. apt and rpm are unlikely to ever be in a situation where they’re both installed on the same system and trying to install into the same set of directories. The closest thing to “crossing the streams” in this world would be installing something like conda or Linuxbrew on a system that already has apt or rpm or similar, but in every such case I can think of, those tools install to an entirely different location and don’t attempt to touch each other’s files at all.

So interactions between two tools within the same category are already largely a solved problem, through one mechanism or another. What we really care about is interactions between tools in different categories.

Interactions between the categories can go in two “directions”: a “type 1” (e.g. pip) installed thing being overwritten by a “type 2” (e.g. apt, rpm, etc.) install, and the reverse, a “type 2” installed thing being overwritten by a “type 1” install.

MANAGED-BY is largely solving the second direction (a type 1 tool overwriting a type 2 install), and it does so by teaching the type 1 tools to understand a special marker that type 2 tools can easily be modified to write.
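To make the type 1 side of this concrete, here is a minimal sketch of the check such a tool might run before modifying an installed distribution. Only the file name MANAGED-BY comes from this thread. The function names, the tool-name argument, and the file format (a single line naming the owning tool) are assumptions for illustration, since none of that has been specified.

```python
from pathlib import Path
from typing import Optional


def read_manager(dist_info: Path) -> Optional[str]:
    """Return the tool named in MANAGED-BY, or None if the file is absent."""
    marker = dist_info / "MANAGED-BY"
    if not marker.is_file():
        return None
    # Assumed format: the owning tool's name on a single line.
    return marker.read_text(encoding="utf-8").strip() or None


def may_modify(dist_info: Path, tool: str) -> bool:
    """Hypothetical policy: only the named manager may touch the files."""
    manager = read_manager(dist_info)
    return manager is None or manager == tool
```

Under these assumptions, a type 1 tool would call something like `may_modify(path_to_dist_info, "pip")` before uninstalling or upgrading, and leave the distribution alone when the answer is `False`.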

Of course, we could envision a world where this same system could be used for the inverse, and keep rpm/deb/etc. from clobbering something installed by something like pip. However, it’s unlikely that such a thing will ever gain traction, because these tools work with far more things than just Python (and in most cases they don’t even know something is Python; they’re just dropping files in predetermined locations).

That does mean we’ll need a different solution for the two directions that these conflicts can happen in. However I think that is inevitable given the realities of the capabilities of the two different “types” of tools, and the politics surrounding them that control what kind of changes are possible or not.

I didn’t see (maybe I missed) some references above, but doesn’t the same apply to pip crossing apt/rpm as well? IIRC sudo pip installs to /usr/local/{bin,lib}, and officially managed apt packages install to /usr/{bin,lib}.

I didn’t see it explicitly distinguished, so just in case: the most likely confusion (including pypa/pip#5605 IIUC) is not that the lib installations get overwritten, but that pip installs another distribution that takes precedence over the existing distribution (installed by another package manager). I do not think a MANAGED-BY file (or any other in-distribution marker) would solve this.

That’s not something we’re trying to solve with this really. We’re not stomping over the same files, the user is just electing to install something that takes precedence. It might break their system but it doesn’t have the same “two systems fighting over who owns what files” problem.
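The difference between "takes precedence" and "overwrites" can be sketched with a small helper that walks a search path and reports every copy of a package it finds. This is a simplified illustration, not how the import system is actually implemented: the function name is hypothetical, and it only recognises regular packages with an `__init__.py`, ignoring single-file modules, namespace packages, and `.pth` files.

```python
from pathlib import Path
from typing import Iterable, List


def find_copies(name: str, search_path: Iterable[str]) -> List[Path]:
    """List every directory on search_path that holds a copy of a package.

    The first entry is the copy a plain import would pick up under this
    simplified model; later entries are shadowed but still intact on disk.
    """
    copies = []
    for entry in search_path:
        candidate = Path(entry) / name
        if (candidate / "__init__.py").is_file():
            copies.append(candidate)
    return copies
```

Calling `find_copies("requests", sys.path)` (after `import sys`) on an affected system would list more than one directory. Both copies exist side by side, so no tool has modified the other's files.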

I likely didn’t express my concern clearly. I cannot think of a widespread example of pip actually stomping over another package installer. I also have not seen any real examples above that demonstrate this (unless I missed something). My worry is that we are solving a phantom problem that does not actually exist.


Edit: I double-checked and realised that EPEL’s python-pip does install packages to the same location as yum itself, so there are indeed examples where actual overwrites happen. Sorry for the false alarm.

apt-get install python-requests && pip install --upgrade requests

pip will uninstall the version installed by apt-get, then install the version pulled down from PyPI. The desired outcome is that pip does nothing to the version installed by apt-get and installs the version pulled down from PyPI to /usr/local. (The /usr/local part still requires a patch on the distro side, though; making that official is a future enhancement. In cases where that patch doesn’t exist and we’re trying to install to the same place, we should just fail.)

Actually, pip already does exactly what you say in this situation, which is what prompted my initial misunderstanding:

$ apt-get install -qqy python3-requests && pip3 install --upgrade requests
[a bunch of messages from dpkg installing python3-requests]
Collecting requests
  Using cached https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl
Collecting idna<2.9,>=2.5 (from requests)
  Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests)
  Using cached https://files.pythonhosted.org/packages/69/1b/b853c7a9d4f6a6d00749e94eb6f3a041e342a885b87340b79c1ef73e3a78/certifi-2019.6.16-py2.py3-none-any.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests)
  Using cached https://files.pythonhosted.org/packages/e6/60/247f23a7121ae632d62811ba7f273d0e58972d75e58a94d329d51550a47d/urllib3-1.25.3-py2.py3-none-any.whl
Requirement already up-to-date: chardet<3.1.0,>=3.0.2 in /usr/lib/python3/dist-packages (from requests)
Installing collected packages: idna, certifi, urllib3, requests
  Found existing installation: idna 2.6
    Not uninstalling idna at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: certifi 2018.1.18
    Not uninstalling certifi at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: urllib3 1.22
    Not uninstalling urllib3 at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: requests 2.18.4
    Not uninstalling requests at /usr/lib/python3/dist-packages, outside environment /usr
Successfully installed certifi-2019.6.16 idna-2.8 requests-2.22.0 urllib3-1.25.3

Notice that pip skips uninstalling the requests distribution from apt.

Uninstalling pip’s copy of requests also does not break APT’s. There are some dependency problems, but they are caused by pip’s copy taking precedence (and as you said, not our problem), not APT’s copy being broken by pip.

$ python3 -c 'import requests; print(requests.__file__)'
/usr/local/lib/python3.6/dist-packages/requests/__init__.py
$ pip3 uninstall -y requests
Uninstalling requests-2.22.0:
  Successfully uninstalled requests-2.22.0
$ python3 -c 'import requests; print(requests.__file__)'
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.3) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
/usr/lib/python3/dist-packages/requests/__init__.py

So this is not really a problem between APT and pip, at least in recent Ubuntu versions. I’m not sure who takes the credit though :slight_smile:

That said, RPM users are not as fortunate (as I realised in the previous edit). So (un)fortunately the issue discussed here is still very valid.

Pip doesn’t do that; Debian’s patched version of pip does.

Maybe the “right fix” here is to have upstream pip adopt that patch? That is, make upstream pip install to /usr/local [under the circumstances when it would currently install to /usr].

Sorry, there’s some confusion here.

On Debian, all versions of pip will install to /usr/local, because Debian has patched Python to tell any Python-level installation tool that the correct place to install things is /usr/local. Where pip itself installs things is not what this proposal is about fixing, though.

On Debian, the Debian-supplied version of pip will refuse to touch /usr, because they’ve patched it to do that. That patch cannot be applied to upstream pip, because it currently only makes sense in the context of Debian’s patches to other software (namely Python). The Debian patch to pip to not touch /usr only works because they’ve also patched Python to do the above. We could upstream the patch to Python, but that would only solve the problem for Python 3.9+, whereas MANAGED-BY solves it right now, for all versions of Python.
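As a rough illustration of "Python tells the installer where to go", the sketch below asks the running interpreter for its pure-Python install directory and checks which prefix it falls under. `sysconfig.get_path` is a real standard-library call, but whether a given installer consults `sysconfig` or the older `distutils` machinery depends on its version, so treat that part as an assumption. The helper names are hypothetical.

```python
import sysconfig
from pathlib import PurePosixPath


def install_target() -> str:
    """Directory this interpreter reports for pure-Python packages."""
    return sysconfig.get_path("purelib")


def is_within(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies below it (POSIX-style paths)."""
    p, root = PurePosixPath(path), PurePosixPath(prefix)
    return p == root or root in p.parents
```

On a system patched the way Debian's is described above, one would expect `is_within(install_target(), "/usr/local")` to be true for the system interpreter. The result depends entirely on how the interpreter was built and patched.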

Resurrecting an old thread to cross-link a new discussion: https://discuss.python.org/t/updating-pep-376-making-record-optional-in-installed-dist-info/

Rather than adding a new file to indicate that an installation is externally managed, PEP 627 instead proposes that external management be indicated by leaving out the RECORD file from the installed dist-info directory. This actively makes life easier for system package managers, so it’s an approach they have strong incentives to adopt.
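A minimal sketch of what honouring that convention could look like in a Python-level tool follows. The function names and the error message are hypothetical, and the PEP itself remains the authority on the exact rules.

```python
from pathlib import Path


def is_externally_managed(dist_info: Path) -> bool:
    """Treat a dist-info directory without RECORD as externally managed."""
    return not (dist_info / "RECORD").is_file()


def ensure_uninstallable(dist_info: Path) -> None:
    """Refuse to proceed when there is no RECORD to drive an uninstall."""
    if is_externally_managed(dist_info):
        raise RuntimeError(
            f"{dist_info.name} has no RECORD file; "
            "it appears to be managed by another tool"
        )
```

One appeal of this design is that the check needs no new file format: without RECORD, the tool has no list of files to remove in the first place.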

@ncoghlan Your link is broken. It says “Oops! That page doesn’t exist or is private.”

Its title was prefixed by the PEP

While PEP 627 helps a bit, there is still a way to go.
I got back to this topic when reviewing Fedora’s patch that adds a warning when pip install is run under root. (sudo pip install still appears in various tutorials, and is still usually a bad idea, except in some cases like containers.)
We had a brief discussion with @dstufft on IRC, where we essentially rehashed ideas from this discussion before we remembered the discussion already exists.

FWIW, Fedora:

  • splits /usr (system packages) and /usr/local (pip-installed packages)
  • runs most system-installed software in Python’s isolated mode (so it ignores /usr/local)
  • also adds a warning for sudo pip install under root

Donald summarized what I think is the best plan:

I think the “right” solution is teach distro tooling to emit MANAGED-BY, and teach python tooling that MANAGED-BY means “don’t touch”, then get a Debian style split where distros install to /usr and pip installs to /usr/local (and then each distro can decide if they want to exclude /usr/local from sys.path when running system provided tools or not)
But the devil is in the details, and I think someone needs to write those PEPs.

Relevant discussion in the pip bugtracker

I think many of the folks on this thread have seen it, but we had some further discussions on the topic at this year’s PyCon, which has resulted in a new PEP that hopefully addresses this in a little more general way: