Playing nice with external package managers

At PyCon this year, there was a lot of discussion and grumbling about how pip can fight against Python installations that are managed by other tools (apt, yum, conda, etc.). The initial assumption was that pip already respected the INSTALLER file in the dist-info folder of a given package’s installation, and would refuse to alter packages where INSTALLER did not read “pip.” After further discussion, this was decided not to be the right approach, and a new metadata file more specific to this purpose was desired.

This thread is to discuss and decide on that other file, and come up with a PEP for standardizing on it.

Notes on anything I may have missed are in @sumanah’s post, “Drawing a line to the scope of Python packaging”.

Can you say a bit more about the goal here? My first thought would be that the goal is to avoid situations where pip and some other package manager are stepping on each other’s toes and corrupting their respective package databases. But a package database is a single coherent data structure representing a whole environment, so I’d expect this to be handled by marking an entire python environment as managed-by-pip versus managed-externally.

If you tell pip it can’t touch specific packages, then you can still get in a mess:

  1. Pip installs some package
  2. Apt installs the same package, partially blowing away Pip’s copy
  3. Now you have a mix of two different versions installed in the same environment

the goal is to avoid situations where pip and some other package manager are stepping on each other’s toes and corrupting their respective package databases.

That sums it up pretty well, actually.

But a package database is a single coherent data structure representing a whole environment

I don’t think this is representative of the reality that we have today (though I really wish it was). People use pip in conda environments, and they also use pip in system environments. When they do, they often encounter problems, which the maintainers of the external tools have to deal with (“I pip installed tensorflow, and conda broke! It must be conda’s fault!” or worse: “I sudo pip installed tensorflow and now ubuntu won’t boot! Ubuntu sucks!”). I see this topic as a universal agreement to stop trampling packages that are managed by something else. If stopping completely is not feasible (which it probably won’t be), then lay out standards for how to tell users what needs to be done, and that the thing they are doing is unsafe.

I got the sense at PyCon that this was a settled matter that just needed some implementation elbow grease. @ncoghlan @dstufft @kalefranz and Matthias from Canonical were in most of the discussions. Maybe they have more to add - I’m kind of going on second-hand discussions with Kale.

Relevant here: https://github.com/pypa/pip/issues/5605

We’d discussed that we’d use a new “MANAGED-BY” file. I believe @ncoghlan made initial notes on this. If he’s okay with it, someone who has a copy of those notes should share them here. :slight_smile:

These are my post-PyCon notes - the redacted bit was a specific name in the original emailed notes, but I want to let them chime in themselves if they’re still planning to work on it (since a lot of us have been in that situation of volunteering to work on something, and then just never finding the time to get to it).

Next steps:

Why not use INSTALLER:

  • We want Python level tools to be cooperative by default (install
    with pipenv, uninstall with pip, reinstall with poetry, uninstall with
    pipenv, etc)
  • Making INSTALLER strict would either interfere with that, or else
    we’d see everyone treating “pip” as a magic string that meant “shared
    management”, and we’d lose the tracking of which installer actually
    did the installation

Using a new MANAGED-BY file instead:

  • File is absent by default, indicates shared management by any tool
  • If MANAGED-BY is present, then management is limited to tools that
    understand and support the external management scheme named in the
    file
  • Examples: Debian would set ‘dpkg’; Fedora would set ‘rpm’; conda
    would set ‘conda’
  • Tools that don’t recognise the scheme would refuse to uninstall,
    upgrade, or downgrade the package
  • Tools that convert Python packages to system packages should add the
    new metadata file with the appropriate contents
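
As a rough illustration of the rules in the notes above, here is a minimal sketch of the check a Python-level installer might run before touching an installed package. Nothing here is an agreed format or an existing pip API: the function names, the `SUPPORTED_SCHEMES` set, and the assumption that the file holds a single scheme name on one line are all hypothetical.

```python
from pathlib import Path
from typing import FrozenSet, Optional

# Hypothetical: external management schemes this tool knows how to cooperate with.
# A plain Python-level installer would typically support none.
SUPPORTED_SCHEMES: FrozenSet[str] = frozenset()


def read_managed_by(dist_info: Path) -> Optional[str]:
    """Return the scheme named in MANAGED-BY, or None if the file is absent.

    Assumes (not specified in the notes) that the file contains just the
    scheme name, e.g. "dpkg", "rpm" or "conda".
    """
    marker = dist_info / "MANAGED-BY"
    if not marker.is_file():
        return None  # absent file means shared management by any tool
    return marker.read_text(encoding="utf-8").strip()


def may_modify(dist_info: Path, supported: FrozenSet[str] = SUPPORTED_SCHEMES) -> bool:
    """True if this tool may uninstall, upgrade, or downgrade the package."""
    scheme = read_managed_by(dist_info)
    return scheme is None or scheme in supported
```

With this shape, a distro package that ships `MANAGED-BY` containing `dpkg` would make `may_modify()` return False for any tool that does not list `dpkg` as a supported scheme.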

Legacy metadata formats:

  • egg-info files must be treated as if MANAGED-BY was present with an
    unknown management scheme
  • egg-info directories must be handled the same way as dist-info
    directories (i.e. shared management by default, specific scheme can be
    named in MANAGED-BY file)
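
The legacy rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: `effective_scheme` and the `UNKNOWN_SCHEME` sentinel are hypothetical names, and the file is again assumed to hold a bare scheme name.

```python
from pathlib import Path
from typing import Optional

# Hypothetical sentinel for "managed by something we cannot identify".
UNKNOWN_SCHEME = "<unknown>"


def effective_scheme(metadata_path: Path) -> Optional[str]:
    """Return the management scheme for a dist-info or egg-info entry.

    None means shared management; any other value names an external scheme.
    """
    name = metadata_path.name
    if name.endswith(".egg-info") and metadata_path.is_file():
        # A single egg-info *file* has nowhere to hold MANAGED-BY,
        # so treat it as externally managed by an unknown scheme.
        return UNKNOWN_SCHEME
    if metadata_path.is_dir() and name.endswith((".dist-info", ".egg-info")):
        # Directories of either kind follow the same rule.
        marker = metadata_path / "MANAGED-BY"
        if marker.is_file():
            return marker.read_text(encoding="utf-8").strip()
        return None
    raise ValueError(f"not a recognised metadata entry: {metadata_path}")
```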

So does this mean that this plan depends on convincing the dpkg, rpm, etc. maintainers to check for this file? Do we know if they’re interested in doing that?

We’ll be happy to switch RPM macros to this standard! (I think I can speak for Fedora’s Python maintainers here, and I believe other RPM distros will join/follow.)

cc @hroncok

It’s not just macros – if we take what @ncoghlan said literally, it means that when you’re installing, say, the python-requests rpm, then rpm would need to first check for /usr/lib/python*/site-packages/requests*.dist-info/MANAGED-BY, and if it exists and contains some string besides rpm then it should refuse to install that rpm. So I think this would require patching the rpm tool itself to add a special case check? (Or are macros powerful enough to do that? I thought they were just for setting up the package, not for controlling whether it’s installed at all, but I’m definitely not an rpm expert.)

Yes, there will be resistance if we need to add a special case to RPM itself :(
At least on Fedora, though, pip installs into /usr/local/lib/ while RPM installs to /usr/lib/.

FWIW, it does degrade gracefully for these tools since the main issue for us has been pip modifying packages being managed by a different tool.

As long as this file is added by the distros, it’s an improvement on the status quo, regardless of whether other package managers get updated for this scheme (since pip would stop fiddling with packages that are managed by those tools). :slight_smile:

The in-person discussion included Matthias Klose on the Debian side, and Kale Franz on the conda side, and then I gave Petr a heads up for Fedora when I distributed the original set of notes.

Aside from conda, where my understanding is that the developers actively want folks to be able to install non-conda packages directly from PyPI without risking inadvertent upgrades of conda-managed packages, we’re not really expecting platform installers to respect MANAGED-BY - if the platform package manager provided the Python installation, then it can reasonably assume it has full control over that installation.

Instead, we’re mainly offering the platform tool developers the opportunity to populate the file and have the Python-level, platform-independent tools refrain from breaking people’s Linux installations, even if a user does run ye olde “sudo pip install break-my-distro-please” command. (And even for conda, I’d expect them to end up offering their users a way to check their environment for installations that overlap with projects available from their conda channels and replace the PyPI version with the conda-managed version.)

I don’t expect RPM and such to pay attention to the MANAGED-BY file. We basically have two classes of installs from the POV of Python packaging:

  • Managed by a tool that uses the relevant PEPs as its primary database of installed packages and related metadata.
  • Managed by an external tool that might happen to also emit the files that the relevant PEP databases use, because the Python level tooling needs them to function properly (not just packaging tools, but other runtime tooling as well).

For the first case, we want all of these tools to largely be interoperable. If you install something with pip you should be able to then uninstall it with totally-not-pip. As long as all of the tools in this category are using the same database for what they consider installed or not, and they all implement the relevant specs, then these tools should largely be interchangeable.

For the second class, these tools are largely NOT interoperable, and it doesn’t even make sense for them to be. apt and rpm are unlikely to ever be in a situation where they’re both installed on the same system and trying to install into the same set of directories. The closest thing to “crossing the streams” in this world would be installing something like conda or Linuxbrew on a system that already has apt or rpm or similar, but in every such case I can think of, those tools are installing to an entirely different location and don’t attempt to touch each other’s files at all.

So interactions between two tools within the same category are already largely a solved problem, through one mechanism or another. What we really care about is interactions between tools in different categories.

Within those interactions between categories, there are two “directions” the interaction can go: a “type 1” (e.g. pip) installed thing being overwritten by a “type 2” (e.g. apt, rpm, etc.) install, and the reverse, a “type 2” installed thing being overwritten by a “type 1” install.

MANAGED-BY is largely solving the second case, and it does it by teaching the type 1 tools how to understand a special marker that type 2 tools can easily be modified to write.

Of course, we could envision a world where this same system could be used for the inverse, and keep rpm/deb/etc. from clobbering something installed by something like pip. However, it’s unlikely that such a thing will ever gain traction, because these tools work with far more things than just Python (and in most cases they don’t even know something is Python, they’re just dropping files in predetermined locations).

That does mean we’ll need a different solution for the two directions that these conflicts can happen in. However I think that is inevitable given the realities of the capabilities of the two different “types” of tools, and the politics surrounding them that control what kind of changes are possible or not.

I may have missed some references above, but doesn’t the same apply to pip crossing apt/rpm as well? IIRC sudo pip installs to /usr/local/{bin,lib}, and officially managed apt packages install to /usr/{bin,lib}.

I didn’t see it explicitly distinguished, so just in case: The most likely confusion (including pypa/pip#5605 IIUC) is not that the lib installations get overwritten, but that pip installs another distribution that takes precedence over the existing (installed by another package manager) distribution. I do not think a MANAGED-BY file (or any other in-distribution marker) would solve this.
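
To make the “takes precedence” case concrete, here is a small sketch that lists every copy of a project found along a search path, in the order the import system would consult those directories. `find_copies` is a hypothetical helper, not part of pip or the standard library, and it assumes dist-info directories are named `<name>-<version>.dist-info` with no name normalisation applied.

```python
from pathlib import Path
from typing import Iterable, List


def find_copies(name: str, search_path: Iterable[str]) -> List[Path]:
    """Return dist-info directories for `name`, in search-path order.

    The first entry is the copy that wins at import time; any later
    entries are shadowed rather than overwritten.
    """
    found: List[Path] = []
    for entry in search_path:
        directory = Path(entry)
        if not directory.is_dir():
            continue  # sys.path may contain zip files or stale entries
        found.extend(sorted(directory.glob(f"{name}-*.dist-info")))
    return found
```

Calling `find_copies("requests", sys.path)` on an affected system would return more than one entry, none of which has had its files touched by the other installer.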

That’s not something we’re trying to solve with this really. We’re not stomping over the same files, the user is just electing to install something that takes precedence. It might break their system but it doesn’t have the same “two systems fighting over who owns what files” problem.

I likely didn’t express my concern clearly. I cannot think of a widespread example of pip actually stomping over another package installer. I also have not seen any real examples above that demonstrate this (unless I missed something). The worry I’m having is that we are solving a phantom problem that does not actually exist.


Edit: I double-checked and realised that EPEL’s python-pip does install packages to the same location as yum itself, so there are indeed examples where actual overwrites happen. Sorry for the false alarm.

apt-get install python-requests && pip install --upgrade requests

pip will uninstall the version installed by apt-get, then install the version pulled down from PyPI. The desired outcome is that pip will do nothing with the version installed from apt-get, and will install the version pulled down from PyPI to /usr/local (the /usr/local thing still requires a patch on the distro side though; making that official is a future enhancement). In cases where that patch doesn’t exist and we’re trying to install to the same place, we should just fail.
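
The desired behaviour described above could be sketched as a small decision function. This is not how pip is implemented; `Action`, `plan_upgrade`, and the example directories are hypothetical, and the sketch assumes the MANAGED-BY scheme has already been read from the existing installation.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import FrozenSet, Optional


class Action(Enum):
    REPLACE = "uninstall the existing copy, then install the new one"
    INSTALL_ALONGSIDE = "leave the existing copy alone, install elsewhere"
    FAIL = "refuse: the target location is owned by another package manager"


def plan_upgrade(
    managed_by: Optional[str],
    existing_dir: str,
    target_dir: str,
    supported: FrozenSet[str] = frozenset(),
) -> Action:
    """Decide what a Python-level installer should do on an upgrade request."""
    if managed_by is None or managed_by in supported:
        # Shared management (or a scheme we understand): today's behaviour.
        return Action.REPLACE
    if PurePosixPath(existing_dir) == PurePosixPath(target_dir):
        # Externally managed and we would write to the same place.
        return Action.FAIL
    # Externally managed, but we install to a separate location.
    return Action.INSTALL_ALONGSIDE
```

For the example command, an apt-installed requests marked with `dpkg` would give `INSTALL_ALONGSIDE` when the distro patch redirects pip to a /usr/local directory, and `FAIL` when both tools target the same directory.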