Python module conflict discussion

packaging module conflict

phenomenon

As we all know, we use a package name to download a distribution from pypi.org. However, we import the module with its module name.

some experiment

The problem is that a package name is unique on pypi.org, while module names are not. What happens if I install two packages that ship the same module (same module name, different content)? I ran some experiments, and here are the results.

Here are the two packages I constructed. The first package has two modules, mod1 and mod2, each with its own add and sub submodules. The second package also has two modules named mod1 and mod2, but each has its own div and multi submodules. I installed the first package and then the second. Both packages end up installed in the site-packages directory at the same time, and their structure is shown in the figure.

Interestingly, apart from the metadata files of the two packages, which are stored in their own folders, the modules of the two packages are jumbled together because of the module name conflict. What’s worse, the package installed later overwrites files that the earlier package had already installed at the same path (e.g. mod1/__init__.py and mod2/__init__.py). The remaining, non-conflicting modules are installed side by side in the same folders.
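To check a local environment for this kind of overlap, something like the following might work. It is only a sketch: it assumes every installed distribution exposes its file list (its RECORD) through importlib.metadata, and it reports any path listed by more than one distribution. The function name find_shared_files and its dists parameter are made up for illustration; this is not something pip provides.

```python
from collections import defaultdict
from importlib.metadata import distributions


def find_shared_files(dists=None):
    """Return {relative path: sorted distribution names} for paths listed by 2+ distributions.

    `dists` is a hypothetical hook for passing in distribution-like objects;
    by default every distribution visible on sys.path is scanned.
    """
    if dists is None:
        dists = distributions()
    owners = defaultdict(set)
    for dist in dists:
        name = dist.metadata["Name"] or "<unknown>"
        for path in dist.files or []:  # files is None when no file list is recorded
            first = path.parts[0] if path.parts else ""
            if first.endswith((".dist-info", ".egg-info")):
                continue  # each distribution keeps its metadata in its own folder
            owners[str(path)].add(name)
    return {p: sorted(names) for p, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    for path, names in sorted(find_shared_files().items()):
        print(f"{path}: {', '.join(names)}")
```

For the two test packages above, I would expect this to list mod1/__init__.py and mod2/__init__.py under both package names, since each package's file list still mentions them even though only the later copy is on disk.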

analysis

This seems to be the accepted behavior, and module conflicts have been noticed many times before, but silently overwriting files can break the local environment. For example, if the local environment already contains a module foo.py and the user later installs an open source package that also ships foo.py, the original file is overwritten without notice, and a subsequent import foo picks up the wrong module.

A special case arises in the dependency graph. If pip installs an open source package whose dependencies contain conflicting modules, the same overwrite happens when those dependencies are installed into the same path. This may break the functionality of some packages, and an attacker could exploit it to replace commonly used modules in the local environment. However, pip does not emit any warning in this situation.

In addition, I did a large-scale analysis of module conflicts in the dependency graph and found that nearly 4.71% of the packages on PyPI have module conflicts in their dependency graph.

I suspect this discussion will keep coming up until pip refuses to extract a file that already exists.

That itself will break a range of packages,[1] but as we move toward more predictable build backends it should be easier to ensure those packages release without conflicting files.


  1. e.g. those who want to act like namespace packages but want the safety of having an __init__.py and so all possible children will include it - yes I know this is wrong and there are better ways to get that safety, but even the people who ask me for advice on this keep deciding to ignore it :wink:

I can’t quote footnotes, but you’re right, legacy-style namespace packages with __init__.py are a big reason pip can’t do this. That, and the broader “there is no bad practice so awful that there isn’t a massive closed-source project that relies on it” issue. See xkcd: Workflow.

Possibly a “pip fails on file conflict unless the contents are identical” rule is sufficient to get past that case?
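As a rough illustration of that rule (not how pip behaves today), the check could look something like the sketch below. The names FileConflictError and write_unless_conflicting are hypothetical, and the sketch compares raw bytes for simplicity; a real installer would more likely compare the hashes recorded in RECORD.

```python
from pathlib import Path


class FileConflictError(Exception):
    """Hypothetical error: an install would replace a file with different content."""


def write_unless_conflicting(target: Path, new_content: bytes) -> bool:
    """Write new_content to target unless a file is already there.

    Returns True if the file was written, False if an identical file
    already existed, and raises FileConflictError if the existing file differs.
    """
    if target.exists():
        if target.read_bytes() == new_content:
            return False  # identical content, e.g. a shared namespace __init__.py
        raise FileConflictError(f"{target} already exists with different content")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(new_content)
    return True
```

Identical legacy namespace __init__.py files would pass this check, while the mod1/__init__.py overwrite from the original post would fail, assuming the two files differ.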

Another place this is relied on is when you have two packages
providing the same API but with alternative implementations behind
it. There is a desire to be able to use one (often a successor) as a
drop-in replacement for the other with no changes to existing code
that imports it.

The problem will creep up again on uninstallation. This obviously doesn’t work now either, but if you “fix” installation people would reasonably expect uninstallation to also work.

It’s always been there, and it’s why when we set up the Azure SDK (which uses namespace packages extensively) we put the package __init__.py’s in their own package, so they get installed exactly once and uninstalled exactly once (or more likely, never :wink: ).

Reference counting conflicting files is the only real way around it, but that still doesn’t answer how to handle the conflict in the first place - fail (because it’s different), or increase the count?
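To make the reference counting idea concrete, here is a minimal in-memory sketch. It deliberately sidesteps the open question above: it assumes every distribution that claims a path ships identical content for it. FileRefCounts and its methods are invented for illustration and do not correspond to anything in pip.

```python
from collections import defaultdict


class FileRefCounts:
    """Track which distributions claim each installed path (hypothetical helper)."""

    def __init__(self):
        self._owners = defaultdict(set)

    def install(self, dist, paths):
        """Record that `dist` owns every path in `paths`."""
        for path in paths:
            self._owners[path].add(dist)

    def uninstall(self, dist):
        """Drop `dist` as an owner and return the paths that no one else still claims."""
        removable = []
        for path in list(self._owners):
            owners = self._owners[path]
            if dist in owners:
                owners.discard(dist)
                if not owners:
                    removable.append(path)
                    del self._owners[path]
        return sorted(removable)
```

With this bookkeeping, uninstalling one of two packages that share mod1/__init__.py would leave that file in place until the second package is removed as well.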

The more I think about it, the better this idea gets. With pkgutil-style namespace packages mostly in the rearview mirror, a reasonable way out like this (fork the projects, remove the __init__.py, and put it in a separate package) is good enough IMO. (We need a deprecation period so that potentially impacted users can implement this, of course.)

With the workaround available, I think erroring out on any file conflict is the right thing to do.

Maybe one way around this and towards a safer world would be to prevent overwriting files by default, but allowing it by using an --allow-overwrite flag. At least the user needs to consciously opt-in and the annoyance of that flag might nudge package maintainers towards a safer alternative, like the “metadata” alternative, described here. At some final stage, pip could completely disallow infringing on another package’s module space, unless an explicit flag is specified.

The pkgname is unique on PyPI, so would it be possible to use pkgname as the namespace by default when installing packages? This would be similar to the metadata folders: pip would create a separate folder for each package, and users would name the package when importing, such as from pkgname import mod_name or from pkgname.mod_name import function/class. This would let pip behave correctly during installation and uninstallation, and would make developers aware of which package they are using, thus avoiding the use of the wrong package or even a malicious one.

I don’t enjoy doing this but we’re repeating a discussion that has largely taken place on pip’s issue tracker already. I suggest we move the rest of this discussion there and, uhm… please read the existing discussion before posting further there!

+1 – certainly as a start.

To go a bit further, I think there are two different potential use cases here:

  1. Two different distributions that have nothing to do with each other use the same package name – no idea what can be done, but at least the user should know something’s up!

  2. Cooperating distributions, such as namespace packages – then maybe pip could know that it’s OK to overwrite certain files.

OK – just saw the link to the issue tracker – I’ll go there now.

A maintainer requested that the discussion continue on the original GitHub issue, and that it be read fully before posting: pip overwrites existing files unconditionally during installation · Issue #4625 · pypa/pip