What's the difference between install-time and run-time package names?

cadojo · June 11, 2022, 3:21am

We install Python packages with python -m pip install <package-name>, but often that name is different than the name we import. I’ll use one of my favorite package ideas as the example…

Installation

python -m pip install plum-dispatch

Usage

>>> from plum import dispatch
>>> 
>>> @dispatch
... def f(x: int): print(f"{x} is an int!")
... 
>>> @dispatch
... def f(x: float): print(f"{x} is a float!")
... 
>>> f(1)
1 is an int!
>>>
>>> f(1.0)
1.0 is a float!

What is the technical name for the label we installed (“plum-dispatch”), and the label we used (“plum”)?

As an extra question… what if we install two packages with different installation names, but identical import names?

python -m pip install a-pkg # provides one Python package, "pkg", which prints "A" upon loading
python -m pip install b-pkg # provides one Python package, "pkg", which prints "B" upon loading

>>> import pkg
A

When I run this experiment, it seems to always pick the pkg provided by a-pkg. Why?

steven.daprano · June 11, 2022, 5:00am

shrug

There is no real consistency here. “Package” can have two related but distinct meanings:

a collection of software and/or data files you install with a “package manager” like pip, e.g. “plum-dispatch”;
a collection of software and/or data files accessible as a Python library, e.g. “plum”.

People call both of them a package.

Even when the two packages refer to the same collection of software, they don’t necessarily have the same name. Package manager names can include characters that aren’t legal in Python module/package names, and the two names don’t even have to be related. There is nothing stopping somebody from creating a package (sense 1) called “red” that installs a package (sense 2) called “green”.

As for your second question, if two different packages (sense 1) install packages (sense 2) with the same import name, then at best one would merely over-ride and shadow the other, hiding it, and at worst it would over-write the files.

I don’t know whether pip offers any sort of protection against that behaviour, whether accidental or malicious.

As for which one “wins”, I guess it depends on

the order they were installed;
whether one appears earlier than the other in the import search path (sys.path, which PYTHONPATH feeds into).
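The search-path half of that answer can be tried out without installing anything. The following is a minimal sketch, not a description of what pip does: it assumes two temporary directories that each provide a package named pkg (the helper names write_pkg and import_with_two_candidates are made up for illustration), puts both on sys.path, and reports which one gets imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_pkg(root: Path, label: str) -> None:
    """Create root/pkg/__init__.py exposing a LABEL constant (hypothetical layout)."""
    pkg_dir = root / "pkg"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(f"LABEL = {label!r}\n")


def import_with_two_candidates() -> str:
    """Put two directories that both provide 'pkg' on sys.path; report which one loads."""
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        write_pkg(Path(dir_a), "A")
        write_pkg(Path(dir_b), "B")
        sys.path[:0] = [dir_a, dir_b]  # dir_a is searched before dir_b
        try:
            importlib.invalidate_caches()
            sys.modules.pop("pkg", None)  # forget any previously imported 'pkg'
            return importlib.import_module("pkg").LABEL
        finally:
            # Undo the changes so the demo leaves the interpreter as it found it.
            sys.path.remove(dir_a)
            sys.path.remove(dir_b)
            sys.modules.pop("pkg", None)


if __name__ == "__main__":
    print(import_with_two_candidates())
```

This only covers shadowing across separate directories, where the earlier sys.path entry wins. When both distributions are installed into the same site-packages directory there is only one pkg directory on disk, so which files survive depends on the installer and the install order rather than on sys.path.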

petersuter · June 11, 2022, 5:15am

The technical term for the pip install name is the Distribution Package.

The technical term for the import name is the Import Package.

Unfortunately, that means there can be name clashes, like you describe. If clashes happen, that can be a big problem, and the only solution is to get one of the owners to rename one of the packages.

It is probably best to use a single name when creating new packages. That helps avoid clashes and reduces general confusion.

Personally I think it is an unnecessary design flaw of the ecosystem that using different names was even allowed, but it is too late to change now.

cadojo · June 11, 2022, 5:32am

I’m sure it’s not this simple, but would something like a UUID in the package metadata be a low-cost way to differentiate between packages with the same names, or even the same distribution names? Or is that incompatible in some way with PyPI’s “index, not registry” philosophy?

I know PyPI and the Python package structure are distinct from one another, but they’re both under the Python Packaging Authority, so I’d guess the design of one influences the other.

petersuter · June 11, 2022, 6:01am

I’m guessing adding a UUID to the package metadata would be simple, but doesn’t accomplish much if the Python code still contains import somename and not import someuuid.

In principle you could probably already write a module importer/finder/loader that somehow uses your idea, so you could experiment with where / how you would specify the UUID. I would not recommend it, except if you find it interesting to explore advanced concepts for educational purposes only.

CAM-Gerlach · June 12, 2022, 4:37am

This stems from the fact that a distribution package can contain one or more top-level importable modules, import packages, or any combination. For example, the setuptools distribution package contains two top-level import packages: setuptools, with the functionality intended to be used at build time, and pkg_resources, with the functionality intended to be used at run time. Furthermore, there are namespace packages, where multiple distribution packages can map to the same top-level import package.
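The namespace-package case can be reproduced with plain directories. This is a minimal sketch that stands in for two separately installed distributions by using two temporary directories; the namespace name demo_ns and the helper names are made up for illustration. Neither directory has a demo_ns/__init__.py, which is what makes demo_ns an implicit namespace package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_portion(root: Path, sub: str) -> None:
    """Create root/demo_ns/<sub>/__init__.py; demo_ns itself gets no __init__.py."""
    sub_dir = root / "demo_ns" / sub
    sub_dir.mkdir(parents=True)
    (sub_dir / "__init__.py").write_text(f"NAME = {sub!r}\n")


def load_both_portions() -> tuple:
    """Import demo_ns.alpha and demo_ns.beta from two different directories."""
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        write_portion(Path(dir_a), "alpha")
        write_portion(Path(dir_b), "beta")
        sys.path[:0] = [dir_a, dir_b]
        try:
            importlib.invalidate_caches()
            alpha = importlib.import_module("demo_ns.alpha")
            beta = importlib.import_module("demo_ns.beta")
            return alpha.NAME, beta.NAME
        finally:
            # Remove the temporary entries and forget the demo modules again.
            sys.path.remove(dir_a)
            sys.path.remove(dir_b)
            for name in [n for n in sys.modules if n == "demo_ns" or n.startswith("demo_ns.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(load_both_portions())
```

Both subpackages import under the same top-level name even though they live in different directories, so here sharing a top-level import name is deliberate rather than a clash.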

I agree (though not everybody does) that the simplicity, clarity and reliability of requiring a 1:1 mapping outweighs the mostly aesthetic/convenience benefits of supporting an N:N mapping, and many modern packaging tools (e.g. Flit) explicitly enforce this as a design goal, but unfortunately it’s not readily practical to require all legacy code to conform to these norms.

petersuter · June 12, 2022, 7:07am

That does not match what I see happening. I just tried it and pip still just overwrites the file. The distribution package name is irrelevant.

I’m no expert on this, maybe there are different ways to do it, with different outcomes.

CAM-Gerlach · June 12, 2022, 8:53am

Sorry, I made a rather silly mixup here: the .dist-info metadata directories in site-packages are named and looked up per distribution package, but the actual top-level package directories dumped in the same location are those of the modules/import packages included within the distribution. One more side effect of this complexity…

I may be missing it somewhere, but AFAIK, for packages installed in the same location (site-packages dir), whether and how existing package files get clobbered is undefined installer-dependent behavior, as I don’t see it discussed in either the wheel spec or the installed projects spec.
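One way to check an existing environment for this kind of clobbering is to compare the file lists that installers record in each distribution's metadata. The sketch below is only an approximation: it assumes those lists are present and accurate, and the function name is made up for this example. It uses importlib.metadata from the standard library.

```python
from collections import defaultdict
from importlib.metadata import distributions


def files_claimed_by_multiple_distributions() -> dict:
    """Map each installed file path to the distributions that list it, if more than one."""
    owners = defaultdict(set)
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name", "<unknown>") if meta is not None else "<unknown>"
        for path in dist.files or []:  # files is None when no file list was recorded
            owners[str(dist.locate_file(path))].add(name)
    # Keep only the paths that two or more distributions claim.
    return {path: sorted(names) for path, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    for path, names in files_claimed_by_multiple_distributions().items():
        print(path, names)
```

Any path that shows up here is recorded by two or more distribution packages, so whichever was installed last has probably overwritten the other's copy. An empty result does not prove there are no clashes, since a distribution without a recorded file list is skipped.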