Understanding site-packages directories

I’m a developer for the Spack package manager. Spack is similar to Conda in the sense that it can install both Python and non-Python libraries, and similar to Nix in the sense that each package is installed to a unique installation prefix. Spack supports reusing system installations of Python (built with apt, yum, conda, etc.).

When dealing with Python installations, we’ve noticed that third-party Python libraries may be installed in one of several directories:

  • lib/pythonX.Y/site-packages on most POSIX systems
  • lib64/pythonX.Y/site-packages on RHEL/CentOS/Fedora with system Python
  • lib/pythonX/dist-packages on Debian/Ubuntu with system Python
  • lib/python/site-packages on macOS with framework Python
  • Lib/site-packages on Windows
  • others?

We would like to know this directory ahead of time so that we can set PYTHONPATH appropriately. My question is, when installing a Python library, how does the site-packages directory get decided? Does it depend on the installation method? For example, do all of the following make the same decisions:

  • distutils: legacy python setup.py install (deprecated)
  • setuptools: legacy python setup.py install (deprecated)
  • pip: pip install
  • installer: python -c 'import installer; installer.install(...)'
  • others?

So far the most reliable method I’ve found is to query distutils.sysconfig.get_python_lib(...) (deprecated) or sysconfig.get_path(...). Is this how these installation methods make the decision? How do these installation methods decide whether to use purelib or platlib?

Note: I originally opened this question on the setuptools GitHub but it seems like this question is better asked to the broader community.

I don’t have the energy to explain how all this happens, and I have stopped trying to fix the issues with Python install locations, but I believe Filipe Laíns’s post “Python, Debian, and the install locations” provides a lot of the information you are looking for.

sysconfig is the correct way to get this information. distutils, as you note, is deprecated, but it used to be commonly used. The two should agree, but they haven’t always, so there are some inconsistencies.

The biggest problem here is that many Linux distros patch their copies of Python (specifically sysconfig and/or distutils) so that it matches the distro policies. As long as you query the sysconfig of the Python interpreter you’re installing into, you should be fine. But if you’re trying to infer how a different interpreter will expect to have things laid out, expect headaches…
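To make "query the sysconfig of the Python interpreter you’re installing into" concrete, here is a minimal sketch that runs the target interpreter as a subprocess and asks it for its own paths. The helper name query_install_paths is made up for illustration, and the demo simply passes sys.executable where a real tool would pass the path to the other interpreter.

```python
import json
import subprocess
import sys


def query_install_paths(python_exe):
    """Ask the interpreter at python_exe for its own sysconfig paths.

    query_install_paths is a hypothetical helper name, not part of any library.
    """
    snippet = "import json, sysconfig; print(json.dumps(sysconfig.get_paths()))"
    result = subprocess.run(
        [python_exe, "-c", snippet],
        check=True,
        capture_output=True,
        text=True,
    )
    # The child prints one JSON object mapping path names to directories.
    return json.loads(result.stdout)


if __name__ == "__main__":
    # For the demo we query the running interpreter itself.
    paths = query_install_paths(sys.executable)
    print("purelib:", paths["purelib"])
    print("platlib:", paths["platlib"])
```

Because the answer comes from the target interpreter, any distro patches to its sysconfig are picked up automatically. The cost is that the interpreter has to be runnable on the machine doing the query.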

You should never have to specify this directory in PYTHONPATH, as it will be automatically detected and added by site.py. Any distro that changes the location should also patch site.py to include the correct path.

Launching Python with the -S option will disable importing site, which means this directory will be excluded. This is very useful for diagnosing issues due to installed packages.
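As a quick way to see the effect of -S, this sketch starts a child interpreter and reads sys.flags.no_site, which is set when the site import is skipped. The function name site_is_disabled is only illustrative.

```python
import subprocess
import sys


def site_is_disabled(extra_args=()):
    """Start a child interpreter and report whether it skipped importing site.

    site_is_disabled is a hypothetical helper name used for this example.
    """
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", "import sys; print(sys.flags.no_site)"],
        check=True,
        capture_output=True,
        text=True,
    )
    # sys.flags.no_site prints as 1 when -S was given, 0 otherwise.
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print("with -S   :", site_is_disabled(["-S"]))
    print("without -S:", site_is_disabled())
```

Comparing sys.path between the two runs in the same way shows which entries site.py is responsible for.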

This is the “Correct™” answer, and it’s unfortunate that it requires being able to launch the Python interpreter in question (which may not be possible when cross-compiling, for example).

I have no idea - on Windows these concepts don’t exist, so I’ve never had to figure it out :)

Best impression I’ve gotten from dealing with builds and getpath is that purelib is for platform-independent files (e.g. *.py) while platlib is for platform-specific files (e.g. *.so and also sometimes *.py, so… good luck). But that seems to be little more than a convention, probably lingering from the days when your local filesystem was made up of multiple network shares and you didn’t have space to replicate the purelib onto every single platlib share.

I’m very unclear what you mean by this. Does this mean that if I have a Spack environment (in its “unique installation prefix”) and add Python to it, it may just link in my system install? If I then add a package written in Python, do I have to launch Python through the Spack environment in order to access that package? Or can I launch the system install directly to get it?

In the first case, I’d say you want your link to the system Python to include a PYTHONPATH setting that points to an arbitrary directory you control, where you put your packages. The layout is irrelevant; it’s just a directory of importable code.

In the second case, I’d say you want to install a stub package into the system Python that includes a spack.pth file with one line of text: the path to an arbitrary directory you control, where you put your packages.

You’ll notice the common piece here :) Don’t try to drop packages into a system install unless you’re the system package manager. Instead, put them somewhere that you control fully and tell Python to look there, either with an environment variable (tied to the entry point) or a .pth file (tied to the “site” or installation).
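A rough sketch of the .pth approach, using temporary directories as stand-ins so that nothing touches a real installation. The helper names register_via_pth and demo are hypothetical. site.addsitedir() is used to process the .pth file by hand, which is what site.py does for real site directories at startup.

```python
import os
import site
import sys
import tempfile


def register_via_pth(site_dir, package_dir, name="spack.pth"):
    """Write a one-line .pth file into site_dir that points at package_dir."""
    pth_path = os.path.join(site_dir, name)
    with open(pth_path, "w", encoding="utf-8") as f:
        f.write(package_dir + "\n")
    return pth_path


def demo():
    # Stand-ins: a fake "site" directory and a directory we fully control.
    site_dir = tempfile.mkdtemp(prefix="fake-site-")
    package_dir = tempfile.mkdtemp(prefix="my-packages-")
    register_via_pth(site_dir, package_dir)
    # Process *.pth files in site_dir; existing directories listed there
    # are appended to sys.path.
    site.addsitedir(site_dir)
    return package_dir


if __name__ == "__main__":
    added = demo()
    print(added in sys.path)
```

The PYTHONPATH variant needs no file at all: the same controlled directory is just exported in the environment of whatever launches Python.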

I’ve never used Spack but if it’s anything like Nix, each Python package is installed under its own prefix and the site-packages folder is exposed in the environment using PYTHONPATH. This way you could, in theory, install a bunch of Python packages in a Nix environment without depending on Nix’s Python, and the system Python would pick them up. I’ve never seen this being advertised as a feature however and Python packages in Nix depend on the Python interpreter.

installer decides based on the value of Root-Is-Purelib in the wheel metadata (refer to the wheel spec). I assume pip does the same.
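For illustration, this is roughly how the Root-Is-Purelib flag can be read out of a wheel with only the standard library. It is a sketch, not how installer or pip implement it. Both function names are invented, and the "wheel" in the demo is a tiny in-memory archive containing only a WHEEL file.

```python
import io
import zipfile
from email.parser import Parser


def root_is_purelib(wheel_file):
    """Return True if the wheel's WHEEL metadata says Root-Is-Purelib: true."""
    with zipfile.ZipFile(wheel_file) as zf:
        names = [n for n in zf.namelist() if n.endswith(".dist-info/WHEEL")]
        if len(names) != 1:
            raise ValueError("expected exactly one .dist-info/WHEEL entry")
        # The WHEEL file uses simple "Key: value" header lines.
        metadata = Parser().parsestr(zf.read(names[0]).decode("utf-8"))
    value = metadata.get("Root-Is-Purelib", "")
    return value.strip().lower() == "true"


def make_example_wheel(purelib):
    """Build a minimal in-memory archive with just a WHEEL file (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "demo-1.0.dist-info/WHEEL",
            "Wheel-Version: 1.0\n"
            "Generator: example\n"
            f"Root-Is-Purelib: {'true' if purelib else 'false'}\n"
            "Tag: py3-none-any\n",
        )
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(root_is_purelib(make_example_wheel(True)))
    print(root_is_purelib(make_example_wheel(False)))
```

An installer that follows the wheel spec would unpack the archive root into purelib when the flag is true and into platlib otherwise.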

To clarify, I’m not looking for the default site-packages directory where modules that come with Python can be found. When I install something with pip install --prefix=<prefix>, I’m wondering about the latter half of <prefix>/lib/pythonX.Y/site-packages and how to determine which directory to add to PYTHONPATH.
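One way to get at the "latter half" without guessing is to let sysconfig expand its default scheme with base and platbase replaced by the prefix in question. This is only a sketch: site_packages_for_prefix is a made-up name, the prefix is a placeholder, and a particular installer or a patched distro Python may not compute exactly the same paths.

```python
import os
import sysconfig


def site_packages_for_prefix(prefix):
    """Expand the default install scheme with base/platbase set to prefix.

    site_packages_for_prefix is a hypothetical helper name.
    """
    overrides = {"base": prefix, "platbase": prefix}
    return {
        key: sysconfig.get_path(key, vars=overrides)
        for key in ("purelib", "platlib")
    }


if __name__ == "__main__":
    # Placeholder prefix, purely for illustration.
    example_prefix = os.path.abspath(os.path.join(os.sep, "opt", "example-prefix"))
    for key, path in site_packages_for_prefix(example_prefix).items():
        print(f"{key}: {path}")
```

Running this with the interpreter that will later import the packages matters, since the scheme (lib vs. lib64, site-packages vs. dist-packages) belongs to that interpreter.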

Yes, we definitely run into this issue with Spack. I’m still trying to figure out how best to handle this situation. I’ll likely just guess lib/pythonX.Y/site-packages since it’s most common.

What I mean is that you can either let Spack install a new copy of Python or reuse the system one. With Spack, all packages are installed to their own unique installation prefix, including Python. You can choose to create a symlinked environment with these packages, or you can load them into your PYTHONPATH and use them as is.

Yep, Spack installs everything to its own prefix under its control. You can choose to symlink things to a system location if you want, although it’s not recommended. Spack was originally designed for users on supercomputers where you don’t have admin privileges anyway.

Yes, this is exactly how Spack works.

Any idea how build decides when to set this to True vs. False?

purelib is for pure Python; platlib is for architecture-dependent files, much like /usr/share and /usr/lib on Linux. In practice almost everyone sets them to the same path, and I recommend you do the same unless you have a reason not to.

That’s entirely up to the build backend. setuptools, for example, relies on the check in wheel/bdist_wheel.py (main branch of pypa/wheel on GitHub) at packaging time. build makes no decisions here, because build is not what actually builds the wheel. It only creates the environment for the build; generating the wheel is left to the build backend, and each backend may make its own decision.

It doesn’t. build simply orchestrates builds in a backend-agnostic manner. The backend is responsible for producing wheels. setuptools sets Root-Is-Purelib to false if the distribution contains extension modules or libraries; I don’t know about other backends.

Put another way, purelib comes under sys.prefix; platlib comes under sys.exec_prefix. If these two are the same, purelib and platlib will be as well.
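To look at this relationship on a given interpreter, the unexpanded scheme templates can be printed next to the expanded paths. In sysconfig, the base variable is filled from sys.prefix and platbase from sys.exec_prefix. The helper name scheme_templates is invented, and the exact templates differ between platforms and patched distro builds.

```python
import sys
import sysconfig


def scheme_templates():
    """Return the unexpanded purelib/platlib templates of the default scheme."""
    return {
        key: sysconfig.get_path(key, expand=False)
        for key in ("purelib", "platlib")
    }


if __name__ == "__main__":
    for key, template in scheme_templates().items():
        # Left: the template with placeholders; right: the expanded path.
        print(f"{key}: {template} -> {sysconfig.get_path(key)}")
    print("sys.prefix     :", sys.prefix)
    print("sys.exec_prefix:", sys.exec_prefix)
```

If the two templates differ after the placeholder (for example lib versus a platform-specific library directory), the expanded paths can differ even when the two prefixes are identical.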

That’s just the default, but distributors of the Python interpreter are allowed to override these two independently of those prefixes.

Override them how? By patching sysconfig? And then do they also patch site? Because if they change the prefixes in sysconfig, site won’t be able to locate the site packages. And why would anyone…

Debian Python patches both of those places AFAIK.

Debian doesn’t patch sysconfig at all. If you are looking to extract the scheme paths from sysconfig on Debian, you are out of luck. However, Debian doesn’t alter the relationship between prefix and purelib or between exec_prefix and platlib, which is what I was trying to explain. What they do in site is load f"{prefix}/local/path/to/packages", looping over [sys.prefix, sys.exec_prefix], i.e. they add another level to the prefix. On Debian, prefix and exec_prefix are the same, as are the paths following the prefix. There are genuine problems with the way Debian patches Python, but they have nothing to do with the distinction between prefix and exec_prefix.