I’m afraid I really don’t have a clear picture of the overall system here.
I wrote out my thought process below, but first I want to check something:
I lack experience here, obviously, but I’m struggling to imagine what some of these convoluted things might look like, or why they would be necessary. In my mind, a wheel can only contain two kinds of code: pure Python modules, and… everything else. The pure Python modules come from copying `.py` files in the sdist (perhaps compiling to `.pyc`), while everything else comes from some automated process with `setup.py` at the top (even if it, in turn, just invokes Ninja or CMake etc.).
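The way I picture it, a wheel is just a zip archive, so those two kinds of content should be directly visible by listing one - something like this (the wheel filename is just an example):

```python
# A wheel is a zip archive; listing it shows the two kinds of content:
# .py files (pure Python) and everything else (compiled extensions, DLLs),
# plus the *.dist-info metadata. The filename here is hypothetical.
import zipfile

with zipfile.ZipFile("scipy-1.14.0-cp312-cp312-manylinux_2_17_x86_64.whl") as whl:
    for name in whl.namelist():
        print(name)
```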
Creating a wheel, as I understand it, entails:
1. putting the Python code in the right places;
2. creating the non-Python-code pieces;
3. putting the non-Python-code pieces in the right places;
4. adding metadata.
As far as I’m aware, any build backend (including vanilla Setuptools) knows how to do 1/3/4 - at least, it can read some tool-specific config data (like `[tool.setuptools.package-dir]` etc. in `pyproject.toml` for Setuptools) to figure out what needs to go in the wheel, where it is prior to wheel building, and where in the wheel it should go.
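For concreteness, my mental model of that configuration as a `setup.py` is something like the following (the `src`/`mypackage` layout is hypothetical); this covers steps 1, 3 and 4:

```python
# Minimal sketch: telling Setuptools where the Python code lives, so it can
# put it in the right place in the wheel and add the metadata (steps 1/3/4).
# Equivalent to [tool.setuptools.package-dir] in pyproject.toml; the layout
# here is hypothetical.
from setuptools import find_packages, setup

setup(
    package_dir={"": "src"},        # Python code lives under src/
    packages=find_packages("src"),  # discover mypackage (and friends) there
)
```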
So I would think that the only interesting part is writing code in `setup.py` that creates the non-Python pieces and puts them in appropriate places. Once that’s done, the rest is formulaic.
So - where does the convolution come in? Why would it be necessary to do things that `build` can’t do, or that are more than just putting some code in `setup.py` that ultimately just shells out to some compilers and maybe moves some files around afterward?
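To show what I mean by “shells out to some compilers and maybe moves some files around”, here is a sketch - the CMake project, artifact name, and package layout are all hypothetical:

```python
# Sketch of a setup.py that shells out to an external build system and then
# moves the artifact into the package so it lands in the wheel. All paths
# and names here are hypothetical.
import shutil
import subprocess
from pathlib import Path

from setuptools import setup
from setuptools.command.build_py import build_py


class build_native(build_py):
    def run(self):
        build_dir = Path("native-build")
        # Shell out to the real build system...
        subprocess.check_call(["cmake", "-S", "native", "-B", str(build_dir)])
        subprocess.check_call(["cmake", "--build", str(build_dir)])
        # ...then move the result to where Setuptools will pick it up.
        shutil.copy(build_dir / "libexample.so", Path("mypackage") / "libexample.so")
        super().run()


setup(
    packages=["mypackage"],
    package_data={"mypackage": ["*.so"]},  # include the DLL in the wheel
    cmdclass={"build_py": build_native},
)
```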
Moving on, let’s see if I understand the situation with `scipy-openblas32` properly.
Let’s first suppose I have a project where I’ve installed Scipy and call some Scipy function in the code, and Scipy requires some BLAS functionality (i.e., uses a non-Python dependency). I know of a few fundamentally different ways that this could be interfaced:
- The code is written in C (or perhaps C++), in such a way that it already conforms to the Python-C FFI. The built and installed distribution contains a corresponding `.so` (or `.pyd` on Windows) file which Python can just `import` directly. My understanding is that BLAS has a decades-long history, its own API, and is normally implemented in Fortran, so this doesn’t apply.
- The Python code uses `ctypes` to communicate with a vendored DLL (still `.so`, or `.dll` on Windows).
- The Python code expects the system to provide a DLL already; it looks up that DLL (whether by a hard-coded path, or some more sophisticated search/discovery mechanism) and communicates with it via `ctypes`.
- A Python wrapper chooses one of the above strategies at runtime (when using `ctypes`, a wrapper would normally be used anyway, just to avoid littering `ctypes` calls throughout the rest of the code) - see the sketch after this list.
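Something like this minimal sketch is what I have in mind for that wrapper - the library names, search logic, and the `cblas_ddot` example are all hypothetical, just to illustrate the shape:

```python
# Sketch of a runtime-choosing ctypes wrapper: prefer a DLL vendored inside
# the package, fall back to whatever the system provides. All names here are
# hypothetical.
import ctypes
import ctypes.util
from pathlib import Path


def _load_blas():
    # 1. A vendored copy shipped in the wheel, next to this module.
    vendored = Path(__file__).parent / "libopenblas.so"
    if vendored.exists():
        return ctypes.CDLL(str(vendored))
    # 2. Otherwise, use the platform's usual search mechanism for a system copy.
    found = ctypes.util.find_library("openblas")
    if found is not None:
        return ctypes.CDLL(found)
    raise OSError("no BLAS library found")


_blas = _load_blas()


def ddot(x, y):
    """Thin wrapper so ctypes details don't litter the rest of the code."""
    n = len(x)
    c_array = ctypes.c_double * n
    _blas.cblas_ddot.restype = ctypes.c_double
    return _blas.cblas_ddot(n, c_array(*x), 1, c_array(*y), 1)
```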
Do I understand properly so far? Did I overlook anything?
Then, let me try to shift to the building/packaging perspective. I infer that SciPy is taking the `ctypes` approach, and it dynamically wants to use either a vendored DLL or a system-provided one. The existing SciPy sdist includes the necessary pieces to build a vendored DLL, as well as the logic to build and include that DLL in wheels for platforms where it’s necessary. If, again as an end user, there isn’t a wheel for my platform, I can ask Pip to install from the sdist, and hopefully it will succeed in building the vendored DLL if I need one.
Am I still on the right track?
So, now the goal is to move the DLL-specific stuff into a separate (already existing, in fact) `scipy-openblas32` package that doesn’t actually contain any Python modules and is only provided in wheel form; and then have the vendored DLL come from there when needed.
But the problem is that only some subset of wheels should have this as a dependency; describing it as an “optional dependency” is insufficient, because the decision to include it should be made automatically rather than by user preference. I.e., the following two situations are unacceptable:
- a user who lacks a system BLAS opts to try to install SciPy without the separate BLAS “extra”, and then has code fail at runtime when the BLAS functionality isn’t found;
- a user who has a system BLAS opts for an installation with the “extra”, and it’s simply redundant.
Aside from that, this isn’t clear to me:
Why is this different from the situation with the overall sdist for SciPy? Surely the work required to build and install SciPy from source, for platforms where BLAS support isn’t provided by the system already, would be a superset of the work required to build and install the BLAS support?