Options to build the same package in different ways (with different build dependencies)?

paugier · June 17, 2020, 9:32am

For some packages, we really need to have different builds (producing different wheels), with different build dependencies.

My motivation comes in particular from the fluidfft and fluidsim packages, which we need to be able to build with and without Pythran, and with and without MPI and mpi4py.

I won’t go into the details of why we need this because it would take long to explain (I can do it if it is useful).

The solution we chose was to build differently depending on the result of import mpi4py and import pythran with try/except statements. It is far from optimal but it works. Of course, it forbids the use of isolated builds and therefore pyproject.toml. Note that it is similar to how conda works (the package installed by a command depends on what was installed in the environment).
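As a minimal sketch of the try/except detection described above (the helper names is_importable and detect_variant are hypothetical, not taken from fluidsim or fluidfft), a setup.py could probe the environment like this:

```python
import importlib


def is_importable(module_name):
    """Return True if `module_name` can be imported in the current environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def detect_variant(probe=is_importable):
    """Choose build options from what is already installed (hypothetical helper)."""
    return {
        "use_pythran": probe("pythran"),
        "use_mpi": probe("mpi4py"),
    }


if __name__ == "__main__":
    # The result depends entirely on the environment the build runs in,
    # which is why this approach is incompatible with isolated builds.
    print(detect_variant())
```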

However, I feel that soon having a pyproject.toml will be kind of mandatory to stay mainstream and get good behavior with many Python tools.

Unfortunately, all examples I see with pyproject.toml are very simple and I don’t see any mechanism to specify options to build the same package in different ways (potentially with different build dependencies) so that it produces different wheels. For example, in conda the fftw package (https://anaconda.org/conda-forge/fftw/files) is built as fftw-3.3.8-mpi_openmpi, fftw-3.3.8-mpi_mpich, fftw-3.3.8-nompi, etc.

Could it be possible to support things like

pip install fluidsim               # build with pythran as build dependency (or get the wheel if available)
pip install fluidsim[purepy]       # wheel without extension and with no MPI support 
pip install fluidsim[pythran, mpi] # pythran & MPI

It seems to me that currently the brackets in pip install commands are not at all usable for this purpose.

I tried to read PEP 517/518 but I don’t understand how they apply to our needs.

Note also that mpi4py does not upload wheels to PyPI (as far as I understand, for good reasons: mpi4py needs to be recompiled for different MPI implementations). I know that it was a problem a few months ago to add it in [build-system] requires, but maybe it’s no longer a problem?

I hope someone here could give me some good advice.

pf_moore · June 17, 2020, 10:35am

My first question would be how you expect the different wheels (with different dependencies) to be named. When installing a package, installers should be able to look at the wheel filename and determine if the wheel is the correct one to install (that’s the point of the filename standard and the packaging tags in the wheel spec).

If you need different wheel filenames, you need to argue for a revision to the wheel spec. If you can explain how the package index would store these variant wheels without a filename spec change, then that would probably clarify how installers like pip could be told which variety to choose.
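To make the filename point concrete, here is a rough sketch of how the fields of a wheel filename can be split apart, following the {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl layout from the wheel spec. The function name and the example filename are made up for illustration; note that none of the fields is meant to carry a "variant" such as MPI or no MPI.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(distribution, version, build, python, abi, platform)


if __name__ == "__main__":
    # Illustrative filename only; the version number is invented.
    print(parse_wheel_filename("fluidsim-0.3.3-cp38-cp38-manylinux1_x86_64.whl"))
```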

EpicWink · June 17, 2020, 11:17am

Could you misappropriate the local version identification? eg, 3.14.15+mpi12, 3.14.15+mpi13, 3.14.15+nompi, then have exact version specifiers.

My suggestion would be to have multiple subdirectories, each with their own “pyproject.toml” etc (setup.cfg, setup.py, libs), then sym-link (or copy, for peace of mind) in the source. This wouldn’t work if you have different package source code between the different builds.

steve.dower · June 17, 2020, 3:53pm

I just posted the seed of an idea that may help you (one day, if/when it ever happens): Idea: selector packages

Would be interested in your thoughts.

pf_moore · June 17, 2020, 4:49pm

Installers wouldn’t be able to do anything other than treat these as part of the version, so they only really work well if you pin exact versions. Christoph Gohlke’s wheel repository for Windows used to use this scheme (I don’t know if it still does) and it worked OK, but not perfectly. And that was for relatively casual use.

So maybe, but probably only if you’re looking at a well-constrained environment, not for general use.
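A small sketch of what the local-version approach implies for users: the variant lives after the "+" in the version, so selecting a variant means writing an exact pin. The helper names are hypothetical, and the version strings are the ones used as examples earlier in the thread. Keep in mind that PEP 440 says local version identifiers should not be used when publishing to a public index, which fits the "well-constrained environment" caveat.

```python
def split_local_version(version):
    """Split 'public+local' into (public, local); local is None when absent."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def exact_pin(project, public_version, variant):
    """Build an exact requirement string that selects one variant."""
    return f"{project}=={public_version}+{variant}"


if __name__ == "__main__":
    print(split_local_version("3.14.15+nompi"))
    print(exact_pin("fluidfft", "3.14.15", "mpi12"))
```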

sinoroc · June 17, 2020, 6:55pm

Wouldn’t that be a good use case for the Provides-Dist metadata?

For example one could publish both projects MyProjectPure (pure Python) and MyProjectC (built with C extensions). And to link them somehow on the metadata-level they could both announce as Provides-Dist: MyProject. This way tools (installers) could react appropriately and for example make sure only one or the other is installed.
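As a rough illustration of the idea (not a description of what any installer does today), the sketch below writes core-metadata headers for the two hypothetical projects and checks whether they announce the same Provides-Dist name. The function names are invented for this example, and the conflict check is only an assumption about how a tool might use the field.

```python
from email.message import Message
from email.parser import Parser


def core_metadata(name, version, provides=()):
    """Render a minimal core-metadata header block as text."""
    msg = Message()
    msg["Metadata-Version"] = "2.1"
    msg["Name"] = name
    msg["Version"] = version
    for provided in provides:
        # Assigning the same header repeatedly appends a new line each time.
        msg["Provides-Dist"] = provided
    return msg.as_string()


def provided_names(metadata_text):
    """Return every Provides-Dist value found in a metadata block."""
    return Parser().parsestr(metadata_text).get_all("Provides-Dist", [])


def conflicting(metadata_a, metadata_b):
    """Names that both distributions claim to provide."""
    return sorted(set(provided_names(metadata_a)) & set(provided_names(metadata_b)))


if __name__ == "__main__":
    pure = core_metadata("MyProjectPure", "1.0", ["MyProject"])
    with_c = core_metadata("MyProjectC", "1.0", ["MyProject"])
    print(conflicting(pure, with_c))
```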

paugier · June 17, 2020, 7:24pm

Thank you for your answers and suggestions. I understand that there is no proper way to do that (even though Provides-Dist could be useful).

A very hacky way would be to modify pyproject.toml dynamically to be able to publish different packages (with different package names and different build dependencies) from one unique source.

It is strange (and unfortunate) to have to do this by hand but it could work. For example fluidsim-purepy could depend on fluidfft-purepy and fluidsim[mpi] (here, just an extras_require since there is no extension using MPI in fluidsim) could depend on fluidfft-mpi.

dstufft · June 25, 2020, 5:24pm

This feels like a desire for some sort of “variants” feature that I’ve seen pop up before. Maybe it’s something we should be trying to push on in some way to get something figured out.

paugier · June 26, 2020, 1:42pm

It seems to me that it could be useful if tools like pip, poetry, pipenv, etc… could use a pyproject.toml dynamically produced from one string. Of course, the production of the pyproject.toml file has to be done without external packages but it is not really a bad limitation. Just pure python and the assurance that the python code is run from a root directory of the project.

In my case, I can use different pyproject.toml files to produce different variants of a package. These variants can be uploaded to PyPI under different names. It’s very hacky but so far so good.

However, there is only one pyproject.toml in the root directory of a repository, so only one variant of the package can be installed directly from the repository (with something like pip install git+https://github.com/serge-sans-paille/pythran#egg=pythran).
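For illustration, generating one pyproject.toml per variant with pure Python could look roughly like the sketch below. The variant names and their build requirements are assumptions made up for this example, not the actual fluidsim configuration.

```python
BASE_REQUIRES = ["setuptools", "wheel"]

# Hypothetical mapping from variant name to extra build requirements.
VARIANT_REQUIRES = {
    "purepy": [],
    "pythran": ["pythran"],
    "mpi": ["pythran", "mpi4py"],
}


def render_pyproject(variant):
    """Return the text of a pyproject.toml for the given variant."""
    requires = BASE_REQUIRES + VARIANT_REQUIRES[variant]
    quoted = ", ".join(f'"{req}"' for req in requires)
    return (
        "[build-system]\n"
        f"requires = [{quoted}]\n"
        'build-backend = "setuptools.build_meta"\n'
    )


if __name__ == "__main__":
    # Writing the result to ./pyproject.toml before building is left to the caller.
    print(render_pyproject("mpi"))
```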

pganssle · June 26, 2020, 7:16pm

I don’t think you need any sort of dynamically produced pyproject.toml file. If you are building from source, what you want to do is available today, because PEP 517 has a mechanism for backends to dynamically provide build dependencies: get_requires_for_build_wheel. You can already use setuptools to achieve this; here’s an example setup.py:

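from setuptools import setup
import os
import sys

# Marker that makes these messages easy to grep out of pip's verbose output.
GREP_TARGET = "x11234"
# sys.argv[1] is the setup.py command being run (egg_info, dist_info, bdist_wheel).
prefix = f"{GREP_TARGET} ({sys.argv[1]})"

# Report whether attrs is importable in the current build environment.
try:
    import attr
    print(f"{prefix}: Attrs imported!")
except ImportError:
    print(f"{prefix}: Attrs not imported.")

# Request attrs as a build-time dependency only when BUILD_FLAG=1 is set.
if os.environ.get("BUILD_FLAG") == "1":
    setup_requirements = ["attrs"]
else:
    setup_requirements = []

setup(
    name="mypkg",
    version="0.0.1",
    setup_requires=setup_requirements,
)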

Pair it with an appropriate pyproject.toml, and you can use an environment variable to determine whether attrs is present while the wheel is built:

(venv) $ pip install . -v 2>&1| grep "x11234"
  x11234 (egg_info): Attrs not imported.
    x11234 (dist_info): Attrs not imported.
  x11234 (bdist_wheel): Attrs not imported.
(venv) $ BUILD_FLAG=1 pip install . -v 2>&1| grep "x11234"
  x11234 (egg_info): Attrs not imported.
    x11234 (dist_info): Attrs imported!
  x11234 (bdist_wheel): Attrs imported!

The real problems here are not about providing a mechanism to do this (as you can see, such a mechanism already exists), but more about recording metadata about it. How do you tag the wheels? How do you tell pip which version you want? How do you declare dependencies on this?

I don’t think there’s any good solution to the metadata/labeling problem that exists today. I would definitely love to see support for something equivalent to Recommends-Dist (optional dependencies where the default is some flavor of “install it if you can”), which would make it easier to have a bunch of build variants that are selected either with extras or as a chain of fallbacks, but that’s just my own personal wish list.

paugier · June 26, 2020, 10:33pm

I understood that pyproject.toml and isolated builds were meant in particular to avoid setup_requires and having to specify build dependencies in setup.py. I also read that there are potential problems with setup_requires. I would prefer something cleaner than using both [build-system] requires and setup_requires.

You mention get_requires_for_build_wheel and PEP 517. It would be good to provide somewhere simple but realistic examples of how to use these things. I find only the PEP on the web (maybe I was just not able to find the right resource), and its “Build backend interface” part is not simple. For example, I guess it should not be very difficult to reproduce your example (build dependency depending on an environment variable) with get_requires_for_build_wheel, but how?

How do you tag the wheels? How do you tell pip which version you want? How do you declare dependencies on this?

Yes, an environment variable is not sufficient. Something like pip install my-super-package[a-string-about-the-variant] would be nice. Of course the name of the wheel should somehow contain a-string-about-the-variant.

uranusjr · June 27, 2020, 12:35am

pyproject.toml (specifically the build-system.requires array) was created so that setup_requires would not be the only way to specify build-time dependencies, and thus setuptools the only possible build back-end (because you need it to read the value). PEP 517 uses a two-step process to solve this: first, pyproject.toml specifies which back-end to install and run; then, once you’re in the context of a given back-end, the two hooks get_requires_for_build_wheel and get_requires_for_build_sdist are used to specify additional build-time requirements. For setuptools, they simply provide whatever is specified in setup_requires. There is no point in avoiding setup_requires here, since it provides the functionality that you want: a way to specify build-time dependencies determined dynamically at build time.

If you’re inclined to avoid setuptools entirely, indeed you can also quite easily write your own PEP 517 back-end that uses those hooks to specify what you want dynamically. But you don’t really need to do that if you’re calling setuptools under the hood—setup_requires provides the exact same functionality anyway.
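For readers who want to see the hooks directly, here is a minimal sketch of an in-tree back-end that wraps setuptools.build_meta and adds a build requirement based on an environment variable. It reuses the BUILD_FLAG variable and the attrs requirement from the setup.py example earlier in the thread; the module layout and the helper extra_build_requires are assumptions for illustration, not an official recipe.

```python
"""Hypothetical in-tree back-end, e.g. saved as _build/backend.py."""
import os


def extra_build_requires(environ=None):
    """Build requirements that depend on the environment (illustrative)."""
    environ = os.environ if environ is None else environ
    extras = []
    if environ.get("BUILD_FLAG") == "1":
        extras.append("attrs")
    return extras


def _setuptools_backend():
    # Imported lazily so the helper above can be used without setuptools.
    from setuptools import build_meta
    return build_meta


def get_requires_for_build_wheel(config_settings=None):
    # Start from what setuptools reports, then add the dynamic requirements.
    base = _setuptools_backend().get_requires_for_build_wheel(config_settings)
    return base + extra_build_requires()


def get_requires_for_build_sdist(config_settings=None):
    return _setuptools_backend().get_requires_for_build_sdist(config_settings)


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    return _setuptools_backend().build_wheel(
        wheel_directory, config_settings, metadata_directory
    )


def build_sdist(sdist_directory, config_settings=None):
    return _setuptools_backend().build_sdist(sdist_directory, config_settings)
```

Under these assumptions, pyproject.toml would keep setuptools in [build-system] requires and point at the module with build-backend = "backend" and backend-path = ["_build"].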

pganssle · June 28, 2020, 3:39pm

To add to what @uranusjr said, the primary reasons you should generally not use setup_requires:

Anything specified only in setup_requires cannot be a required dependency for invoking setup.py. If you have import somepackage in your setup.py not guarded by a try/except or an if statement, the build machinery that installs your dependency won’t ever get invoked, because setup.py will fail before it calls setup().
When invoking a legacy build (i.e. not a PEP 517 build), setuptools will do the installation with easy_install, which can mess up your Python environment and has many other major flaws.

In your case, neither of these applies, because you are necessarily doing a conditional import, since your intention is to be able to do a full build, even if the package isn’t installed, and as @uranusjr says, setuptools will just add anything in setup_requires to the requirements when get_requires_for_build_wheel is invoked, and easy_install will never be invoked.

In any case, my point wasn’t really “here’s how you can accomplish what you want to accomplish”, because I think that outside of very limited circumstances, build flags of this sort tend to circumvent the expectations of a lot of tools. My point is that in the development of any “variant builds” functionality, the focus should be on the metadata problem, because it’s already technically possible for backends to “dynamically” modify the build requirements (and indeed for setuptools, already implemented).

pganssle · June 28, 2020, 3:48pm

To follow up on my earlier recommendation of Recommends-Dist: This may just be my preferences and biases speaking, but I think this whole issue really comes down to the fact that we don’t have a way to specify a Recommends-Dist “opportunistic” dependency, nor any good way to say, “here are 5 things that can satisfy this dependency”. If you could do that, then with a little extra work, you should be able to refactor your packages to allow for “build variants” in most cases by structuring them as a core package with a bunch of “plugins”. Right now, you can already achieve the “build variants” with metadata, by splitting your package up into four packages:

fluidsim
fluidsim-purepy-backend
fluidsim-pythran-backend
fluidsim-pythran-backend-mpi-support

Then you would declare that fluidsim depends on fluidsim-purepy-backend (or just have fluidsim-purepy-backend ship with fluidsim unconditionally), and have pythran and pythran-with-mpi extras that add dependencies on the other two.
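A sketch of how that split might be declared, using the package and extra names above. Which packages each extra pulls in is an assumption made for this example, and requirements_for is a hypothetical helper that only mimics how extras add to the base requirements.

```python
# Values that could be passed to setuptools.setup() for the core fluidsim package.
INSTALL_REQUIRES = ["fluidsim-purepy-backend"]

EXTRAS_REQUIRE = {
    "pythran": ["fluidsim-pythran-backend"],
    # Assumption: the MPI extra needs the pythran backend plus MPI support.
    "pythran-with-mpi": [
        "fluidsim-pythran-backend",
        "fluidsim-pythran-backend-mpi-support",
    ],
}


def requirements_for(extras=()):
    """Requirements an installer would collect for fluidsim[extras]."""
    requirements = list(INSTALL_REQUIRES)
    for extra in extras:
        for requirement in EXTRAS_REQUIRE[extra]:
            if requirement not in requirements:
                requirements.append(requirement)
    return requirements


if __name__ == "__main__":
    # Extras only ever add: the pure Python backend is still in the list.
    print(requirements_for(["pythran"]))
```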

The problem is that in my experience most people who provide multiple packages that do the exact same thing are doing it because they have one package that provides wide compatibility but is slow (e.g. a pure Python version), and one that is faster or better in some way, but is otherwise hard to build, only works on some platforms, etc. Generally you prefer the default to be to install the faster/better one if possible, and if not, fall back to the compatibility mode. But there’s no way to do that, because extras can only add rather than replace existing dependencies.

If we had Recommends-Dist and a system for saying what groups of packages can satisfy a given dependency, you could get the default values you want with the correct fallbacks. For example, you could add a dummy package fluidsim.backend, which can be satisfied by either fluidsim-pythran-backend or fluidsim-purepy, and a recommends-dist on fluidsim-pythran-backend, so that if fluidsim-pythran-backend is unavailable or hard to install for whatever reason, you fall back to fluidsim-purepy. The extras pythran and pythran-with-mpi would add hard dependencies.

Obviously this is a bit more constraining than a situation where you invoke different build options on the core package itself, but it’s basically how feature flags are implemented with extras now, and it has some advantages:

It discourages excessive build-time complexity, because having N tightly-coupled feature flags would require creating and maintaining N separate packages (I see this as a benefit rather than a downside).
It would make the true dependency graph (including feature flags) more amenable to static analysis.
Nothing would need to change about wheel tagging, because the variant features are encoded into the package name.
It would likely play “nicely” with other packaging systems, because it doesn’t really introduce a new “package variant” abstraction — the structure of these packages and their dependencies wouldn’t need to change much or at all when translating them into package management systems that only have the concept of a single “package” that has dependencies.
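The fallback behaviour described above does not exist in any installer today; the sketch below only illustrates the selection rule a hypothetical Recommends-Dist could follow, namely "try the recommended provider first, then fall back". The function name and the is_installable callback are invented for this example.

```python
def choose_backend(candidates, is_installable):
    """Return the first candidate that can be installed, in preference order."""
    for name in candidates:
        if is_installable(name):
            return name
    raise LookupError("no candidate satisfies the dependency")


if __name__ == "__main__":
    # Preference order: recommended provider first, compatibility fallback last.
    providers = ["fluidsim-pythran-backend", "fluidsim-purepy"]
    # Pretend the pythran backend cannot be installed on this platform.
    print(choose_backend(providers, lambda name: name == "fluidsim-purepy"))
```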

paugier · June 28, 2020, 8:03pm

as @uranusjr says, setuptools will just add anything in setup_requires to the requirements when get_requires_for_build_wheel is invoked, and easy_install will never be invoked

Thank you for this explanation. I didn’t understand that and I thought setup_requires would still use easy_install.

paugier · June 28, 2020, 9:22pm

@pganssle thank you for your long and interesting answer.

I now approximately see how with pyproject.toml, setuptools and environment variables, the same package could be built in different variants (for example fluidsim-purepy-backend, fluidsim-pythran-backend). I’m still not sure that it would be supported to change the name of the package using pyproject.toml, setuptools and environment variables (the name should not be written in pyproject.toml but computed dynamically in setup.py) but I guess it should work. (In practice in my case, I wouldn’t use a package fluidsim.backend and fluidsim-purepy-backend and fluidsim-pythran-backend would both contain the whole fluidsim code, but anyway, it is not important).

But then for the dummy package fluidsim (which would only be used to depend on fluidsim-purepy-backend or fluidsim-pythran-backend), I would need to compute the runtime dependencies from the list of extras (and maybe from other information, like the availability of wheels for one platform, like “Is fluidsim-pythran-backend available or not?”).

Even without considering this extra complication (availability of wheels), is it possible to get the list of extras in setup.py and compute runtime dependencies from it? I’ve never managed to do that…

takluyver · August 4, 2020, 5:53pm

I’m trying to apply the recommendation to use setup_requires for dynamically determined build dependencies in h5py (specifically, optional MPI support).

Unfortunately, it’s not quite the case that with PEP 517, setup_requires are only passed through to the frontend. They are exposed through the get_requires_for_build_* hooks, but if they’re not satisfied when the other hooks are called (as they may not be if you use pip install --no-build-isolation), setuptools cheerfully starts trying to install them itself.

I believe this is a bug in setuptools - when it’s being a PEP 517 backend, it shouldn’t be trying to install anything itself. I’ve filed an issue about it.
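One possible guard when building with pip install --no-build-isolation is to check up front that the dynamically chosen build requirements are already installed. The sketch below is only an illustration: the helper name is hypothetical, and it checks that a distribution is present, not that its version matches a specifier.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_distributions(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # mpi4py stands in for an optional build requirement such as MPI support.
    print(missing_distributions(["mpi4py"]))
```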