Explicit optional cythonization via config-setting post pep517 and install-option deprecation

Hello, I am the co-developer of reedsolo, a Reed-Solomon codec. It provides both a pure-python implementation and a faster cythonized extension, both bundled together because they aim to provide the same interface and same coverage and math guarantees (the pure python implementation being the reference). Before pep517, users could install either :

  • by default the pure python implementation, that did not require any dependency nor a C compiler apart from the CPython interpreter.
  • optionally by supplying a --cythonize argument (ie, python setup.py --cythonize , or pip install reedsolo --install-option=--cythonize): the creedsolo.pyx file would be cythonized and then compiled into a creedsolo.dll or creedsolo.so module.

I am now moving to a modern packaging specification, with a pyproject.toml file instead of setup.py for most purposes. I could successfully use extras to allow the optional install of Cython. Nevertheless, setup.py is still required by setuptools for extensions (or other custom classes for other backends), because contrary to the packaging of pure Python code, the packaging of extensions is not standardized under the latest PEPs.

Now, my last remaining issue is that there seems to be no way to allow explicit optional cythonization:

  • implicit optional cythonization is possible with other backends than setuptools, such as poetry. Enscons may also work, Hatch does not yet. However, implicit optional cythonization is not robust from a user standpoint: this is what we used to do for reedsolo for years, but we kept seeing issues pop up from users who had Cython installed but no C compiler (especially on Windows). This is why we had to switch to explicit optional cythonization (ie, the user can trigger cythonization manually with a commandline flag).
  • for explicit optional cythonization to work, I need a way to pass a flag to my setup.py before the build process. Before, --install-option could be used with pip, but this has been deprecated in issue #11358. Clearly, the recommended workaround (not solution) is to pass an environment variable. However, I don’t like relying on the OS: just as we don’t provide all the build metadata as environment variables but rather inside a pyproject.toml file, I would prefer to use a standard mechanism to propagate a custom flag from the user to my build process.
  • It seems there is an attempt at standardizing extensions packaging with extensionlib, but the project appears to be lacking updates at the moment, and it’s not clear how it can be used for my purpose or even whether it can allow explicit optional cythonization.

I ended up doing the following two things to make it work:

  • pass the --cythonize flag through --config-setting. Since documentation was lacking, I had a very hard time understanding how to pass the flag to my setup.py build process. I ended up using the following: pip install --upgrade reedsolo --config-setting="--build-option=--cythonize" --use-pep517 --verbose.
  • I dropped my attempts to make cython an extra/optional dependency, because when I did so, setuptools would not do its magic to make the Cython module available in setup.py, so it resulted in an ImportError. I ended up making cython a required dependency in pyproject.toml, but I can live with that. Before, I wanted to make it an optional dependency to avoid overwriting user’s cython (I am using a beta release), but now with isolated builds this is a non issue. I will regardless open an issue (/EDIT: see issue #3880 on setuptools) because I guess it may bite others too.
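
For anyone wanting to do the same, here is a minimal sketch of how a setup.py might consume such a flag. It assumes the backend forwards the --build-option value into sys.argv of the setup.py invocation; the helper name pop_flag is purely illustrative and not part of setuptools.

```python
import sys


def pop_flag(argv, flag="--cythonize"):
    """Return (found, remaining_args) without mutating the input list.

    `pop_flag` is a hypothetical helper, not a setuptools API.
    """
    remaining = [arg for arg in argv if arg != flag]
    return len(remaining) != len(argv), remaining


# In a real setup.py this would run before setup() is called, and the
# cleaned list would be assigned back to sys.argv so that the custom
# option never reaches the command-line parser used by setup().
use_cython, cleaned_argv = pop_flag(sys.argv)
```

The extension list handed to setup() can then depend on use_cython, so that the Cython import only happens when the flag was given.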

Is this correct? I mean, am I propagating the flag properly, via --config-setting="--build-option=--cythonize" , won’t this get deprecated?

Thank you very much in advance for your insights!

One approach I’ve seen people doing is using environment variables instead of or alternatively to command-line options, e.g.:

import os

if os.environ.get("REEDSOLO_CYTHONIZE", "0") == "1":
    ...  # enable the cythonized extension here

or alike.

It has the advantage that it is passed through without any magic (though I’m not 100% sure if this was intentional).
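
A slightly more forgiving variant of that check, as a rough sketch: the set of accepted spellings and the helper name env_flag are my own assumptions, only the REEDSOLO_CYTHONIZE variable name comes from the snippet above.

```python
import os

# Spellings treated as "enabled"; this set is an assumption, adjust as needed.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, environ=None):
    """Return True if the environment variable `name` is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() in _TRUTHY


# Evaluated once at build time, e.g. near the top of setup.py.
use_cython = env_flag("REEDSOLO_CYTHONIZE")
```

Passing the mapping explicitly keeps the helper easy to test without touching the real environment.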

AFAIK, there isn’t a way to specify optional build dependencies. Specifying Cython as a build dependency even when you don’t need it is inevitable. Fortunately, on most platforms, the Cython binary wheel can be installed, resulting in a fairly low setup cost.

For building Python wheels with native language components, I think the most interesting build backend to use is meson-python. Full disclosure: I’m one of the meson-python maintainers.

Having an optional Cython module included in the package is fairly straightforward. Specify the meson-python build backend in your pyproject.toml:

[build-system]
build-backend = 'mesonpy'
requires = [
  'meson-python >= 0.12.0',
  'meson >= 1.0.0',
  'cython >= 3.0.0a11',
]

Define a Meson project option to enable or disable compiling the Cython module in meson_options.txt (this will be meson.options if you can depend on Meson version 1.1 or later):

option('cython', type: 'feature', value: 'auto')

Add the build definition in meson.build:

project('optional-cython', version: '0.0.1')

py = import('python').find_installation()

cython = add_languages('c', 'cython', required: get_option('cython'))

if cython
    py.extension_module(
        '_optional',
        'src/_optional.pyx',
        install: true,
        subdir: 'optional_cython'
    )
endif

py.install_sources(
    'src/__init__.py',
    subdir: 'optional_cython',
    pure: not cython
)

That’s all that is needed. If the package is built on a host where a C compiler is found, the Cython module is compiled; otherwise it is not. Note that Cython is installed as part of the build dependencies, and thus is always found. To force using Cython, compile with python -m build -w -Csetup-args=-Dcython=enabled. This will result in an error if a C compiler is not found. To disable the Cython module, compile with python -m build -w -Csetup-args=-Dcython=disabled.

The only tricky part is passing the pure keyword argument to py.install_sources(). If you don’t do that, __init__.py gets installed by default into purelib/optional_cython/ even when the package contains an extension module installed in platlib/optional_cython/. On most platforms purelib and platlib are the same installation directory, so things work as expected; however, having the same package in both purelib and platlib is undefined behavior and better avoided. Hence the pure: not cython argument: if the package contains an extension module, all the components of the package are installed in platlib.

I’ve seen another project (PyOpenGL) manage this by releasing a separate binary-only package (pre-compiled into wheels for most platforms, but also available as an sdist) for the C acceleration. This package could be depended on either directly or via an extra. The pure-Python package would attempt to import it, and otherwise fall back to the Python implementation.
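
The import-with-fallback part can be sketched roughly as follows. This is not PyOpenGL’s actual code; import_first is a hypothetical helper, and the module names in the usage note are only the ones mentioned in this thread.

```python
import importlib


def import_first(*module_names):
    """Return the first module in `module_names` that imports successfully.

    Hypothetical helper: tries the accelerated module first, then the
    pure-Python one, and raises ImportError only if every candidate fails.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    )
```

For example, codec = import_first("creedsolo", "reedsolo") would pick the compiled module when the accelerated package is installed and the pure-Python one otherwise, assuming both expose a compatible interface.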

AFAIK, I thought Setuptools was the only major standalone backend using setup.py, and the others either don’t support extension modules directly or have their own (often declarative) config to do so?

Just to be clear (I believe it was mentioned on the issue as well), but that doesn’t work because extras are for optional runtime dependencies, not build-time dependencies (like Cython, for your case).

Given the vast majority of your users should hopefully be obtaining your package via your pre-built wheels (or downstream redistributors like Linux distros using their own build systems), rather than building from source via the sdist, and most of the additional overhead seems likely to come from the C build itself rather than installing one additional package (cython) into the isolated build env, just adding cython to the build dependencies, and clearly documenting what users should pass to config-settings to enable it, seems to be a reasonable approach at present.

Thank you all so much for your suggestions and insights! This is tremendously helpful to me, now I have got a lot of possibilities where I thought I had little to none! :smiley:

@mgorny Thank you for your suggestion! I actually implemented both approaches now, because environment variables are necessary to pass build-time variables for cibuildwheel to propagate to my project’s setup.py (and hence build the cythonized extension). I have no issue using environment variables for development tools, but I think it’s not a clean design choice to rely on an external factor, the OS, for the end user targeted build process.

@daniele This looks like a very clean solution, thank you very much for taking the time to write an extensive example, I will have a close look! One question: since I essentially replace setup.py by meson.build to handle the cythonization and C extension building, does it mean that I can theoretically remove setup.py entirely? (and just rely on pyproject.toml + meson.build)

@EpikWink Separating the C extension into a distinct binary-only package is something I considered, but… I did not think about using extras/optional-dependencies as a way to offer the C-accelerated module, that’s clever! In my case I don’t want to overload the pure python implementation with the C extension because they have slightly different scopes (the python implementation is a bit broader), but I could probably offer both with namespace packages! Thank you for your suggestion, I will have a look!

@CAM-Gerlach

AFAIK, I thought Setuptools was the only major standalone backend using setup.py , and the others either don’t support extension modules directly or have their own (often declarative) config to do so?

Sorry for the confusion, I meant that currently there is no standardized way to support extensions, so either we need to keep using setup.py with setuptools, or for other backends we have to design custom classes (for the few that support extensions), there is currently no way to my knowledge with any backend to define extensions in a purely declarative way (I know this would limit functionalities, but it would still fit the bill for 80% of use cases, per Pareto rule we could leave more advanced custom class definitions for the 20% remaining cases).

Just to be clear (I believe it was mentioned on the issue as well), but that doesn’t work because extras are for optional runtime dependencies, not build-time dependencies (like Cython, for your case).

Yes, I thought that extras_require worked to import Cython optionally, but it was actually a side effect of a lack of build isolation! I go into more detail in the issue here, but yeah, it’s a great example of why moving to PEP 517 standards is the way to go, even if it takes some time to adjust, especially when migrating old projects that are not pure python :sweat_smile:

Thank you for confirming that using config-settings is OK at the moment, I understand it can change in the future, but this may be good enough for now :slight_smile:

I would like to clarify that Cython used to be an optional dependency in the past because reedsolo is meant to primarily be a 0 dependency pure python implementation, as its primary goal is to ensure long term archival and reproducibility. It’s meant to be used as the core routine to protect and restore data, and a lot of other similar libraries got unusable under a decade or less because of their dependencies being out of date, so a major goal here is to ensure this won’t happen, even if the pure python implementation is slow, it has no dependency and (should) work the same across Python releases since Python 2.7 up to Python 3.11 and across OSes.

I understand my need is a bit extreme, and this module was first published in 2015, the python packaging landscape was a lot different then, pip was nowhere near as functional, nowadays post PEP 517 I guess it’s OK to require Cython even if it’s not necessarily used IMHO because it’s only at build-time and under build isolation, so it won’t break things at run-time anyway nor wreak havoc in the user’s installed packages.

You’re welcome. setup.py is ignored by meson-python and by Meson, so you can remove it from your package. Note that SciPy, NumPy and many other packages with complex builds have moved or are moving to meson-python. It is a proven solution that is planned to be supported for quite some time. I would not hesitate to port your package to it if it makes maintenance easier.

In fact, it is possible to specify optional build dependencies. The PEP 517 get_requires_for_build_wheel hook and an in-tree build backend will be helpful. Here is an example:

@lrq3000, you can have a look at these docs if you use setuptools: Build System Support - setuptools 69.0.3.post20231214 documentation

Please note however that you have to be careful not to fail when importing cython while building sdists (the get_requires_for_build_wheel hook only runs for wheel builds, not for sdist builds).

If you go down that route, you will also have access to config_settings to decide if you want to add cython.
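
As a rough sketch of what such an in-tree backend might look like: the wants_cython helper is hypothetical, and it assumes the flag arrives under the "--build-option" key of config_settings, as in the pip command quoted earlier in the thread. Only get_requires_for_build_wheel and setuptools.build_meta are real names here.

```python
def wants_cython(config_settings):
    """True if config_settings carries --build-option=--cythonize.

    Hypothetical helper. Frontends may pass a single value as a str and
    repeated values as a list, so both shapes are handled.
    """
    options = (config_settings or {}).get("--build-option", [])
    if isinstance(options, str):
        options = [options]
    return "--cythonize" in options


def get_requires_for_build_wheel(config_settings=None):
    # Imported lazily so the helper above can be exercised without setuptools.
    # A real backend module would also re-export the remaining hooks, e.g.
    # with `from setuptools.build_meta import *` at the top of the file.
    from setuptools import build_meta as _orig

    requires = list(_orig.get_requires_for_build_wheel(config_settings))
    if wants_cython(config_settings):
        requires.append("cython")
    return requires
```

The module would then be referenced from pyproject.toml through build-backend together with backend-path, so that the frontend can find it inside the source tree.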

Sure. I was thinking “there isn’t a standardized way to declare optional build dependencies via pyproject.toml settings” but the qualification clause got dropped between the thought and the keyboard. Indeed, if you write your own build back-end, many more things become possible. Thank you for pointing out the oversimplified statement!