Need to run [Make] when building wheel from sdist or installing the latter, how?

rmn · September 19, 2024, 10:18am

Hi all,

I have learned some very rudimentary Python packaging best practices and methods, reading the Python Packaging Guide and the documentation of setuptools. I have a fairly typical Python package I want to distribute (principally to PyPI) and am using the so-called “src” packaging layout. I have a pyproject.toml which specifies setuptools as the build backend. Packaging seems to work, except that I need to process a Python template file (valid Python module code but not intended for immediate importing), and that processing is driven by a Makefile. Obviously, just running python -m build doesn’t get me any such processing.

How can I accomplish running e.g. make when the wheel is built from the sdist? The sdist is otherwise perfectly valid, as it includes the template that needs processing (automatically since it’s a Python module under the “src” directory, after all) and I have also got the sdist to include the Makefile, through a MANIFEST.in. So while the sdist is now sound at least in terms of information it contains that is in principle sufficient to process the template, I need to somehow have the latter be triggered during building of the wheel. Am I thinking right, even?

I imagine make is run when the wheel is built and the result of the template processing, a Python module (which make builds beside the template in the “src” directory) is included in the wheel. There’s also the template processing script that is included in the sdist (as it should) but which I naturally want to exclude from the wheel.

I have spent considerable time with the aforementioned documents, but I find them to be too verbose and too vague at the same time, and missing what I seem to be after (if my assumptions are even remotely valid, that is), so I have come up empty so far, except for one glimmer of hope I hold – to just use setup.py and “do things myself” (keeping the “pyproject.toml”, though).

sinoroc · September 19, 2024, 5:06pm

Are you sure you want to run this make step when building the wheel? If possible I recommend doing this step when building the sdist.

In any case, what you need is adding a “custom build step” to setuptools. This requires a setup.py file and it is perfectly fine to do and compatible with the modern packaging practices (including pyproject.toml). The documentation for this is in the “Customizing Commands” section of the “Extending or Customizing Setuptools” page. I agree that this documentation is hard to read and apply.

I guess something like this in setup.py (completely untested, consider it “pseudo code”):

import subprocess

import setuptools
import setuptools.command.sdist  # the submodule has to be imported explicitly

class MySdist(setuptools.command.sdist.sdist):

    def run(self):
        subprocess.check_call(["make", "something"])
        super().run()


setuptools.setup(
    entry_points={
        "distutils.commands": [
            "sdist = MySdist",
        ],
    },    
    # Everything else that can be placed in `pyproject.toml`
    # should be in `pyproject.toml`
)

rmn · September 19, 2024, 6:12pm

Thank you for your informative answer.

First things first: you asked if I am sure I want to run the make step when building the wheel, specifically – the answer is yes, I am fairly sure I do want exactly that, the sole reason being that in my understanding the sdist should contain the original source code, if possible, and in my case it is the template and associated instantiation script(s) that are, in fact, the original source code (I wrote them, after all), while the corresponding Python module that is compiled by instantiating the template is naturally more of a build artefact. I want parties obtaining the sdist to have access to the same source code I have access to, directly (i.e. not through third party like Github which would be storing the template file and instantiation scripts that the sdist otherwise wouldn’t), resting assured they can build the wheel from the sdist, following the same procedure I am now trying to add to the wheel building process. Does that answer your question in a satisfactory manner?

Your code snippet looks promising indeed, and I have already taken a look at something very similar: python - How to exclude a single file from package with setuptools and setup.py - Stack Overflow

By similar I do mean similar – they aren’t solving the same problem, admittedly, but employing a similar or even the same setuptools extension API.

I don’t want to digress, but I would like to understand the following:

Do I use the entry_points or the cmdclass parameter to setuptools.setup? I found a brief explanation of the difference, but I must admit it doesn’t explain much to me personally.
In your code snippet specifically, do you not program for the make step to run before (or during, if you will) the sdist is built, effectively? I am asking in light of my wanting to run make when sdist is used as input for building the wheel.

Anyway, again, thank you – a lot of useful information to me in your answer alone.

sinoroc · September 19, 2024, 7:41pm

I do not recall seeing clear statements along those lines. But I guess it is a valid interpretation.

My interpretation is that the sdist MUST be portable (multi-platform, and so on). If a build step is necessary and the output file of this build step is not portable, then I would indeed place the source file in the sdist and have the build step between sdist and wheel. Obviously. So far, so good. I assume we are on the same page. Now, on the other hand, if the output file of the build step is portable, then I do not see any reason not to have the build step between source tree and sdist, and place the built output file in the sdist. This way users never have to deal with the build step even when installing from sdist. Small gain, I agree, since almost all installations will happen from the wheel, but I do not see the point of not getting the build step out of the way. Of course, my interpretation might be wrong, no reason it should be better than yours.

Setuptools documentation (that I linked in previous post) does not mention cmdclass. My guess is that it is outdated and deprecated. But it does mention entry_points, so that is what I would go for.

Yes.

I guess for the make step to happen between sdist and wheel, you would need to overload the build_py instead of the sdist command.

JamesParrott · September 19, 2024, 9:01pm

I haven’t looked into Make integration. But for CMake integration I found that the established, very powerful and sophisticated tools were far, far harder to use than home-brewing a simple script runner using a hook in Hatchling, which was super easy following Ofek’s suggestion. It should work for anything that needs to be run from the command line.

I’ve used this on a couple of projects now and it works great. The minimal example is something like:

pyproject.toml

[build-system]
requires = ["hatchling", "hatch-requirements-txt"]
build-backend = "hatchling.build"

# ...
# Normal project metadata
# ...

[tool.hatch.build.hooks.custom]
path = "hatch_build.py"

hatch_build.py


import subprocess

# https://discuss.python.org/t/custom-build-steps-moving-bokeh-off-setup-py/16128/3
from hatchling.builders.hooks.plugin.interface import BuildHookInterface


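class CustomHook(BuildHookInterface):
    def initialize(self, version, build_data):
        # Placeholder command: substitute the project's real build command here.
        command = ["make"]
        subprocess.run(command, check=True)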

I found copying in the env was a good idea for what we needed. And unlike on Windows, building on Linux needed shell=True. But I was calling subprocess.run with carefully crafted build commands, hardcoded in string literals, not arbitrary user input.
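The two points above (passing a copy of the environment, and only ever running hardcoded commands) might look like this in isolation. This is a minimal sketch, not the code from the linked project: `run_build_command`, `extra_env` and `SKETCH_BUILD_MODE` are hypothetical names. It also assumes the command can be given as an argument list, in which case an ordinary executable should not need `shell=True` on either platform.

```python
import os
import subprocess
import sys


def run_build_command(command, extra_env=None):
    """Run a hardcoded build command with a copy of the current environment."""
    env = os.environ.copy()  # a copy, so the parent process stays untouched
    env.update(extra_env or {})
    # Passing an argument list (not a string) avoids the need for a shell.
    return subprocess.run(command, env=env, check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    result = run_build_command(
        [sys.executable, "-c",
         "import os; print(os.environ['SKETCH_BUILD_MODE'])"],
        extra_env={"SKETCH_BUILD_MODE": "release"},
    )
    print(result.stdout.strip())
```

Inside a hook, the same helper could be called from `initialize` with whatever command the project actually needs.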

github.com

fiftysevendegreesofrad/sdna_plus/blob/Cross_platform/hatch_build.py

import os
import sys
import sysconfig
import subprocess
import pathlib
import dataclasses
import shutil

# https://discuss.python.org/t/custom-build-steps-moving-bokeh-off-setup-py/16128/3
from hatchling.builders.hooks.plugin.interface import BuildHookInterface


REPO_DIR = pathlib.Path(__file__).parent

@dataclasses.dataclass
class Config:
    build_dir: str
    generator: str
    use_zig: str
    shell: bool = False  # Needs to be True on Linux, but adds security on Windows.

This file has been truncated.

rmn · September 19, 2024, 11:00pm

I have been trying to get two things working. The first was to run make when building the wheel from the sdist, and I have at least found a way to accomplish that, although I can’t say I am satisfied with how it has to be done, relying now on a very imperative setup.py-based procedure:

from setuptools import setup
from setuptools.command.build_py import build_py
import subprocess

class Foo(build_py):
    def run(self, *args, **kwargs):
        subprocess.check_call(('make',))
        return super().run(*args, **kwargs)

setup(cmdclass = { 'build_py': Foo })

Second, even though I am now running make, I also now need to exclude the Python template and the template instantiation script (both being Python files which I have under the package sources directory, both present in the sdist) from being packaged with the wheel. This second thing has so far proved to be far more difficult than the first.

Together these problems are giving me so much grief I am not sure if I shouldn’t abandon the approach altogether – that I should just instruct people [who want to build the distribution packages] to run make first, then python -m build, just to get rid of the problems I have created with my insistence. Python does like to impose its own order of things, and once you stray off the beaten path it’s all thorn and bush, and I think I am learning this the hard way.

If anyone has good ideas, I’m still good for trying, but I need to exclude the template and the associated instantiation script if I stick to my original approach, and if I don’t find a way to accomplish that, my only recourse is to go for the described alternative, instead. What I tried (imagine it merged with the snippet above), unsuccessfully:

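from fnmatch import fnmatchcase
from setuptools import find_packages, setup
from setuptools.command.build_py import build_py

excluded = ('*/*-template.py', '*/template-instantiate.py')

class Foo(build_py):
    def find_package_modules(self, *args, **kwargs):
        # Keep only the (package, module, path) tuples whose path matches no excluded pattern
        return [item for item in super().find_package_modules(*args, **kwargs) if not any(fnmatchcase(item[2], pattern) for pattern in excluded)]

setup(cmdclass={ 'build_py': Foo }, packages=find_packages(where='src'), package_dir={ '': 'src' })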

With the above, although the find_package_modules procedure does get called, and indeed the list of modules it returns does not include the two Python modules I want excluded, the wheel still ends up including them, and I can’t for the life of me figure out why that is and what I am doing wrong.
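The filtering expression can be checked on its own, outside setuptools. The sketch below assumes the `(package, module, path)` tuple shape that `find_package_modules` returns; the sample tuples and the `filter_modules` name are made up for illustration. It only confirms that the patterns match as intended. It does not explain why the wheel still picks the files up.

```python
from fnmatch import fnmatchcase

# Patterns copied from the snippet above.
EXCLUDED = ("*/*-template.py", "*/template-instantiate.py")


def filter_modules(modules, patterns=EXCLUDED):
    """Drop (package, module, path) tuples whose path matches any pattern."""
    return [
        item for item in modules
        if not any(fnmatchcase(item[2], pattern) for pattern in patterns)
    ]


# Hypothetical input in the shape returned by build_py.find_package_modules.
modules = [
    ("foo", "__init__", "src/foo/__init__.py"),
    ("foo", "bar", "src/foo/bar.py"),
    ("foo", "bar-template", "src/foo/bar-template.py"),
    ("foo", "template-instantiate", "src/foo/template-instantiate.py"),
]
kept = filter_modules(modules)
print([path for _, _, path in kept])
```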

sinoroc · September 20, 2024, 6:10am

If you were providing us with a “minimal reproducible example” it would be easier to help you.

This is the reference I had written for myself for all “packaging data” related things: https://sinoroc.gitlab.io/kb/python/package_data.html. I had written it a while ago, so maybe it needs a bit of tweaking to follow modern approaches, but in principle it should still be valid. In short if you have something in the sdist that should not be in the wheel (and thus should not be installed), use exclude_package_data.

rmn · September 20, 2024, 9:20am

Unfortunately in my case the files are not package data but Python code, which gets special treatment from setuptools. A minimal reproducible example is below (the “.” prefix in paths denotes the directory I invoke python -m build in, i.e. the so-called build root directory):

# Contents of "./src/foo/bar-template.py" below

# Some Python code...

# 0140a7b3-0124-4911-b56e-12a69937a77b; This entire line, tagged with the preceding UUID, is replaced by a body of auto-generated code during template instantiate phase, in the input stream, to form the output stream

# Some more Python code....

# Contents of "./src/foo/instantiate-bar-template.py" below

import re
import sys

for line in open(sys.argv[1]):
    if re.match(r'^# 0140a7b3-0124-4911-b56e-12a69937a77b(?:;|$)', line):  # `;` or end of line may follow the UUID
        sys.stdout.write(autogenerated_code()) # definition of `autogenerated_code` not provided for simplicity, rest assured it's just a procedure that returns Python source code (text)
    else:
        sys.stdout.write(line)

# Contents of "./Makefile" below

src/foo/bar.py: src/foo/instantiate-bar-template.py

src/foo/bar.py: %.py: %-template.py
	python $(lastword $^) $< > $@

# Contents of "./src/baz.py" below

# Some Python code

# Contents of "./pyproject.toml" below
[build-system]
requires = [ "setuptools", "setuptools-scm" ]
build-backend = "setuptools.build_meta"

[project]
name = "foo"
description = "Foo"
dynamic = [ "version" ]
requires-python = ">=3.11"

[tool.setuptools_scm]
# Enable `setuptools_scm` build plugin, to automatically derive information from the Git repository.

# Contents of "./MANIFEST.in" below
exclude src/foo/bar.py

I also have the setup.py file as per my earlier post.

The ./src/foo/bar-template.py and ./src/foo/instantiate-bar-template.py should be left out of the wheel, since they’re not really part of the package I am distributing, just part of build tooling in essence, means to produce ./src/foo/bar.py (which is part of the package) in the wheel. I could make this particular problem easier by holding the two former files in ./templates or some other directory that is not part of ./src instead, but that has no effect on the other problem. I do acknowledge that holding these in ./src may be inappropriate in context of what the build framework expects be contained in ./src.
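For readers who want to try the instantiation step without the Makefile, it can be sketched as a small generator. The marker is the UUID from the example. The `instantiate` name, the sample template lines and the generated text are stand-ins, since `autogenerated_code` is not shown in the thread. The pattern accepts either `;` or end of line after the UUID, which is an assumption about the intent.

```python
import re

# Marker UUID taken from the example above.
MARKER = re.compile(r"^# 0140a7b3-0124-4911-b56e-12a69937a77b(?:;|$)")


def instantiate(template_lines, generated):
    """Yield template lines, replacing the tagged line with generated code."""
    for line in template_lines:
        if MARKER.match(line):
            yield generated
        else:
            yield line


# Stand-in template; the real one would be read from bar-template.py.
template = [
    "x = 1\n",
    "# 0140a7b3-0124-4911-b56e-12a69937a77b; replaced during instantiation\n",
    "y = 2\n",
]
output = "".join(instantiate(template, "generated = True\n"))
print(output)
```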

Nodd · September 20, 2024, 11:15am

Another option is to delete the unwanted files during your build process. Not particularly clean, but it should work.
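One way to read this suggestion, as a rough sketch only: after the files have been copied to a staging directory, remove the ones that match a pattern. The `prune_build_dir` helper and the directory layout are hypothetical, and where exactly such a step would hook into the build is left open here.

```python
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def prune_build_dir(build_dir, patterns):
    """Delete files under build_dir whose name matches any pattern."""
    removed = []
    # sorted() materialises the listing before anything is deleted.
    for path in sorted(Path(build_dir).rglob("*")):
        if path.is_file() and any(fnmatchcase(path.name, p) for p in patterns):
            path.unlink()
            removed.append(path.relative_to(build_dir).as_posix())
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, "foo")
        pkg.mkdir()
        for name in ("bar.py", "bar-template.py", "instantiate-bar-template.py"):
            (pkg / name).write_text("# placeholder\n")
        print(prune_build_dir(tmp, ["*-template.py"]))
        print(sorted(p.name for p in pkg.iterdir()))
```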

rmn · September 20, 2024, 6:28pm

Currently, I am concluding that my “cleanest” option – while not the tersest – is to use an in-tree build sub-command implementation like instantiate_template (a custom plugin entry-point), while also moving all of the Python code that isn’t source proper out of the “src” directory. The former will get me a rather idiomatic custom [wheel] build step which I can program to instantiate the templates, while the latter will ensure the tooling code doesn’t get distributed as part of the package code. If all of it works, I can still get my holy grail of containing the entire building process from source code to wheel, in the sdist file.

rmn · September 21, 2024, 5:44pm

I ended up adding my own setuptools/distutils build sub-command through setup.py, one that encapsulates running of make which in turn encapsulates template instantiation (or macro processing as I’ve come to refer to it). That alone solves one of the problems I set out to solve, where I wanted building of the wheel to process the macros from sources in the sdist. The setuptools API even takes automatic care of bundling the macro processing artefacts together with the rest of the code in src. By moving of macro processing tooling and sources (aka templates) out of src, I no longer have to worry about removing things from the wheel, and with both of the problems solved, I have arrived at exactly what I wanted.

I have had to trawl wide and deep for relevant documentation, which seems to be sorely lacking for setuptools, all the more so now that distutils has been removed from the standard library in Python 3.12. Even with the information that is available at http://setuptools.pypa.io, I remain convinced this was way too hard for something of this scope and value, and I am fairly certain other people will have to repeat my story before hopefully arriving at something usable. So for posterity at least, my setup.py:

import setuptools.command.build
from setuptools import Command, setup

import os
import os.path
import subprocess

class MakeCommand(Command):
    """Class of `setuptools`/`distutils` commands which invoke a `make` program.

    GNU Make (http://www.gnu.org/software/make) is currently assumed for providing `make`. The program is invoked in a manner where it changes the working directory to a build directory advertised for the command (utilizing `self.set_undefined_options` as hinted at by the [documentation](http://setuptools.pypa.io/en/latest/userguide/extension.html) which defers to `help(setuptools.command.build.SubCommand)`).
    The command is expected to produce build artefacts which will be added to the wheel.
    """
    build_lib: str | None
    def finalize_options(self) -> None:
        self.set_undefined_options('build_py', ('build_lib', 'build_lib'))
    def initialize_options(self) -> None:
        self.build_lib = None
    def run(self, *args, **kwargs) -> None:
        os.makedirs(self.build_lib, exist_ok=True)
        subprocess.check_call(('make', '-C', self.build_lib, '-f', os.path.realpath('Makefile')))

class BuildCommand(setuptools.command.build.build):
    # Make `build_make` a sub-command of the `build` command, so that the former
    # is invoked whenever the latter is (which in turn happens when the wheel
    # must be built, through the `bdist_wheel` command).
    sub_commands = [ ('build_make', None) ] + setuptools.command.build.build.sub_commands

setup(cmdclass={ 'build': BuildCommand, 'build_make': MakeCommand })

My Makefile isn’t necessarily relevant for posterity, as the build_make command simply invokes make in the directory (the -C switch) setuptools uses for staging the wheel build, so I am omitting my most recent copy. The same goes for my macro expansion logic: it’s encapsulated by make and may not be relevant to anyone else.

The rest of the project (pyproject.toml, MANIFEST.in) is unchanged from earlier.

sinoroc · September 22, 2024, 1:18pm

I have been looking at custom commands for setuptools for years now (rarely for my personal needs, mostly to help others here or on StackOverflow, so I never really did dig very deep), and I must say that finding good information has proven quite difficult. I never managed to find one resource satisfying to work with.

There are bits and pieces here and there. It is almost never clear if the info is up to date (or refers to old distutils code for example). It is often necessary to look deep into the setuptools code itself. Sometimes looking at other projects that do implement custom commands can help, but again it is never clear if these projects follow the current best practices or if as anyone else they struggled writing something that is just good enough for their needs and then left it at that.

For example, since we talked about this in this thread, the cmdclass which is an essential part of this topic, is still quite a mystery, and of course I gave you incorrect info about this in one of my earlier posts. It is documented for old distutils but not for setuptools.

It is somewhat better than it used to be, though.

Anyway, it is fair enough to look at other build back-ends.

rmn · September 26, 2024, 11:27am

It’s quite the coincidence: I came back here specifically to refer you, @sinoroc, to a thread on the entry_points v. cmdclass question, which I started on the Discussions page of the setuptools GitHub repository: What is the difference between the `entry_points` and the `cmdclass` parameters to `setuptools.setup`? · pypa/setuptools · Discussion #4656 · GitHub

Thing is, I share the sentiment you expressed above, that it’s not clear which of the two should fulfill which purpose. I couldn’t get entry_points to work for my use case, which was adding a custom build step, and I ended up writing it using cmdclass, but I no longer recall why I couldn’t adapt your approach, unfortunately.

The general problem with setuptools documentation is quite common, I am afraid: the shortest way to the goal is to document something in prose covering most bases. The problem then has its roots in the quality of the source code and the documentation therein (the docstrings). If the code isn’t easy to read, one is effectively bound by the prose documentation. In my case that documentation definitely wasn’t sufficient, and when I tried to read the code I quickly gave up. It doesn’t make matters better that setuptools still uses distutils under the hood, which is quite old, if solid, code, and which has been removed from the Python standard library, with its documentation gone as of Python 3.12 (https://docs.python.org/3.12/library/distutils.html).

sinoroc · September 26, 2024, 5:47pm

Yes, I had looked it up further in the meantime. I did not post here because you are already using cmdclass anyway, which is the correct choice for your use case.

For entry points to work, they need to be “installed”. Entry points are meant for setuptools plugins like setuptools-scm, for example. You would list the setuptools plugin in the build system requirements, so that its entry points are installed in the build environment and discoverable by setuptools.

But for a custom build command or custom build step to be effective for the setup.py of your own project (from inside the setup.py), then cmdclass is definitely the way to go.