Custom build steps / Moving Bokeh off setup.py

Hi, I am interested in moving Bokeh away from setup.py [1] in the near-ish future. However, Bokeh is a cross-language project with compiled TypeScript components that need to be identically included in all published packages (wheel, sdist, conda). Our current build automation does this to build the packages before publishing:

def build_sdist_packages(config: Config, system: System) -> ActionReturn:
    try:
        system.run("python setup.py sdist --install-js --formats=gztar")
        return PASSED("sdist package build succeeded")
    except RuntimeError as e:
        return FAILED("sdist package build did NOT succeed", details=e.args)

def build_wheel_packages(config: Config, system: System) -> ActionReturn:
    try:
        system.run("python setup.py bdist_wheel --install-js")
        return PASSED("wheel package build succeeded")
    except RuntimeError as e:
        return FAILED("wheel package build did NOT succeed", details=e.args)

The question comes down to that --install-js option that we pass in. That option is currently handled by code in our setup.py and what it does is copy an existing, built BokehJS into the Python source tree for inclusion in the package. Without that option, BokehJS gets built from scratch [2] every time. This is undesirable from a package automation standpoint:

  • It is somewhat time-consuming to build BokehJS, so it’s preferable to do it only once, rather than once for every package type build.
  • It is crucial that every package type (wheel, sdist, conda) has the exact same BokehJS files (i.e. with identical hashes). While the risk of somehow getting slightly different TS build outputs from subsequent BokehJS builds is very small [3], any risk at all here is unacceptable. We simply must use a single source of truth for BokehJS across all packages.
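To make the "identical hashes" requirement concrete, here is a minimal sketch of a check that could run after the packages are built. It hashes every file under a given path marker inside a wheel and an sdist so the two sets can be compared. The function names and the marker value are illustrative assumptions, not part of Bokeh's tooling, and conda packages are not covered.

```python
import hashlib
import tarfile
import zipfile
from typing import Dict, Optional


def _key(name: str, marker: str) -> Optional[str]:
    """Return the member name from `marker` onwards, or None if the marker is absent."""
    idx = name.find(marker)
    return name[idx:] if idx != -1 else None


def wheel_hashes(wheel_path: str, marker: str) -> Dict[str, str]:
    """Map each file under `marker` in a wheel (a zip archive) to its SHA-256 digest."""
    hashes = {}
    with zipfile.ZipFile(wheel_path) as zf:
        for info in zf.infolist():
            key = _key(info.filename, marker)
            if key is not None and not info.is_dir():
                hashes[key] = hashlib.sha256(zf.read(info)).hexdigest()
    return hashes


def sdist_hashes(sdist_path: str, marker: str) -> Dict[str, str]:
    """Map each file under `marker` in a .tar.gz sdist to its SHA-256 digest."""
    hashes = {}
    with tarfile.open(sdist_path, "r:gz") as tf:
        for member in tf.getmembers():
            key = _key(member.name, marker)
            if key is not None and member.isfile():
                hashes[key] = hashlib.sha256(tf.extractfile(member).read()).hexdigest()
    return hashes
```

A release script could then require `wheel_hashes(whl, marker) == sdist_hashes(sdist, marker)` before publishing, where `marker` is whatever package-relative directory holds the bundled JS.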

So what are our options here? Looking at build it does not seem sufficient to support a customization like this. Are there other tools that support defining custom steps as part of a build, or have extension APIs that we can leverage? [4]

TL;DR: How can we support custom build steps in two cases:

  • A default build should build BokehJS from scratch and move it into the Python package.
  • An “install-js” build should move a pre-built BokehJS into the Python package.

  1. Why you shouldn't invoke setup.py directly ↩︎

  2. Essentially: cd bokehjs; node make ↩︎

  3. Maybe some datetime-dependent codegen is erroneously introduced somewhere, etc ↩︎

  4. Certainly we could “shell-script” our way out of this but I would much prefer to stick to community standard commands and tools to the extent possible. ↩︎

The “build backend interface” (PEP 517) offers a “config options” argument that tools can use to pass build configuration information like this to the backend. It was intended to cover this type of custom flag, but I’m pretty sure the setuptools build backend API doesn’t use it like this (yet?)

If you want to use standards-based tools to replace setup.py, then config_options would be the way to go - both pip and build have a UI to pass such settings to the backend. But you’ll need help from the setuptools project to implement the backend side of such custom flags. Until there’s something in place for that, I don’t think you can move off invoking setup.py.

Although thinking further, I guess you could change your setup.py so that, as well as (or instead of) accepting an --install-js command line flag, you checked for an environment variable INSTALL_JS. That wouldn’t need the build API to be involved - you could just set the environment variable and invoke build. Would that be an option for you?
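A minimal sketch of that idea, assuming the variable is called INSTALL_JS as above. The helper name `env_flag` and the set of accepted "truthy" spellings are illustrative choices.

```python
import os


def env_flag(name, environ=None):
    """Return True when the named environment variable holds a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


# In setup.py, this value could stand in for the old --install-js flag.
INSTALL_JS = env_flag("INSTALL_JS")
```

The release automation would then run something like `INSTALL_JS=1 python -m build`, and the existing setup.py logic would branch on `INSTALL_JS` instead of inspecting the command line.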

Project Jupyter had a similar need and just switched to Hatch.

You could configure a custom build hook by making a file named by default hatch_build.py:

from hatchling.builders.hooks.plugin.interface import BuildHookInterface

class CustomHook(BuildHookInterface):
    def initialize(self, version, build_data):
        if self.target_name == 'wheel':
            ...
        elif self.target_name == 'sdist':
            ...

then in pyproject.toml put:

[tool.hatch.build.hooks.custom]

or to be explicit:

[tool.hatch.build.targets.wheel.hooks.custom]
[tool.hatch.build.targets.sdist.hooks.custom]

edit: also for:

Hatch creates reproducible sdists

Hi @bryevdv, please note that having a setup.py file is not deprecated per se. You can still use it to customise setuptools commands and build steps… the part that is deprecated is executing it as a script.

You can use setup.cfg sections to pass options to the commands. Maybe you could try that? If that does not work for you, you can also try to use the config-setting in the build command line to pass options…

Hi all, thanks for the replies. Some comments/questions

@abravalheri I am trying to get rid of setup.cfg as well. :) I think at this point everything still in there can go in a pyproject.toml and I very much want to reduce the scatter of configurations to make it easier for future contributors. I will say

please note that having a setup.py file is not deprecated per se.

This is actually confusing messaging to me. I don’t use setup.py for anything other than install, develop, sdist etc. So if I’m not using it for those things in the future, I don’t understand why it would be kept around. I guess that’s the frontend/backend thing, so other tools can call setup.py? But that is also confusing, if setup.py commands are going away, why is a script necessary just to define some metadata. Anyway, I digress.

@ofek Hatch looks interesting and promising, I will definitely take a close look! Thank you for the reference.

@pf_moore I suppose an env var could be an option, maybe the simplest thing in the short term. I will experiment. Regarding the config_options are there any relevant issues or PRs that I can follow?

I’m somewhat confused here. Presumably, setup.py is where you’re defining your custom logic to handle the --install-js option. If you want to continue using setuptools, you’ll still need a setup.py to hold that logic. What’s deprecated is not having a setup.py, but rather running it, as a script.

I don’t know. The setuptools maintainers can probably point you at any documentation that exists for how they handle config_options, and how that ties in with customisations like your --install-js. Or if that’s not yet supported, then maybe they’ll know of any feature requests or PRs to add it. @abravalheri can you help?

To be honest, though, hatch with a custom hook to replace your --install-js code may well be your best approach longer term.

The idea is that you cannot always manage to do everything using only a declarative approach; for a small number of use cases you will need to write some Python code with “build time logic”. I assume that this is also the reason behind custom hooks in hatch.

The setup.py file can still be used for that, nothing changes in that regard.

If you really want to get rid of setup.cfg, there is an experimental feature right now that you can use:

  • The equivalent of [sdist] in setup.cfg would be [tool.distutils.sdist] in pyproject.toml (with the appropriate INI => TOML syntax changes).
  • This is not stable (so far I haven’t received any feedback, and to be sincere the naming is not great), so likely to change in the future.

However, since you are providing your own implementation for the --install-js flag, you are not limited to this form of passing arguments…

For example, you can read the file yourself and be in total control of the situation regardless of the changes in setuptools:

# setup.py
from pathlib import Path
from setuptools import Command

import tomli  # Dependency to be added to `[build-system] requires`

project_dir = Path(__file__).parent

class YourCommand(Command):
    ...
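    def finalize_options(self):
        # Call super().finalize_options() here only when subclassing a concrete
        # command (e.g. sdist); the base Command.finalize_options is abstract.
        if self.install_js is None:
            # Fall back to a project-specific table in pyproject.toml.
            config = tomli.loads((project_dir / "pyproject.toml").read_text("utf-8"))
            self.install_js = config.get("tool", {}).get("yourtool", {}).get("install_js")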
    ...

(disclaimer: untested example, might need some iterations to get it right)

When using build as a frontend, there is a hint on how that can be done in Wheel tags · Issue #202 · pypa/build · GitHub for the --python-tag option of bdist_wheel.

Please feel free to open an issue/PR if you need other features.

I suppose in the long term both solutions should be fine.
If you feel like switching to hatch will be a good thing for your project, go ahead.
If you feel like setuptools is still useful for you and can minimise the amount of changes you have to implement, you can also go for it.

There is no plan to remove support for customisations on setup.py, the only thing that is being deprecated is the ability of using python setup.py as a CLI tool.

I’m glad this question is being discussed here. In the past similar issues were raised for Panel and sphinx_rtd_theme. In Nixpkgs we notice these types of issues directly, as they result in failed builds.

We prefer to have the build be pure, that is, there is no network access. Basically, that means that when creating a wheel the artifacts should already be there. In case of an sdist, it is my opinion the same should apply. Basically, that means that any artifact collecting should be done prior, outside of the build step.

To avoid artifacts in the repository (like node_modules) I think the best solution is to have a git submodule that contains them or a simple script that can be invoked to create the artifacts prior to using a build frontend for building a wheel or sdist.
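As a rough sketch of the "simple script" option, the following builds the JS artifacts once and only then hands over to a build frontend. The `node make` command and the `bokehjs` directory come from the original post, and the INSTALL_JS variable from the earlier suggestion; the function names and the injectable `runner` parameter are illustrative assumptions.

```python
import os
import subprocess
import sys


def release_steps(repo_root="."):
    """Return (argv, cwd, extra_env) tuples: build the JS once, then build the packages."""
    js_dir = os.path.join(repo_root, "bokehjs")
    return [
        # Impure step: may download dependencies, so it is kept separate.
        (["node", "make"], js_dir, {}),
        # Pure step: package what is already on disk.
        ([sys.executable, "-m", "build"], repo_root, {"INSTALL_JS": "1"}),
    ]


def run_release(repo_root=".", runner=subprocess.run):
    """Run each step in order, stopping at the first failure."""
    for argv, cwd, extra_env in release_steps(repo_root):
        runner(argv, cwd=cwd, env={**os.environ, **extra_env}, check=True)
```

Keeping the artifact step separate lets a distributor skip or replace it and run only the packaging step.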

There is an increasing number of packages that would like to package these kinds of artifacts. I think it is important that in the packaging user guide we discourage the bundling of artifacts during the build step.

Maybe we want to at some point standardize some kind of entry point for impure build steps so that distributors know there is an impure build step that they need to handle.

Thanks for sharing thoughts @FRidh but I do not agree that those ideas are universally applicable.

I think the best solution is to have a git submodule

Bokeh has used a monorepo for over ten years and there is zero chance we would move away from that. All the most active contributors prefer it, and the two “halves” of the project need to be kept in lockstep, so having unified commits is vastly preferable. A submodule adds complexity but would buy nothing of note for us (negative value, really) so it’s a non-starter. [1]

I think it is important that in the packaging user guide we discourage the bundling of artifacts during the build step.

I suppose this just comes down to a philosophical difference about where complexity should be distributed. Bokeh has two halves, but it is a single project. We want a single build tool invocation for the project as a whole that can generate everything, in one go, in a repeatable manner. In one sense I agree: We want to build BokehJS once, up front. But we don’t want the BokehJS build to install into the Python source tree, and we also don’t want more steps to explicitly coordinate. I want to point the package build at all the pieces and just say “put everything together”.

But also maybe we are using terminology differently. It’s hard to tell.


  1. In fact, Bokeh started off with submodules but we switched to monorepo after a very short time. It made development (and especially onboarding new contributors) much simpler. ↩︎

I guess what I am saying is that (speculation) for the vast number of users, for the last many years, those two things have been completely identical and indistinguishable. Maybe it would have been cleaner (conceptually) to in fact just deprecate setup.py entirely, and stipulate a new preferred module for the “backend-only” setuptools to consume going forward, because the current messaging (to me as a plain user) has definitely left me confused on points. But I’m veering off topic at this point.

Interesting to hear you used a submodule in the past. Right, if the assets need to be updated regularly when changes in the Python code occur then that is definitely not going to work.

From a development point of view I understand. You want one entry point to build your entire project. This just gets hard with polyglot projects.

I was chatting with a meson developer about this a bit. If meson were to be used, you could put the npm part in a subproject. That subproject likely would do some run_command invocations, preferably splitting the impure parts (such as downloading with npm) into a separate invocation so they can be easily identified. Subprojects can embed their sources or binaries, which in your case are the node_modules. Downstreams can disable the use of embedded sources if they want to with a flag.

@pradyunsg showed a tool they wrote, sphinx-theme-builder (GitHub: pradyunsg/sphinx-theme-builder), which streamlines the Sphinx theme development workflow. It’s a build-backend specifically for sphinx themes. While I am not sure whether a backend is the right solution for solving this issue, I very much like that it standardizes things. It also comes with a cli for managing those types of projects, including scaffolding using stb new. I wonder whether it would be good to have a template for nodejs + Python packages, say using meson.

(Note I keep pushing for meson because I am afraid we’re otherwise going to see an exponential increase in build systems.)

Let’s hold off on advocacy until we standardize, otherwise it’s still just lock-in.

Speaking from the perspective of the Jupyter project, we’d rather not force all Jupyter extension authors to learn a new build system (meson). That’s why we made jupyter-packaging originally, to abstract the hard parts of setuptools. The new hatch_jupyter_builder plugin will allow extension authors to use declarative config in pyproject.toml and ensure that their JS assets are built and included.

First off, to make this as useful for @bryevdv quickly… Broadly, I’m suggesting changing your release build process from:

npm make build
python setup.py sdist --install-js
python setup.py bdist_wheel --install-js

To:

npm make build
BOKEH_COPY_LOCALLY_BUILT_BOKEHJS=1 python -m build

My concrete suggestions are:

  • Don’t try to remove setup.py for now. The blog post you’d linked to as motivation is literally titled “Why you shouldn’t invoke setup.py directly” and not “Why you should get rid of setup.py from your project”. There’s a good reason for the specific wording there.

  • Stop invoking python setup.py ... and instead use python -m build/pip directly.

  • Use an environment variable instead of the --install-js flag. When the environment variable is set and BokehJS is not built locally in the relevant location, error out. If it isn’t set, you can keep the existing behaviour of invoking npm make build.

    The build-system tooling for Python has build configuration mechanisms, but you don’t need them for your usecase (as far as I can tell) – you can move the responsibility of passing this configuration “boolean” to the OS, instead of the Python packaging tooling.

  • There are alternatives to setuptools available but Bokeh doesn’t need them – they can provide a developer experience improvement but switching to them is not a requirement and can bring its own “growing” pains + migration costs.

As for improvements you could make to your build system, I have one suggestion: Move the logic that invokes npm make build in setup.py and performs the copy of the built JS, into a build_py subclass and override the default build_py class with it (using setuptools.setup's cmdclass argument) – see “Extending the build through an override” below for details.
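A minimal sketch of that suggestion, combined with the environment-variable behaviour described in the bullets above. Everything except `build_py`, `cmdclass` and the BOKEH_COPY_LOCALLY_BUILT_BOKEHJS name is an illustrative assumption: the helper names, the directory arguments, and the `build_js` callable (which would wrap the real JS build command) are hypothetical.

```python
import os
import shutil
from pathlib import Path

ENV_VAR = "BOKEH_COPY_LOCALLY_BUILT_BOKEHJS"  # name taken from the commands above


def stage_bokehjs(built_dir, package_static_dir, build_js, environ=None):
    """Copy built BokehJS into the Python package, building first unless told not to."""
    environ = os.environ if environ is None else environ
    built_dir = Path(built_dir)
    if environ.get(ENV_VAR):
        # Release path: the artifacts must already exist, so fail loudly if not.
        if not built_dir.is_dir() or not any(built_dir.iterdir()):
            raise RuntimeError(f"{ENV_VAR} is set but no built BokehJS found in {built_dir}")
    else:
        # Default path: build from scratch before copying.
        build_js()
    shutil.copytree(built_dir, package_static_dir, dirs_exist_ok=True)


def make_build_py(built_dir, package_static_dir, build_js):
    """Create a build_py subclass that stages BokehJS before the normal build_py work."""
    from setuptools.command.build_py import build_py  # lazy import; requires setuptools

    class BuildPyWithBokehJS(build_py):
        def run(self):
            stage_bokehjs(built_dir, package_static_dir, build_js)
            super().run()

    return BuildPyWithBokehJS


# In setup.py (sketch only; the arguments are project-specific):
#   setup(cmdclass={"build_py": make_build_py(built_dir, static_dir, build_js)})
```

Because the decision is read from the environment inside `run`, nothing needs to inspect sys.argv, and the same setup.py works under `python -m build` and `pip install .`.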


There’s a few things in the discussion already, so I’m gonna try and group them:

  • Moving off of setup.py

    Realistically, setup.py is not going away as a way to configure Python package builds. It has been here for more than a decade, and will be around for likely longer. OTOH, it gives every user a Turing-complete mechanism to describe every possible key-value pair, which is far from ideal.

    That said, we do want people to stop doing setup.py install and setup.py sdist bdist_wheel and move to pip install . and python -m build – they do a few more things to ensure that builds happen correctly and are better solutions in terms of interoperability and available maintenance bandwidth. See also the blog post noted above.

    Personally, I’d like package authors to describe as much of their metadata statically as feasible, in files that don’t need to be executed with a Turing-complete thing to parse and for dependency resolution mechanisms for Python to be able to get this information cheaply.[1]

    Today, this information can be specified statically in the [project] table in pyproject.toml (which is backed by an interoperability standard) and setup.cfg (which is implementation-defined, as are most legacy things in Python Packaging) but neither is used during dependency resolution today. There are some tooling advantages, e.g. it’s easier to parse/modify those files than a setup.py file using an automated tool.

  • Adding a custom build step to setuptools

    1. Moving the build-logic into a dedicated project

      A demonstration for doing this is available in setuptools’ issue tracker, written by one of the maintainers: Support for custom build steps · Issue #2591 · pypa/setuptools · GitHub.

    2. Extending the build through an override (this is what I recommended above)

      You can extend an existing build_py command in setuptools, using cmdclass and do additional build work in there. This has the advantage of being an intended point of extension for the setuptools build system and eliminates the need to look at sys.argv at any point. :)

      I recently did something like this in Memray for an example of that (full disclosure: that’s an OSS project from work). That project builds JS assets using an npm run-script build command – it extends build_ext, you can extend build_py since you don’t have extension code. That project has C++, JS and Python build systems and was a fun one to get building correctly.

  • Changing to an alternative build-backend

    As noted by a bunch of folks already, there are a lot of alternatives to setuptools available today. None of them were popular late last year, except for Poetry which does not have the extensibility you need anyway (AFAIK).

    In my opinion, what you’re seeing is well-meaning enthusiasm (and skepticism) from the folks around here, about the new build-backends in Python’s packaging ecosystem. In broad strokes, it took a lot of effort to get to this point and folks prefer the newer build-backends over setuptools for both “simple” and “complex” use cases; since they’re being built without the backwards compatibility constraints of setuptools and are able to innovate + improve various aspects of the developer experience.

    I’m not familiar with any of the ones relevant for this discussion as a regular user though – so no real suggestions on that front. Mostly just wanted to provide context for why alternatives to setuptools are being enthusiastically mentioned. :)


  1. I know a bunch of other folks want this too but I don’t wanna speak for anyone else. ↩︎

@pradyunsg Thanks for the detailed feedback. I have followed your approach in

I did have a few questions at the end of the PR, in case you (or anyone else) has a few minutes to offer any comments.