What do you want to see in tomorrow’s CPython build system?

big +1 for build system cross-platform consistency.

Predictability about the outputs and their locations is nice. Having the build system produce a manifest or build report (e.g. `--build-report build.json`) containing this information would be a good way to achieve it.
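To make the idea concrete, here is a hypothetical shape such a `build.json` might take. The flag name and every field below are illustrative only; no such interface exists today:

```json
{
  "config": {
    "platform": "linux-x86_64",
    "build_type": "release"
  },
  "outputs": [
    {"name": "python", "path": "build/python", "type": "executable"},
    {"name": "libpython3.13.so", "path": "build/libpython3.13.so", "type": "shared_library"}
  ]
}
```

The point is simply that tooling (packagers, CI, IDEs) could consume one machine-readable file instead of guessing at output locations per platform.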

2 Likes

Yeah, CMake can be a big footgun. But let’s not get ahead of ourselves; it’ll take a while before we get to the discussion(s) where we gauge the various contenders.

But since we’re there, let me quote Ned:

2 Likes

Note that no other build system gets called out, simply because no other one was listed. :wink:

Unfortunately we don’t have specifics as to why people answered this. What you can do is read what cross-platform C++ developers have answered on this thread already.

2 Likes

They also listed Make and some MSBuild thing. But if you look at the other questions there’s a correlation between “I’m using build system A” and “pains with build system A”. Unless I’m misreading that survey, it seems both Make and MSBuild get the same level of disapproval, based on the number of responses :slight_smile: But now we’re digressing!

Ah! I wouldn’t really call Make a build system at this point. Most non-trivial projects seem to generate their Makefiles using either autoconf, CMake, meson or any other dedicated tool…

1 Like

I wouldn’t read much into that. Sampling bias. That just means it gets used a lot. Which their survey confirms: it is by far the most popular among their audience. Thus the source of the most annoyance.

It also didn’t ask about cross-platform use of CMake.

And “cross-platform” isn’t well defined; that audience has far more platforms to care about than CPython ever will.

4 Likes

While it ranks a long way down in the grand scheme of things, it would be nice if the process of bootstrapping the build environment from scratch could be made less confusing (I’m mostly thinking about the frozen Python bits here, but any build system is likely to find the actions taken by “make regenerate-all” a bit strange).

1 Like

This would be a good thing to have, and I think it would be best to start with it as a requirement. From what I see in other projects, fixing a non-reproducible build system is harder than writing a reproducible one from the start. Does Bazel or Buck2 work on Windows as well? How well does CMake support this requirement?

Making the build fully reproducible should help with Guido’s wish for fast builds as well. Because the dependency graph must be fully captured, you should be able to efficiently rebuild, e.g. by caching intermediate outputs.

3 Likes

Starting with this as a primary requirement helps. While Bazel is capable of letting you construct such a thing (and does work on Windows, but I doubt anything we’d consider does not), I assume it is not alone in this.

Being reproducible involves a lot more than just a fully captured dependency graph of every action.

It requires that every action itself produce a canonical output given the same inputs. That means no timestamps, PIDs, system information, or randomness can be embedded in outputs. For example: if generating .pyc files is part of your build, you can’t use the default timestamp variant. All serialized data needs a fixed order. Doing things like setting PYTHONHASHSEED=0 and disabling ASLR for the build can help. As can sorting.
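The .pyc case can be demonstrated with the stdlib directly. This is a small sketch (file names are arbitrary): the default invalidation mode embeds the source mtime in the .pyc header, while `CHECKED_HASH` embeds a hash of the source instead, so byte-identical sources yield byte-identical .pyc files regardless of when they were compiled:

```python
import os
import pathlib
import py_compile
import tempfile

tmp = tempfile.mkdtemp()
src = pathlib.Path(tmp, "mod.py")
src.write_text("X = 1\n")

def compile_hashed(tag):
    # CHECKED_HASH puts a hash of the source in the .pyc header
    # instead of the source mtime, making the output reproducible.
    out = py_compile.compile(
        str(src), cfile=str(src) + tag,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH)
    return pathlib.Path(out).read_bytes()

first = compile_hashed(".1.pyc")
os.utime(src, (0, 0))           # change the mtime, not the contents
second = compile_hashed(".2.pyc")
assert first == second          # byte-identical despite differing mtimes
```

With the default `TIMESTAMP` mode, the same two compiles would differ in the header bytes that record the source mtime.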

I expect any build tool can be used to express the full dependency graph (even a Makefile). It’s a matter of how easy it is to maintain and set up in a way that remains fully specified despite us humans. Ideally it should actually break something when it is not. Bazel can run in a mode that helps enforce this (on Linux, anyway). Builds run that way can be slower due to the per-action declared-input filesystem sandbox trees that need creating. But a slower build that checks declared input and output correctness can be done as its own CI task rather than requiring all builds to run this way, which is probably good enough.

Pedantically, you’d also need all C headers, external libraries, and your entire compilation toolchain (and the build tool itself) to be considered inputs to the build; those are dependencies whose changes impact the output. Or just be practical: declare which of those are off limits from our POV and ignore them changing, with that listed as a known-limitation caveat. Distributors (and RMs) should already be able to control all of those within their binary package builder environments.

1 Like

For example, Reproducible builds

1 Like

cc. @freakboy3742: I’d be interested in hearing your wishlist for the “perfect CPython build system”, given your work with iOS/Android :slight_smile:

I think this is putting the cart before the horse. I would wager that reproducibility has far less impact in terms of concrete improvements or enablement for use cases and workflows than having (say) a uniform, cross-platform build system that allows easily pointing to system libs (bonus points for cross-compilation).

I’m very excited about the latter, and from the comments (and hearts) above, so are several other people. Reproducibility is clearly something useful, and having nicely cached intermediate build steps would be great, but having that as a primary requirement seems to me to be in serious misalignment with much more urgent needs (that would be more important IMO even if it costs us reproducibility).

1 Like

ASLR happens at runtime, so it shouldn’t be a deterrent to reproducible builds. You had to generate position-independent code to begin with, of course.

What is probably meant here is subtle situations where ASLR changes the output order of a compile-time generation utility (such as a grammar generator), for example because of keying an associative container on pointer values - similar to the variations that PYTHONHASHSEED can produce :slight_smile: . Or perhaps I’m overinterpreting what @gpshead said.
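A toy illustration of that failure mode, with hypothetical function names: a code generator that iterates an unordered container emits output whose order depends on hash randomization (or, in C/C++, on pointer values under ASLR), while sorting on a stable key makes the output canonical:

```python
# Hypothetical generator sketch, not taken from any real tool.
keywords = {"if", "else", "while", "return"}

def emit_unstable():
    # Iteration order of a set of strings varies with PYTHONHASHSEED,
    # so this generated snippet can differ from run to run.
    return "\n".join(f"case_{k}();" for k in keywords)

def emit_stable():
    # Sorting on a stable key makes the output canonical.
    return "\n".join(f"case_{k}();" for k in sorted(keywords))
```

Running `emit_unstable` twice in the same process gives the same result, but two separate interpreter runs (with different hash seeds) need not agree; `emit_stable` always does.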

1 Like

The most important feature is a clear distinction between the current host environment and architecture, and the target host and architecture. Relatedly: the ability to clearly control where libraries and headers used in the build process are sourced. If I’m compiling for iOS, I don’t want to accidentally link a macOS binary, even if they are both ARM64. I’ve lost track of the number of times Homebrew’s gettext implementation has found its way into a build simply by virtue of existing on the computer that is doing a compile.

6 Likes

The incorporation of meta-builds. iOS and Android both need to target multiple binary architectures, and there’s some merging process involved after the fact. This could be as simple as “putting all the artefacts in a single zip file”; or, in the case of iOS, it may require post-processing of platform/ABI specific binaries. Being able to put those instructions into a “meta-target” that involves multiple complete CPython sub-builds would be nice.

3 Likes

Just for the sake of technical correctness: Meson does not generate Makefiles. Meson has different build backends, ninja being the default one, but no make build backend exists.

4 Likes

To put this in context with comments above (ex.), this is exactly what’s necessary to allow cross-compilation.

I am the creator and current project lead of Meson and I have been encouraged to write some information here about Meson and using it to build Python.

I wrote a simple PoC that built Python with Meson ages ago. It did not take much effort, but polishing it to “release quality” would take a fair bit of work.

Meson is being used by fairly large existing Python projects like SciPy and NumPy, so people with the necessary skills should be readily available.

An issue that seems to pop up in the threads I read was the question of external dependencies, especially whether they should be downloaded and built automatically or provided by the system (i.e. Linux distro deps). Meson supports both of these at the same time without needing to edit build definition files. There is a toggle option for, basically, “never download third party deps automatically”. Distro packagers really seem to like it. From the opposite side the Git project recently merged Meson build definitions (not used by default) and one of the things they seemed to like was how it made the dev experience nicer for Windows people.

For people worried about the bootstrapping case, there exists a plain C reimplementation of Meson called Muon.

Meson supports cross compilation and can build “native” and “cross” targets at the same time. The most common use case is building a code generator executable and then using it to create source files.
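A minimal sketch of that pattern (file and target names here are made up for illustration): the `native: true` executable is built for the build machine even during a cross build, and a `custom_target` runs it to produce a source file compiled for the target:

```meson
project('demo', 'c')

# Built for the *build* machine, even when cross compiling.
gen = executable('gen', 'gen.c', native: true)

# Run the build-machine generator to create a source file
# that is then compiled for the *target* machine.
generated = custom_target('gen-src',
  output: 'generated.c',
  command: [gen, '@OUTPUT@'])

library('mylib', generated)
```

The target-machine compiler and sysroot come from a separate cross file passed as `meson setup --cross-file ...`, so the two toolchains never mix.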

If you have any other questions on the issue, I’ll be more than happy to answer them and even help people with the porting effort. Sadly I’m not going to do it on my own due to practical reasons. I have participated in several build system conversions. If there is not someone on “the inside” driving the change, getting something like this merged is pretty much impossible.

18 Likes

I’ve written a Meson build script for a newish CPython (it should be possible to rebase it without much effort) and I’m looking to see if there is any interest in having it upstreamed (possibly replacing the autotools scripts).

It works on my machine (Linux) and, with a bit of hacking (having the libffi/openssl subprojects use the GNU C standard), also on Android. It’s written to be fairly generic, so it should be possible to update it for more architectures and systems without too much effort.

The changeset is here: wip: Introduced meson buildsystem · sp1ritCS/cpython@5fe70a8 · GitHub, if anyone wants to look at it. It certainly doesn’t fully replicate the autotools setup yet (some evaluations are still missing, and there is no pyc generation for Lib/), and getpath.py can’t cope with the freestanding build dir, but I do see potential in upstreaming it.

Also notable are the reduced configure and build times. Some very unscientific testing on my machine shows that configure time is down to 38s (from 1m 3s with autotools) and build time is down to 19s (from 46s).

7 Likes