Revisiting the case for CMake as a primary, cross-major-platform build config

CMake only needs a C++ compiler if you don’t specify PROJECT(Python LANGUAGES C). With that set, all it requires is a C compiler.
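For illustration, a minimal sketch of what that looks like at the top of a CMakeLists.txt (the target name and source file below are purely illustrative, not a working CPython build):

cmake_minimum_required(VERSION 3.20)

# Restricting the project to C means CMake only probes for a working C
# compiler; with no LANGUAGES clause it would enable C and C++ by default.
project(Python LANGUAGES C)

# Hypothetical target, just to show the shape of a C-only project.
add_executable(python Programs/python.c)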

I think what MAL meant was that CMake needs a C++ compiler to bootstrap itself.

1 Like

Right, sorry for not being clear on this.

1 Like

For my part, I’m approaching this from the perspective of a C and C++ developer looking at CPython, which is written in C, not Python. I’m currently having to migrate a set of applications that embedded Python 2.7 to a rather frightening (for the project owners) world of Python 3, where different build systems have begun making different assumptions about Python source availability and buildability.

For instance: the macOS build machines ship with a Python interpreter but not the development components.

Anyone looking to build CPython can reasonably be expected to require a C compiler. This is why autoconf has gotten us this far. CMake reasonably works on a large number of platforms natively, with no further dependencies.

Meson “runs” on platforms Python runs on, but it does not run equally well on all of them. It has had a bumpy run with Windows, and its own documentation notes that it’s not good at generating Visual Studio project files.

That leaves the matter of embedding CPython and leveraging the C API. It seems unlikely that any projects currently doing this would benefit from a transition from autoconf to Meson.

When it comes to C and C++ language build systems, CMake has become predominant, and in general other build systems have some way of integrating CMake easily.

Meson depends on Ninja, whereas CMake retains the ability to generate project files for multiple back-ends including Make, Ninja, Visual Studio (if you really need .vcproj files) and so on, and CMake 3.20 introduced an API that lets other build tools and IDEs understand a CMake-generated build tree.

How compelling a distinction this is really depends on whether the intent is to retain ultra-broad coverage of out-of-the-box CPython buildability or to embrace the current [accidental?] Linux-first posture.

2 Likes

I think we have to differentiate between building CPython and building
Python C/C++ extensions, including their wrapped libraries.

For the latter, tools such as Meson are a valid choice, since the target
systems will already support Python.

For getting Python to run on a new platform, I think it is important to
only rely on tooling which is widely available. For building CPython,
we require a C99 compiler. That’s the baseline for build tooling
as well, provided that the build files have to be generated on the
target system.

Note that we already do ship those autoconf generated files together
with Python (the configure shell script), so the generators themselves
are not needed on the target platform. You only need a simple shell.
autotools themselves are not required on the target platform when
installing CPython. They don’t even need to run on those platforms,
as long as the generated files do support the needed introspection
on the target.

If the same is possible with CMake or Meson or some other system,
I guess the requirements for building those build tools are less
important.

For CMake, you typically have to run cmake directly on the target
system, so I’m not sure whether it supports generating introspection
scripts. I’ve so far only used CMake for building applications/libraries,
not in development.

1 Like

My intent with this entire thread was discussion of building CPython for a bespoke interpreter executable or development libraries/headers for embedding.

The default behavior for CMake is to build absolute paths into the generated configuration, but it can be used to generate portable configurations; I’m not sure how flexible that would be. It does have strong support for cross-compilation, but that remains an action performed on a host machine rather than on the developer’s target machine.
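To make the cross-compilation point concrete: with CMake this usually means writing a small toolchain file on the host and passing it at configure time via -DCMAKE_TOOLCHAIN_FILE. A minimal sketch, where the compiler name and sysroot path are assumptions for illustration:

# aarch64-linux.cmake (hypothetical cross toolchain file)
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_SYSROOT /opt/sysroots/aarch64-linux-gnu)

# Search the target sysroot for headers/libraries, but keep using host
# programs while running the build itself.
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

The build is then configured on the host with something like cmake -DCMAKE_TOOLCHAIN_FILE=aarch64-linux.cmake, which is exactly the host-performs-the-action model described above.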

However, if we’re talking about building CPython, does it seem likely there would be many scenarios where an embedded developer would want to build CPython or its libraries on, as opposed to for, the target hardware?

While there could certainly be a scenario where a developer needs to natively build CPython but does not have access to CMake binaries, it seems likely this would also be because there are no Python binaries built for that platform; and like Python, CMake is available in source form and supports a very broad set of platforms, although possibly not as wide a set as autoconf’s.

So, then, the argument here in favor of autoconf over CMake (specifically) is that it supports a larger number of tail-end edge cases, at the cost of not supporting Windows, half-supporting macOS, and offering minimal facility to developers trying to embed CPython (as opposed to doing embedded-systems development with Python) from different build systems.

1 Like

Honestly the only argument in favor of autoconf over CMake these days is inertia. We’ve already got autoconf working.

CMake would be a change and will take someone putting in the effort to get close enough to equivalent autoconf parity before it could be adopted. Until that is demonstrated in the form of a PR that passes CI & buildbots (ironically by providing a temporary trivial dummy configure & Makefile for CI and buildbot use that just invokes cmake…?) this is mostly just talk. Substitute whatever you want for CMake in the above text and it’s the same story.

When starting a big new non-GNU project today, one would not likely choose autoconf. (Or C for that matter)

4 Likes

“We’ve already got autoconf working.”

The question would have to be: working for whom?

If we look just to Linux and MacOS: nobody out there is using a python that was built just by running ./configure && make. There is no system out there running an installed-python that was built by running ./configure $LOCALARGS && make.

Apple, the ultimate evangelists of everything-shiny-new and live-from-HEAD, were uncommonly late even shipping Python 3, but the really weird thing is that they ship only the executable.

Let me rally back around here – this thread of mine is titled “Revisiting the case” not “if I offered to make a cmake project”.

This is one of those things that touches on a bunch of different cyber-religious matters and has probably started multiple flame wars; the kind of thing you would expect to leave any team presenting a seemingly hostile outward-facing posture toward the topic.

There have been numerous others in the last decade-plus who were ready to volunteer to do the work of porting to one build system or another, but who fled, built, and frequently abandoned their own standalone alternatives. Every distro out there has built its own wrapper or wrapper-of-wrappers for building its python package.

I think there has been a game of telephone about the entire question. I think there have been some core developers operating under the belief that sticking to autoconf is a done decision, an active choice, alongside others who would rather not have to learn new commands to build their dev builds (that’s what cmdline aliases are for, lana :slight_smile: ), and others who feel they’ve had this discussion and lost.

I actually think this is a matter that needs an open debate/decision by the core development team: Should Python look to modernize its build system? Does the team have criteria for accepting proposals to champion/implement such a change? Revisit the case and have a policy decision.

This IS something the Python community will do, but first the team needs to provide clear guidance as to what they will and won’t accept so that the minefield/flamewar aspect can be navigated/avoided. If the team determines that cmake, bazel, two rubber chickens and a piece of string, or something else is totally out of the question, get that out of the way before the next person steps in and gets their hopes up only to find their preferred path was never going to happen.

Nevertheless, somebody is going to have to step up to:

  • write the PEP
  • champion the PEP
  • create the implementation (assuming acceptance)

Looking back at the VCS changes (first to Mercurial, and then to Git), it went more or less according to that pattern. So if you aren’t willing/able to do it yourself, you’ll need to find someone who is.

4 Likes

[…]

If we look just to Linux and MacOS: nobody out there is using a
python that was built just by running ./configure && make. There
is no system out there running an installed-python that was built
by running ./configure $LOCALARGS && make.
[…]

Actually, I build Python from source precisely this way on my
systems, so that I can rule out distro-applied patches and similar
divergences when running tests, and so that I can have new versions
of the interpreter on the very same day they’re tagged in Git. That
makes me nobody, I suppose.

Linux distros are also using ./configure && make when creating
binary packages of the interpreter as well. Are they, and their
users, nobody too?

5 Likes

Misleading as stated. All Debian-based distros use configure and make to build their Python (see debian/rules · python3.9 · Python Interpreter / python3 · GitLab, etc.). Do you imply that someone invoking configure and make from their own package-building mechanism, like the debian/rules file, doesn’t count? That does not make sense. Both configure and make are in the critical path of all such builds. Thus those are all ./configure && make users, meaning most posixish systems in the world are built that way.

But so what. If we adopt a new build system for their platforms, those get changed to call the new thing on a per distro basis by their maintainers. Or they decide not to and take on maintainership of autoconf stuff as a local patch within their own repo. Not our problem at that point. Our build system goal is not to appease all OS distros. Just not to prevent supported OSes from having one reasonable way to build.

You appear annoyed that work on this might be work that gets rejected or ignored? That is the real world. We all learn from proof-of-concept work regardless of whether it gets adopted. It sounds like you want a spec definition of “close enough to equivalent autoconf parity” to work from. We don’t have one. The spec is effectively “do enough of what autoconf allows us to do today”. It is important for “enough” to be vague. “We’ll know it when we see it” is very true here, as this isn’t contracted work to solve a specific well-defined problem. Defining and specifying the problem is actually part of the work required!

For example, “enough” would presumably mean having solutions for the first- and second-tier platforms that autoconf works on today, including debug builds, PGO/FDO and LTO release builds, and cross compilation. Those are just examples, but coming up with something that can support all of that (even if a demo doesn’t, the path to doing so needs to be clear) makes for a strong demonstration. It does not imply acceptance.
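As a very rough sketch of the kind of switches such a demonstration would need to cover, here is what LTO/PGO wiring might look like; the option names are hypothetical, only CMAKE_INTERPROCEDURAL_OPTIMIZATION is a built-in CMake variable, and the flags shown are GCC/Clang-specific:

option(WITH_LTO "Build with link-time optimization" OFF)
option(WITH_PGO_GENERATE "Instrumented build for profile collection" OFF)

if(WITH_LTO)
  # CMake's portable LTO/IPO switch, honored where the toolchain supports it.
  set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
endif()

if(WITH_PGO_GENERATE)
  # PGO has no first-class CMake abstraction; compiler-specific flags would
  # have to be selected per toolchain, much as configure/Makefile do today.
  add_compile_options(-fprofile-generate)
  add_link_options(-fprofile-generate)
endif()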

Keep in mind the question of WHY we’d be doing this. What actual problem is being solved?

Computers are stupidly fast today. So on the posix side we aren’t honestly hurting from our build system taking too long, even though ./configure “wastes” half the total configure+make build time performing repetitive, serialized, rarely-changing operations (even when using ccache). More compelling reasons to change would be nobody understanding autoconf and automake anymore (unlikely, at least this decade), or a much easier, more natural cross-compilation story (that is painful today). Merely speeding up the build isn’t so exciting to everyone right now.

[sidenote: There’s more posix bang for the buck developer productivity wise today in speeding up the test suite. Specifically the dozen slowest tests that consume 80+% of the wall clock time on a modern system. Measure the ultimate total latency of CI and buildbot runs as well as developers doing the equivalent. Work aiming to reduce that would make a lot of devs happier.]

I’d love a modern build system. But I see no compelling reason to prioritize such work on the posix side. I expect others feel the same. If that is pervasive among core devs, it effectively raises the bar for accepting one: In absence of a strong need, it needs to win over people who weren’t even looking for one by providing something new and nice that we may not have realized that we were missing or dismissed as infeasible.

You started this thread off by complaining about the state of the Visual Studio Windows build setup. That sounds like a much more target-rich environment where significant change is compelling. It could well make sense to start with CMake as a solution just for Windows alone, with the obvious thought that it might grow to be used on the posix side as well in the long run, displacing autotools.

I haven’t seen a compelling reason to have replacing autotools as a primary goal.

(too much time spent writing and editing this, meaning i probably left typos and glaring editorial WTFs above… just point them out and i’ll repair 'em)

2 Likes

As @gpshead points out, “misleading as stated” because I didn’t repeat or put enough emphasis on “just”.

I did not mean to imply nobody uses configure, nor that nobody uses make, nor that literally “./configure && make” has never been typed.

All the distros I looked at (including Debian, Darwin, MacPorts and Homebrew) apply some number of patches first, and many of the ones I looked at appeared to contain (not exclusively, but some amount of) enforced configuration behavior; in this I include specific #include machinations that ought to be handled by an autoconf detection or a config setting.

When they use configure, I was not able to find any that used just “./configure” without some number of arguments.

Finally, to the second half of the statement, what I’m referring to is the manual dependency resolution you wouldn’t expect with a modern build system:

Python build finished successfully!
The necessary bits to build these optional modules were not found:
_bz2                  _curses               _curses_panel
_dbm                  _gdbm                 _hashlib
_lzma                 _sqlite3              _ssl
_tkinter              _uuid                 readline
zlib
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

...

Failed to build these modules:
_ctypes


Could not build the ssl module!
Python requires a OpenSSL 1.1.1 or newer

(from a docker container that has the equivalent of ubuntu’s “build-essential” installed)

At 50, I remember the days when figuring these out was part of the “fun”. But I’d rather live in the now, when my expectation is:

$ docker run --rm -it $distro /bin/bash
# $pkgmgr install build-essential cmake ninja-build
# wget $source.tar.gz
# tar xf $source.tar.gz
# cmake -G Ninja -H $source -B build
# cmake --build build
< done >

… because Python tooling (pip, conda) taught people to expect this kind of luxury. Rust, Go and others embraced this, and it’s even become fairly standard for C++ of all things.

[…] what I’m referring to is the manual dependency resolution you wouldn’t expect with a modern build system […]

One thing I’ve noticed in the past few years is that the Linux distros I’ve run into have a much more fine-grained package system. I wind up having to iteratively locate runtime and dev packages, e.g. libsqlite3 and libsqlite3-dev. In the “old days” (when it was kind of “fun” to answer Perl configure questions or rummage around in setup.py or Modules/Setup) a Linux distro often had all the necessary bits installed.

This may very well not be what you’re running into, just my 2¢.

1 Like

There’s an easy hack for anything Debian-derived: sudo apt-get build-dep python3.9 pulls in all of the packages necessary to build the distro’s own python3.9 package. (Substitute whatever version is current in your given distro.) Some of the more esoteric bits may be left out, as distros split up what it means to even be Python, but who honestly cares about the gdbm module?

I don’t want a C/C++ build system to automatically fetch and build third-party dependencies for me, and especially not to install them. That just raises questions. Rhetorical examples, not to be answered or picked apart: “Where’s it going to put them? Are they static or shared? Do they have compatible licenses for how they’ll be used? What’s the provenance of their code? Is it a fixed-in-time, secure-hash-specific revision, or is it going to get the package with the latest security updates? Did you just download code from the internet and execute it when I typed build?” My point is more that we shouldn’t make anyone ever have to raise such questions towards us. Those are problems we cannot solve.

I agree that this is decidedly not a definition of modern. Just like the C programming language. A C build system is not a dependency management system. A C build system is not a package management system. Keep those separate. Continue to require people to have the libraries necessary to build already on their system. If someone wants to provide convenient ways of obtaining CPython’s dependencies, good for them, but that still belongs outside of our build system.

I realize much of the high level language world lives with those concepts intertwined (including Python with pip). Designing and maintaining a C/C++ library distribution, build, and packaging system is not our void to fill within the CPython project.

1 Like

Or use my script builddep.sh.

… or .github/workflows/posix-deps-apt.sh from the CPython repo.

1 Like

FWIW, I would not want a build mechanism to automatically install
3rd party dependencies during a build run.

This should always be the responsibility of the system maintainer and
not happen implicitly by a build tool that is not fully integrated
into the OS packaging mechanism.

But I think we’re diverging from the original idea. The key argument
for a new build system has to be lower maintenance effort for
the core devs.

If I look at an existing CMake-based build system for Python, such
as https://github.com/python-cmake-buildsystem/python-cmake-buildsystem,
this doesn’t really strike me as easier to maintain.

It does provide a few extra features, but are those relevant enough
to warrant the learning curve and possible destabilization of Python’s
build system on less mainstream platforms ?

2 Likes

Well, it is and it isn’t: mea culpa for the deliberate, obnoxious choice of a pristine Docker image with just build-essential installed.

On the other hand it’s not the packages themselves (e.g. zlib) that are missing, it’s the -dev packages that need to be installed, and in some cases you also need to be careful about which versions of those packages you have or install (hence including the OpenSSL line at the end).

“Wen ah wer a lad”, I’m fairly sure that they weren’t separated like that…

rummages around a box of ancient Slackware drives and gets bitten by either an IDE connector or sharp pointy teeth

These seem to be contradictory statements.

This is a little like walking onto an aircraft and having the stewardess frown at you and say “If we were meant to fly, wouldn’t God have given us wings?”. You don’t need to

precisely because that’s the problem an [insert not-autoconf build system] solves for you.

FetchContent — CMake 3.22.1 Documentation
cpm-cmake/CPM.cmake: CMake’s missing package manager. A small CMake script for setup-free, cross-platform, reproducible dependency management. (github.com)
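For concreteness, a minimal FetchContent sketch of that vendoring pattern (the repository, tag, and target names here are only an example, not a suggestion that CPython should vendor anything this way):

include(FetchContent)

FetchContent_Declare(
  zlib
  GIT_REPOSITORY https://github.com/madler/zlib.git
  GIT_TAG        v1.2.13   # a fixed tag; a commit hash would pin provenance harder
)
FetchContent_MakeAvailable(zlib)

# The fetched project's targets can then be linked like any in-tree target,
# e.g. target_link_libraries(some_module PRIVATE zlibstatic)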

It’s simply an execution of the concept of vendoring as it’s been implemented in other language ecosystems (Rust, Go, etc.).

[edit]

This is also exactly what Python’s own packaging systems (pip, Anaconda, etc.) do by default when a binary is not available.