Is there prior discussion around the build system of CPython itself?

(I hope I’m posting in the right section)

As with all old, large codebases it’s difficult to know the complete picture as an outsider, so pardon my ignorance if I’m bringing something up that was discussed and decided before. I couldn’t find any discussion related to this.

CPython has a build system which is a combination of autotools, a handwritten Makefile template and a ~2500-line setup.py script. On top of that, the Windows/MSVC build duplicates a big portion of this. For context: I'm bringing this up because I build CPython for a lot of different configurations (architectures × operating systems - around 18 in total), with custom toolchains, dependencies and flags, and the assumptions in this build system make it more difficult than it should be. Very often this results in having to patch the build to accomplish something. There are build systems that allow customizing all of that without modifying the build itself, which is preferable. The sheer volume of code in the build system also makes it very hard to approach and reason about the logic in it.

I think there is definitely room for improvement on this front and I was wondering if there is openness in the CPython project for revamping the build system (perhaps with CMake) - I would be happy to contribute.

2 Likes

CMake is not a panacea, but it would still be much better than the current setup IMHO.

Another possibility to explore might be the Meson build system, which is written in Python. I have no personal experience with it, but it seems they've ported Python as an experiment (perhaps one could actually contact the original author).

1 Like

I brought up the idea of moving to CMake back at PyCon US 2009 and, if I remember correctly, it was dismissed over concerns that not enough people knew CMake and that CMake might not support as many platforms as Make itself does.

And I believe Meson uses Ninja to do the actual building, so one question would be how portable Ninja is.

How many people actually know autotools? :wink: I probably edited our autoconf file a few times but I have no actual idea how the whole thing works under the hood (I’m not even able to tell the difference between “autoconf”, “autotools”, “autoreconf”… or whether there is a difference at all). Having worked with CMake for two years, I must say that I find it much more approachable - even if it’s still not high-level enough for my tastes.

Interesting. Ninja is a simple tool, so it should be quite portable. But of course the devil’s in the details.

4 Likes

How would you bootstrap and build Python on a CPU architecture or platform that does not have a working interpreter yet?

I suppose that could be done through cross-compiling, though it’s an open question.

Ninja is used to build Chromium, and CMake with Ninja to build LLVM/Clang. They're plenty capable.

What is really needed is for someone to untangle what actually matters from our autoconf and setup.py pile of code. Turn that subset of needed checks into something other build systems can use to make pyconfig.h and module-building decisions.
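To make that concrete, here is a minimal sketch of what one such untangled check could look like, expressed as plain Python instead of autoconf m4 (the have_function() helper and the pyconfig_fragment.h output are hypothetical, purely for illustration, not anything CPython ships):

```python
# Hypothetical sketch: one configure-style feature check expressed as
# plain Python instead of autoconf m4. It compiles and links a tiny C
# program to decide whether a function exists, then records the result
# as a pyconfig.h-style define.
import os
import subprocess
import tempfile

def have_function(name, cc="cc"):
    """Return True if a trivial program calling `name` compiles and links."""
    # The bogus `char name(void);` prototype is the classic autoconf trick:
    # we only care whether the symbol resolves at link time.
    src = f"char {name}(void); int main(void) {{ {name}(); return 0; }}\n"
    with tempfile.TemporaryDirectory() as tmp:
        c_file = os.path.join(tmp, "check.c")
        with open(c_file, "w") as f:
            f.write(src)
        cmd = [cc, c_file, "-o", os.path.join(tmp, "a.out")]
        return subprocess.run(cmd, capture_output=True).returncode == 0

CHECKS = {"HAVE_CLOCK_GETTIME": "clock_gettime", "HAVE_MKFIFO": "mkfifo"}

with open("pyconfig_fragment.h", "w") as out:
    for define, func in CHECKS.items():
        line = f"#define {define} 1" if have_function(func) else f"/* #undef {define} */"
        out.write(line + "\n")
```

The point is that once a check is expressed this plainly, any build system (CMake, Meson, or something else) can host it.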

An implementation of a new CPython build system that works on non-current macOS/Xcode (whatever version our actual binaries use), a few Linuxes on notably different architectures, and a couple of pesky not-dead-yet BSDs would go a long way toward convincing folks of the benefits of such a change. Until that work is done, we're all just pontificating.

For Windows it was “easy” in that the platform is so constrained that a pre-made pyconfig.h or three makes sense.

We could ship pre-made pyconfig.h files for supported platforms, but that gets annoying fast when the number expands – and our number is large, with esoteric platforms and architectures. I suspect there isn't a large actual set of generated pyconfig values, but they'd need to be gathered, considered and managed. Code is going to be needed to detect and decide which config to use regardless. A hardcoded mapping of OS-platform tuples to configs would be a major source of maintenance pain, as the sketch below illustrates.
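Such a mapping would presumably look something like this (all names hypothetical): every new OS/architecture combination means another entry and another pre-generated header to keep in sync by hand:

```python
# Hypothetical mapping of (os, machine) tuples to pre-generated pyconfig.h
# files -- exactly the kind of table that becomes a maintenance burden as
# the set of supported platforms grows.
import platform

PYCONFIG_FOR_PLATFORM = {
    ("Linux", "x86_64"): "pyconfig-linux-x86_64.h",
    ("Linux", "aarch64"): "pyconfig-linux-aarch64.h",
    ("Darwin", "arm64"): "pyconfig-macos-arm64.h",
    ("FreeBSD", "amd64"): "pyconfig-freebsd-amd64.h",
    # ... one entry per esoteric platform, forever
}

def select_pyconfig():
    key = (platform.system(), platform.machine())
    try:
        return PYCONFIG_FOR_PLATFORM[key]
    except KeyError:
        raise RuntimeError(f"no pre-generated pyconfig.h for {key}")
```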

2 Likes

I think that in the 10 years that have passed since then, CMake has gained a lot of ground in terms of people knowing it, and as was pointed out in the thread, the cumulative knowledge of Autotools is most likely a fraction of CMake's.

In terms of portability, both CMake and Ninja are pretty good. But CMake doesn't require Ninja; it's just nice to have (though I'm sure that's not going to be the culprit). Is there a document listing the officially supported platforms?

1 Like

Meson is an interesting choice. I’ve heard good things about it, but frankly, as a grumpy build engineer I would strongly prefer the boring choice, which is CMake today. CMake is more likely to be already installed in an environment and it’s something that more people are comfortable with. Don’t get me wrong, it’s a pretty terrible language, but as a build system it’s proven to be very successful. LLVM tooling & major IDEs support it out of the box as well.

Apparently someone already started a CMake build: https://github.com/python-cmake-buildsystem/python-cmake-buildsystem (no idea about the quality of implementation, but I’ll check it out).

Agreed.

An implementation of a new CPython build system that works on non-current macOS/Xcode (whatever version our actual binaries use), a few Linuxes on notably different architectures, and a couple of pesky not-dead-yet BSDs would go a long way toward convincing folks of the benefits of such a change. Until that work is done, we're all just pontificating.

Apparently there is a CMake build in the works here: https://github.com/python-cmake-buildsystem/python-cmake-buildsystem

It seems pretty extensive; I’ll play around with it a bit and try to contact the author.

3 Likes

Here: Supported platforms and architectures — Unofficial Python Development (Victor's notes) documentation

More generally, Python is expected to build with little hassle (perhaps minor porting work) on platforms with a C99 compiler and a POSIX stack (including POSIX threads). “Supported” platforms are a subset of that because that also implies passing the test suite, which is not something the build system is normally concerned with.

1 Like

Also, a reasonable course of action could be to keep the CMake and autoconf-based builds in parallel for a couple versions, until all issues are solved on the CMake side.

Does that mean bootstrapping the build tools as well? CMake is written in C++, so that would be a no-go in that case.

That’s a good question. I would hope the answer is “no”, but that would have to be discussed, I guess :slight_smile:

Easy: We shouldn’t waste our time supporting platforms that do not have a modern C++ compiler. AFAICT: g++ and clang are available for anything that matters.

2 Likes

I ran across this thread because I am working on rejuvenating Autoconf and Autotools. A new release (2.70) will be coming in a few weeks, and there's a testable beta out now. It would be good for folks interested in CPython's usage of Autoconf to test it with Autoconf 2.69c and file bugs, and to notice whether the concomitant friction is enough to swing their opinion towards or away from staying with Autotools.

Are there specific platforms that CPython officially supports, wants to continue supporting, and are not supported by Meson and/or CMake?

As brought up in earlier replies, the main problem remains that the current Python build system is a hybrid that has evolved over the past decades. It is also two separate build systems: one for Windows and one for all other (Unix-y) platforms, with various platform-specific tweaks. The various competing build systems, like Autotools and CMake, have evolved greatly over the years, for example in their support for cross-compilation - something that we have to painfully hack on in the current build system every time someone proposes supporting a platform via cross-compilation.

A big complication is the Unix-y build system's bootstrap use of the interpreter being built, and of its copy of Distutils, to build much of the rest of the standard library. Because Python's Unix-y build system has evolved its own solutions to various issues over the years, it doesn't automatically take advantage of new features provided by new releases of Autoconf, and trying to hack them into the current state of things is likely more painful than starting from scratch. There is also still cruft in the current build system left over from platforms and releases that we no longer support.

Especially now, with the current efforts to deprecate and eventually move Distutils out of the standard library into setuptools (while possibly leaving a version of Distutils behind as a build tool just to build the standard library itself), it makes sense to start with a bit of a blank page: try implementing potential replacements with one or more build systems, following each one's recommended best practices, and see which works out best. That would give us a much more solid base to build on for the future. It might also be an opportunity to move to a single build system rather than the two we have today, although that isn't necessarily an over-arching requirement, as the current Windows build system is platform-focused and thus likely has much less residual cruft. But that's all going to take some focused effort, probably best done as an official, funded project. Doing anything less at this point would be mainly wasted effort, IMO.

4 Likes

configure.ac, Makefile.pre.in and setup.py don't rely much on autotools features. They're mostly a long list of corner cases to support specific platforms.

For example, the detect_openssl_hashlib() code to configure OpenSSL and build the _ssl and _hashlib modules takes 55 lines of code in setup.py. A worse example is detect_readline_curses() in setup.py, which takes 150 lines to build the readline, _curses and _curses_panel extensions.
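For readers who haven't opened setup.py: those detection functions roughly follow this pattern - a condensed, hypothetical sketch rather than the real code - probe hardwired directory lists for headers and libraries, then decide whether the extension gets built at all:

```python
# Condensed, hypothetical sketch of the setup.py detection pattern:
# probe well-known directories for headers and libraries, then decide
# whether and how to build an extension module.
# (Linux-style library names used for brevity.)
import os
from distutils.core import Extension

LIB_DIRS = ["/usr/lib", "/usr/local/lib", "/usr/lib/x86_64-linux-gnu"]
INC_DIRS = ["/usr/include", "/usr/local/include"]

def find_dir(candidates, filename):
    for d in candidates:
        if os.path.exists(os.path.join(d, filename)):
            return d
    return None

def detect_readline():
    inc = find_dir(INC_DIRS, "readline/readline.h")
    lib = find_dir(LIB_DIRS, "libreadline.so")
    if inc is None or lib is None:
        return None  # extension silently skipped -- a classic failure mode
    return Extension("readline", ["readline.c"],
                     include_dirs=[inc], library_dirs=[lib],
                     libraries=["readline"])
```

Multiply that by every optional extension, each with its own quirks, and you get the line counts above.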

configure.ac has a lot of code to tune compiler and linker flags for best performance, to implement PGO and LTO optimizations, etc. We attempt to support the GCC, clang, ICC and XLC compilers.
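The structure of that code is essentially a per-compiler dispatch table. A rough sketch of the idea in Python (the flags shown are the well-known options for each compiler, illustrative only, not necessarily CPython's exact choices; XLC is omitted):

```python
# Illustrative only: the kind of per-compiler flag dispatch that
# configure.ac implements in shell/m4.
LTO_FLAGS = {
    "gcc": ["-flto"],
    "clang": ["-flto"],
    "icc": ["-ipo"],
}

PGO_GEN_FLAGS = {
    "gcc": ["-fprofile-generate"],
    "clang": ["-fprofile-instr-generate"],
}

def optimization_flags(cc_name, lto=False, pgo_generate=False):
    """Collect optimization flags appropriate for the given compiler."""
    flags = []
    if lto:
        flags += LTO_FLAGS.get(cc_name, [])
    if pgo_generate:
        flags += PGO_GEN_FLAGS.get(cc_name, [])
    return flags
```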

Yet another example: I wanted to use clock_gettime() in Python for better clocks. But this function is not always available in the C library. With old glibc versions and on Solaris, Python must be linked to librt to get the function. configure.ac now takes care of that.
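In autoconf this is a one-liner, AC_SEARCH_LIBS(clock_gettime, rt), but any replacement build system needs an equivalent probe. A hedged sketch of the logic in plain Python (helper names hypothetical):

```python
# Hypothetical sketch of autoconf's AC_SEARCH_LIBS(clock_gettime, rt)
# check: try to link a trivial program calling clock_gettime(), first
# with no extra library, then with -lrt.
import os
import subprocess
import tempfile

SRC = "char clock_gettime(void); int main(void){ clock_gettime(); return 0; }\n"

def links_ok(extra_libs, cc="cc"):
    with tempfile.TemporaryDirectory() as tmp:
        c_file = os.path.join(tmp, "check.c")
        with open(c_file, "w") as f:
            f.write(SRC)
        cmd = [cc, c_file, "-o", os.path.join(tmp, "a.out"), *extra_libs]
        return subprocess.run(cmd, capture_output=True).returncode == 0

def libs_for_clock_gettime():
    if links_ok([]):
        return []        # available directly in libc
    if links_ok(["-lrt"]):
        return ["-lrt"]  # old glibc, Solaris
    return None          # not available at all
```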


I understand that rewriting the Python build system is appealing, but how do you plan to reimplement all these quirks in the new system? Do you expect less and simpler code?

In my experience, a migration is always painful and comes with its own set of new issues. For example, we might lose support for some obscure platforms.

Also, is the idea of the migration to remove the old build system, or to add a new one? I understand that one of the reasons for a new system is that we maintain two build systems (autotools on Unix and a Visual Studio project on Windows). Adding a third one would increase the maintenance burden.

If someone wants to experiment with a new build system for Python, would it make sense to first attempt to develop it outside Python? Or do these tools require that the build scripts live in the main source tree directory?

I suggest writing a PEP if someone wants to change the build system, since it impacts many people and there are many trade-offs to discuss! I'm not strongly opposed to changing it; I'm frequently annoyed by its complexity and by having to maintain PCbuild/ as well.


On my side, I refactored setup.py to split the code into many sub-functions.

Recently, I modified distutils to use the regular subprocess module to spawn child processes. I had to implement a _bootsubprocess.py module to bootstrap Python (setup.py requires subprocess to build subprocess…).
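The idea behind such a bootstrap module can be sketched in a few lines - this is a simplified, hypothetical version, not the actual _bootsubprocess.py, using only low-level os primitives that work before the extension modules subprocess depends on have been built:

```python
# Simplified, hypothetical sketch of a bootstrap process-spawning helper:
# fork/exec/wait using only the os module, since the real subprocess
# module cannot be imported before its supporting extension modules exist.
import os

def run_command(argv):
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # killed by a signal
```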

For configure.ac: well, I'm trying hard to avoid it. For example, to support old GCC and clang versions, I'm trying to use the preprocessor rather than configure.ac! See https://bugs.python.org/issue41617.

2 Likes