Pre-PEP: Redesigning CPython Source & Binary Dependencies

Building CPython from source, whether for local development by contributors or for official releases, requires various dependencies (like zlib, Tcl/Tk, etc.).

Having run into technical limitations with the status quo recently (specifically with LLVM), we (@emmatyping, @savannahostrowski, and @itamaro) figured it would make sense to revisit the entire architecture and workflow from first principles, making it simpler, more consistent, and easier to maintain and sustain.

Status Quo: Developer Builds (in a nutshell)

For “regular developer builds” on Unix-y systems (including Linux, macOS, FreeBSD, etc.), dependency management is mostly up to the developer and their system package manager.

Typically, tools and libraries required for the build will be set up by the developer using a package manager (e.g., dnf, apt, brew), and picked up by the CPython build using autoconf and pkg-config, with the developer able to customize things using configure options and environment variables.

On Windows, “regular developer builds” include fetching external dependencies from the cpython-source-deps and cpython-bin-deps repos and caching them locally.

As support for new platforms is added, similar patterns are used (e.g., android-source-deps, apple-source-deps), with similar approaches to fetching the dependencies during the build (e.g., android build script, emscripten build script). However, each platform’s method of building and fetching dependencies is different.

Status Quo: Release Builds (in a nutshell)

  • CPython doesn’t provide official Linux packages, so this is out of scope.

  • The Windows release build uses the external-deps script to fetch its dependencies.

  • The macOS release build uses a build-installer script to fetch third-party libraries and build them from source. Neither cpython-source-deps nor cpython-bin-deps are used. For JIT builds, macOS builds rely on Homebrew for LLVM.

What We Want to Retain From the Status Quo

  1. Flexibility for development builds – Contributors can build CPython against system-provided libraries and should not be forced to use vendored dependencies in dev builds.

  2. Separation between CPython source and vendored dependencies – Keep vendored dependencies outside of the main CPython repo so that we keep the main tree small for contributors who don’t need the dependencies.

  3. A single source for dependency downloads – On Windows, the cpython-bin-deps model centralizes all dependency downloads on GitHub. This consolidation reduces CI flakiness by avoiding reliance on multiple registries (e.g., LLVM downloads from Chocolatey fail during registry outages, causing jobs to fail).

Issues With the Status Quo

  1. Technical limitations with cpython-bin-deps – storing binary artifacts in source control bloats the repository over time, and GitHub enforces a 100 MB per-file size limit.

  2. Consistency & Maintainability – using different approaches for different platforms (or between development and release builds) complicates changes, thereby increasing the maintenance burden and risk exposure.

  3. Reproducibility & Provenance – it is not always possible to tell which versions of which dependencies are used, what additional patches were applied, or even what the authoritative source of vendored source deps or checked-in binary blobs is. Downstream CPython redistributors often need to “reverse engineer” the CPython build process and patched dependencies.

Goals For a Redesign

By redesigning dependency management, we aim to reach a simpler end state while addressing the shortcomings of the status quo. We’d like to see:

  • A single place and a single workflow for keeping track of dependencies across versions and target platforms.

  • A well-documented process for adding and changing dependencies that lends itself to automation, reproducibility, and supply chain security.

  • A naturally extensible design that can accommodate future platforms, including Linux binary builds if we choose to start providing them at some point in the future.

  • A system that can be consumed and reused by redistributors if they choose to.

  • Importantly, we do not intend to take away control from contributors or redistributors building CPython from source.

Proposal Highlights

While the full details are still work-in-progress and should be ironed out in the eventual PEP, we would like to share the core principles and direction we have in mind so far, to solicit early feedback and suggestions.

  • Use a single GitHub repo (e.g., “cpython-deps”) that will replace the various source/bin/android/apple deps repos.
    • Keep the repo separate from the main cpython repo to avoid unnecessary bloat for contributors who don’t need all the deps.
  • The repo source tree will contain sources, patches, metadata, and build definitions.
  • Binary artifacts will be uploaded as GitHub Releases.
    • If we build the binary artifacts from the checked-in sources, then the published GitHub Releases should be built and uploaded using GHA Workflows.
  • Dependencies will be fetched in the CPython repository using a single tool (superseding the existing fragmented scripts) and can be reused by platform-specific scripts.
  • Dependency metadata (e.g. version, download URL, hash) will be stored in the main CPython repo as manifest files pointing to specific releases (for binaries) or tags (for source) in the deps repo (see the sketch after this list).
    • This allows tracking dependencies (and their evolution) using standard Git branching and history.
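
To make the manifest idea concrete, here is a minimal sketch of what such a file and the fetch step could look like. The file location, field names, and `fetch_and_verify` helper are all hypothetical placeholders, not a settled format:

```python
# Hypothetical manifest entry, e.g. a Misc/externals/zlib.toml file in the
# main CPython repo. The layout and field names are illustrative only:
#
#   [zlib]
#   version = "1.3.1"
#   kind = "binary"   # "binary" -> GitHub Release asset, "source" -> git tag
#   url = "https://github.com/python/cpython-deps/releases/download/..."
#   sha256 = "<hex digest of the artifact>"

import hashlib
import tomllib
import urllib.request


def fetch_and_verify(manifest_path: str, name: str, dest: str) -> None:
    """Download one dependency listed in a manifest and check its hash."""
    with open(manifest_path, "rb") as f:
        entry = tomllib.load(f)[name]
    with urllib.request.urlopen(entry["url"]) as resp:
        data = resp.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != entry["sha256"]:
        raise RuntimeError(f"{name}: hash mismatch ({digest} != {entry['sha256']})")
    with open(dest, "wb") as f:
        f.write(data)
```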

Some possible implementation details in the cpython-deps repo:

  • A single branch per library+version? Or a single-branch repo with top-level directory per dependency and sub-directories per version?

  • Dependency metadata should include upstream source info that can be independently verified (e.g., git tag + commit sha + hash over content); a rough verification sketch follows this list.

  • Currently, (some) Windows binary dependencies are pre-signed by release managers (RMs). We propose moving RM signing to the release process instead of the binary-dep-publishing process.
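
As an illustration of what independently verifiable upstream info could mean in practice, here is a rough sketch; the metadata fields, the example upstream URL, and the `verify` helper are all hypothetical:

```python
# Rough sketch of verifying recorded upstream source info for a dependency.
# All metadata values below are placeholders.

import hashlib
import subprocess

metadata = {
    "upstream": "https://github.com/madler/zlib",  # example upstream repo
    "tag": "v1.3.1",
    "commit": "<expected commit sha>",
    "content_sha256": "<expected digest>",
}


def verify(clone_dir: str) -> None:
    # 1. The recorded tag must resolve to the recorded commit.
    commit = subprocess.check_output(
        ["git", "-C", clone_dir, "rev-list", "-n", "1", metadata["tag"]],
        text=True,
    ).strip()
    if commit != metadata["commit"]:
        raise RuntimeError(f"tag resolves to {commit}, expected {metadata['commit']}")
    # 2. The tree content must hash to the recorded digest; `git archive`
    #    produces a stable tar stream for a given commit (modulo git version).
    archive = subprocess.check_output(
        ["git", "-C", clone_dir, "archive", "--format=tar", metadata["commit"]]
    )
    digest = hashlib.sha256(archive).hexdigest()
    if digest != metadata["content_sha256"]:
        raise RuntimeError(f"content hash mismatch: {digest}")
```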

23 Likes

Sounds like a great thing to put your combined intellectual powers to work!

(Less glamorous but more useful than a notation for the empty set. :slight_smile:)

5 Likes

This sounds like a really useful endeavour. A couple of suggestions for existing projects that may be worth investigating for ideas:

  • the python-build-standalone build process
  • the dist-git artifact management tooling used in Fedora and its derivatives (specifically, the lookaside cache design)

4 Likes

I think this proposal needs to distinguish between build requirements and run requirements. In particular, AFAIK, the LLVM requirement driving some of this is only a build-time requirement: to build the JIT stencils. Outside of building CPython itself, I believe there is no need for runtime components from LLVM for JIT usage.

This is one example of where it is important to keep separate the requirements for building and for running Python. Another that comes to mind is cross-compiling, and the macOS special case of universal builds (a special variant of cross-compiling). Life is much simpler when building and executing in the same machine environment; for example, taking advantage of operating system-supplied or third-party-supplied libraries. When building something to be executed on other or multiple systems, such as part of a distribution (for our Windows or macOS binaries, or by other third-party distributors of Python), one has to be careful about the execution environment; for example, a build for macOS that makes use of, say, Homebrew-supplied packages can’t automatically assume that those packages are installed in the targeted execution environment, which might not even be the same OS release or CPU architecture. Any potential builds of third-party libraries need to take these requirements into account or be clear about what targets are or are not supported.

4 Likes

Actually, in openSUSE we have split our Python package into two (well, three, since documentation is separate as well, and now we have no-GIL versions of the packages too): python-base and python, where the first was originally meant to have no external dependencies at all (we had to relax this design, because OpenSSL is so central to everything that we just cannot go without it), and the rest (with dependencies like libexpat, sqlite, curses, dbm, the whole Tcl/Tk business) is in the latter. We have to do rather awkward balancing acts of removing built .so modules, skipping tests, etc., which is brittle and error-prone; if the upstream Python build scripts allowed for this split naturally, I would be very glad to simplify our SPEC file.

2 Likes

One current feature that I rely on that I would like to preserve is being able to override the external packages at build time without modifying the sources (pre-download my own builds of the externals and override this property with an environment variable to locate them).

4 Likes

Also relevant here: there have been earlier discussions about the possibility of keeping updated versions of the generated JIT stencils in the cpython repo, to avoid requiring that everyone building a JIT-enabled CPython have LLVM (and specific versions of LLVM, at that) installed on their build system. That discussion stalled without reaching a conclusion. See Hosting `jit_stencils.h` · Issue #115869 · python/cpython · GitHub

1 Like

Well, I did write a PEP proposing a solution for hosting the stencils/removing the build-time dependency. The PEP is currently in a deferred state until the JIT is able to pay perf dividends - see PEP 774: Removing the LLVM requirement for JIT builds - #34 by gpshead

4 Likes

Managing dependencies for building CPython across platforms is often the hardest part — keeping build-time requirements like LLVM, Tcl/Tk, or OpenSSL aligned and reproducible is a constant pain, especially with cross-compiling or universal builds. What uv does for Python package installs with a fast, locked, reproducible environment would be amazing to have in the CPython build process itself, so we can get simpler, consistent, and predictable builds while still preserving the flexibility to override system packages when needed.

1 Like

Thanks Alyssa! We’re definitely interested in talking with the PBS folks (as well as any other redistributors) to make their lives easier.

The dist-git model seems pretty similar to what we’ve discussed as a potential solution, so glad to see other people are finding success with it!

I agree that build and run requirements (or, as we’ve been describing them, library and tool requirements) have different use cases and workflows. The main tools in use today are LLVM (for the JIT) and WiX (I believe this is used for the legacy Windows installer). Even though they are used differently from runtime libraries, using similar methods to cache and download them is handy.

One thing we’ve discussed is that tools should not be compiled from source, but rather downloaded and re-hosted (with metadata about where the binaries come from) to improve CI reliability. See the “Upstream Binaries” portion of the diagram. This matches what we already do for LLVM and WiX.

Also agree here. I was thinking we’d start with generating binaries for the platforms we already produce and host binaries for (subject to change, of course):

  • Windows

    • win32
    • amd64
    • arm64
  • Android

    • aarch64-linux-android
    • arm-linux-androideabi
    • i686-linux-android
    • x86_64-linux-android
  • Apple

    • appletvos.arm64
    • appletvsimulator.arm64
    • appletvsimulator.x86_64
    • iphoneos.arm64
    • iphonesimulator.arm64
    • iphonesimulator.x86_64
    • macabi.arm64
    • macabi.x86_64
    • watchos.arm64
    • watchsimulator.arm64
    • watchsimulator.x86_64
    • xros.arm64
    • xrsimulator.arm64

The Apple ecosystem targets definitely could be expanded to include universal/framework builds.

This is really useful to know! Thank you for the context.

I don’t think we want to specifically call to unvendor expat or other libraries that are already vendored in CPython in the PEP, but I do think we want to define the criteria for vendoring a library in the cpython sources.

Makes sense! I’ve used the --organization setting of get_externals.bat several times myself. Overriding dependencies is definitely going to be important in all sorts of distribution scenarios.
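
For illustration, an override hook in a unified fetch tool could be as simple as the following sketch; the PYTHON_EXTERNALS_DIR variable name, the directory layout, and the helper functions are hypothetical, not a commitment:

```python
# Hypothetical override hook for a unified dependency-fetch tool, in the
# spirit of today's get_externals.bat overrides. All names here are made up.

import os
import pathlib


def download_from_deps_repo(name: str, version: str) -> pathlib.Path:
    raise NotImplementedError("stand-in for the real fetch-and-verify step")


def resolve_dependency(name: str, version: str) -> pathlib.Path:
    """Prefer a pre-downloaded local copy over fetching from cpython-deps."""
    override = os.environ.get("PYTHON_EXTERNALS_DIR")
    if override:
        local = pathlib.Path(override) / f"{name}-{version}"
        if local.exists():
            # Use the developer- or redistributor-provided build as-is.
            return local
    return download_from_deps_repo(name, version)
```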

1 Like

Reading what’s being proposed from a WASI perspective, I don’t see anything that would be an issue for when I start supporting our external dependencies for WASI. The only odd thing in that situation is WASI wanting static linking, so ending up with .o files instead of .so as the GH Actions output for bin-deps, but I’m not seeing anything that suggests that wouldn’t be doable.

I’m also fine being a test subject if necessary for any workflow as I was planning to automate this behind a Tools/wasm/wasi externals command that pulled in source-deps, got pkg-config working, and then compiled everything but OpenSSL statically.

2 Likes

How portable across different compiler versions[1] are those .o files?


  1. Not different compilers, but different versions of the same compiler. I’m used to the Windows world where the answer is “not at all”, but maybe LLVM has a more stable interface for .o files. ↩︎

I would assume reasonably, but I honestly don’t know for sure as I haven’t gotten to the point of having to worry about it. :sweat_smile:

I also think that’s a @pablogsal question. :wink:

AFAIK no compiler has guaranteed cross-version .o compatibility. Clang is looser and more likely to work than MSVC, but still fragile across significant version jumps. That being said, the .o files are normally consumed by the linker, and a .o from Clang 10 will usually be syntactically readable by Clang 18’s linker (or lld). But even if the object file format is fine, the C/C++ ABI (name mangling, struct layout, exception tables, etc.) can change subtly.

So I assume the answer is “it depends”, and as far as I know it is not guaranteed :slight_smile:

3 Likes

I don’t want to derail this into a WASI-specific thing, but the WASI SDK is not forwards or backwards compatible with other versions. As such, if we say upfront what WASI SDK version a Python version targets, then we could build dependencies as .o files that would work for a specific Python version since they would be pinned to a WASI SDK version anyway.

But building from source is obviously also fine and what I was initially planning on doing.

In short - a big +1 from me!

I’m also interested in contributing to this overall effort. Using the BeeWare repos for Android and iOS as a source of binary artefacts is definitely a short-term solution, and not one that I want to maintain long term, so I’m happy to contribute to any efforts that will allow me to divest that ownership/responsibility (…or, at least, move that responsibility to a different repository).

In terms of the open questions on the design: my general preference would be to use a single repo with a subdirectory for each library, rather than branch-per-library - but that’s mostly born out of the convenience of being able to run make all in my local copy. However, as long as the developer workflow is relatively clean, I’m not mortally bound to that design choice.

Regarding the Apple, non-macOS ecosystem targets - the current build system requires exclusively static library builds; however, if CPython is providing a reliable source of binaries, then switching to framework builds (especially for OpenSSL) might be worth looking into.

Lastly - with my Emscripten hat on: currently we do a dependency build as part of every release, rather than storing binaries. We’ve been contemplating reworking the build script to add caching of these built products; but if there’s an effort underway to add a single repository of prebuilt dependency binaries, that’s even better.

The complication is that we need to release binaries for specific Emscripten versions. Emscripten doesn’t provide ABI compatibility between releases (even between micro releases), so CPython is adopting a policy of pinning the Emscripten version for each Python version (it will be EMSDK 4.0.12 for Python 3.14, for example). As new Python versions become available, we’ll need to add additional Emscripten SDK versions to the build set, and produce a binary for each Python release. We’ll also need to be able to add new SDK targets easily as CPython’s main develops - if we bump the SDK used by CPython, we’ll need to be able to easily trigger the addition of binaries for that new SDK.
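
To illustrate the pinning model, a toy sketch (only the 3.14/EMSDK 4.0.12 pairing reflects the policy described above; the names and structure are made up):

```python
# Toy sketch of pinning an Emscripten SDK per Python version. Only the
# (3, 14) -> "4.0.12" entry reflects the stated policy; everything else
# here is hypothetical.

EMSDK_PINS: dict[tuple[int, int], str] = {
    (3, 14): "4.0.12",
    # New entries get added as CPython's main bumps the pinned SDK, and the
    # deps repo builds binaries for each pinned SDK version.
}


def emsdk_for(python_version: tuple[int, int]) -> str:
    try:
        return EMSDK_PINS[python_version]
    except KeyError:
        raise LookupError(f"no EMSDK pinned for Python {python_version}") from None
```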

10 Likes

Just wanted to drop a note here that @emmatyping, @itamaro, and I are going to start drafting a PEP for this now that some time has passed and folks have had a chance to provide feedback.

@freakboy3742 We’ll follow up with you as well to ensure that the proposal accommodates both Android and iOS deps.

9 Likes