Building CPython from source, whether for local development by contributors or for releases, requires various dependencies (e.g., zlib, Tcl/Tk).
Having run into technical limitations with the status quo recently (specifically with LLVM), we (@emmatyping, @savannahostrowski, and @itamaro) figured it would make sense to revisit the entire architecture and workflow from first principles, making it simpler, more consistent, and easier to maintain and sustain.
Status Quo: Developer Builds (in a nutshell)
For “regular developer builds” on Unix-y systems (including Linux, macOS, FreeBSD, etc.), dependency management is mostly up to the developer and their system package manager.
Typically, tools and libraries required for the build will be set up by the developer using a package manager (e.g., dnf, apt, brew), and picked up by the CPython build using autoconf and pkg-config, with the developer able to customize things using configure options and environment variables.
On Windows, “regular developer builds” include fetching external-deps from cpython-source-deps and cpython-bin-deps and caching them locally.
As support for new platforms is added, similar patterns are used (e.g., android-source-deps, apple-source-deps), with similar approaches to fetching the dependencies during the build (e.g., android build script, emscripten build script). However, each platform’s method of building and fetching dependencies is different.
Status Quo: Release Builds (in a nutshell)
- CPython doesn’t provide official Linux packages, so this is out of scope.
- The Windows release build uses the external-deps script to fetch its dependencies.
- The macOS release build uses a build-installer script to fetch third-party libraries and build them from source. Neither cpython-source-deps nor cpython-bin-deps are used. For JIT builds, macOS builds rely on Homebrew for LLVM.
What We Want to Retain From the Status Quo
- Flexibility for development builds - Contributors can build CPython against system-provided libraries and should not be forced to use vendored dependencies in dev builds.
- Separation between CPython source and vendored dependencies - Keep vendored dependencies outside of the main CPython repo so that we keep the main tree small for contributors who don’t need the dependencies.
- Single source for dependency downloads - On Windows, the cpython-bin-deps model centralizes all dependency downloads from GitHub. This consolidation helps avoid CI flakiness by removing reliance on multiple registries (e.g., LLVM downloads from Chocolatey fail during Windows registry outages, causing jobs to fail).
Issues With the Status Quo
- Technical limitations with cpython-bin-deps – storing binary artifacts in source control is not the best idea, and GitHub enforces a file size limit of 100 MB.
- Consistency & Maintainability – using different approaches for different platforms (or between development and release builds) complicates changes, thereby increasing the maintenance burden and risk exposure.
- Reproducibility & Provenance – it is not always possible to tell which versions of which dependencies with what additional patches are used, or even what is the authoritative source of vendored source deps or checked-in binary blobs. Downstream CPython redistributors often need to “reverse engineer” the CPython build process and patched dependencies.
Goals For a Redesign
By redesigning dependency management, we aim to reach a simpler end state while addressing the shortcomings of the status quo. We’d like to see:
- A single place and a single workflow for keeping track of dependencies across versions and target platforms.
- A well-documented process for adding and changing dependencies that lends itself to automation, reproducibility, and supply chain security.
- A naturally extensible design that can accommodate future platforms, as well as Linux binary builds if we choose to start providing them at some point in the future.
- A system that can be consumed and reused by redistributors if they choose to.
By redesigning dependency management, we do not intend to take away control from contributors or redistributors building CPython from source.
Proposal Highlights
While the full details are still a work in progress and should be ironed out in the eventual PEP, we would like to share the core principles and direction we have in mind so far, to solicit early feedback and suggestions.
- Use a single GitHub repo (e.g., “cpython-deps”) that will replace the various source/bin/android/apple deps repos.
- Keep the repo separate from the main cpython repo to avoid unnecessary bloat for contributors who don’t need the deps.
- The repo source tree will contain sources, patches, metadata, and build definitions.
- Binary artifacts will be uploaded as GitHub Releases.
- If we build the binary artifacts from the checked-in sources, then the published GitHub Releases should be built and uploaded using GitHub Actions workflows.
- Dependencies will be fetched in the CPython repository using a single tool (superseding the existing fragmented scripts) that platform-specific scripts can reuse.
- Dependency metadata (e.g. version, download URL, hash) will be stored in the main CPython repo as manifest files pointing to specific releases (for binaries) or tags (for source) in the deps repo.
- This allows tracking dependencies (and their evolution) using standard Git branching and history.
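
To make the manifest idea above a bit more concrete, here is a minimal Python sketch of what a manifest entry and the fetch tool's integrity check might look like. The JSON layout, the field names (`name`, `version`, `url`, `sha256`), the helper names, and the example URL are all assumptions made for illustration; nothing here has been decided as part of the proposal.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DepEntry:
    """One pinned dependency (hypothetical schema, not part of the proposal)."""
    name: str
    version: str
    url: str      # release asset (binaries) or tag archive (sources) in the deps repo
    sha256: str   # expected digest of the downloaded artifact


def load_manifest(text: str) -> dict[str, DepEntry]:
    """Parse a JSON manifest into entries keyed by dependency name."""
    raw = json.loads(text)
    return {item["name"]: DepEntry(**item) for item in raw["dependencies"]}


def verify_artifact(entry: DepEntry, data: bytes) -> None:
    """Raise ValueError if the downloaded bytes do not match the pinned hash."""
    digest = hashlib.sha256(data).hexdigest()
    if digest != entry.sha256:
        raise ValueError(
            f"{entry.name} {entry.version}: expected sha256 {entry.sha256}, got {digest}"
        )


# Illustrative usage with in-memory stand-ins instead of a real download.
payload = b"pretend this is a dependency archive"
manifest_text = json.dumps({
    "dependencies": [{
        "name": "zlib",
        "version": "0.0.0",  # placeholder, not a real pin
        "url": "https://example.invalid/cpython-deps/zlib-0.0.0.tar.gz",  # made-up URL
        "sha256": hashlib.sha256(payload).hexdigest(),
    }]
})
manifest = load_manifest(manifest_text)
verify_artifact(manifest["zlib"], payload)  # passes silently when the hash matches
```

Because a manifest like this would live in the main CPython repo, bumping a dependency becomes an ordinary, reviewable diff to one file, and each maintenance branch can pin its own versions.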
Some possible implementation details in the cpython-deps repo:
- A single branch per library+version? Or a single-branch repo with top-level directory per dependency and sub-directories per version?
- Dependency metadata should include upstream source info that can be independently verified (e.g., git tag + commit sha + hash over content).
- Currently, (some) Windows binary dependencies are pre-signed by release managers (RMs). We propose moving the RM-signing to the release process instead of the binary-dep-publishing process.
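
One way to read the "git tag + commit sha + hash over content" item above is sketched below in Python. The `UpstreamSource` record, its field names, and the hashing scheme (SHA-256 over sorted relative paths and file bytes) are assumptions chosen for illustration, not a format this proposal specifies. The sketch expects the caller to resolve the commit of the checkout separately (e.g., with `git rev-parse HEAD`) and to pass an exported tree without a `.git` directory, since every file under the root is hashed.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class UpstreamSource:
    """Hypothetical record of where a vendored source tree came from."""
    tag: str
    commit: str
    content_sha256: str


def tree_digest(root: Path) -> str:
    """Hash every file under root (relative path + bytes) in a stable order."""
    h = hashlib.sha256()
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    for rel, path in files:
        name = rel.encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so path and content boundaries are unambiguous.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def matches_upstream(meta: UpstreamSource, checkout: Path, checkout_commit: str) -> bool:
    """Check a checkout against the recorded commit and content hash."""
    return checkout_commit == meta.commit and tree_digest(checkout) == meta.content_sha256
```

Recording a content hash alongside the commit sha lets a redistributor verify a source tree obtained from a tarball or mirror, where no git history is available to check the commit against.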