Build Dependency Specification for Manylinux Wheels (Idea for a PEP)

Hello! I’ve been working on a PEP that I want to bring to the attention of the Python community for feedback and critique.

Titled Build Dependency Specification for Manylinux Wheels, it proposes a data spec for capturing the steps required to set up a container for compiling a manylinux wheel. Some packages require modifications to the environment to build correctly (system dependencies installed, Python packages installed, etc.). Standardizing this data would be of great benefit, as it could be read by builders and installers and contributed to by the open source community.

This PEP was contributed to the python-ideas mailing list as an idea here

and was contributed as a draft PEP here

Any and all feedback is appreciated. Thank you!


Abstract

This document specifies how Python software packages should specify
what actions are required to modify a standard manylinux environment
to correctly build and bundle the package into a manylinux wheel.

Motivation

Python wheels with compiled extensions may link to system libraries,
requiring that those libraries be available on the host system
(the system the wheel is installed on) to operate correctly. The
manylinux project solves this problem through the auditwheel package,
which inspects a built wheel for the external shared libraries it
links against and bundles them into the wheel. This allows the Python
library, its compiled extensions, and any required system libraries
to be installed on a host system without having to install the system
libraries separately.

In an ideal world, all package authors would make use of
the manylinux project and all Python packages that require
system libraries would provide compiled, bundled distributions on
PyPI. However, this is not the case, and many packages do not.
There are valid instances where an author may not provide a manylinux
wheel by choice, for example when a required system library cannot be
bundled due to licensing. However, there are packages on PyPI that
do not provide a wheel when one could be provided. This
means that these packages require local compilation prior to use,
resulting in multiple negative side effects for the end user:

  • Required system libraries are not easily determined. They must be
    gleaned from project-specific documentation with no
    standardized format.
  • Compiling extensions can take a long time, adding additional
    expense to rebuilding environments.
  • Compiling a wheel that requires system libraries is non-trivial;
    it is easy to mismatch system library and Python library versions
    and be presented with cryptic error messages.

Some authors do provide manylinux wheels on PyPI by making use of
the manylinux project. However, the manylinux project does not
provide a standardized way to capture environment setup data.
This results in package authors keeping this data in project
documentation or sometimes not recording it at all.

Rationale

This PEP proposes a common format for the data required to correctly
set up manylinux environments to compile a wheel with required system
libraries. This concept is borrowed from package managers such as
RPM, which makes use of a .spec file to capture this data. This data
can be used in a manylinux container to set up the environment prior
to compiling, resulting in a valid manylinux wheel. This data can
be standardized to allow for automated building of manylinux wheels.
Standardization of this data will allow package consumers to more
easily contribute to building manylinux wheels when an existing
distribution is lacking or not available.

Specification

The data will be located in the pyproject.toml file of a Python
project, in a main table titled manylinux_build_specification. The
data will be grouped in sub-tables named after the manylinux version
they are targeting, e.g. manylinux2014.

pyproject.toml

[manylinux_build_specification.manylinux2014]
extra_base_system_repositories = ["http://foo.com/packages/"]
system_dependencies = ["foo-1.0.0", "bar-1.0.0"]
python_dependencies = ["foo==1.0.0", "bar==1.0.0"]
environment_variables = ["FOO=BAR"]
steps = [
  "./scripts/build_and_upload.sh --my_option"
]
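As a rough illustration of how a builder might read this table, here is a minimal sketch. The helper name load_build_spec is hypothetical and not part of the proposal, and the sketch assumes Python 3.11+ for tomllib (falling back to the third-party tomli package on older versions).

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# The example table from this draft, inlined so the sketch is self-contained.
PYPROJECT = """
[manylinux_build_specification.manylinux2014]
extra_base_system_repositories = ["http://foo.com/packages/"]
system_dependencies = ["foo-1.0.0", "bar-1.0.0"]
python_dependencies = ["foo==1.0.0", "bar==1.0.0"]
environment_variables = ["FOO=BAR"]
steps = [
  "./scripts/build_and_upload.sh --my_option"
]
"""


def load_build_spec(pyproject_text, target):
    """Return the sub-table for one manylinux target (hypothetical helper)."""
    data = tomllib.loads(pyproject_text)
    table = data.get("manylinux_build_specification", {})
    if target not in table:
        raise KeyError(f"no build specification for {target!r}")
    return table[target]


spec = load_build_spec(PYPROJECT, "manylinux2014")
print(spec["system_dependencies"])
```

A real consumer would read the text from the project's pyproject.toml rather than from an inline string.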

All actions will be performed within a manylinux image. Given that
the manylinux project uses CentOS as the base Linux distribution, we
can assume the following:

  • Use of yum for system package management
  • Python versions available in /opt/python/

extra_base_system_repositories

Additional yum repositories from which to download system
dependencies. They are added to yum prior to installing system
dependencies. This allows access to builds of system libraries with
the most up-to-date patches.

system_dependencies

System dependencies to install with yum prior to building.
Entries are expected to be in yum name-version format.
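
To make the two keys above concrete, here is a sketch of how a builder might turn them into yum invocations. The function name yum_commands is hypothetical. Using yum-config-manager (from the yum-utils package) to add repositories is an assumption on my part; the draft does not say which tool adds repositories, and that tool may need to be installed in the image first. The sketch only builds the command lists and does not run them.

```python
def yum_commands(spec):
    """Build yum command lines from a build-spec table (hypothetical helper).

    Assumption: repositories are added with `yum-config-manager --add-repo`,
    which is provided by yum-utils. The draft does not mandate this tool.
    """
    commands = []
    for repo_url in spec.get("extra_base_system_repositories", []):
        commands.append(["yum-config-manager", "--add-repo", repo_url])

    dependencies = spec.get("system_dependencies", [])
    if dependencies:
        # Entries are passed through unchanged, in yum name-version format.
        commands.append(["yum", "install", "-y", *dependencies])
    return commands


example_spec = {
    "extra_base_system_repositories": ["http://foo.com/packages/"],
    "system_dependencies": ["foo-1.0.0", "bar-1.0.0"],
}
for command in yum_commands(example_spec):
    print(" ".join(command))
```

Repositories are added before the install command so that the extra packages are resolvable when yum runs.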

environment_variables

Environment variables to set prior to building.
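
A small sketch of how the NAME=VALUE entries might be parsed and layered on top of the current environment. The helper names are hypothetical, and splitting on the first "=" only is my assumption, since the draft does not define the entry grammar.

```python
import os


def parse_environment_variables(entries):
    """Parse ["NAME=VALUE", ...] into a dict (hypothetical helper).

    Assumption: only the first "=" separates name from value, so values
    may themselves contain "=".
    """
    parsed = {}
    for entry in entries:
        name, separator, value = entry.partition("=")
        if not separator or not name:
            raise ValueError(f"expected NAME=VALUE, got {entry!r}")
        parsed[name] = value
    return parsed


def build_environment(entries, base=None):
    """Return a copy of the base environment with the entries applied."""
    env = dict(os.environ if base is None else base)
    env.update(parse_environment_variables(entries))
    return env


print(parse_environment_variables(["FOO=BAR"]))
```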

python_dependencies

Python libraries to install with pip prior to building.
They will be installed for each version of Python available in
/opt/python/.
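
One way a builder might cover every interpreter is sketched below. The function name pip_install_commands is hypothetical, and the sketch assumes each interpreter lives at /opt/python/<tag>/bin/python, as in the manylinux images. It only builds the command lists and does not run them.

```python
from pathlib import Path


def pip_install_commands(python_dependencies, python_root="/opt/python"):
    """Build one `python -m pip install` command per interpreter found.

    Hypothetical helper. Assumes interpreters are laid out as
    <python_root>/<tag>/bin/python.
    """
    if not python_dependencies:
        return []
    commands = []
    for interpreter in sorted(Path(python_root).glob("*/bin/python")):
        commands.append(
            [str(interpreter), "-m", "pip", "install", *python_dependencies]
        )
    return commands


for command in pip_install_commands(["foo==1.0.0", "bar==1.0.0"]):
    print(" ".join(command))
```

Invoking pip through each interpreter with -m pip keeps the installation tied to that specific Python version.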

steps

Steps to be executed sequentially with bash. The entire build process
can be captured here, or this can be a call to a separate script.
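
Sequential execution could look roughly like the sketch below. The function name run_steps is hypothetical, and stopping at the first failing step is my assumption, since the draft does not say how failures should be handled.

```python
import subprocess


def run_steps(steps, env=None, cwd=None):
    """Run each step in order with bash (hypothetical helper).

    Assumption: a non-zero exit status aborts the remaining steps.
    subprocess.run(..., check=True) raises CalledProcessError in that case.
    """
    for step in steps:
        subprocess.run(["bash", "-c", step], check=True, env=env, cwd=cwd)


if __name__ == "__main__":
    # Harmless demonstration steps; a real spec would list build commands.
    run_steps(["echo 'step one'", "echo 'step two'"])
```

Passing env lets the builder apply the environment_variables entries to every step without modifying its own process environment.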

How to Teach This

This will be taught through examples and documentation provided in
a reference implementation.

Reference Implementation

A reference implementation is currently in development. This will
include the following:

  1. The data spec will be defined in an example Python package.
  2. A Python package will be created that consumes the data spec
     and sets up a manylinux container appropriately.
  3. A manylinux Docker image will be created that runs the
     Python package that consumes the data spec prior to building
     a wheel.

References

  • PEP 508 -- Dependency specification for Python Software Packages <https://www.python.org/dev/peps/pep-0508/>
  • PEP 518 -- Specifying Minimum Build System Requirements for Python Projects <https://www.python.org/dev/peps/pep-0518/>
  • PEP 571 -- The manylinux2010 Platform Tag <https://www.python.org/dev/peps/pep-0571/>
  • PEP 599 -- The manylinux2014 Platform Tag <https://www.python.org/dev/peps/pep-0599/>
  • PEP 600 -- Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions <https://www.python.org/dev/peps/pep-0600/>
  • PEP 631 -- Dependency specification in pyproject.toml based on PEP 508 <https://www.python.org/dev/peps/pep-0631/>
  • RPM Packaging Guide: What is a SPEC File? <https://rpm-packaging-guide.github.io/#what-is-a-spec-file>

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

Is anyone interested in commenting on the work above? I’m still looking for someone to act as a sponsor for this PEP. Thank you!