PEP 652: Maintaining the Stable ABI

Hello!
Here is a text that I intend to turn into a PEP soon.
It renders OK-ish on Discourse, but it’s not Markdown. You can get a better rendered (and possibly updated) copy in my repo.

An earlier version was meanderingly discussed on capi-sig.

Update: this is now PEP 652.

Abstract

Python’s Stable ABI and Limited API, introduced in :pep:384,
will be formalized in a single definitive file, tested, and documented.

Motivation

:pep:384 defined a limited API and stable ABI, which allows extenders and
embedders of CPython to compile extension modules that are binary-compatible
with any subsequent version of 3.x.
In theory, this brings several advantages.

A module can be built only once per platform and support multiple versions
of Python, reducing time, power and maintainer attention needed for builds
(in exchange for potentially worse performance).

Binary wheels using the stable ABI work with new versions of CPython
throughout the pre-release period, and can be tested in environments where
building from source is not practical.

Also, if generators like Cython support the stable ABI, some projects
could offer stable-ABI wheels in addition to specific-ABI ones, in order
to support future (alpha/beta/early release) interpreters, while still
using version-specific optimizations on CPython versions they target
specifically.

A welcome side effect of the limited API’s hiding of implementation details
is that this API is becoming a viable target for alternate Python
implementations that would be incompatible with the full C API.

However, in hindsight, PEP 384 and its implementation have several issues:

  • It is ill-defined. According to PEP 384, functions are opt-out:
    all functions not specially marked are part of the stable ABI.
    In practice, for Windows there’s a list that’s opt-in.
    For users there is a #define that should make only the stable ABI
    available, but there is no process that ensures it is kept up to date.
    Neither is there a process for updating the documentation.
  • Until recently, the stable ABI was not tested at all. It tends to break.
    For example, changing a function to a macro can break the stable ABI as the
    function symbol is removed.
  • There is no way to deprecate parts of the limited API.
  • It is incomplete. Some operations are not available in the stable ABI,
    with little reason except “we forgot”.
    (This last point is one the PEP will not help with, however.)

This PEP defines the limited API more clearly and introduces processes
designed to make the stable ABI and limited API more useful and robust.

Rationale

This PEP contains a lot of clarifications and definitions, but just one big
technical change: the stable ABI will be explicitly listed in
a human-maintained “manifest” file.

There have been efforts to collect such lists automatically, e.g. by scanning
the symbols exported from Python.
Such automation might seem easier to maintain than a handcrafted file,
but has major issues: for example, the set of exported symbols has
platform-specific variations.
Also, the cost of updating an explicit manifest is small compared
to the overall work that should go into changing API that will need to
be supported forever (or until Python 3 reaches end of life, if that
comes sooner).

This PEP proposes automatically generating things from the manifest:
initially documentation and DLL contents, with later possibilities
for also automating tests.

Stable ABI vs. Limited API

:pep:384 and this document deal with the Limited API and the Stable ABI,
two related but distinct concepts.
This section clarifies what they mean and defines some of their semantics
(either pre-existing or newly proposed here).

The word “Extensions” is used as a shorthand for all code that uses the
Python API, e.g. extension modules or software that embeds Python.

Stable ABI

The CPython Stable ABI is a promise that extensions built against
a specific Stable ABI version will be usable with any newer interpreter of the
same major version, on the same platform and with the same compiler & settings.
For example, an extension built with the CPython 3.10 Stable ABI will be usable
with CPython 3.11, 3.12, and so on, but not necessarily with 4.0.

The Stable ABI is not generally forward-compatible: an extension built and
tested with CPython 3.10 will not generally be compatible with CPython 3.9.

.. note::
For example, starting in Python 3.10, the Py_tp_doc slot may be set to
NULL, while in older versions, a NULL value will likely crash the
interpreter.
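
The compatibility promise above can be read as a rule on version pairs. The following is a minimal sketch, not part of CPython: the helper name `stable_abi_usable` is made up for illustration, and it models only the version rule, ignoring the platform and compiler requirements.

```python
def stable_abi_usable(built_for, running):
    """Return True if an extension built against the Stable ABI of
    `built_for` is promised to work on interpreter version `running`.

    Both arguments are (major, minor) tuples. Hypothetical helper: it
    does not model platform, compiler, or build settings.
    """
    same_major = built_for[0] == running[0]
    not_older = running >= built_for
    return same_major and not_older


print(stable_abi_usable((3, 10), (3, 12)))  # newer 3.x interpreter
print(stable_abi_usable((3, 10), (3, 9)))   # older 3.x interpreter
print(stable_abi_usable((3, 10), (4, 0)))   # different major version
```

A False result only means that nothing is promised; as the note says, an extension may still happen to work on an older version if it avoids newer behavior.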

The Stable ABI trades performance for its stability.
For example, extensions built for a specific CPython version will automatically
use faster macros instead of functions in the stable ABI.

Future Python versions may deprecate some members of the Stable ABI.
Deprecated members will still work, but may suffer from issues like reduced
performance or, in the most extreme cases, memory/resource leaks.

Limited API

The Stable ABI guarantee holds for extensions compiled from code that restricts
itself to the Limited API, a subset of CPython’s C API.

Extensions that target the limited API should define the preprocessor macro
Py_LIMITED_API to either 3 or the current PYTHON_API_VERSION.
This will enable stable ABI versions of several functions and limit definitions
to the limited API.
(However, note that the macro is not perfect: due to technical issues or
oversight, some non-limited API might be exposed even with it defined.)

The Limited API is not guaranteed to be stable.
In the future, parts of the limited API may be deprecated.
They may even be removed, as long as the stable ABI is kept
stable and Python’s general backwards compatibility policy, :pep:387,
is followed.

.. note::

For example, a function declaration might be removed from public header
files but kept in the library.
This is currently a possibility for the future; this PEP does not propose
a concrete process for deprecations and removals.

The goal for the limited API is to cover everything needed to interact
with the interpreter.
The main reason to not include a public API in the limited subset
should be that it needs implementation details that change between CPython
versions (like struct memory layouts) – usually for performance reasons.

The limited API is not limited to CPython. Other implementations are
encouraged to implement it and help drive its design.

Specification

To make the stable ABI more useful and robust, the following changes
are proposed.

Stable ABI Manifest

All members of the stable ABI – functions, typedefs, structs, data, macros,
and constants – will be explicitly listed in a single “manifest” file,
Misc/stable_abi.dat.

For structs, any fields that users of the stable ABI are allowed to access
will be listed explicitly.

The manifest will also serve as the definitive list of the Limited API.
Members that are not part of the Limited API, but are part of the Stable ABI
(e.g. PyObject.ob_type, which is accessible by the Py_TYPE macro),
will be annotated as such.

For items that are only available on some systems, the manifest will record the
feature macro that determines their presence (such as MS_WINDOWS or
HAVE_FORK).
To make the implementation (and usage from non-C languages) easier,
all such macros will be simple names; if a future item needs a “negative” macro
or complex expression (such as a hypothetical #ifndef MACOSX or
#if defined(POSIX) && !defined(LINUX)), a new feature macro will be derived.

The format of the manifest will be subject to change whenever needed.
It should be consumed only by scripts in the CPython repository.
If a stable list is needed, a script can be added to generate it.
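
To illustrate how scripts might consume such a manifest, here is a minimal sketch. The PEP deliberately leaves the file format open, so the `ManifestItem` class, its field names, and the sample entries below are assumptions made for this example only; they do not describe the real contents of Misc/stable_abi.dat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManifestItem:
    """One hypothetical manifest entry (field names are assumptions)."""
    name: str
    kind: str                     # "function", "struct", "data", "macro", ...
    abi_only: bool = False        # in the Stable ABI but not the Limited API
    ifdef: Optional[str] = None   # simple feature macro name, or None


# Illustrative entries only, not copied from the real manifest.
ITEMS = [
    ManifestItem("PyObject_GetAttr", "function"),
    ManifestItem("PyOS_AfterFork_Child", "function", ifdef="HAVE_FORK"),
    ManifestItem("PyErr_SetFromWindowsErr", "function", ifdef="MS_WINDOWS"),
    ManifestItem("_Py_Dealloc", "function", abi_only=True),
]


def available_items(items, defined_macros):
    """Items expected on a platform that defines `defined_macros`."""
    return [i for i in items if i.ifdef is None or i.ifdef in defined_macros]


def limited_api_names(items):
    """Names that would be documented as part of the Limited API."""
    return [i.name for i in items if not i.abi_only]


print([i.name for i in available_items(ITEMS, {"HAVE_FORK"})])
print(limited_api_names(ITEMS))
```

Because every feature macro is a simple name, the availability check is a plain set membership test; no C preprocessor expressions need to be evaluated.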

The following will be generated from the ABI manifest:

  • Source for the Windows shared library PC/python3dll.c.
  • Input for documentation, Doc/data/stable_abi.dat.
  • Test case that checks the runtime availability of symbols (see below).

Runtime availability of the ABI symbols will be checked using ctypes;
see “Testing the Stable ABI” below.
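
A symbol lookup of this kind could look roughly like the sketch below. It uses the standard `ctypes.pythonapi` handle; the function name `missing_symbols` and the short sample list are assumptions for illustration, and a generated test would take the full list from the manifest.

```python
import ctypes


def missing_symbols(names):
    """Return the names that cannot be found in the running interpreter."""
    missing = []
    for name in names:
        try:
            # Attribute access makes ctypes look the symbol up in the library.
            getattr(ctypes.pythonapi, name)
        except AttributeError:
            missing.append(name)
    return missing


# A tiny hand-picked sample of exported functions.
SAMPLE = ["PyObject_GetAttr", "Py_IncRef", "PyLong_FromLong"]
print(missing_symbols(SAMPLE))
```

The lookup only proves that a symbol is exported; it says nothing about the function's signature or behavior.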

The following will be checked against the stable ABI manifest as part of
continuous integration:

  • The reference count summary, Doc/data/refcounts.dat, includes all
    functions in the stable ABI (among others).
  • The functions/structs declared and constants/macros defined
    when Python.h is included with Py_LIMITED_API set.
    (Initially Linux only; checks on other systems may be added in the future.)

After the initial implementation, details such as function arguments will be
added and the manifest will be checked for internal consistency (e.g. all
types used in function signatures are part of the API).
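
The internal consistency check mentioned above could be as simple as a set difference. This is a sketch under assumptions: the manifest does not yet record signatures, so `SIGNATURES`, `KNOWN`, and `undeclared_types` are hypothetical stand-ins for data and tooling that might exist later.

```python
def undeclared_types(signatures, known_types):
    """Map each function name to the types it uses that are not listed.

    `signatures` maps a function name to the type names in its signature
    (return type first). Both inputs are hypothetical stand-ins for data
    a future manifest might carry.
    """
    problems = {}
    for func, types in signatures.items():
        unknown = sorted(set(types) - known_types)
        if unknown:
            problems[func] = unknown
    return problems


KNOWN = {"PyObject*", "Py_ssize_t", "int", "const char*"}
SIGNATURES = {
    "PyObject_GetAttr": ["PyObject*", "PyObject*", "PyObject*"],
    "PyList_GetItem": ["PyObject*", "PyObject*", "Py_ssize_t"],
    "PyFrame_GetLineNumber": ["int", "PyFrameObject*"],
}
print(undeclared_types(SIGNATURES, KNOWN))
# → {'PyFrame_GetLineNumber': ['PyFrameObject*']}
```

In this sample the check flags `PyFrameObject*`, because the made-up `KNOWN` set does not list it.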

Contents of the Stable ABI

The initial stable ABI manifest will include:

  • The Stable ABI specified in :pep:384.
  • Everything listed in PC/python3dll.c.
  • All structs (struct typedefs) which these functions return or take as
    arguments. (Fields of such structs will not necessarily be added.)
  • New type slots, such as Py_am_aiter.
  • The type flags Py_TPFLAGS_DEFAULT, Py_TPFLAGS_BASETYPE,
    Py_TPFLAGS_HAVE_GC, Py_TPFLAGS_METHOD_DESCRIPTOR.
  • The calling conventions METH_* (except deprecated ones).
  • All API needed by macros in the stable ABI (annotated as not being part of
    the limited API).

Items that are no longer in CPython when this PEP is accepted will be removed
from the list.

Additional items may be added to the initial manifest according to
the checklist below.

Documenting the Limited API

Notes saying “Part of the limited API” will be added to Python’s documentation
automatically, in a way similar to the notes on functions that return borrowed
references.

Testing the Stable ABI

An automatically generated test module will be added to ensure that all symbols
included in the stable ABI are available at compile time.

Additionally, a test will be added that aims to call each function
in the stable ABI using ctypes, with exceptions for e.g. functions related
to fatal errors and interpreter initialization/shutdown.
This should prevent regressions when a function is converted to a macro,
which keeps the same API but breaks the ABI.
It should also help ensure that the ABI of function signatures doesn’t change.
(Creating this test is expected to take longer than the rest of this PEP to
implement; it may need several releases.)
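
As a rough idea of what calling a function through ctypes involves, the sketch below round-trips an integer through `PyLong_FromLong` and `PyLong_AsLong`. The choice of these two functions and the helper name `roundtrip` are assumptions for illustration; the PEP does not specify how the real test would be organized.

```python
import ctypes

api = ctypes.pythonapi

# Declare the C signatures so ctypes converts arguments and results correctly.
api.PyLong_FromLong.argtypes = [ctypes.c_long]
api.PyLong_FromLong.restype = ctypes.py_object
api.PyLong_AsLong.argtypes = [ctypes.py_object]
api.PyLong_AsLong.restype = ctypes.c_long


def roundtrip(value):
    """Convert a C long to a Python int and back via the exported functions."""
    as_object = api.PyLong_FromLong(value)
    return api.PyLong_AsLong(as_object)


assert roundtrip(12345) == 12345
```

If either function were turned into a macro, the symbol would no longer be exported and the attribute lookup on `ctypes.pythonapi` would fail, which is the kind of regression such a test is meant to catch.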

Changing the Limited API

A checklist for changing the limited API, including adding new items to it
and removing existing ones, will be added to the Devguide_.
The checklist will 1) mention best practices and common pitfalls in Python
C API design and 2) guide the developer around the files that need changing and
scripts that need running when the limited API is changed.

Below is the initial proposal for the checklist.
(After the PEP is accepted, see the Devguide for the current version.)

Note that the checklist applies to new changes; several items
in the existing limited API are grandfathered and couldn’t be added today.

Design considerations:

  • Make sure the change does not break the Stable ABI of any version of Python
    since 3.5.

  • Make sure no exposed names are private (i.e. begin with an underscore).

  • Make sure the new API is well documented.

  • Make sure the types of all parameters and return values of the added
    function(s) and all fields of the added struct(s) are part of the
    limited API (or standard C).

  • Make sure the new API and its intended use follow standard C, not just
    features of currently supported platforms.
    Specifically, follow the C dialect specified in :pep:7.

    • Do not cast a function pointer to void* (a data pointer) or vice versa.
  • Make sure the new API follows reference counting conventions. (Following them
    makes the API easier to reason about, and easier to use in other Python
    implementations.)

    • Do not return borrowed references from functions.
    • Do not steal references to function arguments.
  • Make sure the ownership rules and lifetimes of all applicable struct fields,
    arguments and return values are well defined.

  • Think about ease of use for the user. (In C, ease of use itself is not very
    important; what is useful is reducing boilerplate code needed to use the
    API. Bugs like to hide in boilerplate.)

    • If a function will often be called with a specific value for an argument,
      consider making it the default (used when NULL is passed in).
  • Think about future extensions: for example, if it’s possible that future
    Python versions will need to add a new field to your struct,
    how will that be done?

  • Make as few assumptions as possible about details that might change in
    future CPython versions or differ across C API implementations:

    • The GIL
    • Garbage collection
    • Memory layout of PyObject, lists/tuples and other structures

If following these guidelines would hurt performance, add a fast function
(or macro) to the non-limited API and a stable equivalent to the limited API.

If anything is unclear, or you have a good reason to break the guidelines,
consider discussing the change on the capi-sig_ mailing list.

.. _capi-sig: Mailman 3 Info | capi-sig@python.org - python.org

Procedure:

  • Move the declaration to a header file directly under Include/, into a
    #if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x03yy0000 block
    (with the yy corresponding to the target CPython version).
  • Make an entry in the stable ABI manifest, Misc/stable_abi.dat.
  • For functions, add a test that calls the function using ctypes
    (XXX: mention filename).
  • Regenerate the autogenerated files using make regen-all.
    (XXX: check non-Linux platforms)
  • Build Python and run checks using make check-abi.
    (XXX: check non-Linux platforms)
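
The yy in 0x03yy0000 is the minor version in hexadecimal, following the layout of PY_VERSION_HEX (major version in the top byte, minor version in the next one). A small sketch of how the value for a given release can be computed; the helper name `limited_api_hex` is made up for this example.

```python
import sys


def limited_api_hex(major, minor):
    """Version value with micro, release level and serial set to zero."""
    return (major << 24) | (minor << 16)


value = limited_api_hex(3, 10)
print(f"0x{value:08x}")          # → 0x030a0000
print(sys.hexversion >= value)   # is the running interpreter at least 3.10?
```

Note that the minor version is not written in decimal: 3.10 becomes 0x030a0000, not 0x03100000.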

Advice for Extenders and Embedders

The following notes will be added to documentation.

Extension authors should test with all Python versions they support,
and preferably build with the lowest such version.

Compiling with Py_LIMITED_API defined is not a guarantee that your code
conforms to the limited API or the stable ABI.
Py_LIMITED_API only covers definitions, but an API also includes other
issues, such as expected semantics.

Examples of issues that Py_LIMITED_API does not guard against are:

  • Calling a function with invalid arguments
  • A function that started accepting NULL values for an argument
    in Python 3.9 will fail if NULL is passed to it under Python 3.8.
    Only testing with 3.8 (or lower versions) will uncover this issue.
  • Some structs include a few fields that are part of the stable ABI and other
    fields that aren’t.
    Py_LIMITED_API does not filter out such “private” fields.
  • Code that uses something that is not documented as part of the stable ABI,
    but exposed even with Py_LIMITED_API defined, may break in the future.
    Despite the team’s best efforts, such issues may happen.

Backwards Compatibility

Backwards compatibility is one honking great idea!

This PEP aims at full compatibility with the existing stable ABI and limited
API, but defines their terms more explicitly.
It might not be consistent with some interpretations of what the existing
stable ABI/limited API is.

Security Implications

None known.

How to Teach This

Technical documentation will be provided in Doc/c-api/stable
and linked from the What’s New document.
Docs for CPython core developers will be added to the devguide.

Reference Implementation

Nothing presentable yet.

Rejected Ideas

Defining a process for deprecations/removals

While this PEP acknowledges that parts of the limited API might be deprecated
or removed in the future, a process to do this is not in scope, and is left
to a possible future PEP.

Open Issues

None so far.

References

.. _Devguide: https://devguide.python.org/

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.



Looks good.

I’d much prefer the list to be a txt file rather than dat, though I wonder if we have the parsers around now to just make it an actual header file? Might make some processing a little more complex, but could help us ensure that signatures don’t change over time either, as well as names.

I’d also vote against designing it for ease of use. Rather, make it easy for tools like Cython to generate code against it. Long term, we all seem to prefer the world where we transpile or generate the C code. This design call is unlikely to have much impact in reality, but I think it indicates our intent better.

What’s the difference? My precedent for .dat is refcounts.dat.
(But maybe my Linux background makes me care less about extensions than I should.)

A C header is not good at storing additional details:

  • is a function ABI-only (only to be used from macros)?
  • when was it added? (hopefully I can put that info in the docs)
  • is it deprecated? When will it be removed?

And then you have the pesky #ifdef: something that can give me info for both MS_WINDOWS and HAVE_FORK defines will not be a normal C parser, and now you get into what subset of C the parser can support (and how formally the subset can be defined), … and none of this is actually solving the problem at hand.

So, while I did experiment with C syntax a bit, I decided to avoid this rabbit hole for now, instead make a simple format with a simple parser. Both are implementation details that can easily be changed later (perhaps to a C header file), but I’d like to leave that out of scope now.

That said, C syntax would be very helpful for expressing C types, especially function types where the name of a thing is buried in the middle of a declaration. When that info is needed in the manifest, I’ll allow myself to start wondering if pegen would be a good tool here.

Excuse me for the newbie question. Can you clarify the difference between the limited and the stable API? Is one a subset of the other? I presume both are subsets of the public C API?

I do need to make the text more approachable, don’t I. Thank you for the suggestion!

I’ll add more explanation to the PEP before posting it, but in short: It’s limited API but stable ABI (application binary interface). ABI concerns the compiled extensions, not the source code.
The limited API is a subset of the public C API. Code that only uses this subset will compile into an extension that only uses the stable ABI. And that means it will be compatible with all future CPython 3.x versions (if all the lower-level details that affect ABI are the same – OS, compiler settings, etc.).

Thanks, so the limited API and stable ABI are closely related, though the limited API may contain macros that translate into other functions or translate into direct object accesses where the object layout is public. (Are there macros that do the latter? And so, are there parts of the object layout that are public? Is e.g. the layout of struct { ssize_t ob_refcnt; PyTypeObject *ob_type; } considered public?) I am beginning to care because I am beginning to be more interested in performance. :)

Yes, that layout is part of the stable ABI and limited API, as specified in PEP 384. Issues like that are a problem for some proposed optimizations.

And while I am against breaking working code (besides this proposal, I’ve been known to annoy people by insisting on following PEP 387 “Backwards Compatibility Policy” to the letter), you might also want to check out Victor’s point of view and his PEP 620 “Hide implementation details from the C API”. He argues for being way more aggressive in chasing optimizations.

And I forgot to answer this:

Yes. Incref/decref are the best examples. Essentially, adding indirection there would make things unbearably slow.

Maybe we can consider using ABI Compliance Checker instead of, or together with, such a test module.

Technically, the full ABI can be validated by checking the DWARF information generated if Python is compiled with -g3, at least on Linux.

That sounds like a good idea! But I’m not comfortable committing to the CI changes (hooking up an external tool; having access to previous Python versions).
Instead, I think I’ll just remove the ctypes function-calling tests from this PEP – they’ll be redundant if the compliance checker is added. Either can be added later, without a PEP.
(I’ll keep the test that will look up symbols using ctypes: that’s cross-platform and easy to run, so it has its place.)

Right, I was thinking of having this in a buildbot, for instance.

I’d like to share my 2 cents here: in HPy we have a similar thing, and we are very happy to use C syntax for it. It turns out that there is nothing better than C syntax to express C structs and signatures.

public_api.h is the input for our “autogen” tool which – as the name suggests – autogenerates a lot of stuff. Note also that although it looks like a header, it’s never seen by a real C compiler (that’s also why we have all those typedef int XXX at the beginning – we just need the parser to know they are types, and for our purpose it’s enough).

For parsing, we use pycparser (which is also used by cffi, FWIW). An example of autogen code is this, which autogenerates the definition of a big C struct.

You can put some lightweight syntax and/or convention on top of C. For example, look at the typedef of HPySlot_Slot at the end of public_api.h: it contains “calls” to a SLOT “function/macro”. This function doesn’t exist, but it’s special-cased by our tool. Following the same principle, I can imagine having something like this in a hypothetical stable_abi.h:

STABLE_ABI_ADDED("2000-01-01");
STABLE_ABI_DEPRECATED;
PyObject * PyObject_GetAttr(PyObject *, PyObject *);

You can be creative with the syntax of course, or even put the extra data inside comments, etc.

Thanks for the input!

I want to stay away from using external Python libraries to build CPython, so for me, pycparser is out. Otherwise it’s a great library!

I’d be happy to add a hand-rolled pegen-based parser if/when pegen is more generally usable. So far, it’s not – it’s a CPython implementation detail. In an ideal world, I would help improve the situation, but while I enjoy yak-shaving, I need to limit it to get stuff done.
Still, this would be a custom parser. A subset of C with extra additions is still a bespoke language; I don’t see the advantage here.

I’d be more comfortable automatically generating a C header.

FYI, under Tools/c-analyzer/ you will find all the tooling I’ve written for the multi-core Python project, including a C parser that is able to parse all of the CPython repo. (It is only C-accepting, rather than validating but we have the C compiler for that.)

As part of that there’s a script for various analysis tasks (e.g. check for globals): “./python Tools/c-analyzer/c-analyzer.py …”. The subcommand that I’ve been working on lately is “capi”, which identifies all the parts of the C-API, along with distinction for internal/public/limited. (I think I called limited “stable”.)

It may be worth taking a look to see if the parser or any of the tooling that uses it would help here.

I’ve made a few changes based on the discussion here and submitted the draft as PEP 652.


Thanks for the pointer! I’ll keep it in mind. (But so far, this makes me even more convinced that I’d rather generate C than parse it…)
Could this be simplified with pegen?

I haven’t looked at Eric’s parser, but writing a full grammar for C using pegen would be a rather big project. (Writing the Python was big too.)
