Improving collection of stable ABI macros and inline functions

A few weeks ago, I (unexpectedly) started a thread on whether Py_XDECREF and its siblings are part of the limited API/stable ABI despite not being listed in stable_abi.toml (TL;DR: they are).

I’m currently working on improving stable ABI auditing of wheels by allowlisting such inline functions manually if they are found in a C++ extension, since a failed audit mentioning Py_XDECREF was the origin of my inquiry.

Link to the discussion: Stable ABI builds and Py_XDECREF · wjakob/nanobind · Discussion #500 · GitHub

In that thread, @encukou commented that the tooling around collection of macros/inline functions from Python’s header files could be improved, see the comment Stable ABI builds and Py_XDECREF · wjakob/nanobind · Discussion #500 · GitHub.

If there is interest in improving the status quo on this issue, I’d like to offer my support. (I wasn’t sure if I should open a GH issue right away; posting here seemed like the safer option.)

Probably related to PEP 652 (I would link to it here, but I’m over the link limit for a brand new account).
- Nico

Thanks for the offer!
I’ve started to get back to that, but I don’t mean to “lick the cookie”. Please go ahead; let me know here if you need help or directions.

So, if you’re looking for a first step, IMO it’d be best to gather all the macros and static inline functions Python.h defines with Py_LIMITED_API, and for each one, decide whether to add it to Misc/stable_abi.toml or put it in #ifndef Py_LIMITED_API. The choices should be ratified by the WG (all at once, after a discussion e.g. here).

The current tooling is in Tools/build/stable_abi.py. Its main purpose is generating several files based on stable_abi.toml; it also has a best-effort checker of stable_abi.toml. (Not a comprehensive generator – it can only look at compiler output for the current platform, but stable_abi.toml covers all other platforms as well.)

“They are” is my opinion. I like to think my opinion matters here (a lot), but, the C API working group is nowadays the authority here.

Technically speaking: macros & static inline functions are not themselves part of any ABI.
If a macro/static inline function is part of the limited API, it means that the calls and data accesses it makes become part of the stable ABI (even if we don’t want users to make the calls themselves, i.e. they’re not public API).

Thanks! So this is more a “bookkeeping”/accounting job for now? By your comments, I was under the impression that there was a need for automatic extraction of inlines/macros (and maybe also symbols?) from the headers into stable_abi.toml.

I’m happy either way, just wanted to clarify the direction.

(Also, what do you mean by

or put it in #ifndef Py_LIMITED_API.

That sounds like making changes to the visibility of the definitions themselves?)

We want to run automatic extraction in CI to verify stable_abi.toml. But those runs aren’t comprehensive – each single run misses #ifdef’d stuff for other platforms/configurations.
(Those #ifdefs should only look at the small, explicitly defined set of feature_macros, but, you can’t simply define MS_WINDOWS on a Mac and expect the headers to compile.)

To populate stable_abi.toml, we need to run the extraction once on all platforms and combine the results. Or once on just one platform, and then verify on all the others, adding platform-specific stuff by hand.
IOW, the code to generate TOML from verification failures could just be semi-automated; a single-use hack.
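For illustration, combining the per-platform extraction results could look roughly like the sketch below. The function name, the platform labels, and the `Py_HYPOTHETICAL_WIN_ONLY` name are made up for the example; nothing here is taken from Tools/build/stable_abi.py.

```python
def combine_platform_results(results):
    """Split names into those seen on every platform and platform-specific ones.

    `results` maps a platform label (hypothetical, e.g. "linux") to the set of
    macro/inline-function names extracted on that platform.
    """
    if not results:
        return set(), {}
    all_names = set().union(*results.values())
    common = set.intersection(*results.values())
    # For each name that is not universal, record where it was seen, so a
    # human can add the right platform condition by hand.
    platform_specific = {
        name: sorted(p for p, names in results.items() if name in names)
        for name in sorted(all_names - common)
    }
    return common, platform_specific


if __name__ == "__main__":
    runs = {
        "linux": {"Py_XDECREF", "Py_INCREF"},
        "windows": {"Py_XDECREF", "Py_INCREF", "Py_HYPOTHETICAL_WIN_ONLY"},
    }
    common, specific = combine_platform_results(runs)
    print(sorted(common))
    print(specific)
```

The platform-specific part of the result is the bit that would still need manual review before it goes into the TOML file.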

Yes. If they aren’t part of the limited API, then they shouldn’t be visible if you define Py_LIMITED_API. The headers almost certainly contain some stuff that shouldn’t be there.

Ah, so we’re talking pre-compiled headers. I was under the impression that we were going to parse the symbols/macros and their availabilities directly from the header files in the repository (e.g. by identifying ifdef Py_LIMITED_API || ... statements).

Before I start on this, could you give your thoughts on what the requirements are? Is it supposed to be run in CI, is it supposed to be run by hand, do we look at the raw headers or pre-compiled ones, should there be a report (and if so, in what format), etc.

In any case, I can already start looking at ways to extract info from header files.

Ah, so we’re talking pre-compiled headers. I was under the impression that we were going to parse the symbols/macros and their availabilities directly from the header files in the repository (e.g. by identifying ifdef Py_LIMITED_API || ... statements).

I guess that would work too! But, running a C preprocessor and parsing defines from the output is much easier than building a C preprocessor…
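To give an idea of how little parsing that needs, here is a minimal sketch that pulls macro names out of preprocessor output which keeps the `#define`/`#undef` directives. The function name and the sample text are made up for the example, and it assumes each directive sits on a single line.

```python
import re

# "#define NAME" or "#define NAME(" ; a "(" directly after the name marks a
# function-like macro.
_DEFINE_RE = re.compile(r"^\s*#\s*define\s+([A-Za-z_][A-Za-z0-9_]*)(\()?")
_UNDEF_RE = re.compile(r"^\s*#\s*undef\s+([A-Za-z_][A-Za-z0-9_]*)")


def parse_defines(preprocessed_text):
    """Return {macro_name: "function-like" | "object-like"}.

    Macros that are later #undef'd are dropped from the result.
    """
    found = {}
    for line in preprocessed_text.splitlines():
        m = _DEFINE_RE.match(line)
        if m:
            found[m.group(1)] = "function-like" if m.group(2) else "object-like"
            continue
        m = _UNDEF_RE.match(line)
        if m:
            found.pop(m.group(1), None)
    return found


if __name__ == "__main__":
    sample = (
        "#define Py_LIMITED_API 0x030c0000\n"
        '# 1 "Include/object.h"\n'
        "#define Py_XDECREF(op) Py_XDECREF(_PyObject_CAST(op))\n"
        "#define TEMP 1\n"
        "#undef TEMP\n"
    )
    print(parse_defines(sample))
```

A real check would probably also filter out the compiler's built-in macros and anything coming from system headers, which this sketch does not attempt.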

thoughts on what the requirements are

It should run in CI, and detect any macros/functions that are defined in Include/* but aren’t in stable_abi.toml, or vice versa. The output should be a human-readable report (example message here) – we expect it’ll run on PRs that only add/remove a couple of entries; these should generally be added by hand.
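A rough sketch of such a two-way check, assuming the names from the headers and from the TOML file have already been collected into two sets. The function name and the message wording are placeholders, not what the existing checker prints.

```python
def check_manifest(defined_in_headers, listed_in_manifest):
    """Return human-readable report lines; an empty list means all is well."""
    headers = set(defined_in_headers)
    manifest = set(listed_in_manifest)
    lines = []
    # Visible with Py_LIMITED_API but not recorded in the manifest.
    for name in sorted(headers - manifest):
        lines.append(f"{name}: defined in headers but missing from stable_abi.toml")
    # Recorded in the manifest but no longer visible in the headers.
    for name in sorted(manifest - headers):
        lines.append(f"{name}: listed in stable_abi.toml but not defined in headers")
    return lines


if __name__ == "__main__":
    report = check_manifest({"Py_XDECREF", "Py_INCREF"}, {"Py_INCREF", "Py_GONE"})
    print("\n".join(report))
    # A CI job would exit non-zero when the report is non-empty.
```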

A single initial run should generate all the missing entries in TOML format. That can only be semi-automated – it might be an ad-hoc script or editor macro, run on the check results.
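As a sketch of what that one-off script might do: turn a list of missing names into TOML stanzas for pasting. This assumes entries take the form of a `[kind.Name]` table with an `added` key, which should be double-checked against the existing file. The function name and `Py_EXAMPLE_MACRO` are hypothetical, and the `added` version still has to be decided by a human.

```python
def toml_entries(names, kind, added):
    """Render one TOML table per name, e.g. [macro.Py_EXAMPLE_MACRO].

    `kind` and `added` are supplied by the caller; this helper does not try
    to guess which version introduced a given name.
    """
    chunks = []
    for name in sorted(names):
        chunks.append(f"[{kind}.{name}]\n    added = '{added}'\n")
    return "\n".join(chunks)


if __name__ == "__main__":
    print(toml_entries(["Py_EXAMPLE_MACRO", "Py_ANOTHER_EXAMPLE"], "macro", "3.2"))
```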

Some useful compiler options I found so far:

Annotated preprocessor output: gcc -E -dD -I./ -IInclude/ Python.h
AST dump: clang -Xclang -ast-dump=json -fsyntax-only -I./ -IInclude/ Python.h
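For the AST route, a small sketch of picking static inline functions out of the JSON dump once it has been loaded with `json.load`. The field names used here (`kind`, `name`, `storageClass`, `inline`, `inner`) are what I believe clang emits for function declarations, so treat them as an assumption to verify against real output. The function name and the sample data are made up.

```python
def static_inline_functions(ast):
    """Return sorted names of top-level static inline function declarations.

    `ast` is the parsed JSON of the translation unit; only its direct
    children are inspected, which should be enough for plain C headers.
    """
    names = set()
    for node in ast.get("inner", []):
        if (
            node.get("kind") == "FunctionDecl"
            and node.get("storageClass") == "static"
            and node.get("inline")
        ):
            names.add(node["name"])
    return sorted(names)


if __name__ == "__main__":
    # Hand-written stand-in for a (much larger) real dump.
    fake_ast = {
        "kind": "TranslationUnitDecl",
        "inner": [
            {"kind": "FunctionDecl", "name": "Py_XDECREF",
             "storageClass": "static", "inline": True},
            {"kind": "FunctionDecl", "name": "PyObject_Call"},
            {"kind": "TypedefDecl", "name": "Py_ssize_t"},
        ],
    }
    print(static_inline_functions(fake_ast))
```

This only finds functions, so macros would still have to come from the preprocessor output.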