Limited C API: implement Py_INCREF() and Py_DECREF() as function calls

Hi,

I modified Python 3.8 to make the PyObject structure ABI compatible between Python release and debug builds. You can just use python-debug (the debug build) as a drop-in replacement for python (the release build); it “just works” (on most platforms, e.g. Linux).

I then fixed Python 3.10 to add support for the limited C API to the Python debug build (Py_REF_DEBUG): Py_INCREF() and Py_DECREF() are implemented as opaque function calls in this case. Moreover, PyObject_INIT() is now implemented with an opaque function call: _Py_NewReference() is no longer called directly at the ABI level.

Previously, the C API exposed many implementation details at the ABI level (through inlined functions and macros); see the sketch after this list:

  • private _Py_RefTotal variable (Py_REF_DEBUG macro of debug build)
  • private _Py_NewReference() and _Py_ForgetReference()
  • private _Py_tracemalloc_config variable
  • private _Py_inc_count() and _Py_dec_count() functions of COUNT_ALLOC special build (removed in Python 3.9)
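A rough sketch of the older macro-based pattern (simplified; not the exact CPython code): with Py_REF_DEBUG, the macro expanded to a direct access to the private _Py_RefTotal counter, so both that symbol and the PyObject layout were baked into every extension binary.

#ifdef Py_REF_DEBUG
extern Py_ssize_t _Py_RefTotal;    // private counter, exposed at the ABI level
#  define Py_INCREF(op) (_Py_RefTotal++, ((PyObject *)(op))->ob_refcnt++)
#else
#  define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)
#endif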

In Python 3.12, with the implementation of immortal objects (PEP 683), the Py_INCREF() and Py_DECREF() implementations became more complex than PyObject.ob_refcnt++ and PyObject.ob_refcnt--.

Extract of Py_INCREF() implementation (simplified code):

struct _object {
    _PyObject_HEAD_EXTRA
    union {
       Py_ssize_t ob_refcnt;
#if SIZEOF_VOID_P > 4
       PY_UINT32_T ob_refcnt_split[2];
#endif
    };
    PyTypeObject *ob_type;
};

static inline int _Py_IsImmortal(PyObject *op)
{
#if SIZEOF_VOID_P > 4
    return _Py_CAST(PY_INT32_T, op->ob_refcnt) < 0;
#else
    return op->ob_refcnt == _Py_IMMORTAL_REFCNT;
#endif
}

static inline void Py_INCREF(PyObject *op)
{
#if SIZEOF_VOID_P > 4
    // Portable saturated add, branching on the carry flag and set low bits
    PY_UINT32_T cur_refcnt = op->ob_refcnt_split[PY_BIG_ENDIAN];
    PY_UINT32_T new_refcnt = cur_refcnt + 1;
    if (new_refcnt == 0) {
        return;
    }
    op->ob_refcnt_split[PY_BIG_ENDIAN] = new_refcnt;
#else
    // Explicitly check immortality against the immortal value
    if (_Py_IsImmortal(op)) {
        return;
    }
    op->ob_refcnt++;
#endif
    _Py_INCREF_STAT_INC();
#ifdef Py_REF_DEBUG
    _Py_INC_REFTOTAL();
#endif
}

Link to full implementation:
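For comparison, here is a similarly simplified sketch of the Py_DECREF() side (an approximation of the non-debug 3.12 logic, not a verbatim copy): immortal objects are simply never decremented, so they are never deallocated.

static inline void Py_DECREF(PyObject *op)
{
    // Immortal objects are never deallocated: skip the decrement entirely
    if (_Py_IsImmortal(op)) {
        return;
    }
    _Py_DECREF_STAT_INC();
    if (--op->ob_refcnt == 0) {
        _Py_Dealloc(op);
    }
}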

By the way, the current implementation causes new C compiler warnings/errors.

While PEP 683 says that the stable ABI is not affected (C extensions built with Python 3.11 continue to work), IMO it’s now time to consider converting Py_INCREF() and Py_DECREF() to opaque function calls in limited C API version 3.12 and newer. I propose: issue and PR.

Guido opened a similar discussion at capi-workgroup/problems.

In the long term, I even consider doing a similar change for the regular C API, as HPy does, to hide even more implementation details and slowly bend the regular C API towards the limited C API and the stable ABI. But that’s out of my short term scope. For now, I only consider changing the limited C API version 3.12.

Victor

3 Likes

I agree, but I don’t think we can make the change in 3.12.

Even more than a regular breaking change, if we introduce “ABI3 built on Python 3.12 isn’t really ABI3 and won’t work on 3.11 or earlier” then we break a lot of promises.

Perhaps this is the point where we introduce ABI4? There are plenty of cleanups we’ve been avoiding doing, so this would be the opportunity.

I don’t think there’s any way we can modify INCREF/DECREF without inherently breaking the stable ABI, the current changes for immortal objects included.

1 Like

My change is backward compatible; it doesn’t break backward compatibility (according to the limited C API specification). Please correct me if I’m wrong. (Check also my PR implementation.)

For the Python debug build, I already modified Py_INCREF() and Py_DECREF() to always use an opaque function call in Python 3.12: it uses Py_IncRef() and Py_DecRef(), which have existed since Python 3.0, or _Py_IncRef() and _Py_DecRef() when requesting the limited API version 3.10 or newer. So the implementation works on Python 3.2, as required by PEP 384 (Stable ABI).

Extract of Py_INCREF() (simplified code):

static inline void Py_INCREF(PyObject *op)
{
#if defined(Py_REF_DEBUG) && defined(Py_LIMITED_API)
    // Stable ABI for Python built in debug mode. _Py_IncRef() was added to
    // Python 3.10.0a7, use Py_IncRef() on older Python versions. Py_IncRef()
    // accepts NULL whereas _Py_IncRef() doesn't.
#  if Py_LIMITED_API+0 >= 0x030a00A7
    _Py_IncRef(op);
#  else
    Py_IncRef(op);
#  endif
#else
    ...
#endif
}

With my proposed change, if you target limited C API version 3.12, the wheel binary will use _Py_IncRef() and _Py_DecRef() which were added to Python 3.10. So the wheel binary should only be used with Python 3.12 and newer, no?

Request limited C API version 3.2 if you care about maximum compatibility (supporting the maximum number of Python versions): my change would not be used in this case; the code would directly access PyObject.ob_refcnt as before.
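As an illustration, here is how an extension opts into a specific limited C API version (a minimal sketch; the keep_reference() helper is hypothetical): the hex value of Py_LIMITED_API encodes the minimum Python version, so 0x030C0000 means 3.12 and 0x03020000 means 3.2.

#define Py_LIMITED_API 0x030C0000   // target limited C API version 3.12
#include <Python.h>

// Hypothetical helper, just to show where Py_INCREF()/Py_DECREF() are used;
// with Py_LIMITED_API >= 0x030C0000 and the proposed change, these would
// compile to opaque calls into libpython instead of inlined field accesses.
static void
keep_reference(PyObject *obj)
{
    Py_INCREF(obj);
    /* ... use obj ... */
    Py_DECREF(obj);
}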

1 Like

In the most trivial case, yes, your proposed change is compatible (your existing change is fine, debug builds don’t matter here, it’s the new proposal that concerns me).

The problem is back-and-forth compatibility. ABI3 built on 3.11 will have a different INCREF than 3.12 and so is not technically usable. A change in 3.12’s normal INCREF behaviour means that an ABI3 built on 3.12 targeting 3.11 or earlier will have incorrect INCREF on earlier versions. A difference between ABI3 INCREF and non-ABI3 INCREF within 3.12 builds will lead to issues that remove the value proposition of making the change in the first place.

What this means is that we’re effectively telling users to hold their noses and ignore known compatibility issues, or to avoid ABI3 builds, when we could do neither and instead just not deliver a new feature whose benefit is copy-on-write being triggered less often (as long as you don’t have ABI3 libraries on your system). The tradeoffs seem all wrong.

You know full well this doesn’t work :wink: The limited API has changed enough over time that that promise is well broken, and it’s somewhere around 3.6-3.7 that things start becoming usable again (if you’re careful, and test thoroughly on multiple versions on multiple platforms, by which time you may as well have just built without it).

We’d be better off deprecating the limited API entirely and adopting HPy as the stable interface. (I assume they’re in a position to handle these INCREF changes safely, but haven’t checked for sure.)

2 Likes

Oh, and you can only call old functions under the limited API if they were part of the limited API at that time. Functions on Windows are only exported explicitly, and if they weren’t listed as limited API, extensions will simply break.

It seems like Python 3.10 added Py_IncRef() and Py_DecRef() to PC/python3dll.c; they weren’t listed there before.

$ git show 3.9:PC/python3dll.c|grep -E 'Py_(Inc|Dec)Ref'
# no results

$ git show 3.10:PC/python3dll.c|grep -E 'Py_(Inc|Dec)Ref'
EXPORT_FUNC(_Py_DecRef)
EXPORT_FUNC(_Py_IncRef)
EXPORT_FUNC(Py_DecRef)
EXPORT_FUNC(Py_IncRef)
1 Like

The SC approved PEP 683 knowing that it impacts the stable ABI; it’s documented in PEP 683. I’m not sure that I get your rationale. If you build an extension for the stable ABI with 3.11, you get a buggy Py_INCREF, but if you build it with 3.12 (with my change), you get the correct Py_INCREF. Your point is that it’s better that everybody gets the same buggy Py_INCREF on Python 3.12 and newer forever?

Right, my change has no impact on limited API 3.11. Now I’m no longer sure if your point is about PEP 683 or my proposed change. PEP 683 got approved and is now implemented. My proposal is only to make the limited C API better for PEP 683.

PEP 683 states that using Py_INCREF of Python 3.11 or older can have a negative impact on performance in some cases:

This will invalidate all the performance benefits of immortal objects.

It doesn’t crash Python or change the object lifetime.

Is your point that the stable ABI was never usable and so it’s better to remove it rather than fix it? It’s currently used by the cryptography and PySide projects, which seem to be happy with it.

Yes, there were implementation issues with the stable ABI. It’s getting better. The whole ABI (not only the stable ABI) is now carefully tested once Python 3.x.0 final is released (in stable branches). There are now better tests for the limited C API: PEP 652: Maintaining the Stable ABI, accepted and implemented in Python 3.10. Implementation issues are being fixed over time.

But I don’t think that it’s relevant here. As I wrote before, my change only concerns the limited C API version 3.12 and newer. If you target older limited C API, these functions are simply not used.

AFAIK, this has always been the case. If I build with 3.11’s stable ABI and use the buffer API, it won’t (necessarily) work on 3.10. Here, if I use the 3.12 stable ABI, I expect it to work with 3.13 and later, but 3.11 and older have no expectations.

It has, but unless Victor’s next change brings back the old Py_INCREF in 3.12 when compiling with Py_LIMITED_API < 0x030C0000, keeps the current Py_INCREF for !defined(Py_LIMITED_API), and adds the new call for Py_LIMITED_API >= 0x030C0000, then we’ll introduce new breakage (on top of the breakage that already exists for people who thought that Py_LIMITED_API < 0x030C0000 would be compatible with Python 3.12…)
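A sketch of the preprocessor structure being described here (an illustration of the shape only, under the assumptions above, not the merged CPython code):

static inline void Py_INCREF(PyObject *op)
{
#if defined(Py_LIMITED_API) && Py_LIMITED_API+0 >= 0x030C0000
    _Py_IncRef(op);        // new: opaque call into libpython
#elif defined(Py_LIMITED_API)
    op->ob_refcnt++;       // old limited API behaviour: direct field access
#else
    ...                    // regular API: current inlined, immortality-aware code
#endif
}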

@steve.dower IMO you misunderstood the specification of the limited C API and the stable ABI. Maybe the documentation is unclear about backward and forward compatibility at the API and ABI level. But I don’t see how my proposed change is different from other limited C API / ABI changes done in the past. It respects the specification.

Yes, for each Python version, we have a different API and ABI depending on the requested limited C API version, and the testing matrix is quite big. But I don’t see why Py_INCREF/DECREF is different and how my change breaks any “promise”.

It seems like you’re thinking about hypothetical correctness issues, but there is no known correctness issue: https://github.com/python/cpython/issues/105387#issuecomment-1584643774 I am unable to reason about hypothetical issues. The discussion would be more constructive if you came with a concrete issue, or an example of past mistakes related to the limited C API / stable ABI that might repeat with my proposed change.

I plan to merge my PR soon. It got approved by @gpshead.

1 Like

Hum, the discussion is now split between the issue, the PR and here. Here is a copy of my message from my PR.

If I understood @steve.dower’s rationale correctly, he wants to build a C extension with (the limited C API of) the new Python 3.(N+1), run it with an old Python 3.N, and have the exact same behavior as if the C extension had been built with the old Python 3.N. The current implementation of the limited C API DOES NOT provide such an ABI guarantee. There are many corner cases like the function being discussed here: Py_INCREF().

There are around 315 functions in Python 3.12 which are implemented either as a macro or as a static inline function, and so are likely to access structure members or rely on other implementation details: Statistics on the Python C API — pythoncapi 0.1 documentation

The intent of moving the limited C API towards “opaque function calls” is to build a C extension with Python 3.N and get exactly the Python 3.x behavior on Python 3.x, since the executed code is no longer hardcoded in the extension binary; instead the extension calls into the Python executable (libpython).

For example, say this change lands in Python 3.12 beta3, a C extension is built with Python 3.12 beta3, and then Python 3.12 beta4 changes the Py_INCREF() implementation again: the C extension will execute the new beta4 implementation, rather than using a copy of the old beta3 implementation.

Without opaque function calls, the promise of a “stable ABI” is weak. This change moves the limited C API / stable ABI one step towards a better (more stable) stable ABI.
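For concreteness, a simplified sketch of what the interpreter-side wrappers look like (roughly what Objects/object.c does for _Py_IncRef() and _Py_DecRef(); quoted from memory, not verbatim): the exported functions simply forward to the interpreter’s own, current inlined implementation, so extensions calling them always get the behavior of the Python they run on.

void _Py_IncRef(PyObject *op)
{
    // Expands to the interpreter's own (current) inlined implementation
    Py_INCREF(op);
}

void _Py_DecRef(PyObject *op)
{
    Py_DECREF(op);
}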

@steve.dower’s rationale is that the implementation of the limited C API and the stable ABI is broken and should be abandoned. I disagree here. IMO not only is the current half-baked implementation useful to many people, it is also fixable in an incremental way. Moreover, my long term goal is to bend the whole (regular) Python C API towards the limited C API / stable ABI.

5 Likes

I merged my PR: see PEP 683 (Immortal Objects): Implement Py_INCREF() as function call in limited C API 3.12 · Issue #105387 · python/cpython · GitHub for details.

The new compiler warning is not fixed yet: object.h uses an anonymous union in a struct (older C incompatible) · Issue #105059 · python/cpython · GitHub
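A minimal reproduction of the kind of warning being discussed (an assumption about the exact diagnostic; wording varies by compiler): anonymous unions inside a struct are a C11 feature, so building code like this with e.g. gcc -std=c99 -pedantic typically warns that ISO C99 doesn’t support unnamed structs/unions.

// Standalone example mimicking the new PyObject layout, not CPython's code
struct object_like {
    union {                          // anonymous union: a C11 feature
        long ob_refcnt;
        unsigned int ob_refcnt_split[2];
    };
    void *ob_type;
};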

You’re looking in the wrong place. 3.9 used a different file to list the functions.

$ git show 3.9:PC/python3dll.c | wc --lines
8

$ git grep -E 'Py_(Inc|Dec)Ref' 3.9:PC/python3.def
3.9:PC/python3.def:731:  Py_DecRef=python39.Py_DecRef
3.9:PC/python3.def:758:  Py_IncRef=python39.Py_IncRef

$ git grep -E 'Py_(Inc|Dec)Ref' v3.2.1:PC/python3.def
v3.2.1:PC/python3.def:627:  Py_DecRef=python32.Py_DecRef
v3.2.1:PC/python3.def:646:  Py_IncRef=python32.Py_IncRef

They have been there since the very first stable ABI version. AFAICS, it’s safe to use them.

The only reason against them that I can see is performance – but it’s becoming clear that, long-term, any flavor of stable ABI we come up with will need C function calls even for such frequent operations.
If C function calls are too slow for some extension, it should build against the version-specific ABI. (Preferably in addition to a stable ABI build, so it works with older/newer interpreters too.)

So let’s switch to function calls now. I wouldn’t do it in 3.12, as it could have performance implications that affect people who test with betas, but that’s the release manager’s call.
AFAIC, in 3.13 we could switch to function calls in all limited API versions.

1 Like

I think you will need to get SC buy-in for such a plan first.

Right now, every single change in that direction appears to create long discussions, so at least part of the core devs are not aligned with your idea. This causes frustration on both sides and is not a healthy approach to implementing change.

Once we have a clear set of goals, everyone can focus on these goals and discussions would then focus more on implementation details and making change less painful.

FWIW, I have never been a fan of the cross-version ABI stability idea. Many of the top used Python extensions want performance and thus need to be compiled per version. The added confusion around the various API subsets doesn’t really pay off, if they are only used by a handful of less used extensions. With today’s CI/CD pipelines, creating distributions for the common set of targets is not hard anymore (as it was when Martin kicked off the idea) and tooling in that area is getting better all the time as well.

1 Like

I agree, and it was discussed at length during the Language Summit: Python Software Foundation News: The Python Language Summit 2023: Three Talks on the C API

The outcome is the creation of this project, which will become an informal PEP listing C API “issues” without proposing solutions: GitHub - capi-workgroup/problems: Discussions about problems with the current C Api

But here the scope is only: the limited C API, Python 3.12 and PEP 683.

1 Like

In terms of the development cycle, IMO it’s better to use function calls right now, spend time measuring the performance impact at length, and keep in mind that we can revert to inlined code in the limited C API in Python 3.12 if needed.

Making such a change late in the development cycle with a potential negative impact on performance sounds risky. Eddie Elizondo spent a long time investigating different implementations and measuring their performance cost. I don’t expect someone to come up with a “magic” clever optimization trick in the next months.

Moreover, IMO the urgency is to fix the compiler warning about the usage of the PyObject anonymous union when using C99, especially strict C99 (“pedantic”).

1 Like

As just one example, cryptography can hardly count as a “less used extension”, and its devs are quite adamant that without it, their support for new Python versions would be reduced/delayed. Seeing how cryptography underlies requests and everything web-related on top of that, this would be a huge barrier for the roll-out of any new CPython version to the broader ecosystem.

5 Likes

Fair enough, but is this really such a good example ?

The project is using a home-grown solution for building wheels instead of building on top of the officially supported cibuildwheel, which really makes the process a lot easier.

cibuildwheel supports all Python versions currently supported by cryptography. Most of the heavy lifting is taken care of by this project, so the added overhead of building per-version wheels is not that significant: example config.

Quite a few big projects use it successfully: some projects using cibuildwheel, including e.g. NumPy and matplotlib.

Anyway, this is getting off-topic and just my personal opinion on the stable ABI idea and the added complexity that it introduces seen from today’s POV (it was a good idea at the time, esp. for Windows builds).

FWIW, I have never been a fan of the cross-version ABI stability idea. Many of the top used Python extensions want performance and thus need to be compiled per version. The added confusion around the various API subsets doesn’t really pay off, if they are only used by a handful of less used extensions. With today’s CI/CD pipelines, creating distributions for the common set of targets is not hard anymore (as it was when Martin kicked off the idea) and tooling in that area is getting better all the time as well.

I very much want to use a stable ABI for my (proprietary) extension. Building it for two versions of Python was extremely painful in the past, and extending that to 3.8, 3.9, 3.10, and 3.11 is the sort of pain I really don’t want. There may be perfectly good solutions for building multiple wheels for different versions when what you are producing is “an extension”, but when you have an existing build system for an existing (fairly large) product, incorporating those is likely to be hard.

Using the stable ABI makes my life very much simpler (and the performance difference between the stable ABI and the version-specific ABI is trivial for us.)

2 Likes