Add an Unstable C API for immortalizing objects

I’ve been investigating multithreaded scaling in NumPy. This has uncovered a number of scaling issues in both NumPy and CPython that we are in the process of resolving.

In the process, we noticed that there is some reference count contention around two static, global PyCapsules that NumPy defines for the default allocator and floating point exception handler. Users can override these, but in practice almost no one does; most code uses the global defaults.

Making things more complicated, these PyCapsule objects and the pointers they wrap are part of the public NumPy API and users may be relying on them.

In the thread, @colesbury suggested making the troublesome PyCapsule objects immortal, and indeed this fixes the issue. Measuring my fast-cache branch of NumPy against my fast-cache-no-immortal branch, with release builds on my M3 Max MacBook Pro, making these capsules immortal gives a ~20% wall-clock improvement for the multithreaded case in a test script. Without this change, multiprocessing beats multithreading on this benchmark. These special branches are necessary because they merge in a number of WIP pull requests from me and Kumar Aditya that fix other performance bottlenecks.

Of course, right now I can only do that by using the internal _Py_SetImmortal function and defining Py_BUILD_CORE in NumPy internals. I’d prefer to avoid that, or at least only do it on older Python versions with frozen ABIs via the pythoncapi-compat header.

I’d like to propose making a version of _Py_SetImmortal public as PyUnstable_SetImmortal with the following API and contract:

int PyUnstable_SetImmortal(PyObject *op)

Marks the object op immortal. The argument must not be a string and should be a newly created Python object that has not been exposed for concurrent reads to multiple interpreters or threads. This is a one-way process: objects can only be made immortal, they cannot be made mortal again. Immortal objects do not participate in reference counting and will never be garbage collected. If this function fails, the object is not made immortal. Returns -1 on failure and 0 on success.
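As a sketch of how this might look in practice (PyUnstable_SetImmortal is the proposal above and does not exist in released CPython; the capsule name and payload here are hypothetical stand-ins for something like NumPy’s allocator handle):

```c
#include <Python.h>

static int default_handler;  /* hypothetical stand-in payload */

/* Module exec slot: create a capsule and immortalize it before it can
 * be seen by any other thread. Per the proposed contract, -1 on the
 * immortalization call signals failure and the object stays mortal. */
static int
module_exec(PyObject *mod)
{
    PyObject *capsule = PyCapsule_New(&default_handler,
                                      "example.default_handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    if (PyUnstable_SetImmortal(capsule) < 0) {
        Py_DECREF(capsule);
        return -1;
    }
    if (PyModule_AddObjectRef(mod, "default_handler", capsule) < 0) {
        Py_DECREF(capsule);
        return -1;
    }
    Py_DECREF(capsule);
    return 0;
}
```

Since the capsule is immortal after this point, the final Py_DECREF is a no-op, but keeping it makes the error paths uniform.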

I don’t think it’s a good idea to expose _Py_SetImmortal directly, since it cannot fail, and that has led to races reported on the issue tracker. There’s also a note in the internal docs that you shouldn’t pass _Py_SetImmortal a string, so I added that caveat to the contract as well.

In NumPy I’d like to call this function on the two PyCapsule objects I describe above in the module initialization code, so the additional checks I’m proposing shouldn’t be a performance concern. I think people will usually want to do this in initialization code, where a few extra checks shouldn’t be problematic.

Eventually I think we might also want to make more things in NumPy immortal - in particular there are some singleton instances that represent data types (e.g. np.dtype('int64') is shared between all int64 arrays) that may benefit from being marked immortal.

There’s also some prior discussions initiated by @ZeroIntensity on this topic. He tells me privately that he’s no longer interested in pursuing those proposals. I’m not proposing adding a Python API for this.

12 Likes

I think this seems reasonable. Immortalization is a concept I expect us to keep in CPython in the future. Regarding the wording of the contract:

Perhaps it would be better to say something like “the object should be uniquely referenced by its creating thread”?

2 Likes

+1!

It wouldn’t surprise me if this rule expanded to more objects in the future, so we should be cautious about what we document as okay to immortalize. How about we limit PyUnstable_SetImmortal to only user-defined objects? Given the current motivation and proposed API, I don’t see any reason to immortalize an object you don’t control.

3 Likes

Sorry, does this mean “only objects local to a module” or “only types defined outside of CPython?” If it’s the latter, that won’t help NumPy - I want to immortalize PyCapsules.

Contrary to the internal docs, I don’t think that’s actually an invariant. You can have immortal but not interned strings from reference count overflow. (An invariant in software is always true; this is only mostly true.)

3 Likes

I meant “only types defined outside of CPython”, but if you want to avoid contention on capsules specifically, why not use deferred reference counting? Capsules are allowed to be GC-tracked these days, so PyUnstable_Object_EnableDeferredRefcount will work just fine.
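A sketch of that alternative (PyUnstable_Object_EnableDeferredRefcount is real as of CPython 3.14, but this requires a free-threaded build to have any effect; the capsule name and payload are hypothetical):

```c
#include <Python.h>

/* Opt a capsule into deferred reference counting instead of
 * immortalizing it. On non-free-threaded builds, or if the object is
 * not eligible, the call returns 0 and the object just keeps normal
 * reference counting -- that is not an error. */
static int
setup_capsule(PyObject *mod, void *payload)
{
    PyObject *capsule = PyCapsule_New(payload, "example.handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    (void)PyUnstable_Object_EnableDeferredRefcount(capsule);

    int rc = PyModule_AddObjectRef(mod, "handler", capsule);
    Py_DECREF(capsule);
    return rc;
}
```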

2 Likes

Neat, I didn’t know that made it into 3.14.

Sadly, that doesn’t seem to help nearly as much as making them immortal. I’m still seeing reference count contention: Firefox Profiler.

In the linked profile, focus on the range between 3 and 12 seconds on the timeline, select one of the thread pool executors in the list of tracks on the top left, then select the “flame graph” tab. If you drill down into some of the slowdowns, particularly ones involving PyDataMem_GetHandler and fetch_curr_extobj_state, you’ll see refcounting showing up in the profiles. I actually don’t see any performance benefit from turning on deferred reference counting here at all. See this commit if you want to try yourself.

That said, maybe I’ve done something incorrect, or maybe there’s a bug somewhere that’s preventing deferred reference counting from being beneficial?

Maybe, given @colesbury’s note above that the statement in the internal docs isn’t correct, we can simply document that the object has to be uniquely referenced, with no other caveats, but add that CPython builtins and other types defined outside of user code may not be made immortal. I also probably made a mistake in saying that not making the object immortal is a failure. Instead, taking a cue from PyUnstable_Object_EnableDeferredRefcount, let’s have it return 0 or 1 to indicate whether the object was made immortal.

So along with @emmatyping’s suggestion:

int PyUnstable_SetImmortal(PyObject *op)

Marks the object op immortal. The argument should be uniquely referenced by the calling thread. Built-in types and other types defined outside of the user’s code may not be made immortal.

This is a one-way process: objects can only be made immortal, they cannot be made mortal once again. Immortal objects do not participate in reference counting and will never be garbage collected.

This function is intended to be used soon after op is created, by the code that creates it, such as in the object’s tp_new slot.

Returns 1 if the object was made immortal and returns 0 if it was not. This function cannot fail.
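Under this revised contract, a return of 0 means “not immortalized” rather than an error. A caller sketch (the function is still only the proposal above; the capsule name and payload are hypothetical):

```c
#include <Python.h>

/* Revised contract sketch: 1 means the object was immortalized,
 * 0 means it was not -- neither is a failure. */
static int
add_capsule(PyObject *mod, void *payload)
{
    PyObject *capsule = PyCapsule_New(payload, "example.handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    int made_immortal = PyUnstable_SetImmortal(capsule);
    /* If made_immortal == 0, the capsule simply keeps normal
     * reference counting; no error handling is needed. */
    (void)made_immortal;

    int rc = PyModule_AddObjectRef(mod, "handler", capsule);
    Py_DECREF(capsule);
    return rc;
}
```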

3 Likes

What’s the objection to making built-in types immortal? (I assume we’re talking about immortalising a PyCapsule instance rather than the PyCapsule_Type type object itself)

I can see that you might want specific rules for some built-in types. But in that case, why can’t PyUnstable_SetImmortal know the rules and just return 0, rather than leaving it up to users to remember them? It doesn’t have to be a quick function, since it’s really just for one-off initialization, so it wouldn’t suffer too much from a long list of checks.

3 Likes

Agreed - I proposed more-or-less this as an alternative in my last reply.

Deferred reference counting avoids reference-count contention for references held on the interpreter’s stack, but here the contention is happening in C code using the C API, so it is ineffective and making the object immortal is necessary. Functions like Py_INCREF skip reference counting only for immortal objects; objects with deferred reference counting get increfed like any other object.

4 Likes

@kumaraditya303 implemented this and got it merged just in time for 3.15 alpha 6. Thanks Kumar!

The docs are available here: Object Protocol — Python 3.15.0a6 documentation.

1 Like