I’ve been looking into multithreaded scaling in NumPy, which has uncovered a number of bottlenecks in NumPy and CPython that we are in the process of resolving.
In the process, we noticed reference count contention around two static, global PyCapsules that NumPy defines for the default allocator and the floating point exception handler. Users can override these, but in practice almost no one does, and most people use the global defaults.
Making things more complicated, these PyCapsule objects and the pointers they wrap are part of the public NumPy API and users may be relying on them.
In the thread, @colesbury suggested making the troublesome PyCapsule objects immortal, and indeed this fixes the issue. Measuring my fast-cache branch of NumPy against my fast-cache-no-immortal branch, with release builds on my M3 Max MacBook Pro, I see a ~20% wall-clock improvement in the multithreaded case of a test script from making these capsules immortal. Without this change, multiprocessing beats multithreading on this benchmark. These special branches are necessary because they merge in a number of WIP pull requests from me and Kumar Aditya that fix other performance bottlenecks.
Of course, right now I can only do that by using the internal _Py_SetImmortal function and defining Py_BUILD_CORE in NumPy’s internals. I’d prefer to avoid doing that, or at least to do it only on older Python versions with frozen ABIs, via the pythoncapi-compat header.
I’d like to propose making a version of _Py_SetImmortal public as PyUnstable_SetImmortal with the following API and contract:
```c
int PyUnstable_SetImmortal(PyObject *op)
```

Marks the object `op` immortal. The argument should not be a string and should be a newly created Python object that is not yet exposed for concurrent reads to multiple interpreters or threads. This is a one-way process: objects can only be made immortal; they cannot be made mortal again. Immortal objects do not participate in reference counting and will never be garbage collected. If this function fails, the object is not immortal. Returns 0 on success and -1 on failure.
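To make the “do not participate in reference counting” part concrete, here is a small Python-level demonstration using singletons like None, which CPython 3.12+ already makes immortal (PEP 683). This only observes existing immortal objects; it doesn’t create new ones:

```python
import sys

# On CPython 3.12+, immortal singletons such as None report a huge,
# saturated reference count, while an ordinary, freshly created object
# has a small count that changes as references come and go.
mortal = object()
print(sys.getrefcount(None))    # very large sentinel value on 3.12+
print(sys.getrefcount(mortal))  # small, ordinary count

if sys.version_info >= (3, 12):
    before = sys.getrefcount(None)
    refs = [None] * 1000
    # Taking 1000 new references to an immortal object does not
    # change its reported reference count.
    assert sys.getrefcount(None) == before
```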
I don’t think it’s a good idea to expose _Py_SetImmortal directly, since it cannot fail, and that has led to races reported on the issue tracker. There’s also a note in the internal docs that you shouldn’t pass _Py_SetImmortal a string, so I added that caveat to the contract as well.
In NumPy I’d like to call this function on the two PyCapsule objects described above in the module initialization code, so the additional checks I’m proposing shouldn’t be a performance concern. I expect people will usually want to do this in initialization code, where a few extra checks shouldn’t be problematic.
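For illustration, here is a hypothetical sketch of what that module-init usage could look like. PyUnstable_SetImmortal does not exist yet, so this does not compile against any current CPython; the capsule name and payload are illustrative stand-ins, not NumPy’s actual ones:

```c
#include <Python.h>

static int default_handler;  /* stand-in for the real handler struct */

/* Hypothetical sketch: create a capsule once at module init and mark it
 * immortal so later reads never touch its reference count. */
static int
init_immortal_capsule(PyObject *module)
{
    PyObject *capsule = PyCapsule_New(&default_handler,
                                      "example.default_handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    if (PyUnstable_SetImmortal(capsule) < 0) {
        /* Per the proposed contract, on failure the capsule is still
         * mortal and can be released normally. */
        Py_DECREF(capsule);
        return -1;
    }
    /* PyModule_AddObjectRef does not steal our reference, so drop it
     * (a no-op for an immortal object, but harmless and tidy). */
    int rc = PyModule_AddObjectRef(module, "default_handler", capsule);
    Py_DECREF(capsule);
    return rc;
}
```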
Eventually I think we might also want to make more things in NumPy immortal. In particular, there are some singleton instances representing data types (e.g. `np.dtype('int64')` is shared between all int64 arrays) that may benefit from being marked immortal.
There are also some prior discussions initiated by @ZeroIntensity on this topic; he tells me privately that he’s no longer interested in pursuing those proposals. To be clear, I’m not proposing adding a Python-level API for this.