Dear Python team,
I would like to raise a discussion about PEP 620 entitled “Hide implementation details from the C API”. I am consciously not posting this to the PEP discussion sub-forum since that is AFAIK where PEP writers eventually submit their finished work for discussion; the context here is different.
This message is written from the perspective of a (co-)maintainer of pybind11 and nanobind, which are binding libraries bridging C++ and Python. pybind11 is widely used in numerical/ML frameworks including SciPy, TensorFlow, PyTorch, JAX, and others. Google is currently in the process of transitioning to it as the default binding tool for C++ projects.
For context: PEP 620 sets out to hide many CPython implementation details that extension libraries rely upon – this includes the layout of core data types like PyObject, PyTupleObject, PyTypeObject, etc. The main motivation is the complexity of implementing alternative interpreters like PyPy that need to expose a conforming interface. That motivation makes sense.
However, there is also a flipside. The purpose of this message is to communicate the team’s significant unease about PEP 620. We’re worried about the fallout that this set of changes will have on pybind11, nanobind, and the larger scientific Python ecosystem. We fear that these changes, if realized as proposed, would come at a significant performance and implementation cost.
Just a few data points:
- An opaque PyObject or PyTupleObject would mean a dramatic increase in the number of API calls for very basic steps like reference counting and unboxing tuples for function call dispatch. Every C/C++ ↔ Python call will be affected by this. With the current API, very common constructions like Py_INCREF/PyTuple_GET_ITEM/PyTuple_SET_ITEM can be inlined by the compiler, and it is important that this continues to be possible. There is a more recent question of whether such functions should be implemented as macros or inline functions, and we don’t have strong opinions on that. It’s the prospect of them eventually becoming non-inlineable that seems concerning.
- Related: function calls using the classic CPython tp_call API are very tuple/dictionary-heavy, which adds even more overhead to every function call if PEP 620 is realized. There is a new PEP (vector calls) that has the potential to address this, but it appears to be considered an implementation detail (not part of the limited API, no mention in PEP 620).
- pybind11/nanobind access PyTypeObject internals all over the place. Alternative construction methods like PyType_FromSpec lack critical functionality. Even if it were possible to adapt to a fully opaque PyTypeObject (I am doubtful), somebody would have to sit down for months to figure out how to rearchitect pybind11/nanobind. And that’s just two libraries within a vast ecosystem of CPython extensions.
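To make the inlining concern from the first point concrete, here is a toy model in C. It does not use CPython's headers or its real definitions: `toy_object`, `toy_tuple`, and every `toy_*` function are hypothetical stand-ins invented for illustration. The transparent accessors are visible to the compiler and can collapse into a single load or increment, while the "opaque" ones are forced through a real function call (modelled here with a `noinline` attribute and a call counter).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyObject / PyTupleObject.
 * These are NOT CPython's actual struct layouts. */
typedef struct toy_object {
    long refcnt;
} toy_object;

typedef struct toy_tuple {
    toy_object base;
    size_t size;
    toy_object *items[4];
} toy_tuple;

/* Transparent layout: the struct is visible to the caller, so the
 * compiler is free to inline these into a plain increment / load. */
static inline void toy_incref(toy_object *o) { o->refcnt++; }
static inline toy_object *toy_tuple_get_item(toy_tuple *t, size_t i) {
    return t->items[i];
}

/* Opaque layout: in a real library these would live behind a shared
 * library boundary. The noinline attribute is only a way to model that. */
#if defined(__GNUC__) || defined(__clang__)
#define TOY_NOINLINE __attribute__((noinline))
#else
#define TOY_NOINLINE
#endif

static long toy_api_calls = 0; /* counts calls crossing the "API boundary" */

TOY_NOINLINE void toy_incref_opaque(toy_object *o) {
    toy_api_calls++;
    o->refcnt++;
}

TOY_NOINLINE toy_object *toy_tuple_get_item_opaque(toy_tuple *t, size_t i) {
    toy_api_calls++;
    assert(i < t->size);
    return t->items[i];
}

/* Dispatch-like loop: fetch every argument and take a reference to it. */
void toy_unpack_fast(toy_tuple *args) {
    for (size_t i = 0; i < args->size; i++)
        toy_incref(toy_tuple_get_item(args, i));
}

void toy_unpack_opaque(toy_tuple *args) {
    for (size_t i = 0; i < args->size; i++)
        toy_incref_opaque(toy_tuple_get_item_opaque(args, i));
}
```

In this sketch, unpacking a three-element argument tuple through the opaque path costs six out-of-line calls, while the transparent path costs none. How large the real-world difference would be depends on the interpreter and the workload, which this toy model does not attempt to measure.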
The introduction of PyPy states:
“While PyPy is way more efficient than CPython to run pure Python code, it is as efficient or slower than CPython to run C extensions.”
One could argue that PEP 620 creates a level playing field between interpreters by removing an advantage that native CPython extensions have previously enjoyed (direct access to data structures). In other words, everything will run slower, but it will be consistent. This seems unfortunate given the huge ecosystem of scientific libraries that have been developed for CPython over the last decades.
Generally, PEP 620 appears highly aligned with the “limited API”, and our suggestion and request would be that these drastic changes be made under the umbrella of the limited API without shutting the door to CPython internals.
Thanks!