Caching instance creation in C extensions

Currently, CPython maintains small lists (freelists) of pre-allocated objects for some builtin types (e.g. integers or floats) to speed up instance creation. Correspondingly, the deallocation function for such types may skip the call to the type’s tp_free function and instead put the object on the type’s freelist.
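To make the mechanism concrete, here is a minimal sketch of the freelist pattern in plain C. It is not CPython's or gmpy2's actual code: the names Obj, obj_new, obj_dealloc and the FREELIST_MAX cap are hypothetical, and malloc/free stand in for the type's allocator and tp_free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type standing in for an extension type's instance. */
typedef struct Obj {
    long value;
    struct Obj *next_free;  /* link used only while the object is cached */
} Obj;

#define FREELIST_MAX 4      /* arbitrary cap, chosen for illustration */

Obj *freelist = NULL;       /* singly linked stack of cached objects */
size_t freelist_len = 0;

/* Rough counterpart of tp_new: reuse a cached object if one is available,
   otherwise fall back to a fresh allocation. */
Obj *obj_new(long value)
{
    Obj *o;
    if (freelist != NULL) {
        o = freelist;
        freelist = o->next_free;
        freelist_len--;
    } else {
        o = malloc(sizeof(Obj));
        if (o == NULL)
            return NULL;
    }
    o->value = value;
    o->next_free = NULL;
    return o;
}

/* Rough counterpart of tp_dealloc: while the freelist has room, cache the
   object instead of releasing it (the step that skips tp_free). */
void obj_dealloc(Obj *o)
{
    assert(o != NULL);
    if (freelist_len < FREELIST_MAX) {
        o->next_free = freelist;
        freelist = o;
        freelist_len++;
    } else {
        free(o);
    }
}
```

In this sketch, creating an object, releasing it, and creating another one hands back the same memory block, so the second creation avoids the allocator entirely. That skipped allocation is where the savings come from.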

The gmpy2 extension does something similar (see GMPy_MPZ_New, roughly the equivalent of tp_new, and GMPy_MPZ_Dealloc). The speedup is noticeable:

$ python -m timeit -s 'from gmpy2 import mpz; a,b=map(mpz,[1,2])' 'a+b'
2000000 loops, best of 5: 167 nsec per loop
$ git checkout no-cache
$ pip install -e . -q
$ python -m timeit -s 'from gmpy2 import mpz; a,b=map(mpz,[1,2])' 'a+b'
1000000 loops, best of 5: 276 nsec per loop

But I worry that this is not backed by the C-API documentation, which says about tp_dealloc: “The destructor function is called by the Py_DECREF() and Py_XDECREF() macros when the new reference count is zero. At this point, the instance is still in existence, but there are no references to it. The destructor function should free all references which the instance owns, free all memory buffers owned by the instance (using the freeing function corresponding to the allocation function used to allocate the buffer), and call the type’s tp_free function.”

PyPy, for example, strictly follows the docs and rejects the gmpy2 caching shown above: the interpreter terminates with abort().

Did I miss something, or is such an optimization actually not possible, i.e. can it be used only for a few builtin CPython types, but not in external C extensions?