The reason the current C implementation of `pairwise` is slower than the Python recipe seems to be its use of `PyTuple_Pack` (as @Stefan2 already hinted in the linked discussion on `pairwise` and `zip`). Replacing `PyTuple_Pack` (which uses varargs) with a direct `Py_TuplePack2` makes the C implementation of `pairwise` as fast as the pure Python recipe.
Branch: Comparing python:main...eendebakpt:putuple_pack2 · python/cpython · GitHub
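For anyone who wants to reproduce the comparison locally, here is a minimal sketch. It assumes the "recipe" is the `tee`/`zip` form; the exact recipe and benchmark used for the numbers above may differ. The names `pairwise_recipe` and `compare` are made up for this example.

```python
from collections import deque
from itertools import tee
import timeit


def pairwise_recipe(iterable):
    """Pure Python pairwise: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)


def compare(n=100_000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each implementation."""
    data = list(range(n))

    def consume(func):
        # deque(maxlen=0) exhausts the iterator without storing the pairs
        return lambda: deque(func(data), maxlen=0)

    results = {
        "recipe": min(timeit.repeat(consume(pairwise_recipe), number=1, repeat=repeat))
    }
    try:
        from itertools import pairwise  # C implementation, Python 3.10+
    except ImportError:
        return results
    results["itertools.pairwise"] = min(
        timeit.repeat(consume(pairwise), number=1, repeat=repeat)
    )
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.6f} s")
```

Timings depend heavily on the interpreter build and on how the pairs are consumed, so treat the output as a rough indication only.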
`PyTuple_Pack` is called with a count of 1 or 2 in several places in the CPython codebase, so perhaps we can add `Py_TuplePack1` and `Py_TuplePack2` to the Python C API. I will create an issue to discuss that a bit later.
Update: issue is at Add Py_TuplePack2 and Py_TuplePack1 · Issue #118222 · python/cpython · GitHub