UPDATE: There was a bug in my benchmark. I fixed it and reran the benchmark. Now it’s faster instead of slower for tuple-1000.
Mark is referring to the issue "C API: Add PyTupleWriter API" that I just proposed. This API is mostly a replacement for _PyTuple_Resize(), for cases where the input size is not known in advance. For example, look at my PR to see how PySequence_Tuple() becomes simpler with PyTupleWriter.
When the input size is known, there are other existing safe functions: PyTuple_FromArray() (new! I just added it), PyTuple_Pack(), Py_BuildValue(), etc.
I ran a micro-benchmark comparing [tuple] to [writer]:
- [tuple]: PyTuple_New() and PyTuple_SetItem().
- [writer]: PyTupleWriter_Create(), PyTupleWriter_AddSteal() and PyTupleWriter_Finish().

| Benchmark | tuple | writer |
|---|---|---|
| tuple-1 | 37.4 ns | 41.3 ns: 1.10x slower |
| tuple-5 | 65.7 ns | 68.8 ns: 1.05x slower |
| tuple-10 | 99.9 ns | 102 ns: 1.02x slower |
| tuple-100 | 800 ns | 762 ns: 1.05x faster |
| tuple-1000 | 7.68 us | 7.28 us: 1.05x faster |
| Geometric mean | (ref) | 1.01x slower |
tuple-1 is the worst-case scenario (it measures the overhead of the abstraction): it’s only 3.9 nanoseconds slower.
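The overhead figures can be recomputed from the table. The following is only a small sketch of that arithmetic: the `BenchRow` type and the helper names are made up for illustration, and the timings are copied from the table above (with 7.68 us written as 7680 ns).

```c
#include <assert.h>
#include <stddef.h>

/* One row of the benchmark table, both timings in nanoseconds.
 * BenchRow is a hypothetical helper type, not part of any API. */
typedef struct {
    const char *name;
    double tuple_ns;
    double writer_ns;
} BenchRow;

/* Timings copied from the table (7.68 us = 7680 ns, 7.28 us = 7280 ns). */
static const BenchRow ROWS[] = {
    {"tuple-1",      37.4,   41.3},
    {"tuple-5",      65.7,   68.8},
    {"tuple-10",     99.9,  102.0},
    {"tuple-100",   800.0,  762.0},
    {"tuple-1000", 7680.0, 7280.0},
};
static const size_t NROWS = sizeof(ROWS) / sizeof(ROWS[0]);

/* Absolute overhead of [writer] in ns; a negative value means
 * [writer] is faster than [tuple]. */
static double overhead_ns(const BenchRow *row)
{
    return row->writer_ns - row->tuple_ns;
}

/* Relative cost of [writer]: above 1.0 is slower, below 1.0 is faster. */
static double ratio(const BenchRow *row)
{
    return row->writer_ns / row->tuple_ns;
}
```

For tuple-1, `overhead_ns()` gives about 3.9 ns and `ratio()` gives about 1.10, matching the first row of the table.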
My implementation calls PyTuple_New() and _PyTuple_Resize() internally, so it’s hard to be faster than these functions.
IMO being at worst 3.9 ns (1.10x) slower on a micro-benchmark is an acceptable trade-off for an abstraction and a safer API.
A PyTupleWriter instance is allocated on the heap, but there is a free list which reduces the cost of the memory allocation and deallocation. I designed the API to be compatible with the stable ABI in the long term. Hiding the structure members is required for that. I would also prefer not to have a structure of a fixed size, since it would make the implementation more complicated and less flexible (it would be harder to try other optimizations later).
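To illustrate the general idea of an opaque, heap-allocated handle backed by a free list, here is a minimal standalone sketch in plain C. It is not the PyTupleWriter implementation: `Writer`, `writer_create()`, `writer_add()`, `writer_count()`, `writer_destroy()` and `FREELIST_MAX` are hypothetical names, and the free list is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Public view: an incomplete type, so callers only ever hold a pointer
 * and never depend on the structure size or members. */
typedef struct Writer Writer;

/* Private definition: in a real library this would live in the .c file
 * only and could change without breaking the ABI. */
struct Writer {
    size_t count;       /* number of items added so far */
    Writer *next_free;  /* link used while the instance is in the free list */
};

#define FREELIST_MAX 4  /* arbitrary cap chosen for this sketch */
static Writer *freelist = NULL;
static int freelist_len = 0;

Writer *writer_create(void)
{
    Writer *w;
    if (freelist != NULL) {
        /* Fast path: reuse a cached instance instead of calling malloc(). */
        w = freelist;
        freelist = w->next_free;
        freelist_len--;
    }
    else {
        w = malloc(sizeof(Writer));
        if (w == NULL) {
            return NULL;
        }
    }
    w->count = 0;
    w->next_free = NULL;
    return w;
}

void writer_add(Writer *w)
{
    w->count++;
}

size_t writer_count(const Writer *w)
{
    return w->count;
}

void writer_destroy(Writer *w)
{
    if (w == NULL) {
        return;
    }
    if (freelist_len < FREELIST_MAX) {
        /* Keep the memory around for the next writer_create() call. */
        w->next_free = freelist;
        freelist = w;
        freelist_len++;
    }
    else {
        free(w);
    }
}
```

Because callers go through functions only, the private structure can later grow or change layout, which is the flexibility argued for above.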
PyTupleWriter uses a small array of 16 items to avoid having to resize small tuples multiple times. It switches to an internal concrete tuple object for 17 items or more. I would prefer not to leak such implementation details in the ABI.
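The small-array strategy can be sketched in plain C as follows. This is an assumption-laden illustration, not the actual code: `ItemBuf` and the `itembuf_*()` functions are hypothetical, and the sketch spills to a plain heap array with doubling growth where the real implementation switches to an internal tuple object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_CAP 16  /* inline capacity, matching the 16 items above */

typedef struct {
    void *small[SMALL_CAP];  /* inline storage for the first 16 items */
    void **items;            /* points at small[] or at a heap array */
    size_t size;             /* number of items stored */
    size_t capacity;         /* number of slots available in items */
} ItemBuf;

void itembuf_init(ItemBuf *b)
{
    b->items = b->small;
    b->size = 0;
    b->capacity = SMALL_CAP;
}

/* Return 1 once the buffer has spilled from the inline array to the heap. */
int itembuf_on_heap(const ItemBuf *b)
{
    return b->items != b->small;
}

/* Append one item. Return 0 on success, -1 on memory allocation failure. */
int itembuf_add(ItemBuf *b, void *item)
{
    if (b->size == b->capacity) {
        size_t new_cap = b->capacity * 2;
        void **p;
        if (itembuf_on_heap(b)) {
            p = realloc(b->items, new_cap * sizeof(void *));
            if (p == NULL) {
                return -1;
            }
        }
        else {
            /* First spill (17th item): copy the inline items to the heap. */
            p = malloc(new_cap * sizeof(void *));
            if (p == NULL) {
                return -1;
            }
            memcpy(p, b->small, b->size * sizeof(void *));
        }
        b->items = p;
        b->capacity = new_cap;
    }
    b->items[b->size++] = item;
    return 0;
}

/* Release heap storage, if any, and reset to the empty inline state. */
void itembuf_clear(ItemBuf *b)
{
    if (itembuf_on_heap(b)) {
        free(b->items);
    }
    itembuf_init(b);
}
```

Up to 16 items, no allocation happens at all. Note that `ItemBuf` holds a pointer into itself, so it must not be copied by value, which is one more reason to keep such a layout out of a public ABI.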
I didn’t measure the performance of PyTupleWriter_AddArray(), but it should be more efficient since it works on an array.