We have, or are getting, “writer” APIs for incrementally creating immutable objects like strings, bytes and tuples.
I am concerned that these APIs are inefficient and will thus be ignored by C extension authors. We would then make no progress in moving away from the old unsafe APIs, and would just accumulate new APIs to maintain. If we expect users to move to the new APIs, they must be close to the old APIs in efficiency and, ideally, easy to port to.
Taking tuples as an example:
Currently to create a tuple of length 2 from a source of objects, the code looks something like this:
PyObject *t = PyTuple_New(2);
// Handle error case
PyObject *o0 = get_object(0);
// Handle error case
PyTuple_SET_ITEM(t, 0, o0);
PyObject *o1 = get_object(1);
// Handle error case
PyTuple_SET_ITEM(t, 1, o1);
With PyTupleWriter this becomes:
PyTupleWriter *w = PyTupleWriter_Create(2);
// Handle error case
PyObject *o0 = get_object(0);
// Handle error case
int err = PyTupleWriter_Add(w, o0);
Py_DECREF(o0);
// Handle error case
PyObject *o1 = get_object(1);
// Handle error case
err = PyTupleWriter_Add(w, o1);
Py_DECREF(o1);
// Handle error case
PyObject *t = PyTupleWriter_Finish(w);
// Handle error case
Not only are there many more calls, there is also more error handling and an additional heap allocation for the writer.
We could make it more efficient by not requiring heap allocation of the writer, by stealing the reference to the object (only relevant for PyTupleWriter; the bytes and str versions do not have this issue), and by guaranteeing success if the writer is initialized to the exact size needed.
PyTupleWriter w;
PyTupleWriter_Init(&w, 2);
// Handle error case
PyObject *o0 = get_object(0);
// Handle error case
PyTupleWriter_AddSteal(&w, o0);
PyObject *o1 = get_object(1);
// Handle error case
PyTupleWriter_AddSteal(&w, o1);
PyObject *t = PyTupleWriter_Finish(&w);
It is still more code than the original, but it is close enough that we can reasonably expect authors to move to the new API, and that tooling to do it automatically can be easily made.
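To make the "guaranteed success" point concrete, here is a minimal toy model in plain C. It does not use the CPython API: Obj, Tuple, TupleWriter and every function name below are hypothetical stand-ins invented for illustration, and the real design may differ. It only sketches why, with a caller-allocated writer, reference stealing and an exact size given up front, initialization is the single step that allocates and therefore the single step that can fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for PyObject: nothing but a reference count. */
typedef struct { int refcnt; } Obj;

/* Toy stand-in for a tuple: a fixed-size array of owned references. */
typedef struct { size_t size; Obj **items; } Tuple;

/* Hypothetical caller-allocated writer; lives on the caller's stack. */
typedef struct { Tuple *tuple; size_t used; } TupleWriter;

static Obj *obj_new(void) {
    Obj *o = malloc(sizeof *o);
    if (o != NULL) o->refcnt = 1;
    return o;
}

static void obj_decref(Obj *o) {
    if (--o->refcnt == 0) free(o);
}

/* Releases a finished tuple and the references it owns. */
static void tuple_free(Tuple *t) {
    for (size_t i = 0; i < t->size; i++) obj_decref(t->items[i]);
    free(t->items);
    free(t);
}

/* The only allocating step. Returns 0 on success, -1 on failure. */
static int writer_init(TupleWriter *w, size_t size) {
    Tuple *t = malloc(sizeof *t);
    if (t == NULL) return -1;
    t->items = malloc((size ? size : 1) * sizeof *t->items);
    if (t->items == NULL) { free(t); return -1; }
    t->size = size;
    w->tuple = t;
    w->used = 0;
    return 0;
}

/* Takes over the caller's reference: no incref, no allocation, so
   nothing can fail as long as the declared size is respected. */
static void writer_add_steal(TupleWriter *w, Obj *o) {
    assert(w->used < w->tuple->size);
    w->tuple->items[w->used++] = o;
}

/* Hands the storage over as the finished tuple; no allocation. */
static Tuple *writer_finish(TupleWriter *w) {
    assert(w->used == w->tuple->size);
    Tuple *t = w->tuple;
    w->tuple = NULL;
    return t;
}

/* Error path: drop the items added so far, then the storage. */
static void writer_discard(TupleWriter *w) {
    for (size_t i = 0; i < w->used; i++) obj_decref(w->tuple->items[i]);
    free(w->tuple->items);
    free(w->tuple);
    w->tuple = NULL;
}

/* Mirrors the shape of the proposed calling code for a pair. */
static Tuple *build_pair(void) {
    TupleWriter w;
    if (writer_init(&w, 2) < 0) return NULL;
    for (size_t i = 0; i < 2; i++) {
        Obj *o = obj_new();                  /* plays the role of get_object(i) */
        if (o == NULL) { writer_discard(&w); return NULL; }
        writer_add_steal(&w, o);             /* no error check needed */
    }
    return writer_finish(&w);                /* no error check needed */
}
```

In this model the calling code in build_pair has the same number of failure points as the PyTuple_New version: one for the initial allocation and one per object fetched. The price is that overfilling or underfilling the writer becomes a programming error (an assert here) rather than a reported runtime error.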