C API for iterable unpacking

rtobar · December 24, 2024, 6:03pm

Hi everyone,

I just found myself writing some code in a C extension to mimic the behaviour of simple target list assignments, e.g. a, b = c, which should work for any iterable c. From what I could gather after perusing the relevant C API sections, I couldn’t find anything like this (I would have expected it to be under the Iterator Protocol section).

What I have is the following:

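#include <Python.h>
#include <stdarg.h>

int my_unpack(PyObject *o, Py_ssize_t expected, ...)
{
        PyObject *iter = PyObject_GetIter(o);
        if (!iter) {
                /* Reword the error the way the interpreter does for non-iterables */
                if (PyErr_ExceptionMatches(PyExc_TypeError) && Py_TYPE(o)->tp_iter == NULL && !PySequence_Check(o)) {
                        PyErr_Format(PyExc_TypeError, "cannot unpack non-iterable %.200s object", Py_TYPE(o)->tp_name);
                }
                return -1;
        }
        /* Collect the items first, so that no target is touched on failure */
        PyObject *items = PyTuple_New(expected);
        if (!items) {
                Py_DECREF(iter);
                return -1;
        }
        Py_ssize_t count = 0;
        PyObject *item;
        /* Fetch at most expected + 1 items: one extra is enough to report
           "too many values" without draining a possibly infinite iterator */
        while (count <= expected && (item = PyIter_Next(iter))) {
                if (count < expected) {
                        PyTuple_SET_ITEM(items, count, item);  /* steals the new reference */
                }
                else {
                        Py_DECREF(item);  /* the surplus item is not kept */
                }
                count++;
        }
        Py_DECREF(iter);
        if (PyErr_Occurred()) {
                Py_DECREF(items);
                return -1;
        }
        if (count > expected) {
                PyErr_Format(PyExc_ValueError, "too many values to unpack (expected %zd)", expected);
                Py_DECREF(items);
                return -1;
        }
        else if (count < expected) {
                PyErr_Format(PyExc_ValueError, "not enough values to unpack (expected %zd, got %zd)", expected, count);
                Py_DECREF(items);
                return -1;
        }
        /* Success: each target receives a new reference owned by the caller */
        va_list args;
        va_start(args, expected);
        for (Py_ssize_t i = 0; i < expected; i++) {
                PyObject **target = va_arg(args, PyObject **);
                *target = Py_NewRef(PyTuple_GET_ITEM(items, i));
        }
        va_end(args);
        Py_DECREF(items);
        return 0;
}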

My two questions are:

Is there really no such C API already?
If not, would it be considered a good idea to add one?

blhsing · December 27, 2024, 7:42am

For simple iterable unpacking like a, b = iterable where the number of items to unpack is known, the iterable is usually a sequence, so the status quo is to make a call to PySequence_GetItem/PyTuple_GetItem/PyList_GetItem for each item with a constant index, which is also generally more performant than using the iterator protocol.

Besides, I myself can’t think of a good real-world use case for an iterable that doesn’t implement the sequence protocol but still yields a known number of items, so you should probably list the cases you know of to justify such an API.

rtobar · December 27, 2024, 1:53pm

My use case is that I’m implementing a generator-like object in C to which you can send data (e.g., generator.send(('a', 'b', 'c'))). I also have a Python version of the same generator that does:

path, event, value = (yield)

In the Python version there are no constraints on what users can send to this code, as long as it is an iterable that yields exactly 3 values (i.e., whatever the language allows in this kind of expression). I’m trying to provide the same flexibility for those using the C extension counterpart (the Python and C extension versions are, in principle, fully interchangeable). Since target list assignments are fairly common in Python code, I suspected there would be something I could reuse to mimic them, but I found nothing. Hence, I ended up with my code above. I then use it roughly like this:

PyObject *a;
PyObject *b;
PyObject *c;
if (!my_unpack(value, 3, &a, &b, &c)) {
   /* do stuff with a, b and c */
   /* a, b and c are new references owned by the caller */
   Py_DECREF(a);
   Py_DECREF(b);
   Py_DECREF(c);
}

I’m sure the my_unpack in the original post can be improved in terms of performance (e.g., by checking for lists/tuples/etc. and shortcutting/specialising on those). I also haven’t dug into how CPython implements this (though I meant to), since whatever it does is pretty much what I’d like to have myself.

…goes and actually reads cpython…

The relevant code is defined in ceval.c, and is used by the UNPACK_SEQUENCE and UNPACK_EX bytecodes. It uses PyIter_Next to iterate over the value being unpacked. There are, however, specialisations (done at runtime, if my understanding of how these work is correct) that handle cases such as unpacking a tuple into two values, where more specific APIs like PyList_* and PyTuple_* are used. The function in ceval also receives a PyObject ** argument referring to all the targets, rather than the variadic arguments I’m using; I might change this on my side.

ncoghlan · December 28, 2024, 12:13am

If it’s only three items, always creating a tuple will be pretty quick.

PyArg_UnpackTuple is the closest thing the C API offers to the kind of unpacking operation you’re looking for (see the “Parsing arguments and building values” page of the Python 3.13.1 documentation).

The more generic form to handle arbitrary iterables isn’t there because multi-value arguments, return values, and iteration values all use tuples.

rtobar · December 30, 2024, 12:23pm

True, that’s nice, thanks for the pointer! So your idea would be that I turn the send() input value into a tuple (via PySequence_Tuple, weirdly named (?) since it takes any iterable as input), then unpack it with PyArg_UnpackTuple. I tried this; however, it raises different exception types from those raised by the UNPACK_SEQUENCE bytecode (TypeError vs. ValueError), and the messages differ too. I’m not sure whether this misalignment is an oversight or a design choice, but either way I’m really trying to mimic the built-in experience, so I’ll give this a pass.

In any case, it seems like there’s not really enough appetite for a new API that mimics this particular behaviour. I do understand that it can be a niche use case, so I’m happy to let this sleep. Thanks everyone for the discussion, though.