PyBuffer segfault, but only on Linux (C Extension)

I am working on a C extension and have a test failure which I can’t fix. Maybe someone here can see the root cause.

Here is the failing test code:

def test_calculate_crc8():
    # test start_value and first_call
    first_call = True
    crc = e2e.crc.CRC8_INITIAL_VALUE
    data = b"\x33\x22\x55\xAA\xBB\xCC\xDD\xEE\xFF"
    for i, _val in enumerate(data):
        crc = e2e.crc.calculate_crc8(data[i:i+1], start_value=crc, first_call=first_call)
        first_call = False
    assert 0xCB == crc

e2e.crc.calculate_crc8 is implemented in C:

static PyObject *
py_calculate_crc8(PyObject *module,
                  PyObject *args,
                  PyObject *kwargs)
{
    Py_buffer data;
    uint8_t start_value = CRC8_INITIAL_VALUE;
    bool first_call = true;
    static char *kwlist[] = {
        "data",
        "start_value",
        "first_call",
        NULL};

    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|Ip:calculate_crc8",
                                     kwlist, &data, &start_value, &first_call))
    {
        return NULL;
    }

    uint8_t crc = Crc_CalculateCRC8(data.buf, data.len, start_value, first_call);
    PyBuffer_Release(&data);

    return (PyLong_FromUnsignedLong(crc));
}

So the function py_calculate_crc8 is called with a bytes object of length 1. This object is parsed via format y* into the Py_buffer “data”. The segfault occurs when data.buf[0] is accessed. The failure only shows up on Linux; all tests pass on Windows and macOS.

The full code is available here

start_value (parameter type “I”) should be unsigned int, not uint8_t (that’s too small).

first_call (parameter type “p”) should be int, not bool (I’m not sure of the size of bool, but it could be 1 byte, whereas “p” stores its result in an int, so bool is probably too small).

Thank you for taking a look!

I actually tried “b” instead of “I” on the other git branch. I could try changing bool to something else tomorrow. I had assumed that PyArg_ParseTupleAndKeywords automatically casts to the correct size.

I also tried incrementing the ref count of “args” but that didn’t help either.

You pass in pointers to variables and use the format argument to tell it what types the arguments are, and, therefore, how big the pointed-to variables are. If you tell it that it’s “I” (unsigned int) but it’s actually uint8_t, don’t be surprised if it writes something beyond the end of the variable!
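To make the size mismatch concrete, here is a minimal sketch that simulates it from Python with ctypes. It does not call the real parser: the `Locals` structure and the `write_as_unsigned_int` helper are hypothetical stand-ins for the C function's local variables and for what a parser does when the format says “I”. It only shows that storing an unsigned int at the address of a 1-byte variable touches the bytes next to it.

```python
import ctypes


class Locals(ctypes.Structure):
    # Hypothetical stand-in for the C locals: a 1-byte variable followed
    # by whatever happens to sit next to it in memory.
    _fields_ = [
        ("start_value", ctypes.c_uint8),
        ("neighbours", ctypes.c_uint8 * 7),
    ]


def write_as_unsigned_int(frame, value):
    """Store sizeof(unsigned int) bytes at the address of start_value,
    as a parser would if it were told the target is an unsigned int."""
    target = ctypes.cast(ctypes.pointer(frame), ctypes.POINTER(ctypes.c_uint))
    target[0] = value


frame = Locals()
# Fill everything with a marker byte so overwritten bytes are easy to spot.
ctypes.memset(ctypes.addressof(frame), 0xAA, ctypes.sizeof(frame))

write_as_unsigned_int(frame, 0xFF)

# Indices of neighbouring bytes that no longer hold the marker.
clobbered = [i for i, b in enumerate(frame.neighbours) if b != 0xAA]

print("sizeof(uint8_t)      =", ctypes.sizeof(ctypes.c_uint8))
print("sizeof(unsigned int) =", ctypes.sizeof(ctypes.c_uint))
print("clobbered neighbours =", clobbered)
```

In the real extension the overwritten bytes belong to other locals or stack data, so the damage depends on how the compiler lays out the stack frame. That would be consistent with the crash appearing on one platform and not on others.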

Another issue I’ve only just realised: I’m not sure about the PyBuffer_Release(&data); call. A buffer is being passed in, right? But whose responsibility is it to release that buffer? You or the caller? I’d probably assume it was the caller’s responsibility.


You were absolutely correct about the format arguments. That solved the problem. Here is the current version. Thank you for your advice.

Regarding the PyBuffer_Release, the docs say:

However, when a Py_buffer structure gets filled, the underlying buffer is locked so that the caller can subsequently use the buffer even inside a Py_BEGIN_ALLOW_THREADS block without the risk of mutable data being resized or destroyed. As a result, you have to call PyBuffer_Release() after you have finished processing the data (or in any early abort case).
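The locking described in that passage can be observed from pure Python, since memoryview uses the same buffer protocol. The following is only an illustrative sketch, with a bytearray standing in for a mutable object passed to the extension and `view.release()` playing the role that PyBuffer_Release() plays in C.

```python
# A mutable object that supports the buffer protocol.
data = bytearray(b"\x33\x22\x55")

# Acquiring a view fills a Py_buffer under the hood and locks the object.
view = memoryview(data)

try:
    data.append(0xAA)  # resizing while the buffer is exported
except BufferError:
    resize_blocked = True
else:
    resize_blocked = False

# Releasing the view is the Python-level counterpart of PyBuffer_Release().
view.release()
data.append(0xAA)  # resizing works again once the buffer is released

print("resize blocked while exported:", resize_blocked)
print("length after release:", len(data))
```

If the extension function skipped the release, a mutable argument such as a bytearray would stay locked like this after the call returned.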