PyBuffer segfault, but only on Linux (C Extension)

zariiii9003 · September 21, 2022, 10:08pm

I am working on a C extension and have a test failure that I can’t fix. Maybe someone here can see the root cause.

Here is the failing test code:

def test_calculate_crc8():
    # test start_value and first_call
    first_call = True
    crc = e2e.crc.CRC8_INITIAL_VALUE
    data = b"\x33\x22\x55\xAA\xBB\xCC\xDD\xEE\xFF"
    for i, _val in enumerate(data):
        crc = e2e.crc.calculate_crc8(data[i:i+1], start_value=crc, first_call=first_call)
        first_call = False
    assert 0xCB == crc

e2e.crc.calculate_crc8 is implemented in C:

static PyObject *
py_calculate_crc8(PyObject *module,
                  PyObject *args,
                  PyObject *kwargs)
{
    Py_buffer data;
    uint8_t start_value = CRC8_INITIAL_VALUE;
    bool first_call = true;
    static char *kwlist[] = {
        "data",
        "start_value",
        "first_call",
        NULL};

    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|Ip:calculate_crc8",
                                     kwlist, &data, &start_value, &first_call))
    {
        return NULL;
    }

    uint8_t crc = Crc_CalculateCRC8(data.buf, data.len, start_value, first_call);
    PyBuffer_Release(&data);

    return (PyLong_FromUnsignedLong(crc));
}

So the function py_calculate_crc8 is called with a bytes object of length 1. This object is parsed via format y* into the Py_buffer “data”. The segfault occurs when data.buf[0] is accessed, but only on Linux: all tests pass on Windows and macOS.

The full code is available here

MRAB · September 21, 2022, 10:22pm

start_value (parameter type “I”) should be unsigned int, not uint8_t (that’s too small).

first_call (parameter type “p”) should be int, not bool (I’m not sure of the size of bool, but it could be 1 byte, whereas Python expects an int, so probably too small).
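
A quick way to see the mismatch is to compare the sizes involved. The sketch below is standalone C and does not use the Python API; overrun_bytes and print_size_report are hypothetical helper names, and the exact numbers depend on the platform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes written past the end of a variable when the parser stores
 * `written` bytes into a variable that only holds `declared` bytes. */
size_t overrun_bytes(size_t written, size_t declared)
{
    return written > declared ? written - declared : 0;
}

/* Print the sizes relevant to the "I" and "p" format codes. */
void print_size_report(void)
{
    printf("\"I\" stores an unsigned int: %zu bytes, uint8_t holds %zu, overrun %zu\n",
           sizeof(unsigned int), sizeof(uint8_t),
           overrun_bytes(sizeof(unsigned int), sizeof(uint8_t)));
    printf("\"p\" stores an int: %zu bytes, bool holds %zu, overrun %zu\n",
           sizeof(int), sizeof(bool),
           overrun_bytes(sizeof(int), sizeof(bool)));
}
```

On common desktop platforms unsigned int and int are 4 bytes while uint8_t is 1 byte and bool is usually 1 byte, so each mismatched argument lets the parser write a few bytes past the variable. What lives in those bytes depends on the compiler's stack layout, which would be consistent with the crash showing up on one platform only.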

zariiii9003 · September 21, 2022, 10:46pm

Thank you for taking a look!

I actually tried “b” instead of “I” on the other git branch. I could try changing bool to something else tomorrow. I had assumed that PyArg_ParseTupleAndKeywords automatically casts to the correct size.

I also tried incrementing the ref count of “args” but that didn’t help either.

MRAB · September 22, 2022, 2:36am

You pass in pointers to variables and use the format argument to tell it what types the arguments are, and, therefore, how big the pointed-to variables are. If you tell it that it’s “I” (unsigned int) but it’s actually uint8_t, don’t be surprised if it writes something beyond the end of the variable!
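
One way to follow this advice, sketched under assumptions: let the parser fill variables whose types match the format codes (unsigned int for “I”, int for “p”), then narrow them explicitly. The helpers narrow_start_value and narrow_first_call are hypothetical names, not taken from the project, and the commented call only illustrates where they would sit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed use inside the extension function (illustration only):
 *
 *     unsigned int start_value_arg = CRC8_INITIAL_VALUE;  // matches "I"
 *     int first_call_arg = 1;                             // matches "p"
 *     if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|Ip:calculate_crc8",
 *                                      kwlist, &data,
 *                                      &start_value_arg, &first_call_arg))
 *         return NULL;
 *
 * Afterwards the wide values are narrowed with the helpers below. If
 * narrowing fails, the buffer must still be released before returning
 * an error.
 */

/* Hypothetical helper: narrow a value parsed with "I" to uint8_t.
 * Returns 0 on success, -1 if the value does not fit into 8 bits;
 * *out is left untouched on failure. */
int narrow_start_value(unsigned int parsed, uint8_t *out)
{
    if (parsed > UINT8_MAX) {
        return -1;
    }
    *out = (uint8_t)parsed;
    return 0;
}

/* Hypothetical helper: "p" stores 0 or 1 in an int; convert it to bool. */
bool narrow_first_call(int parsed)
{
    return parsed != 0;
}
```

The explicit range check is a design choice rather than a requirement: “I” converts without overflow checking, so a caller passing 300 would otherwise be truncated silently when the value is cast to uint8_t.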

Another issue I’ve only just realised: I’m not sure about the PyBuffer_Release(&data);. A buffer is being passed in, right? But whose responsibility is it to release that buffer? You or the caller? I’d probably assume it was the caller’s responsibility.

zariiii9003 · September 22, 2022, 4:26pm

You were absolutely correct about the format arguments. That solved the problem. Here is the current version. Thank you for your advice.

Regarding the PyBuffer_Release, the docs say:

However, when a Py_buffer structure gets filled, the underlying buffer is locked so that the caller can subsequently use the buffer even inside a Py_BEGIN_ALLOW_THREADS block without the risk of mutable data being resized or destroyed. As a result, you have to call PyBuffer_Release() after you have finished processing the data (or in any early abort case).
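
The “any early abort case” part of that quote can be sketched with a single exit path. The example below is standalone C with a mock in place of Py_buffer, so mock_view, mock_acquire, mock_release and checksum_or_error are all hypothetical stand-ins; in a real extension the release call would be PyBuffer_Release(&data).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a filled Py_buffer: `exports` points at a counter of
 * outstanding views that the owner of the bytes keeps. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    int *exports;
} mock_view;

/* Fill the view and mark the underlying bytes as in use. */
void mock_acquire(mock_view *view, const uint8_t *bytes, size_t len, int *exports)
{
    view->buf = bytes;
    view->len = len;
    view->exports = exports;
    ++*exports;
}

/* Give the view back; safe to call only once per acquire. */
void mock_release(mock_view *view)
{
    --*view->exports;
    view->buf = NULL;
    view->len = 0;
    view->exports = NULL;
}

/* XOR all bytes together. Returns -1 for empty input (the early abort
 * case); the view is released on every path through the function. */
int checksum_or_error(const uint8_t *bytes, size_t len, int *exports)
{
    mock_view view;
    uint8_t acc = 0;
    int result = -1;
    size_t i;

    mock_acquire(&view, bytes, len, exports);
    if (view.len == 0) {
        goto done;  /* early abort still reaches the release below */
    }
    for (i = 0; i < view.len; ++i) {
        acc ^= view.buf[i];
    }
    result = acc;

done:
    mock_release(&view);
    return result;
}
```

Routing both the success path and the error path through one label keeps the release in a single place, which makes it harder to forget when more checks are added later.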