How to write data to a C buffer passed in a callback function with ctypes?

mara004 · January 23, 2022, 10:17am

I am trying to set up a callback function for custom file reading access with a C library mapped to Python using ctypes.
The callback basically receives the position and number of bytes to read, plus an empty buffer (unsigned char *) that the data should be written into.

My question is: How can I get the read data (which are Python bytes) into the buffer passed with the callback?

So far, I’ve got the following code:

class _reader_class:
    
    def __init__(self, buffer):
        self.buffer = buffer
    
    def __call__(self, param, position, p_buf, size):
        
        self.buffer.seek(position)
        data = self.buffer.read(size)
        
        print(position, size, data)

        # TODO assign data to p_buf
        
        return 1

(A reader object is initialised using _reader_class(buffer), which is then wrapped with ctypes.CFUNCTYPE(...).)

I tried to use ctypes.memmove() as suggested in several forums:

c_string = ctypes.create_string_buffer(data)
ctypes.memmove(p_buf, c_string, size)

This appeared to work, kind of, but it looked very risky, and it did indeed turn out to cause segfaults.

How can I safely assign data to p_buf?

eryksun · January 23, 2022, 5:01pm

I would need the C declaration of the callback and the ctypes.CFUNCTYPE(...) definition to be certain of anything here.

In general, there’s nothing wrong with using memmove() if the source address, destination address, and size are all correct. One problem I see is that self.buffer.read(size) may read fewer than size bytes. Also, since the type of data must be bytes, you don’t need create_string_buffer(). It’s an unnecessary copy of the data. The call should be ctypes.memmove(p_buf, data, len(data)).
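A minimal sketch of the memmove() suggestion, assuming p_buf points to at least size writable bytes. ReaderCallback and the io.BytesIO stand-in are hypothetical; the usage simulates the C side with a ctypes array and calls the method directly. Note that the bytes object is passed straight to memmove() and only len(data) bytes are copied, since read() may return fewer than size bytes.

```python
import ctypes
import io


class ReaderCallback:
    """Hypothetical reader wrapping a seekable binary stream."""

    def __init__(self, stream):
        self.stream = stream

    def __call__(self, param, position, p_buf, size):
        self.stream.seek(position)
        data = self.stream.read(size)
        # read() may return fewer than `size` bytes, so copy only what we got.
        # A bytes object can be passed directly; no create_string_buffer() copy.
        ctypes.memmove(p_buf, data, len(data))
        return 1


# Usage: simulate the C-side destination buffer with a ctypes array.
dest = (ctypes.c_ubyte * 8)()
reader = ReaderCallback(io.BytesIO(b"hello, world"))
reader(None, 7, ctypes.cast(dest, ctypes.POINTER(ctypes.c_ubyte)), 5)
print(bytes(dest))  # b'world\x00\x00\x00'
```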

That said, it’s inefficient to read the data as a bytes object just to copy it to the destination buffer. If self.buffer supports the readinto() method, you can avoid the copy by creating a ctypes array that references the destination buffer. For example, if p_buf is the address of the destination buffer:

buf = (ctypes.c_char * size).from_address(p_buf)
self.buffer.seek(position)
return self.buffer.readinto(buf)

I assume the callback is supposed to return the number of bytes read.
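A sketch of the readinto() approach, assuming the buffer parameter is declared as ctypes.c_void_p, so the callback receives p_buf as a plain integer address (ctypes passes simple types like c_void_p to callbacks as native Python values). READ_FUNC and ReaderCallback are hypothetical names, and returning the byte count follows the assumption stated above. The usage calls the CFUNCTYPE object from Python, which goes through the same argument conversion a C caller would trigger.

```python
import ctypes
import io

# Assumption: the buffer argument is declared as c_void_p, so p_buf arrives
# in the callback as an int address (or None for a NULL pointer).
READ_FUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_void_p, ctypes.c_ulong, ctypes.c_void_p, ctypes.c_ulong
)


class ReaderCallback:
    def __init__(self, stream):
        self.stream = stream

    def __call__(self, param, position, p_buf, size):
        # View the destination memory as a char array without copying.
        buf = (ctypes.c_char * size).from_address(p_buf)
        self.stream.seek(position)
        # readinto() writes directly into the C buffer and returns the count.
        return self.stream.readinto(buf)


reader = ReaderCallback(io.BytesIO(b"0123456789"))
callback = READ_FUNC(reader)  # keep this referenced while C code may call it

dest = ctypes.create_string_buffer(4)
n = callback(None, 3, ctypes.addressof(dest), 4)
print(n, dest.raw)  # 4 b'3456'
```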

mara004 · January 24, 2022, 11:24am

Thanks very much for the response. I’ll try this soon.

The functype definition is

ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(None), ctypes.c_ulong, ctypes.POINTER(ctypes.c_ubyte), ctypes.c_ulong)

The C declaration looks like this:

int(* m_GetBlock )(void *param, unsigned long position, unsigned char *pBuf, unsigned long size)

According to the documentation, the callback is not supposed to return the number of bytes read, only a non-zero value to indicate success.
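A sketch wiring a reader to the exact CFUNCTYPE posted above, returning non-zero on success as the documentation requires. GetBlockReader is hypothetical, and returning 0 on any exception is an assumed failure policy. The comment about keeping the callback object referenced reflects a general ctypes caveat from its documentation, not something diagnosed in this thread.

```python
import ctypes
import io

# Matches: int (*m_GetBlock)(void *param, unsigned long position,
#                            unsigned char *pBuf, unsigned long size)
GET_BLOCK = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(None),  # same as ctypes.c_void_p
    ctypes.c_ulong,
    ctypes.POINTER(ctypes.c_ubyte),
    ctypes.c_ulong,
)


class GetBlockReader:
    """Hypothetical wrapper: returns 1 on success, 0 on failure."""

    def __init__(self, stream):
        self.stream = stream

    def __call__(self, param, position, p_buf, size):
        try:
            self.stream.seek(position)
            data = self.stream.read(size)
            ctypes.memmove(p_buf, data, len(data))
        except Exception:
            # ctypes only prints an exception raised in a callback;
            # return an explicit failure status instead.
            return 0
        return 1


reader = GetBlockReader(io.BytesIO(b"abcdef"))
# Keep `get_block` referenced for as long as the C library may call it;
# ctypes does not keep it alive, and a collected callback crashes on use.
get_block = GET_BLOCK(reader)

dest = (ctypes.c_ubyte * 3)()
status = get_block(None, 2, ctypes.cast(dest, ctypes.POINTER(ctypes.c_ubyte)), 3)
print(status, bytes(dest))  # 1 b'cde'
```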

mara004 · January 24, 2022, 11:40am

Okay, so now I tried the following:

    def __call__(self, param, position, p_buf, size):
        
        self.buffer.seek(position)
        data = self.buffer.read(size)
        assert len(data) == size
        
        print(position, size)
        #print(data)
        
        ctypes.memmove(p_buf, data, len(data))
        
        return 1

This works for small examples, but causes segfaults when I run my test suite.

Then I tried this, but it doesn’t work either; it behaves as if the data never actually arrives on the C side:

    def __call__(self, param, position, p_buf, size):
        
        print(position, size)
        
        buf = (ctypes.c_char * size).from_address( ctypes.addressof(p_buf) )
        self.buffer.seek(position)
        self.buffer.readinto(buf)
        
        return 1

eryksun · January 24, 2022, 2:18pm

That’s unexpected. Maybe there aren’t always size bytes available from position to the end of self.buffer. Or maybe it’s text data, and the library zeroes the buffer beforehand so that it can handle the result as a null-terminated string. In the short-read case, though, you’d see an error from the assertion you added, assuming you’re running a debug build (i.e. __debug__ is true) instead of a release build. (Python calls the latter an ‘optimized’ build. This mixed-up terminology is confusing: optimization is separate from compiling the debug or release version of a program.)
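Since assert statements are stripped when __debug__ is false (python -O), a short read can also be handled explicitly. A minimal sketch, assuming p_buf points to at least size writable bytes; fill_block is a hypothetical helper, and zero-filling the tail while returning 0 on a short read is an assumed policy, not something the library's documentation is known to require.

```python
import ctypes
import io


def fill_block(stream, position, p_buf, size):
    """Hypothetical helper: copy up to `size` bytes from `stream` into p_buf.

    Any bytes past the end of the available data are zero-filled. Returns 1
    if the whole block was read and 0 on a short read (assumed policy).
    """
    stream.seek(position)
    data = stream.read(size)
    ctypes.memmove(p_buf, data, len(data))
    if len(data) < size:
        # Address of the first destination byte that received no data.
        tail = ctypes.cast(p_buf, ctypes.c_void_p).value + len(data)
        ctypes.memset(tail, 0, size - len(data))
        return 0
    return 1


# Usage: a 6-byte destination pre-filled with b"x", and only 2 bytes left
# in the stream from position 1.
dest = (ctypes.c_ubyte * 6)(*b"xxxxxx")
p_dest = ctypes.cast(dest, ctypes.POINTER(ctypes.c_ubyte))
status = fill_block(io.BytesIO(b"abc"), 1, p_dest, 4)
print(status, bytes(dest))  # 0 b'bc\x00\x00xx'
```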

I don’t see a reason for the segfault due to ctypes.memmove(p_buf, data, len(data)). Enabling the faulthandler module (i.e. -X faulthandler) might help narrow it down, but you’ll probably need to use a native debugger to diagnose the problem, such as gdb in Linux or WinDbg in Windows.
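The faulthandler module can also be enabled from code instead of the command line (setting the PYTHONFAULTHANDLER environment variable works too). A minimal sketch; it should run early, before the C library starts invoking callbacks.

```python
import faulthandler

# Same effect as `python -X faulthandler script.py`: on a fatal error such
# as SIGSEGV, dump the Python traceback of all threads to stderr.
faulthandler.enable()
print(faulthandler.is_enabled())  # True
```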

For the case with ctypes.addressof(), I had stipulated that p_buf was the address of the buffer, but I hadn’t seen your ctypes definition yet. Since p_buf is an instance of ctypes.POINTER(ctypes.c_ubyte), the address of the buffer is ctypes.addressof(p_buf.contents). You mistakenly used the address of the pointer itself, ctypes.addressof(p_buf).
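A sketch of the corrected readinto() callback using ctypes.addressof(p_buf.contents) with the POINTER(ctypes.c_ubyte) argument type. GetBlockReader is hypothetical, and returning 1 only when the full block was read is an assumed policy. ctypes.cast(p_buf, ctypes.c_void_p).value yields the same address and avoids the ValueError that .contents raises for a NULL pointer.

```python
import ctypes
import io

GET_BLOCK = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_void_p, ctypes.c_ulong,
    ctypes.POINTER(ctypes.c_ubyte), ctypes.c_ulong,
)


class GetBlockReader:
    def __init__(self, stream):
        self.stream = stream

    def __call__(self, param, position, p_buf, size):
        # p_buf is a POINTER(c_ubyte) instance. addressof(p_buf) is where the
        # pointer value itself is stored; the target buffer starts at
        # addressof(p_buf.contents).
        address = ctypes.addressof(p_buf.contents)
        buf = (ctypes.c_char * size).from_address(address)
        self.stream.seek(position)
        n = self.stream.readinto(buf)
        # Assumed policy: report success only if the whole block was read.
        return 1 if n == size else 0


reader = GetBlockReader(io.BytesIO(b"ABCDEFGH"))
get_block = GET_BLOCK(reader)  # keep referenced while C code may call it

dest = (ctypes.c_ubyte * 4)()
status = get_block(None, 4, ctypes.cast(dest, ctypes.POINTER(ctypes.c_ubyte)), 4)
print(status, bytes(dest))  # 1 b'EFGH'
```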