Add PyBytesWriter public C API

encukou · March 11, 2025, 11:02am

The PR doesn’t quite explain what the buf argument is. That can change of course, but as I said, I find it rather hard to explain.
I’m not convinced that’s worth the performance hit for loops that write a byte at a time.

pitrou · March 11, 2025, 11:29am

My preference is for passing the length explicitly when desired, not a pointer.

For example:

PyAPI_FUNC(PyObject*) PyBytesWriter_Finish(
    PyBytesWriter *writer);
PyAPI_FUNC(PyObject*) PyBytesWriter_FinishWithSize(
    PyBytesWriter *writer, Py_ssize_t size);

We’ve had similar APIs for years in Arrow C++ and they’ve served us well.

(note: in most cases you would let the PyBytesWriter track the length by itself)

malemburg · March 11, 2025, 11:51am

I understand that you want to create an API with great writing performance, but also agree with @steve.dower that we should not create a second approach to the writer concept in Python.

Here’s a version of your example using a new PyBytesWriter_WriteFillByte() which mimics memset().

static PyObject *
byteswriter_center_example(Py_ssize_t spaces, char *str, Py_ssize_t str_size)
{
        PyBytesWriter *writer = PyBytesWriter_Create(spaces * 2);

        /* Write left spaces */
        PyBytesWriter_WriteFillByte(writer, ' ', spaces);

        /* Make room for str */
        if (PyBytesWriter_Extend(writer, str_size)) {
            goto error;
        }

        /* Write str */
        PyBytesWriter_WriteBytes(writer, str, str_size);

        /* Write right spaces */
        PyBytesWriter_WriteFillByte(writer, ' ', spaces);
 
        return PyBytesWriter_Finish(writer);

error:
        PyBytesWriter_Discard(writer);
        return NULL;
}

This doesn’t need a buf variable at all and will still be fast.

It may also be useful to have an iterator style interface:

PyBytesWriter_WriteBytesFromIterator(writer, iter)

which takes an iterator and reads all data from the iterator into the bytes object.

A standard Python iterator would work, but may be too inefficient at the C level. Perhaps we could have a simple C level iterator API for such things or just a callback that returns a buffer and a size to copy, setting the buffer to NULL when its done (or using an integer return value for this).

vstinner · March 11, 2025, 12:49pm

My example was the most simple example possible, to make it short and easy to understand. Your proposed API seems to be too specific to be usable in the general case.

In C, passing a callback or an “iterator” is very limited. See the implementation of bytes[start:end:step]:

            source_buf = PyBytes_AS_STRING(self);
            result = PyBytes_FromStringAndSize(NULL, slicelength);
            if (result == NULL)
                return NULL;

            result_buf = PyBytes_AS_STRING(result);
            for (cur = start, i = 0; i < slicelength;
                 cur += step, i++) {
                result_buf[i] = source_buf[cur];
            }

            return result;

In C, there is no easy way to “pass an iterator” or “pass a generator” to replace this for loop.

That’s why my API provides a buf pointer that you can use to write directly into the buffer. It’s the most generic way to support any use case.

malemburg · March 11, 2025, 2:26pm

I’m not sure I follow. The API I was sketching closely follows that of the PyUnicodeWriter in that you provide functions to write various things to the buffer, without actually exposing the buffer.

I am aware that C doesn’t have the concept of an iterator. I was actually talking about passing in a Python iterator and using this to fetch the data, or, as an alternative at the C level, a callback which gets called by the PyBytesWriter_WriteBytesFromIterator(writer, iter) multiple times until it returns a completed state (could set the buffer to NULL or return e.g. integer 1 to indicate that it’s done). State of the C level callback would have to be managed by the caller.

Here’s your bytes example using the suggested alternative API without buf:

            source_buf = PyBytes_AS_STRING(self);
            PyBytesWriter *writer = PyBytesWriter_Create(slicelength);
            if (writer == NULL)
                return NULL;

            /* write self[start:start+slicelength] */
            PyBytesWriter_WriteBytes(writer, source_buf + start, slicelength);

            return PyBytesWriter_Finish(writer);

The use case of having step != 1 is fairly limited, but if you want to capture that as well, we could add e.g. PyBytesWriter_WriteBytesFromSlice(writer, src, start, stop, step).

pitrou · March 11, 2025, 2:33pm

Why do you want to replace it, though? The equivalent code with PyBytesWriter would not be shorter, and it would not be faster either.

malemburg · March 11, 2025, 3:01pm

To make managing state easier for the callback, it’s probably better to add a state pointer to the call:

PyBytesWriter_WriteBytesFromIterator(writer, c_callback, c_state)

The writer would then call c_callback(&buffer, &size, c_state) until buffer gets set to NULL. Internally, it would monitor the available space in the writer and extend this as necessary before copying the next chunk.

Such an API would make it easy to e.g. read from databases, files or streams directly into a writer, avoiding boilerplate code.

OTOH, the already sketched API could also be used directly by the caller, but then extending the writer buffer and error handling would have to be done for each such use case.

Anyway, this is just a side idea.

The main point I wanted to make is that you don’t need to expose the internal buffer in order to have a well performing API - after all, the whole point of these writers is to hide away the implementation details and possible side effects (e.g. the bytes implementation may decide to move the internal buffer somewhere else or resize it between calls).

malemburg · March 12, 2025, 7:53pm

This would be Victor’s API without the *buf parameter and adding the new slice and fill byte padding API:

typedef struct PyBytesWriter PyBytesWriter;

PyAPI_FUNC(void*) PyBytesWriter_Create(
    PyBytesWriter **writer,
    Py_ssize_t alloc);
PyAPI_FUNC(Py_ssize_t) PyBytesWriter_GetRemaining(
    PyBytesWriter *writer);
PyAPI_FUNC(void*) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    Py_ssize_t extend);
PyAPI_FUNC(void) PyBytesWriter_Discard(
    PyBytesWriter *writer);
PyAPI_FUNC(PyObject*) PyBytesWriter_Finish(
    PyBytesWriter *writer);

PyAPI_FUNC(int) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    const void *bytes,
    Py_ssize_t size);
PyAPI_FUNC(int) PyBytesWriter_WriteFillByte(
    PyBytesWriter *writer,
    const void *byte,
    Py_ssize_t size);
PyAPI_FUNC(int) PyBytesWriter_WriteBytesFromSlice(
    PyBytesWriter *writer,
    const void *src,
    Py_ssize_t start,
    Py_ssize_t stop,
    Py_ssize_t step);
PyAPI_FUNC(int) PyBytesWriter_Format(
    PyBytesWriter *writer,
    const char *format,
    ...);

For cases where these are not enough, we could even add special APIs to get access to the current internal char *buf, but with all the strings attached, e.g.

a resize may render the pointer invalid
we’d have to document which of the other APIs may render the pointer invalid (probably most to stay flexible)
buffer overflows are easily possible
parts of the buffer can remain uninitialized

/* Return a pointer to the current position of writer, permitting the
   caller to write size bytes at that position. The writer position will
   advance by size bytes after this call. Calls to other writer APIs
   will render the returned pointer invalid. If the caller fails to
   write new data to the claimed buffer area, the contents will remain
   undefined. Use at your own risk. */
PyAPI_FUNC(void*) PyBytesWriter_WriteBuffer(
    PyBytesWriter *writer,
    Py_ssize_t size);

malemburg · March 15, 2025, 2:05pm

I haven’t received any replies to the proposed alternative API and Victor has closed his PR.

I still think that should have a writer for bytes as well, so would like to get some feedback on the proposal. Thanks.

vstinner · March 15, 2025, 6:45pm

I didn’t give up on the idea. But it takes a lot of time to design and implement such API. I may give a new try later.

vstinner · March 18, 2025, 5:26pm

Ok, here is a counterproposal API.

API

typedef struct PyBytesWriter PyBytesWriter;

PyAPI_FUNC(PyBytesWriter *) PyBytesWriter_Create(
    Py_ssize_t prealloc);
PyAPI_FUNC(PyObject*) PyBytesWriter_Finish(
    PyBytesWriter *writer);
PyAPI_FUNC(void) PyBytesWriter_Discard(
    PyBytesWriter *writer);

// High level methods
PyAPI_FUNC(int) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    const void *bytes,
    Py_ssize_t size);
PyAPI_FUNC(void*) PyBytesWriter_Format(
    PyBytesWriter *writer,
    const char *format,
    ...);

// Alloc/Extend API using void*
PyAPI_FUNC(void*) PyBytesWriter_Alloc(
    PyBytesWriter *writer,
    Py_ssize_t alloc);
PyAPI_FUNC(void*) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    void *buf,
    Py_ssize_t extend);
PyAPI_FUNC(int) PyBytesWriter_Truncate(
    PyBytesWriter *writer,
    void *buf);

Changes: Now only Alloc/Extend/Truncate() functions use void*; Finish() has a single writer parameter.

PyBytesWriter_Extend() extends bytes allocated by PyBytesWriter_Alloc() and returns an updated buffer to the current position. It takes a buffer (void*) argument which is used as the current position.

PyBytesWriter_Truncate() sets the writer size from a buffer (void*). It avoids having to store the buffer start which doesn’t work with PyBytesWriter_Extend(): if you store PyBytesWriter_Alloc() result as the start, PyBytesWriter_Extend() can move the start (call realloc()) making the previous start a dangling pointer (!). So well, use PyBytesWriter_Truncate(), it just works as expected (see examples below).

`Truncate()` example

PyBytesWriter_Truncate() example, creating the string "Hello World". Allocate too many bytes, and then truncate the string:

PyObject*
truncate_example(void)
{
    PyBytesWriter *writer = PyBytesWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }

    // Allocate too many bytes
    char *buf = PyBytesWriter_Alloc(writer, 20);
    if (buf == NULL) {
        goto error;
    }

    // Write some bytes
    memcpy(buf, "Hello World", strlen("Hello World"));
    buf += strlen("Hello World");

    // Truncate to len("Hello World") bytes
    if (PyBytesWriter_Truncate(writer, buf) < 0) {
        goto error;
    }
    return PyBytesWriter_Finish(writer);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}

`Extend()` example

PyBytesWriter_Extend() example, creating the “Hello World” string. PyBytesWriter_Extend() adds bytes to PyBytesWriter_Alloc(): it takes a void* buffer.

PyObject*
extend_example(void)
{
    PyBytesWriter *writer = PyBytesWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }

    // Allocate 10 bytes
    char *buf = PyBytesWriter_Alloc(writer, 10);
    if (buf == NULL) {
        goto error;
    }

    // Write some bytes
    memcpy(buf, "Hello ", strlen("Hello "));
    buf += strlen("Hello ");

    // Allocate 10 more bytes
    buf = PyBytesWriter_Extend(writer, buf, 10);
    if (buf == NULL) {
        goto error;
    }

    // Write more bytes
    memcpy(buf, "World", strlen("World"));
    buf += strlen("World");

    // Truncate to len("Hello World") bytes
    if (PyBytesWriter_Truncate(writer, buf) < 0) {
        goto error;
    }

    return PyBytesWriter_Finish(writer);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}

`Alloc()` example

When the output length is known in advance, the code can be simpler. Example creating the string "abc":

PyObject*
alloc_example(void)
{
    PyBytesWriter *writer = PyBytesWriter_Create(3);
    if (writer == NULL) {
        return NULL;
    }
    char *buf = PyBytesWriter_Alloc(writer, 3);
    if (buf == NULL) {
        PyBytesWriter_Discard(writer);
        return NULL;
    }
    memcpy(buf, "abc", 3);
    return PyBytesWriter_Finish(writer);
}

pitrou · March 18, 2025, 5:45pm

It looks as confusing and weird as the previous proposal. I don’t understand why you insist so much on passing pointers instead of sizes.

MRAB · March 18, 2025, 6:07pm

What happens if PyBytesWriter_Finish is unable to create the bytes instance?

Presumably it would return NULL, but would the PyBytesWriter instance still exist, or would it always be discarded?

MRAB · March 18, 2025, 6:21pm

In the Extend() example, PyBytesWriter_Alloc returns a pointer to the buffer.

You can write data into the buffer, advancing the pointer as you go.

You pass the final pointer into PyBytesWriter_Truncate, telling it “this is how far I’ve got in the buffer”.

You don’t have to keep track yourself how much you’ve written or remember where you started in order to calculate that.

pitrou · March 18, 2025, 6:36pm

If it’s important “not to keep track” of those things, then PyBytesWriter can expose API functions to query them. They can be implemented as inline functions if you’re worried about performance.

PyByteArray_Resize doesn’t return a pointer to the newly-allocated buffer. Nobody is complaining that they have to “keep track” of the length and pointer, because you can just call the corresponding functions/macros for that (such as PyByteArray_AsString).

vstinner · March 19, 2025, 9:52am

The alternative to Truncate() is to have a SetSize() function. For example, replace:

    if (PyBytesWriter_Truncate(writer, ascii_data) < 0) {
        goto error;
    }

with:

    if (PyBytesWriter_SetSize(writer, ascii_data - ascii_start) < 0) {
        goto error;
    }

In the simple Alloc() case, ascii_start can be created like that:

    unsigned char *ascii_start = PyBytesWriter_Alloc(writer, out_len);
    if (ascii_start == NULL) {
        goto error;
    }
    unsigned char *ascii_data = ascii_start;

The problem is more the more complex Extend() case. Extend() doesn’t return the start pointer, but the current position. It means that a new function should be added. Something like:

    char *start = PyBytesWriter_Start(writer);
    if (PyBytesWriter_SetSize(writer, ascii_data - start) < 0) {
        goto error;
    }

All code creating bytes objects in the Python code base use a pointer to write to the current position. PyBytesWriter API is designed around this pointer.

For me, it seems more convenient, easy and less error-prone to have a Truncate() function which takes a pointer to the current position.

In practice, code using the PyBytesWriter API is splitted into sub-functions. It’s convenient to only have to pass a pointer to track the current position, without keeping track of the written size.

So you would prefer a PyBytesWriter_Start() + PyBytesWriter_SetSize() combination? Well, I don’t like that

Note: I would prefer to hide implementation details to be able to maybe add the API to the stable ABI later. So I would prefer to not use inline functions.

vstinner · March 19, 2025, 9:54am

On error, PyBytesWriter_Finish() returns NULL. In any case, you must not use the writer after calling PyBytesWriter_Finish(): it’s always discarded. Documentation:

.. c:function:: PyObject* PyBytesWriter_Finish(PyBytesWriter *writer, void *buf)

   Finish a :c:type:`PyBytesWriter` created by :c:func:`PyBytesWriter_Create`.

   On success, return a Python :class:`bytes` object.
   On error, set an exception and return ``NULL``.

   The writer instance is invalid after the call.

pitrou · March 19, 2025, 10:15am

No, you just need a FinishWithSize function.

Well, that’s your API choice.

malemburg · March 19, 2025, 12:35pm

Thanks for the updated proposal, but I still don’t understand why you insist on exposing the underlying buffer.

Just like for the Unicode writer, it should be enough to provide fast APIs for writing to the internal buffer and we can add more if needed. There should be no need to expose the internal buffer directly.

That way, we can make changes to the internals as needed in the future, e.g. a bytes implementation which applies extra locking for the buffer, to be able to write to it from multiple threads (or even interpreters).

The PyBytesWriter_WriteBuffer() I have in my proposal really is “use at your own risk” and I would rather not have it at all.

You might say: In some cases, I may need to revert a write and go back to a previous position of the writer. We could add special APIs to cover that use case as well, e.g. PyBytesWriter_Tell()and PyBytesWriter_Seek(new_position) (or perhaps just a dedicated PyBytesWriter_MoveBack(number_of_bytes)).

steve.dower · March 19, 2025, 1:51pm

I quite like having access to the underlying buffer, but I tend to wrap up a lot of OS APIs that require a call to get the buffer size, which I then allocate and pass back into a subsequent call to have it be filled.

I’ve actually taken to doing these with C++ strings (or bytearray if I’m in Cython) because it’s so easy and reliable compared to handling the allocation manually. But since they then usually get moved into a bytes object, being able to use this API to allocate an in-place buffer for me would be convenient as well as efficient.

I believe the point of this API is to optimise the creation case, when you know that no references exist outside of the local function/context/whatever. Just like was the case for the Unicode writer. If that’s not explicit, I think we should say that it’s explicit and just scope ourselves out of multithreaded writers for this API.

Add PyBytesWriter public C API

API

Truncate() example

Extend() example

Alloc() example

`Truncate()` example

`Extend()` example

`Alloc()` example