Add PyBytesWriter public C API

So this (having other APIs write into the allocated buffer) would be a use case where PyBytesWriter_WriteBuffer() would make sense. Fair enough.

I still don’t think that we should make this the default way of accessing the internal bytes buffer; we should rather provide a safe writer interface. PyBytesWriter_WriteBuffer() could then later also be changed to provide access to a separate buffer which gets copied into the actual buffer, if we ever want to make changes to the bytes internals.

We have the Unicode writer API, because of the complexities around the internals of the implementation (having three different internal ways of storing the data).

We don’t have such complications with the bytes type at the moment, but we may still want more control over the internals to implement optimizations or protections in the future, or to be able to use the writer API for bytes subclasses implementing these.

FWIW: I don’t see much point in having a writer API for bytes at all, if direct writing to the internal buffer is still deemed the best way to fill a bytes object during its creation. We can just continue the approach we’ve been using for 30+ years without new APIs - perhaps make the resize API public and that’s it.

I get your overall objection, and largely agree with it. But I’ll try and clarify one point:

It’s really a design principle that we’re trying to enforce via the API, which is that a bytes object simply has no way to mutate it, even from C. (There are a few other places we’re looking at similar things, like frozen types, ints, and Unicode, as well as arguably in thread states and initialization.)

So this API is really about providing a “bytes object that isn’t a real bytes object yet” so that it can’t escape, but it can be mutated. Then when you finish it, it can’t be mutated anymore, but it can escape.

We can achieve the same thing by saying “even though we let you, you shouldn’t mutate a bytes object if it might be accessible by someone else”, but that’s hard to enforce or detect, and may interfere with things like automatic garbage collection. So adding an additional explicit step where you convert it from “setting it up” into “ready for normal use” is to make it clear to the developer what it can be used for at any point.

I think from that perspective, it makes sense, and we ought to do as little hand-holding as possible on the writer side. Frankly, I’d settle here for an API that basically looks like malloc followed by writing followed by PyBytesWriter_Finish (made efficient because the “malloc”-like function allocates an entire bytes object that’ll be ready to use, and Finish returns the real pointer to it instead of the pointer to the data).
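A minimal self-contained sketch of that “malloc, write, finish” shape (toy names and a toy struct, purely for illustration; the real CPython API and object layout differ): the “malloc”-like step allocates the whole object up front and hands out only the data area, and the finish step recovers the object pointer from the data pointer without any copying.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; real CPython names and layout differ. */
typedef struct {
    size_t size;
    char data[];   /* payload follows the header, like in PyBytesObject */
} ToyBytes;

/* "malloc"-like step: allocate the entire object, return the writable data area. */
static char *toy_writer_alloc(size_t size) {
    ToyBytes *obj = malloc(sizeof(ToyBytes) + size);
    if (obj == NULL) {
        return NULL;
    }
    obj->size = size;
    return obj->data;
}

/* "finish" step: no copy, just recover the object pointer from the data pointer. */
static ToyBytes *toy_writer_finish(char *data) {
    return (ToyBytes *)(data - offsetof(ToyBytes, data));
}
```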

I think we should continue to use the principle that we say (and document) what C API users can do and then leave it at that. It’s worked fine for 30+ years. I don’t see much reason to change it.

C programmers can still get the internal buffer from PyBytes_AsString() and write to it if they want to (with all the strings attached) - even with a writer API. There isn’t much point in adding layers of protection when it’s easy to work around them.

AFAIK, the motivation for a bytes writer API is to not have to make the often used resize API public. But really, with the above in mind, do we need this, if it doesn’t provide some additional benefit in the form of possible future optimizations?

We might just as well make _PyBytes_Resize() public and accept the fact that this is how things have been done in the past three decades.

FWIW, I don’t see a problem with unfinished bytes object “escaping” into Python. Programmers won’t let it escape prior to making sure that the content is actually what they want it to be.

I’d also like to learn what finalization would actually do to a bytes object. After all, if the object has been created, but not yet passed back into the Python interpreter, the only part which may already know about it is the garbage collector. Now, collecting uninitialized bytes objects would still work, since the GC doesn’t really care about the contents of the object.

And the resize API is smart enough to deal with the GC (and the ref tracer and ref counter), because it internally properly destroys the old object and builds a new one, so no issue there either.

The existing bytes writer finish API basically builds the bytes object and does a final resize. There are no other operations happening beyond this, so my guess is that finalization is not needed for bytes objects to make them usable outside the creating function.

Pretty sure the GC wouldn’t ever learn about bytes objects, because they can’t hold references to other objects, but I could be wrong there.

In any case, finalization right now probably doesn’t do much more than a typecast (return (PyBytesObject*)((char*)writer - offsetof(PyBytesObject, buf))[1]). As I tried to explain, it’s about the expression of what it’s doing, not the reality.

The explicit finalization step could let us do interning, as it invalidates the incoming pointer and returns a “new” one, which isn’t possible with the current API.

It can also completely bypass most locks that are needed for a “real” object, since we know by design that the writer isn’t being used anywhere else.
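For instance (a toy sketch with made-up names, not actual CPython behaviour), a finish step that consumes the writer and returns a possibly different pointer is free to substitute an interned singleton for common values, such as the empty bytes:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t size;
    char data[8];             /* fixed payload to keep the toy simple */
} ToyBytes;

/* One shared canonical empty object, as interning could provide. */
static ToyBytes toy_empty = { 0, {0} };

/* Hypothetical finish: because it invalidates the incoming pointer and
 * returns a "new" one, it may hand back an interned object instead. */
static ToyBytes *toy_finish(ToyBytes *writer, size_t final_size) {
    if (final_size == 0) {
        free(writer);         /* discard the scratch object */
        return &toy_empty;    /* interned singleton */
    }
    writer->size = final_size;
    return writer;
}
```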

Basically, it’s just an overall more flexible design for us that allows us to change more about the implementation, including things like making PyBytes_AsString one day return a const pointer, which would then allow things like referencing static memory as the contents of bytes objects.

And yes, people can break anything. Given the choice between guardrails/warnings and recommended paths, we’re actively pushing towards providing good APIs that lead users into good practices. The “obvious” use of the APIs should land users in the most reliable code, whereas most of the existing API does not work like that. New users approaching it are at a huge disadvantage until they learn all the edge cases and oddities, but the new APIs are way more predictable in their behaviour (as well as preserving our ability to innovate - both aspects are important, and complex to manage).


  1. I made up some names here, hopefully it makes the point. ↩︎

That sounds like a reasonable goal, but I’m missing consistency in the general approach:

On one hand, some core devs complain that the C API is too large and try to trim it down, the argument often being maintenance, making it easier for other Python implementations to emulate, or having a better perspective for changing things.

On the other hand, we are creating many new APIs which don’t have much additional value (for either us or the extension writers), other than promoting good practice. No one asked for these APIs. The size of the C API increases, but apparently this is still fine.

Yet, when extension writers really need other APIs and often find the functionality in private APIs, there’s quite a bit of pushback against making those public and extending the C API.

I have always been a proponent of a rich Python C API. It’s one of the reasons why the Unicode C API was so complete.

I don’t have anything against adding new APIs to promote best practices and if people think that exposing buffers during bytes creation is good practice, that’s fine with me as well, but then please also make it easier to give extension writers access to many of the private APIs we have. [1]

After all, we created those private APIs for a reason in CPython: because they are indeed needed for implementing the stdlib. That should be proof enough that those APIs should be part of the public Python C API.

This would make things consistent again.

Anyway, I’ve made my point, and thanks for reading this far :-) I’ll step out of this discussion.


  1. The _PyBytes_Resize() (or in Python2 string) resize C API is one of those APIs which should have been public for ages. ↩︎

The problem is that the private APIs typically leak enough details that they constrain our ability to change anything. And that’s usually why they were private in the first place. Making them public actively works against our ability to innovate the runtime.

The new APIs do increase the overall surface area, but they usually don’t increase the abstraction leakage (e.g. we were very careful with the zero-copy int export API to preserve our ability to change the underlying structure, whereas most private APIs would prevent this entirely if we were to guarantee their behaviour).

So what looks like a conflict between the two approaches is justified once this additional argument is considered.

It’s also helpful to look at this as something like a 10-15 year transition. We’d have loved to do it quicker, but short of a massively breaking release with no/limited opportunity to learn from the changes (sound familiar?), it’s really not possible or fair to do it.

So counting additions against very long deprecations, we ought to end up with only a slightly larger overall API, and only because we are adding access to functionality that’s currently not in the public API (as requested).

I agree with @vstinner and @steve.dower that access to the underlying buffer is important in at least some cases.

Again, I would bring up our equivalent API in Arrow C++, which also gives access to the underlying buffer for whoever needs it (but also provides append-like methods for people who do not need such flexibility). Such a hybrid API is important in practice, for example when dealing with third-party APIs that want to write into an existing memory area.

Arrow C++ deals quite heavily with large data buffers. This API is getting a lot of use and has served us well for efficiently building up data. Of course, CPython is not Arrow but for PyBytesWriter we’re talking about very similar concerns.

My disagreement here with @vstinner is only about the particular API details, not the general idea of exposing the underlying buffer.

Would you mind proposing a C API which would fit your needs and updating (some of) my examples with your proposed C API? I don’t see how the C++ Arrow API converted to C would be more convenient.

I created a new draft PR with my latest proposed C API (Alloc/Extend). Please look at the modified code to see how using a pointer for the current position and for writing fits existing code well and is efficient.

Ok. I think that snippets like:

    PyBytesWriter *writer = PyBytesWriter_Create(soself->s_size);
    if (writer == NULL) {
        return NULL;
    }
    char *buf = PyBytesWriter_Alloc(writer, soself->s_size);
    if (buf == NULL) {
        PyBytesWriter_Discard(writer);
        return NULL;
    }

can easily be replaced with:

    PyBytesWriter *writer = PyBytesWriter_Create(soself->s_size);
    if (writer == NULL) {
        return NULL;
    }
    char *buf = PyBytesWriter_GetData(writer);

Similarly, this snippet:

    if (PyBytesWriter_Truncate(writer, bin_data) < 0) {
        goto error_end;
    }
    return PyBytesWriter_Finish(writer);

can easily be replaced with:

    Py_ssize_t final_size = bin_data - PyBytesWriter_GetData(writer);
    return PyBytesWriter_FinishWithSize(writer, final_size);

or even perhaps:

    return PyBytesWriter_FinishWithEndPointer(writer, bin_data);

I also think this is confusing:

    p = PyBytesWriter_Extend(writer, p, 10-1);

for two reasons:

  1. “extend” makes me think of list.extend
  2. you’re passing 10-1 which is the additional total size, not the number of bytes that will be available after p: we actually want 10 bytes after p.

(that said, raw_unicode_escape should perhaps be rewritten to first compute the exact required size, then write directly into the final bytes object)
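To illustrate that two-pass idea (with a simplified stand-in escape scheme, not the real raw_unicode_escape logic): pass 1 computes the exact output size, pass 2 writes into a buffer of exactly that size, so no resizing or truncation is ever needed.

```c
#include <stddef.h>

/* Pass 1: compute the exact escaped size (ASCII kept, others become "\xNN"). */
static size_t escaped_size(const unsigned char *s, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        out += (s[i] < 0x80) ? 1 : 4;   /* "\xNN" takes 4 bytes */
    }
    return out;
}

/* Pass 2: write into a destination of exactly escaped_size() bytes. */
static void escape_into(const unsigned char *s, size_t n, char *dst) {
    static const char hex[] = "0123456789abcdef";
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0x80) {
            *dst++ = (char)s[i];
        }
        else {
            *dst++ = '\\';
            *dst++ = 'x';
            *dst++ = hex[s[i] >> 4];
            *dst++ = hex[s[i] & 15];
        }
    }
}
```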


Hi Antoine,

I tried to implement the API that you described: draft PR using size. In this API, int PyBytesWriter_Resize(PyBytesWriter *writer, Py_ssize_t alloc) takes an “absolute” size, rather than a “relative” size.

The simple case is when we allocate too many bytes, and then truncate the buffer in Finish(). binascii falls into this category for example. This case doesn’t cause any API issue, the code is clean.

The more complex case is when we allocate some bytes, and then depending on the input, we allocate more data while processing the input. pickle and PyBytes_FromFormatV() are examples of the complex case.

Example from pickle:

    if (ch >= 0x10000) {
        /* -1: subtract 1 preallocated byte */
        alloc += 10-1;
        Py_ssize_t pos = p - (char*)PyBytesWriter_Data(writer);
        if (PyBytesWriter_SetSize(writer, pos) < 0) {
            goto error;
        }
        if (PyBytesWriter_Resize(writer, alloc) < 0) {
            goto error;
        }
        p = PyBytesWriter_Data(writer) + pos;

        *p++ = '\\';
        *p++ = 'U';
        *p++ = Py_hexdigits[(ch >> 28) & 0xf];
        *p++ = Py_hexdigits[(ch >> 24) & 0xf];
        *p++ = Py_hexdigits[(ch >> 20) & 0xf];
        *p++ = Py_hexdigits[(ch >> 16) & 0xf];
        *p++ = Py_hexdigits[(ch >> 12) & 0xf];
        *p++ = Py_hexdigits[(ch >> 8) & 0xf];
        *p++ = Py_hexdigits[(ch >> 4) & 0xf];
        *p++ = Py_hexdigits[ch & 15];
    }

It takes 9 lines of code to allocate more bytes with this API. I’m unhappy about this code. It’s not convenient to have to call SetSize() and Resize() (SetSize() name is confusing as well). I would prefer a single function call to do all these steps. Something like:

    /* -1: subtract 1 preallocated byte */
    alloc += 10-1;
    p = PyBytesWriter_ResizeEndBuffer(writer, p, alloc);
    if (p == NULL) {
        goto error;
    }

Note: the PyBytesWriter_SetSize() call is needed to notify the writer that size bytes have been written, so that the following PyBytesWriter_Resize() call copies these bytes when the buffer is resized (moved in memory). For example, Resize() has to copy bytes from the “small buffer” into the newly created bytes object once the size grows beyond 256 bytes.

That sounds a bit unexpected. The typical way something like PyBytesWriter_Resize is implemented is to call realloc, which will automatically copy the entire contents regardless of the logical size.
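A quick self-contained check of that realloc behaviour: only 5 bytes are logically written, yet growing the block preserves them even if the allocation moves, so no extra bookkeeping call is needed before resizing.

```c
#include <stdlib.h>
#include <string.h>

/* realloc copies the entire old block, independent of any "logical size"
 * the caller tracks separately. */
static int realloc_preserves_contents(void) {
    char *buf = malloc(16);
    if (buf == NULL) {
        return 0;
    }
    memcpy(buf, "Hello", 5);            /* only 5 bytes logically written */
    char *bigger = realloc(buf, 4096);  /* may move the allocation */
    if (bigger == NULL) {
        free(buf);
        return 0;
    }
    int ok = (memcmp(bigger, "Hello", 5) == 0);
    free(bigger);
    return ok;
}
```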

Oh you’re right. I modified my PR to simply remove the PyBytesWriter_SetSize() function.

The pickle code becomes:

    alloc += 10-1;
    Py_ssize_t pos = p - (char*)PyBytesWriter_GetData(writer);
    if (PyBytesWriter_Resize(writer, alloc) < 0) {
        goto error;
    }
    p = (char*)PyBytesWriter_GetData(writer) + pos;

Ok, I think you could also add a PyBytesWriter_ResizeAndUpdatePointer function for this use case.

The slightly complicated function name would make it clear that it packs two operations in one.

As a side note, the fact that “resize” takes an increment rather than an absolute size is a deviation from common semantics for this kind of function, so perhaps another name should be found… “grow” perhaps? That would give PyBytesWriter_Grow and PyBytesWriter_GrowAndUpdatePointer.

PyBytesWriter_Resize() size is an absolute size, not an increment.

Ah, my bad! I had misread your code example.

Here is the 3rd version of PyBytesWriter, using size rather than pointers. Is it easier to understand and to use? What do you think?

API

Create, Finish, Discard

    PyAPI_FUNC(PyBytesWriter *) PyBytesWriter_Create(
        Py_ssize_t size);
    PyAPI_FUNC(PyObject*) PyBytesWriter_Finish(
        PyBytesWriter *writer);
    PyAPI_FUNC(PyObject*) PyBytesWriter_FinishWithSize(
        PyBytesWriter *writer,
        Py_ssize_t size);
    PyAPI_FUNC(PyObject*) PyBytesWriter_FinishWithPointer(
        PyBytesWriter *writer,
        void *data);
    PyAPI_FUNC(void) PyBytesWriter_Discard(
        PyBytesWriter *writer);

High-level API

    PyAPI_FUNC(int) PyBytesWriter_WriteBytes(
        PyBytesWriter *writer,
        const void *bytes,
        Py_ssize_t size);
    PyAPI_FUNC(int) PyBytesWriter_Format(
        PyBytesWriter *writer,
        const char *format,
        ...);

Getters

    PyAPI_FUNC(void*) PyBytesWriter_GetData(
        PyBytesWriter *writer);
    PyAPI_FUNC(Py_ssize_t) PyBytesWriter_GetSize(
        PyBytesWriter *writer);

Low-level API

    PyAPI_FUNC(int) PyBytesWriter_Resize(
        PyBytesWriter *writer,
        Py_ssize_t size);  // absolute size
    PyAPI_FUNC(int) PyBytesWriter_Grow(
        PyBytesWriter *writer,
        Py_ssize_t size);  // relative size
    PyAPI_FUNC(void*) PyBytesWriter_GrowAndUpdatePointer(
        PyBytesWriter *writer,
        Py_ssize_t size,
        void *data);

Examples

Create the bytes string “abc”

    PyObject* create_abc(void)
    {
        PyBytesWriter *writer = PyBytesWriter_Create(3);
        if (writer == NULL) {
            return NULL;
        }

        char *str = PyBytesWriter_GetData(writer);
        memcpy(str, "abc", 3);

        return PyBytesWriter_Finish(writer);
    }

GrowAndUpdatePointer() example

Example using a pointer to write bytes and to track the written size.

Create the string "Hello World":

    PyObject* grow_example(void)
    {
        // Allocate 10 bytes
        PyBytesWriter *writer = PyBytesWriter_Create(10);
        if (writer == NULL) {
            return NULL;
        }
        char *buf = PyBytesWriter_GetData(writer);

        // Write some bytes
        memcpy(buf, "Hello ", strlen("Hello "));
        buf += strlen("Hello ");

        // Allocate 10 more bytes
        buf = PyBytesWriter_GrowAndUpdatePointer(writer, 10, buf);
        if (buf == NULL) {
            PyBytesWriter_Discard(writer);
            return NULL;
        }

        // Write more bytes
        memcpy(buf, "World", strlen("World"));
        buf += strlen("World");

        // Truncate the string at 'buf' position and create a bytes object
        return PyBytesWriter_FinishWithPointer(writer, buf);
    }
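One way to picture what GrowAndUpdatePointer has to do internally (a toy sketch with stand-in names, not the actual implementation): save the caller’s position as an offset, grow the buffer, then re-base the position on the possibly moved buffer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy writer; names are stand-ins for the proposed API. */
typedef struct {
    char *data;
    size_t allocated;
} ToyWriter;

/* Grow the buffer by grow_by bytes and re-base the caller's position
 * pointer, since realloc may move the buffer in memory. */
static void *toy_grow_and_update_pointer(ToyWriter *w, size_t grow_by,
                                         void *pos) {
    size_t offset = (size_t)((char *)pos - w->data);  /* position as offset */
    char *data = realloc(w->data, w->allocated + grow_by);
    if (data == NULL) {
        return NULL;          /* caller must then discard the writer */
    }
    w->data = data;
    w->allocated += grow_by;
    return w->data + offset;  /* same logical position, new base */
}
```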

Yes, this is looking good! Thanks a lot.

I wrote PEP 782: Add PyBytesWriter C API. Please continue the discussion on the PEP.
