I have been working on a draft of a PEP to update the buffer protocol based on the suggestions from this thread: Introspection and "mutable XOR shared" semantics for PyBuffer
@alex_Gaynor, I was wondering whether this draft sufficiently addresses the issues you originally outlined in your blog post: Buffers on the edge: Python and Rust · Alex Gaynor
I’m also interested in any other feedback that folks might have about what to include in the PEP or any adjustments that we should make.
Unfortunately, as a new member I can only include 2 links in this post, so I’ll put the remaining links in replies to this post.
I’m including the draft inline with this post; let me know if there’s a better way to share it:
Preamble
number: tbd
title: Safer mutability semantics for the buffer protocol
author: Samuel Eisenhandler
sponsor: Petr Viktorin
Abstract
This document proposes introducing new mutability semantics to the buffer protocol.
Objects that support the new semantics export two “kinds” of buffers:
- immutable buffers: these come with a promise that the underlying bytes will not change while the buffer is held
- mutable (exclusive) buffers: these come with a promise that only the holder of the mutable buffer can read from or write to the underlying bytes while the buffer is held
Motivation
The existing mutability semantics of the buffer protocol make it easy to accidentally cause undefined behavior. For example, if multiple extension modules each hold a mutable buffer exported from the same object and one of those extension modules releases the GIL, data races can occur.
Several people have pointed out that the existing semantics can encourage C-level undefined behavior, and some have proposed solutions:
- Alex Gaynor (link in reply) describes the challenges the buffer protocol poses in the presence of concurrency and proposes a high-level solution: having the buffer protocol implement Rust-like "mutable XOR shared" semantics.
- Petr Viktorin (link in reply) responds to Alex’s post and sketches a solution that enables callers to request immutable or exclusive buffers.
- LWN discussion (link in reply) of “wrangling the Python C API” and the ergonomics of Rust bindings.
Rust is becoming an increasingly popular language for writing Python extension modules, and it imposes significant constraints on mutable aliasing. While an immutable reference (a “shared borrow”) exists, nothing may mutate the referent; while a mutable reference (an “exclusive borrow”) exists, nothing else may mutate or even read the referent. As a result, both consuming and implementing the buffer protocol from Rust extension modules are currently not ergonomic, and making buffer handling efficient and ergonomic from Rust matters more as Rust adoption grows. Additionally, moving toward Rust-like semantics for the buffer protocol makes it easier to avoid undefined behavior regardless of the language an extension module is written in.
The existing flags for the buffer protocol let callers request a writable buffer (PyBUF_WRITABLE); if the object does not support exporting writable buffers, the call fails. However, even when a caller does not request a writable buffer, it may receive one. The new semantics are much stricter about which callers receive writable buffers, which should make it much easier to reason about data races and mutable aliasing across multiple Python extension modules.
Specification
We propose to add two new bit flags to the buffer interface:
- PyBUF_IMMUTABLE: indicates that if the PyObject_GetBuffer call succeeds, nothing can change the memory while the buffer is held
- PyBUF_EXCLUSIVE: indicates that if the PyObject_GetBuffer call succeeds, only this consumer can read from or write to the memory while the buffer is held
Additionally, we will add a bit field to PyBufferProcs so that implementers of the buffer protocol can indicate which of the new semantics they support:
```c
typedef struct {
    getbufferproc bf_getbuffer;
    releasebufferproc bf_releasebuffer;
    uint32_t potential_pybuf_flags;  /* new: supported new flags, or 0 for none */
} PyBufferProcs;
```
Because the new semantics require a promise from the exporter, we cannot trivially support them for all objects that currently support the buffer protocol. Existing exporters typically ignore flag bits they do not recognize, so a request that includes the new flags could succeed without the promised guarantee. For backwards compatibility, callers should therefore check whether the new flags are supported before including them in a PyObject_GetBuffer call. We propose to add support for the PyBUF_IMMUTABLE flag to the builtin bytes type (which already only exports read-only buffers).
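To make the compatibility check concrete, here is a minimal C sketch of the extended struct and a consumer-side support check. It compiles without Python.h, so a few things are assumptions: the getbufferproc/releasebufferproc typedefs are simplified stand-ins for CPython's (which take PyObject * and Py_buffer * arguments), the numeric values of PyBUF_IMMUTABLE and PyBUF_EXCLUSIVE are hypothetical because the draft does not assign any, and supports_new_flags, bytes_like_procs, and legacy_procs are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL bit values, for illustration only: the draft assigns none. */
#define PyBUF_IMMUTABLE 0x1000u
#define PyBUF_EXCLUSIVE 0x2000u
#define PyBUF_NEW_SEMANTICS (PyBUF_IMMUTABLE | PyBUF_EXCLUSIVE)

/* Simplified stand-ins so the sketch builds without Python.h. */
typedef int (*getbufferproc)(void *exporter, void *view, int flags);
typedef void (*releasebufferproc)(void *exporter, void *view);

/* The extended struct as proposed in the draft. */
typedef struct {
    getbufferproc bf_getbuffer;
    releasebufferproc bf_releasebuffer;
    uint32_t potential_pybuf_flags;
} PyBufferProcs;

/* True only if the exporter advertises every new-semantics bit in `requested`.
   Classic bits such as PyBUF_WRITABLE are not covered by this check. */
static bool supports_new_flags(const PyBufferProcs *procs, uint32_t requested)
{
    uint32_t wanted = requested & PyBUF_NEW_SEMANTICS;
    assert(procs != NULL);
    return (procs->potential_pybuf_flags & wanted) == wanted;
}

/* A bytes-like exporter that only promises immutability, as proposed for bytes. */
static const PyBufferProcs bytes_like_procs = {NULL, NULL, PyBUF_IMMUTABLE};

/* A legacy-style initializer: the omitted third member is zero-initialized,
   so this exporter advertises none of the new semantics. */
static const PyBufferProcs legacy_procs = {NULL, NULL};
```

Because C zero-fills members omitted from an aggregate initializer, an exporter whose initializer only names the two existing callbacks reports 0 in potential_pybuf_flags, which is exactly the “not supported” value.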
Out of scope
We do not plan to immediately add support for the new flags to the builtin bytearray type. It is not entirely clear what the semantics of bytearray should be while an extension module holds an exclusive buffer.
However, here is a naive sketch of how we might implement the new semantics for bytearray:
At any given time, a bytearray instance may be in one of several states:
- unexported: Python code may view and mutate the underlying bytes arbitrarily; immutable, exclusive, or “classic” exports may be made
- immutably exported: Python code may only view the underlying bytes; only further immutable exports may be made
- exclusively exported: Python code can neither view nor mutate the underlying bytes, and no further exports may be made until the existing exclusive export is released
- classic exported: Python code may view or mutate the underlying bytes; only further classic exports may be made
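The state list above can be modeled as a small state machine. The C sketch below is a toy model, not CPython code: toy_bytearray, toy_export, toy_release, python_may_read, and python_may_write are hypothetical names, and counting outstanding exports to decide when the object returns to the unexported state is an assumption about how such tracking might work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical export kinds requested by consumers. */
typedef enum { EXPORT_CLASSIC, EXPORT_IMMUTABLE, EXPORT_EXCLUSIVE } export_kind;

/* The four states from the sketch above. */
typedef enum {
    STATE_UNEXPORTED,
    STATE_CLASSIC,
    STATE_IMMUTABLE,
    STATE_EXCLUSIVE
} export_state;

typedef struct {
    export_state state;
    size_t exports; /* outstanding exports of the current kind */
} toy_bytearray;

static export_state state_for(export_kind kind)
{
    switch (kind) {
    case EXPORT_IMMUTABLE: return STATE_IMMUTABLE;
    case EXPORT_EXCLUSIVE: return STATE_EXCLUSIVE;
    default:               return STATE_CLASSIC;
    }
}

/* Returns true if the export is granted. */
static bool toy_export(toy_bytearray *ba, export_kind kind)
{
    export_state wanted = state_for(kind);
    if (ba->state == STATE_UNEXPORTED) {
        ba->state = wanted;
        ba->exports = 1;
        return true;
    }
    /* An exclusive export is never shared; other kinds only stack with themselves. */
    if (ba->state == STATE_EXCLUSIVE || ba->state != wanted)
        return false;
    ba->exports++;
    return true;
}

static void toy_release(toy_bytearray *ba)
{
    assert(ba->exports > 0); /* releasing an export that was never made is a bug */
    if (--ba->exports == 0)
        ba->state = STATE_UNEXPORTED;
}

/* What Python-level code may do with the bytes in each state. */
static bool python_may_read(const toy_bytearray *ba)
{
    return ba->state != STATE_EXCLUSIVE;
}

static bool python_may_write(const toy_bytearray *ba)
{
    return ba->state == STATE_UNEXPORTED || ba->state == STATE_CLASSIC;
}
```

One consequence of this model is that the first export decides which kinds may follow: once a classic export exists, an immutable request fails until every classic export has been released.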
Backwards Compatibility
Backwards compatibility was a requirement of our design, and the changes we propose in this PEP will not break any existing code:
- 0 in potential_pybuf_flags means that the new flags are not supported.
- Before calling PyObject_GetBuffer with the new flags, callers must check to see if they are supported.
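From the caller’s side, the check can reduce to a small flag-selection helper that falls back to today’s requests. This is a hedged C sketch: choose_request_flags is a hypothetical name, PyBUF_SIMPLE and PyBUF_WRITABLE use their existing CPython values, the values of the two new flags are hypothetical, and combining PyBUF_EXCLUSIVE with PyBUF_WRITABLE for a writable request is an assumption the draft does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Existing CPython values. */
#define PyBUF_SIMPLE    0x0000u
#define PyBUF_WRITABLE  0x0001u
/* HYPOTHETICAL values: the draft does not assign numbers to the new flags. */
#define PyBUF_IMMUTABLE 0x1000u
#define PyBUF_EXCLUSIVE 0x2000u

/* Picks the flags to pass to PyObject_GetBuffer given the exporter's
   potential_pybuf_flags. When the field is 0, this yields a classic request. */
static uint32_t choose_request_flags(uint32_t potential_pybuf_flags, bool need_write)
{
    uint32_t flags;
    if (need_write) {
        flags = (potential_pybuf_flags & PyBUF_EXCLUSIVE)
                    ? (PyBUF_WRITABLE | PyBUF_EXCLUSIVE) /* assumption: exclusive + writable */
                    : PyBUF_WRITABLE;                    /* classic: others may also write */
    } else {
        flags = (potential_pybuf_flags & PyBUF_IMMUTABLE)
                    ? (PyBUF_SIMPLE | PyBUF_IMMUTABLE)
                    : PyBUF_SIMPLE;                      /* classic: bytes may still change */
    }
    /* "Nothing can change the memory" includes this consumer, so an immutable
       request must never also ask for a writable buffer. */
    assert(!((flags & PyBUF_IMMUTABLE) && (flags & PyBUF_WRITABLE)));
    return flags;
}
```

With potential_pybuf_flags equal to 0, the helper only ever produces requests that are valid today, which is what lets existing exporters keep working unchanged.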
Reference Implementation
The initial implementation is available here: (link in reply)
Before creating a formal PEP, I plan to add tests using an extension module that demonstrates the new semantics, along with documentation describing them.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.