PEP draft: Safer mutability semantics for the buffer protocol

I have been working on a draft of a PEP to update the buffer protocol based on the suggestions from this thread: Introspection and "mutable XOR shared" semantics for PyBuffer

@alex_Gaynor I was wondering if this draft sufficiently addresses the issues that you originally outlined in your blog post: Buffers on the edge: Python and Rust · Alex Gaynor

I’m also interested in any other feedback that folks might have about what to include in the PEP or any adjustments that we should make.

Unfortunately I am only able to include 2 links in this post due to being a new member, so I’ll include the links in replies to this post.

I’m including the draft inline with this post; let me know if there’s a better way to share it:

Preamble

number: tbd
title: Safer mutability semantics for the buffer protocol
author: Samuel Eisenhandler
sponsor: Petr Viktorin

Abstract

This document proposes the introduction of new mutability semantics to the buffer protocol.

Objects that support the new semantics export two “kinds” of buffers:

  • immutable buffers: these come with a promise that the underlying bytes will never change while the buffer is held

  • mutable buffers: these come with a promise that only the holder of the mutable buffer will be able to write to or read from the underlying bytes while the buffer is held

Motivation

The existing semantics for mutability in the buffer protocol make it easy to accidentally cause undefined behavior. For example, if multiple extension modules each have mutable buffers exported from the same object and one of those extension modules drops the GIL, data races are bound to occur.

Several people have expressed that the existing semantics can encourage C-level undefined behavior and some have proposed solutions:

  • Alex Gaynor (link in reply) describes the challenges associated with the buffer protocol in the presence of concurrency and proposes a high-level solution having the buffer protocol implement Rust-like mutable XOR shared semantics.
  • Petr Viktorin (link in reply) responds to Alex’s post and proposes a sketch for a solution that enables callers to request immutable or exclusive buffers.
  • LWN discussion (link in reply) surrounding “wrangling the Python C API” and the ergonomics of Rust bindings.

Rust is becoming an increasingly popular language for writing Python extension modules and imposes significant constraints on mutable aliasing. While an immutable reference (“shared borrow”) exists, nothing should be able to mutate the referent. And while a mutable reference (“exclusive borrow”) exists, nothing else should be able to mutate or even view the referent. As a result, both interacting with and implementing the buffer protocol from Python extension modules written in Rust are not ergonomic. Enabling efficient and ergonomic ways for Python extension modules written in Rust to handle buffers becomes more important as Rust becomes a more popular option for writing extension modules. Moving toward Rust-like semantics for the buffer protocol also makes it easier to avoid undefined behavior, regardless of the language used to implement the extension module.

The existing flags for the buffer protocol enable callers to request a WRITABLE buffer, and if the object does not support exporting writable buffers, the call will fail. However, even when a caller does not request a writable buffer, they may receive one. The new semantics are much stricter about which callers receive writable buffers, which should make it much easier to reason about the presence of data races and/or mutable aliasing across multiple Python extension modules.
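The "may receive one anyway" behavior can be observed from pure Python. This is only a small illustration using the existing memoryview type: memoryview() gives the caller no way to say whether a writable buffer is wanted, and the view ends up writable whenever the exporter allows it.

```python
# The exporter, not the caller, decides whether the view is writable.
from_bytes = memoryview(b"abc")
from_bytearray = memoryview(bytearray(b"abc"))

print(from_bytes.readonly)       # True: bytes only exports read-only buffers
print(from_bytearray.readonly)   # False: bytearray hands out a writable buffer

# Nothing stops the holder from writing through the view it was given.
from_bytearray[0] = ord("z")
print(from_bytearray.tobytes())  # b'zbc'
```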

Specification

We propose to add two new bit flags to the buffer interface:

  • PyBUF_IMMUTABLE: indicates that if the PyObject_GetBuffer call succeeds, nothing can change the memory while the buffer is held

  • PyBUF_EXCLUSIVE: indicates that if the PyObject_GetBuffer call succeeds, only this consumer can read or write to the memory while the buffer is held

Additionally, we will add a bit field to PyBufferProcs to enable implementers of the buffer protocol to indicate which semantics they support:

typedef struct {
    getbufferproc bf_getbuffer;
    releasebufferproc bf_releasebuffer;
    uint32_t potential_pybuf_flags;
} PyBufferProcs;

Because the new semantics require a promise from the exporter, we cannot trivially support them for all objects that currently support the buffer protocol. For backwards compatibility, callers should check if the new flags are supported before including them in a PyObject_GetBuffer call. We propose to add support for the PyBUF_IMMUTABLE flag to the builtin bytes type (which already only supports exporting read-only buffers).

Out of scope

We do not plan to immediately add support for the new flags to the builtin bytearray type. It is not entirely clear what the semantics of bytearray should be when an extension module holds an exclusive buffer.

However, here is a naive sketch of how we might implement the new semantics for bytearray:
At a given point in time, a bytearray instance may be in one of several different states:

  • unexported: Python may view and mutate the underlying bytes arbitrarily; immutable, exclusive, or “classic” exports may be made
  • immutably exported: Python may only view the underlying bytes; only immutable exports may be made
  • exclusively exported: Python cannot view or mutate the underlying bytes; no further exports may be made (until the existing exclusive export is released)
  • classic exported: Python may view or mutate the underlying bytes; only other classic exports may be made
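The four states above can be modeled in a few lines of Python. This is a rough sketch only: `Kind` and `ExportTracker` are hypothetical names invented for illustration, and nothing here is part of CPython or of the reference implementation.

```python
from enum import Enum


class Kind(Enum):
    IMMUTABLE = "immutable"
    EXCLUSIVE = "exclusive"
    CLASSIC = "classic"


class ExportTracker:
    """Hypothetical per-object bookkeeping for the states listed above."""

    def __init__(self):
        self.kind = None  # None means "unexported"
        self.count = 0    # number of live exports of that kind

    def acquire(self, kind):
        if self.kind is None:
            self.kind = kind
        elif self.kind is not kind or kind is Kind.EXCLUSIVE:
            # Mixed kinds are refused, and an exclusive export is never shared.
            raise BufferError(
                f"cannot make {kind.value} export while {self.kind.value} exported"
            )
        self.count += 1

    def release(self):
        if self.count == 0:
            raise RuntimeError("release without a matching acquire")
        self.count -= 1
        if self.count == 0:
            self.kind = None  # back to "unexported"

    def python_may_read(self):
        return self.kind is not Kind.EXCLUSIVE

    def python_may_write(self):
        return self.kind in (None, Kind.CLASSIC)


tracker = ExportTracker()
tracker.acquire(Kind.IMMUTABLE)
tracker.acquire(Kind.IMMUTABLE)      # a second immutable export is fine
try:
    tracker.acquire(Kind.EXCLUSIVE)  # refused while immutably exported
except BufferError as exc:
    print(exc)
```

The sketch keeps a single kind plus a counter, which is enough because the states never allow two different kinds of export to coexist.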

Backwards Compatibility

Backwards compatibility was a requirement of our design, and the changes we propose in this PEP will not break any existing code:

  • 0 in potential_pybuf_flags means that the new flags are not supported.
  • Before calling PyObject_GetBuffer with the new flags, callers must check to see if they are supported.

Reference Implementation

The initial implementation is available here: (link in reply)

Before creating a formal PEP, I plan to add tests exhibiting an extension module that demonstrates the new semantics and some documentation outlining the new semantics.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

Links:
Alex’s blog post: Buffers on the edge: Python and Rust · Alex Gaynor
Petr’s response to Alex’s blog post: Introspection and "mutable XOR shared" semantics for PyBuffer
LWN discussion: Progress in wrangling the Python C API [LWN.net]

Initial implementation link: Start on buffer protocol adjustments · sgeisenh/cpython@b599949 · GitHub

Thanks for attempting this. I think this would benefit from additional sections:

  • guidelines for producers
  • guidelines for consumers
  • how to transition from the legacy semantics to the safer semantics (in particular, if a consumer requests an immutable or exclusive buffer, but the producer only understands the legacy semantics, what should happen?)

There are cases where this may be complicated to implement for producers. Take this example with a NumPy array and several memoryviews pointing at different parts of the array:

>>> a = np.arange(8)
>>> m1 = memoryview(a[0:2])
>>> m2 = memoryview(a[2:4])
>>> m3 = memoryview(a[0:4])
>>> [m.tolist() for m in (m1, m2, m3)]
[[0, 1], [2, 3], [0, 1, 2, 3]]
>>> m1[1] = 42
>>> m2[0] = 43
>>> [m.tolist() for m in (m1, m2, m3)]
[[0, 42], [43, 3], [0, 42, 43, 3]]

Obviously, in this example, it is safe to read and write from/to both m1 and m2 (even concurrently!) because their underlying memory areas are disjoint. However, reading from m3 is unsafe because its underlying memory area overlaps the other memoryviews.

The slices can be arbitrarily complex (n-dimensional, strided…), so computing whether they overlap is a difficult problem - and potentially expensive. It seems that NumPy has a dedicated internal function for it that can fail if the amount of work to be done is too large.
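To give a feel for why this is hard even in one dimension, here is a small stdlib-only sketch. Views are described by a hypothetical `(start, stride, count, itemsize)` tuple in bytes, and both helper functions are made up for illustration; this is not how NumPy does it. A cheap extent comparison is conservative (it can report overlap where there is none), while the exact answer here needs work proportional to the size of the views.

```python
def extent(start, stride, count, itemsize):
    """Half-open [low, high) byte range spanned by a 1-D strided view."""
    if count == 0:
        return (start, start)  # an empty view spans no bytes
    last = start + (count - 1) * stride
    return (min(start, last), max(start, last) + itemsize)


def may_overlap(a, b):
    """Cheap and conservative: True whenever the two extents intersect."""
    (a_lo, a_hi), (b_lo, b_hi) = extent(*a), extent(*b)
    if a_lo == a_hi or b_lo == b_hi:
        return False
    return a_lo < b_hi and b_lo < a_hi


def do_overlap(a, b):
    """Exact, but enumerates every byte offset of both views."""
    def offsets(start, stride, count, itemsize):
        return {start + i * stride + k
                for i in range(count) for k in range(itemsize)}
    return not offsets(*a).isdisjoint(offsets(*b))


# Views over an 8-element array, assuming 8-byte items.
m1 = (0, 8, 2, 8)      # a[0:2]
m2 = (16, 8, 2, 8)     # a[2:4]
m3 = (0, 8, 4, 8)      # a[0:4]
evens = (0, 16, 4, 8)  # a[0::2]
odds = (8, 16, 4, 8)   # a[1::2]

print(may_overlap(m1, m2), do_overlap(m1, m2))          # False False
print(may_overlap(m1, m3), do_overlap(m1, m3))          # True True
print(may_overlap(evens, odds), do_overlap(evens, odds))  # True False
```

The last line is the interesting one: the interleaved views never touch the same byte, but the cheap check cannot tell.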

@sgeisenh if it is useful to you I’d be happy for us to also have an issue / PR / or even an experimental feature in PyO3 if you’d want to play around with how this ends up being usable in a Rust context.

The slices can be arbitrarily complex (n-dimensional, strided…), so computing whether they overlap is a difficult problem - and potentially expensive.

You may be interested in some prior art here in the rust-numpy crate, where we have put some dynamic borrow checking into global state to try to guarantee that at least all accesses going through rust-numpy comply with Rust’s semantics. The first thread of discussion is at RFC: Add dynamic borrow checking for dereferencing NumPy arrays. by adamreichold · Pull Request #274 · PyO3/rust-numpy · GitHub

We’ve run into exactly the complexity of the kind suggested by @pitrou too; we tried to come up with a solution which is a reasonable balance between accuracy and performance, though it is narrower in scope than this PEP would be aiming to solve.

numpy::borrow - Rust (docs.rs)

Thank you both for your replies!

Thanks for attempting this. I think this would benefit from additional sections:

  • guidelines for producers
  • guidelines for consumers
  • how to transition from the legacy semantics to the safer semantics (in particular, if a consumer requests an immutable or exclusive buffer, but the producer only understands the legacy semantics, what should happen?)

Agreed. I’ll start working on those sections and include some examples.

For the transition from the legacy semantics, I had imagined that consumers using the new semantics would first be required to check if the producer supports the safer semantics (using the potential_pybuf_flags field of the producer’s PyBufferProcs struct) before requesting an immutable or exclusive buffer. The consumer can then make a decision about whether or not a buffer with legacy semantics is sufficient and take the appropriate action. This seems like the simplest approach to me but might place an undue burden on consumers; do you have other ideas that would maintain backward compatibility?

One idea that I had was to define some common helpers for consumers that want to use the safer semantics to reduce the amount of boilerplate required to request an exclusive or immutable buffer.
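As a rough model of what such a helper might decide (in Python rather than C, to keep it short): the flag values, `ModelExporter`, and `choose_request` below are all hypothetical, since the draft does not assign numeric values or define any helper. The only thing taken from the draft is the idea of consulting `potential_pybuf_flags` before asking for the new semantics.

```python
PyBUF_IMMUTABLE = 0x1000  # hypothetical value; the draft does not assign one
PyBUF_EXCLUSIVE = 0x2000  # hypothetical value; the draft does not assign one


class ModelExporter:
    """Stand-in for a type's PyBufferProcs; only the proposed field is modeled."""

    def __init__(self, potential_pybuf_flags):
        self.potential_pybuf_flags = potential_pybuf_flags


def choose_request(exporter, wanted, allow_legacy):
    """Return the new-style flags to request, or 0 for a legacy request."""
    if exporter.potential_pybuf_flags & wanted == wanted:
        return wanted
    if allow_legacy:
        return 0  # the consumer accepts a buffer with legacy semantics
    raise BufferError("exporter does not advertise the requested semantics")


new_style = ModelExporter(PyBUF_IMMUTABLE)
legacy = ModelExporter(0)

print(hex(choose_request(new_style, PyBUF_IMMUTABLE, allow_legacy=False)))  # 0x1000
print(choose_request(legacy, PyBUF_IMMUTABLE, allow_legacy=True))           # 0
```

Making `allow_legacy` an explicit argument keeps the "is a legacy buffer good enough?" decision with the consumer, which is where the paragraph above places it.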

The slices can be arbitrarily complex (n-dimensional, strided…), so computing whether they overlap is a difficult problem - and potentially expensive. It seems that NumPy has a dedicated internal function for it that can fail if the amount of work to be done is too large.

We’ve run into exactly the complexity of the kind suggested by @pitrou too; we tried to come up with a solution which is a reasonable balance between accuracy and performance, though it is narrower in scope than this PEP would be aiming to solve.

I’ll read some of the prior art that you’ve both linked.

Do you think it would be desirable to have a generic solution for attempting to determine overlap to reduce the amount of work required to implement a producer that supports the new semantics? Since there seems to be a lot of complexity in the “maybe disjoint slices” case, is it worthwhile to try to “solve” in this PEP or is there enough value in supporting the simple “completely immutable” case for the PEP to be worthwhile with an incomplete discussion of how to properly implement a producer for the complex case? Regardless, I would like to better understand the potential solutions for implementing a producer that supports the complex case.

if it is useful to you I’d be happy for us to also have an issue / PR / or even an experimental feature in PyO3 if you’d want to play around with how this ends up being usable in a Rust context.

I do think that would be useful, though I was also hoping that I might be able to work on this as I improve the PEP! Do you think there might be an opportunity to collaborate here or would it be dramatically more efficient for an existing PyO3 maintainer to look at this?

If you mean “expose in the C API something similar to NumPy’s overlap detection routine”, then why not? We should ask the NumPy developers for advice. @rgommers @mattip

I’m not sure I understand the suggestion, but I think the current scope of the PEP (with “immutable” and “exclusive” exports) is adequate.

To be clear, the PEP should attempt to guide producers and consumers into implementing the specified semantics, but I don’t think it needs to provide an off-the-shelf solution, especially for complex producers such as NumPy.

Yay! Thanks for posting this!

You did exactly the right thing.
If you prefer, as your sponsor/mentor I could review each draft before you publish it. But that’s completely optional.

Do we know of other kinds of UB than data races? If not, drop the “for example”.

The reason I want to make Rust’s approach possible in Python is that it’s a good approach to avoiding data races.
It is obviously a limiting approach, but Rust has proven that it’s useful and scalable. That should be why Rust gets mentioned in the PEP. Making Rust bindings easier to write is just a nice side-effect here.

This needs fleshing out, even if it might be obvious from the name. Also, the specification section should have the full spec; the Backwards Compatibility can repeat it or refer to it.
Something like:

  • a 0 bit indicates the flag is not supported
  • a 1 bit indicates that the flag might be supported by this instance; PyObject_GetBuffer will check it and fail if it can’t honor it.

In this proposal, there is no way to indicate that a flag is always supported. We found no use case that isn’t served by LBYL – calling PyObject_GetBuffer.
(The use case for potential_pybuf_flags itself is checking whether the exporter will even look at a given bit in the PyObject_GetBuffer request.)

For backwards compatibility, if a class is created with potential_pybuf_flags=0, then PyType_Ready will set the bits used in Python 3.12’s buffer protocol.
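A small Python model of these bit semantics, purely as a sketch: the numeric values of the new flags are hypothetical, `LEGACY_MASK` is my assumption about which bits Python 3.12's flags occupy, and `type_ready` / `precheck` are made-up names standing in for what PyType_Ready and PyObject_GetBuffer might do.

```python
LEGACY_MASK = 0x01FD      # assumed union of the Python 3.12 buffer flag bits
PyBUF_IMMUTABLE = 0x1000  # hypothetical value; not assigned by the draft
PyBUF_EXCLUSIVE = 0x2000  # hypothetical value; not assigned by the draft


def type_ready(potential_pybuf_flags):
    """Model of the suggested default: 0 becomes 'legacy flags only'."""
    return potential_pybuf_flags if potential_pybuf_flags else LEGACY_MASK


def precheck(potential_pybuf_flags, requested):
    """Reject request bits the exporter would not even look at.

    A set bit only means "might be supported"; the exporter's own
    bf_getbuffer would still have the final say for a given instance.
    """
    ignored = requested & ~potential_pybuf_flags
    if ignored:
        raise BufferError(f"exporter does not look at flag bits {ignored:#x}")


legacy_type = type_ready(0)
bytes_like = type_ready(LEGACY_MASK | PyBUF_IMMUTABLE)

precheck(bytes_like, PyBUF_IMMUTABLE)  # passes; bf_getbuffer decides next
try:
    precheck(legacy_type, PyBUF_IMMUTABLE)
except BufferError as exc:
    print(exc)  # exporter does not look at flag bits 0x1000
```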


I’d go as far as to say that there are no plans to change Python’s bytearray. (Not with me as a PEP sponsor. Currently you can always get a classic buffer from a bytearray, and changing that would be a compatibility break that I’m not comfortable with.)

IMO, the sketch would be better targeted at third-party extension modules, or new additions to the stdlib.

IMO, if the PEP touches on this at all, it could recommend that producers be strict first – e.g. lock the whole array if any view of it is “borrowed”. Consumers can use “classic” buffers if this is a problem – and if they’re satisfied with the level of correctness they get that way.
If the producer can be more clever than that, good for them!

As Antoine said: no. The PEP – and Python – should provide the necessary API for such a complex case, but not more. That doesn’t mean such work wouldn’t be welcome, just that CPython is not the place for it.

Given the Rust code

let v: Vec<u8> = construct_some_vector();
let r = &v[0];
call_some_python_code_bridged_with_pyo3(&v, *r);
println!("{}", r);

the Rust compiler is going to happily optimize this to something like

let v: Vec<u8> = construct_some_vector();
let r = v[0]; // value not pointer
call_some_python_code_bridged_with_pyo3(&v, r);
println!("{}", r);

which fetches the first element of the array only once instead of twice, because it assumes the call to Python cannot mutate v, since it is passed as &v and not &mut v. It’s undefined behavior in the abstract Rust semantics to mutate v if you’re handed an immutable reference &v. This doesn’t need threads to become a problem. In the example above, the abstract UB triggered is mutation of an immutable value, and the concrete translation is that the value of r becomes wrong (edit: clarified and shortened). This can trigger all sorts of misbehavior and security issues.

TLDR: When bridging Rust code with C, the relevant types of undefined behavior are also those defined by the Rust side, not just the C spec. The C UB you can inadvertently trigger with mutable buffers is data races I guess (plus iterator invalidation etc.) but the Rust UB includes mutating a value behind a shared reference.

Thanks for putting this together! I’ve only had a brief read, but my biggest recommendation would be to include some language describing what you expect the impact on memoryview to be, since it’s one of the most dynamic and flexible ways to interact with the buffer API.

The PEP shouldn’t focus on specific issues that Rust bridges like PyO3 have; but on general shortcomings that they uncover.
Getting a buffer, dropping the GIL and manipulating the data is reasonable usage of the C API, regardless of whether you do it in Rust, C or assembly. The reasonable usage leads to data races; we need changes in CPython to solve that.
On the other hand, getting a buffer and incorrectly assuming that Rust’s &v semantics apply to it would be an issue in the Rust code. That’s not in scope for this PEP.

Agreed that there is no need to provide any implementation for Rust/PyO3 as part of the PEP.

I won’t have bandwidth to implement anything for you in the short term so I definitely wouldn’t be more efficient :blush: What I was more going for was an invitation for you to use PyO3 as a test case (of both producers and consumers, potentially), if that helps inform your design at all. I can support from the Rust side if needed.

Thoughts on the protocol addition:

  1. If you ensure the returned buffer also flags immutable you can successfully gamble that no current consumer cares about additional flags being passed. This is true at least for most big producers: NumPy/Cython and also Python itself, I am very certain.
    I can see Python not wanting to condone such behavior, but there is little to lose if you want to early opt in.
  2. I very much like the idea of adding new capability flags: it is low cost and would allow adding new features without Python API changes in the future (users can backport new flags). I think this alone is reason enough to just add that new flags slot (even without a concrete plan).
  3. Having overlap detection helpers or flags in Python seems unnecessary to me. Not saying it can’t be useful, but in practice overlap is rare and complicated overlap even more so. In NumPy I prefer at least defaulting to read-only for self-overlapping arrays (not that it is always true).
  4. I am curious if there are thoughts on alternative ways to handle the (effective) locking. This does not allow holding a view for a long time. (I am totally fine with that, though. The use-case here is simply not taking a long lived view.)
  5. I do wonder a bit if “exclusive” is just immutable+writable, although maybe it is clearer even if it isn’t?

I can see immutable as a useful concept in general, because it is stronger than not writable. (Not writable forbids writing, but doesn’t strictly guarantee that the data cannot be changed. I suspect everyone currently considers that a user problem, but I could see that changing eventually, especially since copy-on-write semantics are becoming more common with pandas adopting them.) JAX is immutable also (but probably doesn’t care about pushing the subtleties to the user).
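The gap between "not writable" and "immutable" is easy to show with today's memoryview. This is just an illustration with existing stdlib behavior: a read-only view refuses writes made through it, but the bytes it exposes can still change underneath the holder.

```python
data = bytearray(b"abc")
view = memoryview(data).toreadonly()

try:
    view[0] = ord("z")  # writing through the read-only view is refused
except TypeError as exc:
    print("write refused:", exc)

data[0] = ord("z")      # ...but the owner can still mutate the bytes
print(view.tobytes())   # b'zbc'
```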

About NumPy itself:

  1. NumPy does not have a locking mechanism on array data. We do have a check for writability, which may be a possible locking point. I suspect we may need such a mechanism for object arrays in a no-gil world, although I am not certain that would actually live at the same place (it might fit better on the dtype rather than the views).
  2. NumPy does not currently keep track of views. Keeping track of arbitrary views and their overlapping would require some new data structure to keep track of a tree of views?!

In other words, it seems like a fair bit of work necessary to have enough bookkeeping so that NumPy could guarantee immutable/exclusive access in all but the simplest cases.
That isn’t a problem, just a warning that the API addition here seems like the easy part.

PS: In general, I would be a lot more interested in extensions around additional datatypes or device support but that is just nerd sniping to see if I get some inertia to drive those.

AFAIU, it shouldn’t be an incorrect assumption if the buffer is advertised as immutable.

I’m still working on making some adjustments to the draft and assembling some illustrative examples. Replying to recent posts, in the meantime.

I’m going to defer this, for now, and include references to discussions on managing overlapping views in the PEP.

That seems reasonable to me; I’ve started writing up the new sections to offer some recommendations.

I don’t fully grasp this suggestion. What does PyType_Ready need to do? Should the existing (as of Python 3.12) flags be included in the potential_pybuf_flags field for consistency? And then the check for whether or not a request can be honored can be performed in PyObject_GetBuffer with a few bitwise operations?

That makes sense, I’m hoping to add an example in a unit test if that seems reasonable.

I’ve put a little bit of thought into this. The current semantics of memoryview make this a little bit tricky:

A couple of possibilities I’ve been thinking about:

  • We could consider having memoryview attempt to request an immutable buffer if the exporter supports the new flags (which the memoryview could then “forward”).
  • We could introduce new type(s) that enable callers to more explicitly specify the semantics that they’re interested in.

Though I’m unsure if this belongs in the scope of this PEP.

What do you mean by additional? I think I’m a little dense.

I have been thinking a little bit about this, especially with the discourse about the free-threaded interpreter. But at least for the PEP, I’m mostly aiming to provide improvements to the interface without the need for additional synchronization. Definitely something I’m curious to explore more, though!

I’ll keep it in the back of my mind :slight_smile: This is my first experience drafting a PEP (or contributing to CPython in a meaningful way) and so far I’ve been having a lot of fun!

Sorry, I meant “additional” as in flags beyond the currently specified ones, which are just ignored in practice.

Yes. IMO, it would be nicely consistent if the bits in this field worked the same for both old and new flags. That’s not a hard requirement – if there’s a reason not to do it, let’s not do it – but it would be nice.

Hm, thinking about that, you might want to change this part of the PEP:

callers should check if the new flags are supported before including them in a PyObject_GetBuffer call

The PyObject_GetBuffer function could do the check itself, right?
(It would still be nice to allow callers to check easily, for introspection or nicer error messages, but it could be optional.)

Yeah, requesting a readonly buffer currently seems to mean “I’m OK with a read-only buffer, but if you can give me a read-write one, go ahead”.
We definitely don’t want to request an immutable buffer by default: memoryview itself will list PyBUF_IMMUTABLE in its potential_pybuf_flags, but the request would fail for views of all existing types.

Well, I guess a Python API for memoryview creation does belong in the PEP. But let’s worry about that after the C side is fleshed out?

That sounds safer, but I don’t like the idea of it not being possible to query. That is because you would actively make it hard/impossible to backport potential new flags (even more so if it should be acceptable for the new flag to be simply ignored).
If it is easy to query, that is not a problem (besides additional logic like choosing to guess something like readonly == immutable on older versions of NumPy).

There may also be reasons to want to distinguish this type of failure from others, although I am not sure. (Some libraries like mpi4py use try: ... except BufferError: to probe various protocols, but I am not sure if this would be relevant for them to distinguish.)
