Hi, @alex_Gaynor recently posted a blog post on how CPython’s buffer protocol causes, or at least encourages, C-level undefined behavior. (Please read it if you haven’t, I won’t repeat it here.)
Meanwhile, @jelle’s PEP 688 is trying to make the buffer introspectable in Python, and ran into issues identifying immutable buffers.
Here’s a preliminary sketch of how both could be fixed:
Add a bit field to PyBufferProcs specifying “~potentially supported PyBUF_* flags”. Expose on the Python side.
For backward compatibility, 0 would mean “no information”?
Add a PyBUF_IMMUTABLE flag meaning “nothing (i.e. neither buffer consumers nor any other code) can change the memory while the buffer is held”
Add a PyBUF_EXCLUSIVE flag meaning “nothing except this consumer can read or write to the memory while the buffer is held”
For backwards compatibility: before using these new flags, consumers must check if they appear in the “potentially supported” set to see if the exporter supports them.
Add support for the PyBUF_IMMUTABLE flag to immutable exporters in the stdlib (like bytes).
Potentially add support for PyBUF_EXCLUSIVE to mutable exporters (like bytearray), though the locking would be more involved here.
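To make the backward-compatibility rule above concrete, here is a minimal Python model of the consumer-side check. Everything in it is an assumption for illustration: `BufFlag`, its bit values, `NO_INFORMATION`, `exporter_supports` and `choose_request` are hypothetical names, and neither PyBUF_IMMUTABLE nor PyBUF_EXCLUSIVE exists in CPython today. The real check would live in C against the new PyBufferProcs field.

```python
from enum import IntFlag
from typing import Optional


class BufFlag(IntFlag):
    """Hypothetical stand-ins for the proposed flags; the bit values are made up."""
    IMMUTABLE = 0x1000
    EXCLUSIVE = 0x2000


# A legacy exporter never filled in the field, so the mask reads as 0.
NO_INFORMATION = 0


def exporter_supports(supported: int, flag: BufFlag) -> Optional[bool]:
    """Return True/False if the exporter declares the flag, None if it says nothing."""
    if supported == NO_INFORMATION:
        return None
    return bool(supported & flag)


def choose_request(supported: int) -> str:
    """Pick the strongest guarantee a consumer may ask this exporter for."""
    if exporter_supports(supported, BufFlag.IMMUTABLE):
        return "immutable"
    if exporter_supports(supported, BufFlag.EXCLUSIVE):
        return "exclusive"
    # Either the exporter said nothing, or it declared neither new flag.
    return "raw"
```

The point of treating 0 as "no information" rather than "supports nothing" is that existing exporters keep working unchanged, and consumers simply fall back to the raw buffer they get today.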
My TODO list is too long so I won’t get to this any time soon, but if someone’s interested and has basic C or Rust knowledge, I can mentor (and sponsor a PEP). No guarantee of success, of course.
If anyone wants to take this on I’m happy to also provide support alongside @encukou.
Once upon a time I started drafting a PEP on this, but eventually it became too messy (and I ran out of time) trying to deal with semantics around memoryview, since they allow you to create many slices and sub-slices in Python.
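For anyone who hasn't run into the memoryview issue, this small example uses only existing Python behavior to show why slices complicate things: overlapping slices alias the same bytes, and the exporter stays pinned until every view is released. The variable names are just for illustration.

```python
data = bytearray(b"abcdef")
view = memoryview(data)

# Two overlapping slices of the same export.
left = view[:4]   # covers data[0:4]
right = view[2:]  # covers data[2:6]

# A write through one slice is visible through the other.
left[2] = ord("X")
aliased = right[0] == ord("X")

# While any view is alive, the bytearray cannot be resized.
try:
    data.append(0)
    resized = True
except BufferError:
    resized = False

# Only after every view is released can the exporter resize again.
for v in (left, right, view):
    v.release()
data.append(0)
```

Any "exclusive" guarantee would have to decide what happens to `left` and `right` here, since both are legitimate views of the same memory.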
I’m interested in taking this on after PEP 688. I think I’ll drop any attempt to detect mutability in PEP 688, so that the PEP only provides the basis for identifying buffers in Python. Then later we can use Petr’s proposed change as a vehicle for richer buffer typing (including mutability, and potentially also other concerns like contiguousness).
Just to add a missing piece, this is my understanding of how Petr’s proposal would fix the problem identified in Alex’s post: pyo3 would inspect the flags field to figure out whether the buffer is safe for shared use, and generate different Rust bindings based on the answer. For example, this would allow it to produce &[u8] Rust buffers from a bytes object. Is that correct?
I’m not sure how pyo3 works, actually.
It could also have 3 fallible operations – asking for an immutable, exclusive, or unsafe raw buffer – for any PyObject, and fail if the requested one isn’t available.
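A rough Python model of what those three fallible operations might look like from the consumer side. All names here (`BufferUnavailable`, `IMMUTABLE_TYPES`, `EXCLUSIVE_TYPES`, `request_immutable`, `request_exclusive`, `request_raw`) are hypothetical, and the type sets are a stand-in for whatever the exporter would actually declare; this is not how pyo3 is implemented.

```python
class BufferUnavailable(Exception):
    """Hypothetical error: the exporter cannot give the requested guarantee."""


# Hypothetical stand-in for "the exporter declared this guarantee".
# Today only the built-in special case (bytes) can be trusted as immutable,
# and nothing can promise exclusivity.
IMMUTABLE_TYPES = {bytes}
EXCLUSIVE_TYPES = set()


def request_immutable(obj) -> memoryview:
    """Succeed only if nothing can change the memory while the view is held."""
    if type(obj) not in IMMUTABLE_TYPES:
        raise BufferUnavailable("exporter does not guarantee immutability")
    return memoryview(obj)


def request_exclusive(obj) -> memoryview:
    """Succeed only if this consumer would be the sole reader and writer."""
    if type(obj) not in EXCLUSIVE_TYPES:
        raise BufferUnavailable("exporter does not guarantee exclusivity")
    return memoryview(obj)


def request_raw(obj) -> memoryview:
    """No guarantees; raises TypeError if obj has no buffer support at all."""
    return memoryview(obj)
```

With this shape, the caller asks for the strongest guarantee it needs and handles the failure explicitly, instead of getting a raw buffer and having to assume the best.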
Well, bytes is easy – it’s built-in so there can be a special case for it. This proposal is for third-party libraries, which need a way to announce they’re immutable. Currently you’d need to convert those to bytes (with a memory copy) to get a safe immutable buffer.
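To illustrate why a read-only buffer is not the same thing as an immutable one, here is a short example using only existing Python behavior. A read-only memoryview over a bytearray stands in for a third-party exporter; the variable names are just for illustration.

```python
backing = bytearray(b"hello")
ro = memoryview(backing).toreadonly()  # consumer sees a read-only buffer

# The consumer cannot write through the read-only view...
try:
    ro[0] = ord("j")
    wrote_through_view = True
except TypeError:
    wrote_through_view = False

# ...but the owner can still change the memory underneath it.
backing[0] = ord("j")
changed_under_view = bytes(ro) == b"jello"

# Today's safe workaround: copy into bytes, which costs a memory copy.
snapshot = bytes(ro)
backing[0] = ord("h")  # later changes no longer affect the snapshot
```

The `readonly` attribute only says what this consumer may do, not what other code may do, which is the gap an explicit immutability flag would close.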
I think that’s exactly the API you’d end up with, and it’d totally solve our problems! Currently you only have a choice of the raw buffer (which you can either treat as “unsafe” or “safe, but totally unergonomic”): see PyBuffer in pyo3::buffer.
I’m interested in taking a look at this. I came from the comment section of the recent LWN article discussing the C API.
I have no experience writing PEPs but I have a decent amount of experience with C and Rust.
I’m planning to read through PEP 1 (PEP Purpose and Guidelines, on peps.python.org) to get a better understanding of the process. Any other resources (besides those already linked in the thread) that I should take a look at?
That’s the general guide to the PEP process. The most important piece is always making sure your PEP has content that helps the community evaluate the proposal completely, so the “What belongs in a successful PEP?” section of PEP 1 is the most important part of that doc.
PEP 1 has lots of details about the process and formatting. For now, as Alex said, read the “What belongs in a successful PEP?” section and ignore the rest. You can write the text in Markdown to start with, if you prefer.
You’ll want to put in some links to people saying they want the feature – things like Alex’s blog post or the LWN discussion. Start collecting them.
It’s possible to write a PEP before playing with the implementation, but I recommend getting at least a rough draft of the code ready first, so that you know what the PEP will say.
I don’t know how familiar you are with the CPython codebase. The first step is to build from source; if you haven’t done that the instructions are here: Setup and building
Then, if you want to go with my draft plan above, you’ll want to look around:
That sounds good to me. I’ll probably spend a good amount of time this weekend playing around with CPython and getting familiarized with interacting with the buffer protocol from a C extension and then maybe prod at a few things in pyo3 to see what that looks like.
I’m in EST so I should have a good amount of overlap in my morning/your afternoon hours
I can do any time after 2023-11-13T12:00:00Z any day next week.