Introspection and "mutable XOR shared" semantics for PyBuffer

Hi,
@alex_Gaynor recently published a blog post on how CPython’s buffer protocol causes, or at least encourages, C-level undefined behavior. (Please read it if you haven’t; I won’t repeat it here.)
Meanwhile, @jelle’s PEP 688 is trying to make the buffer protocol introspectable from Python, and ran into issues identifying immutable buffers.
Here’s a preliminary sketch of how both could be fixed:

  • Add a bit field to PyBufferProcs specifying “potentially supported PyBUF_* flags”. Expose it on the Python side.
    • For backward compatibility, 0 would mean “no information”?
  • Add a PyBUF_IMMUTABLE flag meaning “nothing (i.e. neither buffer consumers nor any other code) can change the memory while the buffer is held”
  • Add a PyBUF_EXCLUSIVE flag meaning “nothing except this consumer can read or write to the memory while the buffer is held”
  • For backwards compatibility: before using these new flags, consumers must check if they appear in the “potentially supported” set to see if the exporter supports them.
  • Add support for the PyBUF_IMMUTABLE flag to immutable exporters in the stdlib (like bytes).
  • Potentially add support for PyBUF_EXCLUSIVE to mutable exporters (like bytearray), though the locking would be more involved here.
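A rough Python model of the consumer-side check from the list above might look like this. Everything in it is hypothetical: the numeric flag values, the `supported_flags` attribute standing in for the new PyBufferProcs field, and the `request_buffer` helper. None of it exists in CPython today.

```python
# Hypothetical bit values; real ones would be picked in a PEP.
PyBUF_IMMUTABLE = 0x1000
PyBUF_EXCLUSIVE = 0x2000


class Exporter:
    """Stand-in for a type whose PyBufferProcs carries the proposed bit field."""

    def __init__(self, data, supported_flags=0):
        self.data = data
        # 0 means "no information", e.g. an exporter that predates the field.
        self.supported_flags = supported_flags


def request_buffer(exporter, flags):
    """Return the data only if the exporter advertises every requested new flag."""
    missing = flags & ~exporter.supported_flags
    if missing:
        raise BufferError(f"exporter does not advertise flags {missing:#x}")
    return exporter.data


frozen = Exporter(b"abc", supported_flags=PyBUF_IMMUTABLE)
legacy = Exporter(bytearray(b"abc"))  # no information available

request_buffer(frozen, PyBUF_IMMUTABLE)    # succeeds
# request_buffer(legacy, PyBUF_IMMUTABLE)  # would raise BufferError
```

The point of the check is that an old exporter, which ignores flags it doesn't know about, can never be mistaken for one that actually promises immutability or exclusivity.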

My TODO list is too long so I won’t get to this any time soon, but if someone’s interested and has basic C or Rust knowledge, I can mentor (and sponsor a PEP). No guarantee of success, of course.

I’m glad my post sparked some interest!

If anyone wants to take this on I’m happy to also provide support alongside @encukou.

Once upon a time I started drafting a PEP on this, but eventually it became too messy (and I ran out of time) trying to deal with the semantics around memoryview, since it allows you to create many slices and sub-slices in Python.
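To make the memoryview complication concrete, here is a small illustration using only existing standard-library behavior (no proposed API). Every slice and sub-slice is another live view of the same memory, so any immutable or exclusive guarantee would presumably have to account for all of them.

```python
data = bytearray(b"hello world")
view = memoryview(data)
head = view[:5]   # slice of the view, no copy
sub = head[1:3]   # sub-slice of the slice, still no copy

# Writing through the innermost slice changes the original bytearray.
sub[0] = ord("E")
assert data == bytearray(b"hEllo world")

# All views still point back at the same exporter.
assert head.obj is data and sub.obj is data
```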

I’m interested in taking this on after PEP 688. I think I’ll drop any attempt to detect mutability in PEP 688, so that the PEP only provides the basis for identifying buffers in Python. Then later we can use Petr’s proposed change as a vehicle for richer buffer typing (including mutability, and potentially also other concerns like contiguousness).

Just to add a missing piece, this is my understanding of how Petr’s proposal would fix the problem identified in Alex’s post: pyo3 would inspect the flags field to figure out whether the buffer is safe for shared use, and generate different Rust bindings based on the answer. For example, this would allow it to produce &[u8] Rust buffers from a bytes object. Is that correct?

I’m not sure how pyo3 works, actually :sweat_smile:
It could also have 3 fallible operations – asking for an immutable, exclusive, or unsafe raw buffer – for any PyObject, and fail if the requested one isn’t available.
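As a sketch of that shape, written in Python rather than Rust and not reflecting how pyo3 is actually structured: the `BufferHandle` class, its method names, and the `advertised` argument (standing in for the proposed bit field) are all made up for illustration.

```python
class BufferHandle:
    """Hypothetical binding-layer wrapper around any buffer-exporting object."""

    def __init__(self, obj, advertised=()):
        # memoryview() raises TypeError if obj does not export a buffer.
        self._view = memoryview(obj)
        # Stand-in for the flags the exporter says it can honor.
        self._advertised = frozenset(advertised)

    def as_immutable(self):
        """Fails unless the exporter promises nobody can change the memory."""
        if "immutable" not in self._advertised:
            raise BufferError("exporter does not promise immutability")
        return self._view.toreadonly()

    def as_exclusive(self):
        """Fails unless the exporter promises this consumer is the only user."""
        if "exclusive" not in self._advertised:
            raise BufferError("exporter does not promise exclusive access")
        return self._view

    def as_raw(self):
        """Always available; the caller takes responsibility for aliasing."""
        return self._view


handle = BufferHandle(b"abc", advertised={"immutable"})
safe = handle.as_immutable()   # the case that could back a Rust &[u8]
raw = handle.as_raw()          # today's only option
```

Each accessor fails independently, so a caller can try the strongest guarantee it needs and fall back to the raw buffer only when it is prepared to handle that case.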

Well, bytes is easy – it’s built-in so there can be a special case for it. This proposal is for third-party libraries, which need a way to announce they’re immutable. Currently you’d need to convert those to bytes (with a memory copy) to get a safe immutable buffer.
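For illustration, this is what the copy-based workaround looks like with today's standard library, and why a read-only view alone is not a substitute. The `snapshot` helper name is made up; the calls it uses are existing ones.

```python
def snapshot(obj):
    """Copy any buffer-exporting object into a new, truly immutable bytes object."""
    return bytes(memoryview(obj))


source = bytearray(b"abc")
frozen = snapshot(source)                   # costs a full memory copy
readonly = memoryview(source).toreadonly()  # no copy, but only read-only for this consumer

source[0] = ord("x")  # the exporter can still change the memory

assert frozen == b"abc"         # the copy is unaffected
assert readonly[0] == ord("x")  # the read-only view sees the change
```

A read-only view stops its own consumer from writing, but it says nothing about other code, which is the gap an explicit immutability flag would close.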

I think that’s exactly the API you’d end up with, and it’d totally solve our problems! Currently the only choice is the raw buffer (which you can either treat as “unsafe” or “safe, but totally unergonomic”); see PyBuffer in pyo3::buffer (Rust docs).