PEP 688: Making the buffer protocol accessible in Python

Jelle · May 1, 2022, 2:50pm

I just talked about PEP 688 at a PyCon lightning talk presenting two options for supporting typing for the buffer protocol:

1. Adding a new __buffer__(flags) dunder method

This would work similarly to PyPy’s __buffer__ method. We’d map this dunder to the bf_getbuffer slot in Python; Python objects would implement it by returning a memoryview.

Open question: What about the bf_releasebuffer slot?

2. Adding an __isbuffer__ = True attribute to buffer objects

This is simpler and avoids having to deal with more of the complexities of the buffer protocol. However, this behavior would be unlike any other dunder, and it may be confusing to users if they set the field on a Python class and the class doesn’t actually become a buffer.

I like option 1 best, but I’d like to make sure it works well with the C buffer protocol.

To Serhiy’s point, I think the documentation is often a bit vague about terms like “buffer” or “sequence”. (Is a “sequence” a collections.abc.Sequence, or just a class that accepts ints in __getitem__, or something in between?) I would like to restrict the term buffer to “supports the buffer protocol”, and use more precise terms for the other possibilities.

hlovatt · May 1, 2022, 9:44pm

I’ve written typesheds for MicroPython (GitHub - hlovatt/PyBoardTypeshed: Typesheds (a.k.a.: interface stubs, `pyi` files, and type hints) for MicroPython.) and this would be a great help because it will allow custom buffer types. I currently use:

AnyReadableBuf: Final = TypeVar("AnyReadableBuf", bytearray, array, memoryview, bytes)

AnyWritableBuf: Final = TypeVar("AnyWritableBuf", bytearray, array, memoryview)

Which brings me to the second point that distinguishing between readonly and readwrite is common in MicroPython and would be a valuable addition.
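As a rough illustration of the read-only versus read-write distinction, the sketch below asks memoryview whether an object exports a writable buffer. The helper name is_writable is hypothetical and not part of any proposal in this thread.

```python
from array import array


def is_writable(obj) -> bool:
    """Return True if obj exports a writable buffer (hypothetical helper)."""
    # memoryview() requests a buffer from obj; the with-block releases it again.
    with memoryview(obj) as view:
        return not view.readonly


results = {
    "bytes": is_writable(b"abc"),                # immutable, read-only buffer
    "bytearray": is_writable(bytearray(b"abc")),  # mutable, writable buffer
    "array": is_writable(array("B", [1, 2, 3])),  # mutable, writable buffer
}
```

This is a runtime check only; it does not by itself give a type checker a way to tell the two kinds of buffer apart.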

Would also suggest names: AnyReadableBuf and AnyWritableBuf to be consistent with AnyStr.

Jelle · May 1, 2022, 11:34pm

Thanks for the feedback! I’ll continue to try to think of ways to support writability in an elegant way.

(Also, a constrained TypeVar doesn’t make sense for this use case. We can talk about this further in Discussions · python/typing · GitHub if you like.)
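The post does not spell out an alternative, so the following is only one possible reading: when a function just needs to accept any of several buffer types, a plain union alias is usually enough, whereas a constrained TypeVar is meant for tying several positions in a signature to the same type. The alias names are taken from the earlier post; the checksum function is a hypothetical example.

```python
from array import array
from typing import Union

# Plain aliases instead of constrained TypeVars (an assumed alternative).
AnyReadableBuf = Union[bytes, bytearray, memoryview, array]
AnyWritableBuf = Union[bytearray, memoryview, array]


def checksum(buf: AnyReadableBuf) -> int:
    """Sum all bytes of a readable buffer, modulo 256 (hypothetical example)."""
    with memoryview(buf) as view:
        return sum(view.tobytes()) % 256


total = checksum(b"\x01\x02")
wrapped = checksum(array("B", [250, 10]))
```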

guido · May 2, 2022, 12:57am

Sadly, I missed your lightning talk.

Is this instead of the Buffer type you’re proposing in PEP 688, or in addition?

Could you show a complete example?

hlovatt · May 2, 2022, 10:24am

We can talk about this further in Discussions · python/typing · GitHub if you like.

Is there an existing topic, or are you proposing to start one? If there is a better solution than a constrained type, I’m all for it.

Jelle · May 2, 2022, 3:21pm

Please open a new topic.

Jelle · May 2, 2022, 3:54pm

This would replace current PEP 688’s types.Buffer. I’ll write out some complete explanations.

Option 1: __buffer__ special method

This will allow implementing buffer types in Python too, so it’s also a significant non-typing change.

Buffer types implemented in C automatically get a __buffer__ method exposed in Python. It takes a flags: int argument and returns a memoryview wrapping the Py_buffer object returned by the underlying slot.
flags is the same as in C, an OR of various fields documented around Buffer Protocol — Python 3.11.0a7 documentation. For convenience, perhaps we should expose those flags in the stdlib somewhere (a types.BufferFlags enum?).
Types implemented in Python that define a __buffer__ method automatically get it mapped to the bf_getbuffer slot. They will then be usable as buffers (e.g., they can be passed
Not sure yet how this affects the bf_releasebuffer slot.
To check for buffers in typeshed or elsewhere, we can now simply define a Protocol with def __buffer__(self, flags: int) -> memoryview: ....
For convenience, we can add a typing.SupportsBuffer protocol defining this method. (Or it can go into collections.abc?)
For backporting, we can add typing_extensions.Buffer, and we can lie in typeshed that the __buffer__ method existed before 3.12.

Some code samples:

# typeshed builtins.pyi
class bytes:
    def __buffer__(self, flags: int) -> memoryview: ...

# typeshed typing.pyi
class SupportsBuffer(Protocol):
    def __buffer__(self, flags: int) -> memoryview: ...

# user code
from typing import SupportsBuffer

def need_buffer(bf: SupportsBuffer):
    memoryview(bf)

class MyBuffer:
    def __buffer__(self, flags: int) -> memoryview:
        return memoryview(b"hi")

need_buffer(MyBuffer())  # works
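The samples above rely on a typing.SupportsBuffer that is only proposed here. A self-contained variant, as a sketch, defines the protocol locally and calls __buffer__ directly, so it does not depend on the interpreter mapping the method to bf_getbuffer. In the accepted version of the PEP the equivalent protocol is collections.abc.Buffer, and from Python 3.12 memoryview(obj) consults __buffer__ on Python classes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsBuffer(Protocol):
    """Local stand-in for the proposed protocol (name taken from this thread)."""

    def __buffer__(self, flags: int) -> memoryview: ...


class MyBuffer:
    def __init__(self, data: bytes) -> None:
        self._data = data

    def __buffer__(self, flags: int) -> memoryview:
        # flags is ignored in this sketch; a real implementation would honor it.
        return memoryview(self._data)


def need_buffer(bf: SupportsBuffer) -> bytes:
    # Calling the dunder directly keeps the sketch independent of
    # interpreter support for Python-level buffer classes.
    return bytes(bf.__buffer__(0))


obj = MyBuffer(b"hi")
payload = need_buffer(obj)
is_buffer = isinstance(obj, SupportsBuffer)
```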

Option 2: __isbuffer__ = True attribute

This will allow checking for buffer types through a protocol, but not defining them in Python.

Buffer types implemented in C automatically expose an attribute __isbuffer__ = True.
If a Python type sets this attribute, nothing happens, except that it’s now lying about being a buffer.
As with Option 1, we can use Protocols to check for buffers, and add a typing.SupportsBuffer protocol for convenience.
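A small sketch of the failure mode described for Option 2: a Python class that sets the attribute passes an attribute-based protocol check, yet memoryview still rejects it. The protocol is defined locally because typing.SupportsBuffer is only a proposal here, and the class name is hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsBuffer(Protocol):
    """Local stand-in for the proposed attribute-based protocol."""

    __isbuffer__: bool


class NotReallyABuffer:
    # Setting the flag does not implement the buffer protocol.
    __isbuffer__ = True


fake = NotReallyABuffer()
passes_check = isinstance(fake, SupportsBuffer)

try:
    memoryview(fake)
    usable = True
except TypeError:
    # memoryview() needs a real buffer exporter and rejects this object.
    usable = False
```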

guido · May 3, 2022, 11:11pm

How useful would it be for a Python object to define a __buffer__() method except in something like a Mock?

It seems that calling b.__buffer__() does the same thing as memoryview(b) except for something with the flags. If we only cared about the runtime behavior we could just add the flags to the memoryview() constructor? It seems that the relationship between __buffer__ and memoryview is similar to that between __len__ and len().

The bf_releasebuffer slot is called by memoryview(b).release() and after that the memory view is no longer usable (most operations just give errors).
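The release behavior can be observed from Python with a bytearray, which refuses to resize while a view is exported. A minimal sketch:

```python
data = bytearray(b"abc")
view = memoryview(data)

try:
    data.extend(b"d")  # resizing while a buffer is exported is refused
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()  # gives the buffer back to the exporter

try:
    view[0]  # a released view rejects most operations
    usable_after_release = True
except ValueError:
    usable_after_release = False

data.extend(b"d")  # works again once the export is gone
```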

I like type checks using the presence of a method (i.e., __buffer__) better than checks for a data field (__isbuffer__).

I presume the part of PEP 688 about replacing arg: bytes with arg: Buffer is also invalidated? It just doesn’t mean the same thing. If we want to do something about arg: bytes implying arg: bytes|bytearray we should think harder IMO. I could easily be convinced that memoryview doesn’t belong in that union though.

storchaka · May 4, 2022, 3:51am

It is important that bf_releasebuffer is set to NULL in bytes and to non-NULL in mutable types. Some C code only accept types with this slot set to NULL.

guido · May 4, 2022, 4:05am

Thanks, that’s an important detail that I forgot. Is it important enough to distinguish between Buffer and MutableBuffer?

pablogsal · February 7, 2023, 3:56pm

Hi!

I am writing this on behalf of the Python Steering Council. Thanks a lot for submitting this PEP to us, and apologies for the delay in the response! We have discussed the PEP in detail and are generally happy to accept it, but there is one item we would like to discuss. The PEP proposes overloading __buffer__, but we noticed that several third-party libraries and projects already use this name with potentially different semantics. Some examples include pypy, pyzmq and other popular projects (from a simple GitHub search). We want the impact of this to be at least discussed in the PEP, and ideally we would like you to reach out to the maintainers if the plan continues to be to keep __buffer__ in the proposal.

To be clear: the only hard requirement here that we are asking for is having this aspect included in the document and the risk analyzed, but any further effort would be greatly appreciated.

Please, reach out to me or any other member of the Steering Council if you have any questions, or if you need any clarifications or if we can help in any way!

Thanks a lot for the great work!

pablogsal · February 7, 2023, 5:28pm

Also, one clarification: we are aware that we have documented that dunder methods are subject to change without warning. The reason we want this aspect to be discussed is so we can consider openly the effects of the change between us and the community, not because we think anything has to change here in the proposal (in the sense that a different dunder needs to be used) necessarily.

Jelle · February 7, 2023, 5:47pm

Thanks @pablogsal!

Here’s some initial discussion of third-party __buffer__ methods:

PyPy’s __buffer__ is documented at The __pypy__ module — PyPy documentation. Its semantics actually match PEP 688, so PEP 688 would improve CPython/PyPy compatibility here.
pyzmq uses PyPy’s __buffer__ semantics (pyzmq/message.py at fe18dc55516ef50d168fc02f8550a67ff5b5633d · zeromq/pyzmq · GitHub)
mpi4py (mpi4py/typing.py at 453b87d0da37c5914b91afb511b188556dff2a9c · mpi4py/mpi4py · GitHub) defines a PEP 688-compatible SupportsBuffer protocol.
bokeh uses __buffer__ as a key in some serialization dict (bokeh.util.serialization — Bokeh 2.4.0 Documentation), which doesn’t conflict with PEP 688.
numpy used to have an undocumented behavior where it would look up a __buffer__ attribute that would return a buffer, but this was removed in 2019: MAINT: remove undocumented __buffer__ attribute lookup by mattip · Pull Request #13049 · numpy/numpy · GitHub

So for the most part, existing uses of __buffer__ are compatible with PEP 688, but I’ll reach out to PyPy to hear their thoughts.

I agree that while dunder names are documented as reserved (2. Lexical analysis — Python 3.11.1 documentation; thanks @zware for finding the link), we shouldn’t break compatibility lightly if major third-party users are using a name that is technically reserved. However, in this case I think using the __buffer__ name actually enhances compatibility, so I’m leaning towards keeping the name, but documenting this decision in the PEP.

mattip · February 7, 2023, 6:24pm

PyPy would love to see this adopted. As mentioned above, PyPy has a PyPy-specific __pypy__.bufferable base class with the __buffer__ method. The PyPy interpreter will look for that method on instances of classes that inherit from bufferable to implement memoryview(obj). This is what allows us to implement memoryview(ctypes.Array(...)) in pure Python.

I removed the undocumented use of __buffer__ from numpy in order to clear up any confusion around this convention.

pablogsal · March 7, 2023, 3:16pm

Sorry for the delay (we had a couple of weeks that we could not meet due to holidays and stuff).

I am very happy to report here that the Steering Council approves PEP 688: Making the buffer protocol accessible in Python with the latest changes. Thanks a lot @JelleZijlstra for the fantastic work and the patience. This is a fantastic improvement and we are very happy to see it happening.

Congratulations!

KubaSO · September 26, 2024, 10:47am

This is great, but lacks a key feature (to me at least): that of wrapping a ctypes pointer + length. It is needed to interoperate with OS/library APIs.

Is there a way currently to go from a tuple[c_pvoid, int] or something similar to a memoryview?

Replying to myself: yes, it is possible.

array_ptr = cast(pvoid, POINTER(c_char * length))
view = memoryview(array_ptr.contents)
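A self-contained version of the snippet, as a sketch: the pointer here comes from a ctypes buffer created in the same script, standing in for an address returned by an OS or library API. The names backing, pvoid and length are illustrative.

```python
import ctypes

# Stand-in for memory owned by an external API: 5 payload bytes plus a NUL.
backing = ctypes.create_string_buffer(b"hello")
length = 5
pvoid = ctypes.c_void_p(ctypes.addressof(backing))

# Reinterpret the raw pointer as a pointer to a char array of known length.
array_ptr = ctypes.cast(pvoid, ctypes.POINTER(ctypes.c_char * length))
view = memoryview(array_ptr.contents)

# The view aliases the original memory, so a write through ctypes shows up.
backing[0] = b"J"
snapshot = view.tobytes()
```

The view does not own the memory, so whatever provided the pointer has to keep it alive for as long as the view is in use.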

encukou · September 26, 2024, 4:29pm

I hope to get Rian Hunter’s patch for memoryview_at into Python 3.14’s ctypes.
