PEP 688: Making the buffer protocol accessible in Python

I just gave a lightning talk at PyCon about PEP 688, presenting two options for supporting the buffer protocol in typing:

  1. Adding a new __buffer__(flags) dunder method

This would work similarly to PyPy’s __buffer__ method. We’d map this dunder to the C-level bf_getbuffer slot; Python classes would implement it by returning a memoryview.

Open question: What about the bf_releasebuffer slot?

  2. Adding an __isbuffer__ = True attribute to buffer objects

This is simpler and avoids having to deal with more of the buffer protocol’s complexities. However, this behavior would be unlike that of any other dunder, and it may confuse users if they set the attribute on a Python class and the class doesn’t actually become a buffer.


I like option 1 best, but I’d like to make sure it works well with the C buffer protocol.
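A minimal sketch of what Option 1 could look like from the Python side, assuming the dunder receives the C flags unchanged. The class ReadOnlyBlob and the constant PYBUF_WRITABLE are illustrative names (the value 0x0001 matches PyBUF_WRITABLE in CPython's headers). Because an interpreter without the proposal does not wire __buffer__ to bf_getbuffer, the sketch calls the method directly instead of going through memoryview().

```python
# Hypothetical constant: the value mirrors PyBUF_WRITABLE in CPython's C headers.
PYBUF_WRITABLE = 0x0001


class ReadOnlyBlob:
    """Sketch of a pure-Python exporter under the proposed __buffer__ dunder."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def __buffer__(self, flags: int) -> memoryview:
        # A read-only exporter refuses writable requests with BufferError,
        # as C exporters built on PyBuffer_FillInfo do.
        if flags & PYBUF_WRITABLE:
            raise BufferError("ReadOnlyBlob is read-only")
        # Python-level implementation: simply hand back a memoryview.
        return memoryview(self._data)


blob = ReadOnlyBlob(b"hi")
# Called directly here; under the proposal, memoryview(blob) would reach this
# method through the bf_getbuffer slot instead.
view = blob.__buffer__(0)
print(view.tobytes())  # b'hi'
```

Returning a memoryview keeps the Python side simple: the memoryview already owns a filled-in Py_buffer, so the interpreter could copy that into the consumer's request.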

To Serhiy’s point, I think the documentation is often a bit vague about terms like “buffer” or “sequence”. (Is a “sequence” a collections.abc.Sequence, just a class whose __getitem__ accepts ints, or something in between?) I would like to restrict the term “buffer” to “supports the buffer protocol” and use more precise terms for the other possibilities.
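With “buffer” restricted to “supports the buffer protocol”, the runtime meaning is easy to state: memoryview() raises TypeError for objects whose type does not export a buffer. A minimal sketch (the helper name supports_buffer_protocol is hypothetical):

```python
from array import array


def supports_buffer_protocol(obj: object) -> bool:
    """Return True if obj exports a buffer, i.e. memoryview() accepts it."""
    try:
        # memoryview() goes through the type's bf_getbuffer slot; types
        # without one make it raise TypeError.
        view = memoryview(obj)
    except TypeError:
        return False
    view.release()  # drop the export right away
    return True


print(supports_buffer_protocol(array("i", [1, 2])))  # True
print(supports_buffer_protocol("text"))              # False
```

Note that this is deliberately narrower than “sequence of ints”: a list of ints is indexable but is not a buffer.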

I’ve written typesheds (interface stubs, a.k.a. `pyi` files and type hints) for MicroPython (hlovatt/PyBoardTypeshed on GitHub), and this proposal would be a great help because it would allow custom buffer types. I currently use:

from array import array
from typing import Final, TypeVar

AnyReadableBuf: Final = TypeVar("AnyReadableBuf", bytearray, array, memoryview, bytes)

AnyWritableBuf: Final = TypeVar("AnyWritableBuf", bytearray, array, memoryview)

Which brings me to my second point: distinguishing between read-only and read-write buffers is common in MicroPython, and supporting that distinction would be a valuable addition.

I would also suggest the names AnyReadableBuf and AnyWritableBuf, for consistency with AnyStr.
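At runtime, the read-only/read-write distinction is visible on the exported view through memoryview.readonly. A small sketch (the helper is_writable_buffer is hypothetical) shows it, and also shows that writability belongs to the object rather than always to its type: a memoryview is writable only if what it wraps is, which is worth keeping in mind given that memoryview appears in both TypeVars above.

```python
from array import array


def is_writable_buffer(obj: object) -> bool:
    """Hypothetical helper: True if obj exports a writable buffer."""
    with memoryview(obj) as view:  # the export is released on exit
        return not view.readonly


for candidate in (b"abc", bytearray(b"abc"), array("b", [1, 2]), memoryview(b"abc")):
    print(type(candidate).__name__, is_writable_buffer(candidate))
# bytes False, bytearray True, array True, memoryview False
```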

Thanks for the feedback! I’ll continue to try to think of ways to support writability in an elegant way.

(Also, a constrained TypeVar doesn’t make sense for this use case. We can talk about this further in the python/typing GitHub Discussions if you like.)
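As a general illustration of the distinction (not necessarily the full reasoning behind the reply above): a constrained TypeVar such as AnyStr is mainly useful when a function’s return type must match its argument type, while a function that only consumes a buffer is more naturally typed with a union. Neither form admits arbitrary user-defined buffer types, which is what a protocol would add. The names echo, checksum and ReadableBuf are hypothetical.

```python
from array import array
from typing import TypeVar, Union

# Constrained TypeVar: each call binds it to exactly one of the listed types,
# so the return type can follow the argument type (as with AnyStr).
AnyReadableBuf = TypeVar("AnyReadableBuf", bytearray, array, memoryview, bytes)


def echo(buf: AnyReadableBuf) -> AnyReadableBuf:
    return buf


# Plain union alias: enough when the function only reads the buffer.
ReadableBuf = Union[bytearray, array, memoryview, bytes]


def checksum(buf: ReadableBuf) -> int:
    # memoryview(...).cast("B") gives uniform byte access to every listed type.
    return sum(memoryview(buf).cast("B")) % 256


print(checksum(b"\x01\x02"))  # 3
```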

Sadly, I missed your lightning talk.

Is this instead of the Buffer type you’re proposing in PEP 688, or in addition?

Could you show a complete example?

We can talk about this further in the python/typing GitHub Discussions if you like.

Is there an existing topic, or are you proposing to start one? If there is a better solution than a constrained TypeVar, I’m all for it.

Please open a new topic.

This would replace the types.Buffer currently proposed in PEP 688. I’ll write out some complete explanations.

Option 1: __buffer__ special method

This will allow implementing buffer types in Python too, so it’s also a significant non-typing change.

  • Buffer types implemented in C automatically get a __buffer__ method exposed in Python. It takes a flags: int argument and returns a memoryview wrapping the Py_buffer filled in by the underlying bf_getbuffer slot.
  • flags is the same as in C: a bitwise OR of the PyBUF_* flags documented in the Buffer Protocol section of the C API documentation. For convenience, perhaps we should expose those flags in the stdlib somewhere (a types.BufferFlags enum?).
  • Types implemented in Python that define a __buffer__ method automatically get it mapped to the bf_getbuffer slot. They will then be usable as buffers (e.g., they can be passed
  • Not sure yet how this affects the bf_releasebuffer slot.
  • To check for buffers in typeshed or elsewhere, we can now simply define a Protocol with def __buffer__(self, flags: int) -> memoryview: ....
  • For convenience, we can add a typing.SupportsBuffer protocol defining this method. (Or it can go into collections.abc?)
  • For backporting, we can add typing_extensions.Buffer, and typeshed can pretend that the __buffer__ method existed before 3.12.
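The types.BufferFlags idea from the bullets above might look roughly like this IntFlag. The member values mirror the PyBUF_* constants in CPython’s C headers, and CONTIG/CONTIG_RO follow the C macros of the same names; the class name comes from the suggestion, and the selection of members is ours.

```python
from enum import IntFlag


class BufferFlags(IntFlag):
    """Hypothetical Python mirror of a few PyBUF_* request flags."""

    SIMPLE = 0x0000     # PyBUF_SIMPLE: plain contiguous bytes
    WRITABLE = 0x0001   # PyBUF_WRITABLE: the consumer wants to write
    FORMAT = 0x0004     # PyBUF_FORMAT: the consumer wants the format string
    ND = 0x0008         # PyBUF_ND: the consumer wants shape information
    CONTIG_RO = 0x0008  # PyBUF_CONTIG_RO == PyBUF_ND (an alias of ND)
    CONTIG = 0x0009     # PyBUF_CONTIG == PyBUF_ND | PyBUF_WRITABLE


requested = BufferFlags.CONTIG
print(bool(requested & BufferFlags.WRITABLE))  # True
```

An IntFlag keeps the values interchangeable with plain ints, so a __buffer__(self, flags: int) signature would accept either.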

Some code samples:

# typeshed builtins.pyi
class bytes:
    def __buffer__(self, flags: int) -> memoryview: ...

# typeshed typing.pyi
class SupportsBuffer(Protocol):
    def __buffer__(self, flags: int) -> memoryview: ...

# user code
from typing import SupportsBuffer

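def need_buffer(bf: SupportsBuffer) -> None:
    memoryview(bf)  # any buffer can be wrapped in a memoryview

class MyBuffer:
    # Pure-Python buffer: under this proposal, __buffer__ fills the bf_getbuffer slot.
    def __buffer__(self, flags: int) -> memoryview:
        return memoryview(b"hi")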

need_buffer(MyBuffer())  # works
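Because the proposed check is purely structural, the idea can be tried at runtime on current Python with a local runtime_checkable protocol. This sketch defines its own SupportsBuffer as a stand-in for the proposed typing.SupportsBuffer. On interpreters implementing the proposal, built-in types such as bytes would also gain __buffer__; on older ones only the typeshed stubs would claim it, so an isinstance() check like this one would not recognize them there.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsBuffer(Protocol):
    # Local stand-in for the proposed typing.SupportsBuffer.
    def __buffer__(self, flags: int) -> memoryview: ...


class MyBuffer:
    def __buffer__(self, flags: int) -> memoryview:
        return memoryview(b"hi")


print(isinstance(MyBuffer(), SupportsBuffer))  # True
print(isinstance(object(), SupportsBuffer))    # False
```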

Option 2: __isbuffer__ = True attribute

This will allow checking for buffer types through a protocol, but not defining them in Python.

  • Buffer types implemented in C automatically expose an attribute __isbuffer__ = True.
  • If a Python type sets this attribute, nothing happens, except that it’s now lying about being a buffer.
  • As with Option 1, we can use Protocols to check for buffers, and add a typing.SupportsBuffer protocol for convenience.
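The pitfall in the second bullet is easy to reproduce. In this sketch (the class and helper names are hypothetical), an attribute-based check of the kind Option 2 would enable accepts the class, but memoryview() still rejects it, because setting the attribute in Python does not fill the C slot.

```python
class NotReallyABuffer:
    # Under Option 2, setting this in Python changes nothing at the C level;
    # the class merely claims to be a buffer.
    __isbuffer__ = True


def claims_buffer(obj: object) -> bool:
    # Hypothetical attribute-based check, as Option 2 would enable.
    return getattr(obj, "__isbuffer__", False) is True


obj = NotReallyABuffer()
print(claims_buffer(obj))  # True

try:
    memoryview(obj)
    is_real_buffer = True
except TypeError:
    is_real_buffer = False
print(is_real_buffer)  # False
```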

How useful would it be for a Python class to define a __buffer__() method, other than in something like a Mock?

It seems that calling b.__buffer__() does the same thing as memoryview(b), apart from the flags. If we only cared about runtime behavior, couldn’t we just add a flags argument to the memoryview() constructor? The relationship between __buffer__ and memoryview() seems similar to the one between __len__ and len().
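The analogy with __len__ and len() can be made concrete. The sketch below is a hypothetical wrapper, not an existing builtin: like len(), it looks the special method up on the type, and it forwards an explicit flags argument, which memoryview() does not expose (in CPython, memoryview() always requests a fixed set of flags, PyBUF_FULL_RO).

```python
from typing import Any


def get_buffer(obj: Any, flags: int = 0) -> memoryview:
    """Hypothetical helper: to __buffer__ what len() is to __len__."""
    # Like len(), look the special method up on the type, not the instance.
    method = getattr(type(obj), "__buffer__", None)
    if method is None:
        raise TypeError(
            f"{type(obj).__name__!r} object does not support the buffer protocol"
        )
    return method(obj, flags)


class Exporter:
    def __buffer__(self, flags: int) -> memoryview:
        return memoryview(b"abc")


print(get_buffer(Exporter()).tobytes())  # b'abc'
```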

The bf_releasebuffer slot is called by memoryview(b).release(), and after that the memoryview is no longer usable (most operations raise errors).
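The release behavior is observable with a bytearray, one of the mutable types whose bf_releasebuffer slot is set. While a memoryview export is live, the bytearray refuses to resize; after release(), the resize succeeds and the view itself becomes unusable. A short sketch of this CPython behavior:

```python
buf = bytearray(b"abc")
view = memoryview(buf)  # bytearray's getbuffer slot records a live export

try:
    buf.append(ord("d"))
    refused = False
except BufferError:
    refused = True  # resizing is refused while the export is live

view.release()        # reaches bytearray's bf_releasebuffer, dropping the export
buf.append(ord("d"))  # now allowed
print(buf)  # bytearray(b'abcd')

try:
    view[0]
    view_released = False
except ValueError:
    view_released = True  # operations on a released memoryview fail
```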

I like type checks using the presence of a method (i.e., __buffer__) better than checks for a data field (__isbuffer__).

I presume the part of PEP 688 about replacing arg: bytes with arg: Buffer is also invalidated? It just doesn’t mean the same thing. If we want to do something about arg: bytes implying arg: bytes | bytearray, we should think harder, IMO. I could easily be convinced that memoryview doesn’t belong in that union, though.

It is important that bf_releasebuffer is NULL in bytes and non-NULL in mutable types. Some C code only accepts types with this slot set to NULL.

Thanks, that’s an important detail that I forgot. Is it important enough to distinguish between Buffer and MutableBuffer?

Hi!

I am writing this on behalf of the Python Steering Council. Thanks a lot for submitting this PEP to us, and apologies for the delay in responding! We have discussed the PEP in detail and are generally happy to accept it, but there is one item we would like to discuss. The PEP proposes overloading __buffer__, but we noticed that several third-party libraries and projects already use this name, potentially with different semantics. Examples include PyPy, pyzmq and other popular projects (found with a simple GitHub search). We want the impact of this to at least be discussed in the PEP, and ideally the maintainers contacted, if the plan continues to be to keep __buffer__ in the proposal.

To be clear: the only hard requirement here that we are asking for is having this aspect included in the document and the risk analyzed, but any further effort would be greatly appreciated.

Please reach out to me or any other member of the Steering Council if you have any questions, need any clarifications, or if we can help in any way!

Thanks a lot for the great work!

Also, one clarification: we are aware that we have documented that dunder methods are subject to change without warning. We want this aspect discussed so that we and the community can openly consider the effects of the change, not because we necessarily think anything in the proposal has to change (for example, that a different dunder needs to be used).

Thanks @pablogsal!

Here’s some initial discussion of third-party __buffer__ methods:

So for the most part, existing uses of __buffer__ are compatible with PEP 688, but I’ll reach out to PyPy to hear their thoughts.

I agree that while dunder names are documented as reserved (in the Lexical analysis chapter of the Python language reference; thanks @zware for finding the link), we shouldn’t break compatibility lightly if major third-party projects use a name that is technically reserved. However, in this case I think using the __buffer__ name actually enhances compatibility, so I’m leaning towards keeping the name and documenting this decision in the PEP.

PyPy would love to see this adopted. As mentioned above, PyPy has a PyPy-specific __pypy__.bufferable base class with a __buffer__ method. The PyPy interpreter looks for that method on instances of classes that inherit from bufferable to implement memoryview(obj). This is what allows us to implement memoryview(ctypes.Array(...)) in pure Python.

I removed the undocumented use of __buffer__ from numpy in order to clear up any confusion around this convention.

Sorry for the delay (we had a couple of weeks that we could not meet due to holidays and stuff).

I am very happy to report here that the Steering Council approves PEP 688: Making the buffer protocol accessible in Python with the latest changes. Thanks a lot, @JelleZijlstra, for the fantastic work and the patience. This is a great improvement, and we are very happy to see it happening!

Congratulations!
