Support subclassing a memoryview

In Biopython (an open source library for molecular biology) we are subclassing a numpy array to add a few capabilities relevant for biology. Because our class is a subclass of a numpy array, the buffer protocol provided by the numpy array is also supported by our subclass. This is extremely useful, as it allows us to pass instances of our class to a C extension module and directly make use of the buffer protocol.

The numpy developers, however, officially discourage subclassing a numpy array, and advocate wrapping a numpy array instead. I don’t fully understand why subclassing is discouraged, but the main problem seems to be compatibility.

But a class that wraps a numpy array (i.e. instead of subclassing, the numpy array stored as an attribute) does not support the buffer protocol support, which is very inconvenient when passing the object to a C extension module.

Now if I could subclass a memoryview in the following way:

class B(memoryview):
    def __new__(cls, a):
        return super(B, cls).__new__(cls, a)

and create an instance as follows:

import numpy
a = numpy.array([1,2,3,4,5])
b = B(a)

then b supports the buffer protocol by make use of the buffer protocol provided by the numpy array.

However, memoryview does not support subclassing:

>>> class A(memoryview): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'memoryview' is not an acceptable base type

Would it be a good idea to allow subclassing a memoryview? This would enable chaining memoryviews, much in the same way buffers can be chained at the C-level.

I am not sure subclassing memoryview is right approach to support buffer protocol.

Since Python doesn’t support implementing buffer protocol from Python, common approach is adding some method to return an object implementing buffer protocol.

Examples:

You can write data to file without temporary bytes object like:

file.write(obj.getbuffer())

If direct support is really important, this could be some protocol (maybe, __get_buffer__?)

But I think current .getbuffer() approach works fine enough.

Thank you.
The getbuffer method won’t be needed. If my class wraps a numpy array as an attribute .data, and a function f expects an object that supports the buffer protocol, then I could just call f(myobject.data).
But that still loses the benefits of subclassing. For example, I would have to use f(myobject) if myobject is a plain numpy array and f(myobject.data) if myobject wraps a numpy array. That is acceptable if this were just for my own code, but not for a broadly used library like Biopython.
The __get_buffer__ sounds interesting. Maybe __set_buffer__ is more appropriate though; __set_buffer__ could fill in the Py_bufferappropriately and would have to be called only once. I would think though that subclassing memoryview is safer (for example, what would happen if a user calls __set_buffer__ more than once on the same object?).

Note that numpy itself does support a “buffer protocol” (or several of them, even). See __array_interface__ and friends. So if the reason you want to support the buffer protocol is just for interoperating with numpy-based libraries, that might be a simple replacement.

A while ago we had a discussion about sq_concat vs nb_add. There was some (maybe a minority?) consensus around the idea that all slot functions should be reachable via a python-dunder method. It seems bf_getbuffer is another slot function with no dunder method. In PyPy bf_getbuffer is exposed in the __pypy__.bufferable class as __buffer__(self, flags). It is needed for the python-level implementation of _ctypes and used internally for the equivalent of bf_getbuffer: which should return a memoryview of self. __pypy__ also has a function to create a memoryview with a specified shape, strides, format, and itemsize from a buffer: newmemoryview(buffer, itemsize, format, shape=None, strides=None)

bf_getbuffer is non-trivial to expose in Python, because its task is to fill a struct Py_buffer which is not a PyObject*. That said, the __buffer__ semantics could simply be to return a buffer-supporting object.

Also, the various flag values, passed to bf_getbuffer, are currently not exposed in Python AFAIR.

See https://bugs.python.org/issue13797 for a previous discussion of this topic.