Memoryview multi-dimensional slicing support

dg-pb · May 7, 2024, 1:01am

PEP 3118 – Revising the buffer protocol states:

__getitem__ (will support multi-dimensional slicing)
__setitem__ (will support multi-dimensional slicing)

However:

mv = memoryview(b'\x00').cast('b', (1, 1))
mv[:, :1]
# NotImplementedError: multi-dimensional slicing is not implemented
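For context, memoryview already handles some multi-dimensional access. With a native struct format, indexing with a tuple of integers reads or writes a single element, and `tolist()` returns nested lists. Only slicing is missing. The sketch below contrasts the two; it assumes a current CPython 3.x, and the buffer contents are arbitrary:

```python
# A 2x3 view of signed bytes over a writable buffer.
buf = bytearray(range(6))
mv = memoryview(buf).cast('b', (2, 3))   # 2 rows x 3 columns

print(mv.shape, mv.strides)   # (2, 3) (3, 1)
print(mv[1, 2])               # 5: a tuple of integers selects one element
mv[0, 0] = -1                 # single-element assignment works on writable buffers
print(mv.tolist())            # [[-1, 1, 2], [3, 4, 5]]

try:
    mv[:, :1]                 # a tuple of slices is rejected
except NotImplementedError as exc:
    print(exc)                # multi-dimensional slicing is not implemented
```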

Has anyone else missed this?

I don't think much would need to change if only slicing were implemented.

However, implementing full-featured indexing as specified by the Array API standard (https://data-apis.org/array-api) might open up a fairly big window of possibilities.

vberlier · May 15, 2024, 11:50pm

As you pointed out, the buffer protocol is the foundation of a pretty extensive API surface. Python wouldn’t be able to maintain its own complete implementation. So my understanding has always been that Python’s own memoryview builtin was an intentionally limited utility that only implements the minimum required to perform low-level operations on buffers normally managed by external libraries. By low-level operations I mean things like preventing a buffer from being garbage collected and transferring it from one library to another. For high-level features, like multi-dimensional slicing, you’re meant to use the high-level abstractions provided by external libraries, like numpy arrays.
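Here is a minimal sketch of the kind of low-level operations described above. It uses only a `bytearray` and the builtin memoryview. Slicing a 1-D view copies nothing, writes pass through to the exporter, and the exporter stays referenced and cannot be resized while a view holds it. The exact `BufferError` message may differ between Python versions, so the code doesn't depend on it:

```python
buf = bytearray(b'hello world')
view = memoryview(buf)
tail = view[6:]                 # 1-D slicing is supported and copies nothing
tail[:] = b'WORLD'              # writes go straight through to buf
assert view.obj is buf          # the view keeps a reference to its exporter

try:
    buf.extend(b'!')            # resizing is refused while exports exist
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

tail.release()                  # drop every view of buf...
view.release()
buf.extend(b'!')                # ...and resizing is allowed again
print(buf)                      # bytearray(b'hello WORLD!')
```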

But it's also true that there's no technical limitation here: the memoryview builtin could reasonably support some of the simpler APIs, as long as they don't become too much of a burden for the core maintainers.

dg-pb · May 16, 2024, 12:22am

I am a bit in a pickle. I intend to write some basic array infrastructure that is not meant for scientific computing, but rather serves as a utility for dependency-free parts of infrastructure, e.g. standard library extensions, where something like numpy is not, and shouldn't be, a dependency.

The last thing I want to do is write my own implementation in C, and memoryview almost fits the purpose: the only thing it lacks is multi-dimensional indexing.

If both __getitem__ and __setitem__ were implemented as per the Array API, it would be possible to implement a pretty good tensor object without any C work. With these two in place, many other features could be implemented efficiently.
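memoryview already provides element-wise `__getitem__`/`__setitem__` with tuples of integers, so operations can be layered on top of it in Python today. As a rough illustration, here is element-wise addition over same-shape views. The helper name `add_into` is hypothetical and not part of any library. Every element passes through a Python int, which is where the per-item cost comes from:

```python
import itertools


def add_into(out, a, b):
    """Hypothetical helper: out[idx] = a[idx] + b[idx] for every index of same-shape views."""
    if not (a.shape == b.shape == out.shape):
        raise ValueError('shape mismatch')
    # Visit every index tuple in C order: (0, 0), (0, 1), ..., (1, 2) for shape (2, 3).
    for idx in itertools.product(*(range(n) for n in out.shape)):
        # Each read creates a Python int and each write unpacks one again.
        out[idx] = a[idx] + b[idx]
    return out


a = memoryview(bytearray([1, 2, 3, 4, 5, 6])).cast('B', (2, 3))
b = memoryview(bytearray([10, 20, 30, 40, 50, 60])).cast('B', (2, 3))
out = memoryview(bytearray(6)).cast('B', (2, 3))
print(add_into(out, a, b).tolist())   # [[11, 22, 33], [44, 55, 66]]
```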

vberlier · May 17, 2024, 1:20pm

Not sure how you could make an actually useful tensor object based on memoryview without also implementing a bunch of item-wise operations in C, though. Even if the full __getitem__ and __setitem__ were implemented natively, you'd still have to iterate in Python, repeatedly boxing values to perform any operation. At that point there wouldn't be much benefit over wrapping memoryview (it can't be subclassed directly) and implementing multi-dimensional slicing in Python yourself.
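A rough sketch of that wrapping approach follows. It assumes a 2-D layout over a flat 1-D memoryview, with strides counted in items rather than bytes. `View2D` is a hypothetical name. Because memoryview cannot be subclassed, the class holds a view and tracks shape, strides and offset itself. Slicing returns a new `View2D` that shares the same memory, while element access goes through Python and boxes each value. Only `(int, int)` and `(slice, slice)` keys are handled:

```python
class View2D:
    """Hypothetical pure-Python wrapper adding 2-D slicing on top of a flat memoryview."""

    def __init__(self, flat, shape, strides=None, offset=0):
        if flat.ndim != 1:
            raise ValueError('expected a 1-D memoryview')
        self.flat = flat                      # 1-D memoryview, indexed in items
        self.shape = tuple(shape)             # (rows, cols)
        # Item strides; default is C order (row-major).
        self.strides = tuple(strides) if strides is not None else (shape[1], 1)
        self.offset = offset                  # flat index of element [0, 0]

    def _pos(self, i, j):
        return self.offset + i * self.strides[0] + j * self.strides[1]

    def _normalize(self, key):
        r, c = key
        # range() does the work: ints become non-negative indices (or raise
        # IndexError), slices become ranges with start/step resolved.
        return range(self.shape[0])[r], range(self.shape[1])[c]

    def __getitem__(self, key):
        rows, cols = self._normalize(key)
        if isinstance(rows, int) and isinstance(cols, int):
            return self.flat[self._pos(rows, cols)]
        if isinstance(rows, range) and isinstance(cols, range):
            # New view over the same memory: nothing is copied.
            return View2D(
                self.flat,
                (len(rows), len(cols)),
                (self.strides[0] * rows.step, self.strides[1] * cols.step),
                self._pos(rows.start, cols.start),
            )
        raise TypeError('mixing integers and slices is not supported in this sketch')

    def __setitem__(self, key, value):
        i, j = self._normalize(key)
        if not (isinstance(i, int) and isinstance(j, int)):
            raise TypeError('only single-element assignment in this sketch')
        self.flat[self._pos(i, j)] = value

    def tolist(self):
        return [[self.flat[self._pos(i, j)] for j in range(self.shape[1])]
                for i in range(self.shape[0])]


buf = bytearray(range(12))
m = View2D(memoryview(buf).cast('b'), (3, 4))   # 3 rows x 4 columns over 12 bytes
sub = m[1:, ::2]                  # rows 1-2, columns 0 and 2; no data is copied
print(sub.shape, sub.tolist())    # (2, 2) [[4, 6], [8, 10]]
sub[0, 1] = -1                    # writes through to buf[6]
print(buf[6])                     # 255: the signed byte -1 stored as 0xff
```

Negative steps work through the same arithmetic (for example `m[::-1, ::-1]`), since `range` resolves the start and a negative step for each axis.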

dg-pb · May 17, 2024, 2:35pm

Thanks, I think I need to think it over one more time.