The buffer protocol is hugely important in many scientific applications. However, it is limited to a fairly fixed set of types based on single-character C type codes (the struct interface).
We could extend this, which may make sense, for example, for bfloat16 support, which is gaining popularity. For bfloat16 it would make sense to simply agree on a new character. But overall, I suspect there are more use cases where an extension would be useful (datetimes or just arbitrary user types).
One option would be to try to attach name metadata to valid buffer formats. But memoryview() doesn’t care whether buffer.fmt is valid (unless you do indexing): memoryview(invalid_buf_fmt_buffer) works, and a NotImplementedError is only raised once it actually has to interpret the “invalid” format.
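The behaviour described above can be observed with the standard library alone. The sketch below is only an illustration: it uses a ctypes array of structures as a stand-in for an “invalid” format, because ctypes exports a multi-character format string that memoryview does not interpret. The Pair class is a name made up for this example.

```python
import ctypes

# Hypothetical structure, used only to obtain a buffer whose format string
# is not one of the single-character codes that memoryview understands.
class Pair(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int32), ("b", ctypes.c_int32)]

buf = (Pair * 2)()

# Creating the view succeeds even though the format is not understood.
view = memoryview(buf)
fmt = view.format

# Raw access does not need the format at all.
raw = view.tobytes()

# Only element access has to interpret the format, and that is what fails.
try:
    view[0]
except NotImplementedError as exc:
    indexing_error = exc
else:
    indexing_error = None

print(repr(fmt), len(raw), type(indexing_error).__name__)
```

The exact format string that ctypes reports is platform dependent, so the example does not rely on it.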
That means we can put pretty much anything into the buffer format, so long as we are certain that it is clearly an invalid format.
So I am wondering if we could extend the buffer protocol with something like defining a new type-code like:
[module$qualname$param]
as a generic extension “type-code” (or really any variation of something like this). The [] should make it clearly invalid currently, I think.
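A minimal sketch of how a consumer might recognize and split such a format string follows. Everything here is an assumption made for illustration: the names ExtensionFormat, parse_extension_format and is_rejected_by_struct are hypothetical, and the choice to let param contain further $ characters is just one possible reading of the proposal.

```python
import struct
from typing import NamedTuple, Optional


class ExtensionFormat(NamedTuple):
    module: str
    qualname: str
    param: str


def parse_extension_format(fmt: str) -> Optional[ExtensionFormat]:
    """Split "[module$qualname$param]"; return None for ordinary formats."""
    if not (fmt.startswith("[") and fmt.endswith("]")):
        return None
    # Split at most twice so that param may itself contain "$" (assumption).
    parts = fmt[1:-1].split("$", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed extension format: {fmt!r}")
    return ExtensionFormat(*parts)


def is_rejected_by_struct(fmt: str) -> bool:
    """Check that the struct module treats the format as invalid."""
    try:
        struct.calcsize(fmt)
    except struct.error:
        return True
    return False


example = "[mypkg$BFloat16$]"  # hypothetical module and type names
print(parse_extension_format(example))
print(is_rejected_by_struct(example))
```

The second helper only confirms that the struct module rejects the bracketed string today; it says nothing about other consumers of the buffer protocol.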
When e.g. NumPy sees such a []-enclosed format, it could then define a protocol on top of it like:
getattr(sys.modules[module], qualname)._numpy_dtype_from_buffer_fmt(param, byteorder="=")
which would have to return a valid NumPy dtype instance. An important part here is that NumPy doesn’t have to recognize the dtype directly. It could be defined by a downstream library. (If there are no security concerns, NumPy could import the module. If there are concerns, NumPy could raise an error asking the user to import it.)
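The lookup could be sketched roughly as follows, taking the conservative route of never importing anything on the consumer side. This is not NumPy code: resolve_extension_format, the demo_dtypes module and the BFloat16 class are hypothetical, and the hook returns a plain tuple as a stand-in for a real dtype instance. Only the hook name _numpy_dtype_from_buffer_fmt comes from the proposal itself.

```python
import sys
import types
from operator import attrgetter

HOOK = "_numpy_dtype_from_buffer_fmt"  # hook name as written in the proposal


def resolve_extension_format(fmt, byteorder="="):
    """Dispatch a "[module$qualname$param]" format to the type's hook."""
    if not (fmt.startswith("[") and fmt.endswith("]")):
        raise ValueError(f"not an extension format: {fmt!r}")
    try:
        module_name, qualname, param = fmt[1:-1].split("$", 2)
    except ValueError:
        raise ValueError(f"malformed extension format: {fmt!r}") from None

    # Conservative option: never import on behalf of the buffer exporter.
    module = sys.modules.get(module_name)
    if module is None:
        raise ImportError(
            f"format {fmt!r} needs module {module_name!r}; import it first"
        )

    # attrgetter handles dotted qualnames such as "Outer.Inner".
    owner = attrgetter(qualname)(module)
    return getattr(owner, HOOK)(param, byteorder=byteorder)


# Hypothetical downstream library, registered by hand for the demonstration.
demo = types.ModuleType("demo_dtypes")


class BFloat16:
    @classmethod
    def _numpy_dtype_from_buffer_fmt(cls, param, byteorder="="):
        # A real implementation would return a NumPy dtype instance here.
        return (cls.__name__, param, byteorder)


demo.BFloat16 = BFloat16
sys.modules["demo_dtypes"] = demo

result = resolve_extension_format("[demo_dtypes$BFloat16$]")
print(result)
```

Using operator.attrgetter rather than a single getattr is a design choice in this sketch, so that nested classes addressed by a dotted qualname resolve as well.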
Cython should be able to do the same thing, except that the user would already type the memoryview at compile time and attach the correct format to it (of course this would need an API in Cython to do that).
Would such a fmt extension, making use of a currently “invalid” format, be problematic in any way? As long as we define a protocol (and hope that nobody already does something similar), Python itself doesn’t seem to need to add any support (maybe beyond error message improvements).
An alternative or additional approach might be to just start the format with a single invalid character to opt into a whole new version.