Hi everyone.
I have been digging through the buffer protocol and realized that it appears to specify native Python support for arbitrary nested data structures, pretty much like what you can do with nested C structures, plus arbitrary N-dimensional arrays. Previously, I had assumed that the struct module, for example, supported only flat structures and no arrays at all.
However, PEP3118 goes further, see here. The “Additions to the struct string-syntax” seem to be as powerful as structured Numpy arrays (or record arrays) with custom (and possibly nested) dtypes. PEP3118 specifically mentions the intention to support more complex memory layouts as in Numpy and the ctypes module.
The PEP is pretty clear (referring to the additions):
The struct module will be changed to understand these as well and return appropriate Python objects on unpacking.
However, those additions seem to be unsupported in Python 3.13.
The two examples given in PEP3118 (“Nested structure” and “Nested array”) are rejected when passed to struct.calcsize or struct.pack, throwing a
struct.error: bad char in struct format.
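A minimal sketch to reproduce this. The two format strings below are written in the style of the PEP's "Nested structure" and "Nested array" examples, collapsed onto one line; the field names are illustrative. The dictionary names `candidates` and `rejected` are just for this snippet.

```python
import struct

# Format strings using the extended PEP 3118 syntax (named fields, T{...}
# nested structs, and (shape) array prefixes). Field names are illustrative.
candidates = {
    "nested structure": "i:ival:T{H:sval:B:bval:B:cval:}:sub:",
    "nested array": "d:data:(16,4)i:array:",
}

rejected = {}
for label, fmt in candidates.items():
    try:
        struct.calcsize(fmt)
    except struct.error as exc:
        # The struct module does not recognize ':', 'T', '{' or '('.
        rejected[label] = str(exc)
        print(f"{label}: struct.error: {exc}")
```

Both formats end up in `rejected`, i.e. neither is accepted by struct.calcsize.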
Am I misunderstanding something, or was this section of PEP3118 simply never implemented?
Some background information on how I came across this issue:
For quite a long time, I have found it convenient to use Numpy arrays to make more complex C structures accessible in Python. Specifically, I use Numpy arrays with custom structured dtypes to interface with native code written in C (and compiled as Python C extensions). I’m working with numerical algorithms, and I often need to pass large double arrays back and forth between Python and compiled C code. Numpy is quite convenient because, according to the Numpy documentation, numpy.dtype(…, align=True) guarantees binary compatibility with a standard C compiler. So I can manipulate my memory structures in native Python while compiled C code operates on the same data structure. I use this mechanism a lot, and most of the time the data structures are not hardcoded but parametric (the C code is generated code).
Having a Python-native way to access structured binary data is definitely a good idea, and superior to relying on third-party libraries. That’s why I like the approach chosen in PEP3118.
I also found that Numpy actually implements the syntax additions from PEP3118, as in this example:
>>> import inspect
>>> import numpy
>>> dt = numpy.dtype([("a", "f8"), ("b", "f8", 3), ("c", "u4", (2, 2))])
>>> arr = numpy.zeros(shape=(), dtype=dt)
>>> buf = arr.__buffer__(inspect.BufferFlags.FORMAT)
>>> buf.format
'T{d:a:(3)d:b:(2,2)I:c:}'
In plain words: we define a numpy.dtype for a custom data structure, then request a memoryview via Python’s buffer protocol, and find that Numpy gives us a format string compliant with the extended PEP3118 syntax, and equivalent to the custom dtype.
So I’m actually wondering how I can interface with such an array using only the Python standard library - i.e. using the struct module, which I would expect to support the same syntax.
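For reference, the only stdlib-only workaround I can sketch is to flatten the format by hand. This is not the PEP 3118 syntax, just an equivalent flat layout under some assumptions: the record is packed (no padding, as for a dtype created without align=True), byte order is native, and arrays are stored in C order. The names `FLAT` and `unpack_record` are made up for this example.

```python
import struct

# Hand-flattened stand-in for 'T{d:a:(3)d:b:(2,2)I:c:}'.
# Assumptions: no padding between fields, native byte order ('='),
# C-ordered arrays. Field names and shapes are lost in the format string
# and have to be restored manually below.
FLAT = "=d3d4I"

def unpack_record(buf):
    """Unpack one record and regroup the flat values by field."""
    values = struct.unpack_from(FLAT, buf)
    a = values[0]                    # "a": one double
    b = values[1:4]                  # "b": three doubles
    c = (values[4:6], values[6:8])   # "c": 2x2 unsigned ints, row-major
    return {"a": a, "b": b, "c": c}

# Stand-in for the raw bytes of the array, so the snippet runs without Numpy.
raw = struct.pack(FLAT, 1.0, 2.0, 3.0, 4.0, 5, 6, 7, 8)
record = unpack_record(raw)
print(struct.calcsize(FLAT), record)
```

With the Numpy array from above, the raw bytes could presumably be obtained via arr.tobytes(). The obvious drawback is that the translation from the buffer's format string to `FLAT` is done by hand, which is exactly what I would expect the struct module to do for me.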
Thanks for any help.