Choice of complex buffer protocol format intentional break with PEP?

It was reported to NumPy that the newly added buffer protocol support for complex array.array and ctypes uses the character codes D and F rather than Zd and Zf.

Now, it’s a fair choice to accept the friction this introduces, but it is still unclear whether this was considered much, and I think it is a violation of the PEP.
Yeah, I understand that the PEP says parts of it weren’t implemented and that it’s a historical document, although I would read that as “not implemented by Python itself”, not as a rejection of the syntax.

So the question is simple: using D rather than Zd is a bit neater to implement, as it means things follow the struct module exactly, and I can understand that the choice there went to D.
But it means one of the following:

  • NumPy (and that means all meaningful users of the buffer protocol) would have to keep exporting Zd (which Python doesn’t understand!).
  • NumPy and everyone else would have to export D on Python 3.15+ and Zd on older versions, meaning an API break for downstream. EDIT: Sorry, not a real option, since stable-ABI extensions exist and this would break them all.

Or I guess the other question is: does it make sense from the Python side to avoid a bit of friction in its complex buffer protocol implementation (to me it seems like only a bit, but I dunno), or is the slow transition above worthwhile?

(Historically, I have no idea what the background is. Since NumPy used D already, I wouldn’t be surprised if, at the time of the PEP, Zd was a preference of Python’s. But it could also have come from the scientific Python side, because they e.g. felt that Zi could make sense and single letters are really limiting…)


Why? I don’t think this is part of the PEP 3118 specification. It looks like the proposed changes were not discussed during the PEP review. That part is not a specification, simply because it’s too vague and unclear. See this for example. Complex-related critique:

“complex (whatever the next specifier is)”. Not really ‘whatever’. You can not have a ‘complex bool’ or ‘complex int’. What other types of complex are there besides complex double?

I think that the recent (a year ago) addition to the PEP clarifies things:

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Buffer Protocol, PyBufferProcs, PyMemoryView_FromObject.

Not all features proposed here were implemented. Specifically:

This PEP targets Python 3.0, which was released more than a decade ago. Any proposals to add missing functionality should be discussed as new features, not treated as finishing the implementation of this PEP.

So, it’s not a part of PEP, period.

BTW, NumPy itself uses different type codes for ndarray() and for the buffer protocol. Which convention should we adopt, e.g. for the array module, and why?

We could add support for multi-character type codes and implement 'Zd' as an alias for 'D' in the struct module and co., as interoperability with NumPy is important. Though:

maybe we should discuss deeper changes to struct’s format mini-language, where the 'Zd' code would be a tiny part of the whole picture. IMO, a recent thread (AFAIK, the referenced NumPy issue is one outcome of that thread) shows the need for some extensibility in the struct module (the ability to register user types with custom pack/unpack helpers).
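For what it’s worth, the alias idea is cheap to prototype at the string level. Here is a sketch (pure Python, with a function name of my own invention; a real change would live in struct’s C-level parser) that rewrites 'Z<char>' pairs to single-letter codes before the format string reaches any single-character machinery:

```python
# Hypothetical alias table, mirroring the 'Zd'/'D' and 'Zf'/'F'
# pairing discussed in this thread.
_Z_ALIASES = {"Zd": "D", "Zf": "F"}

def expand_z_aliases(fmt: str) -> str:
    """Rewrite 'Z<char>' pairs to their single-letter equivalents,
    leaving all other characters (including byte-order prefixes
    like '@', '=', '<', '>' and repeat counts) untouched."""
    out = []
    i = 0
    while i < len(fmt):
        pair = fmt[i:i + 2]
        if pair in _Z_ALIASES:
            out.append(_Z_ALIASES[pair])
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)
```

The rewritten string (e.g. 'Zd' becomes 'D', '3Zd2f' becomes '3D2f') could then be handed to the existing single-character parser unchanged.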

“complex (whatever the next specifier is)”. Not really ‘whatever’. You can not have a ‘complex bool’ or ‘complex int’. What other types of complex are there besides complex double?

Sure, it is perfectly fine… Some Z<char> combinations will error out, just like any random character will error out. But I could have a Z<decimal> or a Zi dtype; it’s just that almost nobody uses them.

I would argue that Python should accept the NumPy implementation as the canonical one, since the ecosystem around it is a far more important user of the buffer protocol than Python itself.

That doesn’t mean Python can’t break with it of course! I just want it to be clear about the fact that it is breaking with the important prior implementation.

I think that the recent (a year ago) addition to the PEP clarifies things:

Yeah, true… But in the discussion nobody pointed out to the SC that it is incorrect to call these “unimplemented”. All of the additional format string specifiers are implemented by a vast downstream ecosystem. So a vital piece of information is missing here.

Maybe I am missing this, but if this information was considered when merging that PR, then yes, I agree. But I guess I’ll ping @encukou on that.

Yeah, that would be great. I suggested: Buffer protocol and arbitrary (data) types - #13 by seberg (sorry that I lost track of this; for a while I tried to rope in more people, but that effort petered out. I would be interested in restarting it…).
(That said, I don’t think Zd fits the bill here as it is existing prior art and any extension needs to not use a single/multiple character pattern.)


But maybe using Zd for the buffer protocol is impractical in Python, e.g. because ctypes already does weird things? So maybe using D is the only practical thing even if it means accepting that the community must fracture around it a bit…
Hopefully, that won’t be a terribly big fracture, because everyone will just implement both on the importing side (dunno if Python itself will though?).

Hi Sebastian,

Thanks for raising this issue here.

The D (double complex) and F (float complex) formats were introduced in Python 3.14, which was released in October 2025, so IMO it’s too late to change the chosen format strings.

  • Python 3.14 added D and F support to the ctypes and struct module
  • Python 3.15 adds D and F support to the array module and to memoryview

Before the 3.14 final release, ctypes and struct used C (double complex) and E (float complex) formats. These formats were replaced with D and F for compatibility with numpy:

$ python3.14
>>> import numpy as np
>>> np.__version__
'2.3.5'

>>> np.typename("F")
'complex single precision'
>>> np.typename("D")
'complex double precision'

>>> np.dtype(np.complex64).char
'F'
>>> np.dtype(np.complex128).char
'D'

>>> np.array([1, 2, 3], dtype='F')
array([1.+0.j, 2.+0.j, 3.+0.j], dtype=complex64)
>>> np.array([1, 2, 3], dtype='D')
array([1.+0.j, 2.+0.j, 3.+0.j])

>>> memoryview(np.array([1, 2, 3], dtype='F')).format
'Zf'
>>> memoryview(np.array([1, 2, 3], dtype='D')).format
'Zd'

As Sergey wrote above, in the bfloat16 format discussion, Serhiy proposed adding an API to register custom types:

It may be a solution to the D vs. Zd formats question; maybe both can be accepted transparently.

PEP 3118 was written in 2006. It now has the header: “This PEP is a historical document”. I don’t think this old PEP is still relevant nowadays. Maybe numpy should reconsider its error message on unknown formats.

Yeah, it’s unfortunate that numpy and the Python stdlib (array, ctypes, struct and memoryview) are using different formats. But maybe it’s an opportunity to provide a generic solution to this kind of problem (an API to register formats).
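Such a registration API might look roughly like the following sketch. Everything here (names, signatures, the registry layout) is a hypothetical illustration, not a concrete proposal; the point is that both spellings of complex double can be registered against the same pack/unpack helpers and become transparently interchangeable:

```python
import struct

# Hypothetical registry: format code -> (itemsize, pack, unpack).
_registry = {}

def register_format(code, itemsize, pack, unpack):
    _registry[code] = (itemsize, pack, unpack)

def pack_one(code, value):
    _, pack, _ = _registry[code]
    return pack(value)

def unpack_one(code, data):
    itemsize, _, unpack = _registry[code]
    assert len(data) == itemsize
    return unpack(data)

# Complex double expressed on top of two plain C doubles.
def _pack_complex128(z):
    return struct.pack("dd", z.real, z.imag)

def _unpack_complex128(b):
    re, im = struct.unpack("dd", b)
    return complex(re, im)

# Register both spellings against the same helpers, so 'D' and
# 'Zd' are interchangeable from a consumer's point of view.
for code in ("D", "Zd"):
    register_format(code, struct.calcsize("dd"),
                    _pack_complex128, _unpack_complex128)
```

With this, `pack_one("D", 1 + 2j)` and `pack_one("Zd", 1 + 2j)` produce identical bytes, and either code round-trips through `unpack_one`.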

It’s also unfortunate that this issue wasn’t raised before, but as I wrote above, IMO it’s too late to change the Python stdlib. By the way, currently the stdlib only uses single-letter formats, which makes some parts of the code simpler.


OK :). I think the “oops” is on Python here, but maybe it just doesn’t matter much.

I had thought of array mostly (where exporting a different format string is unproblematic). But I can see that for ctypes it might be tedious, and it may also be bad for ctypes to diverge from the struct module (where I understand the wish not to use Zd).

The ecosystem can still only add support for importing D slowly and (in a few years) consider changing what it exports, though.

So, I would still suggest you implement Zd wherever it makes sense (e.g. the few memoryview helpers, or if array.array were to grow a frombuffer() method). If you like, document it as “for backwards compatibility”, but the truth is that downstream projects are better off exporting Zd for the next few years (since it’ll take time for the whole ecosystem to also accept D).


For what it’s worth, I’m hoping to revive the pre-PEP Sebastian and I explored in 2025. Now that I have triage rights in CPython it seems like a fun and useful way to make a more substantial contribution to CPython. Hopefully you’ll hear something more detailed from me later this year.


Why “too late”? This is obviously incompatible with PEP 3118 and also with NumPy’s long-standing usage of “Zd” and “Zf”.

If you claim to be compatible with a spec (a PEP even!) and you’re not, then it’s a bug that needs fixing.

Are you kidding me? This PEP is implemented by many third-party packages (including, of course, the ever-popular NumPy). It is in broad use every day by tens of millions of users at least. It sits at the heart of the Python data ecosystem and is the fundamental building block that allows zero-copy interoperability between all those packages.

That some CPython developers are not aware of this is not a reason to declare this PEP “old and irrelevant”.

What is old and should probably be declared irrelevant, however, is the standard library array module. Short of deprecating and removing it, perhaps it should stand as a policy that any proposed improvement to the array module should first be submitted to the attention of NumPy developers before being accepted. This would avoid mishaps like this.


But what is it? As I noted above, NumPy uses different type codes in different contexts.

Are you sure? PEP says:

Unpacking a long-double will return a decimal object or a ctypes long-double.

Where is it implemented this way?

I don’t think that this is too late. But what should we do to achieve the best compatibility with NumPy, @seberg?

Add two-letter aliases for the 'D' and 'F' types? Deprecate the former type codes?

In Python we are trying to use the same type codes across stdlib modules. That’s not true for NumPy, which sometimes uses 'Zd' and sometimes 'D'.

Should we keep such aliases forever?

Your PEP still needs a sponsor.

That seems not to be true, per the SC decision.

I asked above where the part of this proposal related to long doubles is implemented. I suspect nowhere. But maybe you know.

With all due respect, what’s actually implemented is very different from the “specification” in the PEP, which is vague and unclear. See the related CPython issue.

Should we then apply the same policy to the struct module? Its type codes are used by memoryview. And to ctypes?

How should we act in cases (like the current one) where NumPy itself is just inconsistent and uses different codes in different contexts (i.e. for the buffer protocol vs. type codes in ndarray arguments) for no good reason?


Which SC decision are you talking about?

I have no idea about long doubles, but how is that related to complex floats and doubles (which are implemented)?

That’s a lot of whataboutism, isn’t it?

In any case: PEP 3118 specifies the semantics of the buffer protocol. Any type implementing the buffer protocol should follow the conventions specified by PEP 3118. That is the case of the memoryview object and of the array.array object as well.

Whether or not the struct module should use the same conventions as the buffer protocol is a separate decision.

It was already referenced above: PEP 3118: Add canonical-doc & mention unimplemented changes. by encukou · Pull Request #4200 · python/peps · GitHub

Why do you think so? PEP says about Z prefix:

complex (whatever the next specifier is)

But NumPy doesn’t support Gaussian integers like 'Zi'. Does that still mean that this “specification” was implemented?

I don’t know if this was a “SC decision” (apparently just a pull request?), but the language added there seems misleading and unfortunate.

Perhaps CPython only added the ? format code, but other packages did more. Since PEP 3118 is an interoperability PEP, not a CPython internal thing, its implementation status should take major implementations into account.

This discussion is about format codes “Zd” and “Zf”, which are implemented by NumPy.

Well, as Antoine said, I do think it’s fair to say that Python is in violation of the PEP here[1]. But what does that give us? I dunno. You could probably use that as an argument to just change ctypes; the question is whether that is helpful and actually the better outcome.

Let’s not confuse things: D is used for the user-facing API (where single character codes make sense) for users to specify a complex double. Zd is used for the buffer protocol and is never seen by end-users.
Now, I don’t disagree that this clear distinction doesn’t quite work for the struct module (and maybe also ctypes?). It does work for array.array("D", ...), though, as it is trivial to translate the D to Zd when exporting the buffer.
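To make the “trivial translation” concrete, an export-time lookup could look like this (a Python sketch with an invented helper name; the actual change would be a few lines in array’s C buffer-export code):

```python
# User-facing single-letter type codes mapped to the PEP 3118-style
# spelling that buffer-protocol consumers such as NumPy expect.
_BUFFER_FORMATS = {"D": "Zd", "F": "Zf"}

def buffer_format_for(typecode: str) -> str:
    """Format string an exporter would advertise for a given
    user-facing type code; non-complex codes pass through as-is."""
    return _BUFFER_FORMATS.get(typecode, typecode)
```

So `array.array("D", ...)` would keep its user-facing 'D' type code while advertising 'Zd' to buffer consumers.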

Now… What does that give us? I dunno. If we prefer D for user-facing API (which makes sense also from a NumPy perspective), then using it for the struct module makes sense.
If it doesn’t matter that the struct module matches exactly here, Python could just do the same divergence, or maybe support both?

So, I dunno… I don’t disagree that D everywhere is the simpler choice :slight_smile:. But of course it still breaks with the ecosystem if used in the buffer protocol…
If Python prefers using D everywhere (or maybe not for array.array export?!), I think it should:

  • accept Zd when importing the buffer protocol (it is rather meaningless, the only place might be memoryview(complex_arr) which doesn’t even need the character code support for much – at least to me most users only use memoryview as an owning buffer)
  • Document that Z<char> is an accepted part of the PEP and probably best note that for the time it is the preferred way to export. (Yeah, probably nobody cares about Zi, but I doubt it is worth the time to actually be confident about that “probably”.)
    I don’t even care too much about the best documentation, so long as we agree on it. (EDIT: in the end, both are just to avoid projects naively following Python and causing breakage.)

I.e. it is mildly annoying but not the end of the world that we would have to accept that the buffer protocol has two spellings for the same thing.
But Zd is the “more correct” one for at least the next few years; that much we have to agree on, whether Python actually uses it or not.


  1. I dismiss the argument that the PEP has this note. It’s a clear oversight to treat the section as “unimplemented” unless there is an explicit note somewhere making it clear that the SC discussed this aspect. Nobody would ignore the clear community adoption without a discussion or a ping to that community. ↩︎


I think it would be nice if the Python standard library were to support importing buffers from exporters that specify 'Zf' or 'Zd'.

>>> import array
>>> import numpy as np
>>> a = np.array([1.0+2.0j, 3.0+4.0j], dtype=np.complex128)
>>> a
array([1.+2.j, 3.+4.j])
>>> aa = array.array('D', a)
>>> aa
array('D', [(1+2j), (3+4j)])

>>> memoryview(a)
<memory at 0x7f5bef6fd3c0>
>>> memoryview(aa)
<memory at 0x7f5bef6fd240>

>>> memoryview(a).tolist()
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    memoryview(a).tolist()
    ~~~~~~~~~~~~~~~~~~~~^^
NotImplementedError: memoryview: unsupported format Zd

>>> memoryview(aa).tolist()
[(1+2j), (3+4j)]

For complex32, it would be nice to decide on something, probably 'E'. Let’s not introduce further ambiguity by having some modules export using 'E' and others export using 'Ze'.

My concern about an API to register custom dtypes is that it would result in an explosion of identifiers for the same thing. As an example, I’d like to see everybody use the same format code for a buffer containing bfloat16 (and everybody use the same format code for a buffer containing complex bfloat16). Maybe providing an API which can be used to register format codes willy-nilly is a bad thing. I’d rather see a PEP that’s widely inclusive (e.g., format codes for int128, uint128, quad-precision floating point, quarter-precision floating point, decimal floating point, …). Nobody would implement all of these. But anybody who wants to implement any one of them would know what format string to use.


I don’t think appealing to the docs can be of much help here. They’re all over the place, and leave crucial details to the imagination.

Brief type codes show up in several contexts: struct, array, ctypes, buffer protocol, extensions (like numpy).

There’s no central place to look that constrains their choices. And different contexts have different aims:

  • At one extreme, ctypes intends to faithfully expose all the quirks of the platform C compiler. For example, on Windows it does not support complex numbers. Because, to date, Microsoft’s C compiler doesn’t. And that’s fine!

  • At the other extreme, struct intends to be a Swiss army knife, in one mode reproducing size, endian, and padding decisions of the platform C, and in other modes forcing “standard” sizes and endianness (with no padding), giving compact platform-independent binary representations. And it supports complex numbers because Python does, regardless of whether the platform C does.

  • array is most closely allied with the buffer protocol: a dense array of raw native machine bytes, without internal padding.

  • numpy has its own needs beyond all those, due to its larger number of scalar types.
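Tim’s struct point can be seen concretely: the same type codes mean different things in standard vs. native mode (the standard-mode sizes are guaranteed by the struct docs; the native ones are platform-dependent):

```python
import struct

# Standard mode ('=' / '<' / '>'): fixed sizes, no padding.
assert struct.calcsize("=q") == 8   # always 8 bytes
assert struct.calcsize("=bq") == 9  # 1 + 8, no alignment padding

# Native mode (default / '@'): platform sizes and alignment.
# On most 64-bit platforms a 'q' following a 'b' is padded to an
# 8-byte boundary, so the struct typically grows to 16 bytes.
print(struct.calcsize("bq"))  # platform-dependent, commonly 16
```

The buffer protocol sits closest to the native end of this spectrum, which is part of why the format-string overlap with struct is so confusing.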

What’s needed, but doesn’t seem to exist, is reference documentation addressing all of these in a unified and detailed way.

In the absence of that, everyone will stumble along as best they can, and we’ll end up with multiple type codes, in different contexts, with the same meanings. struct’s original 1-letter type codes have outgrown their modest beginnings. But there’s no need for every context to support all type codes used across the union of all contexts. Different contexts have different aims.


I think you should convince the SC of this, not just me :wink:

No, the discussion was about whether this “specification” was implemented somewhere. But the text doesn’t look like a specification at all. It just happens that some proposals from that section are implemented in some way across the Python ecosystem.

Let’s not use this argument, unless the SC reconsiders its position.

Sorry, I don’t see much difference:

>>> import inspect
>>> import numpy as np
>>> a = np.array([1-1j, 1j])
>>> a.dtype.char
'D'
>>> m = a.__buffer__(inspect.BufferFlags.FULL)
>>> m.format
'Zd'

Both type codes are visible to me as an end user. It just happens that the latter code is not documented in NumPy, besides a reference to the PEP. BTW, the memoryview docs say about the format attribute:

A string containing the format (in struct module style) for each element in the view. A memoryview can be created from exporters with arbitrary format strings, but some methods (e.g. tolist()) are restricted to native single element formats.

I like this approach, as it is much more local: we only need to change memoryview and the array module, which use the buffer protocol.

So, in the meanwhile, we would accept importing with either the 'D' or 'Zd' format codes. And the array module would export complex arrays with the 'Zd' format code. Ditto for 'F'. Should this be part of a transition to the 'D'/'F' type codes, or will it stay forever?

Edit:

A patch to play with.

This should fix Issues · numpy/numpy · GitHub

diff --git a/Modules/arraymodule.c b/Modules/arraymodule.c
index a86a7561271..d9389797a71 100644
--- a/Modules/arraymodule.c
+++ b/Modules/arraymodule.c
@@ -2960,6 +2960,12 @@ array_buffer_getbuf(PyObject *op, Py_buffer *view, int flags)
         if (sizeof(wchar_t) >= 4 && self->ob_descr->typecode == 'u') {
             view->format = "w";
         }
+        if (self->ob_descr->typecode == 'F') {
+            view->format = "Zf";
+        }
+        if (self->ob_descr->typecode == 'D') {
+            view->format = "Zd";
+        }
     }
 
     self->ob_exports++;
diff --git a/Objects/memoryobject.c b/Objects/memoryobject.c
index 4cbbb7eb7cd..ca31ec8e5a2 100644
--- a/Objects/memoryobject.c
+++ b/Objects/memoryobject.c
@@ -1220,6 +1220,21 @@ get_native_fmtchar(char *result, const char *fmt)
     case 'D': size = 2*sizeof(double); break;
     case '?': size = sizeof(_Bool); break;
     case 'P': size = sizeof(void *); break;
+    case 'Z':
+        {
+            if (fmt[1] == 'f' && fmt[2] == '\0') {
+                size = 2*sizeof(float);
+                *result = 'F';
+            }
+            else if (fmt[1] == 'd' && fmt[2] == '\0') {
+                size = 2*sizeof(double);
+                *result = 'D';
+            }
+            else {
+                return -1;
+            }
+            return size;
+        }
     }
 
     if (size > 0 && fmt[1] == '\0') {
@@ -2196,6 +2211,14 @@ adjust_fmt(const Py_buffer *view)
     const char *fmt;
 
     fmt = (view->format[0] == '@') ? view->format+1 : view->format;
+    if (fmt[0] == 'Z' && fmt[1] != '\0') {
+        if (fmt[1] == 'f' && fmt[2] == '\0') {
+            return "F";
+        }
+        if (fmt[1] == 'd' && fmt[2] == '\0') {
+            return "D";
+        }
+    }
     if (fmt[0] && fmt[1] == '\0')
         return fmt;
 

So, it looks like the “letter-saving” convention doesn’t save us letters at all? :slight_smile:

For a single-precision real-valued array, JAX exports using the format '=f' and, for complex64, '=Zf'.
Should memoryview’s tolist() accept these strings?
Or, is this a JAX bug?

>>> a = jnp.array([1,2], dtype=jnp.float32)
>>> memoryview(a).tolist()
Traceback (most recent call last):
  File "<python-input-5>", line 1, in <module>
    memoryview(a).tolist()
    ~~~~~~~~~~~~~~~~~~~~^^
NotImplementedError: memoryview: unsupported format =f

From documentation:

A memoryview can be created from exporters with arbitrary format strings, but some methods (e.g. tolist()) are restricted to native single element formats.

So, it’s documented behavior.

With more context:

Part of the problem. What does that mean, exactly? I don’t know what “struct module style” could mean. A single letter? A string? “Arbitrary format strings” implies the latter - but then why mention struct at all?

Or “native”. The struct module itself uses “native” to apply to 3 different concepts: byte size, endianness, and alignment. But never to its type codes.

Some connection with struct is intended, but it’s clear as mud to me exactly what. I’ve noticed via trial and error (not from the docs) that operations like memoryview.cast('d') work the same way if the string passed is '@d' instead, which is struct’s way of spelling “C double with native byte size, native endianness, and native alignment” (the same as plain 'd' means to struct).
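That trial-and-error observation is easy to reproduce (a minimal check, relying only on memoryview.cast accepting an optional '@' prefix on the destination format):

```python
import struct

# Eight bytes holding one native C double.
buf = bytearray(struct.pack("d", 2.5))
m = memoryview(buf)

# 'd' and '@d' behave identically as cast targets: both mean a
# C double with native size, endianness, and alignment.
assert m.cast("d").tolist() == [2.5]
assert m.cast("@d").tolist() == [2.5]
```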

All intended connections should be explicitly documented (among all consumers of type codes: struct, memoryview, the buffer protocol, ctypes), along with exhaustive lists of the format strings accepted by all, and those accepted by only some. And in one place.


I agree, it’s too vague. Apparently, some “proposed additions” from PEP 3118 are implemented in ctypes:

>>> import ctypes
>>> class Point(ctypes.Structure):
...     _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]
...
>>> p = Point(1, 2)
>>> memoryview(p).format
'T{<d:x:<d:y:}'

I think that part is much less ambiguous. A “native single element format string” is just a single-letter struct format with “all native” (byte order, size, alignment), i.e. the default when the format character is not prefixed by '=', '<' and so on.

Yes, as '@' is the default for the struct module. BTW, cast() has a similar remark as tolist():

The destination format is restricted to a single element native format in struct syntax.