Choice of complex buffer protocol format intentional break with PEP?

Well, I disagree that PEP 3118 is “a standard”. It leaves too many questions unanswered to serve in that capacity. Which is why the PEP preface points to the CPython docs for reference, although those don’t even mention PEP 3118. And, alas, they also leave too many questions unanswered to serve as an adequate standard.

But there is no process that can update a 20-year-old PEP. There are processes to update Python’s reference docs.

None of this precludes that the basic ideas in PEP 3118 were very successful and widely adopted. The PEP was fine so far as it went. It didn’t go far enough to serve as “a standard”, though, which is becoming ever more apparent as time passes.

Point of order :winking_face_with_tongue: - the way to get SC involvement is to file a ticket on the SC’s tracker.

It’s also worth remembering that PEPs are not documentation. They are the historical record of our decision making process. It’s great that many PEPs (including 3118) include a prominent note at the top stating this and point folks to the relevant source-of-truth[1] documentation.


  1. -ish? :winking_face_with_tongue: ↩︎

True, but they can be superseded by newer PEPs.

Which I expect would be best here. struct’s original “1-character type codes” don’t scale to a world with ever-more types. Overkill for adding just one more type, but “just one more” is never the end of it.

Nobody questions the first part of your claim: some “additions to the struct-string syntax” were implemented. But how can we verify correctness if we have only the PEP as a replacement for documentation? You can only define it as being correctly implemented in, say, NumPy.

Shouldn’t this knowledge be documented somewhere, or must we look at the code?

Great. What do you suggest in practice? Consider the 'D' type code. It’s now used in CPython for memoryview and for the ctypes/array/struct modules.

Should we add a 'Zd' type code instead (as an alias), use only it in memoryview, deprecate 'D' everywhere else (ctypes/struct, in fact), and then remove it? But 'D' was already chosen for compatibility with NumPy: they use both codes, and 'Zd' looks more like an “internal” one. At the end of such a transition we would have simpler documentation, but I believe it would be a source of confusion for users (and of bug reports like “why don’t you use 'D' like NumPy?”).

Should we add support for the 'Zd' type code only for the buffer protocol? I.e. array/ctypes would use that code in Py_buffer, but 'D' for the rest.

Something else?

E.g. a transition (similar to the first option) from 'Zd' to 'D' in NumPy and Co?

I think it’s a separate issue. I don’t see how PEP 3118 addresses this. IIRC, the 'D' type was already used by NumPy when the PEP was discussed. Why was a new notation chosen for the PEP?

I don’t. The original post is about the brief type-code strings to be used for a specific type. How can that be discussed without knowing the space of allowable strings?

It doesn’t, directly. Neither do Python’s docs. Part of the problem here: all relevant docs are underspecified, leaving fundamental questions unaddressed.

PEP 3118 did propose a large pile of new type codes for struct to implement, but almost none of them were implemented in struct. All the ones for new scalar types were 1-letter codes, no matter how strained. As if “1 letter” were a design constraint imposed by struct. But it isn’t. The struct docs refer only to “format characters”, which may be read as implying “just 1 character”, but isn’t as clear as a standard should be.
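One way to see what a given interpreter's struct module actually accepts is to probe it directly rather than rely on the docs. The sketch below is only illustrative: `struct_accepts` is a hypothetical helper name, and the answers for 'Zd' and 'D' depend on the Python version it runs on.

```python
import struct


def struct_accepts(fmt: str) -> bool:
    """Return True if this interpreter's struct module can parse fmt.

    Hypothetical helper for illustration; it only asks struct.calcsize
    whether the format string is valid and reports the outcome.
    """
    try:
        struct.calcsize(fmt)
    except struct.error:
        return False
    return True


# 'd' and '2d' are long-standing struct formats; whether 'Zd' or 'D'
# are accepted depends on the interpreter version being probed.
for fmt in ("d", "2d", "Zd", "D"):
    print(f"{fmt!r}: {struct_accepts(fmt)}")
```

Probing like this says nothing about what other buffer exporters emit; it only reports what struct itself parses.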

Nobody here seems to know. The OP speculated about that in the topic’s initial post.

I don’t know either, although:

  • I doubt Python devs pushed for Zd - they overwhelmingly don’t give a rip about complex numbers, after all.
  • Single letters are indeed severely limiting now.

Here is a relevant NumPy issue: Questions about pep3118 format strings · Issue #24428 · numpy/numpy · GitHub. Maybe NumPy developers don’t consider it a bug (it hasn’t been triaged for years), but it clearly shows more examples supporting my point (that part of PEP 3118 is not a specification).

I did some more black archeology; here are my findings for some very broad search strings:

Only the last thread seems to be related to the subject in part, but it’s about semantics rather than new syntax.

No argument from me - but I go on to say there’s no adequate specification anywhere yet. The PEP at least made clear that it was, at the time, intended to be tightly coupled with Python’s actual struct syntax, and suggested a bunch of concrete (albeit still woefully underspecified) new codes. All that seemed to survive of that in the Python docs is that codes must be “in struct module style”, with it left to the reader to puzzle out what that might mean.

But it is Travis saying Zd was intentional, although not addressing why that was added when D was already in use by numpy for “double complex” (albeit outside their buffer protocol contexts).

Right, to me this is the discussion in: Buffer protocol and arbitrary (data) types - #13 by seberg, or the bfloat16 (or a new one).
This is important and I care about it a lot (I have to catch up on the bfloat16 one)!

For me, this thread was about a specific unintentional divergence for complex floating types and the constraints when changing the format specification.
From my perspective (I am not quite sure we all agree on it), adding D and/or additionally asking to stop using Zd is an API divergence or break. Doesn’t mean much, except that it should be done with that awareness[1].

Now, I don’t care too much about breaking with some parts of the PEP. The points about the inadequacy of both the PEP’s specificity and parts of the NumPy implementation aren’t wrong. So we have to try and use them and make progress from there.

I am a friend of trying to guess the impact:

  • E.g. the X{} function pointer syntax is possibly used by no-one. You can get away with just saying it is invalid. Just in case, I would try to avoid re-using it with a different meaning, anyway (if mostly to avoid having to search if anyone might be using it after all).
  • Weirder things like subarray alignment detail? Yeah, sorry we didn’t discuss it at the time… This is pretty niche in practice: It would be good to make sure there is a way to export any dtype in a way that is understood by both new and old NumPy behavior if we push for a change. But I expect it is niche enough that we could think about a DeprecationWarning, even if that isn’t technically a smooth transition, and to just accept if there is a longer divergence between NumPy and Python.
    • I.e. needs a bit of thought, but besides avoiding using the same syntax for something different, I suspect we are in the clear.
    • Things in this category (probably): & (because I don’t really see it used either).
  • Zd is hugely adopted. So let’s just not break it or even “deprecate” it (until there is confidence practically everyone also supports D which will take years). And because it is used so much, I think it is best if Python itself also accepts it to avoid anyone having to choose which version is “right”.[2]
    • g/Zg: I would start in this category, but it is likely niche enough to consider it part of the category above.
    • t: I honestly think we would have to do a search to see how much it is used if we want to use it for something else.

(I personally don’t like to say “this is a bug” even if it clearly is, but sure, it can be used to tilt that towards “let’s just break it”. – except of course if others really need the bug to be fixed)

EDIT: sorry, if this is all obvious, I don’t mind to discuss other things, I suppose I am just looking for a bit of closure :).


  1. I think Python ideally would have realized it may be that. Others may say that Antoine, I, or someone else should have seen the ctypes change or PEP note go through… It doesn’t matter, the only action item here is to amend the docs so that they don’t make it sound like a format change/extension is clearly unproblematic. ↩︎

  2. Unless someone looks at it, shrugs, and says: Yeah, we can just use Zd in Python core libraries as well, not a big deal. But that doesn’t sound likely – and I have neither thought it through to make that case, nor do I care all that much about it. ↩︎

Oh, I agree it should definitely be better documented. In this case, you don’t have to look at the code, though. It’s enough to create a NumPy array of complex numbers and see how it gets exported:

>>> a = np.array([1+4j])
>>> a
array([1.+4.j])
>>> memoryview(a).format
'Zd'

(this snippet with Python 3.12)

Or you can ask the relevant people, for example one of the NumPy maintainers.

Ok, so let’s separate concerns so that the problem is well understood:

  1. public-facing APIs can use whatever type codes they want; it’s ok if one library chooses D, another Cd or Zd or whatever else
  2. the format code exported in the buffer protocol, though, should follow what the ecosystem has standardized upon (which is, incidentally, the format codes listed in PEP 3118)

This is item 2 we should be discussing, because not using the same format codes as other libraries means you’re not interoperable with them (and vice-versa).
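To make the interoperability concern concrete, here is a minimal sketch of what a consumer could do if some exporters emit 'D'/'F' while others emit 'Zd'/'Zf'. Everything in it is an assumption for illustration: the alias table and the function names are hypothetical, not part of CPython or NumPy, and it only handles a single-item format with an optional byte-order prefix.

```python
# Hypothetical alias table: the one-letter complex codes discussed in this
# thread mapped onto the two-character codes that NumPy exports.
_COMPLEX_ALIASES = {"F": "Zf", "D": "Zd"}

# Byte-order/alignment prefixes used by struct-style format strings.
_BYTE_ORDER_CHARS = "@=<>!"


def normalize_format(fmt: str) -> str:
    """Rewrite a single-item format so that 'F'/'D' become 'Zf'/'Zd'.

    A leading byte-order character is preserved; any other code is
    returned unchanged.
    """
    prefix = ""
    if fmt and fmt[0] in _BYTE_ORDER_CHARS:
        prefix, fmt = fmt[0], fmt[1:]
    return prefix + _COMPLEX_ALIASES.get(fmt, fmt)


def same_item_type(fmt_a: str, fmt_b: str) -> bool:
    """Compare two single-item formats after normalizing the aliases."""
    return normalize_format(fmt_a) == normalize_format(fmt_b)


print(normalize_format("D"))        # one-letter code rewritten
print(same_item_type("<F", "<Zf"))  # aliases compare equal
```

A consumer that does not carry such a table would treat 'D' and 'Zd' as unrelated item types, which is the interoperability gap described in item 2.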

Basically, yes.

The code suggested for complex in the PEP is

‘Z’	complex (whatever the next specifier is)

I have no idea what “whatever the next specifier is” was intended to say. Are you somehow managing to read it as meaning the letter ‘d’ appended to ‘Z’? If so, that’s quite a trick :wink:. I guess you’re reading it as “Z is the start of multi-character codes for some number of distinct complex types, and is followed by a code (“the next specifier”) specifying the common type of the real and imaginary components”. Perhaps we’ll see “Zbfloat16” some day.
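For what it's worth, that "prefix" reading can be written down as a tiny tokenizer. This is only a sketch of one interpretation of the PEP's wording, not something the PEP or any library specifies: `split_codes` is a hypothetical name, and it ignores repeat counts, byte-order characters, and the struct/array syntax entirely.

```python
def split_codes(fmt: str) -> list[str]:
    """Split a format string into type codes.

    Under the interpretation sketched here, 'Z' is a prefix that binds
    to the next character, so 'Zd' is one code rather than two.
    """
    codes = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "Z":
            # 'Z' alone has no meaning under this reading.
            if i + 1 >= len(fmt):
                raise ValueError("'Z' must be followed by a type code")
            codes.append(fmt[i:i + 2])
            i += 2
        else:
            codes.append(fmt[i])
            i += 1
    return codes


print(split_codes("Zd"))
print(split_codes("dZfi"))
```

Note that this tokenizer would also happily accept 'Zi' or 'Zb', which is exactly the kind of question the PEP text leaves open.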

That said, I agree that Python’s implementation of the buffer protocol should follow numpy in using ‘Zd’ for the buffer protocol’s “complex double” format code. PEP or not, practicality beats purity.

While the PEP clearly intended that struct use the same codes as the buffer protocol, the PEP’s implementation didn’t follow through on that. Which is unfortunate. A proliferation of distinct alphabet soup notations isn’t attractive.

Yes, as far as I’m concerned that’s the logical and obvious reading of the snippet you’re quoting.

I appreciate that this might not be immediately obvious to anyone reading it. I also assume the PEP did undergo a review process before being accepted, but I wasn’t a CPython contributor yet (you were, however :wink: ).

:tada:

And I appreciate that “‘Z’ complex (whatever the next specifier is)” might be adequate shorthand for those who already knew what it meant.

We were firmly in BDFL-land then, so all PEP flaws from that era are entirely Guido’s fault :wink: While he could, if he wanted, punt a decision to someone else, PEP 3118 doesn’t name a BDFL delegate.

I’ve tried hard to find the PEP acceptance message, but failed to find it. So did two bots (Gemini and Copilot). I think it would help, because my memory is that it wasn’t the usual rubber-stamp “accepted, and thanks!”.

As I recall, it had real reservations, but accepted the PEP anyway in the interest of moving Python 3 along, for which the extended buffer protocol was a major new feature for Py3 acceptance in the scientific Python community.

The problem is that such “divergence” (unintentional or not) exists in NumPy itself.

Nobody is asking for anything like this. The struct and ctypes modules had no support for complex types for years before v3.14. Also, memoryview had no support for 'Zd' or 'Zf' before, and has none now (try the tolist() method, for example). Per the SC decision, the “proposals” for the struct module aren’t part of PEP 3118. The PEP doesn’t specify how to add new format codes. What NumPy does here is not documented at all.

So, I don’t think this breaks anything in the Python ecosystem. Someone can add support for the 'F' and 'D' types to interact with the stdlib via the buffer protocol, or ignore the new capabilities of the stdlib.

I’m not sure whether we should accept things like 'Zi'. The PEP doesn’t tell us much. As Tim said above, it’s not obvious even for 'Zd'. We have to ask NumPy (by looking at the code or playing with the API it offers, like you did) what the “correct” interpretation of “whatever the next specifier is” would be.

BTW, what’s its memory layout? Like Annex G double complex? Like Py_complex or MSVC’s _Dcomplex? From the sources it looks like npy_cdouble might be an alias for double _Complex or, e.g., _Dcomplex. Those are different C types, in principle. CPython’s struct, array and ctypes modules always assume a memory representation compatible with the C standard. For good or bad, that’s different from NumPy, and that may warrant new type codes.

Well, playing more with the patch proposed above, I find that it’s much harder to support such a “split brain” in practice. (And I’m not taking into account the complications for the docs.) So, I’m not ready to come up with a solution along these lines soon.

Reversion of my recent patches might be an alternative:

Though, the ctypes module also offers support for the buffer protocol (undocumented), and since v3.14 it uses the 'F' and 'D' codes. If we can treat this as a bug, I think we could easily fix it; see the patch below.

ctypes patch
diff --git a/Modules/_ctypes/_ctypes.c b/Modules/_ctypes/_ctypes.c
index 55eade1c830..83968e0c9c4 100644
--- a/Modules/_ctypes/_ctypes.c
+++ b/Modules/_ctypes/_ctypes.c
@@ -311,7 +311,7 @@ _ctypes_alloc_format_string_for_type(char code, int big_endian)
         break;
     }
 
-    result = PyMem_Malloc(3);
+    result = PyMem_Malloc(4);
     if (result == NULL) {
         PyErr_NoMemory();
         return NULL;
@@ -320,6 +320,7 @@ _ctypes_alloc_format_string_for_type(char code, int big_endian)
     result[0] = big_endian ? '>' : '<';
     result[1] = pep_code;
     result[2] = '\0';
+    result[3] = '\0';
     return result;
 }
 
@@ -3098,7 +3099,22 @@ PyCData_NewGetBuffer(PyObject *myself, Py_buffer *view, int flags)
     view->len = self->b_size;
     view->readonly = 0;
     /* use default format character if not set */
-    view->format = info->format ? info->format : "B";
+    if (!info->format) {
+        view->format = "B";
+    }
+    else {
+        view->format = info->format;
+        if (view->format[1] == 'F') {
+            view->format[1] = 'Z';
+            view->format[2] = 'f';
+            view->format[3] = '\0';
+        }
+        if (view->format[1] == 'D') {
+            view->format[1] = 'Z';
+            view->format[2] = 'd';
+            view->format[3] = '\0';
+        }
+    }
     view->ndim = info->ndim;
     view->shape = info->shape;
     view->itemsize = item_info->size;

What do you think about this? CC @vstinner

But I believe we should first ask the SC to reconsider their position, no? Would you mind opening an issue?

I think the added support for the 'e' format type could be kept for memoryview/array, although neither the NumPy docs nor the PEP specify anything about it. It seems NumPy uses the same type code for export with the buffer protocol, so we are probably safe here.

You did a better job of guessing than I did :smile:

When I read

‘Z’	complex (whatever the next specifier is)

I had nothing like Zd or Zi in mind. I never used complex numbers in numpy, so had no memory cells that could bring such stuff to mind.

Instead my thoughts were more like: OK! The PEP is adding a pile of new 1-character codes for types, following struct’s lead. So it’s ‘Z’ for complex. Or maybe not - “whatever the next specifier is” may be Travis’s telegraphic way of saying “‘Z’ works for me, so I’ll use it now to make progress, but I’ll have to go back and check to see whether ‘Z’ is really “the next specifier” struct hasn’t already used”.

Now that I know the intent was to introduce multi-character codes for some scalar types, I find it amazing that the PEP didn’t take pains to point out what a break that was from struct’s “1-letter type codes” design.

Regardless, if the goal is interoperability across apps, it’s essential for all apps to explicitly agree on the entire collection of format strings that must be supported by all implementations of the buffer protocol.

The PEP seemed to intend to do that by making Python’s struct docs the definitive source of format knowledge - but that got dropped on the floor for whatever reasons. Which left a void that remains unfilled.

@skirpichev thanks for looking into what changes would be needed! From a code perspective (and maybe user experience, but I think for ctypes this is not a big thing) is there any argument to avoid just using Zd in the buffer protocol portion then?

There is still the argument that aligning struct and buffer-protocol is actually very nice, because at least conceptually, it means you can do struct.unpack(memoryview.format, memoryview.ptr). And as I said, I am happy if Python prefers "D" for struct.unpack and thus adds it as a valid specifier.
(And at that point, it really isn’t all that big deal to just export it as well.)
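The conceptual `struct.unpack(memoryview.format, memoryview.ptr)` idea already works today for formats that struct and the buffer exporter agree on. Here is a minimal sketch using only the stdlib; it uses `array('d')` as the exporter and `tobytes()` in place of the hypothetical `ptr` attribute, and says nothing about complex codes.

```python
import struct
from array import array

data = array("d", [1.5, -2.0, 3.25])
view = memoryview(data)

# view.format is the per-item code exported through the buffer protocol.
# Prepending the item count gives a struct format for the whole buffer.
fmt = f"{view.shape[0]}{view.format}"
values = struct.unpack(fmt, view.tobytes())

print(view.format)
print(values)
```

The same round trip fails as soon as the exporter uses a code struct cannot parse, which is the practical argument for keeping the two vocabularies aligned.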

So, I think, that gives us:

  • Python could probably just switch over to using Zd for the buffer protocol. (Effective API break in ctypes, but it was introduced in 3.14 and unlikely to be widely used and downstream can support both if desired pretty easily.)
  • But, if Python devs prefer, it is perfectly fine to have Zd and D side-by-side as it has its advantages (in which case it is good to accept Zd unless awkward).
    • I am willing to punt on this really, so long as we agree to think of all of this as introducing a side-by-side new API.

(And sorry, but while I want to, for now I’ll refuse to go back to discussing “poorly written/ill specified”. I don’t think establishing blame helps here; the only action item for that is improving the docs a bit. – Yeah, sometimes blame does matter. But in this case I really don’t see it. In my view, like any good accident, there is enough blame to go around for everyone to take a bit. Plus, also in my view, it isn’t like this is a bad thing… There are worse bugs in every single release of every single piece of software…)

Probably, I just can’t easily remove prior knowledge about NumPy’s behavior. The PEP text itself doesn’t say anything about concrete complex types, including 'Zd'.

Another signal that this part of the proposal wasn’t discussed. Probably this happened somewhere in the NumPy community, but I can’t find any traces of it.

In principle, yes, in my previous post:

Probably that’s a very minor point: in most cases, a struct with two double fields will have the same memory layout as an array of two doubles (or double _Complex). But the C standard doesn’t guarantee this.

If we don’t count this as a bug, we can’t change that without a deprecation. Though, I think you are correct that such a change will not introduce much breakage.

Sorry for ignoring this part, but that is incorrect: the C and C++ standards do guarantee this[1] for anything that is specified as being a complex type. So it is even ambiguous/minor.

Yeah, but I expect we have established well enough that this is an available option (even if it might need “signoff”).
But while my gut feeling is probably slightly in favor of just sticking with Zd, even considering that I started this thread, I don’t really have a strong enough opinion to make a recommendation right now :confused:.


  1. E.g. Arithmetic types - cppreference.com includes the note: “Each complex type has the same object representation and alignment requirements as an array of two elements of the corresponding real type (float for float complex, double for double complex, long double for long double complex). The first element of the array holds the real part, and the second element of the array holds the imaginary component.” ↩︎

How about

typedef struct {
    double real;
    double imag;
} Py_complex;

This is the same as MSVC’s _Dcomplex (MSVC doesn’t have complex types from the C standard’s point of view), well, at least for some versions. Does the standard guarantee that this has the same memory representation as an array of two doubles? I don’t think so.
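The standard's guarantees aside, the layout on a given platform can at least be checked empirically. A minimal sketch with ctypes follows; `PyComplexLike` is a hypothetical stand-in that mirrors the struct above, and the result only describes the compiler/ABI that ctypes targets on the machine running it, not what the C standard promises.

```python
import ctypes


class PyComplexLike(ctypes.Structure):
    """Hypothetical mirror of a struct with two double fields."""
    _fields_ = [("real", ctypes.c_double), ("imag", ctypes.c_double)]


# An array of two doubles, the layout the standard gives to double complex.
DoublePair = ctypes.c_double * 2

# Compare total size and field offsets of the struct against the array.
same_size = ctypes.sizeof(PyComplexLike) == ctypes.sizeof(DoublePair)
same_offsets = (
    PyComplexLike.real.offset,
    PyComplexLike.imag.offset,
) == (0, ctypes.sizeof(ctypes.c_double))

# Reinterpret the struct's bytes as an array of two doubles.
value = PyComplexLike(1.0, 4.0)
pair = DoublePair.from_buffer_copy(bytes(value))

print(same_size, same_offsets, list(pair))
```

If both checks hold, the struct and the array are interchangeable on that platform, which matches the "in most cases" observation; it does not settle whether a buffer consumer may rely on it everywhere.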

It’s minor, yes.