Choice of complex buffer protocol format intentional break with PEP?

Yes, unless you actually argue that technically there is no definitive C standard and everyone just randomly agrees (of course there is a C++ one and the optional complex.h in C).[1]

In my personal opinion: can we please drop this line of argument? I honestly doubt anyone here can suggest serious alternative interpretations for anything relevant…

Yes, a proper standard should be precise enough that it doesn’t need the reader to do this type of work. But the PEP is very old, and the truth is it isn’t written like a “proper standard”.
That may be very frustrating, and while it is a good argument for wanting to disregard the PEP, it doesn’t make disregarding it any less of an API break.
And to be clear I don’t think it ever was a reason for that: the reason, as noted, was that there is no API break because it was deemed unimplemented (and that happens to be incorrect).


But it still leaves everything open :slight_smile: and Python has completely free choice here: Use Zd to ensure that the buffer protocol has one obvious way to spell complex numbers. Or choose D because it is nice if the struct module long-term agrees with the buffer protocol!

EDIT: Let me add a poll for the heck of it:

  • Use Zd to avoid two spellings.
  • Use D for long-term struct-buffer protocol alignment
  • I don’t care
  1. The biggest – not even relevant – “deviation” I have ever seen myself is cuda::std::complex having a larger alignment requirement.

It actually does, hiding in section “Examples of Data-Format Descriptions”:

While memory fades, at the time I probably figured it was just a typo, overlooking that my best reading of the earlier text was that ‘Z’ was the 1-character type code tentatively (or so it seemed to me) suggested for complex.

When things go wrong, assigning “blame” isn’t at all the point - it’s much more “those who don’t learn from history are doomed to repeat it”. Digging into what went wrong here is needed to inform decisions about what might be done to cut the chances of it going wrong again in rhyming ways.

What appears to be the core thing that went wrong here: the PEP intended Python’s struct module docs to be the definitive reference for buffer protocol type formats. But that part of the PEP was never implemented, leaving us for 2 decades with no central definitive reference. It’s beyond time to plug that hole, especially because we know now that more new type formats are almost certainly coming.

We can’t settle that in this topic, though, which is micro-focused on complex.

There needs to be some central definitive reference, though, and for all its good intentions, PEP 3118 is too loose about all the picky details to pretend it is one.

I didn’t vote, because the scope of it wasn’t clear. If you’re asking only about the buffer protocol, yes, it should certainly accept Zd. But not necessarily only Zd. If a Python-specific alias D is also accepted, that’s also fine by me. But it’s doomed to become an ever-more incoherent mess in the absence of central definitive reference docs.

For which the Python docs are the natural home. Or a new “informational PEP” incorporated by reference, although that’s less attractive to me.

The way to get SC involvement is to file a ticket on the SC’s tracker.

The C23 standard, section 6.2.5 “Types” states:

Each complex type has the same representation and alignment requirements as an array type containing exactly two elements of the corresponding real type; the first element is equal to the real part, and the second element to the imaginary part, of the complex number.

C++ allows reinterpreting a complex number (or a pointer thereto) as an array of two elements.

The Python struct module documentation (note 10) states that

For the 'F' and 'D' format characters, the packed representation uses the IEEE 754 binary32 and binary64 format for components of the complex number, regardless of the floating-point format used by the platform. Note that complex types (F and D) are available unconditionally, despite complex types being an optional feature in C. As specified in the C11 standard, each complex type is represented by a two-element C array containing, respectively, the real and imaginary parts.
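For illustration, a minimal sketch of the layout note 10 describes, written with the long-standing 'd' code so it also runs on Pythons older than 3.14. The helpers pack_complex_le and unpack_complex_le are hypothetical names, not part of the struct API; the 'D' comparison at the end only does anything where struct accepts that code.

```python
import struct

def pack_complex_le(z: complex) -> bytes:
    """Hypothetical helper: IEEE 754 binary64 real part, then imaginary part, little-endian."""
    return struct.pack("<dd", z.real, z.imag)

def unpack_complex_le(raw: bytes) -> complex:
    """Hypothetical inverse of pack_complex_le."""
    real, imag = struct.unpack("<dd", raw)
    return complex(real, imag)

z = 1.5 - 2.25j
raw = pack_complex_le(z)
print(len(raw), unpack_complex_le(raw) == z)  # 16 True

# Where struct knows the 'D' code (3.14+), note 10 implies the bytes should match.
try:
    print("'<D' matches '<dd':", struct.pack("<D", z) == raw)
except struct.error:
    print("this struct module has no 'D' format code")
```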

See this.

The problem is that both NumPy and CPython work with systems that don’t conform to this part of the C standard. MSVC has no complex numbers in the sense of Annex G. And the C standard doesn’t guarantee that a two-element homogeneous struct has the same memory representation as a two-element array.

NumPy uses such custom complex types. CPython does not.

I agree that’s right. The standards require that arrays be contiguous (no padding between adjacent elements), but a struct with as many members as the array has elements, all of the array’s element type, may still have internal padding at the compiler’s discretion. “Same memory layout” isn’t guaranteed.

Although I don’t think the distinction matters on any platform Python runs on, or is likely to run on in the future. If such a platform pops up, numpy could presumably worm around it by using a union instead of a struct, one branch of which is the two-double struct and the other an array of two doubles. Although whether that’s guaranteed “to work” may vary between C and C++.
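A rough sketch of that union idea, using ctypes rather than C (the names Pair and ComplexUnion are made up for illustration). ctypes lays out structs and unions per the platform C ABI, so this only shows that both branches alias the same bytes on the machine running it; it says nothing about what the C or C++ standards promise.

```python
import ctypes

class Pair(ctypes.Structure):
    # Hypothetical "custom complex" struct: two members of the same real type.
    _fields_ = [("re", ctypes.c_double), ("im", ctypes.c_double)]

class ComplexUnion(ctypes.Union):
    # Hypothetical union: one branch is the struct, the other a plain double[2].
    _fields_ = [("s", Pair), ("a", ctypes.c_double * 2)]

u = ComplexUnion()
arr = u.a        # ctypes returns a view that shares the union's memory
arr[0] = 3.0     # write through the array branch...
arr[1] = 4.0
print(u.s.re, u.s.im)             # ...read through the struct branch: 3.0 4.0
print(ctypes.sizeof(ComplexUnion))
```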

It’s worth giving it some thought, but not much :wink:. Python’s own struct would break on such platforms too in “native” mode, because it has no idea how the platform C actually pads structs. It relies instead on the obvious “common sense” layout that follows from enforcing per-member minimal alignment requirements - which is, I believe, how all compilers used to compile Python actually do work.
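To see the alignment rule struct applies in native mode, compare a format that needs padding with its standard-size counterpart. The native numbers depend on the platform's alignment for double, so treat the comments as typical values rather than guarantees. Per the struct docs, struct only pads between items; trailing padding has to be requested with a zero-count code such as '0d'.

```python
import struct

# '@' (the default) is native mode: each item is aligned as the platform C compiler would.
# '=' uses standard sizes with no alignment padding at all.
print(struct.calcsize("@bd"), struct.calcsize("=bd"))   # typically 16 9 on 64-bit platforms
print(struct.calcsize("@dd"), struct.calcsize("=dd"))   # 16 16: two 8-byte doubles need no padding

# struct never adds trailing padding on its own; a zero-count code requests it.
print(struct.calcsize("@db"), struct.calcsize("@db0d"))  # typically 9 16 on 64-bit platforms
```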

Well, it’s a case-by-case decision. You don’t have to accept all possibilities.

But you should not make up alternative format codes for the types that you have decided to support (such as complex doubles).

In other words, if Python ever wants to support complex integers, it should use Zi for the buffer protocol, not I, J or any other variation.

That’s a very good question, but fortunately the answer is simple :slight_smile: It’s just a pair of numbers packed together, first real then imaginary, as you would “naturally” expect. There are no alignment or padding issues given that the two types are the same and machine-sized:

>>> import numpy as np
>>> np.array([1+2j, 3+4j])
array([1.+2.j, 3.+4.j])
>>> np.array([1+2j, 3+4j]).view(np.float64)
array([1., 2., 3., 4.])

And so it also naturally maps to Py_complex and C’s double _Complex and other similar datatypes that made the same “natural” decision.

And the C standard itself has a similar guarantee about layout compatibility:

Each complex type has the same object representation and alignment requirements as an array of two elements of the corresponding real type (float for float complex, double for double complex, long double for long double complex). The first element of the array holds the real part, and the second element of the array holds the imaginary component.

It’s quite convenient that everyone made the same natural decision :wink:
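The same interleaved layout is visible without NumPy. This sketch (not how NumPy implements it) packs complex values as consecutive real/imaginary doubles in native byte order and reinterprets the bytes with memoryview.cast:

```python
import struct

values = [1 + 2j, 3 + 4j]

# Native byte order, standard sizes: real part, then imaginary part, per element.
raw = b"".join(struct.pack("=dd", z.real, z.imag) for z in values)

flat = memoryview(raw).cast("d")   # reinterpret the bytes as C doubles
print(flat.tolist())               # [1.0, 2.0, 3.0, 4.0]

# Going back: pair up consecutive doubles into complex numbers.
pairs = [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
print(pairs)                       # [(1+2j), (3+4j)]
```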

Are you arguing that MSVC will insert arbitrary padding between the real and imaginary parts of a trivial struct? This is not what I’m seeing here.

@pitrou, I think you’re missing that the C standards do not guarantee that

sometype a[2];

and

struct {
    sometype a;
    sometype b;
};

have the same memory layouts. I believe they do under all compilers used to compile Python, but the C standard defines the layout of complex “as an array of two elements of the corresponding real type” for a reason: to guarantee that the C compiler inserts no padding of any kind. There are many discussions of this already out there, like here:

Short course: C only defines that there’s no padding before the first member of a struct, so that the address of the struct is the same as the address of the struct’s first member.
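For the curious, ctypes exposes field offsets, so it's easy to check what the local platform ABI does with a two-double struct versus a double[2]. PairOfDoubles is a hypothetical type for illustration; the checks report this machine's layout, they don't establish a guarantee.

```python
import ctypes

class PairOfDoubles(ctypes.Structure):
    # Hypothetical struct with two members of the same type, like a custom complex.
    _fields_ = [("re", ctypes.c_double), ("im", ctypes.c_double)]

DoubleArray2 = ctypes.c_double * 2

# C guarantees only the first check; the others depend on the platform ABI.
first_at_zero = PairOfDoubles.re.offset == 0
no_inner_padding = PairOfDoubles.im.offset == ctypes.sizeof(ctypes.c_double)
same_size = ctypes.sizeof(PairOfDoubles) == ctypes.sizeof(DoubleArray2)
print(first_at_zero, no_inner_padding, same_size)

# Where the layouts agree, the struct's bytes can be reinterpreted as a double[2].
pair = PairOfDoubles(1.0, 2.0)
as_array = DoubleArray2.from_buffer(pair)
print(list(as_array))  # [1.0, 2.0] where the layouts agree
```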

Not of any practical importance to Python - yet :wink:.

I did not miss that part, but I also don’t think we should care about it.

Since we’re being pedantic, it’s not a matter of compiler, it’s a matter of platform ABI, which is defined by the operating system.

(of course, a compiler could gratuitously deviate from the platform ABI… well, who cares about such a compiler?)

AFAIR, we (CPython) have decided that we mandate IEEE-compatible binary floating-point numbers, even though C probably does not mandate it at the standard level. This would not be the first time we require something from the platform ABI that’s not in the standard.

And requiring that a pair-of-numbers struct have the same layout as an array-of-two-numbers is actually more reasonable than requiring IEEE-compatible binary floating-point numbers. :wink:
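A quick sanity check of that assumption from Python, as a sketch: it only confirms the binary64 format for one value, not full IEEE 754 conformance, which also depends on compiler options. struct's '@' mode copies the platform's in-memory bytes, while '=' always produces the IEEE encoding in native byte order, so the two agreeing is consistent with an IEEE binary64 double.

```python
import struct
import sys

info = sys.float_info
# Binary64: radix 2, 53-bit significand, exponents in [-1021, 1024] (C's DBL_* convention).
print(info.radix, info.mant_dig, info.min_exp, info.max_exp)  # 2 53 -1021 1024 on IEEE platforms

# '=' always emits the IEEE 754 encoding; '@' copies the native in-memory bytes.
native_is_ieee = struct.pack("@d", 1.0) == struct.pack("=d", 1.0)
print(native_is_ieee)

# The binary64 encoding of 1.0, big-endian: sign 0, exponent 0x3FF, fraction 0.
print(struct.pack(">d", 1.0).hex())  # 3ff0000000000000
```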

Huh. You didn’t mention standards, but merely observed that a specific compiler (MSVC) did give the same layout. Which you will agree is irrelevant to what standards guarantee. Your

There are no alignment or padding issues given that the two types are the same and machine-sized:

simply isn’t true of structs - according to the standards.

but I also don’t think we should care about it.

Well, I, for one, don’t care about it. But I thought that was clear already :wink:

Yes, the CPython implementation does; the Python language does not.

It does not - unless the compiler defines the macro __STDC_IEC_559__, in which case it is mandatory. Not just the float formats, but essentially conformance to all of what 754 says. Which isn’t just a function of the compiler/architecture in use, but also a function of which compiler options are in use.

It isn’t, but it’s rare. I believe there are programs that stick to strictly standard C, but they’re rare, and more likely to be written for oddball small platforms where “bizarre” architectures haven’t yet vanished.

When C was being worked out, things like word sizes of 12, 18, 24, and 36 bits were common, 6-bit characters were very common, and no two manufacturers used the same floating-point formats. When I worked at Cray Research, under our C “a byte” was 64 bits - the smallest addressable storage unit. “Strictly standard C” has to work the same way on all of those.

It was, though, remarkably easy to get CPython working on Cray boxes, although I left there before that was done. Python’s C source assumed that right shifts of signed ints duplicated the sign bit. But Cray’s HW did not - it zero-filled. Which is the secret origin of pyport.h’s Py_ARITHMETIC_RIGHT_SHIFT macro :wink:
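For younger readers, a small Python model of the difference: Python's own >> on ints is arithmetic, while zero-filling hardware behaves like a logical shift on a fixed-width word. The last function gets sign duplication out of a zero-filling shift by only ever shifting non-negative values; I believe pyport.h's fallback uses the same trick, but check the source for the exact macro. The function names are hypothetical, and the 64-bit word size is an assumption.

```python
MASK_64 = (1 << 64) - 1  # assumed 64-bit word

def arithmetic_shift_right(x: int, n: int) -> int:
    # Python ints shift arithmetically: the result is floor(x / 2**n).
    return x >> n

def logical_shift_right_64(x: int, n: int) -> int:
    # Model of zero-filling hardware: reinterpret x as an unsigned 64-bit word first.
    return (x & MASK_64) >> n

def sign_extending_shift(x: int, n: int) -> int:
    # Only shift non-negative values, so zero-fill vs sign-fill can't matter.
    # Uses the identity x >> n == ~(~x >> n), with ~x == -1 - x.
    if x < 0:
        return -1 - logical_shift_right_64(-1 - x, n)
    return logical_shift_right_64(x, n)

print(arithmetic_shift_right(-8, 1))   # -4
print(logical_shift_right_64(-8, 1))   # 9223372036854775804
print(sign_extending_shift(-8, 1))     # -4
```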

I’d like to see @ngoldbaum’s upcoming PEP before voting here. Of course, that won’t happen for 3.15 :‍)

My current recommendation would be:

In 3.16, default to Zd/Zf everywhere (struct, array, ctypes, memoryview). This needs a PEP, and the plan can change.

Whenever we parse the format (struct, memoryview), accept both F/D/G and Zd/Zf – regardless of which of them end up being canonical. This would be nice for 3.15 but can go in 3.16. (In some cases, adding support for 2-letter codes might be significant surgery.)

In array and memoryview.cast, support both F/D and Zd/Zf – re-export what the user specified. If that can’t be done in time for 3.15, revert the 3.15 changes.

In ctypes: In 3.14 and 3.15, document that complex numbers’ _type_, and exported buffer format, may change. Change them in 3.16, with SC approval for skipping the deprecation period.
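To make "accept both spellings" concrete, here is a hypothetical helper (not an existing CPython API) that maps a single complex item format between the F/D/G and Zf/Zd/Zg spellings, keeping any byte-order prefix. It deliberately handles one item only, not full struct format strings.

```python
# Hypothetical mapping between struct-style letters and the PEP 3118 'Z' spellings.
LETTER_TO_Z = {"F": "Zf", "D": "Zd", "G": "Zg"}
Z_TO_LETTER = {z: letter for letter, z in LETTER_TO_Z.items()}
BYTE_ORDER_PREFIXES = "@=<>!"

def normalize_complex_format(fmt: str, prefer_z: bool = True) -> str:
    """Accept either spelling of one complex item format and return the preferred one."""
    prefix = fmt[0] if fmt and fmt[0] in BYTE_ORDER_PREFIXES else ""
    code = fmt[len(prefix):]
    if code in LETTER_TO_Z:
        z_code, letter = LETTER_TO_Z[code], code
    elif code in Z_TO_LETTER:
        z_code, letter = code, Z_TO_LETTER[code]
    else:
        raise ValueError(f"not a complex format: {fmt!r}")
    return prefix + (z_code if prefer_z else letter)

print(normalize_complex_format("<D"))                  # <Zd
print(normalize_complex_format("Zf", prefer_z=False))  # F
```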

:+1:

All of this sounds like a good approach to me, thanks (more aggressive than I would even have pushed for). (Not sure I quite follow why a PEP is needed, but that is irrelevant and up to you anyway. There is certainly room for discussion and a plan change.)

It will be, for sure.

Maybe it would be better to keep the stdlib as it was before, for now, until perhaps a new PEP someday proposes a more consistent extension of the struct format syntax.

I’ve not seen any practical suggestions on how to handle this split-brain situation. My own attempt, even with much less ambitious plans, shows that this is far from trivial.

So I’ll prepare a revert of my PRs that are disturbing the NumPy people: Revert PR 146237 and PR 146241 by skirpichev · Pull Request #148674 · python/cpython · GitHub.

I hope we can keep the changes to the struct and ctypes modules that came with 3.14.

Please quote me in full instead of truncating the sentence you’re replying to:

I did not miss that part, but I also don’t think we should care about it [emphasis mine].

And, really, it’s the second part of the sentence that’s important to me. I’m not interested in theoretical arguments about standards, or whether CPython once worked on legacy Cray machines, and I’ll happily leave that fight to you alone. I’m interested in the practical matter of not deteriorating Python’s usefulness in the scientific ecosystem, on today’s practically existing machines. :wink:

While the SC is deciding what to do, I prepared a draft PR to add Zd/Zf support to array, ctypes, memoryview and struct. The implementation is tricky since, so far, array, ctypes, memoryview and struct only supported a single character to identify a format: my PR changes character formats to strings.

Note: Currently, my PR leaves D/F formats unchanged.
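As a rough illustration of why moving from single characters to strings matters, here is a hypothetical tokenizer (not the code in the PR) that splits a format string into (count, code) pairs, treating Zf/Zd/Zg as one two-character code. It doesn't validate which single letters struct actually accepts, and it ignores the special meaning of counts for 's' and 'p'.

```python
import re

BYTE_ORDER_PREFIXES = ("@", "=", "<", ">", "!")
# Optional repeat count, then a two-character Z code or a single non-Z letter (or '?').
_TOKEN = re.compile(r"(\d*)(Z[fdg]|[a-zA-Y?])")

def tokenize_format(fmt: str) -> list:
    """Split e.g. '<2Zd3i' into [(2, 'Zd'), (3, 'i')], skipping a byte-order prefix."""
    body = fmt[1:] if fmt[:1] in BYTE_ORDER_PREFIXES else fmt
    tokens = []
    pos = 0
    while pos < len(body):
        match = _TOKEN.match(body, pos)
        if match is None:
            # A lone 'Z' (or anything else unknown) is rejected here.
            raise ValueError(f"cannot parse {body[pos:]!r} in {fmt!r}")
        count = int(match.group(1)) if match.group(1) else 1
        tokens.append((count, match.group(2)))
        pos = match.end()
    return tokens

print(tokenize_format("<2Zd3i"))  # [(2, 'Zd'), (3, 'i')]
print(tokenize_format("Zf"))      # [(1, 'Zf')]
```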

I did quote that part in full too. That sentence wasn’t truncated, but split.

Of course. I also said that it’s not important to me either. You make it hard to agree with you :wink:.

Not interested in fighting, but am interested in historical context, which is key to understanding how the standards evolved, and why they still cater to “bizarre” architectures. You’re not required to be interested in that too, but my hope was that especially younger readers would find it educational.

Which is something I’m also interested in, and certainly more directly on-topic in this thread.

@skirpichev was right to correct people who misunderstood what standards actually say here, and as I said to him,

It’s worth giving it some thought, but not much :wink:.

That was all shortly before you chimed in, and the stuff about standards ideally “should have” ended then - unless someone has something new to say about what the standards actually do or don’t guarantee.

I’m weary of repeating that it makes no practical difference in this topic, since all compilers known to be used to compile Python in fact do not insert “mystery padding” inside structs.

The SC is supportive of my PR, so I marked the PR as ready for review. I added more tests and fixed a couple of bugs. For ctypes, F/D/G formats are now replaced with Zf/Zd/Zg, instead of keeping F/D/G formats and adding Zf/Zd/Zg. Reviews are welcome :slight_smile:

I think we should:

  • revert the 3.15 changes
  • keep the 3.14 changes in 3.15
  • write a PEP for 3.16 (I can do that, or co-author)

This is the conservative option, and IMO aligns with SC statement

  • following numpy’s lead: the possible interpretations of that cover our existing choices, so I wouldn’t use this as guidance
  • documentation improvements would come in 3.16 (and be backported where it makes sense)
  • not breaking backward compatibility for 3.14: that essentially locks in what we can do for struct & ctypes (except for starting a deprecation period, which we can do in a later beta if Hugo allows it).

The way forward isn’t clear, let’s fall back and regroup.

Hi,

As a matter of urgency (to make sure that they land in Python 3.15 beta 1), I made two changes to move the Python stdlib towards Zf/Zd formats:

  • Remove F and D formats in array and memoryview.
  • Change ctypes _type_ from F/D/G to Zf/Zd/Zg – backward incompatible change made on purpose to enhance compatibility with numpy.
  • Add Zf/Zd formats to array, memoryview and struct.

These changes should enhance the stdlib compatibility with numpy in both directions.

Summary of supported complex formats.

Python 3.14:

  • struct (2): F/D
  • array (0): none
  • memoryview (0): none
  • ctypes (3): F/D/G

Python 3.15 with this PR:

  • struct (4): F/D and Zf/Zd
  • array (2): Zf/Zd
  • memoryview (2): Zf/Zd
  • ctypes (3): Zf/Zd/Zg

TODO: decide whether to recommend F/D or Zf/Zd formats in the struct module in Python 3.16.
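Since support differs between versions, a hypothetical helper like this can probe at runtime which complex codes the running struct module accepts, instead of hard-coding version checks. The output depends on the Python version, so no expected values are given.

```python
import struct

def struct_supports(code: str) -> bool:
    """Return True if this Python's struct module accepts the given format code."""
    try:
        struct.calcsize(code)
    except struct.error:
        return False
    return True

for code in ("F", "D", "Zf", "Zd"):
    print(f"{code!r}: {struct_supports(code)}")
```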

Note: Thanks Sergey B Kirpichev for his great work to support complex types in array, ctypes, memoryview and struct!

Example: ctypes

import ctypes
import numpy as np
a = (ctypes.c_double_complex * 3)()
m = memoryview(a)
print(f"Format: {m.format}")
a2 = np.array(a)
print(a2)

Output:

Format: <Zd
[0.+0.j 0.+0.j 0.+0.j]

Example: numpy array to memoryview

import numpy as np
a = np.array([1, 2, 3], dtype='D')
# The numpy dtype 'D' is converted to PEP 3118 buffer format 'Zd'
m = memoryview(a)
print(f"Format: {m.format}")
print(f"tolist(): {m.tolist()}")

Output:

Format: Zd
tolist(): [(1+0j), (2+0j), (3+0j)]