Choice of complex buffer protocol format intentional break with PEP?

Good work! One background concern, which may well be ignorable.

```
>>> numpy.array([1j], dtype="D")
array([0.+1.j])
>>> numpy.array([1j], dtype="Zd")
Traceback (most recent call last):
  File "<python-input-10>", line 1, in <module>
TypeError: data type 'Zd' not understood
```

That is, numpy already uses ‘D’ for this purpose, and does not accept ‘Zd’ in this context. But numpy arrays aren’t Python array.arrays, and for all I know heavy numpy users (which I’m not) may well not expect them to work alike in any way.

So if no “numpy people” chime in, by all means ignore this.


I opened a PR that removes F/D from the table (moving them to a note) and adds a deprecation.

The current situation is at least better than your original PR: we don’t have to support duplicated types.

That was exactly my concern about this approach. And it seems, NumPy people are not willing to mitigate incompatibilities.

The proposal: gh-148675: Add Zd/Zf formats to array, ctypes, memoryview, struct by vstinner · Pull Request #148676 · python/cpython · GitHub
Answer: gh-148675: Add Zd/Zf formats to array, ctypes, memoryview, struct by vstinner · Pull Request #148676 · python/cpython · GitHub

BTW, incompatible type codes were introduced because “people wanted a more general and extensible design”.

For sure, they will (remember how this interface was introduced). The ‘F’/‘D’ type codes are part of NumPy’s user-facing API; buffer codes are more hidden.

And it seems, NumPy people are not willing to mitigate incompatibilities.

Give them time. :wink:
To be fair, the comment made was “After a first look at this: probably not.”
The CPython developers have had more time to struggle with the choices and implications…

I would not be surprised if, in the process of adding 'Ze' and 'Zbf16' (or similar), NumPy developers decide there’s no harm in numpy.array() accepting dtype='Zf' and dtype='Zd' as well. We shall see. Anyway, Tim’s example of numpy.array([1j], dtype="Zd") not being understood by NumPy can be changed in NumPy if “numpy people” wish it. As Petr pointed out in the SC issue, it’s always possible to accept both values (e.g., both 'Zd' and 'D'), but only one of them can be chosen when providing a format string. I’m very happy to see that everyone has converged on providing the PEP 3118 format strings. Becoming more liberal in what to accept in a user-facing API can happen as needed/desirable.
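To make the “accept both, emit one” idea concrete, here is a minimal sketch. The alias table and the helper name normalize_complex_code are hypothetical, made up for illustration; this is not how CPython or NumPy actually implement it.

```python
# Hypothetical alias table: NumPy-style single-letter codes mapped to the
# PEP 3118 spellings. Both names below are illustrative assumptions.
ALIASES = {"F": "Zf", "D": "Zd"}
CANONICAL = {"Zf", "Zd"}


def normalize_complex_code(code: str) -> str:
    """Accept either spelling as input, but always return the canonical one."""
    canonical = ALIASES.get(code, code)
    if canonical not in CANONICAL:
        raise ValueError(f"not a complex type code: {code!r}")
    return canonical


# Either spelling is accepted; only 'Zd' is ever reported back.
print(normalize_complex_code("D"))
print(normalize_complex_code("Zd"))
```

The point is only that liberal input and a single output spelling are compatible; which spelling is canonical is exactly what the thread is debating.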

I’m looking forward to @ngoldbaum’s PEP and to general documentation improvements!

I’m also genuinely impressed that people here argued for their point of view passionately and based on technical merits and that major changes/improvements were made despite the time pressure. In other places, unrelated to Python and which I won’t name, I’ve seen less effective and less professional levels of teamwork. And, perhaps worse, places where nobody cares.


Well, the main issue is the buffer protocol and there we would certainly add support to import/consume D if Python used it for array.array.
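For readers less familiar with where these codes surface: the buffer protocol format is what memoryview reports for an exporter. A small stdlib-only sketch, using a plain double array so it does not depend on which complex spelling a given Python version ends up with:

```python
import array

# array.array exports its typecode as the buffer format string.
doubles = array.array("d", [1.0, 2.0])
view = memoryview(doubles)

fmt = view.format        # the format string a consumer such as NumPy would see
itemsize = view.itemsize  # bytes per element, as reported by the exporter

print(fmt, itemsize)
```

A consumer has to understand whatever string appears in view.format, which is why the choice between 'D' and 'Zd' matters more here than in user-facing dtype arguments.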

So do we want to allow np.array([1], dtype="Zd")? Maybe, but I am not convinced yet; the only reason would be that Python uses it and users get really confused by it not working.

Otherwise, np.array([1], dtype="complex128"/np.complex128) is nicer anyway. We have a much more niche shorthand, np.zeros(10, dtype="i,D"), for structured dtypes that currently only supports the single-character codes.
That doesn’t match struct module syntax, though. But sure, if people ask for it, I don’t mind just adding Zd as a supported alias (though I could also see a np.dtype.from_struct() or so to support the identical syntax).

For us it would be adding a 4th (or so) spelling, and transitioning users away from D seems not remotely enticing. So yeah, it is not easy to just say “sure, adding that is clearly a good API addition”.

Which suggests to me that we (CPython) would be best off leaving complex types out of array.array entirely until coordinated PEP/NEPs settle on a coherent vision. I don’t recall people asking for complex types in array.array, it was more a “purity” thing.

Which was news to me! I didn’t realize dtype supported all-but-self-evident names too. That’s what I’d use.

ctypes doesn’t use “type codes”; it uses classes, like ctypes.c_double_complex. It doesn’t accept “type codes” as input, although it exposes them via an instance’s _type_ attribute:

>>> import ctypes
>>> x = ctypes.c_double(3.14)
>>> x
c_double(3.14)
>>> x._type_
'd'
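The same check as a script, plus the buffer format ctypes exports for the instance. This is a sketch: ctypes.c_double_complex only exists on newer Python versions and some platforms, so it is looked up defensively and nothing is assumed about its code.

```python
import ctypes

x = ctypes.c_double(3.14)
type_code = x._type_                  # the internal one-letter code
buffer_format = memoryview(x).format  # ctypes adds a byte-order prefix

print(type_code, buffer_format)

# c_double_complex may be missing; report its code only if it is available.
complex_cls = getattr(ctypes, "c_double_complex", None)
if complex_cls is not None:
    print(complex_cls._type_)
```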

I don’t really care about multiple spellings for the same thing. For a dirt simple example, there are at least 6 ways to spell the ASCII character ‘a’ in a Python string literal:

  1. ‘a’
  2. octal escape - ‘\141’
  3. hex escape - ‘\x61’
  4. “short” Unicode escape - ‘\u0061’
  5. “long” Unicode escape - ‘\U00000061’
  6. named Unicode escape - ‘\N{LATIN SMALL LETTER A}’
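A quick way to check that the six spellings above really denote the same string:

```python
# The six literal spellings from the list, in the same order.
spellings = [
    'a',
    '\141',                      # octal escape
    '\x61',                      # hex escape
    '\u0061',                    # "short" Unicode escape
    '\U00000061',                # "long" Unicode escape
    '\N{LATIN SMALL LETTER A}',  # named Unicode escape
]

# Every entry should compare equal to the plain literal.
all_same = all(s == 'a' for s in spellings)
print(all_same)
```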

Such things shouldn’t proliferate “beyond reason”, but backward compatibility and interoperability with major adjacent software are reasons enough.


Well, alternatively we could add ‘F’/‘D’ as aliases for struct/array. That means still using 'Zf'/'Zd' for array.typecode and just accepting a different spelling as input. For the struct module, that means moving these types out of the table (into the notes), but without deprecation.

Though I would prefer to keep just one variant. Maybe we should wait here for user feedback first.

I hope that Python could better match NumPy conventions.

This might look too conservative now, but I still support this.

Recent changes are useful (at least for me), but much less so than complex types in ctypes. So removing the 3.15 changes (array/memoryview) and deprecating the struct changes from 3.14 seems to be a good, safe option. Maybe the PEP authors could find a better way to deal with NumPy incompatibilities.

PS: Meanwhile, I opened a docs issue about the current implementation of PEP 3118 in CPython.