It’s an accepted PEP and it’s in the “standards track” category. Many projects implement it, treating it (rightfully) as a specification. But if some CPython developer would rather believe it is not a specification, then apparently it’s not.
I don’t want to waste my breath over this. If CPython wants to violate a PEP that its own community accepted 20 years ago, stifling interoperability with third-party libraries that implement the PEP properly, then so be it.
Let’s not discuss how bad the docs are, or if there are holes? Both historically and practically, NumPy is more important than holes in the docs. I.e. in practice projects will implement exactly as much as they need to talk to NumPy, Cython, nanobind, and pybind11 (as you can see, nobody probably even noticed memoryview not liking the correct format JAX uses; and most simply don’t need structured types or byte-order stuff at all).
There are two questions here:
Does it make sense for Python to just use Zd in the buffer protocol for array.array and maybe also for ctypes or is this impractical? (if it doesn’t for ctypes, then you may want to keep array.array consistent with it just for the heck of it.)
If it is practical, would it be reasonable to break ctypes on grounds of it being a meaningless part of the buffer protocol ecosystem?
And otherwise, there are just the relatively straightforward asks I mentioned above, I think:
Python should accept Zd as the correct/dominant spelling indefinitely, probably. And since it isn’t hard, let’s just implement it for importing everywhere out of principle (unless there is a very hard corner somewhere, but I doubt it).
We should make sure there is no documentation that tells people that D is the correct spelling here (e.g. in the 3.15 release notes), unless there is a warning that Zd is the actual adopted main standard for the time being.
The reason for both is just to be clear that users who want maximum interoperability (i.e. the typical reason to implement it) must not start exporting D for a few years at least (or maybe only for something like Python 3.16/3.17+ if they are happy to do Python version switches).
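For exporters that are happy to do version switches, the choice could be sketched roughly like this. The helper name and the (3, 17) cutoff are made up for illustration; nothing in this thread fixes an actual version.

```python
import sys

# Hypothetical cutoff: the first Python version from which an exporter is
# willing to emit "D". This is a placeholder, not an agreed-upon number.
D_EXPORT_CUTOFF = (3, 17)


def complex128_export_format(version_info=sys.version_info, cutoff=D_EXPORT_CUTOFF):
    """Pick the buffer format string to export for a complex double."""
    # "Zd" is the PEP 3118 spelling that existing consumers already understand,
    # so it stays the default below the cutoff.
    if tuple(version_info[:2]) >= tuple(cutoff):
        return "D"
    return "Zd"


print(complex128_export_format((3, 12, 0)))
print(complex128_export_format((3, 17, 0)))
```

The point of defaulting to Zd is that an exporter only switches once it has decided its consumers can all read D.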
We don’t even have to answer those questions in total depth. I am not sure how much it is worth it, because as I said above… This isn’t the end of the world. You are basically extending the buffer protocol for the sake of Python’s convenience. That is mildly annoying for giving two ways of spelling things, but otherwise it doesn’t break anyone (unless they are naive enough to follow Python).
This is rather a separate issue. memoryview probably could accept the = qualifier in simple cases (I can’t immediately think of a case where it would have a different effect than the default @).
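One way to check which single-item codes behave the same under both prefixes is to ask the struct module. This is only a sketch: it compares struct's view of @ and =, not what memoryview itself accepts, and the helper name is made up. Codes such as l have a platform-dependent native size, so the loop prints results rather than assuming them.

```python
import struct


def same_layout(code):
    """True if a single `code` item has the same size and bytes under '@' and '='."""
    native, standard = "@" + code, "=" + code
    if struct.calcsize(native) != struct.calcsize(standard):
        return False
    # Pack one sample value in both modes and compare the raw bytes.
    sample = 1.5 if code in "fd" else 1
    return struct.pack(native, sample) == struct.pack(standard, sample)


for code in "fdil":
    print(code, same_layout(code))
```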
Sadly, the process is broken – the PEP was marked “Final” without the changes that it specifies being implemented. Note that it specifically describes changing the struct module, ctypes, and the C API; the changes were objectively not done.
Whether such a PEP should be binding 20 years later, that’s a question for the SC – one that’s now answered. You can appeal of course.
I’d hope NumPy would be taken into account when adding new stuff. But that’s NumPy, not necessarily PEP 3118. Wording like “complex (whatever the next specifier is)” works in a change document, but as documentation of an interop format it’s unfortunate at best.
Well, as I explained in more detail in PEP 3118: Add canonical-doc & mention unimplemented changes. by encukou · Pull Request #4200 · python/peps · GitHub, NumPy was the primary motivation for PEP 3118 and its author @teoliphant. While the section about format codes is unfortunately titled “Additions to the struct string-syntax”, the entire PEP is about specifying the new Python 3 buffer protocol, and has to be understood as such. The “struct string-syntax” additions are really format codes for the buffer protocol, and as such many of them were implemented long ago (by NumPy and other third-party projects).
More generally, it’s useful to remember that PEPs - especially “standards track” PEPs - standardize behavior at the ecosystem level, and that CPython is not the only point of reference. Commonly, CPython is the first implementor and third-party projects tend to lag a bit, but here the reverse has happened.
I really think this is all just an “ooops”, and whether I say it’s violating a PEP or not doesn’t matter. It still is just an oops.
That said, I think we should just agree that “Python the community” adopted the PEP (at least this part). In fact, this is clearly a vital part to the PEP for the community (we clearly need to exchange complex numerical data!).
Python the project is basically irrelevant with respect to broad dtype support, which is presumably the main reason it wasn’t implemented.
Python the project provides the vital infrastructure and the specification, though; it (plus the core libraries) is probably just not an important user at all.
Some of the proposed struct module additions look far from straightforward; I find that section of the PEP significantly lacking in details and motivation.
or
I think a lot of this discussion needs to go back to python-dev; with luck, we can get some advice and clarifications from the PEP authors there. I’m not sure whether it’s appropriate to modify the original PEP (especially since it’s already accepted), or whether it would be better to produce a separate document describing the proposed changes in detail.
How can we be sure that they do this properly? Assuming you are convinced that the PEP is “lacking details” here: are there better documents in the NumPy docs? No, I don’t see them: they just reference the PEP.
I think it shouldn’t be much more difficult than for the array module in the above patch.
That’s something I would like to avoid. Better if either CPython and its stdlib OR the Python ecosystem converge on one convention ('D' or 'Zd'). And I believe that it makes some sense to change the convention in NumPy.
More generally, it’s useful to remember that PEPs - especially “standards track” PEPs - standardize behavior at the ecosystem level, and that CPython is not the only point of reference.
Yes, but: they should be written that way. PEP 3118 doesn’t read, and more importantly work, as an interop standard. As I read the document, I can’t help inferring that this job was left to the struct docs, and never finished.
We should definitely, 100%, be looking at NumPy when touching the struct/array/ctypes format codes. (Note the See Also boxes in their docs, added specifically to nudge contributors toward not diverging more.)
Let’s add more projects to look at, and/or write an actual living standard. But, let this immutable and underspecified PEP rest; it has served its purpose.
Process-wise, ecosystem-standard PEPs really need at least a reference in the CPython docs where/when they touch the reference implementation. Requiring knowledge of a vaguely written 20-year-old document to contribute to CPython is not practical.
(Note again the NumPy mentions in See Also boxes – if we had a doc that’s usable as a standard, that’s where they’d point.)
Well, this is maybe the main reason why I started this thread: this evolution isn’t obvious. Now, I am always for being pragmatic and just doing such a switch after assessing that the fallout should be small (i.e. look at impact rather than the “classification” of the change).
But the correct starting classification here is that transitioning from Zd to D is a Python stable API break! (Which sounds much worse than it is.)
And that means, without piggy-backing it on another buffer protocol extension, the best you can do is to very slowly deprecate it. But if you start deprecating (in say 3.16+) you are accepting that downstream needs version specific switches.
Plus, I am not sure that NumPy can follow that deprecation, because breaking interop with Python’s memoryview is far less important than breaking a handful of small stable API extensions that might exist. (I.e. we may want to wait for a 3.0 and a few years at least.)
Indeed, and I think we may also want to step back a bit.
NumPy and several other packages have implemented format codes as specified in PEP 3118 (regardless of whether some people think PEP 3118 is an actual “specification” or not).
Much later, CPython comes and decides to implement the same data types, but with different format codes, because PEP 3118 has ostensibly become “historical”… by decision of some CPython core devs.
I don’t think CPython should be considered a special citizen in the ecosystem. For all practical purposes, the standard array.array module is not a special citizen, it’s actually a niche module that probably gets little production use (i.e. breaking something in array.array is much less severe than breaking something in NumPy).
Obviously some project(s) will have to shoulder a compatibility burden but, given the above, that project should logically be CPython.
(I’m saying “CPython” because, while there are other Python implementations, I’m not sure they have caught up with those recent changes in array.array – PyPy hasn’t)
I think this raises a larger question about the scope of PEPs. Can they reasonably set the standard for an entire Python ecosystem? Or do they primarily reflect the reference implementation of CPython? I don’t have the answer here.
For the specific issue on the table, I don’t think there was ill intent by anyone. I appreciate those advocating for NumPy since it is the “de facto” standard for arrays at scale. I also think everyone, from the PEP editors to the SC, has been doing their best and is human.
So, where do we go from here? Perhaps:
Outline the specific impact on NumPy now in a single document.
Determine what, if any, mitigations CPython and the PEP process can take to address this impact.
See how we can work together to improve arrays for everyone.
I don’t have an answer either, but this issue reminds me of the interoperability standards we have (and define via PEPs) in the packaging community. They are created via a PEP, but then retained as “living documents” in the packaging specifications documentation, and updated either via PR (for small changes) or follow-up PEPs. It’s arguable that the PEP process is too heavyweight for some of the standards we apply it to, but it does have the significant advantage that it explicitly requires people to consider how tools will adopt the standard, how users will be taught about the new standard, etc.
Core Python has something similar in informational PEPs like the WSGI and DB-API standards, but I don’t think the core process has the same focus on interoperability (as opposed to formal definition) that the packaging process does.
There’s plenty that’s less than ideal with the packaging process, but maybe the core could benefit from something similar for situations like this? Although there’s an argument about whether the core is even the right “owner” for this sort of standard, or whether the core should be considered one of the clients of the standards defined by the scientific community (as @pitrou suggested). That would put the onus on the scientific community to involve the core, rather than the other way around.
This seems like a good approach for the immediate issue, where my comments above are more about the longer term.
I agree that this feels like some of the packaging work, where we have slowly been making good progress with standards. Thanks for sharing that. It was going through my mind when I typed the last response.
I am not sure about the PEP process question. I agree, this PEP defines an interoperability protocol and that does seem similar to the packaging PEPs. (EDIT: One should clarify that parts of the PEP do, not all of it)
But I think it also has to be a PEP because it defines infrastructure that is provided by the Python core and honestly, it’s a great place to host this if willing (because in the end it isn’t just one community here as well, I am sure)!
I am actually happy to see things moving; in the end this is just a really minor nuisance! But, originally, I had hoped Python could just switch to using Zd and be done with it.
From an impact perspective… If Python finds it inconvenient or not possible to use Zd that just means we extended the buffer protocol with a second way to spell Zd:
We will have the Zd and D conventions used side by side indefinitely.
This is surprising but overall a pretty small nuisance (mainly, users should use Zd for export for now, and can consider starting to use D in a few years).
Downstream will add support for D so that you can e.g. pass array.array to Cython functions.
This is a very easy fix, but it may take time because you only notice it doesn’t work when a user tries it. So then downstream may say: “Yeah, this is due to a Python ooops, let’s add support for D”, and that should be that.
For Python: When importing you should probably accept Zd (very easy anyway).
Honestly, it’s easy and very little churn. The hardest part in all of this is fixing up the docs to remove the confusion in the future.
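To show how little code accepting both spellings on import takes, here is a rough sketch using only the struct module. The alias table reflects the equivalences assumed in this discussion (D/F as the newer spellings of Zd/Zf); the helper names are made up and this is not how any particular project implements it.

```python
import struct

# Assumed equivalences, per this discussion: "D"/"F" are the newer spellings,
# "Zd"/"Zf" the PEP 3118 ones.
ALIASES = {"D": "Zd", "F": "Zf"}
# struct code for each of the two components (real, imag).
COMPONENT = {"Zd": "d", "Zf": "f"}
ORDER_PREFIXES = "@=<>!"


def split_complex_format(fmt):
    """Return (byte-order prefix, canonical PEP 3118 code) for one complex item."""
    order = ""
    if fmt and fmt[0] in ORDER_PREFIXES:
        order, fmt = fmt[0], fmt[1:]
    canonical = ALIASES.get(fmt, fmt)
    if canonical not in COMPONENT:
        raise ValueError(f"not a supported complex format: {fmt!r}")
    return order, canonical


def unpack_complex(fmt, data):
    """Decode one complex number from `data`, accepting either spelling."""
    order, canonical = split_complex_format(fmt)
    # A complex item is read as two consecutive floats or doubles.
    real, imag = struct.unpack(order + "2" + COMPONENT[canonical], data)
    return complex(real, imag)


raw = struct.pack("2d", 1.5, -2.0)
print(unpack_complex("Zd", raw), unpack_complex("D", raw))
```

Normalizing to one canonical code right at the boundary means the rest of an importer never has to know there are two spellings.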
Play fair. @skirpichev quoted Mark verbatim, and linked to his original comment for those who want full context.
Mark resigned his CPython core dev position well over a year ago, and is rarely active here anymore. It’s unlikely we’ll hear from him. He was a major contributor to both CPython and numpy, and always worth listening to.
You know, I read Mark’s message that was linked to, and it doesn’t talk about the format code for complex numbers. Which is what we’re discussing here (see discussion title).
It’s one of the things being discussed here, yes, but not the only thing. Unless I’m mistaken, you too believe the existing docs are inadequate.
Touché!
I don’t know what this refers to. @skirpichev quoted Mark verbatim, with attribution, and linked to Mark’s original. He didn’t “put words in” Mark’s mouth. He pointed them out.
In any case, PEP 3118 is - as it says at the top - only of historical interest now.
Much of it wasn’t implemented, neither at the time nor later. The PEP is also irrelevant to that now:
Not all features proposed here were implemented. …
This PEP targets Python 3.0, which was released more than a decade ago. Any proposals to add missing functionality should be discussed as new features, not treated as finishing the implementation of this PEP.
I agree it’s easy to live with minor glitches for the single type raised in this topic. But it’s not “one and done”. An unexpected offshoot of the rise of machine learning apps is a proliferation of new floating-point formats, with ever less precision. There are two 16-bit floating types in wide use in that application area now, and NVIDIA has newer HW support for an 8(!)-bit floating type too.
numpy is also “behind the curve” on those, but it’s just a matter of time.
The larger issues persist regardless of what’s done for complex-number types.
It’s really the primary reason people are posting. The docs being “inadequate” didn’t stop PEP 3118 from being implemented broadly – and correctly – in the ecosystem, because those implementors understood how to interpret it (or, I suppose that, when they didn’t, they were wise enough to look around and see what other implementors did – especially NumPy).
CPython and the SC do not have a magic power to decree that an existing widely-used standard is “historical”, especially if they don’t have anything marginally better to propose. That’s like the Académie française trying to tell people that the French they speak is not real French [1].
This PEP is very much being relied on in production, every day, for exchanging data between packages of the Python scientific ecosystem. It’s actually an extremely successful PEP that has opened up possibilities that didn’t exist with the legacy Python 2 buffer protocol.
In any case, this was already explained above, and it would probably help if we didn’t rehash the same arguments over and over.