Add bfloat16 support to struct

I don’t think it matters for the purposes of this suggestion if MSVC never supports 16-bit float types of any kind. It could matter to ctypes, but not to struct. We’re not proposing to do arithmetic on such types, just to unpack/pack the storage formats to/from Python’s floats (64-bit floats).

If, e.g., an ML researcher wants to pick through a 2 GB file of bfloat results on Windows, using Python, the compiler used to build Python is irrelevant. Nothing about bfloat matters in that context except the raw bits, which Python can rewrite into native 64-bit float format with no support from the compiler (beyond general-purpose int arithmetic, shifting, and logical operations).

Likewise in the other direction. There’s never a need for Python to do any bfloat arithmetic.

That some compilers do support it isn’t a strong argument to me. It’s only suggestive that the type has enough use to convince other OSS projects to devote much more extensive efforts (than is being asked of Python) to support it “for real”.

What’s being asked for here is relatively tiny (in the absence of “scope creep”), and implementation of the little that’s wanted is wholly divorced from compiler support for the type.

2 Likes

Neither are complex types. Yet they are conditionally available in ctypes.

But it could. What do you think about this? The strongest argument I have seen here so far (besides it being backed by the C standard): numpy has it.

Fine by me. My understanding is that the only mandatory floating types in C++23 remain float, double, and long double. But their bit widths are not defined by the standard. Most variance in real life now is probably in long double, which is most often IEEE quad (128 bits) or double extended (80 bits).

All other float types are optional, including, e.g., float64_t. They all have defined bit widths.

The inclusion of complex in struct had nothing to do with standards, but with the fact that Python itself offered its own complex type early on.

Primarily that it’s out of scope for this topic :wink: I don’t see much sense in ctypes trying to deal with types that aren’t native to the platform C. ctypes seems intended to expose platform quirks. In which case it “should” support float types of all and only the kinds the platform C supports.

So, e.g., it’s fine by me that ctypes supports no flavor of complex on Windows. But struct must, because Python supports “complex” on all platforms, and part of struct’s job is to create portable representations (if you use it in a mode that suppresses native C padding and endianness decisions).

I never really grasped why array does not support complex numbers. Although I never had a real use for that myself, so it never rose to the level of a scratch I had to itch :wink:

Given that this conversation ended up in the IEEE float16 “e” being added to array.array, I wonder if it wouldn’t be worth adding bfloat16 to array.array right now as well.

Maybe it would be less confusing than adding a 16-bit FP format in Python 3.15 that is “not quite usable in some ML workloads”, and then coming back 1 or 2 years later with another 16-bit FP format that can be used in those workloads.

(And as a consequence, just put forward adding it to struct as well).

If these get rolling, I can help immediately with the “grunt work” for docs and such (up to a full implementation if it comes to that).

1 Like

IEEE 16-bit floats are not a new thing for the struct module & co. It’s not a new storage format.

But this is a new type, and that requires entirely different argumentation. Why should it be added? Why not also add other data types introduced as language extensions by some compilers, e.g. __int128? And so on.

ML tasks are not the only use case.

Coding is not the problem here. But IMO it deserves a PEP, where possible alternatives could be explored.

1 Like

Agreed - and I believe most of that is in the thread already.

The new factor that motivated my posting today is just that ‘e’ has just been added to array.array (and yes, it’s been in struct for years), and maybe nowadays the bfloat16 format has more uses “in the real world” than the implemented _Float16, and it would look weird to offer one format right now and come with another format in one year or so.

That said, none of this is impossible to work around with a third-party package, and certainly there could be one that just adds all types with a minimum of traction.

You do not need bfloat16 support in struct to use bfloat16. You can unpack bfloat16 as a 2-byte bytes object or a 16-bit integer and then convert it to a float using bit manipulations. You can do this in pure Python, or you can write an extension containing the conversion functions and put it on PyPI.
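As a rough illustration, a pure-Python round trip might look like this (the helper names are invented; bfloat16 is simply the top 16 bits of an IEEE-754 binary32, and the sketch uses round-half-to-even for the discarded bits, ignoring NaN corner cases):

```python
import struct

def bfloat16_to_float(raw: bytes) -> float:
    """Unpack 2 little-endian bfloat16 bytes into a Python float."""
    (bits,) = struct.unpack("<H", raw)
    # Shifting left by 16 reconstructs a full float32 bit pattern.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def float_to_bfloat16(x: float) -> bytes:
    """Pack a Python float into 2 little-endian bfloat16 bytes."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round the discarded low 16 bits half-to-even before truncating.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.pack("<H", (bits >> 16) & 0xFFFF)
```

Note that this needs nothing from the compiler beyond what struct already offers for 'H', 'I' and 'f'.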

What bfloat16 support in struct gives you is convenience (and a little performance). But you could only benefit from it several years after the release of the Python version containing this feature. In the meantime, you will need to use conversion functions written by you or by a third party.

With such a long time perspective, I think it is time to design a way to register custom types. If this mechanism were in place, it would be easy to use third-party providers with all the benefits of builtin support. We could eventually add builtin support for bfloat16, bfloat8, float24 (needed for WAV files, I guess?), float80 (which really was needed for AIFC files), float128, int128, etc., and you could use these types long before that.

For performance, we need a C API for registering custom types, but for convenience we also need a Python API. The registered item should provide four elements:

  • size
  • alignment
  • packing function
  • unpacking function

I do not have good ideas about supporting variable-sized types, like Python or C strings, so I think we can do without this.

A global registry should be enough, as for encodings and error handlers.

struct, memoryview and array should support types with long names; the syntax should not conflict with possible future support of nested structures and arrays (see PEP 3118).
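A very rough Python-level sketch of what such a registry could look like (all names here are invented for illustration; nothing like this exists in struct today), with int128 as an example third-party provider:

```python
import struct

# Hypothetical global registry: name -> (size, alignment, pack, unpack).
_REGISTRY = {}

def register_format(name, *, size, alignment, pack, unpack):
    """Register a fixed-size custom type under a long name."""
    _REGISTRY[name] = (size, alignment, pack, unpack)

def pack_custom(name, value):
    size, _align, pack, _unpack = _REGISTRY[name]
    data = pack(value)
    if len(data) != size:
        raise struct.error(f"{name}: expected {size} bytes, got {len(data)}")
    return data

def unpack_custom(name, data):
    size, _align, _pack, unpack = _REGISTRY[name]
    return unpack(data[:size])

# Example provider: a 128-bit signed little-endian integer.
register_format(
    "int128", size=16, alignment=16,
    pack=lambda v: v.to_bytes(16, "little", signed=True),
    unpack=lambda b: int.from_bytes(b, "little", signed=True),
)
```

The real design would of course hang off struct itself and define how long names are spelled in format strings; the sketch only shows the four registered elements in action.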

5 Likes

Maybe. Maybe not. How to verify this claim?

Again, there is nothing new in the 'e' format. It has been here for a decade: Add half-float (16-bit) support to struct module · Issue #55943 · python/cpython · GitHub (BTW, see the argumentation for adding it). It has also been supported by memoryview for several years.
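For instance, the existing 'e' format already round-trips binary16 today:

```python
import struct

# struct has supported IEEE-754 binary16 via the "e" format code since
# Python 3.6; 1.5 encodes as bit pattern 0x3E00.
data = struct.pack("<e", 1.5)
(value,) = struct.unpack("<e", data)
assert data == b"\x00>" and value == 1.5
```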

Adding that type to the array module improves consistency across stdlib modules, but it does not bring anything really new. Adding the bfloat16 storage format does.

Exactly. That’s why I think that the PEP is desirable here.

This I can get behind. It would be awesome if Python could standardize a way to do this.

One consequence of this if it were threaded into the buffer protocol is that Cython would be able to use this interface to enable passing buffers directly to native functions as typed memoryviews using custom types.

NumPy and Arrow both have elaborate APIs for defining custom data types, there’s a lot of prior art here.

7 Likes