Add bfloat16 support to struct

I’d like to suggest that the bfloat16 type is now popular/useful enough to be worth a struct format character.
For example, C++23 has std::bfloat16_t.
Some compilers (e.g., GCC, Clang) support __bf16 for C and C++.
Intel and AMD are adding AVX10.2 to x86_64, which will support bfloat16 arithmetic on both server CPUs (e.g., Diamond Rapids) and client CPUs (e.g., Nova Lake). For AArch64, bfloat16 instructions are added with the feature FEAT_SVE_B16B16.

If folks agree, what format character to use? How about m, for machine learning? I don’t love it, but I don’t have a better idea. I would suggest reserving g for std::float128_t. The ctypes module uses g for long double, which may be considered EOL by the time we want to add 128-bit floating-point numbers….

How does it differ from the ‘e’ conversion code?

1 Like

The e code is used for IEEE binary16 floating-point numbers. These have 1 sign bit, 5 exponent bits, and 10 significand bits. A C compiler names this type _Float16.

A bfloat16 has 1 sign bit, 8 exponent bits, and 7 significand bits. It’s inspired by IEEE binary32 (float), but with less precision.

Here’s 1.0f (binary32):

            S ---E8--- ----------F23----------
    Binary: 0 01111111 00000000000000000000000

and 1.0bf16 (bfloat16):

            S ---E8--- --F7---
    Binary: 0 01111111 0000000

and 1.0f16 (binary16):

            S -E5-- ---F10----
    Binary: 0 01111 0000000000
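The layouts above can be checked in Python today. The small helper below splits a raw encoding into its (sign, exponent, fraction) fields; the binary32 and binary16 encodings come straight from struct, and the bfloat16 encoding is just the top 16 bits of the binary32 one:

```python
import struct

def fields(bits: int, exp_bits: int, frac_bits: int):
    """Split a raw float encoding into (sign, exponent, fraction)."""
    sign = bits >> (exp_bits + frac_bits)
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    frac = bits & ((1 << frac_bits) - 1)
    return sign, exp, frac

# binary32 encoding of 1.0
(b32,) = struct.unpack('<I', struct.pack('<f', 1.0))
print(fields(b32, 8, 23))   # (0, 127, 0)

# binary16 encoding of 1.0, via struct's existing 'e' code
(b16,) = struct.unpack('<H', struct.pack('<e', 1.0))
print(fields(b16, 5, 10))   # (0, 15, 0)

# bfloat16 encoding of 1.0: the top 16 bits of the binary32 encoding
bf16 = b32 >> 16
print(fields(bf16, 8, 7))   # (0, 127, 0)
```

Note that binary32 and bfloat16 share the same exponent field (biased value 127 for 1.0), while binary16 uses a 5-bit exponent (biased value 15).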

What are the use cases for this? Is this for parsing/creating binary files that use it as a field? Or is it meant to be a way to do different-precision arithmetic?

Arithmetic. It’s used frequently in machine learning models, often with the focus on reducing memory usage. On some hardware it improves speed as well, but I have seen use cases where memory savings were the main goal and efficiency was secondary. If you have a model that needs 40 GB of RAM to run in fp32, then having a 20 GB bf16 version is valuable because it allows more machines to actually fit the model at all.

It reduces disk storage as well, but in my experience the in-memory savings are the big motivation.

edit: There are 8-bit reduced-precision data types used in ML too, but those are much rarer. In practice, 16 bits is enough for a lot of models. Training typically uses a mixture of fp32 and bf16, with operations that matter more for numerical stability left in fp32 and the rest in bf16. For model inference, bf16 tends to be enough.

1 Like

The heavy-duty arithmetic would be done in a Python extension written in, say, C++. The Python struct facilitates data transfer between the main Python application and the C/C++ layer.

Python’s struct added support for 16-bit floats in the 3.6 release.

Changed in version 3.6: Added support for the 'e' format.

Python 3.14.1 (tags/v3.14.1:57e0d17, Dec  2 2025, 14:05:07) [MSC v.1944 64 bit (AMD64)] on win32
...
>>> import struct
>>> struct.pack('e', 1.1)
b'f<'
>>> struct.unpack('e', _)
(1.099609375,)

Is that not what you want?

1 Like

No, it’s not. That packs binary16, not bfloat16:

>>> u, = struct.unpack('H', struct.pack('e', 1.0))
>>> bin(u)
'0b11110000000000'

I want ‘0b11111110000000’
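In the meantime, bfloat16 packing can be emulated with struct’s existing codes, since the encoding is just the top half of a binary32. A minimal sketch (note this truncates the low bits rather than doing round-to-nearest-even, so it loses a little accuracy compared to a proper conversion):

```python
import struct

def pack_bf16(x: float) -> bytes:
    """Pack x as bfloat16 by truncating its binary32 encoding (no rounding)."""
    (u,) = struct.unpack('<I', struct.pack('<f', x))
    return struct.pack('<H', u >> 16)

def unpack_bf16(b: bytes) -> float:
    """Widen a bfloat16 encoding back to binary32; this direction is exact."""
    (u,) = struct.unpack('<H', b)
    (x,) = struct.unpack('<f', struct.pack('<I', u << 16))
    return x

(u,) = struct.unpack('<H', pack_bf16(1.0))
print(bin(u))                          # 0b11111110000000
print(unpack_bf16(pack_bf16(1.1)))     # 1.09375 (truncation, not rounding)
```

With round-to-nearest-even, 1.1 would come back as 1.1015625 instead; a built-in 'm' code would presumably round, as 'e' does.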

1 Like

Ah! Got it. Python only supports one kind of floating-point format each for the 64-bit, 32-bit, and 16-bit flavors of floats, all following the IEEE 754 specs. Your best bet is that a complete PR may be accepted, meaning one that does everything:

  • Adds the new type to struct, memoryview, and array.
  • Tests.
  • Doc changes.
  • NEWS blurb.

But I think that will have to come from a motivated contributor, and it will face stiff resistance if, e.g., numpy doesn’t yet support it. Adding support for binary16 dragged on for some years, but didn’t require a PEP because “well, it’s an IEEE standard, after all”. And it was helped along enormously by Mark Dickinson, who has since resigned his core dev position.

Work out replies to “will it ever end?” :wink: These aren’t the only 16-bit float formats, although they’re (I believe) the most widely used.

Reworking struct to have a richer notion of type codes and pluggable packer/unpacker helpers would be the way to go if “will it ever end?” is answered with “maybe not!”, but that would certainly require a PEP.
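To make the idea concrete, here is a rough sketch of what such a pluggable scheme could look like. Everything here is invented for illustration (the registry, the names register/pack_one/unpack_one); nothing like this exists in the struct module today:

```python
import struct

# Hypothetical registry mapping multi-character type codes to
# (size, pack, unpack) helpers.  All names here are invented.
_REGISTRY = {}

def register(code: str, size: int, pack, unpack):
    """Associate a type code with an item size and pack/unpack helpers."""
    _REGISTRY[code] = (size, pack, unpack)

def pack_one(code: str, value) -> bytes:
    size, pack, _ = _REGISTRY[code]
    data = pack(value)
    assert len(data) == size
    return data

def unpack_one(code: str, data: bytes):
    _, _, unpack = _REGISTRY[code]
    return unpack(data)

# Register bfloat16 under the code "bf16": the encoding is the top
# 16 bits of a binary32 (truncation on pack, exact on unpack).
register(
    'bf16', 2,
    lambda x: struct.pack('<H', struct.unpack('<I', struct.pack('<f', x))[0] >> 16),
    lambda b: struct.unpack('<f', struct.pack('<I', struct.unpack('<H', b)[0] << 16))[0],
)

print(unpack_one('bf16', pack_one('bf16', 1.0)))   # 1.0
```

A real design would also have to handle byte order, alignment, and integration with memoryview format strings, which is part of why it would need a PEP.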

2 Likes

NumPy doesn’t yet have built-in bfloat16. It’s certainly come up, but for the most part the bfloat16 provided by JAX’s ml_dtypes does the job for most people. The use case is also less compelling for NumPy computationally: since current CPUs don’t have hardware bfloat16 support, it would mostly exist for compatibility and interop with GPU code.

IMO this request is symptomatic of a larger issue with the buffer protocol: it’s not extensible to support data besides the concrete list of types supported by the struct module. I’d prefer to see a standardized scheme for buffer type provider and consumer packages (see @seberg’s pre-proposal here).

6 Likes

Thanks, Tim. I’m tempted to submit a PR, though I was thinking not to add support to array, since that seems not to support e. Maybe supporting array could be a follow-up PR if the first one is successful. In which case, I’d add e as well. And maybe complex D, E, F.

I’m hoping this doesn’t require a PEP because, “well, C and C++ support it.” That’s kinda my response to, “will it ever end?”. If a floating-point type is important enough for C and C++, it’s important enough for Python.

May I ask your suggestion for a letter code? Does m seem good? I’d like also to add M for complex numbers where the real and imaginary parts are bfloat16. My day job is an FFT library, and FFTs use complex numbers. (And FFTs are one way to implement convolutions for convolutional neural networks.) Mostly, I like orthogonality: since D and F are supported, why not E and M?

It’s fun to see this glued together :smiley:

But I see what you want: roughly the same range as IEEE single precision, but with about a third of the precision. Sorry, I doubt this type is useful for doing arithmetic. That’s why it’s missing from IEEE 754.
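The range-versus-precision tradeoff is easy to demonstrate with struct: the largest finite bfloat16 is close to float32’s maximum (~3.4e38), while binary16 tops out at 65504. A quick check, constructing the bfloat16 encoding by hand since struct has no code for it:

```python
import struct

def bits_to_f32(u: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE binary32."""
    (x,) = struct.unpack('<f', struct.pack('<I', u))
    return x

# Largest finite bfloat16: sign 0, exponent 11111110, fraction 1111111,
# widened to binary32 by appending 16 zero bits.
bf16_max = bits_to_f32(0b0_11111110_1111111 << 16)

# Largest finite binary16 (encoding 0x7BFF), via struct's 'e' code.
(f16_max,) = struct.unpack('<e', b'\xff\x7b')

print(bf16_max)   # ~3.39e38, close to float32's maximum
print(f16_max)    # 65504.0
```

So bfloat16 trades significand bits for binary32’s full exponent range, which is exactly what makes it attractive for ML workloads where overflow matters more than precision.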

IMO, it’s better suited for an external extension.

There are some other crazy “floating-point” formats, like 4-bit floats. Should we add support for them?

It’s not a strict requirement. E.g., we recently added complex types (available in a C standard annex) to the struct and array modules and to memoryview. But I think this case is worth a PEP.

There is a PR adding this support.

Now we have support for complex types (float/double) in array/memoryview. This was merged recently.

There is no 'E' formatting type, as we don’t have a suitable complex type in C.

Are we still talking about bfloat16? If so, no. This type is missing in the C standard, as far as I know.

1 Like

But there’s currently an open issue to add support for e to array too. I expect that will be accepted, and then that raises the bar for all future extensions. See “will it ever end?” earlier :wink:

That used to carry much more weight. All languages are suffering from “will it ever end?” now, and “it’s in C and C++ now” isn’t treated as mandatory anymore.

That’s kinda my response to, “will it ever end?”. If a floating-point type is important enough for C and C++, it’s important enough for Python.

But, as before, it’s apparently not yet important enough for numpy, and what numpy does still carries major weight in Python-world. They’re family. The C++ committee is a bunch of shady alien eggheads who apparently can’t say “no” to anything :wink:

I’m too old to bikeshed when there are no obvious choices to pick from. Suit yourself! m for “machine learning” is certainly defensible. But the whole “1-letter type code” business exceeded its comfortable limits already. It’s looking ever-more arbitrary :frowning: .

2 Likes

Hi Nathan,
I’d like to see NumPy add complex float16, bfloat16, and complex bfloat16, but I think that should be a separate discussion in a NumPy forum. Note that ml_dtypes does not really add support for bfloat16 to NumPy. It adds a void object type that may be OK for some things, but fails for interoperability. For example:

>>> import numpy as np
>>> import ml_dtypes
>>> a = np.array([3, 7], dtype=ml_dtypes.bfloat16)
>>> a
array([3, 7], dtype=bfloat16)
>>> a.__dlpack__()
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    a.__dlpack__()
    ~~~~~~~~~~~~^^
BufferError: DLPack only supports signed/unsigned integers, float and complex dtypes.

Note that DLPack does support bfloat16, but NumPy does not recognize ml_dtypes.bfloat16 as a genuine bfloat16 data type.
The solution is for NumPy itself to support bfloat16.
Of course, it can do that with or without Python struct supporting bfloat16.

Please note that future x86_64 CPUs will support arithmetic instructions (add, sub, mul, FMA, div, sqrt) for bfloat16. Compiler support is already available; hardware is not yet released.

I suggest that C and C++ support, together with CPU hardware support, is good reason to add support to Python struct. But, I’d like some more buy-in, philosophically, before I invest time in coding.

1 Like

What’s the advantage, though? I asked earlier, and will clarify: Adding support to struct is not going to give you access to CPU arithmetic operations. So why should struct support these?

Thanks, again, for your time. I agree with the importance of NumPy; my strategy was to add support to Python struct first (because it’s much easier) and then use that as evidence as to why NumPy should support it too. :slight_smile:

I’d also like to add that PyTorch supports bfloat16 (including DLPack support)

>>> import torch
>>> a = torch.tensor([3.25, 7.5], dtype=torch.bfloat16)
>>> a
tensor([3.2500, 7.5000], dtype=torch.bfloat16)

and so does JAX

>>> import jax
>>> from jax import numpy as jnp
>>> a = jnp.array([3.2500, 7.5000], dtype=jnp.bfloat16)
>>> a
Array([3.25, 7.5], dtype=bfloat16)

and I’ve verified that both interoperate with a C++ extension made with nanobind.

3 Likes

I don’t think so. The 'e' format type has been supported in the struct module for a long time. It was also supported by memoryview. Allowing it in the array module reduces an inconsistency (which exists for no good reason) between stdlib modules. But strictly speaking it doesn’t add support for a new data type. It’s a minor feature; hence, no PEP.

BTW, newer C standards (AFAIK since C23) have (optionally) the _Float16 type. So, in principle, we could add conditional support for this fundamental data type in the ctypes module.

I wouldn’t say it’s easier. So far I also don’t see an answer to “why should struct support these?” :wink:

I believe bfloat16 is the IEEE 32-bit float format, but with the last 16 bits chopped off. So same dynamic range, but much less precision.

Whether it’s useful for doing arithmetic is app-specific. Certainly not for general use, but for masses of low-precision computations it can shine. As far as I know, the only significant HW support is in GPUs, though. For most of the world, it’s just another storage format.

Please don’t give people ideas :wink:

In context, I wasn’t talking about bfloat16, but a hypothetical reworking of the struct module to free it from 1-letter type codes, and adding a way to register pack/unpack helper functions for any new formats users may care to add. That would absolutely require a PEP.

I was unable to tell for sure, and I’m not willing to pay for the latest version of the standard. It’s certainly not there through C17. Gemini says:

The upcoming C23 standard (along with C++23) is working to normalize std::bfloat16_t.

1 Like

I don’t understand. You’re saying you expect that adding e to array will be rejected? If so, why? As you well know, it’s already supported by struct and memoryview, and the cross-module inconsistency is pointless. As these things go, making array consistent too is trivial.

1 Like

Hi Chris,
My understanding is that the Python struct module is intended for interoperability with the C and C++ layer, i.e., Python extensions written in C or C++. (Sorry, I keep saying C as well since it’s supported by some compilers, but as noted in a comment above, bfloat16 is not (yet) in the C language standard. Personally, I think it will happen, but that’s just a guess.)

Why was e for float16 added to struct? It doesn’t give access to CPU arithmetic operations either. The same is true of f for 4-byte floats. Sorry, I’m not trying to be difficult. I guess my thinking is that bfloat16 is becoming a first-class floating-point type, and Python seems to be highly regarded for machine learning. Interoperability is a good thing. As Tim mentioned, memoryview support would also be necessary. Maybe I’m not answering your question, but I don’t have anything better to say. It’s very valuable feedback to me if this discussion concludes with advice not to pursue this.