Behavior of struct "?" format (native _Bool)

Hello,
The docs for the struct module state that the conversion between C and Python values should be obvious given their types. This is not quite true for the "?" format, i.e. native _Bool.
I find the docs confusing (which may be due to references to the C99 standard, which I don’t have access to). I’d like to make them clearer (and possibly adjust the implementation to match), but for that I need a solid understanding of the intent.

(This was previously discussed in bpo-39689, whose scope is only fixing tests that broke with clang 10.)

In one place the docs say:

The '?' conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char.

As far as I know, the C99 _Bool only has two valid values, 0 and 1. Something like char c=2; _Bool b=*(_Bool*)(&c); is undefined behavior. For struct, this could mean that b'\x02' is an incorrectly packed '?' item, and anyone unpacking it should expect undefined behavior. This is what the current implementation does.
Until recently, all tested compilers behaved as the docs quoted below describe in this case (any non-zero value unpacks as True), but that is changing with Clang 10.
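Under the reading that only 0 and 1 are valid _Bool values, a consumer could check a buffer of native '?' items before trusting it. A minimal Python sketch, assuming a 1-byte _Bool (the usual case on common platforms); find_noncanonical_bools is a hypothetical helper, not part of the struct module:

```python
import struct


def find_noncanonical_bools(data: bytes) -> list:
    """Hypothetical helper: return the offsets of items in a buffer of
    native '?' items whose byte is neither 0 nor 1.

    Under the "only 0 and 1 are valid" reading, every offset returned
    marks an incorrectly packed item. This sketch assumes
    sizeof(_Bool) == 1 and refuses to run otherwise.
    """
    if struct.calcsize("?") != 1:  # native '?' size is sizeof(_Bool)
        raise NotImplementedError("sketch assumes a 1-byte _Bool")
    # Iterating over bytes yields ints in Python 3.
    return [i for i, byte in enumerate(data) if byte not in (0, 1)]


print(find_noncanonical_bools(b"\x00\x01\x02\x01"))
```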


Elsewhere the struct docs say:

For the '?' format character, the return value is either True or False. When packing, the truth value of the argument object is used. Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.

This is spelled out quite clearly, but may be read as contradicting the above quote.
It may also be non-trivial to implement correctly, as Stefan Krah mentions in a bpo-39689 comment:

You could determine sizeof(_Bool), use the matching unsigned type,
unpack as that, then cast to _Bool. But do you really want to force
that procedure on all array libraries that want to be PEP-3118
compatible?
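A rough Python rendering of that procedure, for illustration only: the helper name unpack_bool_via_unsigned is hypothetical, and a C library would do the equivalent by copying the bytes into an unsigned integer of the same size. It finds sizeof(_Bool) through struct.calcsize("?"), picks the unsigned native code of the same size, and only then converts to bool, so a byte like b'\x02' is never read as a _Bool:

```python
import struct

# Unsigned native struct codes, tried in order when looking for an
# integer type with the same size as _Bool.
_UNSIGNED_CODES = "BHILQ"


def unpack_bool_via_unsigned(data: bytes, offset: int = 0) -> bool:
    """Hypothetical helper: read one native '?' item as the unsigned
    integer type of the same size, then convert that integer to bool.

    Any non-zero bit pattern yields True, independent of how the
    compiler treats invalid _Bool values.
    """
    size = struct.calcsize("?")  # sizeof(_Bool) on this platform
    for code in _UNSIGNED_CODES:
        if struct.calcsize(code) == size:
            (value,) = struct.unpack_from("@" + code, data, offset)
            return value != 0  # the "cast to _Bool" step
    raise NotImplementedError("no unsigned type matches sizeof(_Bool)")
```

For example, on a platform with a 1-byte _Bool, unpack_bool_via_unsigned(b'\x02') returns True and unpack_bool_via_unsigned(b'\x00') returns False.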


So, I see three possibilities for struct.unpack('?', b'\x02')[0]:

  • it triggers undefined behavior: in practice it gives True with some compilers and False with others
  • it is an incorrectly packed array: CPython will helpfully always give True to avoid UB, but other libraries are free to do anything
  • it is defined to be True

And two possibilities for struct.pack("?", x), which are equivalent in practice but I don’t know if C99 guarantees it:

  • Either (_Bool)0 or (_Bool)1 is packed
  • Either (uint8_t)0 or (uint8_t)1 is packed
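For what it’s worth, the bytes CPython actually produces are easy to check from Python. This sketch only shows observed behavior (packing uses the argument’s truth value and emits a single 0 or 1 byte in both native and standard mode); it says nothing about what C99 guarantees for the representation of _Bool:

```python
import struct

# Packing uses the truth value of the argument, so every input collapses
# to a single byte holding 0 or 1, in both native ('?') and standard
# ('<?') mode.
for obj in [0, 1, 2, -1, "", "abc", [], [0], None]:
    native = struct.pack("?", obj)
    standard = struct.pack("<?", obj)
    assert native == standard == (b"\x01" if obj else b"\x00")
    print(repr(obj), native)
```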

So the questions are: What do the docs mean? What do implementers of PEP-3118-compatible libraries think they mean? And of course, what should be done?
