Printing binary strings

alexandru_d · April 8, 2025, 7:26pm

Hello all,

Why does the Python 3 interpreter print byte strings by decoding the bytes as UTF-8 (or perhaps only the bytes that correspond to ASCII code points) instead of printing \xNN values? Am I missing something?

Thank you

tjreedy · April 8, 2025, 7:41pm

Because 1) Python 2 used byte strings for text, and 2) many internet protocols still do. So printable ASCII codes are printed as the corresponding characters, both because that is more often what people want and to avoid breaking existing code.
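A minimal sketch of the display rule described above, assuming CPython's behavior for `repr()` of `bytes`: printable ASCII (0x20 to 0x7E) appears as characters, tab, newline and carriage return get their usual escapes, the backslash and the active quote character are backslash-escaped, and every other byte becomes `\xNN`. The function name `bytes_repr_sketch` is hypothetical; this is an illustrative reimplementation, not Python's own code.

```python
def bytes_repr_sketch(data: bytes) -> str:
    """Approximate repr() of a bytes object, byte by byte (hypothetical helper)."""
    # Like CPython, switch to double quotes only if the data has ' but no ".
    quote = '"' if (b"'" in data and b'"' not in data) else "'"
    named = {0x09: "\\t", 0x0A: "\\n", 0x0D: "\\r"}
    parts = ["b", quote]
    for byte in data:
        ch = chr(byte)
        if ch == quote or ch == "\\":
            parts.append("\\" + ch)          # escape the delimiter and backslash
        elif byte in named:
            parts.append(named[byte])        # \t, \n, \r
        elif byte < 0x20 or byte >= 0x7F:
            parts.append(f"\\x{byte:02x}")   # other control and non-ASCII bytes
        else:
            parts.append(ch)                 # printable ASCII shown as-is
    parts.append(quote)
    return "".join(parts)


print(bytes_repr_sketch("café".encode("utf-8")))  # b'caf\xc3\xa9'
print(bytes_repr_sketch(b"GET / HTTP/1.1\r\n"))   # b'GET / HTTP/1.1\r\n'
```

Comparing this sketch against the built-in `repr()` for every single-byte value is a quick way to check that the rule is stated correctly.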

alexandru_d · April 8, 2025, 9:24pm

I do not know that much, but I don’t think that “text” is a data type. Bytes of memory used as data have values, and then there are encodings that let you treat that data, at a higher level of abstraction, as characters. So when representing (printing) data, I think it is best not to mix abstraction levels.

elis.byberi · April 8, 2025, 9:39pm

It’s easier to debug:

b = b'POST /users HTTP/1.1\r\n'
print(b[0:4] == b'POST')
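To make the debugging point concrete, here is a small comparison of the default `repr()` against a pure hex dump of the same request line (the variable names are just for illustration):

```python
request = b'POST /users HTTP/1.1\r\n'

# The default repr keeps the protocol text readable and escapes only \r and \n.
print(repr(request))      # b'POST /users HTTP/1.1\r\n'

# A pure hex dump of the same bytes is much harder to scan by eye.
print(request.hex(' '))

# Because the bytes are ASCII-compatible, they can be split and compared directly.
method, path, version = request.rstrip(b'\r\n').split(b' ')
print(method, path, version)  # b'POST' b'/users' b'HTTP/1.1'
```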

jeff5 · April 8, 2025, 9:58pm

It isn’t. It is decoding them as ASCII, then giving you escapes such as \xNN for the non-printable and non-ASCII codes. The reason is historical: originally Python aimed no higher than C in multilingual support. In those days, a character meant a byte, and countries with funny keyboards would make do with a local workaround (like code pages and other bad ideas).
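One way to see the “decode as ASCII, escape the rest” reading is to compare `repr()` with decoding the same bytes as ASCII using the standard `backslashreplace` error handler (supported for decoding since Python 3.5). Treat this as an analogy, not as how `repr()` is implemented: the two agree on non-ASCII bytes but differ on control characters, which `repr()` escapes and `decode()` passes through.

```python
data = "café".encode("utf-8")

# Non-ASCII bytes come out as \xNN in both views.
as_repr = repr(data)[2:-1]                           # strip the b'...' wrapper
as_ascii = data.decode("ascii", "backslashreplace")
print(as_repr)    # caf\xc3\xa9
print(as_ascii)   # caf\xc3\xa9

# Control characters differ: repr() escapes the newline, decode() keeps it.
ctrl = b"a\nb"
print(repr(ctrl)[2:-1] == "a\\nb")                         # True
print(ctrl.decode("ascii", "backslashreplace") == "a\nb")  # True
```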

Officially, bytes and bytearray are arrays of small integers. They aren’t “binary strings”. The default repr() is fairly unreadable if it isn’t ASCII text, but it is easy to code something more suitable to your immediate purpose, like:

' '.join(f'{x:02x}' for x in b)
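A short illustration of the “arrays of small integers” view together with the one-liner above (the sample value of `b` is an assumption for the demo):

```python
b = "café".encode("utf-8")

# Indexing and iterating yield ints in range(256), not one-character strings.
print(b[0])          # 99
print(list(b))       # [99, 97, 102, 195, 169]

# Slicing yields bytes again.
print(b[:3])         # b'caf'

# Format each integer as two hex digits, as in the one-liner above.
print(' '.join(f'{x:02x}' for x in b))   # 63 61 66 c3 a9

# The list of integers converts straight back to the same bytes.
assert bytes(list(b)) == b
```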

bschubert · April 8, 2025, 10:07pm

Or a bit more concisely: b.hex(" ")

>>> "café".encode("utf-8").hex(" ")
'63 61 66 c3 a9'
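`bytes.hex()` also takes an optional `bytes_per_sep` argument for grouping (the separator arguments need Python 3.8 or later), and `bytes.fromhex()` goes the other way, skipping whitespace between byte pairs. A small sketch:

```python
data = "café".encode("utf-8")

spaced = data.hex(" ")            # '63 61 66 c3 a9'
grouped = data.hex(":", 2)        # positive: groups of 2 counted from the right
print(grouped)                    # 63:6166:c3a9
print(data.hex(":", -2))          # negative: counted from the left -> 6361:66c3:a9

# fromhex() ignores the spaces and rebuilds the original bytes.
assert bytes.fromhex(spaced) == data
```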

alexandru_d · April 15, 2025, 2:47pm

Thank you all for responding. In my opinion this behavior is an inconsistency, albeit a minor one. I’ve found the proper workaround to be:

b_utf16 = 'café'.encode('utf-16')
print("b'" + ''.join(f'\\x{byte:02x}' for byte in b_utf16) + "'")
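The same idea can be sketched as a reusable function (the name `hex_escape_literal` is hypothetical). Two caveats are worth noting: the `'utf-16'` codec prepends a byte order mark and uses the platform's native byte order, so this sketch uses `'utf-16-le'` for reproducible output; and the fully escaped string is still a valid bytes literal, which `ast.literal_eval` can confirm.

```python
import ast


def hex_escape_literal(data: bytes) -> str:
    """Render every byte as \\xNN inside a b'...' literal (hypothetical helper)."""
    return "b'" + "".join(f"\\x{byte:02x}" for byte in data) + "'"


encoded = "café".encode("utf-16-le")   # no BOM, fixed little-endian order
literal = hex_escape_literal(encoded)
print(literal)   # b'\x63\x00\x61\x00\x66\x00\xe9\x00'

# The fully escaped form still evaluates back to the same bytes value.
assert ast.literal_eval(literal) == encoded
```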

elis.byberi · April 15, 2025, 4:39pm

You’re missing the point: it was never meant to print every byte with the \x prefix, only the non-printable ones. ASCII has been the standard string encoding since dinosaurs roamed the terminal, and it’s still the de jure and de facto standard in almost all protocols.

Stefan2 · April 15, 2025, 4:39pm

For what purpose is that result better?

alexandru_d · April 15, 2025, 6:34pm

There is no point that you are making.

alexandru_d · April 15, 2025, 6:35pm

Probably up to you to find it.