PEP-515 (underscores in numeric literals) and .format width specifier

dalton · December 2, 2022, 3:49pm

Hello all,
I stumbled across this behavior and wondered if it was done for a reason. When trying out underscores in hex strings, I would have expected the underscore to NOT be included in the width of the numeric literal, but it is:

# 32-bit hex value, 0x0ee1_f00d
print('0x{:08X}'.format(0x0ee1_f00d))
# prints 0x0EE1F00D (0x plus 8 digits; great)
print('0x{:08_X}'.format(0x0ee1_f00d))
# prints 0xEE1_F00D (0x plus 7 digits plus _)
print('0x{:09_X}'.format(0x0ee1_f00d))
# prints 0x0EE1_F00D (0x plus 8 digits plus _)

This problem is going to compound given that I also have to deal with 64-bit values, 48-bit values, 24-bit values, etc.

Thanks in advance for any help/explanations.

BowlOfRed · December 2, 2022, 4:47pm

As the field width is often used for column alignment and formatting, I would find it odd for any character to be excluded from the width. Rather than “how many digits should be printed?”, the question is “how wide should this field be in the output?”.
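
To illustrate the column-alignment point, here is a minimal sketch. The sample values are made up for the example; the only thing it relies on is that the width counts every character in the field, separator included, so all rows come out the same length.

```python
# Hypothetical sample values, chosen only to show alignment.
values = [0x0EE1_F00D, 0xBEEF, 0x1]

# Width 9 = 8 hex digits + 1 underscore; '0' pads with zeros on the left.
lines = ['0x{:09_X}'.format(v) for v in values]

for line in lines:
    print(line)
# 0x0EE1_F00D
# 0x0000_BEEF
# 0x0000_0001
```

Every line is 11 characters long ("0x" plus a 9-character field), which is what makes the column line up.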

steven.daprano · December 1, 2022, 7:49pm

Underscores in numeric literals are irrelevant. There is no difference to Python whether you write 0x0ee1f00d or 0x0_e_e_1_f_0_0_d or 249688077, they are all the same number.

The format width field is the minimum width of the string, and it includes everything in the string: spaces, zeroes, decimal points (for floats), leading negative sign, everything. Why should underscores be different?
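
As a quick sketch of that rule (the values are arbitrary examples, not taken from the question), each of the first three fields below is exactly 8 characters wide, and the sign, decimal point, and prefix all count toward that total:

```python
fields = [
    '{:8}'.format(-42),         # '     -42'  sign counts toward the width
    '{:8.2f}'.format(3.14159),  # '    3.14'  decimal point counts
    '{:#8x}'.format(255),       # '    0xff'  '0x' prefix from '#' counts
]
for field in fields:
    print(repr(field), len(field))

# A separator is treated the same way; here the result needs 9 characters,
# so it exceeds the minimum width of 8 rather than being cut off.
too_wide = '{:8,}'.format(1234567)
print(repr(too_wide), len(too_wide))  # '1,234,567' 9
```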

I don’t understand what “problem” you are experiencing.

What is the width you want the string to have? If it is eight hex digits plus an underscore, that makes 9 so your minimum width is 9. Where is the problem?

(Remember that if the string is larger than the minimum width, it will not be truncated.)
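
For the 24-, 32-, 48- and 64-bit cases mentioned in the question, one possible approach is to compute the width from the bit count instead of hard-coding it. This is only a sketch: `hex_field_width` and `format_hex` are hypothetical helper names, not anything from the standard library. It assumes the `_` option groups hex digits in fours, as the format specification documents for the `x` and `X` presentation types.

```python
def hex_field_width(bits):
    """Characters needed for `bits` bits of hex digits with '_' every 4 digits."""
    digits = (bits + 3) // 4        # one hex digit per 4 bits, rounded up
    separators = (digits - 1) // 4  # one underscore between each group of 4
    return digits + separators


def format_hex(value, bits):
    """Zero-padded, underscore-grouped hex string for a value of `bits` bits."""
    width = hex_field_width(bits)
    return '0x{:0{width}_X}'.format(value, width=width)


print(format_hex(0x0EE1_F00D, 32))  # 0x0EE1_F00D
print(format_hex(0xBEEF, 24))       # 0x00_BEEF
print(format_hex(0x1, 48))          # 0x0000_0000_0001
print(hex_field_width(64))          # 19
```

The nested `{width}` field lets the computed width be passed as an ordinary argument to `.format`, so the same format string works for every size.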

tjreedy · December 2, 2022, 6:18pm

From string — Common string operations — Python 3.11.0 documentation:
"width is a decimal integer defining the minimum total field width, including any prefixes, separators, and other formatting characters."

>>> print('0x{:8_X}'.format(0xeee1_f00d))
0xEEE1_F00D # 8 minimum, 9 needed