Intro
We propose implementing the precision
specifier of the Format Specification Mini-Language for most integer presentation types, to allow the formatting of integers with a guaranteed minimum number of ‘content characters’, that is the actual digits of the number excluding a base prefix created by #
, underscores / commas created by grouping_option
, and a possible space or sign created by sign
.
For a quick example:
>>> x = 25
>>> f"{x:#08b}"
'0b011001' # only 6 content characters since '#' has eaten two to create the '0b' prefix
>>> f"{x:#.8b}"
'0b00011001' # all 8 content characters since '.8' demands a precision of 8
The former already exists. The latter is what we desire to implement. Currently the format spec docs inform that “the precision is not allowed for integer presentation types”, though I don’t see an immediate technical reason why we can’t do this, and the justifications to do so are sound.
Rationale
When formatting an int
x
of known bounds using f-strings (and other methods such as str.format
), one often wishes to pad x
’s string representation with 0s to a sensible width. For example using f"{x:08b}"
to pad x
’s binary representation to a width of eight bits, or f"{x:04x}"
to pad an ‘unsigned short’ x
to four hex digits. Python also provides the wonderful #
format specifier to prefix the result with the appropriate '0b'
, '0o'
, or '0x'
base prefix, however this eats into the width
number of characters allocated.
>>> x = 13
>>> f"{x:04x}"
'000d' # four hex digits :)
>>> f"{x:#04x}"
'0x0d' # only two hex digits :(
The width
format specifier is for the length of the entire string, not just the ‘content characters’ aka the ‘digits’.
One could argue that since the length of the prefix is known to always be 2, one can just account for that manually by adding two to the desired number of digits. In our example above that would be f"{x:#06x}"
, but there’s several reasons this is a bad idea:
- at a glance
f"{x:#06x}"
looks like it may produce 6 hex digits, but it only produces four, namely'0x000d'
- 6 is thus too much of a ‘magic number’, and countering that by being overly explicit, eg with
f"{x:#0{2+4}x}"
, looks ridiculous - things get more complicated when we introduce a
sign
specifier, egf"{x: #0{1+2+4}x}"
to produce' 0x000d'
- things get even more complicated when introducing a
grouping_option
:k = 4 ; f"{x: #0{1+2+4*k+(k - 1)}_x}"
to produce' 0x0000_0000_0000_000d'
iek
number of hex-groups joined by'_'
- in the future perhaps a
'O'
type
specifier may be added to format a number into C-style octal, with a prefix of'0'
instead of'0o'
, meaning not all the prefixes would be of length 2
This proposal is not a new special-case behavior being demanded of int
data: the precision specifier for float
data ensures that there are a fixed number of digits after the decimal point. For example f"{0.2: .4f}"
produces ' 0.2000'
, the 4 not counting the minimum total number of characters in the string, but the four digits '2000'
.
The only integer presentation type that precision
wouldn’t make sense with is c
(convert to the nth Unicode codepoint). Perhaps a ValueError
could be raised if the user tries using precision with type c
for a int
data, eg f"{65:.8c}"
.
(Personally) rejected alternatives
I’ve mulled over this proposal for a couple of weeks.
My original thoughts were to add in another format specifier ~
, as an alternative to #
. This would create the same 0b
, 0o
, 0x
prefixes as #
, but they would not count towards the width
specifier’s count. This would be mutually exclusive with #
, just as _
and ,
are for grouping_option
- con:
~
and#
are only on the same key on my UK keyboard, but that’s not the case on US-and-elsewhere keyboards - con: more clutter added to the format spec for a single purpose
- con: I hadn’t considered
grouping_option
’s impact onwidth
/precision
at that point
Comparisons between width
and precision
Were this proposal to go ahead, here are some examples of how precision
simplifies things down, and highlights current limitations without precision
for int
s
x |
f-string code | explanation | resulting string | remarks |
---|---|---|---|---|
73 | f"{x:08b}" |
width of 8, binary |
'01001001' |
|
f"{x:.8b}" |
precision of 8, binary |
'01001001' |
same behavior as expected with width |
|
f"{x:#08b}" |
0b prefixed, width of 8, binary |
'0b1001001' |
'0b' prefix stole 2 of the 8 width requested! |
|
f"{x:#.8b}" |
0b prefixed, precision of 8, binary |
'0b01001001' |
all 8 precision chars are given to x , and '0b' tacked on; good |
|
- | - | - | - | - |
300 | f"{x:#08b}" |
0b prefixed, width of 8, binary |
'0b100101100' |
|
f"{x:#.8b}" |
0b prefixed, precision of 8, binary |
'0b100101100' |
same behavior as expected with width since x is larger than 8 bits |
|
- | - | - | - | - |
8086 | f"{x: #08x}" |
leading sign space, 0x prefixed, width of 8, hex |
' 0x01f96' |
' 0x' prefix stole 3 of the 8 width requested! |
f"{x: #.8x}" |
leading sign space, 0x prefixed, precision of 8, hex |
' 0x00001f96' |
all 8 precision chars are given to x , and ' 0x' tacked on; good |
|
- | - | - | - | - |
18 | f"{x:#03o}" |
0o prefixed, width of 3, octal |
'0o22' |
'0o' prefix stole 2 of the 3 width requested, thus x ’s size only padded it to 2 chars |
f"{x:#.3o}" |
0o prefixed, precision of 3, octal |
'0o022' |
all 3 precision chars are given to x , and '0o' tacked on; good |
Teaching
This could replace the default way of teaching formatting integers to fixed widths, eg f"{x:.8b}"
instead of f"{x:08b}"
.
However people coming from C might expect the old behavior . I was pleasantly surprised to discover that printf("%08b\n", 73);
required to produce 01001001
printf("%.8b\n", 73);
is perfectly valid C syntax. The difference between %08d
and %.8d
is that in the former a negative sign consumes one of the 8 width
characters, whereas in the latter a negative sign doesn’t consume one of the 8 precision
characters (source). This is consistent with what we’re trying to implement.
It should at least be used in documentation examples whenever #
is with a desired length, eg f"{x:#.3o}"
for file mode literals instead of f"{x:#05o}"
.
Another example:
>>> def hexdump(b: bytes) -> str:
... return " ".join(f"{i:#.2x}" for i in b)
...
>>> hexdump(b"Hello")
'0x48 0x65 0x6c 0x6c 0x6f'