Implement `precision` format spec for `int` type data

jb2170 · April 1, 2025, 6:30am

Implement `.` ‘precision’ for f-strings the same as the existing behavior for %-strings

Preliminary: Binary Representations

Expansion

Observe that one can always extend a signed number’s binary representation by extending the the leading digit prefix:

-19 char (8 bit)                          0b11101101
-19 int  (32 bit) 0b11111111111111111111111111101101
 47 char (8 bit)                          0b00101111
 47 int  (32 bit) 0b00000000000000000000000000101111

This is what C with twos-complement can do

printf("%#hhb\n",              -19); // 0b11101101
printf("%#hho\n",              -19); // 0355
printf("%#hhx\n",              -19); // 0xed

printf("%#b\n", (unsigned char)-19); // 0b11101101 same as 237 mod 256
printf("%#o\n", (unsigned char)-19); // 0355
printf("%#x\n", (unsigned char)-19); // 0xed

printf("%#b\n",                -19); // 0b11111111111111111111111111101101
printf("%#o\n",                -19); // 037777777755
printf("%#x\n",                -19); // 0xffffffed

This also generalizes beyond ‘standard’ machine widths of powers of two of course.

Contraction

Conversely one can losslessly truncate a signed binary number’s representation to have only one leading 0 if it is non-negative, and one leading 1 if it is negative:

 5 as 0b00000101 -> 0b0101
-3 as 0b11111101 -> 0b1101

If one were to truncate another digit off these examples, then both would end up as 0b101, 5 indistinguishable from -3 when using only 3 binary digits because they are the same modulo 2 ** 3.

Variable width and minimal width binary representations

Therefore to losslessly, unambiguously represent a signed binary number, let us define a convention for ‘variable width and minimal width binary representations’.

The leading binary digit represents the -2^{n-1} value column, the other digits representing columns of value 2 ^ {n-2} \cdots 2^{0}

1 digit : 0 =   0b0 ,                                                                            -1 =   0b1
2 digits: 0 =  0b00 , 1 =  0b01 ,                                                   -2 =  0b10 , -1 =  0b11
3 digits: 0 = 0b000 , 1 = 0b001 , 2 = 0b010 , 3 = 0b011 , -4 = 0b100 , -3 = 0b101 , -2 = 0b110 , -1 = 0b111

In general n digits can represent the range [-2^{n-1}, 2^{n-1}-1), range(-2**(n-1), 2**(n-1)) in Python syntax, permitting the overlong representations of [-2^{n-1}, 2^{n-1}-1). This convention is obviously just signed arithmetic, the special cases of 8, 16, 32, and 64 bits everyone who has ever used C will have encountered, but it’s good to understand how extension and contraction works. This is the way I plan to implement arithmetic, representation, and storage of infinite precision ints in my project UTF-8000 (an infinite extension of UTF-8). One needs to be able to represent numbers beyond a finite size_t upper bound, without infinite leading 0s and 1s. This is related, I’m not just waffling here, bare with me.

Let us call the ‘minimal’ representation the shortest possible representation of a signed number

0b1 (1 * -2 ** 0) is the shortest representation of -1, whereas 0b111 ((1 * -2 ** 2) + (1 * 2 ** 1) + (1 * 2 ** 0)) is an overlong representation
0b010111 ((0 * -2 ** 5) + (1 * 2 ** 4) + (0 * 2 ** 3) + (1 * 2 ** 2) + (1 * 2 ** 1) + (1 * 2 ** 0)) is the shortest represent of 23, whereas 0b00010111 is overlong and 0b10111 is one digit too truncated and is the representation of -9.

Formatting of negative numbers in C and Python

C’s formatting of negative numbers in binary, octal, and hex is influenced by machine-width
Python’s integers however are not limited by a machine width, they are infinite precision
In both C and Python the precision format specifier is only the minimum requested number of digits; it should not truncate to exactly this number of digits

C

printf("%#.2b\n", -1);                // 0b11111111111111111111111111111111 this would go on infinitely if we built a Turing Machine with infinite length tape...
printf("%#.2b\n", 50);                // 0b110010 beyond the 2 digits requested

Python

"%#.2x" % 1000 # 0x3e8 is 3 digits, one more than the minimum of 2 requested

It appears to me we therefore have four options for implementing formatting of negative numbers in Python with the precision specifier

Infinite length strings,
- eg f"{-19:#.8b}" completely ignoring the padding and giving 0b1111...{infinitely long}...111111101101.
- Con: Not practical…
- Con: This is really an artefact of C taking into account the machine’s width
- Verdict: Bad idea…
Minimal width binary representation as given above, padded to precision length.
- eg f"{-1:#.4b}" is '0b1111', padding to the requested overlong representation of length 4
- eg f"{-19:#.2b}" is '0b101101', using more than the 2 precision requested
- Con: Requires teaching everyone a new convention
- Con: One has to juggle variable width binary in one’s head
- Verdict: Bad idea…
Keep %-formatting’s behavior of using a sign: format a negative number x = -y (where y is non-negative) using a negative sign and y’s representation with the requested precision
- eg f"{-255:#.4x}" is '-0x00ff'
- Pro: Internally consistency within Python, %-strings and f-strings having the same behavior more-or-less, albeit slightly different to C.
- Verdict: Yes
Have precision restrict to exactly the number of digits requested
- ie f"{x:.{n}b}" wraps x into the range [0, 2 ^ n) and formats the non-negative number as expected
- Pro: One can choose familiar numbers for n like 8, 16, 32 etc, as Raymond wanted
- Pro: Working modulo 2 ^ n is often desired
- Neutral: the context of 0b11111111 being 255 unsigned or -1 signed is up to the user’s interpretation, but usually one will know what they’re up to. Also in Python source code this will always be interpreted as 255, in case one was trying some eval(repr()) chicanery
- Verdict: Also yes

We choose to implement both 3 and 4. 3 is how %-formatting works, and is how I opened up this thread. 4 we discuss in a little more detail below:

Implement new `!` ‘exact precision’ format specifier

To implement behavior 4 from above, that is an exact number of digits, I propose a new ! ‘exact precision’ specifier, mutually exclusive with . ‘precision’ in the formatting mini-language

f"{x:!{n}{base}}" performs \text{mod}(x, \text{base} ^ n) and formats that with the precision specifier . to n digits in base base. That is equivalent to f"{divmod(x, base ** n)[1]:#.{n}{base}}".

Examples:

f"{-19:#!8b}" is f"{237:#!8b}" is f"{493:#!8b}" is 0b11101101. These are examples of -19 + 256N with N \in \mathbb{Z}.
f"{-19:#!11o}" is '0o77777777755'. Notice that this differs from C’s '037777777755', even discounting the 0o/0 prefix. 11 octal digits requires 33 bits. In C with x86_64 ints are 32 bits, which leads to '3777...' instead of 7777.... Any 32, 64, 2 ** n bit machines’ widths are not divisible by 3. Python flexes its infinite precision.
f"{513:#!2x}" is '0x01', which differs from f"{513:#.2x}" which is '0x201'.

Syntax Justification

Pro

! is graphically related to ., an extension if you will. ‘exact precision’ is indeed an extension of ‘precision’: ‘exact precision’ takes \text{mod} of the number to format, then passes that to ‘precision’
! in the English language is often used for imperative, commanding sentences. So too ! commands the exact number of digits to which its input should be formatted, whereas . is only a suggested minimum
Backwards compatible and optional

Contra

This is another addition to the mini-formatting language, however as ! is to be mutually exclusive with . the overall complexity of one’s written code is unaffected
f"{x:#!8b}" is equivalent to f"{divmod(x, 2 ** 8)[1]:#.8b}", are we just lazily avoiding writing divmod()[1]?

Rejected Alternatives

Add a B type specifier. To me that indicates a capital prefix 0B, like X is used for 0X1234.... Also the problem we are trying to solve is not just for binary (base 2), it applies to all bases, right now binary, octal, and hex (maybe even decimal?), and the X specifier is already taken for implementing a capital 0X prefix as mentioned.

Sorry if this is a long message, and that it’s taken me a few days to distill it down to this, but these are my thoughts so far.