Implement .
‘precision’ for f-strings the same as the existing behavior for %-strings
Preliminary: Binary Representations
Expansion
Observe that one can always extend a signed number’s binary representation by extending the the leading digit prefix:
-19 char (8 bit) 0b11101101
-19 int (32 bit) 0b11111111111111111111111111101101
47 char (8 bit) 0b00101111
47 int (32 bit) 0b00000000000000000000000000101111
This is what C with twos-complement can do
printf("%#hhb\n", -19); // 0b11101101
printf("%#hho\n", -19); // 0355
printf("%#hhx\n", -19); // 0xed
printf("%#b\n", (unsigned char)-19); // 0b11101101 same as 237 mod 256
printf("%#o\n", (unsigned char)-19); // 0355
printf("%#x\n", (unsigned char)-19); // 0xed
printf("%#b\n", -19); // 0b11111111111111111111111111101101
printf("%#o\n", -19); // 037777777755
printf("%#x\n", -19); // 0xffffffed
This also generalizes beyond ‘standard’ machine widths of powers of two of course.
Contraction
Conversely one can losslessly truncate a signed binary number’s representation to have only one leading 0
if it is non-negative, and one leading 1
if it is negative:
5 as 0b00000101 -> 0b0101
-3 as 0b11111101 -> 0b1101
If one were to truncate another digit off these examples, then both would end up as 0b101
, 5 indistinguishable from -3 when using only 3 binary digits because they are the same modulo 2 ** 3
.
Variable width and minimal width binary representations
Therefore to losslessly, unambiguously represent a signed binary number, let us define a convention for ‘variable width and minimal width binary representations’.
The leading binary digit represents the -2^{n-1} value column, the other digits representing columns of value 2 ^ {n-2} \cdots 2^{0}
1 digit : 0 = 0b0 , -1 = 0b1
2 digits: 0 = 0b00 , 1 = 0b01 , -2 = 0b10 , -1 = 0b11
3 digits: 0 = 0b000 , 1 = 0b001 , 2 = 0b010 , 3 = 0b011 , -4 = 0b100 , -3 = 0b101 , -2 = 0b110 , -1 = 0b111
In general n digits can represent the range [-2^{n-1}, 2^{n-1}-1), range(-2**(n-1), 2**(n-1))
in Python syntax, permitting the overlong representations of [-2^{n-1}, 2^{n-1}-1). This convention is obviously just signed arithmetic, the special cases of 8, 16, 32, and 64 bits everyone who has ever used C will have encountered, but it’s good to understand how extension and contraction works. This is the way I plan to implement arithmetic, representation, and storage of infinite precision ints in my project UTF-8000 (an infinite extension of UTF-8). One needs to be able to represent numbers beyond a finite size_t
upper bound, without infinite leading 0s and 1s. This is related, I’m not just waffling here, bare with me.
Let us call the ‘minimal’ representation the shortest possible representation of a signed number
0b1
(1 * -2 ** 0) is the shortest representation of -1, whereas0b111
((1 * -2 ** 2) + (1 * 2 ** 1) + (1 * 2 ** 0)) is an overlong representation0b010111
((0 * -2 ** 5) + (1 * 2 ** 4) + (0 * 2 ** 3) + (1 * 2 ** 2) + (1 * 2 ** 1) + (1 * 2 ** 0)) is the shortest represent of 23, whereas0b00010111
is overlong and0b10111
is one digit too truncated and is the representation of -9.
Formatting of negative numbers in C and Python
- C’s formatting of negative numbers in binary, octal, and hex is influenced by machine-width
- Python’s integers however are not limited by a machine width, they are infinite precision
- In both C and Python the precision format specifier is only the minimum requested number of digits; it should not truncate to exactly this number of digits
C
printf("%#.2b\n", -1); // 0b11111111111111111111111111111111 this would go on infinitely if we built a Turing Machine with infinite length tape...
printf("%#.2b\n", 50); // 0b110010 beyond the 2 digits requested
Python
"%#.2x" % 1000 # 0x3e8 is 3 digits, one more than the minimum of 2 requested
It appears to me we therefore have four options for implementing formatting of negative numbers in Python with the precision specifier
- Infinite length strings,
- eg
f"{-19:#.8b}"
completely ignoring the padding and giving0b1111...{infinitely long}...111111101101
. - Con: Not practical…
- Con: This is really an artefact of C taking into account the machine’s width
- Verdict: Bad idea…
- eg
- Minimal width binary representation as given above, padded to precision length.
- eg
f"{-1:#.4b}"
is'0b1111'
, padding to the requested overlong representation of length 4 - eg
f"{-19:#.2b}"
is'0b101101'
, using more than the 2 precision requested - Con: Requires teaching everyone a new convention
- Con: One has to juggle variable width binary in one’s head
- Verdict: Bad idea…
- eg
- Keep %-formatting’s behavior of using a sign: format a negative number
x = -y
(wherey
is non-negative) using a negative sign andy
’s representation with the requested precision- eg
f"{-255:#.4x}"
is'-0x00ff'
- Pro: Internally consistency within Python, %-strings and f-strings having the same behavior more-or-less, albeit slightly different to C.
- Verdict: Yes
- eg
- Have precision restrict to exactly the number of digits requested
- ie
f"{x:.{n}b}"
wraps x into the range [0, 2 ^ n) and formats the non-negative number as expected - Pro: One can choose familiar numbers for n like 8, 16, 32 etc, as Raymond wanted
- Pro: Working modulo 2 ^ n is often desired
- Neutral: the context of
0b11111111
being 255 unsigned or -1 signed is up to the user’s interpretation, but usually one will know what they’re up to. Also in Python source code this will always be interpreted as 255, in case one was trying someeval(repr())
chicanery - Verdict: Also yes
- ie
We choose to implement both 3 and 4. 3 is how %-formatting works, and is how I opened up this thread. 4 we discuss in a little more detail below:
Implement new !
‘exact precision’ format specifier
To implement behavior 4 from above, that is an exact number of digits, I propose a new !
‘exact precision’ specifier, mutually exclusive with .
‘precision’ in the formatting mini-language
f"{x:!{n}{base}}"
performs \text{mod}(x, \text{base} ^ n) and formats that with the precision specifier .
to n
digits in base base
. That is equivalent to f"{divmod(x, base ** n)[1]:#.{n}{base}}"
.
Examples:
-
f"{-19:#!8b}"
isf"{237:#!8b}"
isf"{493:#!8b}"
is0b11101101
. These are examples of -19 + 256N with N \in \mathbb{Z}. -
f"{-19:#!11o}"
is'0o77777777755'
. Notice that this differs from C’s'037777777755'
, even discounting the0o/0
prefix. 11 octal digits requires 33 bits. In C with x86_64int
s are 32 bits, which leads to'3777...'
instead of7777...
. Any 32, 64, 2 ** n bit machines’ widths are not divisible by 3. Python flexes its infinite precision. -
f"{513:#!2x}"
is'0x01'
, which differs fromf"{513:#.2x}"
which is'0x201'
.
Syntax Justification
Pro
-
!
is graphically related to.
, an extension if you will. ‘exact precision’ is indeed an extension of ‘precision’: ‘exact precision’ takes \text{mod} of the number to format, then passes that to ‘precision’ -
!
in the English language is often used for imperative, commanding sentences. So too!
commands the exact number of digits to which its input should be formatted, whereas.
is only a suggested minimum -
Backwards compatible and optional
Contra
-
This is another addition to the mini-formatting language, however as
!
is to be mutually exclusive with.
the overall complexity of one’s written code is unaffected -
f"{x:#!8b}"
is equivalent tof"{divmod(x, 2 ** 8)[1]:#.8b}"
, are we just lazily avoiding writingdivmod()[1]
?
Rejected Alternatives
- Add a
B
type specifier. To me that indicates a capital prefix0B
, likeX
is used for0X1234...
. Also the problem we are trying to solve is not just for binary (base 2), it applies to all bases, right now binary, octal, and hex (maybe even decimal?), and theX
specifier is already taken for implementing a capital0X
prefix as mentioned.
Sorry if this is a long message, and that it’s taken me a few days to distill it down to this, but these are my thoughts so far.