Implement `precision` format spec for `int` type data

Intro

We propose implementing the precision specifier of the Format Specification Mini-Language for most integer presentation types, to allow formatting integers with a guaranteed minimum number of ‘content characters’: the actual digits of the number, excluding any base prefix created by #, underscores / commas created by grouping_option, and a possible leading space or sign created by sign.

For a quick example:

>>> x = 25
>>> f"{x:#08b}"
'0b011001'       # only 6 content characters since '#' has eaten two to create the '0b' prefix
>>> f"{x:#.8b}"
'0b00011001'     # all 8 content characters since '.8' demands a precision of 8

The former already exists. The latter is what we propose to implement. Currently the format spec docs state that “the precision is not allowed for integer presentation types”, though I don’t see an immediate technical reason why we can’t do this, and the justifications for doing so are sound.

Rationale

When formatting an int x of known bounds using f-strings (and other methods such as str.format), one often wishes to pad x’s string representation with 0s to a sensible width. For example using f"{x:08b}" to pad x’s binary representation to a width of eight bits, or f"{x:04x}" to pad an ‘unsigned short’ x to four hex digits. Python also provides the wonderful # format specifier to prefix the result with the appropriate '0b', '0o', or '0x' base prefix, however this eats into the width number of characters allocated.

>>> x = 13
>>> f"{x:04x}"
'000d'            # four hex digits :)
>>> f"{x:#04x}"
'0x0d'            # only two hex digits :(

The width format specifier is for the length of the entire string, not just the ‘content characters’ aka the ‘digits’.

One could argue that since the length of the prefix is known to always be 2, one can just account for that manually by adding two to the desired number of digits. In our example above that would be f"{x:#06x}", but there are several reasons this is a bad idea:

  • at a glance f"{x:#06x}" looks like it may produce 6 hex digits, but it only produces four, namely '0x000d'
  • 6 is thus too much of a ‘magic number’, and countering that by being overly explicit, eg with f"{x:#0{2+4}x}", looks ridiculous
  • things get more complicated when we introduce a sign specifier, eg f"{x: #0{1+2+4}x}" to produce ' 0x000d'
  • things get even more complicated when introducing a grouping_option: k = 4 ; f"{x: #0{1+2+4*k+(k - 1)}_x}" to produce ' 0x0000_0000_0000_000d' ie k number of hex-groups joined by '_'
  • in the future perhaps a 'O' type specifier may be added to format a number into C-style octal, with a prefix of '0' instead of '0o', meaning not all the prefixes would be of length 2
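To make the manual-width arithmetic from the bullets above concrete, here is what one is forced to write today, wrapped in a helper (hex_manual is a hypothetical name for illustration; the width formula is exactly the one in the grouping_option bullet):

```python
# Today one must compute the total width by hand:
# 1 (sign space) + 2 ('0x') + 4*k (digits) + (k - 1) (underscores).
def hex_manual(x: int, k: int) -> str:
    """Pad x to 4*k hex digits grouped by '_', with a sign space and 0x prefix."""
    width = 1 + 2 + 4 * k + (k - 1)
    return f"{x: #0{width}_x}"

print(hex_manual(13, 4))  # ' 0x0000_0000_0000_000d'
```

The magic arithmetic lives inside the helper rather than the format spec, which is exactly the clutter the proposed precision specifier would remove.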

This proposal does not demand new special-case behavior of int data: the precision specifier for float data already ensures a fixed number of digits after the decimal point. For example f"{0.2: .4f}" produces ' 0.2000', the 4 counting not the minimum total number of characters in the string, but the four digits '2000'.

The only integer presentation type for which precision wouldn’t make sense is c (convert to the nth Unicode codepoint). Perhaps a ValueError could be raised if the user tries using precision with type c for int data, eg f"{65:.8c}".

(Personally) rejected alternatives

I’ve mulled over this proposal for a couple of weeks.

My original thought was to add another format specifier ~, as an alternative to #. This would create the same 0b, 0o, 0x prefixes as #, but they would not count towards the width specifier’s count. It would be mutually exclusive with #, just as _ and , are for grouping_option.

  • con: ~ and # share a key on my UK keyboard, but that’s not the case on US-and-elsewhere keyboards
  • con: more clutter added to the format spec for a single purpose
  • con: I hadn’t considered grouping_option’s impact on width / precision at that point

Comparisons between width and precision

Were this proposal to go ahead, here are some examples of how precision simplifies things, and highlights the current limitations for ints without precision:

| x | f-string code | explanation | resulting string | remarks |
| --- | --- | --- | --- | --- |
| 73 | f"{x:08b}" | width of 8, binary | '01001001' | |
| | f"{x:.8b}" | precision of 8, binary | '01001001' | same behavior as expected with width |
| | f"{x:#08b}" | 0b prefixed, width of 8, binary | '0b1001001' | '0b' prefix stole 2 of the 8 width requested! |
| | f"{x:#.8b}" | 0b prefixed, precision of 8, binary | '0b01001001' | all 8 precision chars are given to x, and '0b' tacked on; good |
| 300 | f"{x:#08b}" | 0b prefixed, width of 8, binary | '0b100101100' | |
| | f"{x:#.8b}" | 0b prefixed, precision of 8, binary | '0b100101100' | same behavior as expected with width since x is larger than 8 bits |
| 8086 | f"{x: #08x}" | leading sign space, 0x prefixed, width of 8, hex | ' 0x01f96' | ' 0x' prefix stole 3 of the 8 width requested! |
| | f"{x: #.8x}" | leading sign space, 0x prefixed, precision of 8, hex | ' 0x00001f96' | all 8 precision chars are given to x, and ' 0x' tacked on; good |
| 18 | f"{x:#03o}" | 0o prefixed, width of 3, octal | '0o22' | '0o' prefix stole 2 of the 3 width requested, thus x’s size only padded it to 2 chars |
| | f"{x:#.3o}" | 0o prefixed, precision of 3, octal | '0o022' | all 3 precision chars are given to x, and '0o' tacked on; good |
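The width-based rows above describe behavior that already exists and can be verified in any current Python:

```python
# Current behavior: '#' (and a sign space) eat into the requested width.
assert f"{73:08b}" == "01001001"
assert f"{73:#08b}" == "0b1001001"    # only 7 content digits
assert f"{8086: #08x}" == " 0x01f96"  # only 5 content digits
assert f"{18:#03o}" == "0o22"         # only 2 content digits
print("all width examples verified")
```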

Teaching

This could replace the default way of teaching formatting integers to fixed widths, eg f"{x:.8b}" instead of f"{x:08b}".

However people coming from C might expect the old behavior, where printf("%08b\n", 73); is required to produce 01001001. I was pleasantly surprised to discover that printf("%.8b\n", 73); is perfectly valid C syntax. The difference between %08d and %.8d is that in the former a negative sign consumes one of the 8 width characters, whereas in the latter a negative sign doesn’t consume one of the 8 precision characters (source). This is consistent with what we’re trying to implement.
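The same width-versus-precision distinction with a sign is already visible in Python’s %-formatting today:

```python
# With width, the '-' sign consumes one of the 8 characters;
# with precision, 8 digits are guaranteed and the sign is extra.
print("%08d" % -73)  # '-0000073'  (8 chars total)
print("%.8d" % -73)  # '-00000073' (8 digits plus the sign)
```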

It should at least be used in documentation examples whenever # is combined with a desired length, eg f"{x:#.3o}" for file mode literals instead of f"{x:#05o}".

Another example:

>>> def hexdump(b: bytes) -> str:
...     return " ".join(f"{i:#.2x}" for i in b)
...
>>> hexdump(b"Hello")
'0x48 0x65 0x6c 0x6c 0x6f'

I’m in favour of this - I’ve used it a lot in C, and find it useful. In fact, the old style %-formatting in Python already supports this:

>>> "%.8x" % (73,)
'00000049'

So in some senses this is just bringing the new formatting language into parity with %-formatting.

I don’t think this is a big enough change to need a PEP - have you considered submitting a PR implementing this, and seeing what reaction you get?


A couple of weeks ago when I was hacking around implementing ~ as an alternative to # (which as discussed in OP I’ve discarded in favour of precision) there seemed to be more than one place in which f-string / format spec parsing is done: grepping for F_ALT (which is defined in Include/internal/pycore_format.h) I found

  • Python/formatter_unicode.c has parse_internal_render_format_spec

  • Objects/unicodeobject.c has unicode_format_arg_parse

  • Objects/bytesobject.c has _PyBytes_FormatEx

etc. I got it working by hacking just on Python/formatter_unicode.c, and I could try something similar for implementing precision for ints. For a merge, however, this seems like it needs to be done correctly, in all the appropriate places. Looking at the experts index, would it be appropriate to tag @ericvsmith for f-strings? :slight_smile:

>>> "% #.4x" % (65,)
' 0x0041'

>>> f"{65: #.4x}"
# ValueError: Precision not allowed in integer format specifier

lol, perhaps it was an oversight of PEP 3101 (Advanced String Formatting):

The precision is ignored for integer conversions.

Since this proposal would supersede that clause, would a PEP be required for proper legislative purposes? PEP 682 (Format Specifier for Signed Zero) which added the z format specifier to coerce ‘negative zeros’ to positives required a PEP, even though it is backwards compatible.

Yeah, it’s probably worth writing up a PEP for this. In addition to other benefits, it would make it more of a “hey, check out this new feature” than a mere line in a What’s New.

I’m in favour of this too.


I doubt we need a PEP here, see e.g. Add underscore as a decimal separator for string formatting · Issue #87790 · python/cpython · GitHub. Just open an issue and submit a PR.

You may want, however, to take a look at the PEP 3101 discussions. Ignoring the precision setting was probably discussed at some point, e.g. because it introduced an incompatibility with printf-style formatting (and C).

Okay, today I’ve done lots of digging through PEPs, Python mail archives, and git history, and I have a few remarks:

I was reading through PEP 3101 trying to discern a reason why precision wasn’t implemented for ints in new-style formatting back then (2006), even though, testing Python 3.0, it works for old-style formatting, eg "%#.4x" % (65,) returning '0x0041'. Since I couldn’t see a reason, and I had done plenty of looking through the mail archives (thread 1 and thread 2), I decided to start drafting an email to Talin / Viridia, the author.

I was wondering whether there was a particular reason for forbidding precision for int data, or whether this was just an oversight due to the overlap of width’s and precision’s behaviour for simple formatting, eg f"{21:04x}" and f"{21:.4x}" both returning '0015'.

I have looked through the git history of PEP 3101, whence I have narrowed down the decision to this commit. The re-wording suggests to me that someone realised the precision specifier should apply to more than just the float type: that it should behave for Python str strings as it does for C const char * strings, truncating to the first n characters; however for int data precision could be left unimplemented, since f"{21:04x}" and f"{21:.4x}" would render the same result.

But. Upon expanding the whole version of the file at that git commit, not just the diff, and then at this later commit in 2007 of PEP 3101 being accepted, the format spec is given by

[[fill]align][sign][0][width][.precision][type]

This differs from what is currently displayed on the PEP 3101 page!

[[fill]align][sign][#][0][minimumwidth][.precision][type]

It was retroactively updated in 2008 to include the # ‘alt’ presentation form specifier in this git commit. This anachronistic addition is what had thrown me off from the start. I can’t find the ‘issue 3138’ that it references either. Also, since the 2023 switchover of PEP 3101 from .txt to .rst, the # is formatted wrong, like a comment, mocking me…

In the original format spec the overlap of width’s and precision’s behavior for ints is 99%, the 1% difference that I can see being when sign is ' '. This is a minuscule oversight, and I don’t feel the need to pester Talin with an email over an off-by-one: ignoring that, it was perfectly fine to implement only width, à la the Zen of Python “There should be one-- and preferably only one --obvious way to do it.” and “Special cases aren’t special enough to break the rules.”. It is the later additions of # (which interestingly works in Python 3.0, even though not formally documented in PEP 3101) and grouping_option (',' or '_') that really broaden the distinction between precision and width, and fully justify the re-admission of precision as laid out in OP.
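That 1% difference when sign is ' ' can be demonstrated with %-formatting, which already supports precision for ints: with width the sign space consumes a character, with precision it does not:

```python
# The off-by-one between width and precision when sign is ' ':
print("% 04x" % 21)  # ' 015'  -- the sign space ate one of the 4 width chars
print("% .4x" % 21)  # ' 0015' -- all 4 precision digits, sign space extra
```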

So my plan now:

  • Write a PEP for this, just as there is PEP 682 for [z].
    • Firstly this is for proper documenting of feature addition, which requires a condensed and inspiring version of OP, as @Rosuav suggests.
    • Secondly this is to avoid any chicanery of sneaking new features into old documentation :sweat_smile:

What’s the status of this proposal?

BTW, we have Add integer formatting code for fixed-width signed arithmetic (2's complement) · Issue #74756 · python/cpython · GitHub, which might be solved in some way (with or without breaking backward compatibility) using the precision field. I.e. for binary/hexadecimal formatting types we would use two’s complement for negative integers:

>>> f"{-12:.8b}"  # a shortcut for format((-12) % 2**8, '08b')
'11110100'

Hi Sergey

I got distracted with other things, but don’t worry this issue has been on my mind!

The clamped binary representation discussion on GitHub prompts an interesting point about C and Python already differing in their formatting of negative numbers (for hex and octal as well as binary) using precision.

C

#include <stdio.h>

int main() {
  printf("%#.4x\n",  19); // 0x0013
  printf("%#.4x\n", -19); // 0xffffffed
  return 0;
}

Precision promoted (sic) from 4 to 8.

Python

print("%#.4x" % ( 19,)) # 0x0013
print("%#.4x" % (-19,)) # -0x0013

I’ve now got some more thoughts about formatting of Python’s infinite-precision int, clamped and otherwise, that I shall lay out clearly in my next message after I sleep. Just acknowledging your message and that this discussion isn’t abandoned :smiley:

I’m not sure if this does require a PEP, so here is an implementation to play with: gh-74756: support precision field for integer formatting types by skirpichev · Pull Request #131926 · python/cpython · GitHub (it’s mostly dedicated to solve referenced issue).

Comment from reading the PR description: making f"{200:.8b}" an error seems a bit odd to me. I’d definitely want to be able to apply the new formatting to unsigned bytes (e.g., elements of a bytes instance).

Or is the plan to use a different format code (B?) for unsigned, and reserve the interpretation of b for two’s complement signed values? The unsigned use-case seems like the bigger one to me.

I think it would be worth a PEP, just to discuss and communicate answers to design questions like this one.

Meta (and opinionated): “does this require a PEP?” seems like the wrong question to be asking: that framing of requiring or not requiring a PEP feels as though it’s viewing a PEP primarily as a regulatory roadblock, rather than as an aid to discussion and development. “Would a PEP be useful?” might be a better question.


I would expect #B and #O producing upper-case prefixes 0B and 0O, like #X produces 0X. Not sure that we should implement this feature, but B and O should be reserved.

The two’s complement representation of a negative integer has an infinite number of digits. In C it is limited by the length modifier (default, hh, h, l, ll), but in Python integers have unlimited size and the length modifier is ignored. It would also not be convenient, because it supports only a limited set of sizes.

So I suggested using precision to limit the number of digits. This is compatible with using precision for strings:

>>> '%.3s' % 'abcd'
'abc'
>>> format('abcd', '.3s')
'abc'

If the formatted number is negative, the result will only contain the number of rightmost digits specified by precision. But what if it is positive, but contains more digits? For consistency with string formatting, it should drop the redundant leftmost digits. But that would be inconsistent with formatting in C. Output all digits? Then we would output more than precision digits for large negative numbers too. And how do we distinguish two’s complement numbers, for example format(-42, '.2x') and format(214, '.2x')?


Alternative approach: precision as the minimum number of digits that appear (C-like). The problem is the two’s complement representation for negative integers. Which size should we pick for a negative value if it doesn’t fit into the specified precision digits?


tldr for my proposal based off discussions so far, 2 PEPs:

. ‘precision’ for f-strings should operate the same as it does for %-strings, formatting the number of digits only, and keeping a sign in the case of negative numbers: f"{-255:#.2x}" is "%#.2x" % -255 is '-0xff'. It is also only the minimum number of digits to which an integer should be formatted, not an exact number, eg f"{-65535:#.2x}" is "%#.2x" % -65535 is '-0xffff', with 4 digits, not restricted to the minimum of 2. This is the approach I would take for the feature this thread started with, and yes, I would suggest this be a PEP, given the discussions / alternatives here, and because it leads on to the next proposed addition:

! ‘exact precision’ is a new specifier to be added that formats to the exact number of digits requested. f"{x:!{n}{b}}" formats x to n digits in base b, effectively wrapping x into range(0, b ** n) using modular arithmetic, and then formatting that using the standard . precision specifier as laid out above. This would require its own PEP, citing the one above in its reasoning. eg f"{-1:!8b}" is '11111111', and f"{-19:!8b}" is f"{237:!8b}" is '11101101'.
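The proposed ! behavior can be sketched in today’s Python (exact is a hypothetical helper name; the !8b syntax itself does not exist yet):

```python
def exact(x: int, n: int, base: str) -> str:
    """Format x to exactly n digits in the given base, wrapping modulo base**n."""
    radix = {"b": 2, "o": 8, "x": 16}[base]
    return format(x % radix ** n, f"0{n}{base}")

assert exact(-1, 8, "b") == "11111111"
assert exact(-19, 8, "b") == exact(237, 8, "b") == "11101101"
```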

No new type specifier B or O should be added, as these indicate to me a capital prefix 0B1010... and 0O3535..., or alternatively for octal a C-like octal prefix with no o/O in it, eg 03535.

I’ll go into the full reasoning in a following message. Sorry I’ve been really busy but we must get this right :sweat_smile:

Implement . ‘precision’ for f-strings the same as the existing behavior for %-strings

Preliminary: Binary Representations

Expansion

Observe that one can always extend a signed number’s binary representation by repeating its leading digit:

-19 char (8 bit)                          0b11101101
-19 int  (32 bit) 0b11111111111111111111111111101101
 47 char (8 bit)                          0b00101111
 47 int  (32 bit) 0b00000000000000000000000000101111

This is what C with two’s complement can do:

printf("%#hhb\n",              -19); // 0b11101101
printf("%#hho\n",              -19); // 0355
printf("%#hhx\n",              -19); // 0xed

printf("%#b\n", (unsigned char)-19); // 0b11101101 same as 237 mod 256
printf("%#o\n", (unsigned char)-19); // 0355
printf("%#x\n", (unsigned char)-19); // 0xed

printf("%#b\n",                -19); // 0b11111111111111111111111111101101
printf("%#o\n",                -19); // 037777777755
printf("%#x\n",                -19); // 0xffffffed

This also generalizes beyond ‘standard’ machine widths of powers of two of course.
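The C snippets above can be mimicked in Python by reducing modulo 2**bits (a sketch; twos is an illustrative name, not a stdlib function):

```python
def twos(x: int, bits: int) -> str:
    """Two's complement binary digits of x at a fixed width."""
    return format(x % (1 << bits), f"0{bits}b")

print(twos(-19, 8))   # '11101101'
print(twos(47, 8))    # '00101111'
print(twos(-19, 32))  # sign-extended to 32 digits
```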

Contraction

Conversely, one can losslessly truncate a signed binary number’s representation down to a single leading 0 if it is non-negative, or a single leading 1 if it is negative:

 5 as 0b00000101 -> 0b0101
-3 as 0b11111101 -> 0b101

If one were to truncate another digit off 5’s representation, it would end up as 0b101, indistinguishable from -3 when using only 3 binary digits, because 5 and -3 are the same modulo 2 ** 3.

Variable width and minimal width binary representations

Therefore to losslessly, unambiguously represent a signed binary number, let us define a convention for ‘variable width and minimal width binary representations’.

The leading binary digit represents the -2^{n-1} value column, the other digits representing columns of value 2 ^ {n-2} \cdots 2^{0}

1 digit : 0 =   0b0 ,                                                                            -1 =   0b1
2 digits: 0 =  0b00 , 1 =  0b01 ,                                                   -2 =  0b10 , -1 =  0b11
3 digits: 0 = 0b000 , 1 = 0b001 , 2 = 0b010 , 3 = 0b011 , -4 = 0b100 , -3 = 0b101 , -2 = 0b110 , -1 = 0b111

In general n digits can represent the range [-2^{n-1}, 2^{n-1}-1], range(-2**(n-1), 2**(n-1)) in Python syntax, permitting overlong representations of the numbers in [-2^{n-2}, 2^{n-2}-1]. This convention is obviously just signed arithmetic, the special cases of 8, 16, 32, and 64 bits everyone who has ever used C will have encountered, but it’s good to understand how extension and contraction work. This is the way I plan to implement arithmetic, representation, and storage of infinite precision ints in my project UTF-8000 (an infinite extension of UTF-8). One needs to be able to represent numbers beyond a finite size_t upper bound, without infinite leading 0s and 1s. This is related, I’m not just waffling here, bear with me.

Let us call the ‘minimal’ representation the shortest possible representation of a signed number

  • 0b1 (1 * -2 ** 0) is the shortest representation of -1, whereas 0b111 ((1 * -2 ** 2) + (1 * 2 ** 1) + (1 * 2 ** 0)) is an overlong representation
  • 0b010111 ((0 * -2 ** 5) + (1 * 2 ** 4) + (0 * 2 ** 3) + (1 * 2 ** 2) + (1 * 2 ** 1) + (1 * 2 ** 0)) is the shortest representation of 23, whereas 0b00010111 is overlong, and 0b10111 is one digit too truncated and is the representation of -9.
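The minimal width can be computed mechanically: find the smallest n such that x lies in range(-2**(n-1), 2**(n-1)) (min_bits is an illustrative name for this sketch):

```python
def min_bits(x: int) -> int:
    """Smallest n such that x fits in n-digit signed binary."""
    n = 1
    while not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        n += 1
    return n

assert min_bits(-1) == 1  # 0b1
assert min_bits(23) == 6  # 0b010111
assert min_bits(-9) == 5  # 0b10111
```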

Formatting of negative numbers in C and Python

  • C’s formatting of negative numbers in binary, octal, and hex is influenced by machine-width
  • Python’s integers however are not limited by a machine width, they are infinite precision
  • In both C and Python the precision format specifier is only the minimum requested number of digits; it should not truncate to exactly this number of digits

C

printf("%#.2b\n", -1);                // 0b11111111111111111111111111111111 this would go on infinitely if we built a Turing Machine with infinite length tape...
printf("%#.2b\n", 50);                // 0b110010 beyond the 2 digits requested

Python

"%#.2x" % 1000 # 0x3e8 is 3 digits, one more than the minimum of 2 requested

It appears to me we therefore have four options for implementing formatting of negative numbers in Python with the precision specifier

  1. Infinite length strings,
    • eg f"{-19:#.8b}" completely ignoring the padding and giving 0b1111...{infinitely long}...111111101101.
    • Con: Not practical…
    • Con: This is really an artefact of C taking into account the machine’s width
    • Verdict: Bad idea…
  2. Minimal width binary representation as given above, padded to precision length.
    • eg f"{-1:#.4b}" is '0b1111', padding to the requested overlong representation of length 4
    • eg f"{-19:#.2b}" is '0b101101', using more than the 2 precision requested
    • Con: Requires teaching everyone a new convention
    • Con: One has to juggle variable width binary in one’s head
    • Verdict: Bad idea…
  3. Keep %-formatting’s behavior of using a sign: format a negative number x = -y (where y is non-negative) using a negative sign and y’s representation with the requested precision
    • eg f"{-255:#.4x}" is '-0x00ff'
    • Pro: Internal consistency within Python, %-strings and f-strings having the same behavior more-or-less, albeit slightly different from C.
    • Verdict: Yes
  4. Have precision restrict to exactly the number of digits requested
    • ie f"{x:.{n}b}" wraps x into the range [0, 2 ^ n) and formats the non-negative number as expected
    • Pro: One can choose familiar numbers for n like 8, 16, 32 etc, as Raymond wanted
    • Pro: Working modulo 2 ^ n is often desired
    • Neutral: the context of 0b11111111 being 255 unsigned or -1 signed is up to the user’s interpretation, but usually one will know what they’re up to. Also in Python source code this will always be interpreted as 255, in case one was trying some eval(repr()) chicanery
    • Verdict: Also yes

We choose to implement both 3 and 4. 3 is how %-formatting works, and is how I opened up this thread. 4 we discuss in a little more detail below:

Implement new ! ‘exact precision’ format specifier

To implement behavior 4 from above, that is an exact number of digits, I propose a new ! ‘exact precision’ specifier, mutually exclusive with . ‘precision’ in the formatting mini-language

f"{x:!{n}{base}}" performs \text{mod}(x, \text{base}^n) and formats that with the precision specifier . to n digits in base base. That is, it is equivalent to f"{divmod(x, base ** n)[1]:.{n}{base}}".

Examples:

  • f"{-19:#!8b}" is f"{237:#!8b}" is f"{493:#!8b}" is 0b11101101. These are examples of -19 + 256N with N \in \mathbb{Z}.

  • f"{-19:#!11o}" is '0o77777777755'. Notice that this differs from C’s '037777777755', even discounting the 0o/0 prefix. 11 octal digits require 33 bits; in C on x86_64, ints are 32 bits, which leads to '3777...' instead of '7777...'. No 32-, 64-, or other 2 ** n bit machine width is divisible by 3. Python flexes its infinite precision.

  • f"{513:#!2x}" is '0x01', which differs from f"{513:#.2x}" which is '0x201'.
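These examples can be sanity-checked today with explicit modular arithmetic, emulating the proposed ! via zero-padded width (precision for ints is not yet implemented, so the prefix is attached by hand):

```python
# Emulate f"{-19:#!11o}": reduce modulo 8**11, then zero-pad to 11 octal digits.
n = 11
val = (-19) % 8 ** n
assert "0o" + format(val, f"0{n}o") == "0o77777777755"  # not C's '037777777755'

# Emulate f"{513:#!2x}": reduce modulo 16**2.
assert "0x" + format(513 % 16 ** 2, "02x") == "0x01"
print("exact-precision examples verified")
```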

Syntax Justification

Pro

  • ! is graphically related to ., an extension if you will. ‘exact precision’ is indeed an extension of ‘precision’: ‘exact precision’ takes \text{mod} of the number to format, then passes that to ‘precision’

  • ! in the English language is often used for imperative, commanding sentences. So too ! commands the exact number of digits to which its input should be formatted, whereas . is only a suggested minimum

  • Backwards compatible and optional

Contra

  • This is another addition to the mini-formatting language, however as ! is to be mutually exclusive with . the overall complexity of one’s written code is unaffected

  • f"{x:#!8b}" is equivalent to f"{divmod(x, 2 ** 8)[1]:#.8b}", are we just lazily avoiding writing divmod()[1]?

Rejected Alternatives

  • Add a B type specifier. To me that indicates a capital prefix 0B, like X is used for 0X1234.... Also the problem we are trying to solve is not just for binary (base 2), it applies to all bases, right now binary, octal, and hex (maybe even decimal?), and the X specifier is already taken for implementing a capital 0X prefix as mentioned.

Sorry if this is a long message, and that it’s taken me a few days to distill it down to this, but these are my thoughts so far. :slight_smile:


! conflicts with the converter indicator (like !r). It is a big no.


It does not conflict.

The grammar for Format String Syntax is replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}". Notice that format_spec is separated by a well-defined ":", and so adding a ! to Format Specification shouldn’t be a problem right?

And even if it did the three converters I can see are !r, !s, and !a. It seems a numerical value eg !8 for 8 digits wouldn’t clash.

The same for f-strings replacement_field ::= "{" f_expression ["="] ["!" conversion] [":" format_spec] "}"

More discussions on the open GitHub PR have made me ponder the virtues of adding both behaviours 3 and 4.

  • 3 differs from C, but is the same as Python %-formatting

  • 4 is similar to C, but truncated exactly to the precision demanded (an improvement in the context of Python as Python ints are machine-width independent), but differs from Python %-formatting

The reason I was anxious to implement both behaviours was out of respect for ‘thou shalt not break compatibility’, but really Python’s ‘new’ style formatting (str.format and f-strings, which both use the [":" format_spec] syntax in their replacement_field) have already moved away from some old %-style behavior, and were introduced for a reason: to improve the language. Therefore there are two questions:

  • Is it worth ‘new’ string formatting to move further away from ‘old’ %-formatting, by implementing . to use the ‘exact precision’ behavior?

  • If . were implemented to exhibit the ‘exact precision’ behavior, would anyone miss the '-0x13' signed-negative behavior? ‘±0xNN’ ie two hex digits can represent -255 to 255, a strange range, which upon reflection I don’t think I personally would use.

Mulling this over, I think I could be convinced that there isn’t much demand for the '-0x13' style behavior, and since this wouldn’t be the first time new formatting breaks away from old, I’m fine with that too.

TLDR:

So the resolution at this point would be to implement behavior 4, ‘exact precision’, but with ., instead of a new ! specifier, as @skirpichev has kindly already done :slightly_smiling_face: . It seems no PEP(s) is necessary then, if these discussions stand as a log of the debate over the implementation / syntax?

My only concern with the GitHub PR is that for f"{x:.{n}{base}}" it forces x to be in the signed range(-base ** n // 2, base ** n // 2) instead of taking a divmod and being agnostic, so that f"{255:#.2x}" is f"{-1:#.2x}" is '0xff'.
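A sketch contrasting the two candidate semantics (both function names are hypothetical, and neither spelling exists in today’s format language; signed_range assumes the PR restricts x to the signed n-digit range, wrapping takes a plain divmod):

```python
def signed_range(x: int, n: int) -> str:
    """PR-style: x must fit in n signed hex digits, i.e. range(-16**n // 2, 16**n // 2)."""
    if not -(16 ** n) // 2 <= x < (16 ** n) // 2:
        raise OverflowError(f"{x} does not fit in {n} signed hex digits")
    return format(x % 16 ** n, f"0{n}x")

def wrapping(x: int, n: int) -> str:
    """divmod-style: always reduce modulo 16**n."""
    return format(x % 16 ** n, f"0{n}x")

assert wrapping(255, 2) == wrapping(-1, 2) == "ff"
assert signed_range(-1, 2) == "ff"
# signed_range(255, 2) raises OverflowError, since 255 >= 128
```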

I do agree that we shouldn’t feel unnecessarily constrained by what %-style formatting does; still less by what C does. So my answers to your two questions are:

Yes!

I certainly wouldn’t.

My primary use-case would be consistent zero-padding for formatting nonnegative ints (that are really fixed-length bit strings masquerading as ints for convenience). E.g., things coming out of struct.unpack('<Q', struct.pack('<d', some_float_that_i_want_to_examine))[0]. So I don’t really much care what happens to negative ints, but I do care that the full range of nonnegative ints for a given bit-width is supported.


Thanks for the detailed proposal. Though, I suspect that 2 PEPs are overkill for an extension of the format mini-language: both seem to be related to the precision field.

Your original proposal looks to me like “let’s do as we have for %-formatting, but with support for thousands separators too”. I find this rather weak: almost the same can be done using 0-padding with an appropriate width. Maybe that’s why this wasn’t in new-style string formatting from the beginning (despite the introduced incompatibility). Anyway, this is the first option (1): precision is just the number of digits in the magnitude.

Raymond’s issue looks more motivated. Maybe it’s not too common, but sometimes we want to print a signed integer in two’s complement representation, in range(-m, m) with m=2**(k*n-1) (where k=1,3,4, respectively, for the 'b', 'o' and 'x' formatting types). The precision option might be used to provide n. This is the second option (2). An alternative would be using the width option, or even extending the formatting mini-language with some new syntax. Neither looks attractive.

Do we want both meanings for the precision option? In the referenced PR, (1) was implemented for the decimal formatting type and (2) for the power-of-2 bases. Another possible alternative: add some flag to select (1) or (2) for a given formatting type (or reuse an existing one, e.g. 'z').

What to do if the value doesn’t fit into the specified precision? For (1) it seems natural to interpret precision as the minimal number of digits. With (2) we can either end with an exception (BTW, OverflowError seems more suitable here), or appropriately enlarge the range as needed for the given value, e.g. (the PR thread has a patch to play with):

>>> f"{200:.8b}"
'011001000'
>>> format(-42, '.2x')
'd6'
>>> format(214, '.2x')
'0d6'
>>> format(-128, '.8b')
'10000000'
>>> format(-129, '.8b')
'101111111'
>>> format(127, '.8b')
'01111111'
>>> format(128, '.8b')
'010000000'
>>> format(-129, '.2o')
'577'

My proposal: use (1) per default for all integer formatting types (but not 'c'); use the (2) meaning iff the 'z' flag was selected, for the power-of-2 base formatting types, where the precision option specifies the minimal range for the two’s complement representation (as in the above examples). (1) doesn’t seem too helpful for the power-of-2 base cases, but using this meaning per default will be consistent with current behavior (i.e. when no precision is specified). Reduced version: just implement (2) for the power-of-2 base formatting types.