Implement `precision` format spec for `int` type data

Intro

We propose implementing the precision specifier of the Format Specification Mini-Language for most integer presentation types, to allow the formatting of integers with a guaranteed minimum number of ‘content characters’: that is, the actual digits of the number, excluding a base prefix created by #, underscores / commas created by grouping_option, and a possible space or sign created by sign.

For a quick example:

>>> x = 25
>>> f"{x:#08b}"
'0b011001'       # only 6 content characters since '#' has eaten two to create the '0b' prefix
>>> f"{x:#.8b}"
'0b00011001'     # all 8 content characters since '.8' demands a precision of 8

The former already exists. The latter is what we desire to implement. Currently the format spec docs inform that “the precision is not allowed for integer presentation types”, though I don’t see an immediate technical reason why we can’t do this, and the justifications to do so are sound.

Rationale

When formatting an int x of known bounds using f-strings (and other methods such as str.format), one often wishes to pad x’s string representation with 0s to a sensible width. For example using f"{x:08b}" to pad x’s binary representation to a width of eight bits, or f"{x:04x}" to pad an ‘unsigned short’ x to four hex digits. Python also provides the wonderful # format specifier to prefix the result with the appropriate '0b', '0o', or '0x' base prefix; however, that prefix eats into the number of characters allocated by width.

>>> x = 13
>>> f"{x:04x}"
'000d'            # four hex digits :)
>>> f"{x:#04x}"
'0x0d'            # only two hex digits :(

The width format specifier is for the length of the entire string, not just the ‘content characters’ aka the ‘digits’.

One could argue that since the length of the prefix is known to always be 2, one can just account for that manually by adding two to the desired number of digits. In our example above that would be f"{x:#06x}", but there are several reasons this is a bad idea:

  • at a glance f"{x:#06x}" looks like it may produce 6 hex digits, but it only produces four, namely '0x000d'
  • 6 is thus too much of a ‘magic number’, and countering that by being overly explicit, eg with f"{x:#0{2+4}x}", looks ridiculous
  • things get more complicated when we introduce a sign specifier, eg f"{x: #0{1+2+4}x}" to produce ' 0x000d'
  • things get even more complicated when introducing a grouping_option: k = 4 ; f"{x: #0{1+2+4*k+(k - 1)}_x}" to produce ' 0x0000_0000_0000_000d' ie k number of hex-groups joined by '_'
  • in the future perhaps a 'O' type specifier may be added to format a number into C-style octal, with a prefix of '0' instead of '0o', meaning not all the prefixes would be of length 2
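To make the bookkeeping in the bullets above concrete, here is a minimal sketch that runs on current Python. The helper name total_width and its parameters are made up for illustration; it only automates the width arithmetic that a user has to do by hand today, and it assumes a fixed two-character base prefix.

```python
def total_width(digits: int, *, sign: bool = False, prefix: bool = False, group: int = 0) -> int:
    """Hypothetical helper: the width needed today to get `digits` content characters."""
    width = digits
    if group:
        # one separator between each full group of `group` digits
        width += (digits - 1) // group
    if prefix:
        width += 2  # assumes a two-character prefix: '0b', '0o' or '0x'
    if sign:
        width += 1  # room for ' ', '+' or '-'
    return width


x = 13
plain = f"{x:#0{total_width(4, prefix=True)}x}"
signed = f"{x: #0{total_width(4, sign=True, prefix=True)}x}"
grouped = f"{x: #0{total_width(16, sign=True, prefix=True, group=4)}_x}"
print(repr(plain), repr(signed), repr(grouped))
```

Every extra flag adds another term to the width, which is exactly the arithmetic a precision specifier would make unnecessary.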

This proposal is not a new special-case behavior being demanded of int data: the precision specifier for float data ensures that there are a fixed number of digits after the decimal point. For example f"{0.2: .4f}" produces ' 0.2000', the 4 not counting the minimum total number of characters in the string, but the four digits '2000'.

The only integer presentation type that precision wouldn’t make sense with is c (convert to the nth Unicode codepoint). Perhaps a ValueError could be raised if the user tries using precision with type c for an int, eg f"{65:.8c}".

(Personally) rejected alternatives

I’ve mulled over this proposal for a couple of weeks.

My original thoughts were to add in another format specifier ~, as an alternative to #. This would create the same 0b, 0o, 0x prefixes as #, but they would not count towards the width specifier’s count. This would be mutually exclusive with #, just as _ and , are for grouping_option

  • con: ~ and # are only on the same key on my UK keyboard, but that’s not the case on US-and-elsewhere keyboards
  • con: more clutter added to the format spec for a single purpose
  • con: I hadn’t considered grouping_option’s impact on width / precision at that point

Comparisons between width and precision

Were this proposal to go ahead, here are some examples of how precision simplifies things, highlighting the current limitations of formatting ints without it.

| x | f-string code | explanation | resulting string | remarks |
|---|---|---|---|---|
| 73 | f"{x:08b}" | width of 8, binary | '01001001' | |
| 73 | f"{x:.8b}" | precision of 8, binary | '01001001' | same behavior as expected with width |
| 73 | f"{x:#08b}" | 0b prefixed, width of 8, binary | '0b1001001' | '0b' prefix stole 2 of the 8 width requested! |
| 73 | f"{x:#.8b}" | 0b prefixed, precision of 8, binary | '0b01001001' | all 8 precision chars are given to x, and '0b' tacked on; good |
| 300 | f"{x:#08b}" | 0b prefixed, width of 8, binary | '0b100101100' | |
| 300 | f"{x:#.8b}" | 0b prefixed, precision of 8, binary | '0b100101100' | same behavior as expected with width since x is larger than 8 bits |
| 8086 | f"{x: #08x}" | leading sign space, 0x prefixed, width of 8, hex | ' 0x01f96' | ' 0x' prefix stole 3 of the 8 width requested! |
| 8086 | f"{x: #.8x}" | leading sign space, 0x prefixed, precision of 8, hex | ' 0x00001f96' | all 8 precision chars are given to x, and ' 0x' tacked on; good |
| 18 | f"{x:#03o}" | 0o prefixed, width of 3, octal | '0o22' | '0o' prefix stole 2 of the 3 width requested, thus x’s size only padded it to 2 chars |
| 18 | f"{x:#.3o}" | 0o prefixed, precision of 3, octal | '0o022' | all 3 precision chars are given to x, and '0o' tacked on; good |
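Since the precision rows cannot be run on current Python, here is a rough emulation of the proposed behaviour, as a sketch only. The function format_int_precision and its parameters are hypothetical and not part of any existing API; grouping_option is deliberately left out, and the ValueError for type c is just the suggestion made earlier in this post.

```python
PREFIXES = {"b": "0b", "o": "0o", "x": "0x", "X": "0X"}


def format_int_precision(x, precision, type_="d", *, alt=False, sign="-"):
    """Hypothetical emulation of a '.precision' spec for ints (no grouping)."""
    if type_ == "c":
        # suggested behaviour: precision makes no sense for a single code point
        raise ValueError("precision not allowed with integer type 'c'")
    # pad only the digits; prefix and sign are added afterwards
    digits = format(abs(x), type_).zfill(precision)
    prefix = PREFIXES.get(type_, "") if alt else ""
    sign_str = "-" if x < 0 else {"-": "", "+": "+", " ": " "}[sign]
    return sign_str + prefix + digits


# width (works today) next to the emulated precision, for the table's values
print(repr(format(73, "#08b")), repr(format_int_precision(73, 8, "b", alt=True)))
print(repr(format(300, "#08b")), repr(format_int_precision(300, 8, "b", alt=True)))
print(repr(format(8086, " #08x")), repr(format_int_precision(8086, 8, "x", alt=True, sign=" ")))
print(repr(format(18, "#03o")), repr(format_int_precision(18, 3, "o", alt=True)))
```

The design point is that the padding is applied to the digits before the sign and prefix are attached, so neither can eat into the requested count.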

Teaching

This could replace the default way of teaching formatting integers to fixed widths, eg f"{x:.8b}" instead of f"{x:08b}".

However, people coming from C might expect the old behavior, where printf("%08b\n", 73); is required to produce 01001001. I was pleasantly surprised to discover that printf("%.8b\n", 73); is also perfectly valid C syntax. The difference between %08d and %.8d is that in the former a negative sign consumes one of the 8 width characters, whereas in the latter it doesn’t consume one of the 8 precision characters (source). This is consistent with what we’re trying to implement.
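For anyone without a C compiler to hand, the same %08d versus %.8d distinction can be seen with Python’s old-style %-formatting, which follows printf here. This is only an illustration of the existing behaviour, with -42 picked as an arbitrary example value.

```python
n = -42

# width of 8: the '-' sign counts towards the 8 characters
width_form = "%08d" % n

# precision of 8: the '-' sign is added on top of the 8 digits
precision_form = "%.8d" % n

print(repr(width_form), len(width_form))
print(repr(precision_form), len(precision_form))
```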

It should at least be used in documentation examples whenever # is used with a desired length, eg f"{x:#.3o}" for file mode literals instead of f"{x:#05o}".

Another example:

>>> def hexdump(b: bytes) -> str:
...     return " ".join(f"{i:#.2x}" for i in b)
...
>>> hexdump(b"Hello")
'0x48 0x65 0x6c 0x6c 0x6f'

I’m in favour of this - I’ve used it a lot in C, and find it useful. In fact, the old style %-formatting in Python already supports this:

>>> "%.8x" % (73,)
'00000049'

So in some senses this is just bringing the new formatting language into parity with %-formatting.

I don’t think this is a big enough change to need a PEP - have you considered submitting a PR implementing this, and seeing what reaction you get?


A couple of weeks ago when I was hacking around implementing ~ as an alternative to # (which as discussed in OP I’ve discarded in favour of precision) there seemed to be more than one place in which f-string / format spec parsing is done: grepping for F_ALT (which is defined in Include/internal/pycore_format.h) I found

  • Python/formatter_unicode.c has parse_internal_render_format_spec

  • Objects/unicodeobject.c has unicode_format_arg_parse

  • Objects/bytesobject.c has _PyBytes_FormatEx

etc. I got it working by hacking just at Python/formatter_unicode.c, and I could try something similar for implementing precision for ints. For a merge however this seems like it needs to be done correctly, in all the appropriate places. Looking at the experts index would it be appropriate to tag @ericvsmith of f-strings? :slight_smile:

>>> "% #.4x" % (65,)
' 0x0041'

>>> f"{65: #.4x}"
# ValueError: Precision not allowed in integer format specifier

lol, perhaps it was an oversight of PEP 3101 (Advanced String Formatting):

The precision is ignored for integer conversions.

Since this proposal would supersede that clause, would a PEP be required for proper legislative purposes? PEP 682 (Format Specifier for Signed Zero) which added the z format specifier to coerce ‘negative zeros’ to positives required a PEP, even though it is backwards compatible.

Yeah, it’s probably worth writing up a PEP for this. In addition to other benefits, it would make it more of a “hey, check out this new feature” than a mere line in a What’s New.

I’m in favour of this too.


I doubt we need a PEP here, see e.g. Add underscore as a decimal separator for string formatting · Issue #87790 · python/cpython · GitHub. Just open an issue and submit a PR.

You may, however, want to take a look at the PEP 3101 discussions. Ignoring the precision setting was probably discussed at some point, since it introduced an incompatibility with printf-style formatting (and C).

Okay, today I’ve done lots of digging through PEPs, Python mail archives, and git history, and I’ve a few remarks:

I was reading through PEP 3101 trying to discern a reason why precision wasn’t implemented for ints in new-style formatting back then (2006), even though testing Python 3.0 it works for old-style formatting, eg "%#.4x" % (65,) returns 0x0041. Since I couldn’t see a reason, and I had done plenty of looking through the mail archives (thread 1 and thread 2), I decided to start drafting an email to Talin / Viridia, the author.

I was wondering whether there was a particular reason for forbidding precision for int data, or whether this was just an oversight arising from the overlap of width’s and precision’s behaviour for simple formatting, were both to be added, eg f"{21:04x}" and f"{21:.4x}" both returning '0015'?

I have looked through the git history of PEP 3101, whence I have narrowed down the decision to this commit. The re-wording suggests to me that one realised the precision specifier should apply to more than just the float type: that it should behave for Python str strings as it does for C const char * strings, truncating to the first n characters, however for int type data precision could be left unimplemented since f"{21:04x}" and f"{21:.4x}" would render to the same result.

But. Upon expanding the whole version of the file at that git commit, not just the diff, and then at this later commit in 2007 of PEP 3101 being accepted, the format spec is given by

[[fill]align][sign][0][width][.precision][type]

This differs from what is currently displayed on the PEP 3101 page!

[[fill]align][sign][#][0][minimumwidth][.precision][type]

It was retroactively updated in 2008 to include the # ‘alt’ presentation form specifier in this git commit. This anachronistic addition is what has thrown me off since the start. I can’t find ‘issue 3138’ that it references either. Also since the 2023 switchover of PEP 3101 from .txt to .rst the # is formatted wrong, like a comment, mocking me…

In the original format spec the overlap of width’s and precision’s behavior for ints is 99%, the 1% difference that I can see being when sign is ' '. This is a minuscule oversight, and I don’t feel the need to pester Talin with an email over an off-by-one: ignoring that, it was perfectly fine to implement only width, à la the Zen of Python: “There should be one-- and preferably only one --obvious way to do it.” and “Special cases aren’t special enough to break the rules.” It is the later additions of # (which interestingly works in Python 3.0, even though not formally documented in PEP 3101), and grouping_option (',' or '_') that really start to broaden the distinction between precision and width, and fully justify the re-admission of precision as laid out in OP.
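The widening gap can be seen today with old-style %-formatting, which accepts both width and precision for ints. This is a small sketch using it as a stand-in for what the new-style spec would do; 21 and four digits are arbitrary example values.

```python
x = 21
results = {}
for flags in ("", " ", "#"):
    by_width = ("%" + flags + "04x") % x       # zero-padded width of 4
    by_precision = ("%" + flags + ".4x") % x   # precision of 4
    results[flags] = (by_width, by_precision)
    print(repr(flags), repr(by_width), repr(by_precision))
```

With no flags the two agree; with a space sign the width form loses one digit, and with # it loses two.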

So my plan now:

  • Write a PEP for this, just as there is PEP 682 for [z].
    • Firstly this is for proper documenting of feature addition, which requires a condensed and inspiring version of OP, as @Rosuav suggests.
    • Secondly this is to avoid any chicanery of sneaking new features into old documentation :sweat_smile: