Implement `precision` format spec for `int` type data

Okay, sorry for flip-flopping here: my previous message, which suggested implementing 'exact precision' with ., was based on the assumption that we had to choose only one behaviour for . for integers, lest we add some new syntax like ! to achieve both. I hadn't considered that we could use the z format specifier (which is currently unused for integers) to implement both. So disregard my last message.

I think we do: . for precision, and z. for modulo followed by precision.

I agree. f"{25:#.2x}" is '0x19' and f"{1025:#.2x}" should be '0x401', not an OverflowError / ValueError. So ‘precision’ . (without a z) will be identical to %-formatting. Although I said I was okay with f-strings / str.format differing from %-formatting, that was only if we had to choose exactly one behaviour, i.e. without z having an effect to activate twos-complements mode; there’s no point purposely making % and str.format differ from each other.

Yes.

  • f"{25:#.2x}" is "%#.2x" % 25 is '0x19'
  • f"{1025:#.2x}" is "%#.2x" % 1025 is '0x401'. No truncation to exactly 2 digits, as 2 is only the minimum, So the PR needs to be changed to not be an OverflowError / ValueError
  • f"{-1025:#.2x}" is "%#.2x" % -1025 is '-0x401'. Same behaviour as %-formatting. I don’t think I’ve ever used negative hex format, but there’s no point breaking compatibility if we can avoid it.
  • f"{1025:#.2}" is "%#.2d" % 1025 is '1025'. Normal.
  • f"{-1:#.4}" is "%#.4d" % -1 is '-0001'. Normal.

I disagree with this. It's like the 'minimal width binary representation' I talked about in a previous message: it's okay when you know what you're looking for, but even I got confused by the example on GitHub:

>>> format(-129, '.8b')
'101111111'

That threw me off: I started reading it as unsigned, 256 + 127 = 383, when it's meant as 9-bit two's complement for -129. That's why I argued against 'minimal width binary representation'.

So I would go back to my proposal: implement . as 'precision' (behaviour 3, the same as %-formatting), and behaviour 4 (take modulo base ** n first, then pass to .) as z., instead of adding a new !.

Also

I think I made a strong case in the OP that having to re-calculate the width to account for the prefix is un-user-friendly, so this whole proposal is motivated :sweat_smile:

One of my original examples was a hexdump function, which fails for any non-ASCII bytes if a signed range were enforced:

def hexdump(b: bytes) -> str:
    return " ".join(f"{i:#.2x}" for i in b)
hexdump("привет".encode()) # ValueError: Expected integer in range(-2**7, 2**7)

The solution is to implement the z format specifier for ints, valid only in combination with precision (i.e. z.), with behaviour 4 that I laid out here.

TLDR

So tldr: I suggest implementing the previous proposal, but with z. instead of adding a new !

  • . should do the same job in f-strings and str.format as it does with %, precision being the minimum number of digits, as laid out in the OP
  • z. should take the modulo first, moving x into range(0, base ** n), and then format that with the precision, i.e. f"{x:z.{n}{base}}" is f"{divmod(x, base ** n)[1]:.{n}{base}}" (a pure-Python model follows)
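
A pure-Python sketch of the proposed z. semantics, to make that bullet concrete. The helper name is illustrative, not part of any API; since today's format() has no integer precision, zero-padded width stands in for the minimum-n-digits behaviour, which is exact here because the remainder always fits in n digits:

def z_precision(x: int, n: int, kind: str) -> str:
    # Model of the proposed f"{x:z.{n}{kind}}" for binary/octal/hex:
    # reduce x into range(0, base**n), then format with n digits.
    base = {"b": 2, "o": 8, "x": 16, "X": 16}[kind]
    r = x % base**n
    return format(r, f"0{n}{kind}")

assert z_precision(200, 8, "b") == "11001000"   # fits: unchanged
assert z_precision(-129, 8, "b") == "01111111"  # wraps: -129 % 256 == 127
assert z_precision(257, 8, "b") == "00000001"   # wraps: 257 % 256 == 1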

When the syntax + behaviour is settled I'll re-write the OP + the resolution for negative numbers (this message, but with z. instead of !) as one PEP more cleanly: to document the behaviour and the rejected alternatives, and to be something more than a mere What's-New section. The z. for ints genuinely is new. I'll open a PR.

Syntax aside, my method is correct; I shouldn't have entertained the compromise yesterday out of being overly agreeable: 1) it breaks away from %-formatting by unnecessarily forcing only one of 'precision', 'exact precision', or the Frankenstein half-and-half compromise; 2) there's no way this PR can go ahead with only the restrictive signed range, rather than allowing any integer to be formatted after being reduced modulo into [0, \text{base}^n). Python leans permissive here: even invalid escape codes like print("\y") only raise a SyntaxWarning: invalid escape sequence '\y', not an error.

The problem with just taking the remainder (i.e. value % 2**n for the 'b' type) is that it is not a lossless conversion. And this information loss is silent.

I doubt it helps too much, especially if you have digit separators.

Though %-like formatting (as specified in my proposal) might be helpful for addressing Mark's concern. Without the 'z' option we can print unsigned integers without an extra leading zero, e.g.:

>>> f"{200:.8b}"
'11001000'
>>> f"{200:z.8b}"  # now twos complement, can't fit in 8bit
'011001000'

BTW, in principle we could interpret values as two's complement for the 'd' formatting type as well. But I'm not sure this is helpful.

I know, but I think that's fine. . is already implemented for str data in a lossy, truncating way:

f"{'Hello World':.5}" # 'Hello'

Implementing z. in a lossy, modular-arithmetic way is how we want it. It's like a signed char in C going 125, 126, 127, -128, -127, … round and round.
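
That wraparound is plain modular arithmetic; a one-line model of a C signed char (the helper name is illustrative), with no new format machinery involved:

# Signed-char-style wraparound via modulo: 127 + 1 wraps to -128.
def as_signed_char(x: int) -> int:
    return (x + 128) % 256 - 128

[as_signed_char(x) for x in range(125, 131)]  # [125, 126, 127, -128, -127, -126]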

The most likely contexts in which z. will be used involve dealing with range(0, 256) or range(-128, 128) bytes, which z. formats consistently, since the binary representations of signed char and unsigned char are the same. Any hypothetical problem arising from a user's program printing 0b00000001, which the user interprets as 1 when the underlying integer is actually 257, sounds like a problem with a library, not a problem with the formatting (e.g. a poorly written bitmap library trying to write pixels with value 257).

Another context is when one purposely doesn't care about the q part of q, r = divmod(x, base ** n). The formatting is then a well-defined bijection between the equivalence classes of \mathbb{Z}/\text{base}^n\mathbb{Z} and the formatted strings. I think this is what Raymond wanted.
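
Concretely, every representative of an equivalence class formats to the same string; emulating the proposed z.8b with today's width-based zero padding:

# All members of the class 1 + 256Z produce the same 8-bit string.
for x in (1, 257, -255, 513):
    assert format(x % 256, "08b") == "00000001"

# The signed char and unsigned char views of the same bits agree:
assert format(-1 % 256, "08b") == format(255, "08b") == "11111111"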

Without the z, the . precision should be the same as in %-formatting: the minimum number of digits (that is, excluding the 0{b,o,x} prefix, sign, space, grouping separators, etc.), and a negative number x = -y should be formatted the way %-formatting does it: y's formatting with a negative sign in front.
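
That exclusion is visible in %-formatting today, which the no-z behaviour mirrors:

# Precision counts digits only; the sign and the 0x prefix are extra.
assert "%#.4x" % -255 == '-0x00ff'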

With the z (which looks like a 2 if you squint hard enough :sweat_smile:), two's-complement mode is activated: take x mod base ** n and format that. It shouldn't do the weird variable-length formatting:

f"{200:z.8b}" # '011001000' is wrong
f"{200:z.8b}" # '11001000'  is right

Yeah, I don't think I've ever seen a decimal 2's complement (10's complement?) rofl. I don't think it would be useful to anyone. We would only implement z. for binary, octal, and hex.

PEP in progress.

Is a PEP necessary? This is a pretty minor feature.


Yes: it interacts with PEP 3101, and both . and z. need a formalised, documented, rigorous defence. I want to write it (I'm part way through), and as previously discussed it's nice to have something more than just a What's-New section.

Don’t worry, I’ve added you and Raymond to the authors field :sweat_smile:

Here we had no options.

It's an option (obvious pros: easy to implement and to explain its behaviour), but not the only one. Raising an exception is another approach.

What if your input doesn't fit into the specified precision range? With a lossy conversion you will silently get garbage.

It's not weird in the no-'z' case, is it? :wink:

Nothing unusual, it's the same binary arithmetic. But the output will be in decimal, and computing an appropriate bitsize will be trickier. It's doable, but I don't see obvious use cases for it.

Recently support for digit separators in the fractional part was introduced without a PEP, despite PEP 378 and PEP 515.