PEP 786: Precision and modulo-precision flag format specifiers for integer fields

In a nutshell, PEP 786 proposes implementing the precision (.) format specifier and the modulo-precision flag (z) for integer fields in new-style PEP 3101 formatting.

Precision (.) shall work in the same way as it does in old-style %-formatting, producing a minimum number of digits, and a leading negative sign for negative numbers.
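For reference, this is how old-style %-formatting already treats precision on integers. The values are only illustrative and are not taken from the PEP.

```python
# Old-style %-formatting: precision is a minimum digit count for integers.
padded = "%.4x" % 255        # zero-padded up to four digits
negative = "%.3d" % -5       # the sign is not counted as a digit
unchanged = "%.2x" % 4095    # longer values are never truncated

print(padded, negative, unchanged)  # 00ff -005 fff
```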

The ‘modulo’ flag (z) shall first reduce its argument into range(base ** precision) to produce a predictable two’s complement format, most noticeable for negative numbers. For example, f"{-1:z.2x}" is equivalent to f"{-1 % 16 ** 2:.2x}", which is ff, not -01.
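The reduction can be approximated in current Python with an explicit modulo and zero-padding. This is only a sketch under stated assumptions: `format_z` is a hypothetical helper name, not part of the PEP or its reference implementation, and it ignores every other format option (sign, `#`, width, grouping).

```python
def format_z(value: int, precision: int, type_char: str = "x") -> str:
    """Approximate the proposed 'z.<precision><type>' spec (hypothetical helper)."""
    base = {"b": 2, "o": 8, "x": 16, "X": 16}[type_char]
    # With a positive modulus, Python's % always lands in range(base ** precision).
    reduced = value % base ** precision
    return format(reduced, f"0{precision}{type_char}")


print(format_z(-1, 2))       # ff
print(format_z(255, 2))      # ff
print(format_z(-1, 8, "b"))  # 11111111
print(format_z(257, 2))      # 01
```

Note that -1 and 255 produce the same string, which is exactly the property debated in the rest of the thread.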

The PEP serves as a comprehensive exploration of the potential ways these two format specifiers could have been implemented, with respect to both functionality and formatting syntax. It may be obvious™ now that precision should be implemented the same old way, but rejected alternatives were inspected lest we squander an opportunity to implement something superior. Similarly, modulo-precision is something that I and others have desired; the choice of z as the flag is discussed in the PEP, with the aim of keeping the format spec modest without precluding any future features.

Previous pre-PEP discussion thread here

Quick link to reference implementation branch here; have a play!

Thanks for the short summary; it really helps with this PEP.

I think we should split the proposal into two parts.

The first is the meaning of precision ('.') for integer format types. This is what was initially suggested in the old thread, and it seems much less controversial to me: the precision setting ensures an appropriate minimum number of digits in the magnitude of the result (sign-magnitude representation). In principle, the same could be achieved with the width field and 0-padding plus a little math. A simple and clear idea. It doesn’t look super useful, but it is at least harmless, and it reduces incompatibility with old-style formatting. IMO, this doesn’t require a PEP and could be implemented in a regular PR right now.
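The “little math” needed to emulate precision with the width field amounts to widening the field for a sign and any base prefix. A minimal sketch, where `precision_via_width` is a hypothetical helper and only the `-` sign and the `#` prefix are accounted for:

```python
def precision_via_width(value: int, precision: int,
                        type_char: str = "d", prefix: bool = False) -> str:
    """Emulate an integer precision using zero-padded width (hypothetical helper)."""
    width = precision
    if value < 0:
        width += 1                      # room for the '-' sign
    if prefix and type_char in "boxX":
        width += 2                      # room for 0b / 0o / 0x / 0X
    flags = "#" if prefix else ""
    return format(value, f"{flags}0{width}{type_char}")


print(precision_via_width(-5, 3))                       # -005
print(precision_via_width(255, 4, "x", prefix=True))    # 0x00ff
print(precision_via_width(-255, 4, "x", prefix=True))   # -0x00ff
```

The results match what `"%.3d"` and `"%#.4x"` give in old-style formatting for the same inputs.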

The second proposal is the behavior of the precision setting as altered by the 'z' flag. I don’t think it’s a good idea. In particular:

No, it’s not a two’s complement format. You aren’t showing us the original integer value in two’s-complement representation. Sometimes that will be the case, but not always.

This is something new for integer formatting: other options are lossless and preserve the value of the integer. You propose to display some new number instead, in general unrelated to the original value. The same property holds for C’s printf-type functions: none of them corrupt the original integer value. And much worse: you don’t warn the user in any way if the value was irreversibly truncated.

Let’s make sure that @rhettinger (the bug’s creator) can confirm this judgement of yours.

My guess is that the feature in this form is rather useless for teaching (the bug’s use case). Instead, for such an application, I would expect the 'z' option to produce the original integer value, but in two’s-complement format instead of sign-magnitude. Then the meaning of the precision setting is clear: e.g. for a '.<n>b' format string, we want the value formatted as unsigned, in n-bit two’s complement. And if the value doesn’t fit the given precision, we want to warn the user somehow, with the following options:

  1. Either raise an exception (OverflowError, as for the 'c' format type, or ValueError)
  2. Emit more digits than specified by the precision setting (which warns the user in a soft way)

(N.B. both variants were implemented in my PR. The current state of the PR reflects variant (2).)
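One possible reading of this alternative can be sketched as follows. This is an interpretation under stated assumptions, not the code from the PR: `twos_complement` and its `strict` parameter are hypothetical names, and only the 'b', 'o' and 'x' types are handled.

```python
BITS_PER_DIGIT = {"b": 1, "o": 3, "x": 4}


def signed_bits_needed(value: int) -> int:
    """Smallest two's complement width (sign bit included) that holds value."""
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1


def twos_complement(value: int, precision: int,
                    type_char: str = "x", strict: bool = False) -> str:
    """Show value itself in two's complement using at least `precision` digits."""
    bits_per_digit = BITS_PER_DIGIT[type_char]
    needed_digits = -(-signed_bits_needed(value) // bits_per_digit)  # ceiling division
    if needed_digits > precision:
        if strict:
            # Option 1: refuse values that do not fit.
            raise OverflowError(f"{value} does not fit in {precision} digits")
        digits = needed_digits   # Option 2: emit more digits than requested.
    else:
        digits = precision
    return format(value % (1 << (digits * bits_per_digit)), f"0{digits}{type_char}")


print(twos_complement(-128, 2))  # 80
print(twos_complement(128, 2))   # 080
print(twos_complement(-1, 2))    # ff
```

Under this reading, +128 needs nine bits as a signed value, so it spills into a third hex digit, while -128 fits in eight.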

PS: I’m not discussing other details of the proposal, like the use of the 'z' flag (e.g. there could be a new flag, or we could use some value of the sign flag). That seems premature to me.

I disagree. As per the PEP’s abstract:

Both are presented together in this PEP as the alternative rejected implementations entail intertwined combinations of both.

If one is added without the other, and the other later is also to be added, but its implementation requires changing the first’s, it would be a mess at best and at worst a permanent incompatibility.

The whole point of the PEP, and many format specifiers in general, is so that this is done programmatically for the user.

?

The reduction of the integer into range(base ** precision) is intended and useful. Whether an underlying number formatted to "0xff" is -1 or 255 depends on what the user is programming, but in a machine-width oriented environment either integer is expected to print as "0xff". In other environments it’s an intended truncation / wrapping.

By two’s complement I mean what the number looks like if we try jamming it into a precision-bit register. Think of it as formatting the number with infinite length, unlimited preceding 0s for +ves, unlimited 1s for -ves, then truncating. Similarly for octal and hex. Yes, two’s complement can only support a fixed number of bits in a fixed-width register / the C programming language, but Python’s integers are unlimited*, and we implement a sane extension.
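Python’s bitwise operators already treat negative integers as if they had an unlimited run of leading 1 bits, so the “format infinitely, then truncate” picture can be checked with a plain mask. A small sketch, with `truncate_to_bits` as a hypothetical helper name:

```python
def truncate_to_bits(value: int, bits: int) -> int:
    """Keep only the low `bits` bits of the conceptually infinite two's complement form."""
    return value & ((1 << bits) - 1)


# For power-of-two bases, masking and the modulo reduction agree.
for value in (-1, -128, 128, 255, 257):
    assert truncate_to_bits(value, 8) == value % 2 ** 8

print(format(truncate_to_bits(-1, 8), "08b"))    # 11111111
print(format(truncate_to_bits(-128, 8), "08b"))  # 10000000
print(format(truncate_to_bits(257, 8), "08b"))   # 00000001
```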

The only deviation I can think of is that with octal one can achieve 30 or 33 bits using precision 10 or 11 respectively, but not 32 bits. That said, I expect binary and hexadecimal to be the most common uses of z in machine-width oriented environments, and outside of those it again works as intended.
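The octal arithmetic behind that remark is easy to verify: each octal digit carries three bits, and 32 is not a multiple of three.

```python
BITS_PER_OCTAL_DIGIT = 3

for precision in (10, 11):
    bits = BITS_PER_OCTAL_DIGIT * precision
    # range(8 ** precision) is exactly the range of an unsigned `bits`-bit value.
    assert 8 ** precision == 2 ** bits
    print(precision, "octal digits cover", bits, "bits")

# A non-zero remainder means no octal precision lands on exactly 32 bits.
remainder = 32 % BITS_PER_OCTAL_DIGIT
print(remainder)  # 2
```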


>>> import os
>>> mode = os.stat("/bin/passwd").st_mode & 0o7777
>>> f"{mode:#.4o}" # full mode including setuid, setgid, sticky/temp
'0o4755'
>>> f"{mode:z#.3o}" # just the plain perms
'0o755'
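On an interpreter without the reference implementation, roughly the same strings can be produced with an explicit modulo and a zero-padded width. A sketch, assuming a hard-coded `mode` of 0o4755 in place of the `os.stat` call so that the result is reproducible:

```python
mode = 0o4755  # assumed value, standing in for os.stat(...).st_mode & 0o7777

full = f"{mode:#06o}"            # width 6 = 2 prefix characters + 4 digits
perms = f"{mode % 8 ** 3:#05o}"  # reduce into range(8 ** 3), then 2 + 3 characters

print(full)   # 0o4755
print(perms)  # 0o755
```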

Nope, it’s an intended feature


>>> key_id = 0xCC586D97DC421CB9A0ED6CA903D283F3F02C2660 # random int gpg key
>>> f"{key_id:z.8X}" # short
'F02C2660'
>>> f"{key_id:z.16X}" # long
'03D283F3F02C2660'
>>> f"{key_id:X}" # full
'CC586D97DC421CB9A0ED6CA903D283F3F02C2660'

To also quote the PEP

if a library sets a pixel brightness integer to be 257 […] that’s not our problem or doing […] let the appropriate ‘layer’ of code raise the exception

Your PR is not the intended behavior. It was also buggy.

Just lol. The PEP has already marinated for a year. This bickering is what led me to put this idea on hold for the year, even though it feels right.

You don’t understand the z flag’s purpose and you have very bad taste in wanting "0x080" when +128 is passed to f"{num:z#.2x}" and "0x80" when -128 is passed to it; that went into the rejected alternatives.

I’m looking more for the second opinions of other people in this thread at this point, sorry. I’m not trying to be passive-aggressive; it’s just pointless for us to butt heads lol.

Is the PEP missing some sections? I’m used to seeing Specification and/or Implementation sections after Rationale but this one just goes straight to Rejected Alternatives after Rationale. Part of the rationale actually seems to be a specification so maybe there’s just some headers missing?

Both proposals look completely orthogonal. Handling the 'z' flag (or whatever else it ends up being) could be implemented later without touching the first part of the code. That would also simplify review. And I think the first part could go into v3.15.

As for the rest, I think it’s better not to rush changes. The format specification mini-language is already complicated. Let’s not clutter it without a really good reason.

As a proposed solution for the referenced issue, it is the intended behavior, of course. It was approved by the author of the issue, so I believe my judgement that it fits his needs is on rather solid ground.

Could you please elaborate on this? The fact that it does something other than simply wrapping the integer value is not a bug per se. It’s intended and documented behavior.

Let’s agree not to make claims like the above without good evidence.

Then sorry about that. I will abstain from further discussion unless some core devs find my participation here constructive.

But note that in the comment right above, people also find your PEP hard to read and make suggestions just like the ones you ignored during PEP review:

Let’s not use subjective arguments. The reason why people might want such output in both cases (or an exception in the first case) was explained right above: the value +128 doesn’t fit into an 8-bit signed integer, while -128 does. So if someone would like to demonstrate and teach two’s complement arithmetic, this difference is significant. It’s about mathematics, not taste.

This is a good reason that people might want the output to differ between 128 and -128, but it doesn’t seem like a reason that they should output 0x80 and 0x080 respectively[1]. It is not obvious at all to mii at a glance that the leading 0 is carrying meaningful information. The way the PEP proposes seems far more intuitive and useful than this somewhat arcane notation.


  1. Or do I have them backwards? It’s really not intuitive to mii which one is supposed to be which here

Well, it was explained that in the first case (+128) you can’t represent the value as two’s complement with the specified number of bits (8 in this case). Hence, you would want to signal this to the user.

One possibility is to emit more digits, i.e. in this approach the precision setting specifies the minimum number of bits for the two’s complement representation. Assuming you know what two’s complement is (BTW, we might want to add this term to the Glossary, as it already appears in several places across the docs), it’s of course obvious where to put the extra digits (not necessarily zeros).

As an alternative, we could raise an exception. (As with the 'c' format type when the integer value doesn’t fit into the character range; thus OverflowError looks appropriate.) Here is a draft PR to play with this approach: gh-74756: support precision field for integer formatting types by skirpichev · Pull Request #21 · skirpichev/cpython · GitHub

(As I’m not going to be involved in further discussion, I won’t ask you to elaborate on your intuition. It certainly doesn’t match mine, as I wouldn’t expect format flags to alter the value of an integer. That’s something expected for floats, not integers. But in any case, you will have to look in the docs to find out what the 'z' flag means; there is no way to replace this learning step with some “intuition”.)

Please note that what was discussed above is a concrete issue, mentioned in the thread description. And I argued that the PEP’s approach is useless in this context, i.e. teaching hardware arithmetic.

I hope this post finally makes my argument clear.