PEP 682: Format Specifier for Signed Zero

belm0 · February 9, 2022, 12:14am

Abstract

Though float and Decimal types can represent signed zero, in many fields of mathematics negative zero is surprising or unwanted – especially in the context of displaying an (often rounded) numerical result. This PEP proposes an extension to the string format specification allowing negative zero to be normalized to positive zero.
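For readers less familiar with signed zero, here is a minimal sketch of the behavior the abstract refers to. It uses only standard-library calls; the variable names are illustrative and not taken from the PEP.

```python
import math
from decimal import Decimal

neg_zero = -0.0

# Negative zero compares equal to positive zero...
equal_to_zero = (neg_zero == 0.0)

# ...but the sign is still observable, and it shows up when displayed.
sign_of_neg_zero = math.copysign(1.0, neg_zero)
shown = str(neg_zero)

# Rounding a small negative value also yields negative zero.
rounded_shown = str(round(-0.04, 1))

# Decimal keeps the sign of zero as well.
dec_zero = Decimal("-0.0")
dec_is_signed = dec_zero.is_signed()
dec_shown = str(dec_zero)

print(equal_to_zero, sign_of_neg_zero, shown, rounded_shown, dec_is_signed, dec_shown)
```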

Example

>>> x = -.00001
>>> f'{x:z.1f}'
'0.0'

gpshead · February 9, 2022, 8:36am

Some suggestions: cover what, if anything, other languages provide for formatting floats in such a manner. The answer to that may well be “nothing”; if so explicitly state that. Your Motivation section hints at that. I suggest saying it directly. List what other languages you have looked at. Python doing this could influence other languages to do the same. A win for everyone.

Are there any alternatives to z that you considered? If so add a rejected alternatives list with reasons.

z is allowed for numerical types other than integer

This could be a confusing restriction. I suggest allowing it for all numeric types. Otherwise people must convert potentially unknown number types into a float to avoid z raising a ValueError.

I also suggest at least considering how this could be done in % formatting even if doing so winds up rejected. logging uses that. In C99 printf a z already has meaning as a specifier for size_t, but a symbol such as _ as a no -0 sign flag might work.

If done, there is some value in being consistent between what python format and % format do for this, defining it for % today instead of picking up another inconsistent way in the future could save pain.

Has anyone proposed negative 0 sign elision within a future C standard printf?

A potential downside to doing it in % formatting is if C ever does change and decides to go another way we’d be long term inconsistent with that.
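As a reference point for the % formatting question, a small sketch of how printf-style formatting and logging treat the same value today. The record below is built by hand purely for illustration; the logger name and pathname are placeholders.

```python
import logging

x = -0.00001

# printf-style and format()-style formatting agree on the signed zero.
percent_style = "%.1f" % x
format_style = format(x, ".1f")

# logging applies %-formatting lazily to (msg, args), so it inherits the same output.
record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="demo.py", lineno=0,
    msg="value=%.1f", args=(x,), exc_info=None,
)
logged = record.getMessage()

print(percent_style, format_style, logged)
```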

belm0 · February 9, 2022, 10:42am

Thank you-- good suggestions.

cover what, if anything, other languages provide for formatting floats in such a manner. The answer to that may well be “nothing”; if so explicitly state that. Your Motivation section hints at that. I suggest saying it directly. List what other languages you have looked at. Python doing this could influence other languages to do the same.

Here’s what’s written now:

To date, there don’t appear to be any other widely-used languages or libraries providing such a formatting option for negative zero. However, the same z option syntax and semantics have been proposed for C++ std::format().

I’m not aware of anything providing this option, and the author of the C++ proposal apparently wasn’t either. If someone is aware of a language or library with such an option, do let us know…

There was a survey by Rust devs checking which languages propagate or suppress negative 0 (rust-lang/rfcs issue #1074, “-0.0 should format with a minus sign by default”). If some language offered it as a formatting option, I expect they would have noted it.

(Proving that something doesn’t exist is hard-- maybe I’m lazy.)

Are there any alternatives to z that you considered? If so add a rejected alternatives list with reasons.

Originally I thought ~, but that looks too similar to -. And then I came across the C++ proposal, which uses z.

I don’t know if we can predict the future as to what option character will get consensus across languages and C libraries. If we do guess wrong (or don’t have enough influence), it would still be feasible to adopt the new character, while supporting z for legacy programs.

z is allowed for numerical types other than integer

This could be a confusing restriction. I suggest allowing it for all numeric types. Otherwise people must convert potentially unknown number types into a float to avoid z raising a ValueError.

Sorry for the confusion, I’ll clarify this. It refers to the type option of the format, i.e. d for integer. Just as you can’t use precision with type d, z can’t be used either. As for the runtime type passed into the format, integers are fine.
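The precision analogy can be checked on any Python 3 interpreter. This is a sketch only: try_format is a hypothetical helper, and it demonstrates the existing precision rule rather than z itself.

```python
def try_format(value, spec):
    """Return the formatted string, or the ValueError text if the spec is rejected."""
    try:
        return format(value, spec)
    except ValueError as exc:
        return f"ValueError: {exc}"

# An int passed at runtime is fine when the presentation type is 'f'...
int_as_fixed = try_format(5, ".1f")

# ...but precision is rejected when the presentation type is 'd'.
int_as_decimal = try_format(5, ".1d")

print(int_as_fixed)
print(int_as_decimal)
```

Under the proposal, z would follow the second pattern: rejected alongside type d, accepted for an int value formatted with a float presentation type.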

In C99 printf a z already has meaning as a specifier for size_t

C99 uses z as a length modifier on integer conversions only (d, i, o, u, x, X, and n). So it may be feasible to disambiguate it with z for non-integer numeric types.

gpshead · February 9, 2022, 6:45pm

Excellent, somehow I missed the C++ mention when reading the first time. That they’ve also proposed z is a good sign that we should go the same route. Linking to that rust survey from the PEP is useful, it’s interesting data.

Agreed on ~ being undesirable. So many font rendering things screw that character up. I was also pondering = but realized that’d wind up causing a walrus operator conflict. f'{value:=.1f}' would not be fun for anyone to parse.

jeff5 · February 12, 2022, 11:02am

f'{value:=.1f}' parses fine. “=” is an alignment character.

You can’t use _ because it is a grouping option, and since everything is optional, you wouldn’t be able to parse f'{value:_.1f}'
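Both points can be seen with existing format specs (a quick sketch; the values are arbitrary):

```python
value = -1.5

# '=' is read as the alignment option (pad between sign and digits), not as a walrus.
eq_align = f"{value:=.1f}"
eq_padded = f"{value:=8.1f}"

# '_' is already the thousands-grouping option.
grouped = f"{1234567.0:_.1f}"

print(repr(eq_align), repr(eq_padded), repr(grouped))
# → '-1.5' '-    1.5' '1_234_567.0'
```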

The proposed PEP 682 is different from the C++ proposal cited in a possibly crucial detail. In the latter, the ‘z’ is an optional addition to the (optional) sign production. 'z.1f' would not be legal. It would be expressed '-z.1f'. Put another way, the PEP 682 grammar might be [sign[z]] not [sign][z].

My instinct was also to make the desired capability part of the optional sign specifier. I was thinking of another one-character symbol to add to the alternatives there, but it has to be possible to express other sign rules in combination with the desired sign elision on rounding. C++ has shown how.

I wonder if making # do this elision, along with its suppression of the decimal point, although technically not backwards-compatible, would really spoil anyone’s day?

I wasn’t sure what was meant by:

it may be feasible to disambiguate it with z for non-integer numeric types.

It is really useful from an implementation perspective to be able to parse the format specifier without needing to know the type of value to be formatted. (I haven’t looked but suspect f-strings rely on it.) Not every format applies to every type, to be sure, but you’d like to check that after interpreting what’s been asked for.

belm0 · February 12, 2022, 11:55am

The proposed PEP 682 is different from the C++ proposal cited in a possibly crucial detail. In the latter, the ‘z’ is an optional addition to the (optional) sign production. 'z.1f' would not be legal. It would be expressed '-z.1f' .

I don’t think that’s the C++ proposal author’s intention. Granted, the specification wording is a little sloppy, but he clearly shows examples using z without sign:

string s4 = format("{0:z.0},{0:+z.0},{0:-z.0},{0: z.0}", -0.1);
// value of s4 is "0,+0,0, 0"

Put another way, the PEP 682 grammar might be [sign[z]] not [sign][z] .

In the PEP I’ve explained why I see [sign[z]] as problematic:

The proposed extension is intentionally [sign][z] rather than [sign[z]] . The default for sign ( - ) is not widely known or explicitly written, so this avoids everyone having to learn it just to use the z option.
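To make the difference between the two grammars concrete, here is a rough sketch using regular expressions. It models only sign, z, precision, and a few float presentation types; it is not the real format-spec parser, and the names INDEPENDENT, NESTED, and parse are made up for illustration. Note that neither pattern needs to know the type of the value being formatted.

```python
import re

# Reduced grammar: fill/align, '#', '0', width and grouping are deliberately left out.
TAIL = r"(?:\.(?P<precision>\d+))?(?P<type>[eEfFgG%])?\Z"

# [sign][z]: the two options are independent.
INDEPENDENT = re.compile(r"(?P<sign>[-+ ])?(?P<z>z)?" + TAIL)

# [sign[z]]: z may only follow an explicit sign.
NESTED = re.compile(r"(?:(?P<sign>[-+ ])(?P<z>z)?)?" + TAIL)

def parse(pattern, spec):
    """Return the parsed fields as a dict, or None if the spec is not accepted."""
    match = pattern.match(spec)
    return match.groupdict() if match else None

for spec in ("z.1f", "-z.1f", "+z.1f"):
    print(spec, parse(INDEPENDENT, spec) is not None, parse(NESTED, spec) is not None)
```

With these patterns, 'z.1f' is accepted only by the [sign][z] form, while '-z.1f' and '+z.1f' are accepted by both.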

jeff5 · February 12, 2022, 1:38pm

Good point. They’ve got the spec or the example wrong.

mdickinson · February 13, 2022, 1:57pm

A few words about why I support this PEP:

Binary floating-point is weird. It’s machine-friendly, but it’s not particularly human-friendly. So you do your calculations using binary floating-point because that’s what’s efficient for the machine, but when it’s time to present the results of those calculations to a human, the recommended and standard approach is to format your floats, converting them to a form more appropriate for human consumption than a simple str or repr (or .hex()) would give.

Float formatting can be regarded as a composition of two operations: the first operation potentially changes the value - it conceptually rounds to an internal decimal fixed-point or floating-point format - for example, for “.3f” formatting, we round the input to the nearest representable decimal value in a decimal fixed-point format with 3 digits after the point. For “.5g” formatting, we round to the nearest value representable in a decimal floating-point format with precision 5. Then the second operation chooses how to turn the value in that format into a string, making use of the user’s choices about sign formatting, length padding, whether to display the value in scientific notation or standard fixed-point notation, etc.
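The two-operation view can be sketched with the decimal module. This is only an illustrative model for fixed-point formatting, not how CPython implements float formatting; format_fixed and its drop_negative_zero flag are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def format_fixed(x, digits, drop_negative_zero=False):
    """Model '.<digits>f' formatting of a float as: round to decimal, then render."""
    # Operation 1: round the exact binary value to a decimal fixed-point value.
    quantum = Decimal(1).scaleb(-digits)          # e.g. Decimal('0.1') for digits=1
    rounded = Decimal(x).quantize(quantum, rounding=ROUND_HALF_EVEN)

    # Optional step modeling the PEP: target a decimal format without signed zeros.
    if drop_negative_zero and rounded.is_zero():
        rounded = rounded.copy_abs()

    # Operation 2: turn the decimal value into a string.
    return format(rounded, "f")

print(format_fixed(-0.00001, 1))                           # '-0.0'
print(format_fixed(-0.00001, 1, drop_negative_zero=True))  # '0.0'
```

The point of the sketch is that the sign of zero is a property of the intermediate decimal value, so dropping it there is more robust than editing the final string.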

The key point for me is that this internal decimal format should be targeting humans, not machines - that’s what formatting is for. The oddity with current formatting is that that internal decimal format includes signed zeros, and that’s peculiar for something that’s aimed at humans rather than machines. The PEP effectively gives an option to target a decimal format that does not have signed zeros, and so is closer to how humans expect to see numbers written. I have encountered complaints in practice about signed zeros in human-facing stuff, though this was for numbers displayed in a GUI rather than in a printed report.

There are other oddities arising from the current formatting: if I format general float values using a format ".1f", I’m effectively binning those values into bins of size 0.1. Except that because of the inclusion of the sign into the result, there are two bins of size 0.05 instead of a single bin of size 0.1 around 0.0.
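A quick illustration of the binning effect, using arbitrary sample values on either side of zero:

```python
samples = [-0.06, -0.04, -0.00001, 0.0, 0.00001, 0.04, 0.06]

# Each value is labeled by its ".1f" rendering, i.e. the bin it falls into.
labels = [format(v, ".1f") for v in samples]
print(labels)
# → ['-0.1', '-0.0', '-0.0', '0.0', '0.0', '0.0', '0.1']

# The interval (-0.05, 0.05) is split into two labels instead of one.
bins_near_zero = sorted({label for label in labels if label.lstrip("-") == "0.0"})
print(bins_near_zero)
```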

So I agree that there’s a real problem to be solved, and that there isn’t an obvious non-fragile way to do it right now, without this PEP or something like it. I don’t love the z spelling, but I don’t have any better suggestions, and the potential future alignment with C++ is a concrete argument to prefer this spelling over other possibilities.

And yes, other languages don’t seem to support this (yet). But I don’t think that’s for lack of need: with just a few minutes of searching I turned up Stack Overflow questions asking about how to do this in Python, Java, Objective C, and Swift, and I’m sure there are many more examples out there. I don’t see any reason why Python shouldn’t lead the way here.

mdickinson · February 13, 2022, 2:01pm

John, do you want to send a post to the python-dev mailing list that points to this discussion? (Similar to this post for PEP 679, for example: “[Python-Dev] PEP 679 – Allow parentheses in assert statements”.) I’m not sure everyone on python-dev has got used to the idea of looking here for PEP discussions.

kalvdans · February 14, 2022, 12:48pm

It is not clear from the PEP (as of version a89d703) if only exact negative zero should be affected or if negative small numbers should also be printed without minus sign. Please enhance the proposal with that information.

belm0 · February 14, 2022, 1:51pm

Thank you. It was implied by the examples, but definitely should be stated explicitly in the specification:

When 'z' is present, negative zero (whether the original value or the
result of rounding) will be normalized to positive zero.
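A small sketch of what that wording implies, assuming an interpreter where the z option is available (it shipped in Python 3.11). On older versions the z spec raises ValueError, so the sketch skips it there. The sample values are arbitrary.

```python
import sys

values = [-0.0, -0.00001, -0.04, -0.06, 0.04]

# Current behavior: both an exact -0.0 and values that round to zero show a minus sign.
without_z = [format(v, ".1f") for v in values]

# With 'z', only results that are zero after rounding lose the sign; -0.06 still gives '-0.1'.
Z_AVAILABLE = sys.version_info >= (3, 11)  # assumption: the release where 'z' first appears
with_z = [format(v, "z.1f") for v in values] if Z_AVAILABLE else None

print(without_z)
print(with_z)
```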

belm0 · February 14, 2022, 11:32pm

Posted-- thank you.

ericvsmith · February 15, 2022, 11:26am

@mdickinson’s reasoning is persuasive to me. I support this PEP.

vstinner · February 21, 2022, 11:42am

If most people don’t expect negative zeros, why not change the default and add a format option to opt in to negative zero? Maybe str(float) (“for humans”) should format -0.0 as 0.0, whereas repr(float) (for debugging) should format -0.0 as -0.0. f'{-.00001:.1f}' would return 0.0 and f'{-.00001:z.1f}' would return -0.0.

It’s a backward-incompatible change which is likely to break the test suite of many Python projects. Maybe it’s worth it? Python 3.1 went through a similar breakage when repr(float) was modified to return the “short” representation (Python/dtoa.c): sys.float_repr_style.

belm0 · February 21, 2022, 12:25pm

The Rust developers explicitly decided against this (see thread). The argument was roughly: display print (i.e. str()) is used for logging, and logging should not discard info that may be useful in some applications. It sounds as though Swift does have such a distinction though.

It would be unfortunate if some other language (C++) subsequently adopted z with the opposite meaning, given a very probable decision that existing programs cannot be broken.

Signed zero formatting is not only a problem for Python programs. The oversight in the lack of an option and nice default was made long ago in the early printf implementations, and permeates all software. It would be controversial to say that we’re now going to break every program in existence that depends on displaying -0 (whether intentional or not). As a goal that’s “good enough” and more feasible, I’d rather see languages and libraries reach some rough consensus on opting in via the format spec.

vstinner · February 21, 2022, 12:42pm

Would it be possible to explain that in the PEP?

gpshead · March 2, 2022, 5:18pm

The steering council briefly discussed this PEP yesterday and we decided it made sense to defer to @mdickinson as the PEP delegate for this one, assuming he’s willing to take that role. (Submission for SC consideration: python/steering-council issue #110, “PEP 682 -- Format Specifier for Signed Zero”.)

h-vetinari · March 2, 2022, 10:20pm

Regarding parity with C++, the cited P1496 was withdrawn, but I did find a paper by Victor Zverovich – principal author of fmtlib/fmt, and driver of the monumental standardization process for std::format & std::print (in its 13th revision…) – that argues pretty strongly against doing this in C++.

Edit: just saw that this paper was withdrawn as well. I don’t think the promised consensus paper ever materialized.

Not saying this is a bad thing to do for python, just that the references to C++ should probably be updated (and perhaps this changes the equilibrium of trade-offs e.g. regarding syntax).

belm0 · March 2, 2022, 10:26pm

If you follow the issue trackers for P1496 and P2021, the conclusion was that both the P1496 and P2021 authors withdrew their proposals, and would provide a consensus proposal for C++23.

From my reading, Victor Zverovich’s response seemed to misunderstand the issue and overstate the performance impact. (The Python reference implementation shows that the performance effect is negligible.) There are no hints in the tracker as to what the “consensus” is, but I hope that the authors have worked it out.

h-vetinari · March 2, 2022, 10:33pm

Yeah, I saw after posting (I had been misled by the large gap in paper number to think that one supersedes the other).

I asked about the status of the promised consensus paper, but my impression remains that it’s dead (C++ standardization is littered with corpses of both ideas and – figuratively – their authors; the default is stasis, and only with extreme effort can things be made to move).