Formatting numbers to string with 'r'-formatting for rounded precision

hangaard · October 17, 2024, 2:09pm

In high school chemistry I learned to round my results to the number of significant figures given by the precision of the inputs.
I now work with chemical engineers and they also practice this way of printing the bottom line.

Python number string formatting cannot do this, but other languages can.

d3js can format numbers rounded to significant digits.

<!DOCTYPE html>
<html>
<script src="//d3js.org/d3.v4.js"></script>
<body>
<script>
var stringfloat = d3.format(".3r");
d3.select("body").append("p").text(stringfloat(0.0012));
</script>
</body>
</html>

Try it on W3Schools Tryit Editor

Reference: d3-3.x-api-reference/Formatting.md at master · d3/d3-3.x-api-reference (github.com)

Feature request

What I suggest is to copy the ‘r’-formatting behavior of d3js exactly into Python number string formatting.

Examples of expected behavior:

>>> f"{123456.789:.3r}"
'123000'

>>> f"{0.000123456789:.3r}"
'0.000123'

>>> f"{12:.3r}"
'12.0'

>>> f"{0.012:.3r}"
'0.0120'

>>> f"{0:.3r}"
'0.00'

More context

>>> arg1 = 123456.789  # 9 digit precision
>>> arg2 = 100000         # 1 digit precision
>>> result = arg1 + arg2
>>> result
223456.789
>>> f"{result:.1r}"  # I can only report at 1 digit precision because arg2 has 1 digit precision.
'200000'

chepner · October 17, 2024, 2:38pm

F-strings produce str values, not floats, and a float simply does not preserve any notion of precision that the string representation might supply. The built-in round does something similar to what you want:

>>> round(123456.789, -3)
123000.0

but of course the float doesn’t remember any of the precision information, so the corresponding string representation doesn’t drop the .0 which implies 7 significant digits.

>>> str(_)
'123000.0'

If you want significant digits, you need a new data type, not just a change to string formatting.

effigies · October 17, 2024, 2:42pm

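>>> import math
>>> class Rounding(float):
...     def __format__(self, spec):
...         # endswith() also copes with an empty spec, e.g. f'{Rounding(1)}'
...         if spec.endswith('r'):
...             prec = int(spec[:-1])
...             # Decimal position of the leading digit:
...             # 6 for 123456.789, -3 for 0.000123, 1 for zero
...             mag = math.floor(math.log10(abs(self))) + 1 if self else 1
...             ndigits = prec - mag
...             return format(round(self, ndigits), f'.{max(0, ndigits)}f')
...         return super().__format__(spec)
...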
>>> f'{Rounding(123456.789):3r}'
'123000'
>>> f'{Rounding(0.000123456789):3r}'
'0.000123'
>>> f'{Rounding(12):3r}'
'12.0'
>>> f'{Rounding(0):3r}'
'0.00'

Nodd · October 17, 2024, 2:49pm

The proposal was explicitly for formatting, so I guess @hangaard just forgot the quotes in the return values to mark that they are strings. In this case there is no need for a new data type, it’s really just formatting.

That said I think that the proposal doesn’t answer the initial problem of significant figures. I’ll take the first example: 123000 has 6 figures, not 3. The correct formatting would be using scientific notation: 1.23e+05, which has the required 3 significant figures. It’s available right now as f"{123456:.2e}", the only caveat is to declare 2 figures after the dot instead of 3 figures total.
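As a small illustration of that caveat, here is a sketch of a wrapper that takes the total number of significant figures and subtracts one before handing it to the 'e' format code. The name sci is made up for this example and is not part of any library.

```python
def sci(x, sig):
    """Format x in scientific notation with `sig` significant figures.

    `sci` is a hypothetical helper name used only for illustration.
    """
    if sig < 1:
        raise ValueError("sig must be at least 1")
    # The 'e' precision counts digits after the decimal point,
    # so it is one less than the number of significant figures.
    return format(x, f".{sig - 1}e")


print(sci(123456, 3))  # 1.23e+05
print(sci(0.012, 3))   # 1.20e-02
```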

effigies · October 17, 2024, 2:51pm

Also, a quick link to a prior, similar discussion: New format specifiers for string formatting of floats with SI and IEC prefixes

storchaka · October 17, 2024, 3:43pm

You are searching for the ‘g’ format code.

>>> f"{123456.789:.3g}"
'1.23e+05'
>>> f"{0.000123456789:.3g}"
'0.000123'
>>> f"{12:.3g}"
'12'
>>> f"{0.012:.3g}"
'0.012'

If you need to preserve trailing zeros, this is ‘g’ with the ‘#’ modifier.

>>> f"{12:#.3g}"
'12.0'
>>> f"{0.012:#.3g}"
'0.0120'

avylove · October 17, 2024, 6:37pm

The problem with the 'g' format specifier is it forces scientific notation for larger and smaller values. This may not be what the user intends and there is no equivalent of 'f' which supports significant figures.
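To make the switch concrete, here is a short sketch at precision 3. As far as I understand the format specification docs, 'g' stays in fixed point while the decimal exponent is at least -4 and below the precision, and uses scientific notation otherwise. The sample values are arbitrary ones chosen to sit on either side of those limits.

```python
# Sample values on either side of the 'g' thresholds for precision 3.
samples = [0.0001234, 0.00001234, 123.0, 1234.0]

shown = [f"{x:.3g}" for x in samples]
print(shown)
# ['0.000123', '1.23e-05', '123', '1.23e+03']
```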

An alternative to additional format specifiers that has been floated in the past is to add a new flag indicating precision should be interpreted as significant figures. This would then allow use of multiple existing format specifiers with significant figures such as 'f', 'F', 'e', and 'E'.

Nodd · October 17, 2024, 7:53pm

You have to use scientific notation to have the correct number of significant figures. How would you write '1.23e+05' without scientific notation while keeping 3 figures?

jamestwebber · October 17, 2024, 8:00pm

Trailing zeros are usually not considered significant figures unless there is an explicit decimal point, so 123,000 would work.

Regarding the OP, I don’t think there’s a need for additional formatting options here; if one knows the number of sig figs at the time of display, the existing options are fine. And if you want to actually track significant figures and propagate them, it’ll require more than just formatting to get that right.

Nodd · October 17, 2024, 8:26pm

Trailing zeros are usually not considered significant figures unless there is an explicit decimal point, so 123,000 would work.

Oh, I didn’t know this rule. Note that later in the page it’s not that simple.
Anyway, that’s not the main problem here.

storchaka · October 17, 2024, 9:00pm

How would you write '1.20e+05' without scientific notation while keeping 3 figures?

hangaard · October 18, 2024, 6:08am

Yes indeed I forgot the ‘’. Edited the initial post.

hangaard · October 18, 2024, 6:11am

Great question. d3 pads with zeros, no matter how many zeros come before or after the decimal point. It never switches to E notation.

hangaard · October 18, 2024, 8:17am

Agreed, you couldn’t.

Thank you for pointing out that ‘g’-formatting can preserve the trailing zeroes and that we got that functionality covered.

>>> f"{120000:.3g}"
'1.2e+05'
>>> f"{120000:#.3g}"
'1.20e+05'

The ‘r’-notation idea is to not switch notation, but pad with zeroes, like d3js, despite the fact that we lose the precision information in this case.
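For reference, the padding behaviour described here can be sketched in plain Python with the decimal module. This is only an approximation of the idea, not d3's actual algorithm: format_r is a made-up name, the float is converted through repr() so that 0.012 is treated as exactly 0.012, and ties are rounded half-even, which may differ from d3.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_r(x, sig):
    """Round x to `sig` significant figures and always print fixed point.

    `format_r` is a hypothetical helper, not an existing Python or d3 API.
    """
    if sig < 1:
        raise ValueError("sig must be at least 1")
    d = Decimal(repr(x))  # shortest round-trip digits of the input
    if d == 0:
        return format(0.0, f".{sig - 1}f")
    # adjusted() is the exponent of the leading digit (5 for 123456.789).
    last_kept = d.adjusted() - (sig - 1)
    rounded = d.quantize(Decimal(f"1e{last_kept}"), rounding=ROUND_HALF_EVEN)
    # Show decimals only when the last kept digit is right of the point.
    return format(rounded, f".{max(0, -last_kept)}f")


print(format_r(123456.789, 3))      # 123000
print(format_r(0.000123456789, 3))  # 0.000123
print(format_r(12, 3))              # 12.0
print(format_r(0.012, 3))           # 0.0120
print(format_r(120000, 3))          # 120000
```

As noted above, the last result no longer tells the reader whether the trailing zeros are significant.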

jagerber · October 23, 2024, 10:53pm

Yes, this has all basically been hashed out in the other thread that was linked. My post there was my best attempt at summarizing what is missed by the current formatting options with respect to “significant figure” display.

Note that I think there is some confusion when it comes to significant figures. There is one notion that has to do with rounding and one that has to do with uncertainty and error propagation.

I patently dismiss the utility of significant figures for uncertainty and error propagation. Significant figures are a BAD WAY to communicate uncertainty. If you care about uncertainty you should simply report the mean value of your measurement together with a confidence interval (often a symmetric one- or two-sigma confidence interval). E.g. 8.3 +/- 0.3. The obvious flaw with using the number of significant figures for display is that, unless you are strictly using scientific notation, you cannot tell by inspection how many “significant figures” a number has. For example, is 100 shown with 1, 2 or 3 significant figures?

However, it IS useful to be able to round to a number of significant figures, independent of their poor use for uncertainty tracking. By this I mean the following. To round to 3 significant figures is to round to the second decimal place below the top decimal place. So 123456.789 rounded to 3 significant figures is of course 123000. Of course a value of exactly 123000 would appear the same whether we round to 3, 4, 5 or 6 significant digits but I don’t care. Why should I? It’s just specifying a decimal place to which to round. However, if we round it to 7 significant digits I would display that as 123000.0.

One case where this is useful is if this number is part of a value/uncertainty pair. We could display 123456.789 +/- 789.987 but many of these digits are uninteresting. We usually only care about one or two digits of the uncertainty, so we would round the uncertainty to, say, two significant figures. We also want to round the value to the same decimal place, resulting in 123460 +/- 790. This is a case where I want rounding according to sig figs but I also might happen to want fixed point notation.
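The value/uncertainty rounding described above can be sketched with the standard decimal module. This is a rough illustration only: format_value_uncertainty is a hypothetical name, it is not how sciform does it, and it relies on decimal's default half-even rounding.

```python
from decimal import Decimal


def format_value_uncertainty(value, uncertainty, sig=2):
    """Round the uncertainty to `sig` significant figures and the value
    to the same decimal place, then print both in fixed point.

    Hypothetical helper for illustration only.
    """
    v = Decimal(repr(value))
    u = Decimal(repr(uncertainty))
    if u <= 0:
        raise ValueError("uncertainty must be positive")
    # Exponent of the last digit kept in the uncertainty.
    last_kept = u.adjusted() - (sig - 1)
    quantum = Decimal(f"1e{last_kept}")
    decimals = max(0, -last_kept)
    v_rounded = v.quantize(quantum)
    u_rounded = u.quantize(quantum)
    return f"{v_rounded:.{decimals}f} +/- {u_rounded:.{decimals}f}"


print(format_value_uncertainty(123456.789, 789.987))    # 123460 +/- 790
print(format_value_uncertainty(8.3123, 0.2987, sig=1))  # 8.3 +/- 0.3
```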

So I will continue to say that I think better significant figure rounding in python string/number formatting would be valuable to me and others. There are some blind spots with the current string formatting options.

I’ve developed a PyPI package called sciform that allows for clear significant figure rounding. With that package you can do something like

from sciform import SciNum

n = SciNum(123456.789)
print(f"{n:!2f}")
# 120000

See some of the documentation for some decisions I made differently for the sciform FSML compared to the python built-in FSML.

The main difference I want to highlight is that I remove the g format specifier which seems to be a weird historical hodge podge of options that are occasionally convenient for numerical display. But the g option makes a lot of decisions under the hood that are not really easy to understand without a lot of thought. sciform instead gives the user explicit control over the different aspects of number formatting they might be interested in. For example the user can select significant figure or digits-past-the-decimal rounding independent from their choice to use fixed point, scientific, or engineering notation.

hangaard · October 24, 2024, 11:25am

I disagree that significant digits are a “bad way” to communicate uncertainty.

8.3 +/- 0.3 is also imperfect. What does it mean? Does +/- represent the width of a distribution? Which distribution? What is the type of uncertainty? Measurement noise, different model outcomes, aleatoric or epistemic? It’s not a complete description anyway.

‘r’-formatting is easy to read, easier than E-notation for many users of the applications we write in Python, especially users who are less technical.

jagerber · October 24, 2024, 12:39pm

I agree that formatting that rounds to a given number of significant figures, always in fixed point mode, would be useful. That’s why I wrote sciform to allow this. I’m trying to say that people shouldn’t get hung up on whether 12000 means 12000 +/- 500 or 12000 +/- 50 or 12000 +/- 5 as an argument against significant figure rounding, because significant figure rounding is a known bad way to represent uncertainty. But that doesn’t mean significant figure rounding is not useful. For example, when expressing uncertainty as value +/- uncertainty it is helpful to have significant figure rounding to round both the value and uncertainty individually. There are other cases where you may not be scientifically rigorous about some measurement uncertainty but you still just don’t want to show a ton of digits.

I would be behind including a notation for significant figure rounding. In sciform I replace the . that indicates precision in a format specification with !. So that

from sciform import SciNum

num = SciNum(123.456)
f"{num:.2f}"
# 123.46
f"{num:!2f}"
# 120
f"{num:.2e}"
# 1.23e+02
f"{num:!2e}"
# 1.2e+02

So .2 always means two digits past the decimal place, wherever the decimal ends up within the mantissa, and !2 always means to show the most significant digit and one digit to the right of that. Incidentally, in sciform, I use r format specification to indicate engineering notation which is like scientific notation except the exponent must always be a multiple of three for convenient representation using SI units.

I guess the “idea” here is to have a notation that makes this work without requiring an external import and wrapping 123.456 in a call like SciNum(...).

If nothing further is said, I would tacitly assume that +/- probably means plus or minus one standard deviation of a normal distribution, or a distribution that approximates a normal distribution. If nothing further is said I would assume this is statistical rather than systematic uncertainty. Yes, it’s not a complete description of the uncertainty. If a more complete description is needed then you’ll need to provide that in text, tables, graphs, or more complex uncertainty representation such as

8.3 (+1.1/-2.1)_stat (+0.2/-0.3)_sys

or something.

But the simple 8.3 +/- 0.3 or 8.3(3) gives you a lot of information pretty compactly.

In contrast if someone just reports 8.3 should I interpret this as 8.30 +/- 0.05? It seems like 8.3 could represent anything from 8.3 +/- 0.01 to 8.3 +/- 0.1, a factor of 10 difference in fractional uncertainty.

gcewing · October 24, 2024, 8:12pm

The point of significant figure rounding isn’t to convey the amount of uncertainty precisely, it’s to avoid giving the impression of more accuracy than is actually there. The term usually used to describe this is “spurious precision”. It’s about information content, not uncertainty.

jagerber · October 24, 2024, 10:29pm

Ok that’s fair. The number of sig figs gives an upper bound on the uncertainty. If you’re particular about how you format things it can also give a decent lower bound. That is, if I tell you something is 12.34 kG and you happen to know I’m being careful about sig figs then you know my uncertainty is between 0.01 and 0.001 kG.

If I say something is 1200 kG then you know my uncertainty is between 100 and 0.1. So the upper bound is decent but the lower bound is poor because of the zeros appearing in the number. You can convert to scientific notation if you want to keep the better lower bound.

But anyways, I think the point still stands that having a formatting option to round based on sigs AND always retain fixed point formatting would be useful in some cases.

Formatting according to significant figures can be home-spun using rounding and some clever formatting, but a little bit of care is needed to handle all cases and some edge cases. It also might be challenging to get it to play nicely with other features available in the Python built-in FSML.

PeterL · October 27, 2024, 9:04pm

I would think this would go well into the Decimal class. It already has control for rounding beyond what Python does (for example ROUND_HALF_UP), already has rounding for engineering (to_eng_string), and would be a clear place to add extra functionality.
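For context, here is a sketch of what the decimal module already offers in this direction. It uses a Context whose precision is the number of significant digits; the sample inputs are arbitrary. Note that this rounds to significant digits but does not pad with trailing zeros, so it is not the full 'r' behaviour requested above.

```python
from decimal import Context, ROUND_HALF_EVEN, ROUND_HALF_UP

# A context precision of 3 means 3 significant digits.
half_up = Context(prec=3, rounding=ROUND_HALF_UP)
half_even = Context(prec=3, rounding=ROUND_HALF_EVEN)

rounded = half_up.create_decimal("123456.789")
print(rounded)                  # 1.23E+5
print(rounded.to_eng_string())  # 123E+3
print(f"{rounded:f}")           # 123000

# The rounding mode matters for ties:
print(half_up.create_decimal("0.01225"))    # 0.0123
print(half_even.create_decimal("0.01225"))  # 0.0122
```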