Rounding to significant figures - feature request for math library

@komoto48g On a typical machine, Python’s round is correctly rounded, while NumPy’s round is not. Neither choice is objectively wrong or right: doing correct rounding is computationally expensive relative to a simple scale-by-power-of-ten, round, then scale back approach, and NumPy favours speed over perfect correctness in corner cases.

For the particular value you chose, note that the fraction 1/200 is not exactly representable as an IEEE 754 binary64 value, so the actual value that round is working with (again on a typical machine) is the closest value which is representable, which is 0.005000000000000000104083408558608425664715468883514404296875. As you can see, it’s a little larger than 0.005, so when rounding to two decimal places it’s correct to round it up, to 0.01.
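If you want to see that stored value for yourself, one way (purely for illustration) is to convert the float to Decimal, which preserves it exactly:

>>> from decimal import Decimal
>>> Decimal(0.005)
Decimal('0.005000000000000000104083408558608425664715468883514404296875')
>>> round(0.005, 2)
0.01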

I’d strongly prefer to keep Python’s round correctly rounded. And equally, I’m sure the NumPy folks would like to keep their round as performant as possible.

And if you’re a tax officer, you should really be using decimal instead. :slight_smile:
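For the record, a minimal sketch of what that might look like; with decimal the value really is 0.005, and the rounding mode is explicit rather than implied:

>>> from decimal import Decimal, ROUND_HALF_UP
>>> Decimal("0.005").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
Decimal('0.01')
>>> Decimal("2.005").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
Decimal('2.01')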


@mdickinson, thank you for correcting my misunderstanding.
My misunderstanding came from the fact that NumPy always seems to be doing round-half-to-even “correctly” because of its algorithm.
https://numpy.org/doc/stable/reference/generated/numpy.around.html

I tried a few tests and found that NumPy sometimes behaves the same way as the built-in round.

>>> import numpy as np
>>> round(0.005, 2), np.round(0.005, 2)
(0.01, 0.0)
>>> round(2.005, 2), np.round(2.005, 2)
(2.0, 2.0)
>>> round(8.005, 2), np.round(8.005, 2)
(8.01, 8.01)

Of course, I will! But I think they must be using ceil. :grinning:

Yes: it’s mildly annoying that NumPy seems at a casual glance to be doing round-ties-to-even correctly, while Python appears to be doing it incorrectly, when in fact it’s the other way around. :frowning: What’s going on here is that NumPy’s algorithm is scale-by-power-of-ten, round-to-nearest-int (using round-ties-to-even), and then scale back. So for example for rounding to two decimal places after the point it’s doing the equivalent of this (only faster):

def numpy_round_to_two_places(x):
    return round(x*100.0)/100.0

What then happens is that the multiplication by 100.0 often (but not always) turns a case that would have been an exact tie if it weren’t for floating-point imprecision into something that is exactly a tie (exactly halfway between two integers). So then round-ties-to-even kicks in and behaves “as expected”.
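You can watch that happen with the 0.005 example from above:

>>> x = 0.005
>>> x * 100.0                  # the stored value times 100 lands exactly on a tie
0.5
>>> round(x * 100.0) / 100.0   # ties-to-even on that tie, then scale back
0.0
>>> round(x, 2)                # the correctly-rounded result
0.01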

Here’s some testing with values [0.005, 0.025, 0.045, ...], all of which look like ties that “should” round down with a naive What-You-See-Is-What-You-Get interpretation:

>>> import numpy as np
>>> looks_like_a_tie = [(20*x + 5)/1000 for x in range(100)]
>>> rounds_down_python = [x for x in looks_like_a_tie if round(x, 2) < x]
>>> rounds_down_numpy = [x for x in looks_like_a_tie if np.round(x, 2) < x]
>>> len(rounds_down_python) / len(looks_like_a_tie)
0.52
>>> len(rounds_down_numpy) / len(looks_like_a_tie)
0.97

So NumPy looks as though it’s doing round-ties-to-even, but still doesn’t get it quite right:

>>> set(looks_like_a_tie) - set(rounds_down_numpy)
{0.545, 1.245, 1.225}

Moral of the story: if you’re depending on predictable results from a decimal representation of a binary approximation to a decimal round of a binary approximation to a decimal halfway-case, then … Well, just don’t, okay?


Note that one other use of significant figures in floats is comparison: “Are these two values the same to N sig figs?”

But that’s what math.isclose() is for :slight_smile:
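If “same to N sig figs” is really a relative-error question, you can approximate it with a relative tolerance; the exact cut-off is a judgement call, but something like this sketch works in practice:

>>> import math
>>> a, b = 3.14159, 3.14201
>>> math.isclose(a, b, rel_tol=5e-4)   # agree to roughly 3 significant figures
True
>>> math.isclose(a, b, rel_tol=5e-6)   # but not to roughly 5
False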

However, I HAVE needed (or thought I needed) this in the past. I think maybe it was writing to JSON. I wanted a value to be a number in the JSON, but wanted only a particular number of significant figures.

In [7]: import json, math

In [8]: pi = math.pi

In [9]: json.dumps([pi, round(pi, 3)])
Out[9]: '[3.141592653589793, 3.142]'

This is kinda-sorta like an output string – but I didn’t want it to be a string in the JSON, and I didn’t want to write my own encoder.

It seems this may be more of an issue for a JSON encoder, but it could be useful for a variety of output formats: CSV, YAML, and so on.

Strictly speaking, the same arguments hold for JSON: if you are going to use the numbers for computation later, you should keep the digits, and the application consuming the JSON can round its output as it sees fit. But JSON is also a human-readable format, so the output form can be important. And there can be storage space issues as well: fifteen digits of floating-point numbers do not compress well.

Does that justify it being in the math module? Maybe. As some of the discussion here indicates, there are subtleties; it would be nice to have a well-thought-out solution in the stdlib.
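As a sketch of the JSON use case (sig_round here is a hypothetical helper, not an existing stdlib function, and it has the caveats discussed later in this thread):

import json
import math

def sig_round(x, sig):
    # round x to `sig` significant figures, still returning a float
    if x == 0:
        return 0.0
    return round(x, sig - int(math.floor(math.log10(abs(x)))) - 1)

print(json.dumps([sig_round(v, 4) for v in [math.pi, 1.2345678e-07]]))
# prints: [3.142, 1.235e-07]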

Many times mathematicians, scientists, developers and users of apps like a “viewing simplicity” while still keeping numeric formats, because the numbers may vary widely by powers of 10. Which list is easier to read?
[3.1753459e15, 7.4721684e-12]
or [3.1800000e15, 7.4700000e-12] ?
Most people would argue the second list is easier to read and mentally keep track of, particularly if further calculations are involved. A Python purist wants to keep using round, and that’s fine too. But for people who wade through numbers constantly, a sig_round function in the math library would avoid the annoyance of too many visible computer digits while still keeping a floating-point data type.

Again, this may not satisfy a Python purist, but it sure helps with understanding numeric content visually much more quickly, and it keeps buckets of sometimes annoying extra digits out of the way, all while still allowing further calculations with one, several or many numbers.
Ref.

from math import log10, floor

def round_it(x, sig):
    return round(x, sig - int(floor(log10(abs(x)))) - 1)

print(round_it(1324, 1))

Source: https://www.delftstack.com/howto/python/round-to-significant-digits-python/

Alas, that is not practical. A sig_round cannot do that, not with floats at least.

Floats are a fixed 64-bit data type, with (usually) 53 bits in the significand (sometimes called the mantissa). That corresponds to 15.9546 decimal digits, so 15 or 16 digits. You can’t increase or decrease that. Every normal float has exactly 53 bits = 15 or 16 digits.

(Aside: technically, denormalised floats have fewer significant bits, but let’s not open that can of worms. They’re not relevant here.)
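For the record, the 53 and the “15.95 decimal digits” figures are easy to pull out of Python itself:

>>> import math, sys
>>> sys.float_info.mant_dig              # significand bits in a binary64 float
53
>>> print(f"{53 * math.log10(2):.4f}")   # equivalent decimal digits
15.9546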

The consequence of this is that although you can round the float 7.4721684e-12 to 7.47e-12, it is actually equal to 7.470000000000000e-12 and there is no way to tell that float to only use 8 significant figures, or to tell Python to display that float with 8 sig figures by default.

Python floats display using the minimum number of digits that, when entered, will give that same value. You can’t pick and choose the display for floats, and even if we provided such a function, it could only apply globally, and not per float.

That is to say, it would not be possible to create the float 1.e-1 with one sig figure, and the float 1.00e-1 with three sig figures.

Any such sig_round function could only do what round already does, except with an adjustment to work with sig figures instead of digits after the decimal point.

The best we can do is to have a single float equal to (approximately) 0.1, and then format it as a string to however many significant figures you want. In other words, the status quo.
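A small illustration of that status quo, using ordinary string formatting (this is per call, not a property of the float):

>>> x = 0.1
>>> f"{x:.0e}", f"{x:.2e}"     # "one sig fig" and "three sig figs", but only as strings
('1e-01', '1.00e-01')
>>> float(f"{x:.2e}") == x     # converting back just recovers the same single float
True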

However, it is possible to create a whole new data type where every floating point number has an arbitrary precision, but that would come with its own issues. In fact, Python already has such a data type: the Decimal type in the decimal module.

If you are unfamiliar with Decimals, I recommend you spend some time learning how they work, their pros and cons, and what they offer that floats don’t (variable precision, to start with).
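For instance, unlike floats, Decimal values carry their own precision around with them:

>>> from decimal import Decimal
>>> Decimal("1.0"), Decimal("1.00")      # equal in value, different in precision
(Decimal('1.0'), Decimal('1.00'))
>>> Decimal("1.0") == Decimal("1.00")
True
>>> Decimal("32.7").quantize(Decimal("1.000"))
Decimal('32.700')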

By the way, none of what I say above is anything to do with being “a Python purist”. It is how floats work in just about all programming languages.

It took me one attempt to find a flaw in that. Floats don’t display trailing zeroes, and even if they did, there is no way to configure each individual float with a different number of digits:

>>> round_it(0.1, 2)
0.1
>>> round_it(0.1, 5)
0.1
>>> round_it(0.1, 10)
0.1

Here are some more flaws:

>>> round_it(2.675, 3)
2.67

>>> x = round_it(32.702715, 4)
>>> print("Display value:", x)
Display value: 32.7
>>> print("True value:", "%.46f" % x)
True value: 32.7000000000000028421709430404007434844970703125

If you read this entire thread, you will find where I got those numbers from, and why they behave as they do :slight_smile:

Where does the 28421709430404007434844970703125 come from, given doubles only have 15.5 digits?

Remember that 0.125 has only one significant figure (one bit) in binary. Conversely, it can sometimes take a lot of decimal digits to represent a 53-bit mantissa exactly.
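A couple of quick illustrations of that mismatch between binary and decimal digit counts:

>>> (0.125).hex()                 # exactly one significant bit: 2**-3
'0x1.0000000000000p-3'
>>> from decimal import Decimal
>>> Decimal(2**-52)               # while one tiny power of two needs 37 decimal digits
Decimal('2.220446049250313080847263336181640625E-16')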

It’s because 46 digits were requested. The number can be represented in decimal as 32.7.

The decimal number 32.7 cannot be represented as a float. It simply doesn’t exist. When we type 32.7 as a literal, we get this number in hex:

>>> (32.7).hex()
'0x1.059999999999ap+5'

or as an IEEE 754 binary64 float:

sign bit = 0 (positive)
exponent = 5 (unbiased)
significand = 0x059999999999a

That number is exactly equal to this fraction:

>>> Fraction(32.7)
Fraction(2301057934609613, 70368744177664)

which is the exact decimal 32.7000000000000028421709430404007434844970703125
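You can check both of those from the interpreter:

>>> from decimal import Decimal
>>> 2301057934609613 / 2**46      # the fraction above, evaluated back as a float
32.7
>>> Decimal(32.7)                 # and the exact decimal expansion of that float
Decimal('32.7000000000000028421709430404007434844970703125')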

See Bruce Dawson’s blog for more detail about floating point precision.

Note that when he refers to “float”, he is referring to a C single-width float (32 bits), whereas Python uses a double-width 64 bit float. So the numbers he gives are about half what Python uses, but the principle is the same.


The decimal number 32.7 cannot be represented as a float.

I don’t think this is a clear way of explaining floating point representations. After all, you can ask for a floating point representation of 32.7, so it “can be represented as a float”.

I think it’s better to think of the (almost surjective) mapping f from real numbers to their corresponding floating point representations. That is, for every floating point representation a, there is a set of corresponding real numbers {x} for which f(x) = a.

When printing out some float a, Python does the logical thing of choosing the shortest x for which f(x) = a, unless you explicitly ask for extra digits.

When you ask for a floating-point representation of 32.7, what you’re asking for is the nearest representable number. And when Python stringifies that, it does its best to find the shortest number that would be represented by this value, which masks the issue by printing out the digits “32.7”.

You’re right that, for any given float value, there are (uncountably infinitely) many reals for which it is the closest. That’s just how rounding/quantization works.

If I ask you to identify the closest integer to 32.7, you would be able to do so. That doesn’t mean that 32.7 can be represented as an integer.

It’s just a question of language. I don’t think it clarifies the situation to say that “32.7 can’t be represented as a float”, which seems unnecessarily confusing. I think that process of finding a representative element is representation.

Although I agree that thinking of the “nearest representative” is also useful, not all of your intuitions will hold. For example, if you define some g, which maps a real number to the nearest representative float (your idea), then it’s not true that z = x + y \Rightarrow g(z) = g(x) + g(y). Whereas, the function f just formalizes representation without implying anything about the operations on representative elements.
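A concrete instance of that: take x = 0.1, y = 0.2, z = 0.3 as real numbers. Then g(z) is the float nearest 0.3, but g(x) + g(y), computed in float arithmetic, is a different, adjacent float:

>>> 0.1 + 0.2 == 0.3
False
>>> (0.1 + 0.2).hex(), (0.3).hex()
('0x1.3333333333334p-2', '0x1.3333333333333p-2')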

A better phrasing than “not practical” would be “may not be easy to implement”. It is highly practical to have a sig_round function, because of A) all the reasons listed well above and B) the fact that many users and programmers have kept asking for the functionality for years now. And it is A) math-related and B) conceptually entirely different from a simple, plain-vanilla round function.

I’m familiar with mantissas, with much (but not all) of how Python stores its numbers, and also with the Decimal data type, so the flaws in your argument are that 1) you seem to misjudge the understanding level of other programmers (including at least this one), 2) you do not seem to understand the widespread, pressing need for such a sig_round function, and 3) you do not seem to understand that it has a credible, almost trivial mathematical solution, hence a great match for a Python math-library solution: simply modify the mantissa structure for n significant digits, not decimal places.

Most even beginner programmers understand the basic mantissa-plus-exponent features of computer floating-point data types: e.g. that the ‘fraction’ 1/3e-12 does not have an ‘exact’ two-decimal floating-point equivalent, but that 0.33e-12 ‘can and does have’ a valid mathematical and floating-point representation, and that 0.330000000000000000000e-12 is easier to apply follow-on floating-point math to than 0.33333333333…e-12. And all of that is wholly exogenous to, and completely uninspired by, math.round.
Ref. also, entering 3.3E-13 into the converter referenced below (IEEE-754 single precision):

decimal entered: 3.3E-13
floating point: 3.2999999759718290359700176850310526788234710693359375E-13
error due to conversion: -2.402817096402998231496894732E-21
binary: 00101010101110011100011000000011
hexadecimal: 0x2ab9c603

Note how the error introduced by the computer representation is of order E-21, several orders of magnitude smaller than the original E-13 number.
Ref. IEEE-754 Floating Point Converter
https://www.h-schmidt.net/FloatConverter/IEEE754.html
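You can reproduce those numbers from Python with the struct module if you want to check them locally (a sketch; note this is the 32-bit format, not Python’s own 64-bit floats):

import struct

packed = struct.pack("<f", 3.3e-13)       # nearest 32-bit float to 3.3e-13
bits, = struct.unpack("<I", packed)       # its bit pattern as an unsigned integer
value, = struct.unpack("<f", packed)      # the stored value, widened back to a Python float
print(hex(bits))            # 0x2ab9c603
print(f"{value:.10e}")      # 3.2999999760e-13
print(3.3e-13 - value)      # conversion error, roughly 2.4e-21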

o_O

I’ve got a maths degree and I don’t know off the top of my head what “surjection” is; I had to look it up. And you think that’s plain simple language that ordinary folks will understand?

And in another post further down the thread:

Confusing or not, it is 100% accurate, precise and correct. There is no 64-bit binary float which represents the decimal number 32.7 exactly. And as you point out, there are an infinite number of other decimal numbers which are represented by the float 32.7. But 32.7 itself is not one of them.

It’s certainly less confusing than “there is a set of corresponding real numbers for which f(x) = a” which not only invents a custom syntax but also breaks transitivity for equality.

That is, there are an uncountably infinite number of pairs of real numbers x and y where there is some float a such that:

f(x) = a = f(y)
but x != y

To put it another way, under your definition, some number x equals the float a, and y also equals the float a, but x does not equal y.

I don’t think it is helpful to abuse equality in that way.

I’m going to push back against the notion that it is “confusing” to talk about some real numbers not being representable. Anyone who has used a calculator knows that there are numbers that you can’t enter into a calculator, because they are too big, or too small, or have too many digits.

The only weird thing here is that we’re not used to being unable to exactly represent such ordinary-looking numbers as 32.7.

That’s not “an abuse”. That’s the reality of the situation. And x doesn’t “equal” float a; a = f(x). Trying to think of representative floats as real numbers is going to create all kinds of misapprehensions about what should happen, in my opinion. The floating point numbers are a finite ring, not real numbers.

Yes, but 32.7 isn’t one of those numbers that is too big or small or has too many digits to be entered. You can enter 32.7 into Python by assigning it to a variable, and Python then finds the representation f(32.7).

I think if you said “you can’t represent 32.7 exactly”, then that would be clear and much less confusing than “you can’t represent it”.

But that’s exactly the point of confusion. People THINK that, since they have a source code representation for a number, it must be a valid floating-point number.

Let me ask you a different question. Is it possible to have a rational number (a fractions.Fraction or in the mathematical sense, whichever you like) for pi? It’s a transcendental number, and it’s been known for some time now that, no, you cannot have a finite rational that is precisely equal to pi. So is it possible to represent pi with a fraction? Well, actually, in a lot of extremely useful ways, it is - 22/7, 314/100, 355/113 are all very practical and useful values that we can pretend are equal to pi - but they’re not actually equal. You cannot represent this number as a ratio of integers. It’s not really a question of “represent exactly”. If you’re not representing it exactly, you’re not representing it.

(This may have implications in political and/or judicial contexts. I’ll let other people figure that out. “No Taxation Without Exact Rational Representation”?)

Finding a number that is close to the one you want, while not actually being it, is not representing it. It is approximating it, and can be a better choice for many reasons, but it doesn’t change the fact that 32.7 cannot be represented as a ratio of two numbers such that the denominator is a power of two and the numerator has fewer than 53 binary digits in it.
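For what it’s worth, the fractions module and float.as_integer_ratio make both halves of that argument concrete:

>>> from fractions import Fraction
>>> import math
>>> Fraction(math.pi)                          # the exact value of the float math.pi, not of pi
Fraction(884279719003555, 281474976710656)
>>> Fraction(math.pi).limit_denominator(113)   # a useful approximation, not a representation
Fraction(355, 113)
>>> (0.125).as_integer_ratio()                 # 1/8 really is representable exactly
(1, 8)
>>> (32.7).as_integer_ratio()                  # 32.7 is not; this is the nearest such ratio
(2301057934609613, 70368744177664)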

This interpretation is going back to your thinking of the floating point numbers as real numbers. And like I said, this will lead to incorrect conclusions.

You’re not “finding a number” at all. You’re finding a representative in a finite ring. The interpretation that you like is choosing to view that representative as an ordinary real number. I understand the convenience of thinking like that, but it leads to various bad conclusions to think of these representatives as numbers.

At this point we’re quibbling over language, but okay. Let me rephrase: “finding a floating-point value that is close to the number you want”. Would that make it clearer?

It’s broadly equivalent to finding a rational approximation for an irrational number; there are restrictions on the representable numbers, but the representations do correspond to specific numbers. I don’t think anyone would dispute that the rational value 1/8 is truly the same number as the real number represented as 0.125 or any other way you like, and so I put it to you that the floating-point value that represents this is ALSO equal to that number.

The fact still remains that there is no floating-point value which represents the number 32.7, despite it having a source code representation.

We just disagree about the meaning of the floating point number. I don’t think it’s necessary to assume that each floating point number a is equivalent in some way to some real number, let’s call it h(a). If you know something about the way a was produced, you may choose the most likely representative from the set \{x \mid f(x)=a\} differently.

For example, if you know that a was produced by a user inputting a value that is a multiple of 1/49, then you should choose a different representative than h(a).

And this is exactly what this question is about. Sometimes, you want to print out the representative with some number of significant figures—not h(a).
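To close the loop with a tiny sketch, those are the two options on the table: format the representative as a string with N significant figures, or round it and keep a float (which is then simply the nearest double to the rounded value):

>>> x = 2/3
>>> f"{x:.3g}"       # a string with 3 significant figures
'0.667'
>>> round(x, 3)      # still a float: the nearest double to 0.667
0.667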