Rounding to significant figures - feature request for math library

I assume the last step was meant to say “output”.

One thing that’s not immediately obvious in this is that it typically assumes the calculations are done in the same base as the rounding. That was always trivially the case when doing things by hand, because everything was in decimal. But now, with computers in the mix and calculations typically done in binary, it’s not quite as obvious how the implied “now switch from binary to decimal” (and back) steps affect this.

Regardless, I think that “rounding to a given number of significant (decimal) digits” is a reasonable operation to offer. Whether it’s as appropriate to given situations as people are suggesting is up for debate, but people can use anything wrongly, so to an extent that’s secondary[1].

Personally, I think this is related to the idea of decimal arithmetic, and as such the implementation I suggested above using the decimal module feels appropriate to me. But whether a float->float operation should go in the decimal module, or whether using the decimal module from the math module is appropriate, are complicated questions that I’m not comfortable answering. I would say that any implementation should be checked to ensure it gives the same answers as using the decimal module would.
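
For concreteness, a minimal sketch of that kind of Decimal-based approach (the name round_sigfigs and the exact details are illustrative, not necessarily the code referred to above):

from decimal import Context

def round_sigfigs(x, figures):
    # Convert the float exactly to Decimal; the context precision then
    # rounds it to `figures` significant decimal digits, and we go back to float.
    ctx = Context(prec=figures)
    return float(ctx.create_decimal_from_float(x))

For example, round_sigfigs(1234.567, 3) gives 1230.0, the closest float to the three-significant-digit decimal value.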

At this point, I think a PR adding the function is probably the next step, if anyone has sufficient interest to actually try to make this happen. It might still get rejected, but I don’t think further debate here is likely to significantly[2] change the likelihood of that. And just to be clear, I don’t think that adding a function like this needs a PEP.


  1. There’s a question of whether it would be an “attractive nuisance” but I’ll ignore that. ↩︎

  2. Pun intended :wink: ↩︎

It is a “lie for children”. The first step is there only to make your calculations simpler (it was written before every pupil had a calculator). Even with calculators, there was no point in writing 10 digits if only 2 or 3 were significant. And in the last step, you should take into account the error of the result, which depends on the errors and values of the input data. The number of significant digits is a rough approximation, useful if you perform calculations in your head. But a computer does not need this shortcut; it can do better and be more correct.

I said “mostly” because rounding intermediate results can be part of a specification. You do not discuss external specifications, you just implement them. In that case it may be more correct to use Decimal.

We’ve had this discussion about a gazillion times before. Floating point numbers aren’t real numbers. This is not a Python problem, or a bug in round; it is a fundamental fact about computer arithmetic using floating point numbers in every language, on every operating system, on all hardware.

Like chasing a bubble that is trapped under wallpaper, all you can do is move the problematic cases around, you can’t eliminate them. Decimal can remove this specific issue but only at the cost of adding even greater errors into other calculations.

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

If the least precise of your input data has (say) 3 sig figures, and your calculation ends up with 16 because that’s what floats have, only 3 of them are genuinely meaningful so you should round the output number to 3 sig figures, matching the input, at the very end.

I don’t think it is any different from the issues that round already faces. If you round to N sig figures, you should get the closest possible binary float to what you would get if the rounding were done in decimal.

That closest possible float is not always what you expect, but that’s floating point maths for you :slight_smile:
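
For example (the classic case from the floating-point FAQ, nothing specific to significant figures):

>>> round(2.675, 2)   # 2.675 is actually stored as 2.67499999999999982..., so it rounds down
2.67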

You can eliminate at least some of those surprises by working with Decimal instead, but at the cost of potentially making some errors in your calculations worse.

(Search for “wobble” in Goldberg for an explanation of why binary floats are preferred over decimal.)

How old do you think I am??? :slight_smile:

We had scientific calculators when I was in secondary school. They were available when I was in primary school, but we didn’t use them. By the time I finished secondary school and went to uni, the very first graphing calculators capable of performing algebra as well as arithmetic were just starting to be available.

Anyway, the process of dealing with input data of differing precision is complicated and I am sure different people have different opinions.

Yes, I like the idea of output rounding to a given number of significant digits.
I’ve been writing round_to_significant_digit() functions for a long time; here are some of my experiences:

  • When, after rounding, a float looks like an integer, it is usually better to forget about its original floatness, i.e. in most cases 12.3 (rounded to two significant digits) should be represented as 12, not 12.0.

  • sometimes a whole number of significant digits is not enough; it is nice to round a number to the closest multiple of 5 in the second digit, e.g. round 2412 to 2500, not 2000. This is what humans naturally do. I call this rounding to 1.5 significant digits (a sketch of this idea follows after the list).

  • sometimes it is nice to represent the rounding effect with different symbols, e.g. 12345 rounded to two significant digits can look like 12000, 12*** or 12---, depending on the application.

  • sometimes it makes sense to round a number to a certain precision, e.g. after the same procedure 1.41 would look like 1.4, 9.41 would look like 9 (and maybe 5.41 should look like 5.5)

So, as you can see, there are a lot of things you can do when rounding for representation. Maybe that is an argument for not implementing the function (it is so diverse, let’s write it separately each time). But I would still be happy to see a round_to_sd() function in standard Python, so I would not make that argument.
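
A rough sketch of the “1.5 significant digits” idea (one possible reading of the description above; the helper name is made up):

import math

def round_1p5_digits(x):
    # Round to the nearest half-step of the leading decimal digit,
    # e.g. 2412 -> 2500.0 (rounding step 500).
    if x == 0:
        return 0.0
    step = 5 * 10.0 ** (math.floor(math.log10(abs(x))) - 1)
    return round(x / step) * step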

Maybe it’s an argument for implementing a rounding library, though, which has all of the various options available.

2 Likes

To deal with significant figures, perhaps the best approach is not to round the float at all. Instead, create a class which stores both the float value and the number of significant figures, then overrides all the arithmetic operations to compute the result unchanged but also pass through the correct number of significant figures (the minimum of the two inputs?). Then formatting would produce the right number of significant digits. Obviously much less performant than raw floats, but it would produce the right answers. Perhaps one of the libraries that track units might have this functionality?
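
A rough sketch of such a wrapper (only multiplication shown; the class name and the minimum-of-the-inputs rule are just the assumptions from this paragraph):

class SigFloat:
    """A float paired with a number of significant figures, used when formatting."""
    def __init__(self, value, sig_figs):
        self.value = float(value)
        self.sig_figs = sig_figs
    def __mul__(self, other):
        # Compute the value as usual, but carry the smaller precision forward.
        if isinstance(other, SigFloat):
            return SigFloat(self.value * other.value, min(self.sig_figs, other.sig_figs))
        return SigFloat(self.value * float(other), self.sig_figs)
    def __str__(self):
        return format(self.value, f".{self.sig_figs}g")

print(SigFloat(2.0, 3) * SigFloat(3.14159, 6))   # prints 6.28 -- three significant figures survive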

Was I being that obtuse? Of course floats are not real numbers, nor are they decimal numbers – that’s been hashed out TO DEATH in this thread and many others.

But the fact remains that as much as everyone tells us “that’s not what you want”, and “what you want is not possible with floats” – it IS still useful. At least to me, and presumably to the very many people that ask about this, and post solutions on stack overflow, etc, etc, etc. Sure, some of those folks may misunderstand what they are getting, but many of us don’t misunderstand, we just still find it useful (and yes, I have one in my personal toolbox).

The reason I posted the round() example is to make the point that round() has exactly the same issues as a “round to significant figures” function would have, and it’s been in the stdlib since the beginning, and in the C math lib, and probably every other language’s (certainly most) math libs.

As a science guy, I’m still very surprised that it’s not a standard feature of math libraries.

Chris A: Please don’t make the argument that because you don’t use something, that means no one else should have a need for it. We all are solving different problems.

I think these are facts:

  • A properly written floating point round_to_significant_figures function has all the issues of, but no more than, most other float functions, specifically round(). It is no more nor less abusable or confusing than many other floating point operations.
  • It is asked for quite often, to the point where there are a lot of discussions / solutions, etc. on stackoverflow and the like, and indeed, on python forums like this one.
  • It is not so trivial that we should expect everyone to just write that one-liner themselves – there are a lot of not-great solutions out there, including in this very thread.
  • But it IS a one- (or few-) liner, so it’s a bit silly to put it on PyPI – leftpad, anyone?

All that leads me to think it would be a good candidate for the stdlib math module.

@pf_moore suggested that this is a small enough thing to not need a PEP (I agree), so maybe the next step is a PR, and let the core devs decide. I think that’s a fine idea, but I don’t know that I want to put the time in to write a PR (and the math lib is written in C, so it’s not that trivial) if it’s simply going to get rejected outright. So maybe a note to the dev topic (what DO we call the “sections” on Discourse?) to see if they are open to the idea is in order.

Final note: I actually agree that significant figures are mostly about “display” – at least in this case, where I’m not proposing a whole system for tracking precision. However, that doesn’t mean that you should have to do string formatting to get it. Python is very nice in that the float __str__ (and __repr__) will show you only as many decimal digits as it needs to get “as close as you can” [*] to the underlying binary value. So in practice, if you round a float to a certain number of decimal digits, and display it with anything that uses the string representation, then it will work as expected – and this is, in fact, quite useful:

  • you don’t need to use formatting, and remember the format specifier you need to simply print the number, or write it to a file, or …
  • You don’t need to do anything fancy to control how some other library will convert it to a string:

A recent example of mine: I need to write floating point values to JSON (yes, to later be displayed in a Web UI) – and I really didn’t want to put a full 15 digits into the JSON. It turns out that if I round to sig figs first, then the JSON lib simply does the right thing – so yes, it’s display (or conversion to strings), but no, I don’t want to have to do that string conversion myself:

In [68]: json.dumps(a)
Out[68]: '[1234.567890123451, 0.000234567890123451, 1.23456789012345e+30, 1.23456789012345e+24]'

ugly!

In [69]: json.dumps([sigfigs(x, 3) for x in a])
Out[69]: '[1230.0, 0.000235, 1.23e+30, 1.23e+24]'

Nice!

And oh so much easier than trying to control how the JSON is written.

[*] honestly, I’m not sure of the precise algorithm (I believe it produces the shortest decimal string that round-trips to the same float), but for practical purposes it works as expected.

5 Likes


I personally don’t see an issue with single-focus libraries on PyPI

single focus is great, but a single small function? I’m not so sure:

3 Likes

Oh, why did I not look earlier? Here it is:

and here’s the code in that package:

def signif(x, n):
    return round(x, n - int(math.floor(math.log10(abs(x)))) - 1)

Which, as I understand it, is not the best way to do it :frowning: – I think because of precision issues at the limits.

A stdlib function would get more review and testing.

1 Like

I’m not sure I agree with that comment, but even if I did, a stdlib function could only change to fix issues in a Python release. Whereas you could raise an issue with that library suggesting a better implementation, and have it available tomorrow…

This would fail at zero

Coming from a core dev, that’s pretty disheartening :wink: – I assume a few smart eyes on it would be all it needed.

And I agree with your point for more complex and untested libraries, but this is a one line function – it’s not going to see a lot of change in the future.

But this is the problem with PyPI – things on there aren’t vetted, so we can get not-great implementations. I could, and maybe will, post a PR on that project, but if the author isn’t amenable, or isn’t active, then someone would have to put yet another small package on PyPI, and hope that folks would find the right one.

Despite the overhead, there’s a lot to be said for the vetting and discoverability the stdlib provides.

2 Likes

Agreed, and I said a long way back that maybe someone should just write a PR.

My disagreement was simply with the idea that a CPython PR would somehow get more review and testing than a library on PyPI coupled with the discussion here. (Assuming the audience for the function was big enough in the first place…)

That’s not how the process works. We don’t take third party functions into the stdlib in order to improve them, we take them into the stdlib if they are generally useful and not too big a burden to maintain.

In this case, the function will fail on zeroes, NANs and INFs, and may or may not work correctly on other values. So you would have to fix those flaws first before even considering adding it to the stdlib.
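
Concretely, with the signif() implementation quoted earlier in the thread (and import math), those corner cases fail like this (each call raises instead of returning a rounded value):

import math

def signif(x, n):   # as quoted above
    return round(x, n - int(math.floor(math.log10(abs(x)))) - 1)

signif(0.0, 3)            # ValueError: math domain error (math.log10(0.0))
signif(float("nan"), 3)   # ValueError: cannot convert float NaN to integer (math.floor(nan))
signif(float("inf"), 3)   # OverflowError: cannot convert float infinity to integer (math.floor(inf))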

But that’s putting the cart ahead of the horse. First we need agreement that this feature is important enough to add it to the stdlib.

I’m sorry if this seems too damn conservative and stick-in-the-mud, but once we add something, we can’t easily change our minds even if it turns out to be useless, harmful, hard to use or difficult to maintain. Backwards compatibility effectively requires us to keep the function even if nobody ever uses it (str.swapcase()) unless it becomes a serious problem.

And even then there’s a long painful process of removing it.

This may help you see where we’re coming from.

I think it’s completely clear how a hypothetical math.round_to_figures [1] would be expected to behave on float inputs (namely, exactly like @storchaka’s format-based one-liner, which does the right thing on all corner cases). And it’s not hard to implement, either.
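
(That one-liner isn’t quoted in this excerpt; presumably it is something along the lines of the sketch below, where the ‘g’ format and float() between them cope with zeros, infinities and NaNs.)

def round_to_figures(x, figures):
    # Round a float to `figures` significant decimal digits by formatting
    # to a decimal string and parsing back.
    return float(format(x, f".{figures}g"))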

But I have a question for the proponents: would math.round_to_figures be polymorphic? That is, would it be expected to return (correctly rounded) int / Decimal / Fraction outputs for int / Decimal / Fraction inputs? And would user-defined classes be able to provide their own implementation for round_to_figures?

If yes, what mechanism do you propose for the polymorphism? For example, round achieves this through the special dunder __round__ method.

If no (i.e., round_to_figures should just be float to float), then that makes implementation straightforward, but it makes the functionality a little odd. The int, Decimal and Fraction types can all represent the result of a round-to-figures operation exactly in the original format, while float cannot. So we’d be implementing the operation for the one numeric type that can’t always represent the output exactly, and ignoring the three that can, which seems a little perverse. We’d also end up with hard-to-explain corner cases where a Decimal or int input gets converted to a float before round_to_figures is applied, and ends up not rounding correctly because that conversion changed the value.

The underlying problem here is that the binary floating-point type float isn’t a good target format for a decimal round-to-significant-figures operation. Either Decimal or str would be better targets, since both can not only represent the output exactly, but also keep information about trailing zeros. And for both Decimal and str, the desired functionality already exists (see @storchaka and @pf_moore’s one-liners up-thread). And then for float it’s only a float-call away from either the str-based or Decimal-based solution.


  1. exact name to be bike-shedded ↩︎

1 Like

I like your idea of supporting int, Decimal, and Fraction and agree with your reasoning. For polymorphism here, I would propose decorating round_to_figures with functools.singledispatch, which has the advantage of supporting user-defined types without adding any dunder method.
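
For illustration, a minimal sketch of what the singledispatch version could look like (the float and Decimal bodies are just the string- and context-based one-liners discussed up-thread; all names are placeholders):

import functools
from decimal import Decimal, Context

@functools.singledispatch
def round_to_figures(x, figures):
    raise TypeError(f"no significant-figures rounding for {type(x).__name__}")

@round_to_figures.register
def _(x: float, figures) -> float:
    return float(format(x, f".{figures}g"))

@round_to_figures.register
def _(x: Decimal, figures) -> Decimal:
    # Re-create the value in a context whose precision is the requested
    # number of significant digits; the context does the rounding.
    return Context(prec=figures).create_decimal(x)

# A user-defined numeric type could then register its own implementation:
# round_to_figures.register(MyNumber, my_round_to_figures)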

No one was suggesting taking that function into the stdlib – it was the other way around: someone suggested adding something new to the stdlib (and I’m a proponent of that); I then discovered that there is such a thing already on PyPI, and it turns out to be a not-great implementation. To me, that’s a reminder of some of the advantages to the community of having things in the stdlib – what the heck happened to batteries included?

I know – and so does everyone else who’s been around python-ideas for more than ten minutes – that there is a lot of overhead to adding something to the stdlib; you don’t have to keep telling us that over and over again.

@mdickinson
Thanks! Those are really great points. My thoughts:

The OP said “for the math library” – in that case, only working on floats is OK. When we added math.isclose(), my original prototype supported Decimal, and I think any other type that supported the math operations – but since it was going in the math lib, and was to be implemented in C, we decided to stick with just float.

As you point out, that’s a different story from this one, as one of the main purposes of isclose() is to help with the vagaries of floats, whereas significant figure rounding is perhaps better suited to other types – certainly Decimal.

So yes, I would love to see it be polymorphic, and support other types – but I don’t know that it has to go full-on and implement a new __dunder__ – after all, how many custom types are there that can’t reasonably be converted to one of the stdlib ones – notably Decimal?

In fact, supporting only float and Decimal [*] would go a long way.

That being said, where would it live? I still think there are issues with putting polymorphic functions in the math module. Maybe decimal? In fact, perhaps we could get quite far by converting the input to Decimal, doing the rounding, and converting back to whatever the input type was.
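
A sketch of that convert-and-convert-back idea (purely illustrative; the Fraction and int branches are included only to show the pattern, and the name is made up):

from decimal import Context, Decimal
from fractions import Fraction

def round_to_figures(x, figures):
    ctx = Context(prec=figures)       # precision = number of significant digits
    if isinstance(x, float):
        d = ctx.create_decimal_from_float(x)
    elif isinstance(x, Fraction):
        d = ctx.divide(Decimal(x.numerator), Decimal(x.denominator))
    else:
        d = ctx.create_decimal(x)     # Decimal or int
    if isinstance(x, Decimal):
        return d
    if isinstance(x, Fraction):
        return Fraction(d)
    return type(x)(d)                 # back to float (or int)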

[*] I left int out because, for practical purposes, the reason you want sigfig rounding rather than plain round() is that you don’t know, when writing the code, the order of magnitude of the number – and in that case you are probably using a floating point type, i.e. float or Decimal.

1 Like