Trying to understand rounding - in python -

Hello @all, I’m new here, so please be tolerant if I break conventions I don’t know about.

I’m working on binary floating-point imprecision, and Steven D’Aprano pointed out to me that Python does a good job, from a decimal point of view, of rounding:

>>> round(0.30000000000000004, 16) 
0.3 

https://mail.gnome.org/archives/gnumeric-list/2021-July/msg00019.html
(There, standard FP algorithms fail and return 0.3000000000000001, because scaling 0.30000000000000004 up by 10^16 gives 3000000000000000.5.)

Searching the forum, I found that Python can also show the usual FP-math imprecision:

>>> round(3.805,2)
3.81

>>> round(4.805,2)
4.8 

It would help my project if I could understand how Python rounds, especially in the (0.30000000000000004, 16) case.

  • prettified display?
  • calculating with decimals?
  • ‘string math’?
  • ‘ties to even’ towards the decimal digit rather than the binary bit?
  • a trickier algorithm than other programs use?

In addition to an explanation, a pointer to the code where ‘round’ is computed in Python would be helpful.

TIA for any help.

In terms of how it works: It’s up to the implementation. In the reference implementation, it’s implemented in C, and is fairly complicated.
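As a rough illustration of the difference between a naive scale-and-round and what `round()` returns, here is a minimal sketch. The helper `naive_round_half_up` is hypothetical (it is not how any particular program is known to round), and the agreement with `format()` reflects CPython behaviour rather than a language guarantee.

```python
import math

def naive_round_half_up(x: float, ndigits: int) -> float:
    """Hypothetical helper: scale, add a half, chop, scale back."""
    scale = float(10 ** ndigits)
    return math.floor(x * scale + 0.5) / scale

x = 0.30000000000000004

# The scaled product lands exactly on a tie (…000.5), so
# add-a-half-and-chop goes upward and the result is no longer 0.3.
print(x * 1e16)
print(naive_round_half_up(x, 16))

# round() works from the exact stored value instead of a scaled product.
print(round(x, 16))

# Fixed-point formatting rounds the same exact value, so in CPython
# the two normally agree.
for value, nd in [(2.675, 2), (3.805, 2), (4.805, 2), (x, 16)]:
    print(value, round(value, nd), float(format(value, f".{nd}f")))
```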

In terms of the semantics (i.e., what the result should be for certain inputs): yes, it uses banker’s rounding, as described in the documentation, and rounds towards a decimal digit (since that’s how the interface is specified). However, also as described in the documentation, the results of banker’s rounding may not be as expected:

>>> round(2.675, 2)
2.67

This happens because 2.675 isn’t actually exactly halfway between 2.67 and 2.68:

>>> def quantize(f):
...     """rationalize the denominator of a float value"""
...     num, denom = f.as_integer_ratio()
...     return (num << 53) // denom
... 
>>> (quantize(2.67) + quantize(2.68)) // 2
24094258006432154
>>> quantize(2.675)
24094258006432152
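Another way to see the same thing, as a small sketch: `Decimal(float)` converts the stored binary value exactly, so it shows which side of the halfway point the float really sits on. The variable names are only illustrative.

```python
from decimal import Decimal

stored = Decimal(2.675)      # exact value of the binary float
halfway = Decimal("2.675")   # the true decimal midpoint

print(stored)                # a long expansion starting 2.67499999...
print(stored < halfway)      # True: the float lies just below the tie
print(round(2.675, 2))       # so it rounds down, to 2.67
```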

thank you @kknechtel, competent and fast,

Regarding ‘code’: I see ‘buf’ and ‘dtoa’ there, thus ‘string math’, and I see no scaling / multiplication beforehand, so the ‘4’ in 0.30000000000000004 is preserved, but so are the overshoot in 3.805… ( harmless ) and the undershoot in 4.805… ( harmful ), from a decimal POV. Is Python’s rounding not intended for ‘mass data’, or is your dtoa really fast?

Regarding ‘expected’: the attempt with string math, and the regret about the discrepancies expressed in the documentation, suggest that Python strives for decimally correct results? If ( and only if; I’ve met too many developers who love their bin-FP-math deviations in a way that blocks decimal ideas ) you or Python would like results with better ‘decimal compatibility’: perform dtoa to a qualified ‘shortest round-tripping string’, and then manipulate that. It would free you / Python from the random over- or undershoot in the bin-FP representations. Possibly ‘ryu’ can achieve something similar in one step; I didn’t try it for ‘fixed’.

Regarding ‘banker’s rounding’: whichever standard or interface defines it, I consider it nonsense. We have ‘math’, pure math!, and as variants ‘statistical math’, ‘bankers math’, ‘telco math’, ‘signal processing math’ … It’s easy and straightforward to perform the variants if you have ‘pure’ at hand, and you can produce much better results than with ‘ties to even’. In contrast, it’s nearly impossible to perform pure math, which users expect, if an underlying standard restricts you to ‘bankers’. But that’s what we are facing today. A horrible mistake, one of the biggest problems of IEEE; I can’t eat as much as I would have to spew about it! Besides, I seem to remember that IEEE defines ‘ties to even’ for binary towards bits, not for decimal towards digits!

have just realized that python indeed performs ‘ties to even’ even on the
‘decimal side’, and towards integers, e.g.
‘round( 2.5 )’ → 2,
‘round( 3.5 )’ → 4,
‘round( 4.5 )’ → 4,
can someone point to the rationale behind that? the expected benefit?
IMHO ‘the standard’ IEEE 754 proposes ‘ties to even’ only for rounding
of the binary significand in decimal → binary conversion, and that’s 0 for all
small integers, thus also for 3 and 5.
As far as I see most applications and also programming languages try to
get near(er) to human decimal expectations, e.g. spreadsheets and C-code
round(), why does python move opposite?
I see three different behaviors:
‘round( 0.2 + 0.1, 16 )’ → 0.3 - successful efforts to be better human compatible than other programs,
‘round( 0.285,2 )’ → 0.28 - problems in that effort reg. imprecise binary representation,
‘round( 2.5 )’ → 2 - arbitrary decision to go away from human common standards.
( I know that ‘Decimal’ does better, but am looking for better human compatibility with floats too. )

754 specifies that nearest/even rounding must apply by default to all operations. Not just conversion to string, but also addition, subtraction, multiplication, division, and square root. Its successor IEEE-854 applies the same rules to decimal floating-point systems too (as implemented by Python’s decimal module).

“Most applications and also programming languages” are just loath to change what they started doing decades ago, when “add a half and chop” was almost universally used to round. That was easier to implement. Now they can’t change without becoming backward-incompatible.

“Banker’s rounding” was pioneered by accounting applications, where rounding errors in pennies applied to millions of transactions can add up to significant dollars. In long chains of operations, “add a half and chop” is systematically biased to delivering answers too large. “Banker’s rounding” is not.
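A minimal sketch of that bias, using the decimal module so the inputs are exact tenths. The range 50.0 to 100.0 is only an illustrative choice with evenly spread ties; other input distributions can behave differently.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Exact tenths from 50.0 to 100.0 (an illustrative range).
values = [Decimal(k) / 10 for k in range(500, 1001)]
one = Decimal(1)

exact = sum(values)
half_up = sum(v.quantize(one, rounding=ROUND_HALF_UP) for v in values)
half_even = sum(v.quantize(one, rounding=ROUND_HALF_EVEN) for v in values)

print(exact)      # equals 37575
print(half_up)    # 37600: each of the 50 ties added an extra 0.5
print(half_even)  # 37575: ties alternate up and down and cancel
```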

See this for stark evidence of how much more numerically well-behaved nearest/even rounding is than add-a-half-and-chop.

It’s not about trying to cater to naïve human expectations, but instead trying to reduce sources of costly numeric error. Things humans didn’t have much problem with when doing arithmetic “by hand” at a snail’s pace can be sources of catastrophic errors when compounded billions of times by the enormously greater speed with which a computer can do them.

The decimal module doesn’t suffer representation error for decimal values, so here’s the same example under that:

>>> import decimal
>>> from decimal import Decimal as D
>>> cents = D("0.01")
>>> x = D("0.285")
>>> x
Decimal('0.285')
>>> x.quantize(cents)  # does nearest/even by default
Decimal('0.28')
>>> x.quantize(cents, rounding=decimal.ROUND_HALF_UP) # but can be forced to half/up
Decimal('0.29')
>>> round(x, 2) # simpler way to spell `.quantize(cents)` for decimal values
Decimal('0.28')

By the way, Python isn’t immune to backward-compatibility issues either. At the start, and until development on it ended, Python 2’s round() did do add-a-half-and-chop rounding.

The introduction of Python 3 allowed for incompatible changes, and that’s when round() switched to nearest/even rounding. Here’s a good blog post about that.

Hello @Tim Peters, thanks for your help, and thank
you for your patience with alternative perceptions.

Let me start with citing Prof. Kahan from about 2006:
‘Redesigning computers costs less than retraining
people, so it behooves us to adapt computers to the
way people have evolved rather than try to adapt
people to the way computers have evolved’.

( and a hint on a common misunderstanding: IEEE is prepared
for radices other than 2 and 10 and thus calls the bits of the binary
significand ‘digits’ in some places )

( and the insight that the space here is too small to discuss stuff
which qualifies for a scientific paper )

From a purist’s view there are at least three objections against wide use of
‘ties to even’:
A.) It is a deceptive, very limited improvement, dependent on random input,
and neither working well with e.g. ‘= 1.5 + 1.5 + 1.5 …’ or similar, nor with
‘= 2.5 + 2.5 + 2.5 …’ or similar with dominating ‘odd.5’ or ‘even.5’ input.
B.) Other methods, e.g. ‘count number of exact half roundings and subtract
half of that in the end’ would work much better.
C.) Referential integrity and reversibility are harmed by using different wide
and overlapping windows for rounding.

Considering that, and the historically grown common use ( where I suspect
more telephone companies than banks as the driving force ), and all the
problems every newcomer has in learning the idiosyncrasies of IEEE math,
I’d consider it meaningful to move back from ties to even instead of
widening its scope.

IEEE 754 is somewhat ‘overinterpreted’ by lots of people; the rounding defined
there in 4.3 has a limited scope: ‘Rounding takes a number regarded as
infinitely precise and, if necessary, modifies it to fit in the destination’s format’.
Observe: ‘if necessary’ and ‘fit in the destination’s format’. Thus it is meant for cases
where datatype limitations require rounding, not to change math in general.
For ‘human decimal’ format all values are representable and none needs
to be modified.

754 specifies that nearest/even rounding must apply by default to all operations.

Where did you read that? It’s quite common to think that way, but IMHO not ‘right’
in terms of ‘the standard’. Examining ‘http://www.dsc.ufcg.edu.br/~cnum/modulos/Modulo2/IEEE754_2008.pdf’ I find:
The rounding modes defined in 4.3.1, and in 4.3.3 say ‘An implementation of this
standard shall provide roundTiesToEven and the …’ - provide, not apply.
It’s also proposed as default, but not! in a way prescribing to apply. What’s used
in programming or use of programs is choice of programmers and, where
possible, users. And the target there is mostly to produce ‘human compatible’
results, not irritations.

In 4.3.2 I read: ‘Three other user-selectable …’, again not the standard dictating
a rounding, but offering different modes from which the user / programmer may select for
their needs.

Likewise for rounding to integer, which is overinterpreted / wrongly interpreted by
many professionals: selecting the value with ‘even last bit in binary’ may make sense
for big values above e.g. 2^53 ( doubles ), where ‘hardware rounding’ hits before
the programmer’s / user’s intention; I don’t see it as appropriate for smaller values. And
it is not prescribed in ‘the standard’.

What I see in the 2008 version of the standard is the demand of e.g. ‘shall provide’
‘xxxxxxxxx roundToIntegralTiesToAway’ for almost all operations,
which I consider widely neglected by hard and software implementations.

Or does python have?

Although it’s ‘mainstream’, ties-to-even rounding induces a wide field of
problems, with no problem really solved.

See this for stark evidence

one example is a hint, not proof for common benefit.
adding 500 values, 50.0, 50.1, … 100 → correct result 37575,
result with rounded ties to even → 37575,
result with ties away → 37600,
minus 500 ( number of values ) * 0.1 ( probability of ‘n.5’ ) * 0.5
( ‘unjustified’ rounding up ) → 37575.
Such ‘strong evidence’ fools programmers and users into thinking
IEEE math is somewhat stable, and then they fall into the pit of
unrandom distribution. ‘count cases’ would be much more stable,
and ties away would affect attempts for exact mathematics much less.

It is a nightmare if ‘computer mathematics’ and a ‘standard’ would
dissolve into patchwork carpets like → … round ties behaviour #8750
but it has happened …

To start with, quoting Kahan is ironic :wink:. He was the driving force behind 754, and a tireless advocate for it mandating nearest/even as the default rounding mode. For example, here’s one of his papers that has no real point other than to beat to death how nearest/even produces “the right” result in a toy program where almost any other approach gets it wrong. It’s not a “toy” paper, though - he put real time and brainpower into writing it.

That’s just wrong; e.g.,

>>> from decimal import Decimal as D
>>> D(1) / 3
Decimal('0.3333333333333333333333333333')

The infinitely precise result has an infinitely long decimal representation. decimal correctly rounds it to the current context precision (28).

>>> import decimal
>>> decimal.getcontext().prec
28

I was intimately involved at the time the first 754 standard was released, back in the mid 1980s. I’m quoting from 754-1985:

Section 4.1: “An implementation of this standard shall provide round to nearest as the default rounding mode.”

In the language of standards, “shall” has a precise technical meaning, best paraphrased as “mandatory”. Not a suggestion, not an encouragement, not “a nice to have”, but a non-negotiable requirement. In contrast, e.g., “should” is just an encouragement.

You’re quoting instead from the 754-2008 revision, made over 2 decades later, and about 4 times the length. But you’re misstating its similar clause:

Section 4.3.3 “The roundTiesToEven rounding-direction attribute shall be the default rounding-direction attribute for results in binary formats. The default rounding-direction attribute for results in decimal formats is language defined, but should be roundTiesToEven.”

There’s no similar text in the original 754 because the original didn’t say one syllable about non-binary arithmetic. Note that “shall” again means there is no possible dispute about the required nearest/even default rounding for binary formats. The “should” for decimal formats is unfortunate - that one is just a suggestion. However, note that it’s suggesting the rounding mode you hate even for decimal arithmetic.

But it doesn’t matter, because 754-2008 is largely ignored in favor of the still later IEEE-854 standard, which also incorporates IBM’s General Decimal Arithmetic standard.

That mandates some different “contexts” for decimal arithmetic. The “basic” context requires round-half-up rounding, 9 digits of precision, and enables traps on things like overflow and invalid operation. The “extended” contexts are for “serious” use, and require round-half-even rounding, disable all traps by default, and come in a number of variants depending on how much precision you want.
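The settings of the two predefined contexts can be inspected directly. This is a small sketch using only the decimal module's documented `BasicContext` and `ExtendedContext` objects; `localcontext()` is used so the current thread's context is left untouched.

```python
import decimal
from decimal import Decimal

basic = decimal.BasicContext
extended = decimal.ExtendedContext

print(basic.prec, basic.rounding)        # 9 ROUND_HALF_UP
print(extended.prec, extended.rounding)  # 9 ROUND_HALF_EVEN

# Traps: the basic context raises on overflow, the extended one does not.
print(basic.traps[decimal.Overflow], extended.traps[decimal.Overflow])

# localcontext() works on a copy of the context it is given.
x = Decimal("0.285")
with decimal.localcontext(basic):
    up = x.quantize(Decimal("0.01"))
with decimal.localcontext(extended):
    even = x.quantize(Decimal("0.01"))
print(up, even)  # 0.29 0.28
```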

Python’s decimal module implements all of that, and, far more, imposes no meaningful limit on how much precision you can ask for.

>>> import decimal
>>> decimal.getcontext().prec = decimal.MAX_PREC
>>> decimal.Decimal(1) / 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> decimal.getcontext().prec
999999999999999999

I asked for quadrillions of digits of precision, which is why I ran out of RAM for computing 1/3.

For the rest, I don’t see a real point in continuing. You dismiss the SO post I linked to because it’s not “a proof”. True enough! But you have no proof either. Common sense and the vast bulk of empirical evidence aren’t that hard to find on the web. The fellow who wrote the SO post is a long-time valued contributor to Python, NumPy, and SciPy, and has a doctorate in number theory from Harvard. He’s acutely aware of the difference between contriving data to “prove a point” and presenting simple data typical of a point he’s trying to make. His post included the disclaimer “under suitable assumptions on the distribution of the inputs”, but he didn’t belabor it - the post was long enough without it.

If you think his point is not applicable to typical data, how about you find a study demonstrating that half-up delivers better results than nearest-even on any “real world” data set? Not piles of words, but empirical, non-contrived, evidence. I doubt one exists.

In the meantime, sounds like you’d be happiest with decimal’s “basic” context (as noted above, it uses half-up rounding by default). So use it.

>>> import decimal
>>> decimal.setcontext(decimal.BasicContext)
>>> round(decimal.Decimal("0.285"), 2)
Decimal('0.29')

:slight_smile: @Tim Peters,

again many thanks for your effort, and be assured:
I don’t want to ignore any of your hints, if it happens then only reg. space.

The main reason I’m arguing here: I’ve started to like Python, esp. ‘decimal’;
I tested it and found it surprisingly fast, good work, and I have buried my
prejudices against an interpreted language.

Reg. Prof. Kahan and his papers,
A.) his IQ is clearly above mine,
B.) what I cited was not ironically but to point out the importance
of the difficulty to teach IEEE idiosyncrasies to thousands of new
people each day.
C.) I’m unsure about his mood in the morning:
Ca.) a smile ‘fooled them 38 years ago and they didn’t yet notice’, or
Cb.) suicidal thoughts like ‘I have blown half the human brain
power by destroying their mathematics’?

Reg. conversion ‘all values’ was meant as ‘all representable e.g. doubles’.

Reg. IEEE: I see a slow tendency to turn away from the ‘old’ idea of
‘binary and tiestoeven’: via the additional 854 standard, the combining of 854 and 754
in 2008, and now the addition of tiestozero in 2019 to help towards human
compatibility in complex calculations on modern machines … and its
being slow somehow matches your statement on the influence of
backward compatibility requirements.

‘shall … as default’ acceptable reg. backward compatibility, but doesn’t
mandate application in programs.

I won’t complain about any if ‘ties away’ would be available as alternative,
but don’t yet see that ( except in decimal ).

854 Decimal and following I see as important improvement, but alas neglected
in allday use / programming reg. ‘compatibility’ and a spinace bias as ‘slow’.
Still curious how pythons decimal managed to provide much more power.

‘Proof’: I didn’t explain at length, but ‘count occurrences of n.5 roundings and
calculate a correction’ is better at avoiding n.5 bias, independent of the
randomness of the input. It does less harm to ‘accurate math’, although it does not solve
‘non-randomness’ in the input aside from the n.5 values. ( And then we advance to
Kahan summation )

I did mention two situations where tiestoeven is weak, and demonstrated
a calculated improvement for the example on SO, and can show a little more
wordy, ‘avoiding n.5 bias in summing big data’:
pseudocode:
loop: process next summand applying ties away rounding,
if rounding is n.5 counter +=1, end loop,
subtract counter times 0.5
( see an attempt to code in python at bottom of post )

for the SO example it would count 49 cases, subtract 24.5 from the
deviating result, and stay within 0.5 of the unrounded sum.

In contrast, summing 501 times 1.5 rounded with ties to even would
result in 1002, 250.5 above the unrounded sum of 751.5; likewise, summing
501 times 2.5 rounded with ties to even will also produce 1002, 250.5
below the unrounded sum of 1252.5.

the same with ties away and counting:
1.5 case: result 1002 minus 501 ( counter ) times 0.5 → 751.5, exact,
2.5 case: result 1503 minus 501 ( counter ) times 0.5 → 1252.5, exact.
( it’s not that ties away heals something, it’s just that there is a much better
solution which doesn’t require ties to even )
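The figures for the two cases above can be checked with a short sketch. It uses the decimal module so the inputs are exact, assumes positive values only, and the function name `sums` is made up for illustration; `ROUND_HALF_UP` stands in for ‘ties away’ here.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

ONE = Decimal(1)
HALF = Decimal("0.5")

def sums(values):
    """Return (exact, ties-to-even, ties-away, corrected ties-away) sums."""
    exact = even = away = Decimal(0)
    ties = 0
    for v in values:
        exact += v
        even += v.quantize(ONE, rounding=ROUND_HALF_EVEN)
        rounded = v.quantize(ONE, rounding=ROUND_HALF_UP)
        away += rounded
        if rounded - v == HALF:   # an exact n.5 was rounded up
            ties += 1
    return exact, even, away, away - ties * HALF

print(sums([Decimal("1.5")] * 501))  # 751.5, 1002, 1002, 751.5
print(sums([Decimal("2.5")] * 501))  # 1252.5, 1002, 1503, 1252.5
```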

Python’s decimal capabilities: as said, impressive; I haven’t dug into how
it works internally, and would like a hint / short description.

We are dealing with limited math; we can’t achieve full coverage of reals or fractions.
IMHO the challenge is to come as near to human expectations as possible.

As a conspiracy theorist, I could speculate that the phone companies
had paid to certify ‘ties to even’ as ‘best possible’, and then made a
brutal profit with masses of ‘odd.5’ bills, but that’s as unlikely as some
light bulb companies conspiring to limit them to 1000 hours of use …

Not explained in your posts: Am I right that ‘754’ demands ‘tiesawayfromzero’
variants since 2008, and that that’s not provided by most languages / systems?

Decimal and python: still learning about differences between ‘import decimal’
and ‘from decimal import *’, need of prefixing ‘decimal.’ or not, setting ‘contexts’,
‘quantize’ vs. ‘round’ and the like, and e.g. your example working, while

from decimal import * 
round(Decimal("0.285"), 2) 
-> Decimal('0.28') 

fools the uninformed … but making progress …

( examples based on positive numbers, generalizing to negative shouldn’t be
a big challenge )

add the following after the SO example you cited:

sum1 = 0 
for i in range( 0, 501 ): 
    sum1 = sum1 + test_values[i] 
print( sum1 )

prints the real sum,

sum2 = 0 
for i in range( 0, 501 ): 
    sum2 = sum2 + round_ties_to_even( test_values[i] ) 
print( sum2 )

prints ties to even rounded summation,

sum3 = 0 
for i in range( 0, 501 ): 
    sum3 = sum3 + round_ties_away_from_zero( test_values[i] ) 
print( sum3 )

prints ‘ties away’ rounded summation,

sum4 = 0 
counter = 0 
for i in range( 0, 501 ): 
    sum4 = sum4 + round_ties_away_from_zero( test_values[i] ) 
    if round_ties_away_from_zero( test_values[i] ) - test_values[i] == 0.5: 
        counter = counter + 1 
print( "biased sum4:     ", sum4 ) 
print( "corrected sum4:  ", sum4 - counter * 0.5 )

prints biased and corrected ‘ties away’ rounded summation.
The difference? the solution with sum4 is ‘stable’ also for datasets
with unbalanced odd.5 / even.5 values.

[edit]

The fellow who wrote the SO post is a long-time valued contributor to Python, NumPy, and SciPy

In no way did I want to diminish Mark Dickinson’s merits, he is the one who helped me with my first steps in python. He is an enthusiastic and friendly contributor just like you, please excuse me if I seemed negative in any way, it was not my intention, I had not even looked at who the SO post was from.

An attempt for a conclusion we can possibly all accept … People need / expect ‘human compatible mathematics’. Computers should try to match that and where not quite possible keep deviations as small as possible. IEEE 754 does not quite achieve this, both ‘ties away’ and ‘to even’ are useful as far as they are used for human compatible results. ‘doubles’ come closer to the goal on average than floats, long doubles closer than doubles, float128’s closer than long doubles, _Decimal32 closer than ??? _Decimal64 closer than _Decimal32, _Decimal128 closer than _Decimal64, and python.decimal is still above that.
[/edit]

This keeps expanding to consume ever-increasing amounts of time, which I can’t give to it. So I’ll reply, in pieces, when I can. I may never get to everything. Here just some easy ones:

“Shall” in standards has nothing directly to do with “backward compatibility”. Indeed, very few systems conformed to IEEE-754 when it was introduced. Apple’s SANE (Standard Apple Numerics Environment) may have been the only conformant implementation. 754 was determined to break “backward compatibility” when it was introduced, demanding that conforming systems support far “better” floating-point behavior than the status quo at the time.

A vendor cannot legally claim a conforming implementation if any “shall” clause isn’t satisfied. Period.

Yes, of course applications can change the defaults, and the standards here explicitly allow for that. But you aren’t most users :wink:. Neither am I. Most users never change any defaults.

754-2008 doesn’t require that a conforming implementation support decimal at all (or binary!). A conforming implementation must implement at least one, and may implement both. roundTiesToAway must be supplied for decimal floats (if decimal is supported at all), but is not required for binary floats. roundTiesToEven must be the default rounding mode for binary, but no default rounding mode is specified for decimal - roundTiesToEven is, however, recommended (“should”) for decimal.

Python’s decimal module intends to conform to the relevant standard for decimal floats. Python itself does not claim to conform to the relevant standards for binary floats (indeed, Python doesn’t even pretend to offer a way to change the rounding mode for binary float arithmetic - and never will before the C language offers such a way Python can use, and all relevant C compilers catch up to that C standard).

Conforming decimal implementations are more common than binary ones, because they’re implemented almost entirely in software (so, e.g., no need to come up with portable ways to change hardware flags).

decimal is a major piece of work, written entirely in C by an expert numerical programmer (Stefan Krah). It’s actually a Python wrapper around Stefan’s libmpdec C/C++ libraries.

It’s slower for most uses than binary floats, though, because there’s no direct HW support for decimal arithmetic. Internally, decimal works in base 10**19, storing blocks of 19 decimal digits in arrays of unsigned 64-bit native C ints. So its speed is mostly constrained by how fast HW can do 64-bit integer arithmetic, plus a large maze of slow-in-software conditional branches to set all the required status flags and to check for whether various traps have been enabled.

For floats with many (over thousands of) decimal digits, decimal also implements very much faster “advanced” algorithms for multiplication and division (search for “Number Theoretic Transform”). Indeed, when working on problems with large integers, I sometimes use decimal with high precision instead of bothering with GMP. An unbeatable advantage of decimal in such cases can be that converting a decimal string to/from a decimal object takes time linear in the number of decimal digits. Very fast. Converting to/from a GMP integer can take much longer. Converting to/from a Python integer very much longer (until very recently, Python’s int ↔ decimal string conversions took time quadratic in the number of digits).

Absolutely everything gets ever more complicated if you keep digging :laughing: .

Except it’s so different an approach it’s simply irrelevant to the question of which rounding mode can be expected to do better “on average”. You’re keeping a separate counter of how many times “exactly half” had to be rounded away. No hardware on the planet supplies that. But, sure, given that, add-half-up could be reliably corrected. So could a hypothetical “add-half-down” rounding mode.

But if you’re allowing the universe of alternative algorithms, use Python’s math.fsum(xs). That adds everything in xs “as if” to infinite precision, and rounds back to machine precision just once, at the end. Even “in theory” it’s impossible to do better than that, no matter what.
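A tiny sketch of what that means in practice. The input list is just an illustrative choice, and an explicit loop is used for the naive sum because the built-in `sum()` may itself use a more accurate algorithm in newer Python versions.

```python
import math

xs = [0.1] * 10

naive = 0.0
for x in xs:          # plain left-to-right float addition
    naive += x

print(naive)          # slightly below 1.0
print(math.fsum(xs))  # 1.0
```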

Before he created Python, Guido (van Rossum) worked on the implementation of the ABC programming language. That was intended to be as intuitive as possible for new programmers.

It didn’t have floats at all. Instead it had unbounded-precision rationals. A literal like 3.11 was actually stored as the rational 311/100. Which worked great, until programs got more ambitious: as has been relearned anew by some generations :wink: now, unbounded rationals have a way of consuming all of RAM remarkably quickly.

So ABC eventually added a notion of “approximate” numbers too, which were machine floats under the covers, with all the surprises they’re prone to.

As a result, Python didn’t include rationals at all at its start. fractions.Fraction supplies them now for those who need exact arithmetic regardless of time or memory cost.
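A brief sketch of both sides of that trade-off with `fractions.Fraction`: exact results, but denominators that can grow quickly. The harmonic sum is only an illustrative workload.

```python
from fractions import Fraction

print(Fraction("3.11"))            # 311/100
print(Fraction(1, 3) * 3 == 1)     # True
print(Fraction(1, 49) * 49 == 1)   # True

# Exactness has a cost: the denominator grows as terms are added.
total = sum(Fraction(1, n) for n in range(1, 31))
print(total.denominator)
```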

More interesting now is Microsoft’s “calculator” app. It too, under the covers, uses unbounded rational arithmetic, although there’s no hint of that in the docs or the user interface. But that’s not programmable, so users generally never realize it can become a massive memory hog. It’s the only implementation they tried that, over the years, didn’t swamp them with “why can’t your stupid calculator do simple arithmetic!?!?!” complaints.

>>> from decimal import Decimal as D
>>> 1 / D(3)
Decimal('0.3333333333333333333333333333')
>>> _ * 3
Decimal('0.9999999999999999999999999999')

If someone is determined to cater to naïve expectations, then they have to agree that result is horribly broken - even children know that 1/3 * 3 is exactly 1. Ironically enough, 754 double floats do give the “right” result:

>>> 1 / 3
0.3333333333333333
>>> _ * 3
1.0

But not always:

>>> 1 / 49
0.02040816326530612
>>> _ * 49
0.9999999999999999

although that one “works” in decimal instead:

>>> 1 / D(49)
Decimal('0.02040816326530612244897959184')
>>> _ * 49
Decimal('1.000000000000000000000000000')
>>> _.normalize()
Decimal('1')

Pick your poison :joy:.

… Wait, it’s possible to do better than this? Doesn’t it require repeated long division by 10?

When the values are very large, fancier algorithms can enjoy asymptotic behavior more related to the asymptotics of giant-int multiplication. As of this commit, giant string->int enjoys asymptotics inherited from CPython’s Karatsuba multiplication, giant int->string from the even faster decimal module’s NTT multiplication.

Note that the new conversion code is written in Python. The asymptotics are so much better that recoding them in C would only yield a relatively small additional improvement. Not worth the complexity.

Here are some timings. Here under the released Python 3.11.3, which does not have the new code (64-bit on Windows 10):

$ py -m timeit -s "import sys; sys.set_int_max_str_digits(0); s = '9' * 1000000" "int(s)"
1 loop, best of 5: 3.99 sec per loop # str -> int

$ py -m timeit -s "import sys; sys.set_int_max_str_digits(0); i = 10**1000000-1" "str(i)"
1 loop, best of 5: 14.3 sec per loop # int -> str

And the same under current main branch (3.13.0a0):

$ py -m timeit -s "import sys; sys.set_int_max_str_digits(0); s = '9' * 1000000" "int(s)"
1 loop, best of 5: 617 msec per loop # str -> int

$ py -m timeit -s "import sys; sys.set_int_max_str_digits(0); i = 10**1000000-1" "str(i)"
1 loop, best of 5: 499 msec per loop # int -> str

Both are major speedups, although int → str benefits a lot more because decimal’s NTT-based giant multiplication is more effective than CPython’s Karatsuba giant *.

What single runs can’t show is how the time changes as input size changes. Left as an exercise for the reader :wink:.
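One possible way to approach that exercise, as a rough sketch: time both conversions at a few doubling sizes and compare how the times grow. The sizes and the helper `best_time` are arbitrary choices for illustration, kept small so the script also finishes quickly on versions with the older conversion code; the numbers will vary by machine and Python version.

```python
import sys
import time

# Lift the int <-> str digit limit where it exists (Python 3.11+).
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

def best_time(func, arg, repeats=3):
    """Smallest wall-clock time over a few runs of func(arg)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best

results = []
for digits in (10_000, 20_000, 40_000, 80_000):
    s = "9" * digits
    i = int(s)
    results.append((digits, best_time(int, s), best_time(str, i)))

for digits, to_int, to_str in results:
    print(f"{digits:>7} digits  str->int {to_int:.6f}s  int->str {to_str:.6f}s")
```

With a quadratic algorithm each doubling should cost roughly four times as much; a better algorithm shows a smaller factor.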

‘time consuming’

agree, continue since I’m learning, we get things to the
point, and hope it helps others who have the patience to read,

‘shall … backward compatibility’

meant compatibility of ‘the standard’ to its prior versions,

roundTiesToAway must be supplied for decimal floats
(if decimal is supported at all), but is not required for binary
floats.

see that in clause 4 for rounding attributes, but not in
clause 5 for operations.

Stefan Krah, libmpdec

thank you, valuable pointer, hope will improve my knowledge,

For floats with many (over thousands of) decimal digits

that’s out of my scope, I - and I think ‘most users’ - am / are fully satisfied
with the precision doubles provide, what is difficult is imprecision in that
field and in consequence harmed math logic.

Absolutely everything gets ever more complicated if you keep digging

have the good feeling structures are becoming clearer …

point with @kknechtel about big ints:

are you sure that comparing

10**1000000-1" "str(i)"
1

to

10**000000-1" "str(i)"
1  

is a good idea?

So could a hypothetical “add-half-down” rounding mode.

yes, and also, with more effort, ‘ties to even’, but only ‘on the fly’, not in
post-processing. ‘hypothetical’ - I heard rumors from Sylvie Boldo that something
like this is implemented in practice in ‘2019’: https://hal.science/hal-02137968v4/document

Guido (van Rossum)'s ideas

well done, they started something good,

machine floats … with all the surprises they’re prone to.

nicely expressed :slight_smile:

the Decimal 1/3 problem

is common for humans, as it also happens with paper and pencil
and the whole space of the universe to write. The 1/49 case holds in
Decimals because we have rounding for the result of the division,
and for the multiplication,

0.02040816326530612244897959184

is a rounded up representation of

0.0204081632653061224489795918367...  

fractional math has more power than reals.
instead of going to fractions or the exotic iterations Prof. Kahan
is dealing with … ‘most users’ don’t deal with such, and ‘most users’
won’t even notice deviations in the 10th decimal place of non-round numbers.
What we started with was the rounding of simple values contrary to
common human use, and whether that is necessary / helpful in summing
large amounts of such values.
I’d say for summing we have better solutions, and the ‘to even’ rounding
injects additional irritations for ‘normal users’. Thus would like to be
able to work without.
Or as a generalized ‘want to’: We have limited resources to
implement an approximation to ‘infinite math’, thus problems with
extreme values and ‘corner cases’ are unavoidable. But we should,
if possible, avoid deviations / differences in the central region of
simple figures and basic arithmetic ( simple in the decimal scope of
simple minded users ).

I’m not following. Why would you expect the section on operations to say anything about rounding? Standards in general do not repeat information, so sections cannot in general be properly understood in isolation. The section on rounding already said roundTiesToAway may not be available for binary floats, and it would be strange if the section on operations did repeat that.

Good catch! I edited the post to repair that. The actual second operation there (with the zero exponent) takes nanoseconds, not milliseconds. I must have hit a delete key after copying the screen-scrape, mistakenly erasing the leading 1.

To be fair, I was also annoyed the first time I saw 2.5 round down to 2. A difference is that I got over it :stuck_out_tongue_winking_eye:. In pie-in-the-sky mode, I’d be in favor of changing elementary education so that children weren’t taught an inherently biased rounding method to begin with. “Because that’s the way I was taught” is an enemy of progress in many areas.

To be fair, I was also annoyed the first time I saw 2.5 round down to 2.

Thanks for that, it’s saying that I’m not more an id… than you had been,
and / or justifies doubts in the rationale behind ties to even in the
multifactorial field of compatibility to ‘C’, human use, other programming
languages, spreadsheets …

In pie-in-the-sky mode, I’d be in favor of changing elementary education so that children weren’t taught an inherently biased rounding method to begin with. “Because that’s the way I was taught” is an enemy of progress in many areas.

That comes close to having them read Kahan, Knuth, Muller, Goldberg …
before using a desk calculator? They are already hopelessly
overwhelmed with the differences between fractions, decimals
and percentages. As Prof. Kahan stated: adapt computers to
people, not the other way. Not all computer users, nor even
programmers share your level of skills and insight.

Or: ‘changing education’? - try it :wink: using words from Morten
Welinder: ‘That’s a completely crazy thing to do and anyone
who tries deserves whatever comes out.’

If tiestoeven is used at all, I’d argue to teach / offer ties away, and
ties to even in addition, considering that ties away has a rational,
bijective and reversible relation between equally sized float
ranges and integers:
0.5 … 1.49999999[9]… ↔ 1,
1.5 … 2.49999999[9]… ↔ 2,
2.5 … 3.49999999[9]… ↔ 3,
whereas tiestoeven hasn’t:
0.50000000[0]1 … 1.49999999[9]… ↔ 1,
1.5 … 2.5 ↔ 2,
2.50000000[0]1 … 3.49999999[9]… ↔ 3,
3.5 … 4.5 ↔ 4,
4.50000000[0]1 … 5.49999999[9]… ↔ 5,
even more significant at:
rounding tiestoaway:
4500000000000001.5 \
4500000000000002.0 / → 4500000000000002

4500000000000002.5 \
4500000000000003.0 / → 4500000000000003

4500000000000003.5 \
4500000000000004.0 / → 4500000000000004

4500000000000004.5 \
4500000000000005.0 / → 4500000000000005
rounding tiestoeven:
4500000000000001.5 \
4500000000000002.0 → 4500000000000002
4500000000000002.5 /

4500000000000003.0 ↔ 4500000000000003

4500000000000003.5 \
4500000000000004.0 → 4500000000000004
4500000000000004.5 /

4500000000000005.0 ↔ 4500000000000005

As stated: no problems with tiestoeven as long as tiesaway
is available as alternative for people or math which need it.
Completely suppressing tiesaway is … … evil!
Such people should try pouring two rounded 2 litre pots into a
rounded 4 litre pot.
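
The two mappings listed above can be reproduced with the decimal module, which offers both tie rules. A small sketch: the helper name `round_to_int` is made up for this example, and the inputs are passed as strings so that no binary float rounding gets involved. ROUND_HALF_UP is decimal's name for ties away from zero.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to_int(text, mode):
    """Round a decimal string to an integer using the given tie rule."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=mode))

small = ["0.5", "1.5", "2.5", "3.5", "4.5"]
large = ["4500000000000001.5", "4500000000000002.5",
         "4500000000000003.5", "4500000000000004.5"]

small_away = [round_to_int(t, ROUND_HALF_UP) for t in small]
small_even = [round_to_int(t, ROUND_HALF_EVEN) for t in small]
large_away = [round_to_int(t, ROUND_HALF_UP) for t in large]
large_even = [round_to_int(t, ROUND_HALF_EVEN) for t in large]

print(small_away)  # [1, 2, 3, 4, 5]   every tie lands on a different integer
print(small_even)  # [0, 2, 2, 4, 4]   ties pair up on the even integers
print(large_away)
print(large_even)
```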

Or, sometimes I can become persistent:
humans do it,
‘C’ does it,
python2 did it,
Matlab does it,
.NET (C#, F#, VB) offer it,
Fortran does it,
Ruby does it,
Rust does it,
Pascal (ISO) does it,
PHP, OpenCL, Ada do it,
Applescript offers it,
SAS, Postscript, Cobol do it,
D, Tcl, smalltalk, SQL do it,
WolframAlpha offers it,
Swift offers it,
python decimal does it,
only python3 excludes it?

That’s getting silly. Nearest/even rounding is not a difficult concept. If you had been taught it from the start, I wager you would be just as adamant that seeing a system round 2.5 to 3 was a gross affront to all that was good and decent :wink:.

You quoted Kahan near the start, but I wonder whether you know the context. It’s from this 56-page paper.

The thrust of that paper is that programmers are, in general, becoming ever worse at dealing with rounding errors:

As the population of computer programmers has grown, proficiency in rounding-error analysis
has dwindled.

He then goes on to prove that point with example after example of software failures due to numerically naïve software failing to deal with rounding errors adequately.

Nowhere does he get anywhere near suggesting that it would help to put the biased nearest/up rounding mode into a 754 successor. Because it wouldn’t help with any of the many examples he presents.

The real problem is that hardware and software environments haven’t evolved to help people debug numeric code. His paper analyzes all the approaches that are available, and finds them all lacking. The best practical approach he has for now is to rerun a program forcing one of 754’s directed rounding modes, and trace back to where results from that start to diverge from the nearest/even results. As he demonstrates, while clumsy, it’s surprisingly effective across many real-life examples of fatally flawed numeric algorithms. If results vary a whole lot after such a tiny change in rounding method, the code’s results are probably garbage.

Rerunning using higher precision may have even more diagnostic power, but fails on practicality because, even if HW does support efficient higher precision arithmetic, programming languages typically don’t provide an easy way to say “OK, run everything that was done in double precision with double-extended precision instead”. And without HW support, emulating more precision in software may be impractically slow.

Those are things I care about. Note that Python’s decimal module makes it very easy to change rounding mode and precision. Developing numeric algorithms using decimal can be a delight!
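
As a rough illustration of that workflow (not taken from Kahan's paper), here is a sketch that reruns one small computation under different decimal rounding modes and precisions. The function `harmonic` and the choice of a harmonic sum with 6 digits are assumptions made up for this example.

```python
from decimal import Decimal, localcontext
from decimal import ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN
from fractions import Fraction

def harmonic(n, prec, rounding):
    """Sum 1/1 + 1/2 + ... + 1/n with the given decimal precision and rounding mode."""
    with localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = rounding
        total = Decimal(0)
        for k in range(1, n + 1):
            total += Decimal(1) / Decimal(k)   # both operations round per ctx
        return total

low = harmonic(50, 6, ROUND_FLOOR)        # every operation rounded down
high = harmonic(50, 6, ROUND_CEILING)     # every operation rounded up
near = harmonic(50, 6, ROUND_HALF_EVEN)   # the default nearest/even mode
more = harmonic(50, 30, ROUND_HALF_EVEN)  # same code, higher precision
exact = sum(Fraction(1, k) for k in range(1, 51))

print(low, near, high)   # the directed modes bracket the true sum
print(more)
```

If `low` and `high` are far apart, the 6-digit result deserves suspicion; here all terms are positive, so the two directed modes are guaranteed to bracket the exact value.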

As I said, “In pie-in-the-sky mode”. I have no more expectation that it will happen than you should have that Python will change to support nearest/up rounding in any context not absolutely required by a standard :wink:.

In the meantime,

def rintnu(x):
    "Return float x rounded to an int, resolving ties away from 0."
    if x < 0.0:
        return -rintnu(-x)
    return int(x + 0.5)

rintnu(0.49999999999999994) → 1 ,
rintnu(4503599627370496 + 0.5) → 4503599627370496
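
Why the first case fails can be checked with exact rational arithmetic. A short sketch using `fractions.Fraction`, which converts a float without any rounding; the variable names are only for illustration.

```python
from fractions import Fraction

x = 0.49999999999999994                  # the largest double below 0.5
exact_x = Fraction(x)                    # exact value: 1/2 - 2**-54
exact_sum = exact_x + Fraction(1, 2)     # exact value: 1 - 2**-54

# The exact sum lies halfway between the doubles 1 - 2**-53 and 1.0,
# so nearest/even rounding of the float addition picks 1.0.
float_sum = x + 0.5

print(exact_x == Fraction(1, 2) - Fraction(1, 2**54))  # True
print(exact_sum < 1)                                   # True
print(float_sum == 1.0)                                # True
print(int(float_sum))                                  # 1, although x < 0.5
```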


Three-quarters credit :wink:.

It’s impossible to “repair” the second example, unless you change float addition first.

>>> 4503599627370496 + 0.5 == 4503599627370496
True

That is, the 0.5 was lost to rounding during addition, before the function was called. So it goes. The same will happen with any function in any language supporting 754 doubles, unless you find a way to force the system to use to-plus-infinity rounding for the addition.
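
The loss can be made visible by looking at the spacing of doubles near that value. A small sketch, assuming Python 3.9 or later for `math.ulp`:

```python
import math

x = 4503599627370496.0               # 2**52
spacing = math.ulp(x)                # gap to the next larger double

print(x == 2.0 ** 52)                # True
print(spacing)                       # 1.0: no double lies between x and x + 1
print(x + 0.5 == x)                  # True: the tie resolves to the even neighbour x
print((x + 1.0) + 0.5 == x + 2.0)    # True: here the even neighbour is the upper one
```

So from 2**52 upward a double cannot carry a fractional part at all, and whatever rounding function is called afterwards only ever sees an integer-valued argument.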

The first example is indeed “a bug” in the function I posted. Here’s a different version that doesn’t use float arithmetic, so can’t be affected by the current float rounding mode:

from math import modf

def rintnu(x):
    "Return float x rounded to an int, resolving ties away from 0."
    if x < 0.0:
        return -rintnu(-x)
    f, i = modf(x)
    i = int(i)
    if f >= 0.5:
        i += 1
    return i

>>> rintnu(0.49999999999999994)
0
>>> rintnu(0.5)
1
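
One way to gain some confidence in this version is to compare it with decimal's ROUND_HALF_UP applied to the exact value of each float. A sketch: `rintnu` is copied from the post above, `reference` is a helper made up here, and the sample values are arbitrary. It only covers floats whose rounded value fits the default 28-digit decimal context.

```python
from decimal import Decimal, ROUND_HALF_UP
from math import modf

def rintnu(x):
    "Return float x rounded to an int, resolving ties away from 0."
    if x < 0.0:
        return -rintnu(-x)
    f, i = modf(x)          # fractional and integer parts, both exact
    i = int(i)
    if f >= 0.5:
        i += 1
    return i

def reference(x):
    """Round the exact decimal expansion of float x, ties away from zero."""
    return int(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_UP))

samples = [0.49999999999999994, 0.5, 1.5, 2.5, -0.5, -2.5,
           2.675, 1e15 + 0.5, 4503599627370495.5]
mismatches = [x for x in samples if rintnu(x) != reference(x)]
print(mismatches)           # [] for these samples
```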

‘three-quarters’ … is that good or bad?
for the first example

return int( x + 0.49999999999999994 )

will do. For the second one: that’s what I dislike in IEEE 754 /
its actual interpretation: ‘ties to even’ is anchored deep in the gearbox,
without alternatives, and hard to override in software.