Dividing large integers

Just a simple division using large integers is not always accurate. For example, 3802951800684688204490109616126/2 gives 1901475900342344102245054808064 (using print(int(3802951800684688204490109616126/2))), which is close but not quite right. And sometimes it just gives 1.9014…×10^30.

So, I tried the module fractions, thus:

from fractions import Fraction
result = Fraction(3802951800684688204490109616126, 1) * Fraction(1, 2)
print(result)
# Show the exact numerator and denominator
print(result.numerator)
print(result.denominator)

with the result

1901475900342344102245054808063
1901475900342344102245054808063
1

That serves my purpose (working on the Collatz Conjecture).

I could have used floored division (//) but I don’t trust my code enough to be absolutely sure that I am dividing an even number. (I am using a protective test on the denominator being 1, just in case.)

Is there another idea that might also be useful? My current work is up to 2^100, but I’ll be doing checks at much larger indices later.

“int / int” in Python returns a float, a machine floating-point number, rounded to 53 bits of precision. It’s not at all suitable for exact work with integers.

Good! The Fraction type implements exact rational arithmetic, with no upper bound on precision (although it may take a great deal of memory to meet that promise).

More words? I don’t know what that means. Regardless, floor division is also exact. You can use divmod() to get the exact quotient and remainder at once:

quo, rem = divmod(top, bot)

After which, bot * quo + rem == top will be true.

If you expect that top is always a multiple of bot (I can’t guess from here), then you can verify it by adding assert not rem after.
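
For example, with the big number from the top of the thread (a minimal sketch using nothing beyond what’s described above):

top = 3802951800684688204490109616126
bot = 2
quo, rem = divmod(top, bot)
assert bot * quo + rem == top  # always true, exactly
assert not rem                 # true only when top is a multiple of bot
print(quo)                     # 1901475900342344102245054808063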

Rest assured that 2**100 is tiny by bigint standards :wink:. Python will continue to work fine even if you need million-bit ints. Although to display them, it’s much faster to print hex(i) than i. They’re represented internally in a power-of-2 base, and converting to decimal is not cheap. Converting to a different power-of-2 base (like hex, base 2**4) is cheap.
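
If you want to see that conversion cost yourself, here’s a rough timing sketch (the size 2**10_000 is an arbitrary choice, kept below the default int-to-str digit limit of recent Python versions):

import timeit

n = 2 ** 10_000  # roughly 3,000 decimal digits

print(timeit.timeit(lambda: str(n), number=1_000))  # decimal conversion: comparatively slow
print(timeit.timeit(lambda: hex(n), number=1_000))  # hex conversion: much cheaper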

2 Likes

Sorry, I missed this.

Maybe you’ll have to check % 2 before the divide? Is there a reason this is impractical?

Just a simple division using large integers is not always accurate. For example, 3802951800684688204490109616126/2 gives 1901475900342344102245054808064 (using print(int(3802951800684688204490109616126/2))), which is close but not quite right. And sometimes it just gives 1.9014…×10^30.

This is because / produces a float. Floats have limited precision.
The // operator (floor division) produces an int.

Look:

 >>> 3802951800684688204490109616126/2
 1.901475900342344e+30
 >>> 3802951800684688204490109616126//2
 1901475900342344102245054808063

So the loss of precision is the intermediate float result.

I don’t know the implementation of %. Is it like /, so it derives a float, or is it a true integer implementation? Having got caught by /, I am not sure what to trust. I only learned of // today (the fractions module was yesterday!).

If all you’re doing really is verifying the Collatz conjecture naively, by brute force (iteratively applying the Collatz map to a starting integer until it reaches 1), then the only denominator of interest is 2.

FYI, dividing a positive even number by 2 is a right shift (x >> 1). Maybe there’s a classic bit-twiddling recipe to keep doing that until the 0th bit is a 1, i.e. to drop all trailing zeros (e.g. while x and not x % 2: x >>= 1). It probably works most efficiently on fixed-width integers that fit into CPU registers though; I don’t know what’s optimal for specialised bigint stuff.
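
For what it’s worth, a sketch of one such recipe in plain Python (strip_trailing_zeros is just an illustrative name, and it assumes x is a positive int):

def strip_trailing_zeros(x):
    # (x & -x) isolates the lowest set bit, so a single shift by its
    # position removes every trailing zero bit at once.
    return x >> ((x & -x).bit_length() - 1)

print(strip_trailing_zeros(40))  # 5, since 40 == 5 * 2**3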

When you work with Python’s int type it is only / that you need to be careful of since division of integers gives floats. The // and % operators are exact integer operations that work exactly with arbitrarily large integers.

Another operator to be careful of is x ** n if n could ever be negative but I assume you don’t need that. Apart from ** and / all operations with int will always give exact results.
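
A quick illustration of the difference (nothing here beyond what’s already said above):

n = 3802951800684688204490109616126

print(n / 2)    # 1.901475900342344e+30 -- a float, precision lost
print(2 ** -1)  # 0.5 -- a negative exponent also produces a float
print(n // 2)   # 1901475900342344102245054808063 -- exact int
print(n % 2)    # 0 -- exact int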

1 Like

I imagine that you don’t know yet how to install Python packages, so this might be more trouble than it is worth, but you may also be interested in using the fmpz type from the python-flint package. With fmpz, division is always exact or it fails with an error:

In [1]: import flint

In [2]: a = flint.fmpz(6)

In [3]: a
Out[3]: 6

In [4]: a / 3
Out[4]: 2

In [5]: a / 4
---------------------------------------------------------------------------
DomainError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 a / 4

File flint/types/fmpz.pyx:291, in flint.types.fmpz.fmpz.__truediv__()

DomainError: fmpz division is not exact

You would still have to be careful to make sure that you always use the fmpz type though and don’t accidentally use int instead.

Some operations with fmpz are also a lot faster than with int when working with very large integers, but I don’t imagine that is relevant in your case (and with small integers fmpz is actually slower).

If you did want to install python-flint then it is e.g.:

pip install python-flint

I pointed you at Python’s divmod() before. For ints i and j,

divmod(i, j)

returns the 2-tuple

(i // j, i % j)

Both parts of that are exact.

If you want to verify that i is an exact integer multiple of j:

q, r = divmod(i, j)
assert not r

That is, you just check that the remainder is 0. You could spell that as

assert r == 0

instead, if you like.
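
For what it’s worth, a minimal sketch of how that might slot into a brute-force Collatz step (the function name and the starting value 27 are purely illustrative, not anything from your code):

def collatz_step(n):
    q, r = divmod(n, 2)   # exact quotient and remainder in one call
    if r == 0:
        return q          # n was even: halve it exactly
    return 3 * n + 1      # n was odd

n = 27
steps = 0
while n != 1:
    n = collatz_step(n)
    steps += 1
print(steps)  # 111 steps for a start of 27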

Tim, many thanks for your responses. Now, I can add divmod() to my python toolkit. I have learnt something new.
Totally as an aside: I first used a divmod function using FORTH on an 8-bit CPU in 1979 (and a few times since).

PS. BACKGROUND
I’ve already satisfied myself that the Collatz Conjecture is unprovable when using binary subdivision of the set of positive integers (i.e. using mod 2^k). A side-effect is that the Collatz sequence of any given positive integer can be shown to always reach a last term that is less than the starting term, sooner or later. My current experiment is looking at the worst case at each binary subdivision level to see how long it takes before the sequence loops back to less than the start of the sequence. So, just looking at the first 100 subdivisions is going to take a while. I am up to k=68, and the modulus of the set in the Collatz sequence reaches at least 80 decimal digits. Result: the worst case at k=68 takes another 140 subdivisions before it loops back to less than the starting number.

1 Like

Are you sure that’s phrased right? Suppose it were true. We know that the Collatz conjecture is true for lots of ints. Take the largest int L we know of such that the conjecture is true for all ints <= L.

Then consider L+1. By your claim, it will eventually reach a term < L+1, which is at most L+1-1 = L. But we already know all ints <= L reach 1, so L+1 must eventually reach 1 too, because it does reach some int known to reach 1. So L+1 also reaches 1.

So by generalized induction, you’ve proved the conjecture for all ints.

Or you didn’t say what you meant, or I didn’t understand what you wrote :smiley:

2 Likes

Yes, you are totally correct, and that was the basis of my first attempt at a formal paper. BUT, in the process of eventually reaching a term <L+1, there are subdivisions: each subdivision produces TWO sets, one with the same residue as the starting set, and the other with a larger value (actually the sum of the original residue and the original modulus). And that second set needs to be resolved, which it will, but then there are even larger residue sets, and so on, ad infinitum. In fact there are many such sets, but the largest residue is the furthest from looping back.

So, for any given integer, the Collatz Conjecture can be shown to be true, but it creates at least one set that has a bigger integer (at least when using binary subdivision of the positive integers). It is what I left out that caught you out - and everyone I know, including me. I’ve got a paper in review, but the essential conclusion is not exactly earth-shattering. At best, a bit of a refinement to what Terras said in 1976, and what Tao has said several times since.

1 Like

Cool! That’s clearer - thanks.

Long ago I tried a symbolic approach. Like if we start with 2*i, it goes to i, so already smaller. Ignore it. If we start with 2*i+1, it goes next to 6*i+4, and then to 3*i+2. Larger. So split that into 2 cases: i = 2*j or i = 2*j + 1. And so on.

Most branches eventually reached an int provably less than the start, but there was always at least one other branch that kept splitting and splitting.

It was fascinatingly frustrating, and allowed me to weed out testing many values by “brute force” in a different program, but never reached a satisfying conclusion.

Haven’t thought about it in years. It is possible to break Collatz addiction :rofl:

1 Like

My breakthrough was looking at the branch that keeps splitting. I only heard of the Collatz Conjecture about 3 years ago. I looked at it for 6 months or so, then moved on. Only went back to it when I had to have a hip replacement revised (took 3 goes) and have been on crutches for 8 months - so what to do. Why not Collatz? It is a great puzzle, and I love puzzles.

PS Is it possible to implement a 1-bit right-shift on a bigint to implement a div2?

2 Likes

Sure!

bigint >> i

is equivalent to

bigint // 2**i

but generally runs quicker. This includes negative bigints! Python strives to create the illusion that bigints use an infinitely wide 2’s-complement representation.

So, e.g., -1 >> 1 is -1. It never “runs out of sign bits”.

However, ints are immutable in Python: every operation creates a new int object; they’re never mutated in-place. That can become a time and/or space problem with very large ints (like millions of bits). I doubt that will become an issue for you, but if it does, come back and ask. A popular extension package does offer a form of mutable giant ints.
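
A tiny demonstration of both points, using nothing but ordinary Python ints:

x = 3802951800684688204490109616126

assert x >> 1 == x // 2            # same exact result; the shift is generally faster
assert -1 >> 1 == -1               # the "infinite sign bits" illusion
assert (-7) >> 1 == -7 // 2 == -4  # floor semantics hold for negatives too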

1 Like

The old time/space tradeoff. Mmm…a multiply by 3 is left-shift and add. So, there’s some fun to try. Yeah, I know, another rabbit-hole. Thanks for help.

But always time things! This can be tricky. Multiplying by 3 only creates one new int object, but left-shift and add creates two. Every basic operation has to crawl over the entire int, and multiplication by a small int is very fast on modern boxes. The overhead of crawling over many bits more than once can overwhelm the small cycle savings of “faster” primitive operations.
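
If you want to check on your own machine, a rough timeit sketch along these lines (the 100-bit value is just an arbitrary example size):

import timeit

x = (1 << 100) - 1  # an arbitrary 100-bit integer

print(timeit.timeit(lambda: 3 * x, number=1_000_000))         # one new int created
print(timeit.timeit(lambda: (x << 1) + x, number=1_000_000))  # two new ints created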

Long ago, in the Python implementation code for dict lookups, I spelled

5*j + 1

as

(j << 2) + j + 1

This was on basic native machine C ints. The latter was faster at the time. But now the former is faster: compilers effectively optimize the multiplication away, exploiting superscalar hardware to compute sub-expressions simultaneously, or emitting some form of “load effective address” instruction.

“Premature optimization is the root of all evil” has always been good advice :smile:.

1 Like