Factorial optimization

I have a function that takes two numbers (the first is greater than the second) and returns the result of dividing their factorials.

import math
def factorial_division(n, d):
    return math.factorial(n) / math.factorial(d)

However, with large inputs it takes a very long time because of math.factorial(n).
Is there a way to optimize this?

The factors of the smaller factorial in the denominator cancel out those in the larger factorial in the numerator, so you only need to calculate the product of the remaining factors in the numerator.
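
In code, that cancellation can be a one-liner with math.prod (a minimal sketch, assuming n >= d):

```python
import math

def factorial_division(n, d):
    # n!/d! = (d+1) * (d+2) * ... * n  -- the d! in the denominator cancels
    return math.prod(range(d + 1, n + 1))
```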


@blhsing I followed your advice and created the following function:

def factorial_division(n, d):
    ans = 1
    ln = []
    ld = []
    for i in range(n): # list of n multipliers
        ln.append(n - i)
    for i in range(d): # list of d multipliers
        ld.append(d - i)
    for i in ld: # removing the same elements
        ln.remove(i)
    for i in ln: # and multiply
        ans *= i
    return ans

But it’s still slow.
How can I do it faster?

import numpy as np
def f(x,y):
    return np.prod(np.arange(y+1,x+1))

@petercordia It's concise, but:

import numpy as np
def f(x,y):
    return np.prod(np.arange(y+1,x+1))
print(f(500, 30))

prints 0: the product overflows NumPy's default integer dtype, and with that many factors of two among 31..500 the low 64 bits wrap to zero.

Fair.

import numpy as np
def f(x,y):
    return np.prod(np.arange(y+1,x+1, dtype=np.float64))

is slightly better. Can also be written as

import numpy as np
def f(x,y):
    return np.arange(y+1,x+1).prod(dtype=np.float64)

There is an important pattern in factorials: it isn't a coincidence that the two products share so many of the same factors.

A lot depends on the size of the arguments to your function, and the accuracy you require. If they are large, and far apart, and you only need moderate accuracy, you can use Stirling’s Approximation. Alternately, you can just use the Gamma function which is trivially related to the factorial.

(And even if you are going to go the route of explicit multiplication, you should almost always calculate the log-factorial by adding up the logarithm of the factors, rather than multiplying the factors themselves. Similarly, it is often a good idea to use the log-gamma function.)
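
As a sketch of the log-gamma route (note the result is an approximate float, not an exact integer):

```python
import math

def factorial_ratio(n, d):
    # n!/d! computed in log space: exp(lgamma(n+1) - lgamma(d+1)).
    # This avoids forming the huge intermediate factorials entirely.
    return math.exp(math.lgamma(n + 1) - math.lgamma(d + 1))
```

This stays fast no matter how large n and d are, at the cost of float precision (and it overflows to inf once the ratio itself exceeds about 1.8e308).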


You could try math.perm:

import math

def factorial_division(n, d):
    return float(math.perm(n, n-d))

How large are your actual n and d?

Like 6.08606003260358e+100

Which one of the two is that, and what’s the other?

For example n = 2230505856425722403 and d = 2230505856425722365.

If you know that the distance between them is somewhat small, then you can just multiply the integers between them

import math
math.prod(range(2230505856425722365 + 1, 2230505856425722403 + 1))

Did you want a float representative of the result, or an int?

In this case the float would be float('inf') while the int would be

17351138597226293352630193935665609594693978581996498781843019433078963027860489317612178572279296282556948008308450983789210463173334229571315086962509040932155200507618870712043548430561950788261692250464289134388486635536666564908417332515148578295039482470616499986575509864916249532872327205353234164677571012766394131744290828469332114312611766044284905128640843598683647829878418330776003244032794141636915677820604912776872518559455231692123159366281106422219638122840647095448201676717904639972497190201538542268643416837206517701093840322733676316941180912257933682401688265844902076608703084793987212327675008079248381690481566503581722166765264410829545064918974514741683159040000000000

You should use int for convenience.

prod = 1
for i in range(2230505856425722365+1, 2230505856425722403+1):
    prod*=i
prod

gives an exact answer instantly.

prod = 1.0
for i in range(2230505856425722365+1, 2230505856425722403+1):
    prod*=i
prod

returns inf.

Because Python ints have arbitrary precision, there is no largest possible int. Floats, however, can overflow.
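
A quick demonstration of that difference (assuming a standard 64-bit float build):

```python
import sys

# Python ints grow without bound, so this exact power is fine:
exact = 10 ** 400

# floats top out near 1.8e308:
print(sys.float_info.max)

# multiplying floats past that limit overflows to inf:
x = 1e308
print(x * 10)  # inf
```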

Though for all practical purposes n**(n-d), i.e. 2230505856425722403**38, should also be good enough.
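
A rough check of that approximation (cheap to run, since there are only 38 factors):

```python
import math

n, d = 2230505856425722403, 2230505856425722365

exact = math.prod(range(d + 1, n + 1))
approx = n ** (n - d)

# every factor lies within 38 of n, so the relative error is on the
# order of 38**2 / n -- far below float precision here
rel_err = (approx - exact) / exact
print(rel_err)
```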

@franklinvp

13.6 μs  math.prod(range(d+1, n+1))
10.3 μs  math.perm(n, n-d)
13.1 μs  math.prod(range(d+1, n+1))
10.3 μs  math.perm(n, n-d)
13.3 μs  math.prod(range(d+1, n+1))
10.9 μs  math.perm(n, n-d)

Benchmark script:

from timeit import repeat

s = '''
import math
n = 2230505856425722403
d = 2230505856425722365
'''

es = [
    'math.prod(range(d+1, n+1))',
    'math.perm(n, n-d)',
]

for e in es * 3:
    t = min(repeat(e, s, number=1000)) / 1000
    print(round(t*1e6, 1), 'μs ', e)

Attempt This Online!

Just to give some background understanding:

The attempt here:

is slow because it spends a lot of time making the lists and then searching them to remove the values. There's no point to this, because we know ahead of time which values will cancel out: all the ones from d downward. (It's even worse because ln.remove doesn't know that the values are sorted, so it has to keep searching the entire list to find the 1, 2, etc. which were put at the end of the list by the previous append loops. It would have been faster to append the values in ascending order, but that's still a lot slower than needed, because when you remove values from the start of a list, everything after them has to shift down in memory.)

Therefore, we should only consider the values greater than that, in the first place - by changing the bounds of the loop. But also, we don’t need to make a list of numbers to multiply. We can just multiply them, as we find them:

def factorial_division(n, d):
    ans = 1
    for i in range(d+1, n+1):
        ans *= i
    return ans

The math module provides a faster implementation of factorial than naively multiplying all the numbers together.

The fact that it’s written in C doesn’t matter very much, because it still has to work through the Python data structures - and more importantly, the actual math is the bottleneck once the numbers get large. If you use a naive loop in Python, most of the time is spent in the C implementation of the multiplication operator. Instead, math.factorial speeds things up by factoring out powers of two, which produces overlapping subproducts, and reusing that common work.

For a small example, suppose we compute 16!:

16 * 15 * 14 * 13 * 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1

First, factor out a power of two from each even number, and group the values where we did that:

= 8 * 15 * 7 * 13 * 6 * 11 * 5 * 9 * 4 * 7 * 3 * 5 * 2 * 3 * 1 * 1 * 2⁸

= (15 * 13 * 11 * 9 * 7 * 5 * 3 * 1) * (8 * 7 * 6 * 5 * 4 * 3 * 2 * 1) * 2⁸

Notice the second group is just 8!. So if we iterate this process, eventually we get:

= (15 * 13 * 11 * 9 * 7 * 5 * 3 * 1) * (7 * 5 * 3 * 1) * (3 * 1) * (1) * 2¹⁵

The (3 * 1) result can be reused when computing (7 * 5 * 3 * 1), which can be reused when computing (15 * 13 * 11 * 9 * 7 * 5 * 3 * 1). And the factors of two can be multiplied all at once, by a simple bit-shift.
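
That recursion can be sketched like this (a simplified illustration of the idea, not CPython's actual implementation, which also reuses the odd subproducts across levels):

```python
import math

def fact_split(n):
    # n! = (product of odd k <= n) * (n // 2)! * 2**(n // 2)
    # because halving each even number 2, 4, ..., 2*(n//2)
    # yields (n//2)! along with one factor of 2 apiece.
    if n < 2:
        return 1
    odd_product = math.prod(range(1, n + 1, 2))
    # the power of two is applied as a single bit-shift at the end
    return (odd_product * fact_split(n // 2)) << (n // 2)
```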


However, these kinds of optimizations don’t carry over so neatly to permutations and combinations. The products of the odd numbers don’t telescope nearly as well when you don’t multiply all the way down to 1. And sometimes these kinds of optimizations are just low priority. Up through Python 3.10, math.perm simply multiplied in a loop, like the Python code I showed above.

Starting in 3.11, it… does much more complicated things. There’s a long thread on the issue tracker here, describing the changes:

It’s written about math.comb (i.e., combinations rather than permutations), but many of the optimizations are shared. Some of them even apply to math.factorial, because there’s now a precomputed table of small odd-factorial results.

I tried the benchmarks mentioned in the OP there (on slightly slower hardware :slight_smile: ). From 3.8 to 3.12, the “just calculate factorials and divide” approach (for 100000 choose 50000) already takes about 30% less time than before, which is impressive. The implementation used by math.comb is now faster than the naive factorial-division approach, instead of being slower. Overall it’s now more than ten times as fast for these values (and it’s an algorithmic improvement, so it only gets better for larger inputs). And the factorial of anything up to 20 (on 64-bit builds) is a table lookup now. (This doesn’t improve things very much for those inputs, because when things are this fast, the time is dominated by sanity checks and working through the CPython API. But I assume it matters a lot for the algorithm internally.)


Not really, as both lists are descending. So it finds not 1, 2, etc. but d, d-1, etc. And with their example, where d is close to n, those are almost at the start.

True, I didn’t quite think that through :slight_smile: