# Factorial optimization

I have a function that takes two numbers (the first is greater than the second) and returns the result of dividing their factorials.

``````
import math

def factorial_division(n, d):
    return math.factorial(n) / math.factorial(d)
``````

However, with large inputs it runs for a very long time because of `math.factorial(n)`.
Is there a way to optimize this?

The factors of the smaller factorial in the denominator cancel out those in the larger factorial in the numerator, so you only need to calculate the product of the remaining factors in the numerator.
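That cancellation can be sketched directly with `math.prod` (available since Python 3.8); this is just an illustration of the identity, not necessarily the fastest version:

```python
import math

# n!/d! == (d+1) * (d+2) * ... * n, so only the n - d
# surviving factors need to be multiplied.
def factorial_division(n, d):
    return math.prod(range(d + 1, n + 1))

print(factorial_division(10, 7))  # 8 * 9 * 10 == 720
```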


``````
def factorial_division(n, d):
    ans = 1
    ln = []
    ld = []
    for i in range(n):  # list of n multipliers
        ln.append(n - i)
    for i in range(d):  # list of d multipliers
        ld.append(d - i)
    for i in ld:  # removing the common elements
        ln.remove(i)
    for i in ln:  # and multiplying the rest
        ans *= i
    return ans
``````

But itās still slow.
How can I do it faster?

``````
import numpy as np

def f(x, y):
    return np.prod(np.arange(y + 1, x + 1))
``````

@petercordia Brief, but:

``````
import numpy as np

def f(x, y):
    return np.prod(np.arange(y + 1, x + 1))

print(f(500, 30))
``````

prints `0`.
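A sketch of what is happening here, assuming NumPy's default fixed-width integer dtype:

```python
import math
import numpy as np

# np.arange defaults to a fixed-width integer dtype (int64 on most
# 64-bit platforms), so np.prod silently wraps modulo 2**64.  The
# product 31 * 32 * ... * 500 contains hundreds of factors of two,
# so the wrapped result is exactly 0.
print(np.prod(np.arange(31, 501)))  # 0

# dtype=object falls back to Python's arbitrary-precision ints
# (slower, but exact):
exact = np.prod(np.arange(31, 501, dtype=object))
print(exact == math.factorial(500) // math.factorial(30))  # True
```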

Fair.

``````
import numpy as np

def f(x, y):
    return np.prod(np.arange(y + 1, x + 1, dtype=np.float64))
``````

is slightly better. Can also be written as

``````
import numpy as np

def f(x, y):
    return np.arange(y + 1, x + 1).prod(dtype=np.float64)
``````

There is a very important pattern in factorials. It isn't a coincidence that there are a lot of the same elements.

A lot depends on the size of the arguments to your function, and the accuracy you require. If they are large, and far apart, and you only need moderate accuracy, you can use Stirling's approximation. Alternately, you can just use the Gamma function, which is trivially related to the factorial.

(And even if you are going to go the route of explicit multiplication, you should almost always calculate the log-factorial by adding up the logarithms of the factors, rather than multiplying the factors themselves. Similarly, it is often a good idea to use the log-gamma function.)
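For example, a minimal sketch of the log-gamma route with `math.lgamma` (which computes log Γ(x), and Γ(n+1) = n!):

```python
import math

# log(n!/d!) == lgamma(n+1) - lgamma(d+1); stay in log space and
# exponentiate only if the result itself fits in a float.
def log_factorial_division(n, d):
    return math.lgamma(n + 1) - math.lgamma(d + 1)

# Check against the exact integer result for small inputs:
approx = math.exp(log_factorial_division(20, 10))
print(approx, math.factorial(20) // math.factorial(10))
```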


You could try `math.perm`:

``````
import math

def factorial_division(n, d):
    return float(math.perm(n, n - d))
``````
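A quick sanity check of the identity behind this (`math.perm(n, k)` counts k-permutations, i.e. n!/(n-k)!, so k = n-d gives n!/d!):

```python
import math

# math.perm(n, k) == n! // (n - k)!, so with k = n - d it is n! // d!.
n, d = 10, 7
print(math.perm(n, n - d))                     # 10 * 9 * 8 == 720
print(math.factorial(n) // math.factorial(d))  # same value
```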

How large are your actual n and d?

Like `6.08606003260358e+100`

Which one of the two is that, and what's the other?

For example `n = 2230505856425722403` and `d = 2230505856425722365`.

If you know that the distance between them is somewhat small, then you can just multiply the integers between them

``````
import math

math.prod(range(2230505856425722365 + 1, 2230505856425722403 + 1))
``````

Did you want a `float` representative of the result, or an `int`?

In this case the `float` would be `float('inf')` while the `int` would be

``````17351138597226293352630193935665609594693978581996498781843019433078963027860489317612178572279296282556948008308450983789210463173334229571315086962509040932155200507618870712043548430561950788261692250464289134388486635536666564908417332515148578295039482470616499986575509864916249532872327205353234164677571012766394131744290828469332114312611766044284905128640843598683647829878418330776003244032794141636915677820604912776872518559455231692123159366281106422219638122840647095448201676717904639972497190201538542268643416837206517701093840322733676316941180912257933682401688265844902076608703084793987212327675008079248381690481566503581722166765264410829545064918974514741683159040000000000
``````

You should use int for convenience.

``````
prod = 1
for i in range(2230505856425722365 + 1, 2230505856425722403 + 1):
    prod *= i
prod
``````

``````
prod = 1.0
for i in range(2230505856425722365 + 1, 2230505856425722403 + 1):
    prod *= i
prod
``````

returns inf.

Because Python's ints are arbitrary-precision, there is no largest possible int. But floats can overflow to `inf`.

Though for all practical purposes
`2230505856425722403**38` should also be good enough.
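That claim can be checked with exact integer arithmetic (a sketch; both values are roughly 700-digit ints, so comparing them as floats would overflow):

```python
import math

n = 2230505856425722403
d = 2230505856425722365  # n - d == 38

exact = math.prod(range(d + 1, n + 1))
approx = n ** 38

# Each factor differs from n by at most 37 parts in ~2.2e18, so the
# relative error of n**38 is on the order of 1e-16.
rel_err_scaled = (approx - exact) * 10**18 // exact  # relative error, scaled by 1e18
print(rel_err_scaled)
```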

@franklinvp

``````
13.6 µs  math.prod(range(d+1, n+1))
10.3 µs  math.perm(n, n-d)
13.1 µs  math.prod(range(d+1, n+1))
10.3 µs  math.perm(n, n-d)
13.3 µs  math.prod(range(d+1, n+1))
10.9 µs  math.perm(n, n-d)
``````

Benchmark script:

``````
from timeit import repeat

s = '''
import math
n = 2230505856425722403
d = 2230505856425722365
'''

es = [
    'math.prod(range(d+1, n+1))',
    'math.perm(n, n-d)',
]

for e in es * 3:
    t = min(repeat(e, s, number=1000)) / 1000
    print(round(t*1e6, 1), 'µs ', e)
``````

Attempt This Online!

Just to give some background understanding:

The attempt here:

is slow because it spends a lot of time building the lists and then searching them to remove the values. There's no point to this, because we know ahead of time which values will cancel out: all the ones from `d` downward. (It's even worse because `ln.remove` doesn't know that the values are sorted, so it has to keep searching the entire list to find the `1`, `2` etc. which were put at the end of the list by the previous `append` loops. It would have been faster to append the values in ascending order - but still a lot slower than needed, because when you remove values from the start of the list, everything else has to shift down in memory.)

Therefore, we should only consider the values greater than that in the first place - by changing the bounds of the loop. But also, we don't need to build a list of numbers to multiply. We can just multiply them as we find them:

``````
def factorial_division(n, d):
    ans = 1
    for i in range(d + 1, n + 1):
        ans *= i
    return ans
``````

The `math` module provides a faster implementation of factorial than naively multiplying all the numbers together.

The fact that itās written in C doesnāt matter very much, because it still has to work through the Python data structures - and more importantly, the actual math is the bottleneck once the numbers get large. If you use a naive loop in Python, most of the time is spent in the C implementation of the multiplication operator. Instead, `math` speeds things up by factoring out powers of two, which produces some overlapping results, and reusing common work.

For a small example, suppose we compute 16!:

16 * 15 * 14 * 13 * 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1

First, factor out a power of two from each even number, and group the values where we did that:

= 8 * 15 * 7 * 13 * 6 * 11 * 5 * 9 * 4 * 7 * 3 * 5 * 2 * 3 * 1 * 1 * 2⁸

= (15 * 13 * 11 * 9 * 7 * 5 * 3 * 1) * (8 * 7 * 6 * 5 * 4 * 3 * 2 * 1) * 2⁸

Notice the second group is just 8!. So if we iterate this process, eventually we get:

= (15 * 13 * 11 * 9 * 7 * 5 * 3 * 1) * (7 * 5 * 3 * 1) * (3 * 1) * (1) * 2¹⁵

The (3 * 1) result can be reused when computing (7 * 5 * 3 * 1), which can be reused when computing (15 * 13 * 11 * 9 * 7 * 5 * 3 * 1). And the factors of two can be multiplied all at once, by a simple bit-shift.
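The decomposition worked through above can be checked directly: the odd products from each level, with all fifteen factors of two applied as one shift:

```python
import math

# Odd-number products from each level of the 16! decomposition:
odd_15 = 15 * 13 * 11 * 9 * 7 * 5 * 3 * 1
odd_7 = 7 * 5 * 3 * 1
odd_3 = 3 * 1

# Multiply them together and shift in the 15 factors of two at once.
result = (odd_15 * odd_7 * odd_3 * 1) << 15
print(result == math.factorial(16))  # True
```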

However, these kinds of optimizations don't work so neatly for permutations and combinations. The products of all the odd numbers don't telescope nearly so neatly when you don't multiply all the way down to 1. And sometimes these kinds of optimizations are just low priority. Up through Python 3.10, `math.perm` simply multiplied in a loop, like the Python code I showed above.

Starting in 3.11, it… does much more complicated things. There's a long thread on the issue tracker here, describing the changes:

Itās written about `math.comb` (i.e, combinations rather than permutations), but many of the optimizations are shared. Some of them even apply to `math.factorial`, because thereās now a precomputed table of small odd-factorial results.

I tried the benchmarks mentioned in the OP there (on slightly slower hardware). From 3.8 to 3.12, the "just calculate factorials and divide" approach (for 100000 choose 50000) already takes about 30% less time than before, which is impressive. The implementation used by `math.comb` is now faster than the naive factorial-division approach, instead of being slower. Overall it's now more than ten times as fast for these values (and it's an algorithmic improvement, so it only gets better for larger inputs). And factorials up to 20 (on 64-bit builds) are a table lookup now. (This doesn't improve things very much for those inputs, because when things are this fast, the time is dominated by sanity checks and working through the C-Python API. But I assume it matters a lot for the algorithm internally.)


Not really, as both lists are descending. So `ln.remove` finds not `1`, `2`, etc. but `d`, `d-1`, etc. And with their example, where d is close to n, those are almost at the start.

True, I didnāt quite think that through