The factors of the smaller factorial in the denominator cancel out those in the larger factorial in the numerator, so you only need to calculate the product of the remaining factors in the numerator.
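To make the cancellation concrete, here is a tiny illustration using the standard `math` module:

```python
import math

# 7! / 4! : the denominator's factors 4*3*2*1 cancel,
# leaving only the product of the remaining factors 7*6*5.
result = math.factorial(7) // math.factorial(4)
print(result)  # 210, same as 7 * 6 * 5
```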

@blhsing I followed your advice and created the following function:

def factorial_division(n, d):
    ans = 1
    ln = []
    ld = []
    for i in range(n):  # list of n multipliers
        ln.append(n - i)
    for i in range(d):  # list of d multipliers
        ld.append(d - i)
    for i in ld:  # removing the same elements
        ln.remove(i)
    for i in ln:  # and multiply
        ans *= i
    return ans

A lot depends on the size of the arguments to your function, and the accuracy you require. If they are large, and far apart, and you only need moderate accuracy, you can use Stirling's approximation. Alternately, you can just use the Gamma function, which is trivially related to the factorial.

(And even if you are going to go the route of explicit multiplication, you should almost always calculate the log-factorial by adding up the logarithms of the factors, rather than multiplying the factors themselves. Similarly, it is often a good idea to use the log-gamma function.)
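A minimal sketch of the log-gamma route, using `math.lgamma` (for a non-negative integer x, `lgamma(x + 1)` equals log(x!)):

```python
import math

def log_factorial_ratio(n, d):
    """Approximate log(n! / d!) without computing either factorial."""
    # lgamma(x + 1) == log(x!) for non-negative integer x
    return math.lgamma(n + 1) - math.lgamma(d + 1)

# 10! / 7! == 10 * 9 * 8 == 720, recoverable (to floating-point
# accuracy) by exponentiating the log of the ratio:
ratio = math.exp(log_factorial_ratio(10, 7))
```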

from timeit import repeat
s = '''
import math
n = 2230505856425722403
d = 2230505856425722365
'''
es = [
    'math.prod(range(d+1, n+1))',
    'math.perm(n, n-d)',
]
for e in es * 3:
    t = min(repeat(e, s, number=1000)) / 1000
    print(round(t*1e6, 1), 'µs ', e)

Your original code is slow because it spends a lot of time making the lists and then searching them to remove the values. There's no point to this, because we know ahead of time which values will cancel out: all the ones from d downward. (It's even worse because ln.remove doesn't know that the values are sorted, so it has to keep searching the entire list to find the 1, 2 etc., which were put at the end of the list by the previous append loops. It would have been faster to append the values in ascending order - but still a lot slower than needed, because when you remove the values from the start of the list, everything else has to shift down in memory.)

Therefore, we should only consider the values greater than that in the first place - by changing the bounds of the loop. But also, we don't need to make a list of numbers to multiply. We can just multiply them, as we find them:

def factorial_division(n, d):
    ans = 1
    for i in range(d + 1, n + 1):
        ans *= i
    return ans
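For reference, the same falling product is available directly in the standard library, so a quick consistency check might look like:

```python
import math

n, d = 10, 7
# n! / d! is the "falling factorial" n * (n-1) * ... * (d+1),
# which math.perm(n, n - d) computes directly.
falling = math.prod(range(d + 1, n + 1))
assert falling == math.perm(n, n - d) == 720
```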

The fact that it's written in C doesn't matter very much, because it still has to work through the Python data structures - and more importantly, the actual math is the bottleneck once the numbers get large. If you use a naive loop in Python, most of the time is spent in the C implementation of the multiplication operator. Instead, math.factorial speeds things up by factoring out powers of two, which produces some overlapping partial results, and reusing that common work.

The (3 * 1) result can be reused when computing (7 * 5 * 3 * 1), which can be reused when computing (15 * 13 * 11 * 9 * 7 * 5 * 3 * 1). And the factors of two can be multiplied all at once, by a simple bit-shift.
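That reuse can be sketched in pure Python. This is a toy illustration of the identity, not the actual CPython algorithm, which batches the multiplications much more carefully:

```python
import math

def factorial_by_odd_parts(n):
    # Every k <= n factors as (odd part) * 2**a.  Grouping the odd parts
    # by "level" (n, n//2, n//4, ...) gives odd products that extend one
    # another: (3*1) is reused inside (7*5*3*1), which is reused inside
    # (15*13*11*9*7*5*3*1).
    limits = []
    m = n
    while m > 1:
        limits.append(m)
        m //= 2
    limits.reverse()  # smallest level first

    odd_prod = 1  # product of odd numbers up to the current limit
    total = 1     # product of every level's odd product
    done = 1      # largest factor already folded into odd_prod
    for limit in limits:
        for k in range(done + 1, limit + 1):
            if k % 2:             # twos are deferred to the shift below
                odd_prod *= k
        done = limit
        total *= odd_prod         # reuse the previous level's work
    # All the factors of two in n!, applied at once by a bit-shift:
    return total << (n - bin(n).count("1"))
```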

However, these kinds of optimizations don't work so neatly for permutations and combinations. The products of all the odd numbers don't telescope nearly so neatly when you don't multiply all the way down to 1. And sometimes these kinds of optimizations are just low priority. Up through Python 3.10, math.perm simply multiplied in a loop, like the Python code I showed above.

Starting in 3.11, it… does much more complicated things. There's a long thread on the issue tracker here, describing the changes:

It's written about math.comb (i.e., combinations rather than permutations), but many of the optimizations are shared. Some of them even apply to math.factorial, because there's now a precomputed table of small odd-factorial results.

I tried the benchmarks mentioned in the OP there (on slightly slower hardware). From 3.8 to 3.12, the "just calculate factorials and divide" approach (for 100000 choose 50000) already takes about 30% less time than before, which is impressive. The implementation used by math.comb is now faster than the naive factorial-division approach, instead of being slower. Overall it's now more than ten times as fast for these values (and it's an algorithmic improvement, so it only gets better for larger inputs). And computing the factorial of anything up to 20 (on 64-bit builds) is a table lookup now. (This doesn't improve things very much for those inputs, because when things are this fast, the time is dominated by sanity checks and working through the C-Python API. But I assume it matters a lot for the algorithm internally.)

Not really, as both lists are descending. So it finds not 1, 2, etc. but d, d-1, etc. And with their example, where d is close to n, those are almost at the start.