Prime factorization algorithm

CyrusPolentschz · May 25, 2024, 3:55pm

This is a code to find the prime factorization of consecutive numbers between 1 and n. It’s part of a larger program I’m working on. Since to find prime factorizations I only need to test for primes, for each number, starting from 1, in this program I check to see if the number is divisible by any of the primes in the primes list. If not, then it is added to the primes list. This way each larger number builds off previous primes found. Is there a way to make this algorithm faster?

primes = []

def factorize(n):
    global primes; global primitivesp_nos; x = []
    for i in primes:
        while n % i == 0: x.append(i); n = n//i
        if i * i > n: break
    if n > 1: x.append(n)
    if len(x) == 1: primes.append(x[0])
    return x  
    
print([factorize(i) for i in range(1, 10)]) 
#Output: [[], [2], [3], [2, 2], [5], [2, 3], [7], [2, 2, 2], [3, 3]]

abessman · May 25, 2024, 4:32pm

Use a prime sieve to generate candidate factors.
Implement the algorithm in another language like C, and call it from Python via ctypes. Python is great at a lot of things, but number crunching isn’t where it excels.

jeff5 · May 25, 2024, 4:55pm

You’ll know about the Sieve of Eratosthenes? Each time you find a prime you eliminate all its multiples from further consideration, so they cost you almost nothing.

That only finds primes, but you want to list the factors, which is a nice twist. So suppose you start with:

a = [[i] for i in range(n)]

Special treatment for a[0] and a[1], but then then for each k > 1 where len(a[k])==1, for all i a multiple of k where len(a[i])==1 you replace a[i] with [k, a[i//k]]. I think this means that by the time you have processed k=2, your list looks something like:

[[], [1], [2], [3], [2, [2]], [5], [2, [3]], [7], [2, [2, [2]]], [9], [2,[5]], ... [2,[9]], ... ]

Now why have I inserted a[i] and not flattened it? It is so that it occurs as a reference back to the earlier entry. The entry for a[8] took no more work than for a[4], and a[18] is still [2,[9]], but will magically become [2,[3,[3]]] as soon as we process a[3].

I have not tried this.

I wonder if there is a way to optimise out the len()==1 tests?

@abessman is right that Python is not made for flat-out speed, but your first step is always a good algorithm, and Python is great for exploring.

jeff5 · May 25, 2024, 5:10pm

Ok, this is not quite right if you just do an assignment. You have to replace the contents of the list object a[i] without changing its identity or the references are still to the old value. a[i][:] = [k, a[i//k]] (I think).

m-f-h · February 18, 2025, 6:45pm

nice idea, to create the list of required primes dynamically during the factorization process. but:

please use a meaningful variable name, “p”, and not “i” for the primes!
you should “break” if p*p > n before, not after making the trial division.
your approach requires to factor “all” integers in increasing order. e.g., if you factor 11 before having factored 7, then it will add 11 to the list of primes, after 5 (given that the list is required at least up to 5 to have 5*5 > 11 => break).
Summarizing, while “isprime” and primes() can use each other without problem, the factorization algorithm must be more careful with adding primes to the list. O think a separate list “larger primes” might be a good idea, from which you can move the smallest element to the “primes” list when you reach this element after having checked all smaller ones.

Wombat · February 21, 2025, 2:26am

The itertools math recipes has example code for generating primes, factoring, and testing primality. It runs faster than you would typically expect for Python.

from itertools import islice
from contextlib import suppress
from math import isqrt

def iter_index(iterable, value, start=0, stop=None):
    "Return indices where a value occurs in a sequence or iterable."
    # iter_index('AABCADEAF', 'A') → 0 1 4 7
    seq_index = getattr(iterable, 'index', None)
    if seq_index is None:
        iterator = islice(iterable, start, stop)
        for i, element in enumerate(iterator, start):
            if element is value or element == value:
                yield i
    else:
        stop = len(iterable) if stop is None else stop
        i = start
        with suppress(ValueError):
            while True:
                yield (i := seq_index(value, i, stop))
                i += 1

def sieve(n):
    "Primes less than n."
    # sieve(30) → 2 3 5 7 11 13 17 19 23 29
    if n > 2:
        yield 2
    data = bytearray((0, 1)) * (n // 2)
    for p in iter_index(data, 1, start=3, stop=isqrt(n) + 1):
        data[p*p : n : p+p] = bytes(len(range(p*p, n, p+p)))
    yield from iter_index(data, 1, start=3)

def factor(n):
    "Prime factors of n."
    # factor(99) → 3 3 11
    # factor(1_000_000_000_000_007) → 47 59 360620266859
    # factor(1_000_000_000_000_403) → 1000000000000403
    for prime in sieve(isqrt(n) + 1):
        while not n % prime:
            yield prime
            n //= prime
            if n == 1:
                return
    if n > 1:
        yield n

def is_prime(n):
    "Return True if n is prime."
    # is_prime(1_000_000_000_000_403) → True
    return n > 1 and next(factor(n)) == n