Python code optimization

Hi! I’m writing a Python program to determine if a number n is abundant, that is, if its factors (excluding itself) have a sum greater than n.

I know you can just use

from sympy import divisors

n = int(input())
if sum(divisors(n)) - n > n:
       print("n is abundant.")

But is there a way to figure out that a number is definitely NOT abundant before having to generate all the factors? For example, if n were to equal 331, testing the factors 2, 3, 5, 7, 11 reveals that none of them are divisors, so is there a way from that observatio or some similar one you could already conclude that 331 is not abundant and stop the program there?

Maybe think about how one can sum the factors?

I’m just thinking out loud here, but it seems to me that the worst case
scenario for finding all the factors f is when f^p >= n for some
power p - then none of the preceeding potential factors will be valid.

So in that case f*p == n. Which should mean that p*f == n which
means that once you’ve reached some factor f and f^f > n maybe you
should stop?

It might also mean there’s a very finite number of abundant numbers?

Of course as you find factors you’ll be subtracting them from the
required sum, and you should be comparing the remaining sum, not n.
But I’d think you’d hit f^f > remaining_sum pretty fast.

I haven’t managed to formalise, let alone prove, any of this. Just where
my thinking is going.

I think there are infinitely many abundant numbers. Any multiple of an abundant number is abundant. Although good idea

Then I don’t understand your definition. Suppose n is abundant. You’re
saying 2*n is also abundant, but you’ve only added 2 to the sum of
factors while you’ve doubled n. Doesn’t seem true.

(edit: misread that, silly font)

You’ve doubled all of the factors too though. You don’t only count prime factors.

Oh, all factors. Right. My bad.

Yeah, easy one to forget. I, too, am often saying “factor” and meaning “prime factor”.

Side note, by the way: This isn’t really about optimizing your Python code so much as optimizing your algorithm. So you might find good information in general mathematical texts.

My first look at this would be a sieve. Since any multiple of an abundant number is also an abundant number, you can efficiently mark large sections of a number space as abundant as soon as you find one example. For instance, you start out by testing the first eleven numbers, and they’re not, but then you hit 12. Immediately also mark 24, 36, 48, 60, … as abundant. They don’t need to be tested. Then 18 comes along. You can mark 36 (it’s already marked, not a problem), 54, 72 (already marked), 90, etc. 20 gives you 40, 60, 80, 100. Carry on like this, testing only those numbers which are not already marked.

In fact, since any multiple of a perfect number will be abundant, you could shortcut this even more. As you test the first few numbers for abundancy, you’d find that 6 is perfect, and then immediately zip off and mark every sixth number from 12 onwards as abundant. That right there is a 16% time saving :smiley:

None of this will help you with the simple question “is n abundant?”, but if you’re trying to test a large set of numbers (say, doing analysis on all the abundant numbers up to 1,000,000), this could help a lot.

Side note: While looking for ways to improve this, I found these two points in the Wikipedia page on abundant numbers:

  • An abundant number which is not the multiple of an abundant number or perfect number (i.e. all its proper divisors are deficient) is called a primitive abundant number
  • Every abundant number is a multiple of either a perfect number or a primitive abundant number.

This sounds tautological to me. If it’s abundant and not a multiple of abundant or perfect, then it’s primitive abundant. Therefore every abundant is a multiple of a perfect or a primitive abundant… isn’t that fairly obvious?

I was trying to use that last line to further optimize (“if it’s not a multiple of a primitive abundant, skip it”), but I don’t think there’s anything useful in there to use :smiley:

There are some filters you can apply based on some trivial observations.

  1. Your code, as written, will test primes, but primes can be excluded because they are trivially non-abundant: if p is a prime its only proper divisor is 1. Since this is known, you don’t need to evaluate sum(divisors(n)) - n > n if n is a prime. As may you already know, Sympy provides an isprimes() function which you could use to filter out primes. But as the number of primes get less dense relative to a given n as n gets bigger this isn’t all that helpful.

  2. Following from (1), you can also filter out powers of 2: if n = 2^k then its proper divisors are 1, 2, ... 2^(k-1), and these add up to 2^k - 1 < n. Note: this is not true for higher prime powers - if p is a prime the proper divisor sum is p^k * [(p - 1/p^k)] / (p - 1), which is greater than p^k if and only if p > 2.

I think there is another approach which involves looking at the “average size” of proper divisors of n: if s'(n) is your sum of proper divisors function, and d'(n) is the number of proper divisors of n then you can write s'(n) = A(n) * d'(n), where A(n) is the arithmetic mean of the proper divisors of n. For a non-abundant number s'(n) < n is equivalent to d'(n) < n / A(n). It seems possible that with a good estimate for A(n) you could test the inequality d'(n) and eliminate those that meet that.

The use of divisors together with sum can be replaced with factorint, which gives the prime factorization.

If the prime factorization is \prod_{k=1}^{m}p_k^{a_k} then the sum of divisors is

\prod_{k=1}^{m}\frac{p^{a_k+1}-1}{p_k-1}

1 Like

Yes, this is based on the prime factorization theorem - representing a positive integer n as a product of the different prime factors p_k of n with multiplicities a_k of the p_k, i.e. each a_k is the highest power of p_k such that (p_k)^a_k divides n.

I don’t know how much this improves the speed of the abundance test - you still have to compute the proper divisors sum for a given integer. And in the OP there was a question of determining whether the given number could be ruled out as being abundant based on some number-theoretic test or properties.

I think the sieve method suggested by @Rosuav is the way to go, with a number of filters being applied alongside, e.g. excluding primes, powers of 2 etc.

Nah, no need for the Fundamental Theorem of Arithmetic. This gives the existence of the factorization and its uniqueness up to ordering of the primes, but the formula I mentioned is only requires adding geometric progressions.

\prod_{k=1}^{m}\sum_{j=0}^{a_k}p_k^j

If you distribute this (Euler) product, you get the sum of all divisors.

The formula helps, because the prime factorization, while expensive to compute, can be much smaller than the list of all divisors. And to list all divisors the prime factorization had to be computed anyway.

And yes, you can mark the non-primitive abundant, but the primitive abundant still need to tested directly from the definition.

This is a side issue: but my point is that the formula you gave for the sum of proper divisors of n in terms of the prime factors of n and their multiplicities is based on the prime factorization theorem. It is called fundamental for a reason - it precedes anything like a sum of divisors formula for n involving the prime factors of n and their multiplicities.

I haven’t looked at the implementation of sympy.ntheory.factorsint, but it would be based on identifying prime factors and their multiplicities, whose existence and uniqueness is given by the prime factorization theorem.

A further point on the OP: if you want to test your finalised function for abundant numbers it may be handy to write some unit tests using a subset (or subsequence) of the abundant numbers. I think the OEIS page on this might be handy.

What I mean is that FTA gives the writing n=\prod_{k=1}^{m}p_k^{a_k}.

The formula for adding divisors still works regardless of the FTA. In every integral domain, regardless of it being unique factorization domain or not, the sum \sum \prod_{k=1}^mp_k^{r_k}, where the sum ranges over all 0\leq r_k\leq a_k, satisfies the formula above. In every commutative ring the Euler product works, but let’s say integral domain, to be able to define the fractions \frac{1}{p_k-1} in the field of fractions.

:slight_smile: again, this is a side discussion, but I (and probably the OP) was talking in context of the integers, which we know is a UFD. You’re talking in more general terms, and this discussion probably belongs elsewhere. Although if you were right then you’re saying that the exponents a_i can be determined for a given element of any integral domain [?].

Never mind. No, the exponents a_k is not what I was talking about.

Sympy already has a function to test abundant numbers:

https://docs.sympy.org/latest/modules/ntheory.html#sympy.ntheory.factor_.is_abundant

sympy also has a function for sum of divisors divisor_sigma, which computes the formula above

return Mul(*[cancel((p**(k*(e + 1)) - 1) / (p**k - 1)) for p, e in factorint(n).items()])