Which part of CPython3's repository is responsible for algorithm/code optimizations?

Take a look at these two programs; the first is Python 3 (run with CPython) and the second is C++17:

```python
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)


if __name__ == "__main__":
    n = 46
    print(fibonacci(n))
```

```cpp
#include <bits/stdc++.h>

using namespace std;

int fib(int n)
{
    if (n <= 1)
        return n;
    return fib(n - 1) + fib(n - 2);
}

int main()
{
    int n = 46;
    cout << fib(n);
    getchar();
    return 0;
}
```

So I benchmarked them, and not to my surprise, it took Python 0.9 seconds to finish the program and 11 seconds for C++ to complete.

Which is understandable, since C++ is like a big panel with a lot of different functions and buttons: if you write the code as I did and don’t consider optimizations like tail recursion, you’re done for. That makes it a bad choice for novice programmers.

But I want to apply the optimization that Python apparently grants automatically at compile time; I call it algorithmic optimization.

If you didn’t know: to compile a C/C++ source file, you usually pass optimization flags such as -O2, the most favored:

```
g++ fib.cpp -O2 -o fib.exe
```

Now I want to change it to something like this:

```
g++ fib.cpp -Oa -o fib.exe
g++ fib.cpp -O2a -o fib.exe
```

But I don’t know at which point CPython does these optimizations (when the source is compiled to bytecode, when it is interpreted, etc.), so I thought that instead of going through CPython’s massive repo head-on, I’d first raise the question here.

After all, a rule of thumb in clean coding is not to rewrite code that’s already written, so I won’t write it from zero UNLESS I REALLY HAVE TO. But I can translate what CPython’s community has done and contribute it to GCC or LLVM.

If it comes to fruition, novice programmers will also be able to get a kick out of C++, and the learning curve won’t be that high, at least until the point they learn how to write a Fibonacci that executes in less than 0.9 seconds, C/C++ being compiled languages and all that.

I’m very curious about this. I can’t get that naive Python code to run anything like that quickly in CPython without a change that, while simple, dramatically changes the algorithm:

```python
import functools

@functools.cache
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(46))
```

This simple-looking decorator completely removes the iconic double-recursion. Algorithmically it’s more similar to this:

```python
_cache = [0, 1]

def fibonacci(n):
    while n >= len(_cache):
        _cache.append(_cache[-2] + _cache[-1])
    return _cache[n]

print(fibonacci(46))
```

And as such, it is indeed very fast. (It reduces the runtime from O(fib(n)) to O(n).)

Using PyPy does speed up the run time of the original code, but nothing like the 0.9 seconds you cite. (On my system, it ran in roughly 25 seconds, but the exact figure will vary with hardware.)

One quick clarification: the eleven seconds for the C++ version shouldn’t have included the getchar() call, right? If you run it, then wait a long time before terminating the program, that wait shouldn’t be included in the time. I’m sure you didn’t mess that up, but just wanting to make sure 🙂 In any case, when I ran it on my computer, the C++ version (without the getchar call) took 13 seconds, so the timing does seem pretty plausible.

So if your hardware is very roughly comparable to mine (within, say, a 25% difference on the C++), there has to be some sort of optimization happening that is definitely not obvious. You say “CPython3”; can you give more details about the exact version? Do you have any sort of memoization happening? Is it producing the correct result (1836311903)?

As a short and simple response though, I would say that the easiest way to optimize this particular algorithm is called “memoization”, and can be achieved with the @functools.cache decorator. As long as the function you’re decorating is a pure one (that is, it just calculates a value based on its arguments, nothing more), memoization should be safe. But Python won’t ever do that by default, since there’s no way to know what functions are pure.
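For reference, here is a minimal sketch of what memoization looks like under the hood, using a plain dict. (This is my own illustration of the idea, not CPython’s or functools’ actual implementation.)

```python
# A hand-rolled memoizing version, roughly what functools.cache does:
# remember each answer the first time it is computed, then reuse it.
_seen = {}

def fibonacci(n):
    if n not in _seen:
        _seen[n] = n if n <= 1 else fibonacci(n - 1) + fibonacci(n - 2)
    return _seen[n]

print(fibonacci(46))  # 1836311903
```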

Are you sure about those results? They are not believable.

Unfortunately the formatting of your Python code is broken (all the indents are lost), so it is impossible to tell whether your code is written correctly, or how you benchmarked it. But if we fix the indentation in the most obvious way, the algorithm you have chosen is sometimes described as “the worst algorithm in the world”.

On my computer, I gave up after 10 minutes. I estimate it was about 60% of the way through the process, so extrapolating, it probably would have taken around 16 minutes to finish if I had been more patient.

I know my computer isn’t the speediest machine in the world (Intel i3-8109U CPU at 3.00GHz) but I’d be surprised if your PC was a thousand times faster.

The correct result for the 46th Fibonacci number is 1836311903. Using the recursive definition you post (with the indentation fixed) would require 5942430145 function calls.

Fixing your formatting, and supplying a better version:

```python
# This is ridiculously slow.
def fibonacci_naive_recursive(n):
    if n <= 1:
        return n
    return fibonacci_naive_recursive(n-1) + fibonacci_naive_recursive(n-2)


def fibonacci_iterative(n):
    if n <= 0:
        return 0
    prev, curr = (0, 1)
    for _ in range(2, n+1):
        prev, curr = curr, prev+curr
    return curr
```
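A quick sanity check of the iterative version against the value quoted above:

```python
print(fibonacci_iterative(46))  # 1836311903
print(fibonacci_iterative(1))   # 1
print(fibonacci_iterative(0))   # 0
```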

As a general rule, it doesn’t. Current versions of CPython perform almost no optimizations. Not even tail call optimization (which in any case doesn’t apply to the Fibonacci algorithm you have used, since neither recursive call is in tail position: their results are added together first).
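For illustration, here is my own sketch of what a tail-recursive accumulator formulation would look like; even for code shaped like this, CPython does no tail call elimination:

```python
# Hypothetical tail-recursive form. CPython still pushes one stack
# frame per call rather than turning this into a loop, so a large
# enough n would hit the recursion limit.
def fib_tail(n, prev=0, curr=1):
    if n == 0:
        return prev
    return fib_tail(n - 1, curr, prev + curr)

print(fib_tail(46))  # 1836311903
```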

It certainly doesn’t optimize something like the recursive definition of the Fibonacci function. If you use that definition to find the 46th Fibonacci number, it will make all 5.9 billion function calls. There are ways to optimize it, but the Python interpreter doesn’t do them automatically.
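If you want to verify that call count without actually making the calls, note that the number of calls follows the same recurrence as the function itself, plus one for the current call (a small sketch of my own):

```python
# calls(n) = calls(n-1) + calls(n-2) + 1, with calls(0) = calls(1) = 1
def naive_call_count(n):
    counts = [1, 1]
    for _ in range(2, n + 1):
        counts.append(counts[-1] + counts[-2] + 1)
    return counts[n]

print(naive_call_count(46))  # 5942430145
```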

I suggest you check your code, and your benchmarking method, because I don’t think you have measured what you think you have measured.
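For what it’s worth, a minimal way to time just the computation from inside Python, so that interpreter startup and any input waits are excluded (a sketch; I’ve used a smaller n so the naive version finishes quickly):

```python
import time

def fibonacci(n):  # the naive version from the original post
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

start = time.perf_counter()
result = fibonacci(34)  # small enough to finish in around a second
print(result, f"({time.perf_counter() - start:.3f}s)")
```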

I suppose it is just barely possible that your computer is 1000 times faster than mine, but in that case your C++ code would not have taken 11 seconds. As a general rule, most C++ code, even with optimizations turned off, will run about 10-100 times faster than the equivalent Python.

Highly optimizing Python JIT compilers like PyPy may be able to beat C and C++ in very specialised circumstances, but this is not one of them.

Welcome to the Python Discourse!

It doesn’t explain the difference you’re seeing here, as others have stated, but to answer the question in your title: historically, the only real optimizations in core Python, besides the peephole optimizer and hand-optimizing specific library routines, were writing performance-critical stuff as C extensions. However, that has been changing substantially in recent versions, with the new bytecode specialization, interpreter generation, and the potential for larger changes in the future like GIL removal and adding a JIT, as well as many other optimizations. If you’re curious about the current active optimization work, starting in Python 3.10 and accelerating more dramatically in Python 3.11 and 3.12, check out the Faster CPython project.
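To see for yourself how little CPython’s compile-to-bytecode step transforms the algorithm, you can disassemble the function with the standard dis module; the bytecode is a direct translation of the source, with both recursive calls intact:

```python
import dis

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

dis.dis(fibonacci)  # roughly: loads, a compare, two calls and an add;
                    # no memoization, no tail call elimination
```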

On a side note, as others mentioned, the lack of (and highly inconsistent) code formatting above made it difficult for others to attempt to reproduce your results. When adding code or output to your post, please make sure you enclose it in code fencing so it is formatted correctly for others to read and copy, as I’ve edited your post to add. You can do so with the </> button in the toolbar, or via typing triple backticks above and below the code in question, optionally with the language name (e.g. python) for syntax highlighting, like this:

```python
<YOUR CODE HERE>
```

Thanks!


I doubt that including <bits/something.h> is valid standard C++ usage.
The bits folder is usually internal to the implementation; the portable headers for this program would be <iostream> (for cout) and <cstdio> (for getchar).

Here are my results on my M2 Mac:

```
% time py3 fib.py
1836311903
PYTHONSTARTUP=~/bin/py3startup.py python3 fib.py 175.63s user 0.48s system 99% cpu 2:56.13 total

% time ./a.out
1836311903
./a.out 8.97s user 0.03s system 99% cpu 8.997 total
```

The Python code is 19.6 times slower, which is within the 10-100 times range that Stephen suggested.

Thanks for running that test!

I’m actually impressed that Python is only 20 times slower. What happens if you enable all the C++ optimizations you can?

Using LLVM’s -O2 I get this:

```
% c++ -O2 fib.cpp
% time ./a.out
1836311903
./a.out  4.98s user 0.02s system 94% cpu 5.296 total

% c++ fib.cpp
% time ./a.out
1836311903
./a.out  8.97s user 0.03s system 97% cpu 9.226 total
```

I did not bother with averaging lots of runs, etc.

And as others have pointed out, while this might be an OK-ish mathematical definition, it sucks as a computer algorithm.

You probably mean Python doesn’t have a way.

There is a way, to a language designer.

Shedskin or Cython could probably run circles around a purely CPython approach.

Yes, I meant that, in Python, there is no way to know which functions are pure.
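To illustrate why that matters, here’s a contrived example of my own (not from the discussion above) where caching an impure function silently changes its behavior:

```python
import functools
import time

@functools.cache
def seconds_since_epoch_plus(n):
    # NOT pure: the result depends on when it is called.
    return time.time() + n

a = seconds_since_epoch_plus(1)
time.sleep(1)
b = seconds_since_epoch_plus(1)
print(a == b)  # True: the stale cached value is returned the second time
```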

If you want to completely rewrite everything from scratch, sure, but at that point it’s not optimizing Python any more.