Plot Twist: After Years of Compiling Python, I'm Now Using AI to Speed It Up

Hi everyone,

This post: "AI Python compiler" - Transpile Python to Golang with LLMs for 10x perf gain? PyPI-like service to host transpiled packages motivated me to share my own journey with Python performance optimization.

As someone who has been passionate about Python performance in various ways, I find it fascinating to see the diverse approaches people take towards it. There’s Cython, the Faster CPython project, mypyc, and, closer to my heart, Nuitka.

I started my OSS journey by contributing to Nuitka, mainly on the packaging side (support for third-party modules, their data files, and quirks), and eventually became a maintainer.

A bit about Nuitka and its approach:

For those unfamiliar, Nuitka is a Python compiler that translates Python code to C++ and then compiles it to machine code. Unlike transpilers that target other high-level languages, Nuitka aims for 100% Python compatibility while delivering significant performance improvements.

What makes Nuitka unique is its approach:

  • It performs whole-program optimization by analyzing your entire codebase and its dependencies
  • The generated C++ code mimics CPython’s behavior closely, ensuring compatibility with even the trickiest Python features (metaclasses, dynamic imports, exec statements, etc.)
  • It can create standalone executables that bundle Python and all dependencies, making deployment much simpler
  • The optimization happens at multiple levels: from Python AST transformations to C++ compiler optimizations

One of the challenges I worked on was ensuring that complex packages with C extensions, data files, and dynamic loading mechanisms would work seamlessly when compiled. This meant diving deep into how packages like NumPy, SciPy, and various ML frameworks handle their binary dependencies and making sure Nuitka could properly detect and include them.

The AI angle:

Now, in my current role at Codeflash, I’m tackling the performance problem from a completely different angle: using AI to rewrite Python code to be more performant.

Rather than compiling or transpiling, we’re exploring how LLMs can identify performance bottlenecks and automatically rewrite code for better performance while keeping it in Python.

This goes beyond just algorithmic improvements - we’re looking at the following (a short illustrative sketch follows the list):

  • Vectorization opportunities
  • Better use of NumPy/pandas operations
  • Eliminating redundant computations
  • Suggesting more performant libraries (like replacing json with ujson or orjson)
  • Leveraging built-in functions over custom implementations
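
To make a couple of these concrete, here’s a minimal before/after sketch of the vectorization and built-in-function points. This is illustrative only, not actual Codeflash output, and the function names are made up:

```python
import numpy as np

# Before: explicit Python-level loop (hypothetical example function)
def total_squared_error(pred, target):
    total = 0.0
    for p, t in zip(pred, target):
        total += (p - t) ** 2
    return total

# After: the same computation as a single vectorized NumPy expression
def total_squared_error_fast(pred, target):
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.dot(diff, diff))  # dot of diff with itself = sum of squares
```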

My current focus is specifically on optimizing async code - identifying unnecessary awaits, opportunities for concurrent execution with asyncio.gather(), replacing synchronous libraries with their async counterparts, and fixing common async anti-patterns.
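
For example, here’s the kind of sequential-await pattern we look for, sketched with made-up coroutines (fetch_user and fetch_orders are hypothetical stand-ins for real async I/O):

```python
import asyncio

async def fetch_user(uid):
    await asyncio.sleep(0.1)  # stand-in for an async DB/HTTP call
    return {"id": uid}

async def fetch_orders(uid):
    await asyncio.sleep(0.1)
    return [{"user": uid, "item": "book"}]

# Before: independent awaits run one after another (~0.2 s total)
async def load_dashboard_sequential(uid):
    user = await fetch_user(uid)
    orders = await fetch_orders(uid)
    return user, orders

# After: independent coroutines run concurrently (~0.1 s total)
async def load_dashboard_concurrent(uid):
    user, orders = await asyncio.gather(fetch_user(uid), fetch_orders(uid))
    return user, orders

# asyncio.run(load_dashboard_concurrent(42))
```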

The AI can spot patterns that humans might miss, like unnecessary list comprehensions that could be generator expressions, or loops that could be replaced with vectorized operations.
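
A trivial instance of the list-comprehension case:

```python
# Before: builds a full intermediate list just to aggregate it
total = sum([x * x for x in range(1_000_000)])

# After: a generator expression streams values to sum() without the list
total = sum(x * x for x in range(1_000_000))
```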

It’s interesting how the landscape has evolved from pure compilation approaches to AI-assisted optimization. Each approach has its trade-offs, and I’m curious to hear what others in the community think about these different paths to Python performance.

What’s your experience with Python performance optimization?

Any thoughts?


Performance bottlenecks are best identified using a profiler.
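
A minimal sketch with the standard-library cProfile/pstats modules (workload is just a placeholder for your own code):

```python
import cProfile
import pstats

def workload():
    # placeholder for the code you actually want to profile
    return sum(i * i for i in range(100_000))

cProfile.run("workload()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
```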

Writing vectorised code can be tricky for beginners, but identifying list comprehensions and other code smells already seems to be handled acceptably by the current generation of linters. Is AI really adding anything of value there, other than ticking boxes and adding buzzwords?

I’d definitely couple AI-based optimisation with benchmarking - agents can be in the habit of suggesting changes for the sake of it, even more than some human coders.

Producing textual output based on prompts is, at the end of the day, all an LLM ‘knows’ how to do. It’s up to you to know when to stop.


(I’ve been vectorizing a lot in my activities. Today I would rather use tools based on einsum-like operations than lots of reshapes everywhere.)
When I need to quickly check whether an algorithm already exists for what I want to do, I ask an LLM; it is very good at scanning the web for “state-of-the-art code snippets”.
Usually the code it gives me is unsatisfying once it goes beyond 20 lines, but it provides a good basis that I always rework. What is very satisfying about LLMs is their ability to figure out how to use an under-documented module, because they can read all the pages about that module on the web at once.
So basically, an LLM usually gives me the proper syntax for external modules and a reasonably good structure for my programs, but I still fully review everything at the end.
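
For instance, a rough sketch of the einsum-over-reshapes preference, with made-up shapes (batched matrix-vector products):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 5, 7))  # batch of matrices
x = rng.standard_normal((32, 7))     # batch of vectors

# Reshape-style: add and then strip a singleton axis around matmul
y_reshape = (A @ x[:, :, None])[:, :, 0]

# einsum: the contraction is spelled out explicitly in the subscripts
y_einsum = np.einsum("bij,bj->bi", A, x)

assert np.allclose(y_reshape, y_einsum)
```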

Hi James,

While we currently do optimization at the function/method level, we do use Line-Profiler in order to generate better, more accurate optimizations targeted at their bottlenecks.
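
For reference, a minimal sketch of line-level profiling with the line_profiler package (normalize is a made-up example function, not our actual tooling):

```python
from line_profiler import LineProfiler

def normalize(values):
    total = sum(values)                 # candidate hot line
    return [v / total for v in values]  # another candidate hot line

lp = LineProfiler()
wrapped = lp(normalize)          # wrap the function under investigation
wrapped(list(range(1, 10_000)))
lp.print_stats()                 # per-line hit counts and timings
```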

Writing vectorised code can be tricky for beginners, but identifying list comprehensions and other code smells already seems to be handled acceptably by the current generation of linters.

We’re specifically targeting performance improvements - not code smells. Here are some real examples from production codebases where AI found non-obvious optimizations (a small sketch of one of these patterns follows the list):

  • albumentations #2376 - 77% speedup: Replaced list comprehension with NumPy array for LUT creation and used np.where for conditional assignments
  • roboflow/inference #1092 - 188% speedup: Used np.argmax() for single-pass solution vs finding max index in two passes
  • kornia #3218 - 130% speedup: Replaced matrix multiplication and redundant vector operations with direct dot products using torch.sum, avoiding recomputation via algebraic identity
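
To illustrate the single-pass vs two-pass idea from the second bullet (a toy sketch, not the actual PR code):

```python
import numpy as np

scores = np.array([0.12, 0.91, 0.33, 0.78])

# Two passes: find the max value, then search for its position
best_idx_two_pass = int(np.where(scores == scores.max())[0][0])

# One pass: argmax returns the index directly
best_idx_single_pass = int(scores.argmax())

assert best_idx_two_pass == best_idx_single_pass
```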

I’d definitely couple AI-based optimisation with benchmarking - agents can be in the habit of suggesting changes for the sake of it

Absolutely agree! We’ve actually analyzed over 100k of our own optimization attempts and found that:

  • 62% of LLM suggestions had incorrect behavior (would introduce bugs if accepted)
  • Of the remaining 38% that were behaviorally correct:
    • 73% resulted in performance gains below 5% or even decreased performance
    • Only ~10% of all attempts produced meaningful, correct optimizations

This is why we:

  1. Run extensive test suites to verify behavioral correctness - we also generate our own regression tests using LLM-generated tests and concolic tests
  2. Check code coverage and require it to meet a threshold
  3. Generate multiple optimization suggestions (including Line-Profiler-guided ones)
  4. Benchmark each candidate against the original code as a baseline (a minimal benchmarking sketch follows this list)
  5. Only submit PRs when we have both correctness AND meaningful performance gains
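
As a rough sketch of step 4 (the functions and the measurement setup here are made up for illustration, not our internal benchmarking harness):

```python
import timeit

def original(n):
    return [i * 2 for i in range(n)]

def candidate(n):
    return list(map(lambda i: i * 2, range(n)))  # hypothetical rewrite

# Take the best of several repeats to reduce noise, then compare against the baseline
baseline = min(timeit.repeat(lambda: original(100_000), number=10, repeat=5))
optimized = min(timeit.repeat(lambda: candidate(100_000), number=10, repeat=5))
print(f"speedup: {baseline / optimized:.2f}x")
```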

This is honestly one of my favorite things about LLMs. I’ve had so many "wait, that function exists??" moments - most of them turning out to be real, and others just being AI-hallucinated slop.