CPython optimizations leveraging type hints [Bachelor thesis topic]

Kobzol · May 31, 2023, 2:10pm

Hello,
I teach Python at a university and I’d like to discuss an idea for a Bachelor thesis topic related to (CPython) interpreter optimizations.
The goal would be to try to leverage Python type hints from the typing module to optimize Python programs (that contain type hints), for example by using the type hints to guide the recently added bytecode adaptive specialization machinery, or even speculatively use some simple JIT compilation guided by the type hints.

AFAIK CPython does not currently do this (although I remember Guido saying on the Fridman podcast that it could be a possibility in the future). The “Faster CPython” project has planned to work on some JIT functionality for CPython, but I don’t know how far that has progressed.
So I wanted to ask whether similar ideas have already been proposed, and what people think about this in general.

It would also be useful for me simply to find some people (CPython developers?) who work on something related (or some discussion forum where we could talk about this), as I know of a skilled student that I would like to assign the topic to, and it would be great if we could get some feedback/guidance from people who have deep knowledge of the CPython interpreter.

kj0 · May 31, 2023, 3:36pm

Hi Jakub, I’m a CPython core dev working on interpreter performance.

Your idea sounds awfully similar to Static Python from Cinder. Have you taken a look at that?

I proposed a somewhat related idea for CPython, but instead of using type hints, it would use type information collected at runtime: “Tier 2 Optimizer: Superblock Type Propagation & Unboxing” (Issue #564 in faster-cpython/ideas on GitHub). We wrote a fork of CPython that supports that (you can read more in that issue). The (tentative) plan is to add this to the optimizer stage in CPython 3.13’s superblock optimizer/copy-and-patch JIT compiler.

I recommend reading the Faster CPython ideas repo to get a feel of what the community discussions are currently about.

I would be excited if you would like to discuss more (either here or by email at kenjin@python.org is fine)!

carljm · May 31, 2023, 6:48pm

Hi Jakub. I’ve worked on the Cinder Static Python project. Happy to chat on this topic. You may also be interested in mypyc and typed-python as prior art in this space.

If your goal is simply to optimize Python code, I think you’ll find that using static type hints is not better than using the much richer type information available at runtime. That means a project like the one you are suggesting probably has to be motivated by some additional considerations, such as “for developer experience reasons, we want to guarantee that type annotations can’t be wrong, by raising TypeError at runtime if they are” (Static Python does this), or “we want to add additional types, like unboxed machine integers/floats, which are easier to manage as an explicit opt-in type with distinct semantics than as an implicit optimization” (I think all three of SP, mypyc, and typed_python do this.)
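
To make the first motivation concrete: plain CPython never checks annotations at call time, so a hint can silently be wrong. The following is a minimal sketch of opt-in enforcement, assuming a hypothetical `enforce_hints` decorator. It is not how Static Python is implemented; it only illustrates the observable behaviour of raising TypeError when an annotation is violated, and it handles only plain classes such as `int` or `str`.

```python
import functools
import inspect
import typing


def enforce_hints(func):
    """Hypothetical decorator: raise TypeError when an argument does not
    match its annotation. Only plain classes (int, str, ...) are checked."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip unannotated parameters and hints that are not classes
            # (e.g. typing.Optional[int]).
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


def unchecked_double(x: int) -> int:
    return x * 2


@enforce_hints
def checked_double(x: int) -> int:
    return x * 2


# Plain CPython ignores the annotation, so the "wrong" call still succeeds.
print(unchecked_double("ab"))  # prints: abab

try:
    checked_double("ab")
except TypeError as exc:
    print(exc)  # prints: x must be int, got str
```

A check like this costs time on every call, which is one reason such guarantees tend to be an explicit opt-in rather than an implicit optimization.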

In the case of Static Python, we had a third idiosyncratic consideration, which is that we wanted to optimize a pre-forking server workload, and in order to not repeat the optimizations in every worker process, we needed to do the optimizations before fork, which meant that taking advantage of runtime type information was more difficult.

Kobzol · May 31, 2023, 7:50pm

Thanks for your answers! I was aware of some other tools that do something like this, but I didn’t know about mypyc and Static Python, so we will definitely check them out.

Regarding the use-case, I realize that using runtime type information is more powerful, although I think that there might still be some niche where type hints could help - e.g. to warm the JIT faster, since the type information is available instantly. In a way, this is not so problematic for the bachelor thesis, as the goal would be to explore various approaches for JIT optimization and similar techniques, and possibly also compare the trade-offs of using type hints vs runtime type information. Even if type hints work worse than runtime types, it could be interesting to quantify this difference.
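
The difference between the two sources of type information can be sketched in a few lines. Declared types are readable before the function has ever run, while observed types only exist after calls and may disagree with the hints. The `record_types` helper and the `observed` dictionary below are hypothetical names used for illustration; they are not part of CPython’s specialization machinery.

```python
import typing


def scale(value: float, factor: int) -> float:
    return value * factor


# Declared types can be read without ever calling the function.
declared = typing.get_type_hints(scale)
print(declared)

# Observed types only exist after at least one call.
observed = {}


def record_types(func, *args):
    # Hypothetical helper: remember the concrete type of each positional
    # argument for this call, then perform the call.
    names = func.__code__.co_varnames[: func.__code__.co_argcount]
    for name, arg in zip(names, args):
        observed.setdefault(name, set()).add(type(arg))
    return func(*args)


record_types(scale, 2, 3)    # an int is passed where float was declared
record_types(scale, 2.5, 3)
print(observed)
```

Here the hint says `value` is a `float`, but the calls show both `int` and `float`, which is the kind of gap a comparison between the two approaches would have to quantify.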

Since you mentioned that there is some ongoing progress in this area (although it uses runtime types and not type hints), I wonder if you could think of some self-contained “subproject” or some set of tasks/experiments that could benefit your work and that could be assigned to the student to get him up to speed. If not, could you perhaps give us a few pointers on where one could start to look in CPython, or what could be a reasonable first step?

To be honest, neither I nor the student is deeply familiar with CPython, and implementing an optimization like this in “unfamiliar territory” might be a bit of a moonshot. Still, we want to see how far we can get.

Thanks!

encukou · June 1, 2023, 8:07am

I wonder if static type hints could make the compiler suggest an initial specialization, and e.g. boost a counter to reduce warm-up time.

kj0 · June 2, 2023, 12:54pm

Specialization’s warm up isn’t very significant. It’s (IIRC) only 8 execution counts for a bytecode instruction. I’m not sure if hinting initial specializations would provide a measurable speedup.

brandtbucher · June 2, 2023, 9:37pm

It’s 8 in 3.11, and only 2 in 3.12!
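
One way to watch the warm-up happen is to disassemble a function before and after calling it a number of times. This is only a sketch: it relies on the `adaptive` flag of the `dis` module, which exists from Python 3.11 on, and the exact instruction names shown depend on the interpreter version and build.

```python
import dis
import sys


def add(a, b):
    return a + b


def opnames(func):
    # The `adaptive` flag exists from Python 3.11 on; older versions have no
    # specializing interpreter, so fall back to the plain listing there.
    if sys.version_info >= (3, 11):
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


before = opnames(add)

# Call the function well past either warm-up threshold mentioned above.
for _ in range(100):
    add(1, 2)

after = opnames(add)

print(before)
print(after)  # on a specializing build the generic add is usually replaced
```

Comparing the two listings shows which instructions were rewritten after only a handful of calls, which is the baseline any hint-driven pre-specialization would have to beat.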

Yeah… specialization is already super fast, and implementing this idea would add a ton of complexity to the bytecode compiler (which currently doesn’t concern itself with types at all).

Let’s also not forget that Python type hints are basically just arbitrary code.
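
A small sketch of what that means in practice: an annotation can be any expression, including a function call with side effects, a plain string, or a list literal, and nothing requires the result to be a type. The `make_hint` helper is a made-up name for illustration. Depending on the Python version, annotations are evaluated when the function is defined or when they are first accessed, so the example reads `__annotations__` before inspecting the side effect.

```python
calls = []


def make_hint(label):
    # Any callable may run inside an annotation; this one has a side effect.
    calls.append(label)
    return int


def f(x: make_hint("x"), y: "not a type at all" = 0) -> [1, 2, 3]:
    return x


# Reading the annotations forces evaluation on versions that defer it.
annotations = f.__annotations__

print(calls)        # prints: ['x']
print(annotations)

# Nothing is enforced: a str is accepted where the hint evaluated to int.
print(f("hello"))   # prints: hello
```

An optimizer that wants to trust hints therefore has to decide which annotation expressions it is willing to interpret and what to do with the rest.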