I teach Python at a university and I’d like to discuss an idea for a Bachelor thesis topic related to (CPython) interpreter optimizations.
The goal would be to try to leverage Python type hints from the typing module to optimize Python programs (that contain type hints), for example by using the type hints to guide the recently added bytecode adaptive specialization machinery, or even speculatively use some simple JIT compilation guided by the type hints.
This is not currently performed by CPython AFAIK (although I remember hearing Guido mention on the Fridman podcast that it could be a possibility in the future). The “Faster CPython” project has planned to work on some JIT functionality for CPython, but I don’t know how far that has progressed.
So I wanted to ask whether similar ideas have already been proposed, and in general what people think about this.
It would also be useful for me to find people (CPython developers?) who work on something related, or a discussion forum where we could talk about this. I know a skilled student I would like to assign the topic to, and it would be great if we could get some feedback or guidance from people with deep knowledge of the CPython interpreter.
Hi Jakub. I’ve worked on the Cinder Static Python project. Happy to chat on this topic. You may also be interested in mypyc and typed-python as prior art in this space.
If your goal is simply to optimize Python code, I think you’ll find that using static type hints is not better than using the much richer type information available at runtime. That means a project like the one you are suggesting probably has to be motivated by some additional considerations, such as “for developer experience reasons, we want to guarantee that type annotations can’t be wrong, by raising TypeError at runtime if they are” (Static Python does this), or “we want to add additional types, like unboxed machine integers/floats, which are easier to manage as an explicit opt-in type with distinct semantics than as an implicit optimization” (I think all three of SP, mypyc, and typed_python do this.)
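To make the first of those motivations concrete, here is a minimal sketch in plain Python of annotations being checked at call time. This is only an illustration, not how Static Python implements it (there the checks live in the compiler and runtime). The `enforce_hints` decorator is a hypothetical name, and it only understands annotations that are plain classes.

```python
import functools
import inspect
import typing


def enforce_hints(func):
    """Hypothetical decorator: raise TypeError when an argument or the
    return value does not match a plain-class annotation."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    def check(name, value):
        expected = hints.get(name)
        # Only plain classes are checked. Parameterized generics such as
        # list[int] are skipped, and *args/**kwargs are not handled.
        is_plain_class = (
            typing.get_origin(expected) is None and isinstance(expected, type)
        )
        if is_plain_class and not isinstance(value, expected):
            raise TypeError(
                f"{func.__name__}: {name} expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            check(name, value)
        result = func(*args, **kwargs)
        check("return", result)
        return result

    return wrapper


@enforce_hints
def add(a: int, b: int) -> int:
    return a + b
```

With this in place, `add(1, 2)` works as usual, while `add(1, "2")` raises `TypeError` before the function body runs. Note that the checks add work to every call, so on their own they make the code slower, not faster.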
In the case of Static Python, we had a third idiosyncratic consideration, which is that we wanted to optimize a pre-forking server workload, and in order to not repeat the optimizations in every worker process, we needed to do the optimizations before fork, which meant that taking advantage of runtime type information was more difficult.
Thanks for your answers! I was aware of some other tools that did something like this, but I didn’t know about mypyc and Static Python, so we will definitely check them out.
Regarding the use case, I realize that using runtime type information is more powerful, although I think there might still be some niche where type hints could help, e.g. to warm up the JIT faster, since the type information is available immediately, before any code has run. In a way, this is not so problematic for the bachelor thesis, as the goal would be to explore various approaches to JIT optimization and similar techniques, and possibly also to compare the trade-offs of using type hints vs. runtime type information. Even if type hints work worse than runtime types, it could be interesting to quantify the difference.
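As a rough illustration of the kind of comparison I have in mind, the sketch below records the argument types a function actually sees at runtime and keeps them next to its declared hints. `observe_types` and the `observed`/`declared` attributes are made-up names; a real experiment would presumably hook into the interpreter rather than wrap functions in Python.

```python
import collections
import functools
import inspect
import typing


def observe_types(func):
    """Hypothetical helper: count the runtime type of every argument."""
    signature = inspect.signature(func)
    observed = collections.defaultdict(collections.Counter)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            observed[name][type(value)] += 1
        return func(*args, **kwargs)

    wrapper.observed = observed
    # Declared hints are known as soon as the function is defined,
    # before any call has been made.
    wrapper.declared = typing.get_type_hints(func)
    return wrapper


@observe_types
def scale(value: float, factor: float) -> float:
    return value * factor


scale(2, 3)    # ints are passed although the hints say float
scale(1.5, 2)
```

After these two calls, `scale.declared` says `float` for both parameters, while `scale.observed` shows that `factor` was always an `int` and `value` was an `int` once and a `float` once. Type checkers accept this, since `int` is allowed where `float` is annotated, so an optimizer that trusted the hints would still need a guard here.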
Since you mentioned that there is some ongoing progress in this area (although it uses runtime types and not type hints), I wonder if you could think of some self-contained “subproject” or some set of tasks/experiments that could benefit your work and that could be assigned to the student to get him up to speed. If not, could you perhaps give us a few pointers on where one could start to look in CPython, or what could be a reasonable first step?
To be honest, neither I nor the student is deeply familiar with CPython, and implementing an optimization like this in “unfamiliar territory” might be a bit of a moonshot. Still, we want to see how far we can get.