Wild idea RFC: A reasoning engine for Python types and soundness

Hello Python enthusiasts! As a theoretical computer scientist with some experience in software verification, I’m fascinated by the intricacies of static typing in Python. While I’ve previously worked mainly outside the traditional type-checking realm (in things like bounded model checking), I’ve recently been thinking about something that I believe could contribute to the ongoing effort in typing formalization and standardization. It’s ambitious, but isn’t that where the fun begins? I’d love to hear your insights on its feasibility, what I might have overlooked, and any directions or resources you might suggest.

Idea: Provide a level of operational semantics for Python that tracks the types of values, but is able to “execute” Python constructs at the type level. That is, it would not execute using concrete values but types.

Now, if this sounds very similar to what type inference in the existing type checkers does, that’s because it is. The crucial difference would be allowing the inference to be executed using either annotation-level types or Python runtime-level types.

This would allow formally expressing claims such as “no correctly annotation-typed program [without known-unsafe constructs X, Y or Z] can cause a runtime type error”, or “any value statically typed as X will have a runtime value belonging to set S”. I hope that many of these could be verified using automated theorem proving instead of manual labor in a proof assistant.
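To make the idea concrete, here is a minimal sketch (entirely my own illustration, not a proposal for the real design) of “executing” an expression at the type level: the environment maps names to types rather than values, and each operation gets a type-level transfer function.

```python
import ast

def type_eval(node: ast.expr, env: dict[str, type]) -> type:
    """Abstractly evaluate an expression, propagating types instead of values."""
    if isinstance(node, ast.Constant):
        return type(node.value)
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = type_eval(node.left, env), type_eval(node.right, env)
        # Type-level transfer function for "+", covering a few cases only.
        if left is right is int:
            return int
        if left is right is str:
            return str
        if {left, right} <= {int, float}:
            return float
        raise TypeError(f"unsupported +: {left.__name__}, {right.__name__}")
    raise NotImplementedError(ast.dump(node))

expr = ast.parse("x + 1.0", mode="eval").body
print(type_eval(expr, {"x": int}))  # <class 'float'>
```

A real engine would of course need a proper type lattice, joins at control-flow merges, and a choice between annotation-level and runtime-level type domains; this only shows the shape of the idea.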

(NOTE: For now, assume every sentence contains a disclaimer “in the absence of Any”.)


What this could give us (the “Why”):

  1. A sanity check of the typing specification. Expressing these properties in a formal language requires fully understanding what the annotations are supposed to mean, which should help uncover gaps in the specification.
  2. Keeping us informed of exactly which constructs in the annotation system allow unsound behavior (runtime type errors despite the program type checking).
  3. Allowing people working on the type system or the Python language to understand what effects their proposed additions or modifications would have on type safety, i.e. whether they open new avenues of unsoundness.
  4. Perhaps a formal, machine-readable specification of the type system would allow synthesizing type checkers from it. That may or may not be useful.

Personally, I’m mostly tickled by point (3).

Random thoughts on the “How”:

I’ve been looking into parsing Python. Clearly a lot of this work must be already done by existing type checkers, so I initially looked into getting dumps from them. That may still be the best approach.

I tried ast and RustPython’s parser. They seem to give approximately the same information. One aspect that is clearly somewhat difficult, at least with the tools I know of, is resolving identifiers to the variables they denote: ast only tells us that we’re referring to an identifier “foo”, not which binding that is. Some of that may be genuinely hard given the dynamic nature of Python, but the static analyzers clearly tackle it somehow.
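For what it’s worth, the stdlib already does part of this name-resolution work: the symtable module computes, per scope, whether each identifier is local, global, or a closure variable. A small illustration (the source snippet and names are mine):

```python
import symtable

# Nested scopes: inside g, `x` refers to the module global
# and `y` is a free (closure) variable from f.
src = """
x = 1
def f(y):
    def g():
        return x + y
    return g
"""

table = symtable.symtable(src, "<example>", "exec")
f = next(c for c in table.get_children() if c.get_name() == "f")
g = next(c for c in f.get_children() if c.get_name() == "g")
for name in ("x", "y"):
    sym = g.lookup(name)
    print(name, "global:", sym.is_global(), "free:", sym.is_free())
```

This does not solve dynamic tricks like globals() manipulation, but it covers the lexical part that ast alone leaves open.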

One idea I got today: If I’m not mistaken, it should be possible to track the type annotations down to the bytecode level, maintaining a stack of type annotations. Reasoning on the stack machine should be a lot easier than reasoning on the AST level. The details are still a bit hazy to me, particularly because I only understand Python bytecode at a very general level, but I think this might sidestep some complexities like variable scopes and MRO in the simulation phase. On the negative side, it leaves a gap between the AST and the bytecode that needs to be filled (but then, dividing the problem into two halves isn’t necessarily bad).
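A toy version of that “stack of types” idea can be sketched with the dis module; note this is a deliberately hazy illustration that handles only a few opcodes, and the opcode names it matches vary somewhat across CPython versions:

```python
import dis

def add(a: int, b: int) -> int:
    return a + b

def simulate(func):
    """Walk a function's bytecode, pushing annotation types instead of values."""
    anns = func.__annotations__
    stack: list[type] = []
    for ins in dis.get_instructions(func):
        if ins.opname in ("LOAD_FAST", "LOAD_FAST_LOAD_FAST"):
            # 3.13+ may fuse two loads into one instruction with a tuple argval.
            names = ins.argval if isinstance(ins.argval, tuple) else (ins.argval,)
            for name in names:
                stack.append(anns.get(name, object))
        elif ins.opname.startswith("BINARY"):  # BINARY_OP (3.11+) / BINARY_ADD (older)
            right, left = stack.pop(), stack.pop()
            stack.append(left if left is right else object)
        elif ins.opname == "RETURN_VALUE":
            return stack.pop()
    return None

print(simulate(add))  # <class 'int'>
```

A real simulator would need the full opcode set, jumps, and the exception machinery, but the stack discipline itself is pleasantly simple.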


  • Does this sound theoretically feasible? Am I missing some reason why it’s not?
  • Would you consider such a tool useful? What do you think of the 4 goals I listed?
  • Do you have any ideas on the tracking variables topic?
  • Has anyone thought about tracking type annotations on the bytecode level? Is there some reason why that necessarily loses information?

Wasn’t there already a type checker that does something similar by “executing” the bytecode? A bit of search later: Yes, pytype does this.

For tiny projects, sure. And it might help with the typing spec. But I would be very careful with statements like “proving something about Python code”. Python is a very dynamic language, and the “unsafe” constructs you have to take into account are in fact many different things, used all over the place. Also, your system will have to do a decent amount of value tracking as well (potentially via Literal as a proxy, or something similar) to be able to deal with getattr at least somewhat gracefully.
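To illustrate the getattr point with a sketch of my own (the class and helper are invented for the example): to give getattr(p, name) a precise type, the analysis has to know which literal string name holds, i.e. track something like Literal["x"] rather than just str.

```python
class Point:
    x: int
    label: str

# What a checker could read off the class body:
ATTR_TYPES = Point.__annotations__

def infer_getattr(name_abstract):
    """Infer the type of getattr(p, name) from the abstract value of `name`."""
    if isinstance(name_abstract, str):          # Literal["..."]: exact string known
        return ATTR_TYPES.get(name_abstract, object)
    return object                               # only `str` known: fall back

print(infer_getattr("x"))    # <class 'int'>
print(infer_getattr(str))    # <class 'object'>
```

Without the literal-value tracking in the first branch, every getattr call collapses to object (or Any), which is exactly the graceless behavior being warned about.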

Python’s type system doesn’t have a formal expression of rules. The current state of the specification is stated intent plus a test suite, with some known places where the language is unclear, ambiguous, or self-conflicting, and where type checkers have reached differing conclusions on behavior. Clearing that up is an in-progress effort.

Suppose we assume specific semantics, or formally adopt specific semantics. In that case, this is possible: there is a Coq program which was used to demonstrate compatibility between a specific formal model and Python’s type system, importantly including gradual types such as Any.

This part in particular may be harder than you might think. Certainly not impossible; we can reason about these and look for which construct or interaction in the type system is “at fault” for unsoundness. But there are several things that are intentionally unsound (whether or not that is a good idea is still debated), done for pragmatism. Two examples off the top of my head: LSP is not enforced on constructors, and the numeric tower is not type-safe for all specified allowed substitutions. You’d need to do a lot of work to intentionally encode these into a formal proof engine in a way that didn’t make the whole proof blow up from inconsistent premises, while still getting back meaningful information about the issues this causes, rather than the proof engine just reporting that the consequences are consistent with the premises.
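The constructor example can be made concrete; this is my own minimal reconstruction of the well-known pattern, not code from the thread. Type checkers intentionally do not require __init__ to follow LSP, so the following passes checking, yet instantiating “a subclass of Base” through the class object fails at runtime:

```python
class Base:
    def __init__(self) -> None:
        self.value = 0

class Narrow(Base):
    # Incompatible __init__; allowed because LSP is not enforced on constructors.
    def __init__(self, required: int) -> None:
        super().__init__()
        self.value = required

def make(cls: type[Base]) -> Base:
    return cls()  # accepted by type checkers for type[Base]...

try:
    make(Narrow)  # ...but Narrow.__init__ needs an argument
except TypeError as e:
    print("runtime break:", e)
```

A formal model that naively encoded both “type[Base] is callable with no arguments” and “Narrow is a subtype of Base” as axioms would hit exactly the inconsistent-premises problem described above.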

Interesting. I was vaguely aware of pytype, but I really thought it does something different. But yes, this seems very useful. Thank you!

I’m not trying to prove properties of arbitrary Python programs; the idea would be to prove properties of the type system and its correspondence to the runtime types. This, in general, is much more feasible.

Not saying that it isn’t an ambitious idea.

I know. I’ve actually lurked here for a while and read those discussions and the documents being worked on. The divergence between type checkers is one reason why I would love to see a formalization.

I am aware of this. While in some theoretical sense I love soundness, I also think it’s not a pragmatic goal for Python typing. LSP on constructors is one case I know well, and the numeric tower one I remember running into. There are also lots of cases like not calling super().__init__() and thus leaving fields unset (though I don’t think the type system currently deals with the set/unset dichotomy at all), overloading specific dunder methods in ways that break type checkers’ assumptions, etc.
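The unset-field case can be shown in a few lines (my own illustrative classes): the subclass type checks, but forgetting super().__init__() leaves an attribute unset, so an access that is statically fine raises at runtime.

```python
class Animal:
    def __init__(self) -> None:
        self.name = "animal"

class Dog(Animal):
    def __init__(self) -> None:
        # Forgot super().__init__(); checkers generally do not flag this.
        self.tricks: list[str] = []

def greet(a: Animal) -> str:
    return a.name  # statically `str`, dynamically possibly missing

try:
    greet(Dog())
except AttributeError as e:
    print("runtime break:", e)
```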

But I think accounting for all this should be quite feasible. Essentially, you work to show that any program that does not use these known-unsound constructs is sound. This yields an explicit understanding of exactly where the unsoundness lives in the language.

Someone working on extending the type system could use it to show that the extension doesn’t open new avenues of unsoundness, or to know exactly what unsoundness it would introduce.

That may well be!

Thanks for the feedback :slight_smile:
