Revisiting variance

carljm · July 20, 2024, 2:04am

I offered my thoughts on some of the more recent examples here over at Problems with TypeIs - #23 by carljm. The only other thing I’ll observe about those examples is that many of them use un-parameterized types like MutabilityIsADistraction or Collection, which are shorthand for MutabilityIsADistraction[Any] and Collection[Any]. As soon as you introduce Any, unsafety is to be expected. So a real hole in the type system should be demonstrable without using any Any types (explicit or implicit) in annotations (or via missing annotations.)

Regarding safety and user preferences: complete safety/soundness is absolutely not a goal of the Python typing specification. For one thing, it’s a gradual type system which doesn’t integrate with the language to insert runtime casts around uses of the gradual type (Any) – this already means that soundness is completely out of reach.

But even in the absence of Any, soundness is not the only or overriding goal. The goal is to provide a type system that is usable by Python developers and successfully catches as many bugs as possible. Static analysis of a very dynamic language is inherently a tradeoff between false positives and false negatives. Aiming solely for soundness aims to reduce false negatives to zero, but unavoidably the result will be an increase in false positives on code that is actually correct at runtime (given invariants the static analysis isn’t smart enough to see.) The way this curve tends to work out in practice, eliminating those last few false negatives will generally have a massive cost in increased false positives (and the proposals in this thread around variance most certainly exemplify this.) Generally the actual result of this (as in the Pyre example above) is that users just get frustrated and choose to silence the type-checker, and the end result is less safe and catches fewer bugs than if the type checker had chosen to accept a few more false negatives in order to achieve many fewer false positives.

So the Python typing specification will always accept some tradeoffs where the cost in false positives is too high to justify eliminating a few false negatives, if they don’t occur that frequently.

Also, it is a goal of the Python typing specification that it actually describes the behavior of Python type checkers, so that it is useful for both type checker implementers and type system users, not that it describes an impractical extreme which no type checker with actual users is willing to implement and few users want to actually use. So there will always be natural push and pull between the type specification and the practical realities faced by type checkers, and that’s good; it makes the type checkers better (and more similar in their behavior), and it makes the type spec more useful.