A more useful and less divisive future for typing?

@mikshardmind, thanks for starting this discussion. I appreciate the thought that went into your post.

Python is a “big tent” programming language. It’s used by a diverse set of developers with different interests, needs, and priorities. The same is true of the Python static type system. The concerns of a library maintainer will be somewhat different from the needs of someone implementing a production web server, and that’s going to differ from the needs of a data scientist writing ML code in a notebook, etc. I think the Python community does a pretty good job balancing the needs of its diverse users. Wherever possible, new capabilities are added in a way that provides value to all Python users. In cases where a capability is targeted at a particular subset of users, attempts are made to avoid negatively impacting others who don’t care about that new feature. In cases where there are conflicts between the needs and priorities of subsets of users, compromises are usually worked out. Those compromises require people to come together and understand each other’s viewpoints and make a concerted effort to work out a solution that provides the best possible outcome for everyone. In other words, it happens through discussions like this one. So once again, thanks for starting this thread.


You said that “no mainstream Python type checker has a goal of overall type safety”. First, I think that the notions of “type safety” and “type soundness” are more ambiguous than you imply (more on this later). Second, I think that type safety is a goal of all of the mainstream Python type checkers. However, it’s not the only goal or the overarching goal. Today’s mainstream type checkers attempt to balance the goal of type safety against other goals like usability and simplicity. I will go so far as to say that a Python type checker that fails to strike this balance will never become “mainstream”.

One way that type checkers accommodate different sets of users and needs is to support different modes. Let’s keep that in mind because it could serve as a potential solution as we continue this discussion.


You mentioned that there’s no good way to learn how to handle typing in Python. Can you elaborate on that thought? Are you primarily thinking of Python developers who are just getting started with static typing? Or are you thinking more about users who already know the basics about type annotations but want to use more complex features like generics, overloads, protocols, ParamSpecs, etc.?

You’ve suggested that an official reference document might be useful. When you say “reference document”, are you thinking of a formal spec? I see value in creating a formal spec and I’m supportive of that effort, but I don’t think that most Python developers would find a formal spec very useful in learning the type system. Good user documentation and tutorials would better serve that need. There have been attempts to improve the user documentation for Python static typing, but progress has been slow due to limited volunteer time. And writing good documentation is not easy.

I answer dozens of questions every week about static typing, so I have a pretty good sense for what’s tripping up users with the Python type system. The most common area of confusion, by far, is the notion of invariance, especially when it comes to the ubiquitous list and dict classes. Many users who are new to static typing are stymied when the type checker tells them that they cannot assign a “list[str]” to a “list[str | None]”. Interestingly, TypeScript avoids this issue because its designers chose to make arrays and maps covariant. This opens up a small hole in the type system, but it makes TypeScript much easier to learn than Python typing. That’s a good example of where the goals of “easy to learn” and “type safety” were in conflict. The TypeScript team favored “easy to learn” over “type safety” in this case, whereas the early designers of the Python type system went the other way.
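To make the invariance point concrete, here’s a minimal sketch (the names “append_none” and “names” are mine, and the “str | None” syntax assumes Python 3.10 or later):

```python
def append_none(items: list[str | None]) -> None:
    items.append(None)

names: list[str] = ["hello", "world"]

# A type checker such as mypy or pyright rejects this call because list is
# invariant: list[str] is not assignable to list[str | None]. If the call
# were allowed, append_none could put None into `names`, silently breaking
# its declared list[str] type.
append_none(names)
```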

The second-most-common area of confusion I see is related to type variable scoping and variance. This was the motivation behind PEP 695. Here we have an example of the typing community coming together to devise a solution that eliminates a barrier to using the type system.
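For readers who haven’t looked at PEP 695, here’s a rough before-and-after sketch (the “OldBox” and “NewBox” names are made up, and the new syntax requires Python 3.12 or later):

```python
from typing import Generic, TypeVar

# Before PEP 695: the type variable is declared separately, its scope is
# implicit, and variance must be spelled out by hand.
T_co = TypeVar("T_co", covariant=True)

class OldBox(Generic[T_co]):
    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item

# With PEP 695 syntax: the type parameter is scoped to the class, and
# variance is inferred by the type checker rather than declared.
class NewBox[T]:
    def __init__(self, item: T) -> None:
        self._item = item

    def get(self) -> T:
        return self._item
```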


You mentioned that different behaviors between type checkers make it harder to learn the type system. Most users tend to run only one type checker on their code, so this perhaps isn’t as big an issue as you think. All Python type checkers tend to have the same behavior for basic type checks (e.g. determining whether a value with a particular type is assignable to a target variable or parameter with a declared type). Differences in behavior tend to be in areas that are more advanced and more subtle, for example in type inference behaviors or in the heuristics used to determine the best solution for a type variable when calling a generic function.

I’ve written a document that explains the main behavioral differences between pyright and mypy and the reasons for these differences. This will give you a sense for some of the many decisions that type checker authors need to make and the design goals that lead to certain decisions over others.


I mentioned above that “type safety” and “type soundness” are probably not as black and white as you think. There are some aspects of the type system that are more clear-cut, such as “is type A consistent with type B?”. But there are thousands of other cases where the answer is much less clear. Spec’ing all of these behaviors is probably not viable.

Here are some examples. Let’s not debate any of these here. I want to just give you a sense for what I mean.

  • Should LSP rules be enforced for __init__ methods?
  • If you have a tuple with indeterminate length (x: tuple[int, …]), should an unguarded index access (x[0]) generate an error? (There’s a small sketch of this one after the list.)
  • If a class defines a property, should a subclass be allowed to override it with a simple variable or vice versa?
  • Should a type checker report type violations within a try block?
  • Should it be considered an error if an __init__ method fails to call super().__init__() in a way that allows the class to be used with multiple inheritance?
  • If a base class doesn’t declare the type of a class-scoped variable but a subclass does, should override type checks apply?
  • If a variable is assigned only within a for loop, should that variable be flagged as potentially unassigned if the length of the iterable is not statically known?
  • If a class definition uses a dynamic value (a variable) for one of its base classes, should that be flagged as an error?
  • If a class implements an __instancecheck__ method (via its metaclass), is it safe to perform isinstance type narrowing for that class?
  • If a value is assigned to an undeclared attribute of a function, should that be flagged as a type error?
  • Is it safe to assume, for type narrowing purposes, that a value assigned via a property’s setter will be returned by its getter? Under what circumstances is it not?
  • If a call expression uses an unpack operator and the length of the iterable is unknown but the callable has a fixed number of parameters, should that generate an error?
  • Should conflicts between __slots__ and class variables be reported by a type checker?
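To make the tuple bullet a little more concrete, here’s a small, hypothetical sketch (it’s meant to illustrate the question, not to argue for either answer):

```python
def first_id(ids: tuple[int, ...]) -> int:
    # ids has unknown length and may be empty, so ids[0] can raise
    # IndexError at runtime. Should a type checker flag this unguarded
    # access, or accept it because most call sites pass a non-empty
    # tuple? Reasonable tools answer this differently.
    return ids[0]
```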

Answering many of these questions comes down to some combination of:

  • What principles have been established that cover this case?
  • What precedent have other type checkers established for this case?
  • Given what’s known about the target user for this type checker (or this mode), what behavior would be expected or desired?

Many of these decisions come down to the tradeoff between false positives and false negatives. If you have a condition that is theoretically exploitable in an extremely uncommon coding pattern but is provably safe in all other cases, should that be flagged as an error? Put numerically, if a condition is 99.9999% likely to be fine and only 0.0001% likely to represent an actual bug, would you want to see all of those cases flagged as an error in your code? Most developers would not, and they’d be annoyed if they needed to work around this error every time. What if the condition is 99% likely to be OK? 95%? 50%? Different developers have different thresholds. And most will modify their answer for production code versus non-production code.


You suggested that we should codify the purpose of the type system. I think that’s worth discussing, but coming up with a single answer may be difficult. As I said above, the Python ecosystem is diverse, and different subsets of users have different needs and priorities. It may be necessary to answer the question differently based on which group of users we’re talking about. Here’s a proposed taxonomy: “library authors and maintainers”, “developers who do not use type annotations in their code or use them sparingly”, and “developers who use type annotations throughout their code”. I’m open to other suggestions, but this is roughly the taxonomy I use when thinking about features and modes in pyright.

If we’re going to attempt to codify the purpose of the type system, I suggest that we do so in terms of user value. An answer like “the purpose of a type system is to allow a type checker to enforce type safety” is hard to evaluate because it immediately raises the questions “what exactly does that mean?” and “why should a user care?”. If we frame the answer in terms of user value, I think it will help us find common ground.


You mentioned autocompletion and IDEs. Let me try to clear up some apparent misperceptions. Tools like pylance (the default language server for VS Code, which is built on pyright), MPLS (which predates pylance), Jedi (another language server which is used in VS Code and other editors), and PyCharm all use multiple sources of information to implement completion suggestions, signature help, semantic coloring, semantic rename, semantic search, refactoring tools, etc. This information typically includes some combination of library introspection, doc strings, static code analysis (inference), and type annotations. Of these, type annotations provide the best experience by far. Doc strings alone provide a very poor experience because they are often missing or incorrect, there is no standard format for doc strings, class and instance variables don’t have doc strings, doc strings can’t be accessed statically if the library is not shipped as Python source, and doc strings cannot express generic types or overloaded signatures. The developer experience gets demonstrably better if a library includes type information for its public interface. Note that this doesn’t require the library author to provide type annotations for their internal implementation, just the portion of the library that comprises the public interface.
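As a hypothetical illustration (the “fetch” functions below are invented, not from any real library, and the “float | None” syntax assumes Python 3.10+): with only a doc string, a language server has to guess at parameter and return types; with annotations on the public signature, it doesn’t.

```python
# Doc string only: tooling must parse prose to guess what "timeout" and
# the return value are, and often it cannot.
def fetch(url, timeout=None):
    """Fetch the resource at the given URL.

    The timeout may be given as a number of seconds.
    """
    raise NotImplementedError

# Annotated public signature: completion, signature help, and static
# checking all get precise information, even though the body is unchanged.
def fetch_typed(url: str, timeout: float | None = None) -> bytes:
    """Fetch the resource at the given URL."""
    raise NotImplementedError
```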

If you’re a developer who is not interested in language server features but you are interested in static type checking, then I think you’ll agree that a library with type information provides significant value over a library with no type information. If your code imports from untyped libraries and your type checker treats all of these symbols as Any, you are going to miss a lot of potential bugs. It’s a huge boon to users of static type checking when library maintainers add type information.
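Here’s a small, contrived sketch of what gets lost (the “parse_port” function is a stand-in for a symbol imported from a library without type information):

```python
from typing import Any

def parse_port(value: str) -> Any:
    # Stand-in for a function imported from an untyped library; a static
    # type checker treats its result as Any.
    return int(value)

def describe_port(raw: str) -> str:
    port = parse_port(raw)  # inferred type: Any
    # parse_port actually returns an int, so calling .upper() would fail
    # at runtime with AttributeError, but because the type is Any, no
    # static type checker reports an error here.
    return port.upper()
```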

I’ve interacted with many library maintainers who are in the process of (or are considering) adding type annotations. Attitudes toward static typing vary greatly across this group. Some are resistant to adding type information. Others embrace it readily and see advantages both for themselves and for the consumers of their libraries. Most are somewhere in the middle. They see some benefits, but they’re waiting for the tooling to mature, and they’re taking cues from their consumers about the value of type annotations relative to other features they could add to the library.


You have mentioned that you would like Python type checking to be stricter than it is today. You’re not alone in this view, but I think it’s safe to say that you’re also not representative of most users of Python type checkers today.

I think it would be useful for this discussion if we could develop a better understanding of your motivations and needs. To the extent that you’re able to share details, how do you and your employer use Python? What requirements or needs are driving your desire for more strictness? If Python type checkers provided more strictness, what value would it bring to you and your employer?

You’ve alluded to some “red tape”, so I’m guessing that’s referring to compliance requirements of some sort? Are you able to provide any more details? Do you see value here beyond compliance, or is that the primary motivation?

Are you a library maintainer? You’ve mentioned “social pressure” to add type information to libraries. Is this something you’ve personally felt?

Developing a better understanding of your needs, priorities, and motivations could help us come together and find a solution.
