A more useful and less divisive future for typing?

First off, I apologize if any of this comes off as overly negative; my intent here is to start a discussion about addressing some ecosystem pain points with typing that I personally believe stem from attempting to be more lax in what is specified (this may seem paradoxical at first to some). I know a lot of hard work has gone into typing, and this may come across as critical of decisions made by various tools, their maintainers, and so on. I recognize that this isn’t an easy topic, and I’m not saying any of this to call out anyone in particular; any changes as a result of this would need to come from collaboration, considering a way forward that remains pragmatic and respectful of the needs of users.

Right now, typing is… more than a little divisive in Python. Some people want it to be stricter (myself included), some want it to be more optional than it is (see ecosystem effects like IDEs choosing to stop reading docstrings and to read only type hints for autocompletion), and some want to put things that aren’t even type information into the type system (the typing.Doc discussion, as well as typing.deprecated).

I feel like the overall ecosystem is struggling here, even as people find parts of it beneficial, because there’s no clear long-term direction for typing, nor a clear answer to the question “Who is this for?”. More and more, it seems that pushing typing forward without an answer to that question is causing conflict.

I think there are a few issues here, and each of them is individually important to why it has come to feel this way:

  1. Type checking is mainly ad hoc, with no formal specification, and no mainstream Python type checker has a goal of overall type safety.* This point, in particular, led to another user posting something on python/typing (GitHub, not linked as the framing was inflammatory) which was rightfully closed and locked. While I share some of the sentiment and frustration of that post, I decided to wait a little bit and think about how to address the same points productively.

  2. Typing is being used by tooling as a replacement for the methods those same tools used previously, rather than in addition to them. This puts more work on library maintainers in order to make the tooling do less work (which seems backward, to me at least…), and all the end users see is that something isn’t working with some libraries. The biggest offender here would be IDEs like Visual Studio Code, which used to look at the AST and docstrings to provide autocomplete but don’t anymore, instead relying on typing. One of the most popular free code editing tools relying on typing being provided is an enormous social pressure for libraries to add typing even if they gain no other benefit from doing so (such as libraries that ship stubs only, because the internals of the library are not implemented in Python, or are not yet understood by type checkers).

  3. People are putting non-type information in typing and annotations. This one might seem innocuous; after all, typing.Annotated exists specifically so that this plays well with typing. But as was pointed out in the discussion about typing.Doc, there’s no clear way to use it that is ergonomic for those uninterested in typing.

  4. There’s no good way to learn how to handle typing in Python. Various type checkers have diverging behavior, there’s no canonical document, and existing type checkers don’t follow rules based on type theory, each individually deciding what not to check based on its view of pragmatism. While pragmatism may seem to benefit users, it adds to the inconsistency of the learning experience and reduces the maximum benefits a type system could provide.

  5. It’s clear from other languages that a type system can be both fully sound and completely optional and gradual. Julia in particular shows that not only is this possible, but that doing so can provide other benefits such as performance and multiple dispatch (the latter of which has also come up as something people want typing to be capable of in Python).
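To illustrate point 3 above: typing.Annotated lets a tool attach non-type metadata to an annotation while type checkers see only the underlying type. A minimal sketch (the MaxLen marker here is a hypothetical stand-in for tool-specific data, not an existing API):

```python
from typing import Annotated, get_type_hints

# Hypothetical marker class standing in for tool-specific metadata.
class MaxLen:
    def __init__(self, limit: int) -> None:
        self.limit = limit

# Type checkers treat this as plain `str`; other tools can read the metadata.
Username = Annotated[str, MaxLen(32)]

def greet(name: Username) -> str:
    return f"hello, {name}"

# A runtime tool retrieves the metadata via include_extras:
hints = get_type_hints(greet, include_extras=True)
metadata = hints["name"].__metadata__  # a tuple containing MaxLen(32)
```

This works mechanically, but as the discussion noted, there is no ergonomic story for people who want the metadata without buying into typing as a whole.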

All of this combined leads to a situation where I believe in the benefits of typing, but don’t think the current status quo is good enough, easy to teach, or something people want to adopt. Anecdotally, from those I’ve spoken to about it, most people either are already sold on the benefits of typing and see Python’s typing as deficient compared to other languages, or see it as only a chore they shouldn’t need to do and that tooling should just handle. These are opposite extremes, and I don’t think there’s ever going to be a “typing for everyone” so long as it remains inconsistent and difficult to teach.

I don’t say this with the expectation that addressing all of the issues, both technical and social, associated with typing will be easy, but I do think these issues are worth addressing.

I can see a couple of ways to work towards this. So as not to present only problems, but hopefully also a possible path towards positive progress, one of them would be to do all of the following:

  1. Formalize the type system
  2. Have an official reference document for the type system that people can use to learn.
  3. Codify what the type system’s purpose is and what its goals currently are, and keep this documented and up to date with changes.
  4. Create two official classifications (corresponding PyPI classifiers?) that tools may self-identify with: one for tools that check the type system rigorously, and one for tools that use the type information but may not be a full type checker. (This allows there to be a space for tools that want to choose a level of pragmatism that may conflict with the current specification, as well as those that use type information for other purposes.)
  5. Don’t special-case the type system for tools; let tools build around a solidly defined type system. If a tool needs the type system to do more, make the case generally that it should do more by showing a specific benefit, rather than adding a special case that benefits one tool but isn’t cohesive with the rest of the type system.

But this is only one possible path, and it doesn’t fully address how typing is socially pressured. I think this path would be worth pursuing, as a fully sound type system could lead to other long-term benefits at the language level, such as later allowing modules to opt into further optimizations based on their type information being correct at a module level, but there are other perspectives than mine which should be considered here. I don’t have an answer to the social pressure bit other than to try to make the type system more consistent and better supported, so that people feel they have a way to learn it rather than fight with it.

* Recently, in the issue trackers for both mypy (see the note on the check only being enabled in strict mode despite being a safety issue, not a pedantic one) and pyright (see number 3 in the linked comment; also only enabled in strict mode), this was confirmed in a discussion about LSP violations in subclasses, where type checking on this violation had not previously been done for “pragmatic” or “disruptive” reasons, even where maintainers had pointed out the unsafety of it. This happens to be one of the subtyping behaviors that actually was expressed in a PEP and which was already being handled correctly for List vs. Sequence by both of them. I’d prefer that people not pile into those issues unless they have a constructive argument to add; this is only pointing out how the current status quo leaves more to type checkers in a way that decreases the usefulness of typing from the perspective of those wanting type safety from it.


We could do the same as in C compilers, introduce a flag or flags that add more checks at runtime, up to the users to use them or not.

It has long been the plan to start formalizing type checker behavior and providing living specifications, based on the existing PEPs, on https://typing.readthedocs.io/. Unfortunately, as is often the case, the limiting factor is volunteer time.


I’ll share a proposal within a week or so that will address some (though not all) of your points, creating a path towards a specification.

Overall, your list of issues looks diverse, and not all issues can be addressed by this group. (For example, we don’t have control over how VSCode does autocomplete.)

The biggest disconnect is probably around soundness. You clearly care deeply about making the type system sound, but for many type checkers full soundness has not historically been a goal. And for type checkers like mypy and pyright that are used by numerous projects, any changes now can have backward compatibility considerations, so even if the maintainers of these type checkers agree with soundness as a goal (they may not!), it will take them a long time to get there.


I think having a set of tests to verify whether a type checker is compliant with each typing feature was discussed before. Apart from helping with the formalization of type-checking behaviors, it would also make it easier for new type checkers to be written.


If we had either a more formal typing spec (perhaps also with canonical representations) and/or a set of tests verifying compliance, then those considerations could perhaps be handled by having a “strict” and a “non-strict” mode (as Mike mentioned).

As to the different groups that Mike sees (the strict ones, the optional ones, and the “freewheelers” – my term – who want to use __annotations__ for their own nefarious purposes), I find myself in all of them… One day I’m in the first group, and another day in another. I wonder if ultimately the effect of all the typing work (and future work) will just be that it makes it easier for people to adopt Rust instead of Python :upside_down_face:


I look forward to it and hope it helps with at least some of the diverging behavior bits or lack of having a common set of terms that new things can be discussed using. The recent discussion about subtyping, Never, and so on has shown how the lack of agreed-upon definitions can hurt even discussion of the system as it exists now, let alone how one might want to change it.

Yeah, I don’t have good answers for this in particular. It seems that tooling will naturally want to use things that are supported by the language, this included. I think the issue is somewhat magnified by the fact that there isn’t a good canonical typing reference right now, so users feel like they are fighting the type system because of choices external to what can be accomplished here. So while it’s a factor, it’s one we can’t directly address here; we can only be mindful of how other decisions may change how much it impacts users.

For me personally, yes, this would be the largest disconnect. I view the lack of soundness as a place for divergent behavior between type-checking implementations to emerge, as an unnecessary lack of safety that could be provided by typing, and as a way the type system becomes harder to teach and harder to soundly build upon, because it doesn’t follow rules people may be familiar with from theory or from sound type systems in other languages. For languages with type systems I would consider worth comparing, those would be Julia, Elixir, Haskell, Rust, OCaml, and Kotlin. Julia, in particular, is interesting as a comparison, as types are entirely optional and can be added gradually, while being as rigorous as all of the others there when provided. Elixir is in a similar position, but the parallels are harder to draw.

The lack of soundness in particular has led to a discussion internally at my work about either implementing our own type checker which does enforce soundness, or not using Python in a specific context. I don’t see either of these options as great; they are largely a product of somewhat unneeded red tape, but the decision on the level of acceptable risk is outside my direct influence. Creating our own type checker without the broader community buying into soundness wouldn’t help; it would only create further divergence between type-checking implementations by adding yet another one that does things differently.

Quite bluntly, I don’t think something should claim to be a type checker if its primary purpose is not to check for soundness based on type information with what is currently known to be sound. Anything short of that is just another linter, albeit one that can leverage type information.

That comes down to a philosophical difference, though, which isn’t necessarily useful to what comes next. But I would hope that if someone has bought into the benefits of typing, they would see that a system that is self-consistent because it has chosen soundness requires less work to understand, as there don’t need to be special cases where “pragmatic” decisions interact with each other. It is therefore also easier to teach, and it is harder for meaningful divergence between type checkers to arise.

I should also emphasize that I don’t think a sound type system requires everything to be typed, or that adding type information would be any harder than it is today. I expect it would be easier, because the consistency provided by a sound, self-consistent system should make it easier to teach and document the “right” way. But I do acknowledge that changing existing tools to enforce soundness would cause some projects to need to fix things that are not currently enforced, and that definitely needs to be kept in mind.

I would be very surprised if people leaving Python picked Rust as their first choice for type safety unless they have some other very specific needs as well. I would expect that most people starting from Python but wanting stronger benefits from typing as their primary goal would be better served by either Kotlin or Julia, as both would have more similarities with their current code.


Kevin Millikin wrote a draft specification for the basic building blocks of the type system: Python Static Types I: Subtyping - Google Docs. Should we adopt it or perhaps find a productive way to iterate on it as a community?

Since pytype enforcing soundness has come up fairly often, I collected a list of writing on the topic that I felt was interesting. Scroll to the end of the page here: Pytype’s Type System | pytype

Pytype currently opts not to enforce soundness, but the linked articles have arguments both for and against.

I guess I should be clearer on what amount of soundness and consistency in the type system I value, since soundness can be seen as somewhat of a spectrum; maybe that helps for finding some common ground. Things the type system has language for should be checked by things claiming to type check, and the system and the tools using it should be internally consistent.

The recent example with the variance of mutable fields of subclasses violated both of those conditions (not checking something described by the type system, and inconsistency with the handling of variance for list vs. Sequence).

In my view, the type system doesn’t have to support all possible code. Even though that would be great as a long-term goal, it would require the type system to understand things that are dynamic in nature at runtime, and while we could describe the limits of what we expect of that dynamism to the type system, that seems far from being a realistic goal, at least at this point in time. So that’s not my goal here. My “ideal” is not that everything needs to be described and checked, but that everything that is described is checked.


I’d like to add one more concern to your already excellent list, @mikeshardmind: I recently started looking into how to leverage type annotations for runtime validation and conversion. I realize that there is a fair number of libraries out there, but many of them have limitations in expressivity or soundness I don’t particularly care for. Even though typing’s documentation is reasonably good at summarizing what we can annotate with its exports, there is no good documentation I could find on how to reason about type annotations in code. It would also seem that there are common tasks, such as binding type variables, that would benefit from a prominent library. Think the equivalent of the packaging library, but for type checking.

@mikeshardmind, thanks for starting this discussion. I appreciate the thought that went into your post.

Python is a “big tent” programming language. It’s used by a diverse set of developers with different interests, needs, and priorities. The same is true of the Python static type system. The concerns of a library maintainer will be somewhat different from the needs of someone implementing a production web server, and that’s going to differ from the needs of a data scientist writing ML code in a notebook, etc. I think the Python community does a pretty good job balancing the needs of its diverse users. Wherever possible, new capabilities are added in a way that provides value to all Python users. In cases where a capability is targeted at a particular subset of users, attempts are made to avoid negatively impacting others who don’t care about that new feature. In cases where there are conflicts between the needs and priorities of subsets of users, compromises are usually worked out. Those compromises require people to come together and understand each other’s viewpoints and make a concerted effort to work out a solution that provides the best possible outcome for everyone. In other words, it happens through discussions like this one. So once again, thanks for starting this thread.

You said that “no mainstream Python type checker has a goal of overall type safety”. First, I think that the notions of “type safety” and “type soundness" are more ambiguous than you imply (more on this later). Second, I think that type safety is a goal of all of the mainstream Python type checkers. However, it’s not the only goal or the overarching goal. Today’s mainstream type checkers attempt to balance the goal of type safety against other goals like usability and simplicity. I will go so far as to say that a Python type checker that fails to strike this balance will never become “mainstream".

One way that type checkers accommodate different sets of users and needs is to support different modes. Let’s keep that in mind because it could serve as a potential solution as we continue this discussion.

You mentioned that there’s no good way to learn how to handle typing in Python. Can you elaborate on that thought? Are you primarily thinking of Python developers who are just getting started with static typing? Or are you thinking more about users who already know the basics about type annotations but want to use more complex features like generics, overloads, protocols, ParamSpecs, etc.?

You’ve suggested that an official reference document might be useful. When you say “reference document”, are you thinking a formal spec? I see value in creating a formal spec and I’m supportive of that effort, but I don’t think that most Python developers would find a formal spec very useful in learning the type system. Good user documentation and tutorials would better serve that need. There have been attempts to improve the user documentation for Python static typing, but progress has been slow due to limited time from volunteers. And writing good documentation is not easy.

I answer dozens of questions every week about static typing, so I have a pretty good sense for what’s tripping up users with the Python type system. The most common area of confusion, by far, is the notion of immutability, especially when it comes to the ubiquitous list and dict classes. Many users who are new to static typing are stymied when the type checker tells them that they cannot assign a “list[str]” to a “list[str | None]”. Interestingly, TypeScript avoids this issue because they chose to make arrays and maps covariant. This opens up a small hole in the type system, but it makes TypeScript much easier to learn than Python typing. That’s a good example of where the goal of “easy to learn” and “type safety” were in conflict. The TypeScript team favored “easy to learn” over “type safety” in this case, whereas the early designers of the Python type system went the other way.

The second-most-common area of confusion I see is related to type variable scoping and variance. This was the motivation behind PEP 695. Here we have an example of the typing community coming together to devise a solution that eliminates a barrier to using the type system.

You mentioned that different behaviors between type checkers make it harder to learn the type system. Most users tend to run only one type checker on their code, so this perhaps isn’t as big an issue as you think. All Python type checkers tend to have the same behavior for basic type checks (e.g. determining whether a value with a particular type is assignable to a target variable or parameter with a declared type). Differences in behavior tend to be in areas that are more advanced and more subtle, for example in type inference behaviors or in the heuristics used to determine the best solution for a type variable when calling a generic function.

I’ve written a document that explains the main behavioral differences between pyright and mypy and the reasons for these differences. This will give you a sense for some of the many decisions that type checker authors need to make and the design goals that lead to certain decisions over others.

I mentioned above that “type safety” and “type soundness” are probably not as black and white as you think. There are some aspects of the type system that are more clear-cut, such as “is type A consistent with type B?”. But there are thousands of other cases where the answer is much less clear. Spec’ing all of these behaviors is probably not viable.

Here are some examples. Let’s not debate any of these here. I want to just give you a sense for what I mean.

  • Should LSP rules be enforced for __init__ methods?
  • If you have a tuple with indeterminate length (x: tuple[int, …]), should an unguarded index access (x[0]) generate an error?
  • If a class defines a property, should a subclass be allowed to override it with a simple variable or vice versa?
  • Should a type checker report type violations within a try block?
  • Should it be considered an error if an __init__ method fails to call super().__init__() in a way that allows the class to be used with multiple inheritance?
  • If a base class doesn’t declare the type of a class-scoped variable but a subclass does, should override type checks apply?
  • If a variable is assigned only within a for loop, should that variable be flagged as potentially unassigned if the length of the iterable is not statically known?
  • If a class definition uses a dynamic value (a variable) for one of its base classes, should that be flagged as an error?
  • If a class implements a custom __instancecheck__, is it safe to perform isinstance type narrowing for that class?
  • If a value is assigned to an undeclared attribute of a function, should that be flagged as a type error?
  • Is it safe to assume, for type narrowing purposes, that a value assigned via a property’s setter will be returned by its getter? Under what circumstances is it not?
  • If a call expression uses an unpack operator and the length of the iterable is unknown but the callable has a fixed number of parameters, should that generate an error?
  • Should conflicts between __slots__ and class variables be reported by a type checker?
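To make just one of these concrete (not to argue either side), the indeterminate-length tuple case looks like this:

```python
def first(x: tuple[int, ...]) -> int:
    # Statically plausible, but nothing guarantees the tuple is non-empty.
    return x[0]

ok = first((1, 2, 3))        # works

empty: tuple[int, ...] = ()
try:
    first(empty)             # accepted by type checkers in most configurations...
    failed = False
except IndexError:
    failed = True            # ...but fails at runtime
```

Flagging every unguarded index would be sound but noisy; allowing it is quiet but leaves this hole. That tradeoff is what each of the bullet points above forces a type checker to decide.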

Answering many of these questions comes down to some combination of:

  • What principles have been established that cover this case?
  • What precedent have other type checkers established for this case?
  • Given what’s known about the target user for this type checker (or this mode), what behavior would be expected or desired?

Many of these decisions come down to the tradeoff between false positives and false negatives. If you have a condition that is theoretically exploitable in an extremely uncommon coding pattern but is provably safe in all other cases, should that be flagged as an error? Put numerically, if a condition is 99.9999% likely to be fine and only 0.0001% likely to represent an actual bug, would you want to see all of those cases flagged as an error in your code? Most developers would not, and they’d be annoyed if they needed to work around this error every time. What if the condition is 99% likely to be OK? 95%? 50%? Different developers have different thresholds. And most will modify their answer for production code versus non-production code.

You suggested that we should codify the purpose of the type system. I think that’s worth discussing, but coming up with a single answer may be difficult. As I said above, the Python ecosystem is diverse, and different subsets of users have different needs and priorities. It may be necessary to answer the question differently based on which group of users we’re talking about. Here’s a proposed taxonomy: “library authors and maintainers”, “developers who do not use type annotations in their code or use them sparingly”, and “developers who use type annotations throughout their code”. I’m open to other suggestions, but this is roughly the taxonomy I use when thinking about features and modes in pyright.

If we’re going to attempt to codify the purpose of the type system, I suggest that we do so in terms of user value. An answer like “the purpose of a type system is to allow a type checker to enforce type safety” is hard to evaluate because it invites the questions “what exactly does that mean?” and “why should a user care?”. If we frame the answer in terms of user value, I think it will help us find common ground.

You mentioned autocompletion and IDEs. Let me try to clear up some apparent misperceptions. Tools like pylance (the default language server for VS Code, which is built on pyright), MPLS (which predates pylance), Jedi (another language server which is used in VS Code and other editors), and PyCharm all use multiple sources of information to implement completion suggestions, signature help, semantic coloring, semantic rename, semantic search, refactoring tools, etc. This information typically includes some combination of library introspection, doc strings, static code analysis (inference), and type annotations. Of these, type annotations provide the best experience by far. Doc strings alone provide a very poor experience because they are often missing or incorrect, there is no standard format for doc strings, class and instance variables don’t have doc strings, doc strings can’t be accessed statically if the library is not shipped as Python source, and doc strings cannot express generic types or overloaded signatures. The developer experience gets demonstrably better if a library includes type information for its public interface. Note that this doesn’t require the library author to provide type annotations for their internal implementation, just the portion of the library that comprises the public interface.

If you’re a developer who is not interested in language server features but you are interested in static type checking, then I think you’ll agree that a library with type information provides significant value over a library with no type information. If your code imports from untyped libraries and your type checker treats all of these symbols as Any, you are going to miss a lot of potential bugs. It’s a huge boon to users of static type checking when library maintainers add type information.

I’ve interacted with many library maintainers who are in the process of (or are considering) adding type annotations. Attitudes toward static typing vary greatly across this group. Some are resistant to adding type information. Others embrace it readily and see advantages for both themselves and the consumers of their libraries. Most are somewhere in the middle. They see some benefits, but they’re waiting for the tooling to mature, and they’re taking cues from their consumers about the value of type annotations relative to other features they could add to the library.

You have mentioned that you would like Python type checking to be stricter than it is today. You’re not alone in this view, but I think it’s safe to say that you’re also not representative of most users of Python type checkers today.

I think it would be useful for this discussion if we could develop a better understanding of your motivations and needs. To the extent that you’re able to share details, how do you and your employer use Python? What requirements or needs are driving your desire for more strictness? If Python type checkers provided more strictness, what value would it bring to you and your employer?

You’ve alluded to some “red tape”, so I’m guessing that’s referring to compliance requirements of some sort? Are you able to provide any more details? Do you see value here beyond compliance, or is that the primary motivation?

Are you a library maintainer? You’ve mentioned “social pressure” to add type information to libraries. Is this something you’ve personally felt?

Developing a better understanding of your needs, priorities, and motivations could help us come together and find a solution.


@erictraut thanks for the detailed reply. I’ll do my best to address specific points as I can, but if I don’t respond to something you’d like more info on, please point it out; I’d rather discuss as much as possible towards common ground on this. This is going to be a very long response, and if anyone skims it, I’m not going to hold it against them.

To be clear on this point in particular, I don’t think the existing documentation is enough for people at multiple stages of the learning process, but it’s worse at the very beginning (think of a new typing user not understanding why Sequence[str] allows str until they are bitten by it, right after they changed it because list[str] wasn’t quite right, or users first trying to figure out very simple “what goes in goes out” TypeVars), and also near the edges of what either isn’t supported or is barely supported (consider complex cases with overloads, TypeVars, and generics all in play, or users who find themselves wanting HKTs). Overloads in particular are made more complex in Python than in some other languages because they are allowed to overlap and their ordering matters.
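For readers who haven’t hit that first trap: a str is itself a Sequence[str] (each of its elements is a length-1 str), so switching a parameter from list[str] to Sequence[str] silently starts accepting single strings. A minimal sketch:

```python
from collections.abc import Sequence

def shout(items: Sequence[str]) -> list[str]:
    return [item.upper() for item in items]

result_list = shout(["ab", "cd"])  # the intended use
result_str = shout("abc")          # also type-checks: iterates characters
```

Both calls pass a type checker; only the first is what a newcomer usually meant, and nothing in the documentation for Sequence warns them at the point where they make the switch.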

In the “middle”, where most people end up if they stick with typing, I think the documentation and tooling serve reasonably well, though there are still some ugly cases: variance, which even those who work with it make mistakes reasoning about at times (guilty myself here), as well as generics with multiple parameters, whose names are not useful enough on their own, and simply remembering what each parameter actually corresponds to (anyone who has had to type a Coroutine and/or Generator where Awaitable and/or Iterable were not enough has probably run into this one).
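The Generator case is a good example of the parameter-ordering pain: it takes three parameters, Generator[YieldType, SendType, ReturnType], which is easy to misremember when Iterable[int] alone won’t do. A small sketch:

```python
from collections.abc import Generator

def countdown(n: int) -> Generator[int, None, str]:
    # yields ints, accepts nothing via send(), returns a str
    while n > 0:
        yield n
        n -= 1
    return "done"

values = list(countdown(3))  # the str return value is discarded by list()
```

Iterable[int] would suffice only if the send and return channels were unused; the moment either matters, the full three-parameter form (and its ordering) has to be remembered.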

I’ve been in the position of wanting to support multiple type checkers, with the differences making that a somewhat unrealistic goal in the past. I have not fully reevaluated that recently, but your document still matches one of the larger issues, inference, and why I settled on pyright when it became infeasible to support multiple (not impossible, but more effort than was reasonable; at the time it would have required either restructuring code or adding assertions for mypy to get certain parts correct).

I agree that both are a spectrum. I place a higher value at the correctness end of it (for reasons I’ll elaborate on somewhat below), but I’m also not rushing to go use Typed Racket, and while Elm is nice, I wouldn’t pick it for most problems.

I think it’s better to make a correct system easier to use than to make compromises in correctness because it is easier. I think getting formal correctness correct on the type system side is hard. We can make it easier on ourselves if we stick to what is correct and provable, and conversely, it is made harder when deviating on a case-by-case basis, as it removes or makes it significantly more difficult to rigorously show that a system remains self-consistent as exceptions are layered on.

When it comes to the end-user side of the equation, there’s a catchy phrase in a related but different domain: “Security should be idiomatic”… I feel much the same about correctness. Rather than compromising at a system level on what is actually correct because we think it may be hard on users, we should be finding ways to make it easy for users to do the correct thing.

I don’t think that getting there (process and implementation-wise) is going to necessarily be easy, but I think it’s the best long-term goal when considering all possible benefits and a user-first perspective.

I think this makes the case for a level above “strict” as current type checkers use the term, but not the case that it should go unchecked. However, I understand that maintaining such a set of rules costs time. Part of why I think a full formal specification rooted in theory would be of more value here is that we could build on what already exists for verifying types in a system for which a machine proof could be made. We would then have a type system that, at its ideal, could check everything it describes, while various kinds of violations could have configurable levels based on views on pragmatic errors, how code interacts with other code (public exports), and tolerance for false positives/negatives. As for the specific examples, while I have no desire to debate them at this time, I have provided my perspective on them below for the sake of comparing perspectives.

Agreed full stop. While I think that the primary purpose of a type system should be to provide the capability to analyze the type information and that one of the strongest reasons for this is to detect unsound code from a structural standpoint, answering it in terms of user value should be important here.

I want “a self-consistent type system in which everything it can describe is checked”.
From this, I want a level of guarantee about program structure, and especially about library stability for those who would extend and use my code. If I write something that is rigorously well-typed, my users should be able to safely extend and build upon it, but this isn’t fully the case right now. For instance, the use of type[T] is somewhat unsound if relying only on type analysis for reassurance, when this is something that should absolutely be checkable at the type system level. (__init__ and a few other issues, one of which has recently been resolved in some, but not all, of the typing ecosystem)

I can provide some additional clarity here, but if I’ve left something out, or something seems to not quite add up, it’s likely that avoiding a detail I cannot disclose has introduced some vagueness that I did not spot while editing this. It’s very easy to have blind spots about what “should be clear” when removing information, and it’s unlikely I will be able to clarify further.

The current state of affairs has me looking (on a 2-3 year window) either to move 1-2 critical projects from Python to another language, after evaluating our available dev resources and the languages we have enough developers to support, or to spend some of those resources improving our tooling to reach compliance while continuing to use Python.

This is in response to increasing demands on robust code that have led to those writing policy to want all projects with certain risk thresholds associated with them to conform to certain ideals. I think the level of “type system enforced safety” actually specified and that applies here (I can’t share the actual documents, but I can and have argued for things which would help here) is slightly more than we need, but still reasonably possible to provide, even in the “big tent” of Python development. I have personally argued for more than I would need under these policies as I do believe that a consistent underlying system could provide for multiple audiences.

While compliance is a motivation, it is not the sole or even primary motivator. I have options available to me when it comes to compliance, and while it appears that switching away from Python could become necessary, and I find that somewhat distasteful on a personal level, professionally I’m being paid to make the best decisions given a set of criteria, and will meet those as best I can with or without Python being a viable option.

I do see a value in the related ideals when it comes to making more robust code that is less subject to critical issues. I find this especially true when it comes to code that is internet-facing, which is increasingly the case as Python has gotten quite good at handling network-bound code with asyncio.

I can say that in a network-bound system, I’m weighing both Go and Elixir as alternatives at this point in time, even with Elixir’s type specifications being just as optional as Python’s. I’m also weighing the costs of continuing to use Python with a slight restructuring of the code and some internal rules that further restrict the allowed type specifications to what would be enough.

There’s another system that’s part of a data processing pipeline, but we need to handle malformed data as well as potentially maliciously crafted data (each by detection and rejection). This is one I personally think we had already layered adequate protection into at a logical level, but the powers that be said “no” in recent re-evaluations. Julia (again, with completely optional type specifications) looks like the best candidate if replacing here, as much of the code would be nearly the same and would gain some performance in the process, but I have not done enough exploration yet.

There are other various bits of code predating the updates to policy that were already not written in Python due to some other prior policies, and at least one where Python was actually specifically chosen because it could be so resilient for the given risk profile, so I’m aware of various tradeoffs and some varying ability to mitigate issues and layer on tooling to check for different kinds of issues.

I don’t maintain any widely used public Python code anymore. Any non-archived code I have available publicly I do maintain, but I get very little in the way of requests on these, and they are fully typed code bases that I’ve already put the effort into.

I maintain several Python libraries internal to my company and used to be a maintainer of a public, widely used framework. I experienced the pressure to add type information from two separate camps (those who wanted to start leveraging it for analysis, and those who wanted better IDE completions) around the time of Python 3.6, though I personally would have wanted to add the type information anyway for those same reasons. I have witnessed a lot of negativity towards typing, less so recently, but there have been resurgences of it, as well as people simply being reminded that it feels intrusive to them in some cases, such as in the conversation about typing.Doc.

Now, I’m going back to catch a few things that I did not think flowed well in the same order of response…

Largely this needs to be addressed with the tooling maintainers, not here, but...

Without quoting the whole section or getting too far into the weeds on this: there are some very clear negatives that have come from it as well. As these are not actionable here, I may later collect a list of the things I’ve seen with Pylance that I think have room for improvement, and take it to Pylance. This comment was included in the original post to show how things that build on typing can create a negative feeling around typing: tools can feel dumber when something has been replaced with the new thing that should improve the situation, but the behavior has gotten worse for how people actually use it (a mismatch in priorities, or at least outcomes, as one might have it).

The big one of note in Pylance is that “jump to definition” at some point switched to preferring the typeshed or another available stub over the actual definition, and there’s no corresponding “jump to implementation” to match that behavioral change. Pylance also does not appear to use docstrings or the AST at all, even when typing information does not exist, so while you say it typically uses a combination of information, that does not match my experience as a user. Perhaps there’s a configuration I’m missing, but the fact that it isn’t clear to me, and I’m quite invested in typing, doesn’t leave me much hope that the average user’s experience with this is better.

While not debating the merits or tradeoffs of these or fully explaining them, I’m going to give my ideal answer for each of your examples, because you are interested in knowing where my needs are and this is something I can do. I’ve left these in a collapsible section. If you’d like to discuss these further, I’m willing; there may or may not be a productive conversation possible on where we differ at this point in time, and if there isn’t, I hope there could be once we get further into the process. Either way, I think that for some of these to be checked, it also needs to be easier for users to see an error and intuitively understand what it is actually asking them to change.

Answers to your examples
  • Should LSP rules be enforced for __init__ methods?

Yes. If you need alternative construction, use a classmethod. If you know alternative construction may be needed in a subclass, use *args and **kwargs appropriately to support this in parent classes. In the “ideal” system, this would always be detected, warned at the strict level by default, and elevated to basic if the type is publicly exported, due to type[T].
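As a sketch of the classmethod approach (class and function names are hypothetical): alternative constructors keep every subclass’s __init__ signature uniform, which is what makes calling through type[T] safe.

```python
from __future__ import annotations

class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    # Alternative construction lives in a classmethod, so __init__
    # keeps a single signature across the whole hierarchy.
    @classmethod
    def from_tuple(cls, pair: tuple[float, float]) -> Point:
        return cls(*pair)

def origin(factory: type[Point]) -> Point:
    # Only safe if every subclass honors __init__'s signature (LSP).
    return factory(0.0, 0.0)
```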

  • If you have a tuple with indeterminate length (x: tuple[int, …]), should an unguarded index access (x[0]) generate an error?

This would not be a type error, but could reasonably be warned about by leveraging the type information. It may fall under a deficiency in the typability of iterables that a later answer says more on.
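A guarded access in the spirit of this answer might look like the following (a hypothetical sketch):

```python
def first_or(xs: tuple[int, ...], default: int = 0) -> int:
    # The length guard makes the index access provably safe; a
    # checker leveraging type information could require this form
    # before allowing xs[0] on an indeterminate-length tuple.
    if len(xs) >= 1:
        return xs[0]
    return default
```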

  • If a class defines a property, should a subclass be allowed to override it with a simple variable or vice versa?

No, the semantics of this aren’t guaranteed to hold in all uses of replacement. (consider uses of type[T] rather than only T for a given type T, not necessarily a typevar T)

  • Should a type checker report type violations within a try block?

Yes. For tests, there are more appropriate methods in testing frameworks, and the runtime case of receiving an unexpected type can be handled using typing.assert_never. That gives the ability to mark code the type checker should assume is unreachable while still handling it at runtime if misuse reaches it; for example, a deprecated and later-removed case can keep providing a meaningful error, because keeping the branch there has no runtime impact in the “good” case.

  • Should it be considered an error if an __init__ method fails to call super().__init__() in a way that allows the class to be used with multiple inheritance?

Yes, but this isn’t a type error and is likely a code smell for complex inheritance that should be simplified or better structured.

  • If a base class doesn’t declare the type of a class-scoped variable but a subclass does, should override type checks apply?

Yes. Class vars should be treated with more scrutiny than most things. Their mutability and nature can very easily lead to bugs or inadvertent behavior.

  • If a variable is assigned only within a for loop, should that variable be flagged as potentially unassigned if the length of the iterable is not statically known?

Yes, and I’d say so even under “basic” type-checking rules. Unbound locals due to loops can go undetected for a while if 99.9% of the time you handle some amount of data, and then one day you get an empty job for whatever reason.
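The empty-input failure mode looks like this at runtime (a minimal sketch with hypothetical names):

```python
def last_id(jobs: list[dict[str, int]]) -> int:
    for job in jobs:
        job_id = job["id"]
    # If jobs is empty, job_id was never bound and this line raises
    # UnboundLocalError -- the rare empty-job case described above.
    return job_id
```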

  • If a class definition uses a dynamic value (a variable) for one of its base classes, should that be flagged as an error?

Under the current type system? Yes. I believe it would be possible to improve the type system so that some level of knowledge could still be properly retained in this case, but given the decision to allow subclassing Any, this should currently require an explicit cast to Any or to a specific type/protocol to be valid.

  • If a class implements an __isinstancehook__, is it safe to perform isinstance type narrowing for that class?

No, but this is an explicit escape hatch, so given that, tooling should defer to the escape hatch being used. Tooling might still provide a configuration to disallow it, to prevent users being caught unaware by a library that uses it.

  • If a value is assigned to an undeclared attribute of a function, should that be flagged as a type error?

Currently, yes, unless the type is actually a protocol that implements __call__ and that attribute. If intersections are accepted, it should become possible to type these expectations more cleanly, but it should still require the user to silence the dynamic behavior where first used. (For the record, that specific protocol solution exists in real-world, public code.)
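The protocol shape alluded to here can be sketched as follows (names are hypothetical; the real-world code referenced is not reproduced):

```python
from typing import Protocol

class CachedLookup(Protocol):
    cache: dict[str, str]

    def __call__(self, key: str) -> str: ...

def make_lookup() -> CachedLookup:
    def lookup(key: str) -> str:
        return lookup.cache.get(key, "<missing>")

    # The dynamic attribute assignment is where a checker asks for an
    # explicit silence; the Protocol above documents the intended shape.
    lookup.cache = {}
    return lookup  # type: ignore[return-value]
```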

  • Is it safe to assume, for type narrowing purposes, that a value assigned via a property’s setter will be returned by its getter? Under what circumstances is it not?

Sometimes. It isn’t safe in the presence of a distinct getter, in which case the getter is what should be used, but I think it could be reasonable to restrict the setter and getter of the same property to the same value types as a complexity tradeoff on the side of tooling.

  • If a call expression uses an unpack operator and the length of the iterable is unknown but the callable has a fixed number of parameters, should that generate an error?

This is the only one here I am of two minds on, and it’s largely due to a deficiency in the typing of iterables: tuple is the only iterable that is allowed to be heterogeneous and sized, with the type information preserved on a per-element basis. I think it would currently be reasonable to warn for this, but at either strict or some hypothetical “pedantic” level above strict.

  • Should conflicts between __slots__ and class variables be reported by a type checker?



Not wanting to derail this discussion, but can you explain where the “small hole” lies here?

If we consider TypeScript’s behavior, translated into Python’s terms: because there could exist other references to that list, and lists are mutable, the thing expecting list[str | None] could append None to something known elsewhere as a list[str]. Now there’s something known as list[str] in another context, but with a None at the end. Python type checkers avoid this one, and I’m glad for it.


Except that for those of us who use Visual Studio Code with the Pylance language server and mypy as a separate tool (like me, for most projects), there really are two type checkers providing feedback. As you describe further down in your thoughtful answer, Pyright’s output is surfaced in a very different way than mypy’s. But there are two of them, and I have seen them diverge before (prompting me to learn about Pylance and Pyright).


“Who is this for?”


  • Typing annotations provide the most value to long-lived Python programs that are expected to become large and/or be maintained by many individuals.
  • Type annotations prevent some of the most common classes of programmatic errors:
    • Misspelled/broken references to functions/methods and variables.
    • Failure to handle potential Nones.
    • Missing imports (especially of exceptions).
  • Type annotations provide reliable machine-verified documentation that cannot get out of date.

Long-term direction for typing


  • The type system should be extended over time to describe & characterize common patterns used in real-world Python code.
    • That is, I propose the same kind of direction that TypeScript appears to have in characterizing patterns seen in real-world JavaScript code, even if some of those patterns are messy.

Yes. There is a longstanding issue to write typechecker-agnostic HOWTO documentation.

As far as I can tell, there hasn’t been enough energy for anyone to step up and do this yet. I know I myself have not yet found enough bandwidth to write the section on TypedDict that I promised a while back. (I have not forgotten.)

Indeed I expect disagreement here: I do not personally think that a lack of (perfect) soundness disqualifies a type checker as being useful, as I have argued before:

Also, as Eric mentioned, the definition of “sound” is not exactly black & white.


I’ve been watching this and similar threads scroll by, not really able to engage with the discussions in any meaningful way, but still finding my concern growing with each new thread. I will add my 2¢ from the peanut gallery though.

As someone who spent most of his career (now retired) writing Python code which was only dynamically, but strongly, typed, I view type annotations with some distaste. I’ve experimented with them in small ways just to get a feel for the basics, but in the end, have turned away from them. They definitely feel grafted on to me. When I read some questions people post about them I think to myself, “What in the heck are they trying to do? Is this complexity really necessary?” As a recent reply in this thread indicated, type annotations will be most beneficial on large, long-lived, multi-person projects. I suspect, though, that there are plenty of instances where they buy the programmer little or nothing that wasn’t already present in less concrete ways.

My understanding was that type annotations were always supposed to be optional. Technically, I suppose they still are. However, as tooling shifts to demand more and more metadata be specified through annotations, and open source projects require type annotations in all submissions, I fear they have become optional in name only. I think at some point a fence will need to be built around type annotations to prevent them from infecting the entire ecosystem.

Fredrik Lundh (RIP) once commented that for him, version 1.5.2 was the perfect version of the language, and everything which came after was just so much extra baggage. Maybe he was kidding, but I kind of doubt it. I’m sure I’m misremembering his statement to some degree, but I still think the general sentiment holds. Python used to be this almost tiny language a person could learn in a couple hours from the tutorial, then hold in their head without any further effort or reference, the perfect first programming language. The complexity was in the rich set of libraries, which could be absorbed piecemeal as one’s needs evolved. Such seems to be no longer the case.

Maybe, like JavaScript, someone will eventually feel obligated to write a book entitled Python: The Good Parts. If so, I hope they leave out static type annotations.


I just want to mention another use case here, important to me personally. I design programs starting with types. Whenever possible I use a language’s type system to model the problem domain first. This explores the entities involved, attributes, relationships, cardinality and so on. I’ve found this to be a very productive way of designing programs of any size, as long as the type system is sufficiently expressive. Although Python doesn’t have algebraic sum types, I find I can get a long way through a design with dataclasses, enums, unions, and the occasional shallow class hierarchy.

After I’ve created some data model, functions and methods define the interactions and transformations between types. So for me, in addition to all the other use cases mentioned, type annotations serve as a design system.

It’s just my personal preference, and perhaps not that common of a use case, but I feel I don’t see this mentioned much in discussions about the purpose of type annotations or specific typing features.