Typing Stability & Evolution

This is mainly a continuation of the comments in the TypeGuard PEP 724 discussion. A lot of the recent comments are less about that PEP and more about general expectations around typing stability, so I feel it is better to continue the discussion here and leave the original topic to the TypeGuard proposal.

Some of my core thoughts are covered here; this post extends a few of them and adds one proposal for handling the perception of typing stability.

I agree strongly with this specific quote. I think there are three core aspects of the type system today.

  1. Standard library typing features: This includes TypedDict, ParamSpec, Annotated, overload, and many more features defined in the Python standard library that have runtime impact. It also includes syntax changes like improved generic typing and union support. This is partly coupled to the Python language, although typing-extensions is a very helpful way to decouple some typing features.

I think standard library/runtime typing changes should have high stability expectations, and I agree with other comments that typing should follow stability promises similar to the core language's.

  2. Type system/checker behavior: This covers type specifications for how inference works across the many type checkers we have today: mypy, pyright, pyre, pytype, pyanalyze, and more. I consider the current TypeGuard PEP, as written, to fall entirely in category 2. There are no standard library/runtime changes involved, and each type checker may have its own policy and expectations for backwards compatibility. While mypy is a very important and useful tool, it is not the Python language. Similar to how other foundational libraries (numpy, django, flask, matplotlib) do not evolve through PEPs, type checkers should be able to evolve without PEPs and determine their own stability expectations.

  3. Types in libraries/stubs: Typeshed is a core part of this, along with a growing number of py.typed libraries. This also leads to typing instability, especially for libraries whose types are still relatively new or evolving frequently. Here I think individual libraries own both their runtime and their types, and they are not part of the typing council's or the language's stability expectations.

I think the distinction between 1 and 2 is partly blurred in typing evolution, because new typing features commonly jump to PEPs quickly. When a new feature like TypeGuard, ParamSpec, TypeVarTuple, or intersection types is discussed, the plan is often to write a PEP immediately after discussion and then implement it. The type checker implementation often comes months, or more than a year, after the PEP is accepted. For a complex feature like TypeVarTuple, how can we be confident in the specification and its inclusion in the standard library if users can't use the feature until long after the PEP has been accepted? And by bringing a proposal to the language as a PEP immediately, I think we imply higher stability and consensus on typing than is actually present.

Instead, my proposal is that moderate or large typing features (intersections, ParamSpec, and TypeVarTuple are all good examples) should be proposed and agreed on within the type checker/typing ecosystem, without standard library support or a PEP. The feature would still have a proposal document (a TEP, or Typing Enhancement Proposal) that the Typing Council could review. A TEP could be structured very similarly to a PEP, with the main difference being that TEPs should never change the standard library and are restricted to type checker/type system changes.

These features could then be implemented in typing_extensions.experimental (or another library) and go through an experimental phase. How long that phase lasts depends on the complexity of the feature and the time needed for it to be implemented and used. After users and type checkers have experience with the feature, the TEP can be converted to a PEP with adjustments based on feedback. At that stage the feature can move to the standard library and be viewed as more stable.

For a complex feature like intersections, I think it will be very hard to have a confident specification and understanding of its many interactions with the type system without experience using the feature. ParamSpec similarly has core behaviors (how do ParamSpec and methods interact?) that remain undecided but likely would have been discovered if usage had been possible before PEP acceptance. My core idea is to let typing features develop in clear separation from the standard library during an experimental phase; when a feature is ready for stronger stability expectations, it moves to the standard library. Users comfortable with typing features on the edge can explore them, similar to how features are often added to typing-extensions before a PEP is accepted. Other users can wait and adopt a reasonable policy of using only typing features that have reached the PEP stage and graduated.


This is the main point I disagree with here. The stdlib documentation explicitly says:

Using -> TypeGuard tells the static type checker that for a given function:

  1. The return value is a boolean.
  2. If the return value is True, the type of its argument is the type inside TypeGuard.

Maybe that should not have been included in the docs, but it was, and as a result I believe it’s now reasonable for people to assume that writing -> TypeGuard[int] is valid on any function that satisfies these two conditions.

I don’t honestly think that the new TypeGuard PEP is particularly important in the grand scheme of things. But I do think that the reliability of the Python documentation is important, and unfortunately that reliability is what the TypeGuard PEP puts at risk.

I think the key lesson here is to be more mindful of the implications and commitments involved in documenting something in the stdlib docs - whether it’s “type checker behaviour” or not. By all means choose to define something in a PEP and don’t put it in the stdlib docs, but be explicit that you’re doing that, and that the reason is that you want to allow for future changes in behaviour, or to allow type checkers room to experiment. That gives a much better understanding of what’s stable and what isn’t going forward.

Maybe there’s also a need to go back over things that have been documented historically (such as the behaviour of TypeGuard) and agree some way of retracting the implied commitment of having put them in the stdlib documentation. I don’t really have an opinion on that, beyond wanting whatever is done to be explicit, and not simply “let’s ignore the issue, no-one’s cared about it before”.

Absolutely! I don’t know why type checker behaviour is being defined via the PEP process. If the typing community thinks that is the wrong thing to do, then I see no problem with that. Of course, adding new Python syntax, or new stdlib features, is subject to the PEP process, so you’d need to work out how to split the two aspects.

+1 to this. If the need for stdlib/language support can be avoided, that makes things a lot easier. This may require a change in approach - experimental features may have to do something like from typing_extensions import Intersection (where typing_extensions is a 3rd party library residing on PyPI). But that’s something that could be managed.


I actually find myself in direct opposition to this. Typing features should exist to help users describe the types in their code, and they should be added, with or without preexisting implementation support, to the extent that they are justifiable additions for expressing code more accurately and ergonomically; implementations should then be expected to check what is described. Predicating typing features on specific implementations, and on those implementations agreeing on a meaning, misses the point that the language of the feature should be unambiguous and strictly defined to begin with (as the IETF has discovered over time with similar issues).

Any slowdown needed to make a feature acceptable for standard library inclusion, with all the breaking-change considerations that entails, should contribute to the careful design required for the language to be strict enough to provide stability while remaining well-defined enough to accurately describe the types in code.


(Meta: please write shorter sentences.)


This is not my experience. Pretty much all typing PEPs I can remember cite an implementation of the proposed feature in at least one type checker in the “Reference Implementation” section:

Having at least one reference implementation available in advance helps to identify corner cases that the PEP should discuss.

The PEP process seeks to review the design of a proposed feature that affects Python. Typing features of Python are Python features. I’m not sure what value is provided in introducing a parallel “TEP” process that is almost identical.

For particularly major/uncertain new typing features it makes sense to me to start with an implementation in the typing_extensions module, with an extended experimentation phase. But I do not think such a phase should be mandatory for all proposed typing features:

  • (Not)Required: The design and implementation were sufficiently simple that I do not think there would have been value in forcing it through an extra stabilization period in typing_extensions.

  • Intersection: I agree with @mdrissi that this upcoming feature looks pretty complicated. (It’s got an entire separate Git repo to hash out design issues!) Thus for this feature I do think it would make sense to have an initial implementation in typing_extensions to gather real-world feedback in advance.


My comment was in response to comments that seemed to me to suggest typing features shouldn’t follow the same process as other Python features, though.

Either typing features are Python, and should follow Python processes and rules (including backward compatibility), or they aren’t and they can be experimental and rapidly changing.

What I object to is claims that even though typing features are Python features, they don’t need to adhere to the policies that apply to Python features, because (something to do with being new, or experimental, or optional, or not affecting runtime behaviour).


I think that because mypy is the most common type checker, a mypy implementation is in effect required for a feature to get much usage in libraries. TypeVarTuple is a good example. Some of the major motivating use cases for TypeVarTuple are libraries like numpy, for typing array shapes. Years after the PEP, numpy still does not use it because it’s waiting for mypy support to be complete. Mypy support for TypeVarTuple is almost complete but still in progress.

Similarly, TypeVarTuple cannot be used in typeshed for the same reason. Typeshed is one place where a lot of early usage can happen.

For particularly major/uncertain new typing features it makes sense to me to start with an implementation in the typing_extensions module, with an extended experimentation phase.

I agree with this. The threshold for “major” will vary per person. I would agree NotRequired/Required are not major because the behavior already existed with total=, and NotRequired just made it more readable. The Annotated PEP is not major. The Final PEP is borderline.

I think there’s a lot of confusion in this and related threads. I’d like to take the opportunity to draw a distinction between a) stability of the type system, b) stability of the day-to-day experience of using a type checker.

These two are very different! PEP 724 is about the former, but what comes to mind for most people is the latter.

First off, we do not change semantics of special symbols. Such symbols are introduced by PEPs and their behaviour is usually standardised to the extent specified in the PEPs. PEP 724 is to my knowledge the only typing PEP that has proposed a backwards incompatible change of semantics of a symbol. Let’s not overfit to a single example.

If a backwards incompatible change were to be made, it would have to be a PEP, which this is. And if it were a PEP, the backwards incompatibility would have to be seriously considered, which it is. See
this message from Jelle — we’re unlikely to actually make the backwards incompatible change.

(I’ll concede the exact nature of backward incompatibility matters. If this changed behaviour at runtime no one would even have written the PEP)
(I’ll also readily concede that PEPs insufficiently specify behaviour and am working to change that, but in practice this matters little to most existing users. Note in the PEP 724 case, the relevant behaviour is specified in 647)

experimental features may have to do something like from typing_extensions import Intersection

You’ll be happy to know we already do this; in fact, type checkers even have their own extension libraries (see mypy_extensions and pyre_extensions). But sometimes this process doesn’t raise issues to the extent that they result in change, in which case you can end up having a discussion like PEP 724.

Now what does happen frequently is minor changes to type checker behaviour, as mentioned here. These are usually fixing of bugs, or allowing the type checker to understand just a little more of Python code.

Here is a representative example. I make a change that allows mypy to better understand code (specifically objects with __call__ that is overloaded with a generic self type). mypy_primer reports that it gets rid of one unnecessary error in 10 million lines of code. Or maybe more relevant to pip, typeshed will make a change to more accurately reflect that email.message.Message.get* can return None. Add up enough times, and you’ll have to move around a few # type: ignore when you change type checker versions.
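That typeshed change reflects observable runtime behavior: email.message.Message.get returns None for absent headers, so an accurate stub must say the return type includes None (the header names below are just illustrative):

```python
from email.message import Message

msg = Message()
msg["Subject"] = "hello"

present = msg.get("Subject")    # the header's value, a str
missing = msg.get("X-Missing")  # None: absent headers fall back to the
                                # default failobj, which is None
```

Code that previously passed the result straight to a str-only API now needs a None check, which is exactly the kind of small, accurate tightening that shifts `# type: ignore` comments around between releases.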

You can criticise mypy or typeshed for being too cavalier about these changes, or for being bad software or whatever. But confusing this kind of instability for type system instability is akin to saying “Various semantics of pip are governed by PEPs, therefore pip shouldn’t make user visible behaviour changes without following the same processes as Python”. This kind of criticism also better belongs on issue trackers.


This is well taken; it’s an excellent point and I’m definitely guilty of conflating the two.

Sometimes, a somewhat pernicious argument is made which conflates these two. The level of instability that we currently expect is sometimes cited as a reason for not caring at all about either kind of stability. That kind of comment tends to raise people’s hackles and results in unproductive discussions. I’m not sure how to best avoid these conflicts, but they are draining and unpleasant to read.

I also think there are different flavors of change we expect out of type checkers, all of which could be called “instability”, but only some of which are, in my experience, deserving of that name.

As a concrete example, until recently mypy would have required this ignore:

def foo(x: int | None = None) -> int:
    x = 1 if x is None else x
    def bar() -> int:
        return x+1  # type: ignore
    return bar()

This was changed – I would not even say “fixed” so much as “enhanced” – within the past year, as the feature “narrowing in outer scope is discarded in inner scope”. It’s a special case which allows for this usage even though the general case is unsafe (Common issues and solutions - mypy 1.7.1 documentation). So that ignore becomes unnecessary and would be flagged by --warn-unused-ignores.
Is mypy being “unstable” here? I think only under an extremely strict definition of stability which forbids most improvements or fixes.
(Aside: sorry if there’s some mistake in the example, please let’s not fixate on it too much. :slight_smile: )

Unnecessary ignores and redundant casts are almost in a different category from other changes. And these days, to my great delight, that seems to be the majority of the change that I see when a mypy release rolls out. The more I read, especially from the major threads in this nascent category, the more I think that’s because the maintainers are already prioritizing the kind of stability which is being asked for.

If I’m reading the tea leaves correctly, some of this may be a perception problem which will be healed with time.
However, the ongoing growth of typing features can also feed this negative perception. For example, I recall trying to use ParamSpec before mypy supported it, and it was an unpleasant experience: read a bunch of docs to make sure you understand it, try to use it, try to figure out why it’s not working, and then realize you’ve been wasting your time. It felt like I had been misled by the docs. Efforts to minimize the gap between new features and broad type checker support are therefore also worthwhile as part of the “stability story” for typing.