Static type annotations in CPython

At the Core Dev sprint in Bellevue we had a brief [1] discussion about static type annotations in CPython. I want to encapsulate what was discussed and see if the wider core dev community agrees before trying to commit some of this to the devguide.

When static type annotations became a thing, the clear decision was made that they should not, for the time being, go into the standard library. There’s a separate project, typeshed, that provides type annotations for the stdlib instead. This gives the evolving static typing space more flexibility to adjust annotations without waiting for Python releases (not to mention awkward backward compatibility concerns). The recommendation to type checkers and other tooling wanting type annotations has always been to use typeshed, not annotations that are part of the stdlib.

Where we are now, though, is that we have several bits and pieces of CPython that do use type annotations, and to good effect. Build-time tools like Argument Clinic use them. The test runner, regrtest and libregrtest, uses them. The private implementation of the new REPL in 3.13 uses them. A few third-party imports into the stdlib (like tomllib, importlib.resources and importlib.metadata) use them. Also, a few type annotations have snuck into other parts of the stdlib by accident (e.g. Lib/multiprocessing/connection.py in python/cpython). For the build tools that use them, as well as regrtest and PyREPL, we have CI set up so mypy tests them. I believe we don’t have it set up for importlib.resources and importlib.metadata, but those are concurrently maintained outside of our repo and tested there. tomllib had annotations when we pulled it into the stdlib, but the annotations are not currently checked by anything.

Consensus in the room (which included maintainers of typeshed and several people working on actual type checkers) seemed to be that there’s still not much to be gained from having type annotations in more of the standard library. The typeshed setup is working well, and avoids some difficult problems (e.g. type checkers supporting different syntax and features, and thus needing slightly different annotations in a forked/vendored copy of typeshed). Given that typeshed is supposed to override the stdlib, having the same annotations in the stdlib itself wouldn’t gain us anything, and having different annotations might be confusing to users.

Type annotations on the tooling we use are valuable, and we should keep them. If we pull modules into the stdlib with type annotations, we should keep them. We should take care to actually validate the type annotations, though. Right now we do that for some of the type-annotated code, with mypy. Whether we should keep using mypy longer term (or switch to another type checker), and what configuration to use for mypy, is a bit uncertain. Effectively, mypy wins by default for now, but we should be open to re-evaluating. There’s a small pain point when the type-checked code uses features of the stdlib that haven’t been reflected in typeshed yet, but according to the typeshed maintainers they’re happy to take PRs for unreleased features, even if it means potentially rolling them back.

The actual type checking is currently done by CI but not from the Makefile. We should probably make it easier to run the type checker of choice during development, with a suitable make target. We should also document this state in the devguide, along with instructions on how to add new things that should be tested with the type checker.
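As a rough sketch of what a local entry point could look like (something a make target might wrap), the script below builds one mypy invocation per configuration file. The list of config paths and the helper names `build_commands` and `run_all` are assumptions made for illustration, not a description of the actual CI setup; the only mypy interface relied on is `python -m mypy --config-file`.

```python
import subprocess
import sys

# Hypothetical list of per-tool mypy config files; adjust it to whatever
# the repository actually type-checks.
MYPY_CONFIGS = [
    "Tools/clinic/mypy.ini",
    "Lib/test/libregrtest/mypy.ini",
    "Lib/_pyrepl/mypy.ini",
]


def build_commands(configs, python=sys.executable):
    """Return one mypy command line (an argv list) per config file."""
    return [[python, "-m", "mypy", "--config-file", cfg] for cfg in configs]


def run_all(configs=MYPY_CONFIGS):
    """Run every check and return the number of checks that failed."""
    failures = 0
    for cmd in build_commands(configs):
        print("+", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures

# Typical use from a checkout, with mypy installed:
#     sys.exit(1 if run_all() else 0)
```

Keeping the list of configs in one place would also give a single spot to edit when a new directory should be type-checked.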

There’s also the question of how type annotations (in particular in the stdlib) should show up in the documentation. Sphinx has the ability to display them (which we don’t currently do), but especially for complex types it might not produce the most readable output. It is probably a good idea for well-annotated libraries with simple enough types, though.

Are there any other concerns people want to bring up? (Or, for people who were at the Core Dev sprint, did I miss anything we discussed?) Do people generally agree with the status quo, or are there strong arguments to change something (like getting rid of annotations where we currently have them, or having more annotations)?

The list of action items so far, based on the consensus in the room:

  1. Clean up the type checking we currently do, making it easier to run outside of CI and to add new things to type-check.
  2. Revisit type annotations outside of the things currently being type-checked (e.g. the type annotations that snuck into a few places). Maybe we should remove them, maybe we should actively type-check them, instead. Unchecked type annotations are a bad idea.
  3. Figure out whether we should include type annotations in generated docs, and how.
  4. Document all this in the devguide.
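For the second item, one possible way to find annotations that snuck in is to scan source files with the `ast` module. This is only a sketch: the helper name `annotated_functions` is made up, and it looks at function signatures only, not at annotated variables or class attributes.

```python
import ast


def annotated_functions(source):
    """Yield (lineno, name) for each function that carries any annotation."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        # Collect every kind of parameter into one list.
        params = args.posonlyargs + args.args + args.kwonlyargs
        if args.vararg:
            params.append(args.vararg)
        if args.kwarg:
            params.append(args.kwarg)
        has_param_hint = any(p.annotation is not None for p in params)
        if node.returns is not None or has_param_hint:
            yield node.lineno, node.name


SAMPLE = '''
def plain(a, b=1):
    return a

def hinted(a: int, *rest) -> str:
    return str(a)
'''

print(list(annotated_functions(SAMPLE)))  # [(5, 'hinted')]
```

Run over the directories that are not covered by a type checker, the resulting list would show which annotations need either removing or checking.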

Other thoughts on things we should improve around static type annotations in CPython?


  1. by some standards


I’m generally pretty happy with the status quo, but would like to remove unchecked annotations (the multiprocessing example you link isn’t even a valid type hint!)

I’d previously written a longer post (“Type annotations in the stdlib”, #7 by hauntsaninja) that might help provide some context to those who weren’t in the room for this consensus:

Consensus in the room […] seemed to be that there’s still not much to be gained from having type annotations in more of the standard library.


Memory usage and startup time.

PEP 563 and some optimizations I implemented for PEP 563 have achieved very tiny overhead. But PEP 563 is about to be deprecated.

Python is not only a language for servers and desktops with a lot of RAM. Python also runs on Raspberry Pi, WASM, and serverless containers. We definitely should care about startup time and RAM usage.
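For readers who haven’t followed PEP 563: with it enabled, annotations are stored as plain strings instead of being evaluated when the function is defined. The snippet below only illustrates that mechanism and does not measure memory or startup time. It turns the feature on through the `compile()` flag so it can run anywhere in a file; `scale` and `NotDefinedAnywhere` are made-up names.

```python
import __future__

# NotDefinedAnywhere is deliberately undefined: under PEP 563 the
# annotation is never evaluated, so defining the function still works.
SOURCE = "def scale(values: list[float], factor: NotDefinedAnywhere) -> None: ...\n"

namespace = {}
code = compile(
    SOURCE, "<example>", "exec",
    flags=__future__.annotations.compiler_flag,  # same effect as the future import
)
exec(code, namespace)

annotations = namespace["scale"].__annotations__
print(annotations)
```

Every value in the printed dictionary is a string such as `'list[float]'`, so no `list[float]` object is built at import time.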


Not a core dev, but as a typeshed maintainer I have some comments. I had some previous thoughts in this GitHub issue: “Thoughts about type hints in the standard library” (python/typeshed issue #5444), and in this thread, linked before by Shantanu: “Type annotations in the stdlib” (#22 by srittau).

Generally, I see two different goals for annotations in the standard library with different concerns: Type checking the standard library itself and providing type annotations for third party libraries and applications (replacing the stdlib part of typeshed). (Although there is a certain overlap, as often one standard library module can be considered a third party to another one.)

Personally, long term I would like to see both: a fully type-checked standard library that also provides type hints to third-party software.

Type checking the standard library

This is the more realistic short-term goal and can be implemented gradually or partially: Check all code that is already annotated, and require the checks to pass in CI. Allow annotations in new or changed code, but don’t require them.

The main concern is how exactly annotated code checks calls of unannotated functions/classes. Fall back on typeshed? Just assume Any?
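To make the question concrete, here is a small made-up example (`legacy_parse` and `count_fields` are not real stdlib functions). As far as I know, mypy by default treats the result of calling an unannotated function as `Any`, and its `--disallow-untyped-calls` option reports such calls from annotated code instead.

```python
def legacy_parse(text):
    # Unannotated, like most of the stdlib today.
    return text.split(",")


def count_fields(line: str) -> int:
    # The checker has no declared type for legacy_parse(), so "fields"
    # is effectively Any here and mistakes on it would go unnoticed.
    fields = legacy_parse(line)
    return len(fields)


print(count_fields("a,b,c"))  # 3
```

Whether the checker should instead look up `legacy_parse` in typeshed is exactly the open policy question.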

Regarding typeshed taking PRs for unreleased features, that’s actually the opposite of the policy we recently adopted:

We accept changes for future versions of Python after the first beta for that version was released.

That said, I’m sure we can amend the policy if this helps CPython development.

Definitely, existing annotations should be valid and should be checked!

Providing type annotations for third-party software

I agree that for the time being, typeshed seems to be the best solution. But long term – as both the type system and the annotations in typeshed mature, making changes less and less necessary, and more core developers gaining experience with typing – having the standard library annotations directly in CPython makes sense to me. This has several advantages:

  • Less effort overall required, as the overhead of opening and managing separate PRs to typeshed won’t be necessary.
  • Less chance of the type annotations diverging from the implementation.
  • Removing the need for the awkward Python version handling in both type checkers and typeshed.

Also, how should extension modules be handled? We’d either need a mixture of inline annotations for pure Python modules and type stub files for extension modules, or we could use type stub files for all types of modules consistently. The latter would have the advantage that the modules themselves don’t need to be type checked before this could be implemented.

But these are really only questions for the future that don’t need to be answered today.

Type annotation in documentation

I believe that using valid type annotation syntax in documentation makes sense and should be quite readable. But as you mention, it’s not always the best idea to use the actual type hints, which are written with type checking in mind.

One example: If I were to document the re.Match.groups method, I would annotate it like this:

Match.groups(default: str = None) -> tuple[str | None, ...]
Match.groups(default: bytes = None) -> tuple[bytes | None, ...]

This differs from the actual annotations in several ways, and is less accurate, but is much more readable than this:

    @overload
    def groups(self) -> tuple[AnyStr | Any, ...]: ...
    @overload
    def groups(self, default: _T) -> tuple[AnyStr | _T, ...]: ...

Why Any in the returned tuple? What is _T? Why the overload? What’s the meaning of AnyStr here? All of these questions have good answers that are irrelevant for documentation.
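For anyone curious about those answers, the runtime behaviour behind the overloads is easy to check. This is just an illustration using the real `re` module; the pattern is arbitrary.

```python
import re

# Group 2 is optional and does not participate in this match.
m = re.match(r"(a)(b)?", "a")

no_default = m.groups()      # ('a', None): unmatched groups become None
str_default = m.groups("-")  # ('a', '-')
int_default = m.groups(0)    # ('a', 0): the default can be any object

# With a bytes pattern the matched groups are bytes instead of str.
bytes_groups = re.match(rb"(a)(b)?", b"a").groups()  # (b'a', None)

print(no_default, str_default, int_default, bytes_groups)
```

The default being an arbitrary object is what `_T` expresses, and the str/bytes split is what `AnyStr` expresses.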

Another example would be a signature with many arguments, two of which are mutually exclusive. In type annotations, you’d use an overload to ensure that only one argument is used. But instead of duplicating the long signature, in documentation it’s much more readable to just add the sentence “x and y are mutually exclusive”.
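A minimal sketch of that overload pattern, with a made-up function `fetch` whose `timeout` and `deadline` keywords are mutually exclusive (the names and the dummy return value are purely illustrative):

```python
from typing import overload


# Overload 1: optionally a timeout, never a deadline.
@overload
def fetch(url: str, *, retries: int = ..., timeout: float = ...) -> bytes: ...
# Overload 2: a deadline, never a timeout.
@overload
def fetch(url: str, *, retries: int = ..., deadline: float) -> bytes: ...
def fetch(url, *, retries=3, timeout=None, deadline=None):
    # The overloads only help type checkers; enforce the rule at runtime too.
    if timeout is not None and deadline is not None:
        raise TypeError("timeout and deadline are mutually exclusive")
    return f"{url} retries={retries}".encode()


print(fetch("example", timeout=1.5))  # b'example retries=3'
```

Even with only three parameters the signature is written out three times, which is the duplication a single sentence in the docs avoids.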

The _typeshed pseudo-module

Another concern with type annotations in the standard library is the _typeshed pseudo-module, which is only available at type checking time. This adds many useful type aliases and protocols. When transferring types into the standard library, these types would also need to be copied somehow.
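This is how code outside typeshed typically uses those names today: the import is guarded by `typing.TYPE_CHECKING` and the annotation is written as a string, so nothing is looked up at runtime. `StrPath` is a real alias in `_typeshed`; the `basename` wrapper is just an example.

```python
import os
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers follow this import; the module is not
    # importable when the program actually runs.
    from _typeshed import StrPath


def basename(path: "StrPath") -> str:
    """Return the final path component of a str or os.PathLike path."""
    return os.path.basename(os.fspath(path))


print(basename("Lib/tomllib/_parser.py"))  # _parser.py
```

Inline annotations in the stdlib would need either this guard everywhere or real runtime definitions of the aliases and protocols.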


Based on my experience maintaining downstream code, I would also say that even if the decision is not to allow adding type hints in the stdlib, it is still probably necessary to run a type checker in CI. No policy declaration will stop people from adding hints, whether correct or incorrect: CI needs to enforce either that the hints are checked or that they are not added at all. Incorrect unchecked hints are technical debt that will eventually need to be paid by someone.

  • +1 to improve running outside of CI using make
  • Defer to folks like Jelle on whether expanding usage makes sense. Agree that having types and not checking is not ideal.
  • Adding types to the generated docs. I believe there are a few perspectives here:
    • correctness of reference, which is a positive for professional development
    • learners of Python, non-CS trained users, and rapid prototyping (science): I suspect part of Python’s popularity is that it is more visually clean and understandable than statically typed languages. For new users, reducing the cognitive load by ignoring types in functions has been an appealing way to onboard rapidly.
    • I think if we add type information throughout the docs (which I think we should), we should do it in a way that doesn’t alienate people who don’t find types necessary. In other words, it makes sense to do it right rather than pick the most expedient solution, and reducing visual clutter will be key.
  • Devguide guidance: +1

Thanks @thomas for capturing the discussion well.

P.S. I likely would have never tried Python if it weren’t for the fact that I could ignore types and have visually cleaner code. I would likely have stuck to C++/Java.


Our docs aren’t generated from Python source, so any annotations we wish to document would need manually adding to the RST rather than being autogenerated. An advantage of this is we can tailor and simplify them for the reader, especially as longer annotations can be complex and less readable.


There are two ways in which type annotations can “serve as documentation”, and I think it’s worth distinguishing between the two:

  1. Many projects use Sphinx to generate their user-facing API docs. Sphinx calls inspect.signature() on functions and this often means that any type hints that are part of the signature show up in the generated Sphinx docs. As @hugovk says, this isn’t something we currently do for CPython’s docs, and it’s pretty unlikely we’ll switch to this any time soon.
  2. Type hints can also serve as a kind of (more informal) developer-facing documentation even if they don’t show up in user-facing docs, however: they can help readers of the source code understand how a function or class is meant to be used, improving the clarity of the code.
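To illustrate the first point, this is what `inspect.signature()` returns for an annotated and an unannotated function; the `connect` functions are made up for the example.

```python
import inspect


def connect(host: str, port: int = 5432) -> bool:
    return True


def connect_untyped(host, port=5432):
    return True


typed_sig = str(inspect.signature(connect))
untyped_sig = str(inspect.signature(connect_untyped))

print(typed_sig)    # (host: str, port: int = 5432) -> bool
print(untyped_sig)  # (host, port=5432)
```

Any tool that renders the signature object as-is will therefore show the hints exactly as written in the source, whether or not they were written with readers in mind.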

Both user-facing and “developer-facing” docs are important and valuable, and it’s important that both remain accurate and up-to-date. When type hints are considered as a kind of documentation, they’re much more valuable when they’re verified and checked by a type checker, regardless of the documentation category we’re considering.
