How does MonkeyType affect performance and stability?

I’m working on a Python project that’s around 70,000 SLoC and was only recently ported from Python 2.7 to Python 3.10.

We really need type annotations, and we’d love to use MonkeyType (or something like it?) to get there. We’re considering using MonkeyType in production for this.

My questions are:

  • I saw a small example program for which running the code under MonkeyType was 23.4 times slower than running the code normally. I’m guessing that’s an uncommonly slow example, but is it? What have people seen as a typical performance penalty for running code under MonkeyType?
  • Does MonkeyType add any instability? Is it safe enough for a production environment?

We have the option of throwing more machines at production temporarily to deal with slowness, but an estimate of how much more hardware we’d need would be valuable. I suspect the best way to get that estimate is to run MonkeyType in our Staging environment with a real-world-like workload first, but people’s experiences with MonkeyType’s performance overhead would still be informative.

Thanks!

I have to admit I’m confused by the concern here. Type stubs and annotations are fundamentally development tools (and/or serve a documentation purpose; I use them this way myself, without bothering with MyPy etc.). From what I can tell in the documentation, MonkeyType is intended to be used once on the code, to generate type stubs. It wouldn’t make sense to use it in production, just as it wouldn’t make sense to use a profiler or debugger in production.

Simply having type stubs and annotations in the code does not meaningfully impact performance, because Python doesn’t care about them - they just set up some metadata that third-party tools can use for static type checking. (And if it did somehow impact performance, then it wouldn’t matter how the annotations or stubs were created.)

I can certainly understand from the documentation how running code “under” MonkeyType would be slow (it seems to be doing a trial run of the code, with heavy instrumentation, in order to deduce what types are actually used by the code in practice), but that’s the exceptional case for running the code. Once you have the type stubs, you have them.

One needs to put some load on the code for MonkeyType to populate its SQLite trace database. Generally, using your test suite as the load doesn’t work great, because it’s full of mocks.

I could have been clearer that we see using MonkeyType in production as a (hopefully) one-time deal.

I may be able to make the case that we should just do it in our Staging environment instead, especially if MonkeyType is particularly slow or less than sturdy.

It’s also worth considering how rigorously you want to apply annotations, and/or what specific problems you are hoping to solve by having them. For example, a quick check of the Tensorflow codebase suggests they only use annotations sporadically.

And perhaps most importantly: what are the alternatives you are considering? The worst case is a manual update, I suppose; are there any competing tools in the running?

I’d love it if we could eventually run:
python3 -m mypy --disallow-untyped-calls --ignore-missing-imports ${py_files}
I did that on some smaller projects, and found it was pretty nice.

I looked around for alternatives to MonkeyType, but only saw one, which is purportedly more about Python 2: PyAnnotate. It sounds like PyAnnotate has since made the jump to Python 3, but it’s still marked as alpha?

I’ve also looked a little at alternatives to mypy, but came back to mypy: 4 Python type checkers to keep your code clean | InfoWorld

Manually updating a large project with type annotations could take years to really pay off.

I realize what MonkeyType produces is more of a first draft than a complete set of annotations - especially if you don’t give it a real-world workload while it runs.

I’m looking at WSGI-MonkeyType · PyPI to deal with the fact that we have a distributed system spread across multiple virtual machines, with multiple Python interpreters within each VM.
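I don’t know yet exactly what that will look like, but the rough idea I have in mind is wrapping the WSGI app so requests run under monkeytype.trace(), and pointing each VM/worker at its own trace database via the MT_DB_PATH environment variable so the SQLite files don’t conflict. A minimal sketch of that idea (the middleware below is my own assumption, not what WSGI-MonkeyType actually does):

import os
import monkeytype

# Assumption: give each worker process its own trace database so concurrent
# writers don't fight over one SQLite file; MT_DB_PATH is where MonkeyType's
# default store looks for its database.
os.environ.setdefault("MT_DB_PATH", f"/var/tmp/monkeytype-{os.getpid()}.sqlite3")


def trace_middleware(app):
    """Wrap a WSGI app so each request handler runs under MonkeyType tracing."""
    def wrapper(environ, start_response):
        # Note: anything the server does with the returned iterable happens
        # outside the traced block in this simple sketch.
        with monkeytype.trace():
            return app(environ, start_response)
    return wrapper


# Hypothetical usage with an existing WSGI application object:
# application = trace_middleware(application)

The per-process databases would then need to be collected and pointed at one by one (via MT_DB_PATH) when running monkeytype stub/apply, which is part of why a purpose-built package looks attractive.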

As someone who has taken a couple of code bases of similar or larger size from no annotations to fully annotated, more or less by myself, over the course of the past two or three years, I can understand why this task may seem daunting, and it certainly takes its time. So I understand the desire to leverage an automated tool like MonkeyType to take some of the initial load off. But I would like to nudge you towards doing more of the work yourself, since I’ve found it a very rewarding experience. [1]

I am not convinced the quality of those generated annotations is going to be particularly high, especially for things like generics. It will probably take a lot of work to review them until they no longer interfere with ergonomics, so I am not sure you are actually saving yourself a significant amount of work.

The advantage of typed Python is that it’s gradual, so you can introduce type annotations incrementally, starting from your core modules, and increase the type checker’s strictness for the modules you’ve already typed as you go. I’ve found this is also a really good opportunity to learn more about your code and spot design problems that previously went unnoticed. Starting from a codebase that’s already been fully annotated by automatic tooling, I think it’s actually more daunting and difficult to refactor the annotations into something sane. That being said, some amount of automated help to reduce the tedium can make sense. [2]

I am not convinced it makes sense to use something like MonkeyType in production in the hopes of getting slightly better results [3] when, in the end, you will have to go over everything anyway to make sure it’s actually ergonomic to work with going forward. Speaking from experience, bad type annotations are worse than no annotations.

On a side note: I would dissuade you from using --ignore-missing-imports globally. It’s fine to get started, but you’re better off explicitly enabling the option for the packages/modules where you’re missing type hints, because otherwise you will potentially hide some mistakes in typed modules.

Example pyproject.toml — mypy overrides section for ignoring missing imports
[[tool.mypy.overrides]]
module = [
    "alembic.*",
    "magic.*",
    "plaster.*",
    "pyramid.*",
    "pyramid_beaker.*",
    "pyramid_layout.*",
    "sqlalchemy.*",
    "transaction.*",
    "zope.sqlalchemy.*"
]
ignore_missing_imports = true

  1. Apart from some occasional annoyances that come up here and there, where the type system just isn’t expressive enough yet ↩︎

  2. Like on the order of magnitude of what stubgen gives you for generated .pyi files, i.e. it prefills some things like Incomplete | None for arguments that have a default of None and bool/int/str for literal defaults, but leaves everything else empty ↩︎

  3. if you already looked at what you got with your test suite and didn’t like it, I’m not sure you will be satisfied with the results from production use either ↩︎

I started using --ignore-missing-imports because of an out-of-tree dependency that didn’t have type annotations. Is there a better way of dealing with those?

If you want a straightforward, static tool that can automate much of the drudgery and give you a decent starting point for incremental typing without requiring dynamic execution or overwhelming you with a bunch of overly-specific, unhelpful types, you could consider @Jelle’s autotyping tool. It can infer a variety of common typing patterns, to a configurable level of conservatism or aggressiveness, and add these as annotations in your code. Additionally, it can integrate with pyanalyze for more in-depth type annotation and validation.


Hi! A colleague and I wrote MonkeyType for use at Instagram. I’m afraid I don’t have hard data for you on performance; I’d expect a significant hit when it’s enabled. Our approach at IG was to run it on a very small sampled percentage of requests. With a large amount of traffic this still provided very good coverage, without a noticeable overall capacity hit. Perhaps a similar approach could work for you. For a 70k LOC code base you probably don’t need to run it in production for very long to get a good initial data set.
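To make that concrete, the sampling idea can be as simple as the sketch below; the 1% rate and the handler wrapper are placeholders I made up, not anything MonkeyType ships with:

import random
import monkeytype

SAMPLE_RATE = 0.01  # trace roughly 1% of calls (placeholder value, tune to taste)


def maybe_traced(handler, *args, **kwargs):
    # Only pay the tracing overhead on a small random sample of requests;
    # with enough traffic this still yields broad coverage of real call types.
    if random.random() < SAMPLE_RATE:
        with monkeytype.trace():
            return handler(*args, **kwargs)
    return handler(*args, **kwargs)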

A much faster MonkeyType should be possible using PEP 669 monitoring. I won’t be able to tackle that project myself anytime soon, and it wouldn’t help you on Python 3.10.

Regarding stability, I can’t make any guarantees of course, but I’m not aware of any production errors caused by MonkeyType at IG (and we ran it for multiple years). MonkeyType is pretty careful to fail in ways that don’t break the program.

As mentioned already, MonkeyType should be viewed as a data collection tool, where that data is useful input to a human authoring annotations. It shines best in large code bases with large teams, where you may need to apply annotations to code that you aren’t already familiar with, and it could take very laborious and in-depth code research to reproduce the same actually-used-types information MonkeyType can provide. I won’t claim that that’s a great situation to be in, but if you are in it nonetheless, I can report from experience that MonkeyType can be a huge time saver.


Yes, I put an example for pyproject.toml into my post that can be expanded, but you can also achieve the same with mypy.ini if you prefer a separate config file for mypy. Here’s a link to the relevant section in the mypy docs: The mypy configuration file - mypy 1.7.1 documentation

This is also how you can gradually increase strictness for only the parts of your code base that have already been worked on.
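For example, building on the snippet above, you can add a second overrides block that turns the strictness up only for packages that are already annotated (the module names here are placeholders):

# Stricter checking only for parts of the code base that are already typed.
[[tool.mypy.overrides]]
module = [
    "myproject.core.*",
    "myproject.models.*",
]
disallow_untyped_defs = true
disallow_untyped_calls = true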

I used MonkeyType on a few codebases a couple of years ago to get a baseline set of annotations, by running the test suites under it.
I had to turn off coverage, since the two conflicted, but had no other issues.
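For reference, the workflow was roughly the standard MonkeyType command-line flow, something like the following (the module name is a placeholder):

# Collect traces by running the test suite under MonkeyType
monkeytype run -m pytest

# Preview the suggested annotations for one module
monkeytype stub myproject.somemodule

# Or write them into the source so they show up in a reviewable diff
monkeytype apply myproject.somemodule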

If you have good tests which you trust, this is a great approach for the transition. I loved the results I got with relatively little effort on a 10K SLOC codebase, for example.

But every file needs evaluation, and sometimes manual edits were needed. I had a few cases of large nested union structures that were better written as Any.
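As an illustration (a made-up function, not from any of those codebases), a traced signature might come out like the first definition below, where the second is what I actually want to maintain:

from typing import Any, Dict, List, Union

# Roughly what the generated annotation looked like:
def load_config(
    raw: Dict[str, Union[str, int, List[Union[str, int]], Dict[str, Union[str, int]]]]
) -> None: ...

# What I ended up writing by hand:
def load_config(raw: Dict[str, Any]) -> None: ...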


Although it’s slightly OT, I would recommend getting a type checker integrated into your testing process now, before adding annotations. Get it working on your untyped code, with whatever config and minor changes are needed.

That way, when you introduce annotations using manual or automated means, they’ll be checked and enforced out of the gate.

I’ve seen teams annotate code before using a type checker, and it uniformly goes badly.
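To make that concrete, a deliberately lenient baseline is enough to wire the checker into CI before any annotations exist; something like this in pyproject.toml (these particular settings are just a plausible starting point, not a recommendation from any specific tool):

[tool.mypy]
python_version = "3.10"
# Start permissive: check what can be checked, but don't demand annotations yet.
check_untyped_defs = true
warn_unused_ignores = true
warn_redundant_casts = true

Strictness flags like disallow_untyped_defs can then be switched on per module as annotations land, as described earlier in the thread.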


One needs to put some load on the code for MonkeyType to populate its SQLite trace database. Generally, using your test suite as the load doesn’t work great, because it’s full of mocks.

This is a bit of a yellow flag. Unit tests are great, and tend to use mocks, but if that’s all you have, then you aren’t really testing your code like it will work in production.

Many projects have some sort of integration test that actually runs the code very much like in production with no mocks at all, and in my experience, these tests regularly find issues that the unit tests don’t.

One possibility would be writing such an integration test first, which would have value all on its own, and then using that test to run MonkeyType or PyAnnotate and get “real” signatures without the mocks.


I would also caution against a blanket --ignore-missing-imports. This will surely come back to bite you later. If there are just a few places where that external dependency is imported, you could add a # type: ignore comment in the code. But you can also configure this in mypy.ini (or setup.cfg or pyproject.toml). See: Running mypy and managing imports - mypy 1.7.1 documentation.
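For example, a minimal mypy.ini version of the per-module approach looks like this (the package name is a placeholder), which keeps the rest of your code checked normally:

[mypy]
python_version = 3.10

[mypy-untyped_dependency.*]
ignore_missing_imports = True

The inline alternative is a comment on the import itself, e.g. import untyped_dependency  # type: ignore, but the config approach keeps the noise out of the code and is easier to clean up once upstream ships type hints.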