Feedback on type checker error messages

Jelle · March 2, 2024, 1:32am

I was looking at improving some error messages in pyanalyze and thought it’d be useful to get some community feedback. I am not looking to standardize these error messages (different type checkers can make different choices and that’s fine), but I’ll take the feedback here into account for pyanalyze, and perhaps other type checkers will do the same.

I will write down a few basic programs with type checking errors, outputs from some type checkers, and a few of my thoughts, and I’d be interested to hear any input from others.

Incompatible return type

def f() -> str:
    return False

pyanalyze (as of #735; before it was much worse): Incompatible return type: expected str, got Literal[False] (code: incompatible_return_value)
pyright: Expression of type "Literal[False]" cannot be assigned to return type "str"
mypy: Incompatible return value type (got "bool", expected "str") [return-value]
pyre: Incompatible return type [7]: Expected `str` but got `bool`.

Thoughts:

pyright’s message feels too verbose and buries the fact that this is about a return type
pyanalyze and pyright say “Literal[False]” instead of “bool”. I think this is mostly driven by how their type inference works (definitely true for pyanalyze).
mypy puts the actual type first, pyre the expected type. I chose to follow pyre in the change I just made, but I’m not sure which is better.

Incompatible argument type

def f(x: str) -> str:
    return ""
f(1)

pyanalyze: Incompatible argument type for x: expected str but got Literal[1] (code: incompatible_argument)
pyright: Argument of type "Literal[1]" cannot be assigned to parameter "x" of type "str" in function "f"
mypy: Argument 1 to "f" has incompatible type "int"; expected "str" [arg-type]
pyre: Incompatible parameter type [6]: In call `f`, for 1st positional argument, expected `str` but got `int`.

Thoughts:

I feel the “assigned” language in the pyright message is a little confusing, since this isn’t an assignment, though I understand the technical background of that message.
pyanalyze and pyright choose to mention the name of the parameter, mypy and pyre mention the position of the argument. Either is probably defensible.

Incompatible local variable type

def f(x: str) -> None:
    y: int = x

pyanalyze: Incompatible assignment: expected int, got str (code: incompatible_assignment)
pyright: Expression of type "str" cannot be assigned to declared type "int"
mypy: Incompatible types in assignment (expression has type "str", variable has type "int") [assignment]
pyre: Incompatible variable type [9]: y is declared to have type `int` but is used as type `str`.

Thoughts:

pyanalyze wins on terseness here. That’s not necessarily a good thing, but I think the error is still clear
“used as” in the Pyre message feels wrong; we’re not “using” y.

mdrissi · March 2, 2024, 1:53am

On the second case: for a one-argument function the difference is minor, but for a function with five or more arguments I would prefer an error message that includes the argument name rather than its position. I normally try to use readable argument names, but it is very easy to forget which argument is third and which is fourth.

For the other two cases I have less of an opinion. One place where I find error messages interesting, and sometimes a challenge, is overloads. If a function has three overloads, do you give the reason for the incompatibility for all three, or only for the best one? And if you pick only one and it is different from the one the user was targeting, it’s easy to get confused.

Jelle · March 2, 2024, 2:10am

Overloads are an interesting case too. Here is an example:

Incompatible call to overloaded function

from typing import overload
@overload
def f(x: str) -> int: pass
@overload
def f(x: int) -> str: ...
@overload
def f(x: int, y: str) -> float: ...
def f(x: object, y: object = None) -> object: return None

f(1.0)

Pyright:

No overloads for "f" match the provided arguments  (reportCallIssue)
Argument of type "float" cannot be assigned to parameter "x" of type "int" in function "f"
  "float" is incompatible with "int"  (reportArgumentType)

Pyanalyze:

Cannot call overloaded function (code: incompatible_argument)
    In overload (x: str) -> int
      Incompatible argument type for x: expected str but got Literal[1.0]
          Cannot assign Literal[1.0] to str

    In overload (x: int) -> str
      Incompatible argument type for x: expected int but got Literal[1.0]
          Cannot assign Literal[1.0] to int

Mypy:

main.py:10: error: No overload variant of "f" matches argument type "float"  [call-overload]
main.py:10: note: Possible overload variants:
main.py:10: note:     def f(x: str) -> int
main.py:10: note:     def f(x: int) -> str
main.py:10: note:     def f(x: int, y: str) -> float

Pyre:

10:2: Incompatible parameter type [6]: In call `f`, for 1st positional argument, expected `str` but got `float`.

Thoughts:

Pyre and pyright both apparently pick just one overload to show data for. Unclear how it was picked.
Mypy and pyanalyze both show the signatures of non-matching overloads, but only pyanalyze explains why each overload doesn’t match
Pyanalyze first filters out overloads for which the argument count doesn’t match, which is why the third overload doesn’t show up. Is that a good idea? Not sure; at least it makes the error message a little shorter.
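The filter-then-explain behaviour described above can be sketched roughly as follows. This is a minimal illustration, not pyanalyze's actual implementation: the `Overload` class and `explain_call` function are hypothetical, only positional parameters are modelled, and `isinstance` on runtime values stands in for real type compatibility.

```python
from dataclasses import dataclass


@dataclass
class Overload:
    # Hypothetical, simplified model: required positional parameters only.
    params: list[tuple[str, type]]

    def signature(self) -> str:
        return "(" + ", ".join(f"{n}: {t.__name__}" for n, t in self.params) + ")"


def explain_call(overloads: list[Overload], args: list[object]) -> list[str]:
    """Explain why each overload with a matching argument count was rejected."""
    lines = ["Cannot call overloaded function"]
    for ov in overloads:
        if len(ov.params) != len(args):
            continue  # filtered out: argument count does not match
        for (name, expected), arg in zip(ov.params, args):
            # Assumption: isinstance stands in for a real assignability check.
            if not isinstance(arg, expected):
                lines.append(f"  In overload {ov.signature()}")
                lines.append(
                    f"    Incompatible argument type for {name}: "
                    f"expected {expected.__name__} but got {type(arg).__name__}"
                )
                break  # report only the first mismatch per overload
    return lines


overloads = [
    Overload([("x", str)]),
    Overload([("x", int)]),
    Overload([("x", int), ("y", str)]),
]
print("\n".join(explain_call(overloads, [1.0])))
```

Because the sketch looks at runtime values, it reports `float` rather than `Literal[1.0]`. The two-parameter overload never appears in the output, which is the trade-off in question: the message is shorter, but the reader is not told that a third overload exists.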

alicederyn · March 2, 2024, 11:12am

Not directly what you’re asking about, but I think this is my least favourite part of the pyanalyze message. It adds a lot of bulk to each message; otherwise, the messages feel to me like the most approachable of the options you’re showing in each case (great job by the way!). I get why it’s there, but it does make me want something shorter.

It seems to be doing it twice though, is this an area you’re going to be looking at?

mikeshardmind · March 2, 2024, 11:52am

I remember it being mentioned in the issue tracker for pyright at some point that pyright has a heuristic for picking the closest overload.

I definitely think you’ve done a great job making the error messages easy to understand. I think the verbosity of the overloads not matching is a little too much as a default; just telling me what types it inferred and what it saw as possible overloads would be enough in most cases, if you showed each overload in aligned output, similar to mypy.

I don’t think so? At least, I don’t think it would be clear to a user that only a subset of the overloads was being shown. Maybe you could get less verbose with some hybrid approach that shows why the “close” overloads were not matched, but also shows the other overloads that were skipped, or at least how many and why?

alicederyn · March 2, 2024, 1:05pm

I like the idea of showing

a single mismatch
total number of overloads
an option to enable verbose mode showing every mismatch

Could multiple overloads be combined? e.g. if three overloads differ only in the first parameter you could say something like “Incompatible argument type for x: expected str | int | datetime but got Literal[1.5]”. Having to wade through multiple overloads to get the same information would be harder.
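A minimal sketch of the combining idea, assuming the checker has already established that the candidate overloads differ only in one parameter. The function name `combined_mismatch` and its inputs (type names as plain strings) are hypothetical and chosen only for illustration.

```python
def combined_mismatch(param: str, expected_types: list[str], got: str) -> str:
    """Merge the expected types from several overloads into a single message."""
    # Deduplicate while preserving the order in which the overloads were declared.
    unique = list(dict.fromkeys(expected_types))
    return (
        f"Incompatible argument type for {param}: "
        f"expected {' | '.join(unique)} but got {got}"
    )


print(combined_mismatch("x", ["str", "int", "datetime"], "Literal[1.5]"))
```

This yields one line instead of one block per overload. It would not apply when overloads differ in more than one parameter, since a union per parameter could suggest combinations that no single overload accepts.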

AlexWaygood · March 2, 2024, 1:10pm

Yeah, it can get quite nightmarish if you provide the wrong types to builtins.pow. Here’s mypy’s current output if you try doing pow("foo", "bar"):

main.py:1: error: No overload variant of "pow" matches argument types "str", "str"  [call-overload]
main.py:1: note: Possible overload variants:
main.py:1: note:     def pow(base: int, exp: int, mod: int) -> int
main.py:1: note:     def pow(base: int, exp: Literal[0], mod: None = ...) -> Literal[1]
main.py:1: note:     def pow(base: int, exp: Literal[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], mod: None = ...) -> int
main.py:1: note:     def pow(base: int, exp: Literal[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20], mod: None = ...) -> float
main.py:1: note:     def pow(base: int, exp: int, mod: None = ...) -> Any
main.py:1: note:     def pow(base: Literal[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], exp: float, mod: None = ...) -> float
main.py:1: note:     def pow(base: Literal[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20], exp: float, mod: None = ...) -> complex
main.py:1: note:     def pow(base: float, exp: int, mod: None = ...) -> float
main.py:1: note:     def pow(base: float, exp: complex | _SupportsPow2[Any, Any] | _SupportsPow3NoneOnly[Any, Any] | _SupportsPow3[Any, Any, Any], mod: None = ...) -> Any
main.py:1: note:     def pow(base: complex, exp: complex | _SupportsPow2[Any, Any] | _SupportsPow3NoneOnly[Any, Any] | _SupportsPow3[Any, Any, Any], mod: None = ...) -> complex
main.py:1: note:     def [_E, _T_co] pow(base: _SupportsPow2[_E, _T_co], exp: _E, mod: None = ...) -> _T_co
main.py:1: note:     def [_E, _T_co] pow(base: _SupportsPow3NoneOnly[_E, _T_co], exp: _E, mod: None = ...) -> _T_co
main.py:1: note:     def [_E, _M, _T_co] pow(base: _SupportsPow3[_E, _M, _T_co], exp: _E, mod: _M) -> _T_co
main.py:1: note:     def pow(base: _SupportsPow2[Any, Any] | _SupportsPow3NoneOnly[Any, Any] | _SupportsPow3[Any, Any, Any], exp: float, mod: None = ...) -> Any
main.py:1: note:     def pow(base: _SupportsPow2[Any, Any] | _SupportsPow3NoneOnly[Any, Any] | _SupportsPow3[Any, Any, Any], exp: complex, mod: None = ...) -> complex
Found 1 error in 1 file (checked 1 source file)

carljm · March 3, 2024, 4:08am

I agree that “incompatible type” seems like a less verbose and clearer-in-more-situations (e.g. the situations that don’t look like “assignments”) version of “cannot be assigned to,” without being any less clear or less technically accurate. Design decisions in pyright are typically carefully thought-through; I’m curious if @erictraut has thoughts on why “cannot be assigned to” is preferred in pyright.

In cases with “too many” possible overloads, I wonder if it would make sense to just reference the code location of the overloads in the error message, rather than trying to show them all inline? If they are from typeshed, that could even be a link to the relevant version of typeshed on GitHub. For example, in the builtins.pow example, typeshed/stdlib/builtins.pyi at main · python/typeshed · GitHub is significantly easier to read than the mypy error message. (It is syntax-highlighted, it uses well-named type aliases instead of long Literal types, etc.) It seems a good tradeoff for the more readable version to be one click away, rather than the unreadable version to take up 30 lines of my type-checker output.
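One way this could look, as a rough sketch: list the overloads inline when there are few, and otherwise print only a count and a pointer to where they are defined. The function name `overload_notes`, the `max_inline` threshold, the message wording, and the location string are all assumptions made for illustration, not something any of the checkers discussed here is known to do.

```python
def overload_notes(
    signatures: list[str], location: str, max_inline: int = 3
) -> list[str]:
    """Show overloads inline when there are few, else point at their definition."""
    if len(signatures) <= max_inline:
        return ["Possible overload variants:"] + [f"    {sig}" for sig in signatures]
    # Too many to read comfortably: give a count and a location instead.
    return [f"{len(signatures)} overload variants; see {location}"]


few = ["def f(x: str) -> int", "def f(x: int) -> str"]
print("\n".join(overload_notes(few, "example.pyi:3")))
print("\n".join(overload_notes(few * 4, "example.pyi:3")))
```

The location could be a file path and line number, or a URL when the definition comes from a published stub package.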

rchen152 · March 9, 2024, 7:01am

For what it’s worth, these are the error messages that pytype emits for each code sample (pay no attention to the filename and line numbers):

Incompatible return type

File "foo.py", line 2, in f: bad return type [bad-return-type]
           Expected: str
  Actually returned: bool

Incompatible argument type

File "foo.py", line 6, in <module>: Function f was called with the wrong arguments [wrong-arg-types]
         Expected: (x: str)
  Actually passed: (x: int)

Incompatible local variable type

File "foo.py", line 9, in f: Type annotation for y does not match type of assignment [annotation-type-mismatch]
  Annotation: int
  Assignment: str

Pytype isn’t at all consistent in terminology here (see: “bad” return type and “wrong” arg types), but to be honest, I think most users focus on the expected/actual types and don’t read the prose very carefully.
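The aligned expected/actual layout shown in the samples above is easy to reproduce. The following is a small sketch of the idea only; the helper name `aligned` is hypothetical and this is not pytype's own formatting code.

```python
def aligned(rows: list[tuple[str, str]], indent: int = 2) -> list[str]:
    """Right-align the labels so that the types line up in one column."""
    width = max(len(label) for label, _ in rows)
    return [" " * indent + f"{label.rjust(width)}: {value}" for label, value in rows]


print("\n".join(aligned([("Expected", "str"), ("Actually returned", "bool")])))
```

Putting the two types directly above each other makes the difference visible at a glance, whatever wording surrounds them.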