Going even further beyond with the new error messages

kknechtel · April 7, 2023, 7:06am

I just built 3.11.2 for myself (the day before 3.11.3 comes out, naturally) and have been playing around with the new exception tracebacks. The concept is excellent, and it’s a great start towards something I’ve hoped to see implemented for a long time.

However, there are some more things I would have hoped for that I’m not seeing:

Inconsistent treatment of operands

If I generate a TypeError from invalid subscripting, Python will highlight the subscript and the subscripted-thing differently:

>>> import bad
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/bad.py", line 1, in <module>
    [None][0][1]
    ~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

Similarly if I generate an IndexError:

>>> import bad
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/bad.py", line 1, in <module>
    'test'[:0][0]
    ~~~~~~~~~~^^^
IndexError: string index out of range

However, curiously, attribute access doesn’t seem to be able to do this:

>>> import bad
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/bad.py", line 3, in <module>
    object().attribute = None
    ^^^^^^^^^^^^^^^^^^
AttributeError: 'object' object has no attribute 'attribute'

Meanwhile, using more “symmetrical” binary operators like +:

>>> import bad
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/bad.py", line 1, in <module>
    0 + '1'
    ~~^~~~~
TypeError: unsupported operand type(s) for +: 'int' and 'str'

I actually prefer this style: there’s one form of highlighting for the operator, and another used for both operands. But the subscripts work differently: there’s one form of highlighting for the left-hand side, and a different form for both the operator and the “right”-hand side. I guess this is a consequence of the operator being made up of multiple, non-adjacent symbols (the two square brackets), but I honestly don’t think I like it.
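To make the two highlighting styles concrete, here is a rough sketch that reproduces the marker lines shown above from the AST. It is not CPython's implementation; the `underline` helper is hypothetical, and it assumes a single-line, ASCII-only expression with unparenthesized operands.

```python
import ast


def underline(expr_source):
    """Return a marker line: '^' under the operator or subscript, '~' elsewhere."""
    node = ast.parse(expr_source, mode="eval").body
    markers = ["~"] * len(expr_source)
    if isinstance(node, ast.BinOp):
        # The operator sits between the end of the left operand and the
        # start of the right operand; skip the surrounding whitespace.
        for i in range(node.left.end_col_offset, node.right.col_offset):
            if not expr_source[i].isspace():
                markers[i] = "^"
    elif isinstance(node, ast.Subscript):
        # Everything after the subscripted value: both brackets and the index.
        for i in range(node.value.end_col_offset, node.end_col_offset):
            markers[i] = "^"
    return "".join(markers)


print(underline("0 + '1'"))        # ~~^~~~~
print(underline("[None][0][1]"))   # ~~~~~~~~~^^^
print(underline("'test'[:0][0]"))  # ~~~~~~~~~~^^^
```

In this sketch, the `~~~~~~~~~^~^` style proposed later in this post would only require marking the two bracket positions instead of the whole range.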

Files vs REPL

None of this seems to work in the REPL, even if you define a function to wrap the dirty work:

>>> def bad():
...     'test'[:0][0]
... 
>>> bad()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in bad
IndexError: string index out of range

It used to make sense to suppress all the code in REPL tracebacks, since clearly the error is referring to the code that you just typed, and it’s right there. However, now that there’s functionality to highlight parts of the line, suppressing the code misses the opportunity to do that highlighting and explain which string index was out of range.

The error messages

Since Python can now see what source code corresponds to the operands of the failed operation, why not incorporate that information into the message? Better yet, show the values as well. And while we’re at it, AttributeError ought to be able to distinguish between gets and sets.

Examples could look like:

    [None][0][1]
    ~~~~~~~~~^~^
TypeError: cannot subscript None (the result of `[None][0]`, of type 'NoneType')

    'test'[:0][0]
    ~~~~~~~~~~^~^
IndexError: `0` is an invalid index for '' (the result of `'test'[:0]`)

    object().attribute = None
    ~~~~~~~~^~~~~~~~~~
AttributeError: not allowed to set attribute `attribute` of an 'object' instance (the result of `object()`)

(There seem to be existing conventions of showing the repr of values directly in error messages, and putting type names inside single quotes; I’ve followed that, but I’m also here proposing to use backticks in error messages to surround excerpts from the code.)

Or perhaps:

    [None][0][1]
    ~~~~~~~~~^~^
TypeError: `[None][0]` is a 'NoneType', so it cannot be subscripted

    'test'[:0][0]
    ~~~~~~~~~~^~^
IndexError: `'test'[:0]` has length 0, so `0` is an invalid index

    object().attribute = None
    ~~~~~~~~^~~~~~~~~~
AttributeError: `object()` is an 'object' instance, so setting its `attribute` attribute is not allowed

Finally: maybe we could special-case to recognize None, True and False, and describe them as such, rather than as 'NoneType' objects or 'bool' objects.
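A minimal sketch of that special case, assuming a hypothetical `describe` helper that a message formatter could call (nothing like it exists in CPython as far as this post is concerned):

```python
def describe(value):
    """Describe a value for an error message, naming the singletons directly."""
    # Identity checks, so that 1 and 0 are not mistaken for True and False.
    if value is None or value is True or value is False:
        return repr(value)
    return f"'{type(value).__name__}' object"


print(describe(None))  # None
print(describe(True))  # True
print(describe(1))     # 'int' object
```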

da-woods · April 7, 2023, 10:58am

Don’t want to comment on the whole thing, but I think the reason the binary operators are different is that Python tries both ways round: lhs.__add__(rhs) and then rhs.__radd__(lhs). It therefore isn’t clear which operand to blame for failing to match, so the thing to highlight is the operator.
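A small illustration of that two-step dispatch. The `Left` and `Right` classes and the `calls` list are made up for the demonstration; both methods decline by returning NotImplemented, so Python ends up raising the TypeError itself.

```python
calls = []


class Left:
    def __add__(self, other):
        calls.append("Left.__add__")
        return NotImplemented  # decline, so Python tries the reflected method


class Right:
    def __radd__(self, other):
        calls.append("Right.__radd__")
        return NotImplemented  # decline as well


try:
    Left() + Right()
except TypeError as exc:
    message = str(exc)

print(calls)    # ['Left.__add__', 'Right.__radd__']
print(message)  # unsupported operand type(s) for +: 'Left' and 'Right'
```

Both operands were consulted and both declined, which is why neither one can be singled out in the highlighting.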

For indexing you know that the bit that hasn’t worked is the index (i.e. the square brackets and everything in them), so highlight that. Similarly, for attribute errors it would probably make sense to highlight everything to the right of the dot (I know that isn’t what it does).

There may be good performance reasons for not customizing the error messages too much - it’d be legitimate and relatively common to catch a lot of these exceptions and proceed without printing the error. In those cases time spent formatting a custom message is wasted. e.g.

IndexError: `'test'[:0]` has length 0, so `0` is an invalid index

This has to prepare a custom string with some context code, a length, and the index, which is very expensive for something that’s often never read. Also, looking up the length can fail: not everything indexable has a length, so that needs to be special-cased.
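One way to reduce that cost, sketched here purely as an assumption about how it could be done (it is not how CPython builds its messages), is to store the operands on the exception and format the message only when it is actually read. `DetailedIndexError`, `checked_get` and `Squares` are hypothetical names.

```python
class DetailedIndexError(IndexError):
    """Hypothetical exception that defers building its message until it is read."""

    def __init__(self, container, index):
        super().__init__()
        self.container = container
        self.index = index

    def __str__(self):
        # Not everything indexable has a length, so special-case that.
        try:
            size = f"has length {len(self.container)}"
        except TypeError:
            size = "has no length"
        return f"`{self.index!r}` is an invalid index for {self.container!r} ({size})"


class Squares:
    """Indexable but unsized: defines __getitem__ without __len__."""

    def __getitem__(self, i):
        if i < 0:
            raise IndexError(i)
        return i * i

    def __repr__(self):
        return "Squares()"


def checked_get(container, index):
    """Index into container, re-raising failures with the lazy message."""
    try:
        return container[index]
    except IndexError:
        raise DetailedIndexError(container, index) from None


try:
    checked_get("", 0)
except IndexError as exc:
    text = str(exc)  # the message is only formatted here

print(text)  # `0` is an invalid index for '' (has length 0)
```

Code that catches the exception and never calls `str()` on it pays only for storing two references. The trade-off is that the exception keeps the container alive for as long as the exception object exists.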

aroberge · April 7, 2023, 6:54pm

Karl Knechtel:

Files vs REPL

None of this seems to work in the REPL, even if you define a function to wrap the dirty work:

>>> def bad():
...     'test'[:0][0]
... 
>>> bad()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in bad
IndexError: string index out of range

It used to make sense to suppress all the code in REPL tracebacks, since clearly the error is referring to the code that you just typed, and it’s right there. However, now that there’s functionality to highlight parts of the line, suppressing the code misses the opportunity to do that highlighting and explain which string index was out of range.

I believe that this is because “files” whose names are of the form “<…>” are not saved in the linecache module, which makes them unavailable for further processing and analysis. This is also the case for code executed using exec(). This was mentioned before in the Python-ideas thread “Access to source code when using exec()” but, unfortunately, never got any traction.

Other interpreters, such as IPython, IDLE (and my own friendly/friendly-traceback), implement their own version of linecaching so as to make it possible to add information for the new error messages (and more!). [Edit: the following is wrong: “in theory for IDLE - shown in practice with friendly_idle”.]

EDIT: In the original version of this post, I got confused after retrieving the link about exec() and wrote something that is completely incorrect about IDLE. I’ve hidden the incorrect information behind a “spoiler”, keeping it as a record of what I initially wrote for accuracy.
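As a rough illustration of the line-caching approach mentioned above, the sketch below registers the source of a pseudo-file in `linecache.cache` before executing it, so that the `traceback` module can find the offending line. The `run_with_source` helper and the `<demo>` filename are made up for this example, and writing to `linecache.cache` directly relies on the cache's `(size, mtime, lines, fullname)` entry layout, which is an implementation detail rather than a documented interface.

```python
import linecache
import traceback


def run_with_source(source, filename="<demo>"):
    """Run source under a pseudo-filename; return the formatted traceback or None."""
    lines = source.splitlines(keepends=True)
    # An mtime of None makes linecache.checkcache() leave the entry alone
    # instead of discarding it because no such file exists on disk.
    linecache.cache[filename] = (len(source), None, lines, filename)
    code = compile(source, filename, "exec")
    try:
        exec(code, {"__name__": "__demo__"})
    except Exception as exc:
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return None


report = run_with_source("def bad():\n    return 'test'[:0][0]\n\nbad()\n")
print(report)
```

With the entry in place, the traceback frames for `<demo>` include the source line, and on Python 3.11 or later the position markers as well.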

tjreedy · April 7, 2023, 8:03pm

@aroberge off topic here, but I cannot easily find your GH id. Please check whether https://github.com/python/cpython/pull/103339 impacts friendly-idle, i.e. does it use the stackviewer module?

aroberge · April 7, 2023, 9:08pm

@tjreedy First of all, I would like to publicly apologize for incorrectly stating/implying that IDLE did not show the new error messages. I’ve edited my original post in an attempt to correct the record.

My GitHub id is aroberge (André Roberge). The work for friendly/friendly-traceback/friendly_idle is done as part of a separate “organization”. I have not worked on friendly_idle (friendly-traceback/friendly_idle on GitHub: a version of IDLE patched at import time to incorporate the best features from friendly/friendly-traceback) and company for the last 6 months or so. As a result, I have not kept track of anything new done by CPython in the same period. friendly_idle does not use stackviewer; instead, it is a monkeypatched version of IDLE. It is quite possible that the recent changes might make it possible to simply use “friendly” straight within IDLE without any monkeypatching. When I get back to programming, I will have a look.