Suggestions could consider standard library module names as well

pamelafox · September 21, 2022, 5:15pm

First, I think the “did you mean” feature is really cool and will be very helpful, as I’ve spent a good amount of time helping students spot spelling errors in their code.

I ran into the suggestions a few times in the shell today when I forgot to import a module:

>>> stream = io.StringIO()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'io' is not defined. Did you mean 'id'?

I suggest that it also consider standard library module names for suggestions, so the experience would look like:

>>> stream = io.StringIO()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'io' is not defined. Did you mean 'id'? Or did you forget to import 'io'?

It seems to me that it will be fairly common for users to forget to import a module. It may throw them off to see a suggestion like ‘id’ so I think it’d help to acknowledge the very real other possibility that they meant to import a module instead.

What do you think? I believe it’d involve a change to cpython/suggestions.c (python/cpython on GitHub, at commit 27b989403356ccdd47545a93aeab8434e9c69f21) to consider library names as well.

aroberge · September 21, 2022, 6:34pm

Based on the comments on python/cpython issue #91058, “Typo hint message for from-imports?”, I suspect the answer will be that it is not easily feasible because it would slow down CPython too much for the average user.

If you and your students want more helpful suggestions, may I suggest looking at friendly? Here is how friendly deals with this case (I use friendly with the default IPython-style prompt below):

friendly-traceback: 0.6.6
friendly: 0.6.0
Python: 3.10.2
Type 'Friendly' for help on special functions/methods.


[1]: stream = io.StringIO()

Traceback (most recent call last):
  Code block [1], line 1
    stream = io.StringIO()
NameError: name 'io' is not defined. Did you mean: 'id'?

Did you mean id?

[2]: why()

The name `io` is not defined in your program. Perhaps you forgot to import `io` which
is found in Python's standard library.

The Python builtin `id` has a similar name.

`io` is a name found in the following modules from the standard library: bz2,
configparser, dbm, dis, getpass, gzip, logging, lzma, mailbox, modulefinder,
pathlib, pdb, pickle, pickletools, pyclbr, pydoc, runpy, site, smtplib, socket,
subprocess, tarfile, typing, zipfile. Perhaps you forgot to import `io` from one of
these modules.

As you can see, it included the additional suggestion you thought of and, in this particular case, quite a few other ones.

NeilGirdhar · September 22, 2022, 7:50pm

Why would it slow down CPython? Doesn’t this only happen when you’re already raising a NameError, which is extremely rare?

pamelafox · September 25, 2022, 2:06am

Thanks for linking to that thread, very interesting. That seems to be a slightly different situation, since it’s specifically about ImportErrors, whereas I’m requesting it for NameError (where suggestions already exist). My naive implementation would be to hardcode the list of standard library modules and compare against that. I still wouldn’t be surprised if there was a performance hit for it in some way, of course.
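A minimal Python sketch of that naive approach, for illustration only (the real change would live in CPython's C code). It assumes Python 3.10+, where `sys.stdlib_module_names` is available, and falls back to a tiny hand-written list otherwise. The helper name `import_hint` and the hint wording are hypothetical.

```python
import sys

# sys.stdlib_module_names exists on Python 3.10+. The fallback set below is a
# deliberately incomplete, hand-written stand-in for a hardcoded list.
STDLIB_NAMES = getattr(
    sys, "stdlib_module_names", frozenset({"io", "os", "sys", "math", "json"})
)


def import_hint(missing_name):
    """Return a hint if missing_name is a stdlib module name, else None."""
    if missing_name in STDLIB_NAMES:  # exact match: one set lookup
        return f"Did you forget to import '{missing_name}'?"
    return None


print(import_hint("io"))
print(import_hint("not_a_module_xyz"))
```

Because the comparison is a single set lookup, the cost of the check itself should be small; where the names come from and when the check runs are the parts that would need care.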

As for friendly-traceback, that looks great! However, I have students working in quite a range of environments (Jupyter, Colab, Pyodide, VS Code, etc.), so I’d need to show them how to install it in all those places (or somehow do it for them). Still a possibility though!

The other drawback is that a student could get too used to friendly-traceback and have a hard time using Python without it, but that’s a question of pedagogy that I don’t have an answer for.

steven.daprano · September 25, 2022, 2:41am

Raising an exception is already costly, but unlike exceptions like StopIteration and KeyError, NameError is not often expected and caught. If your buggy program takes 20ms to fail and raise instead of 15ms, is anyone going to care?

(Ironically, I do catch NameError, usually for feature detection, especially in older code that had to run under both Python 2 and 3. But that’s just a one-off cost when the module loads so I don’t think it will affect me too much.)
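For readers who have not seen it, the feature-detection pattern mentioned above typically looks something like the following sketch. The variable name `text_type` is only an illustration.

```python
# Python 2 had a separate `unicode` type; on Python 3 the name is undefined,
# so the lookup raises NameError and we fall back to `str`.
try:
    text_type = unicode  # only defined on Python 2
except NameError:
    text_type = str  # Python 3: str is already Unicode text

print(text_type)
```

The NameError is raised and caught once, at module load time, which is why any extra cost attached to it would be a one-off.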

I don’t think the extra cost will be prohibitive, but actually trying it and measuring the performance hit is the only way to be sure.
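As a rough starting point for such a measurement, here is a small `timeit` sketch. It only times Python-level behaviour (raising and catching a NameError, and formatting one with the `traceback` module), so it is not a benchmark of the C suggestion code itself. Whether the formatted text includes a suggestion depends on the Python version. The name `undefined_name_for_timing` is intentionally undefined.

```python
import timeit
import traceback


def raise_and_catch():
    """Raise a NameError and swallow it without formatting anything."""
    try:
        undefined_name_for_timing  # intentionally undefined
    except NameError:
        pass


def raise_and_format():
    """Raise a NameError and format its message, as a traceback printer would."""
    try:
        undefined_name_for_timing  # intentionally undefined
    except NameError as exc:
        traceback.format_exception_only(type(exc), exc)


runs = 2_000
catch_us = timeit.timeit(raise_and_catch, number=runs) / runs * 1e6
format_us = timeit.timeit(raise_and_format, number=runs) / runs * 1e6
print(f"raise + catch:  {catch_us:.2f} microseconds per call")
print(f"raise + format: {format_us:.2f} microseconds per call")
```

Comparing the two numbers before and after a change gives a first impression of whether caught exceptions are affected at all, or only the ones that get displayed.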

I think that extending suggestions to stdlib modules is a good idea, especially for students.

The main downside is that we all know what comes next:

“Why can’t the suggestion also check for third-party modules in the search path too?”

@rhettinger what do you think about this as an aid to students?

pablogsal · October 13, 2022, 10:24pm

Here is an implementation of the idea:

gh-98254: Include stdlib module names in error messages for NameErrors

python/cpython pull request, python:main ← pablogsal:gh-98254, opened 13 Oct 2022 at 10:23PM UTC (+99 -30)

Since we already have enough information and don’t need to capture anything extra, this doesn’t impact performance: all calculations are done only if the NameError reaches the top level without anyone catching it, at which point the interpreter is going to terminate anyway.

Also, as we are doing exact matches and not partial ones, the check is very fast.
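To illustrate the difference between exact and partial matching, here is a small comparison sketch. `difflib.get_close_matches` is used purely as a stand-in for fuzzy matching and is not what CPython's suggestion code uses. The function names are hypothetical.

```python
import difflib
import sys

# Fallback set is a tiny stand-in for Python versions before 3.10.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset({"io", "os", "json"}))


def exact_match(name):
    """Exact check: a single hash lookup, independent of how many names exist."""
    return name in STDLIB_NAMES


def fuzzy_matches(name, limit=3):
    """Partial check: compares the name against every candidate."""
    return difflib.get_close_matches(name, sorted(STDLIB_NAMES), n=limit)


print(exact_match("io"))       # exact name of a stdlib module
print(exact_match("jason"))    # misspelling: exact matching finds nothing
print(fuzzy_matches("jason"))  # fuzzy matching can still propose candidates
```

The exact check stays cheap as the list grows, while the fuzzy check does work proportional to the number of candidates, which is the trade-off behind starting with exact matches.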

After this, we can consider doing more complicated things if we want to, or refine the check and the message.

pamelafox · October 13, 2022, 11:34pm

Thank you @pablogsal! The PR looks great. I think the wording of the messages sounds good; it seems similar to what friendly says.

As for:
“Why can’t the suggestion also check for third-party modules in the search path too?”
It’s a great point. The audience that I’m hoping will benefit most from this is new Pythonistas, and they tend not to work with third-party modules as much in their code. The two exceptions I can think of would be “numpy” and “pandas”, since a large number of people are learning Python via data science, and I can imagine new folks not realizing or remembering that those need to be imported.
Generally, I think this will be a big win and we can try to put off what comes next…

pablogsal · October 14, 2022, 12:15am

“Why can’t the suggestion also check for third-party modules in the search path too?”

I will give this a go after we land the first PR adding stdlib modules, but I am a bit worried about having to maintain the complexity of that, since we need to import pkgutil or similar and use a bunch of the Python C API to iterate over pkgutil.iter_modules() and extract the names. This also means that we will be doing I/O in the exception handler, which worries me because this is a tricky place. Also, that list is potentially unbounded, so we need to add some limits to control the overhead, which increases the complexity.
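For a sense of what that involves, here is a Python-level sketch of a bounded scan with `pkgutil.iter_modules()`. The CPython version would have to do the equivalent from C. The function names and the default `limit` of 500 are arbitrary assumptions for illustration.

```python
import itertools
import pkgutil


def importable_top_level_names(limit=500):
    """Return up to `limit` top-level module names found on sys.path.

    pkgutil.iter_modules() touches the filesystem, so the cap keeps the
    amount of I/O bounded even when many packages are installed.
    """
    names = []
    for module_info in itertools.islice(pkgutil.iter_modules(), limit):
        names.append(module_info.name)
    return names


def third_party_hint(missing_name, limit=500):
    """Return an import hint if missing_name is among the scanned names."""
    if missing_name in importable_top_level_names(limit):
        return f"Did you forget to import '{missing_name}'?"
    return None


print(len(importable_top_level_names(limit=20)))
```

With a cap in place, a module that happens to sit beyond the limit is never suggested, which is one example of the extra complexity such a feature would bring.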

I don’t mean this to suggest we won’t be adding it, just to temper expectations around it. Maintaining CPython can be very challenging and we have limited resources, so we need to control the complexity, especially when critical C code is involved, even if the feature sounds very useful.

For a more complete set of helpers, there is always @aroberge's fantastic friendly library. We (CPython) need to maintain a balance between being helpful and maintenance, performance, security, correctness, etc., so it is important to keep in mind that, sadly, sometimes we need to say “no” to some exciting features.