Nicer diagnostics for `python -m invalid-name`

sorcio · November 12, 2022, 3:31pm

A Python user reported some confusion after getting this error:

$ pip install some-package
...
$ python -m some-package
/usr/bin/python: No module named some-package

The user spent some time before realizing the correct command was python -m some_package (note: underscore instead of hyphen).

As is often the case, the installed module/package name does not match the project name you would find on PyPI. The user installed some-package, but what actually got installed is called some_package.

Can Python do something to help this confusion?

tl;dr: I think it can (I prefer approach 3 below) but there might be caveats.

runpy can notice that some-package is not a valid identifier, and perhaps give a better error in that case.

But that’s trickier than I initially thought, because in general python -m doesn’t care that a module name is a valid identifier:

The module name should be a valid absolute Python module name, but the implementation may not always enforce this (e.g. it may allow you to use a name that includes a hyphen).

(1. Command line and environment — Python 3.12.1 documentation)

It’s not an error to have a module/package name which contains a hyphen. It can be imported correctly with e.g. __import__("some-package"). And importlib has no issue whatsoever with names that are not valid Python identifiers^[1].
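To make that concrete, here is a small sketch showing that the import machinery accepts a hyphenated name even though it is not a valid identifier. The file name some-package.py and the helper import_hyphenated_demo are made up for illustration; the module is created in a throwaway directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_hyphenated_demo():
    """Create a throwaway module named `some-package` and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module file whose name contains a hyphen.
        Path(tmp, "some-package.py").write_text("VALUE = 42\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # `import some-package` is a syntax error, but the import
            # machinery itself does not require an identifier.
            module = importlib.import_module("some-package")
            return module.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("some-package", None)


if __name__ == "__main__":
    print("some-package".isidentifier())
    print(import_hyphenated_demo())
```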

python -m and runpy.run_module() do the same because they just use the same importlib stuff, and the behavior is documented to be implementation-specific (maybe at this point it can be made to be the correct behavior, and remove the “should be …” part in the docs).

But I’m still not sold. Even if python -m something-with-hyphens is legal behavior, the error encountered by the user is common enough that it might need special treatment.

Project names on PyPI are not allowed to have underscores. Well, technically they are, but they get normalized and what users see on the PyPI website is the normalized name. So all the some-packages in the world are likely to correspond to a local some_package.

If looking up a module with the given name fails and the name is not a valid Python identifier, there is a decent chance that the user has confused the project name with the importable package name.

What can be done?

A simple approach would be to suggest alternative names based on common alterations. Something like:

$ python -m some-package
/usr/bin/python: No module named some-package. Did you mean `some_package` or
`somepackage`?

But the simple approach may lead to further confusion if those names cannot be imported either. runpy could produce alternative names, find_spec() them, and only suggest those that are importable. This would be a user experience improvement 99% of the time. Things can get complicated though:
1. Depending on what meta path finders are installed, find_spec() can execute unexpected code, or access resources that the user doesn’t expect to access. Maybe it’s acceptable because we are responding to a user error, but what about usage in scripts?
2. What about module paths? If the user runs python -m some-package.some-thing.foo-bar should runpy try all combinations for every component in the path? My answer would be yes, but this is arguably a less common case and I’m not sure if there are complications.
3. False positives are still possible. More of a corner case, to be sure, but all this work might still tell the user to run the wrong module.
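
A rough sketch of the "generate alternatives, then keep only the importable ones" idea, to show where the caveats above bite. The helpers candidate_names and importable_suggestions are hypothetical names, not anything runpy has today, and this version only rewrites the whole string rather than trying every combination per dotted component.

```python
import importlib.util


def candidate_names(name):
    """Yield common alterations of a module name (hypothetical heuristic)."""
    seen = {name}
    for alt in (name.replace("-", "_"), name.replace("-", "")):
        if alt not in seen:
            seen.add(alt)
            yield alt


def importable_suggestions(name):
    """Return only the alternatives that find_spec() can locate."""
    suggestions = []
    for alt in candidate_names(name):
        try:
            # Caveat 1: this consults every meta path finder, and for
            # dotted names it imports (executes) the parent packages.
            spec = importlib.util.find_spec(alt)
        except (ImportError, ValueError):
            spec = None
        if spec is not None:
            suggestions.append(alt)
    return suggestions


if __name__ == "__main__":
    print(importable_suggestions("some-package"))
```
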
A third option, instead of guessing names and probing them with find_spec(): use importlib.metadata and only be concerned with installed distributions. This might be my favorite option because it involves a lot less guesswork. If I pip-installed some-package, then importlib.metadata.distribution("some-package").read_text("top_level.txt") contains "some_package\n". I can use this information to provide a better and more informative message, e.g.:
```
$ python -m some-package
/usr/bin/python: No module named some-package. The distribution some-package
provides `some_package`. Did you mean to run `python -m some_package`?
```
I’m not familiar with importlib.metadata and the caveats of this approach, but it seems to address the exact problem a user is most likely to see.
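
For what it's worth, the lookup could be sketched roughly like this. The function names provided_top_level and hint_for are invented for the example, and one assumption matters: top_level.txt is written by setuptools and is not guaranteed to exist for distributions built with other tools, in which case this sketch simply gives no hint.

```python
from importlib import metadata


def provided_top_level(dist_name):
    """Top-level module names recorded for an installed distribution.

    Returns an empty list if the distribution is not installed or
    does not ship a top_level.txt file.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return []
    text = dist.read_text("top_level.txt")  # None if the file is missing
    return text.split() if text else []


def hint_for(name):
    """Build an extra hint for a failed `python -m name`, or None."""
    modules = provided_top_level(name)
    if not modules:
        return None
    first = modules[0]
    return (
        f"The distribution {name} provides `{first}`. "
        f"Did you mean to run `python -m {first}`?"
    )


if __name__ == "__main__":
    print(hint_for("some-package"))
```
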

Additional thoughts?

[1] In bpo35358, Guido suggested that this behavior deserves to be explicitly documented, but the issue was closed before progress was made.

steven.daprano · November 12, 2022, 2:20am

And my prediction came true.

Having said that, in this case at least, I don’t see a downside. It would help for misspellings of stdlib modules as well:

$ python3.10 -m unitest spam.py
/usr/local/bin/python3.10: No module named unitest

We already have a mechanism for suggesting misspellings, which is applied to NameError, AttributeError, and at least some ImportError exceptions. We don’t have to limit suggestions to simple heuristics like “change hyphen to an underscore”, or worry about what is an identifier or not. We could just do a string distance calculation against the existing modules and return the closest match.

Technically this has an unbounded cost (say, you have a hundred thousand Python modules in your PYTHONPATH) but unlike the import statement, there is unlikely to be much concern about speed when using runpy at the command line. And if necessary, we could implement a limit of (say) 500 modules per directory in the path.
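
As a rough illustration of the string-distance idea, here is a sketch that uses difflib as a stand-in for whatever distance function the interpreter's existing suggestion machinery uses. The helper names are hypothetical, and the cap here is a single overall limit on scanned modules rather than a per-directory one.

```python
import difflib
import pkgutil
from itertools import islice


def visible_module_names(limit=500):
    """Top-level modules discoverable on sys.path, capped at `limit`."""
    # pkgutil.iter_modules() scans path entries; built-in and frozen
    # modules are not reported, so this is only an approximation.
    return [info.name for info in islice(pkgutil.iter_modules(), limit)]


def closest_module(name, candidates=None, cutoff=0.6):
    """Return the candidate most similar to `name`, or None."""
    if candidates is None:
        candidates = visible_module_names()
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(closest_module("unitest", ["json", "unittest", "os"]))
```

Note that the same distance check would also catch the hyphen/underscore case, since some-package and some_package differ by a single character.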

sorcio · November 13, 2022, 7:29pm

Hey, thanks for the constructive feedback!

I kinda think that correcting misspellings in general, and addressing the general case where one uses the distribution name instead of the importable package name, are two slightly different problems, which don’t necessarily share the same solution.

But I like the idea to make it more general. Producing suggested alternatives may use both installed distribution packages and importable modules.

The most obvious limitation is that MetaPathFinder doesn’t have an interface to search or scan importable modules, so an implementation would need to special-case file system access and ignore other finders.

Interactively, yeah, probably it’s a net win. Besides, the unlikely case where you have 100k entries and slow file system I/O might be less supported, or artificially capped. But for scripts it can be a deal-breaker. It could work to check if you’re connected to a tty, and only give suggestions in that case.
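
The tty check could be as small as the following sketch. should_offer_suggestions is a hypothetical helper, and checking stderr (where the error message goes) rather than stdin or stdout is my assumption about which stream matters.

```python
import sys


def should_offer_suggestions(stream=None):
    """Return True only when the error stream is an interactive terminal."""
    if stream is None:
        stream = sys.stderr
    # sys.stderr can be None (e.g. under pythonw), and arbitrary
    # file-like objects may lack isatty(), so look it up defensively.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


if __name__ == "__main__":
    print(should_offer_suggestions())
```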

I can open an issue and sketch an implementation, would it be worth it?

Rosuav · November 13, 2022, 7:55pm

I’d be inclined to go ahead and implement a proof of concept without checks like that, and then compare the run time of python3 -m borked-module with python3 -c "import borked.module" to see whether it’s even worth having those checks.
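
Something like this would do for the comparison: a minimal timing sketch that takes the best of a few runs for each invocation. time_command is a made-up helper, and both commands are expected to fail; only the elapsed time is of interest.

```python
import subprocess
import sys
import time


def time_command(args, repeats=3):
    """Best wall-clock time, in seconds, of running this Python with `args`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Failures are expected here, so the exit status is ignored.
        subprocess.run(
            [sys.executable, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    via_runpy = time_command(["-m", "borked-module"])
    via_import = time_command(["-c", "import borked.module"])
    print(f"-m borked-module:        {via_runpy:.4f}s")
    print(f"-c import borked.module: {via_import:.4f}s")
```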

This would definitely be helpful, just like the NameError enhancements we’ve seen recently. IMO it’s worth a bit of cost from grinding your hard disk a bit. When I’m on a call with someone and I dictate a command to do some unusual fix, it’d be hugely beneficial to not have to try to nitpick someone’s spelling when something goes wrong; the system will make a viable suggestion directly.

steven.daprano · November 13, 2022, 2:33am

Are you concerned about scripts (written in any language) which call runpy many times? Say, some bash script that runs

/usr/lib/python3 -m $scriptname

in a loop, where $scriptname may or may not exist?

I don’t think that’s very likely, but if it did happen, I don’t care.

If adding this positive feature, which is a win for Python users, results in a performance regression for people making lots of calls to runpy with invalid script names, too bad for them.