Warning when importing a local module with the same name as a 2nd or 3rd party module

Accidentally shadowing a module by giving your Python file the same name is one of the most notorious gotchas in Python, and every now and then it still manages to trip up otherwise experienced developers.

Would it be possible to have a warnings.warn in the importer code that triggers when a module is imported as a local file from the current working directory while a second- or third-party module of the same name exists as well? It would have helped me five years ago, and I think it would help most of the people who stumble over the same issue, too.


Just in case, an example of what I have in mind:

fastapi.py

import fastapi

app = fastapi.FastAPI()
...

On execution:

/home/dev/fastapi.py:1: UserWarning: 
    Importing local module /home/dev/fastapi.py that might shadow the installed module 'fastapi'.
    Consider renaming local files in order to avoid name clashes.
  warnings.warn("Importing local module {module_name} that might shadow the installed "
Traceback (most recent call last):
  File "/home/dev/fastapi.py", line 1, in <module>
    import fastapi
  File "/home/dev/fastapi.py", line 3, in <module>
    app = fastapi.FastAPI()
          ^^^^^^^^^^^^^^^
AttributeError: partially initialized module 'fastapi' has no attribute 'FastAPI' (most likely due to a circular import). Did you mean: 'fastapi'?

I think this is a great idea. I have run into this myself, and I’ve had a few head-scratching times trying to help others solve “circular imports”, which were not circular imports.

Is there a Python docs page that mentions or discusses this anywhere?

Edit: the short section “The name shadowing trap” in Nick Coghlan’s Python Notes (“Traps for the Unwary in Python’s Import System”) seems a useful reference.


None that I could find, but my reading comprehension and google-fu aren’t the best.

Ah yes, another example of this is someone having e.g. a math.py in their local folder, and now other packages start breaking.


Wouldn’t this pose a problem of install-specific errors? For example, if I distribute a package with a scipy submodule, it would work on my end but raise the proposed new error for any user who also has scipy installed.

As long as I didn’t misunderstand you, I don’t think that’s how this kind of bug can happen. Let’s say you distribute a package foo with a submodule scipy. Importing that submodule would be done either with import foo.scipy, which won’t clash with import scipy, or with from . import scipy within your package, which also won’t clash with import scipy.
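
To illustrate with a hypothetical layout (names made up):

# foo/
#     __init__.py    <- may contain: from . import scipy
#     scipy.py
#
# From client code, the submodule is only reachable under its full name,
# so it never collides with an installed top-level scipy:

import foo.scipy   # resolves to foo/scipy.py, not the installed scipy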

It would only be a problem if your distribution manipulated sys.path in order to be able to import your scipy submodule with import scipy internally. At that point, it’s so far from best practices that it should break.

I think this kind of issue only really arises when a user writes scripts; it’s a non-issue for libraries. And if someone has a script foo.py and installs your package, that’s exactly when you’d want to be warned about potential name clashes. Worst case, if you know what you’re doing, you just ignore or suppress the warning.

Edit: also, I’m not proposing to raise an error, just to print a warning that is more helpful than the likely AttributeError/NameError/ImportError that follows right after.
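
And anyone who does want to opt out could suppress it with the standard machinery; a sketch, assuming the warning text from my example above:

import warnings

# Opt out of the hypothetical shadowing warning for this script.
warnings.filterwarnings(
    "ignore",
    message="Importing local module .* that might shadow",
    category=UserWarning,
)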

It would mean that import x would have to exhaustively search the entire sys.path to check for possible collisions, rather than short-circuiting early. That has a cost, and in some cases it might be unacceptable (consider, for example, the last element on sys.path being a slow NFS fallback).
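
For context, here is a simplified sketch of the short-circuiting that the current path-based lookup does (not the actual implementation):

import sys
from importlib.machinery import PathFinder

def find_first(name):
    # Entries are tried in order and the search stops at the first hit, so
    # a slow entry late in sys.path is never touched once an earlier one
    # matches.
    for entry in sys.path:
        spec = PathFinder.find_spec(name, [entry])
        if spec is not None:
            return spec  # short-circuit
    return None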

Hi wim,

I avoided adding a reference implementation along with my initial post because I’m not confident I’d be doing it right. In particular, I don’t know if there is a reliable way to tell from an imported module whether it was installed or not.

I didn’t find one, so I came up with this instead (again, I’m not very confident that I’m doing things right or without bugs; I only tested it a little). It shouldn’t add significant overhead if it were called by importlib.import_module just before returning, because it only checks the cwd and not “importables from all possible paths” (I think that’s what you meant). The latter might find lots of false positives too; IIRC it’s not too unusual for modules to be findable from multiple locations.

import sys
import os
import importlib.metadata
from pathlib import Path
from types import ModuleType

# "Second party" = the standard library, "third party" = installed distributions.
SECOND_PARTY = sys.stdlib_module_names
THIRD_PARTY = {*importlib.metadata.packages_distributions()}


def maybe_shadows(module: ModuleType) -> bool:
    # Built-in and namespace modules have no __file__; they can't be local
    # files, so they can't shadow anything.
    if getattr(module, "__file__", None) is None:
        return False
    return (
        str(Path(module.__file__).parent) == os.getcwd()
        and (module.__name__ in SECOND_PARTY or module.__name__ in THIRD_PARTY)
    )
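
Hooked into the import machinery, it could then be used roughly like this (a sketch; warn_if_shadowing is a made-up name, and the warning text mirrors my example above):

import warnings

def warn_if_shadowing(module: ModuleType) -> None:
    # Hypothetical hook, called once per freshly imported module.
    # Reuses ModuleType and maybe_shadows from the snippet above.
    if maybe_shadows(module):
        warnings.warn(
            f"Importing local module {module.__file__} that might shadow "
            f"the installed module {module.__name__!r}. Consider renaming "
            "local files in order to avoid name clashes."
        )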

This seems like a problem for linters/code analyzers; that way, the import machinery stays fast and backward compatibility is maintained (adding warnings could cause test suites to fail, though I’m not sure whether that would be that big of a problem).


Good point, it makes me wonder why it’s not something I’ve ever come across yet. I would personally still advocate for such a warning, given that there is a good way to implement it, but I can see why it might not get accepted.

While this is an annoying issue that is surprisingly hard to debug, I don’t think it’s that common. I only remember it happening once, when I shadowed the logging module, and while it was incredibly aggravating it hasn’t happened since (2-3 years, I think). What could be more useful than a warning is actual diagnostics in the error message. For example, in case of an error, the import machinery could check the stdlib and third-party modules for a name collision. Since a failed import should (mostly) crash the program anyway, that extra cost doesn’t matter. Well, it doesn’t matter unless you use try/except to do conditional imports, like this fairly common idiom:

try:
    from typing import <fancy new typing feature>
except ImportError:
    from typing_extensions import <fancy new typing feature>

but even using that idiom, I’m not sure it would affect performance much. Only benchmarking could tell us that.
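
To sketch the idea (a hypothetical helper; it would only be consulted on the error path, so a successful import pays nothing extra):

import os
import sys
import importlib.metadata
from pathlib import Path

def shadow_hint(module_name, module_file):
    # Only called after an import-related crash, never on the happy path.
    if module_file is None or str(Path(module_file).parent) != os.getcwd():
        return None
    if module_name in sys.stdlib_module_names:
        return (f"{module_file} shadows the standard library module "
                f"{module_name!r}; consider renaming it")
    if module_name in importlib.metadata.packages_distributions():
        return (f"{module_file} shadows the installed package "
                f"{module_name!r}; consider renaming it")
    return None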

I only remember it happening once, when I shadowed the logging module, and while it was incredibly aggravating it hasn’t happened since (2-3 years, I think).

That might have to do with your workflow, though. It doesn’t happen to me any more either, because by now I 1) am aware of the issue, and 2) package pretty much all of my code. It’s hard to collect data on the severity, because there is a selection bias in this forum towards people who aren’t as affected.

It’s probably also not the best idea to scrape GitHub for Python files shadowing stdlib or popular third-party module names, because you’d have to guess a lot about whether a given file was meant to be run from the cwd, and because the buggy versions we’d actually want to see were likely never uploaded, precisely because they were buggy.

The best bet might be to query Stack Overflow, although it doesn’t help that this issue is hard to google for, since the shown error has little to do with the actual problem. Still, from the linked dupes of the canonical question, I get:

  • 906 votes on all questions
  • 3,020 votes on all answers
  • 2,265,463 views

For comparison, the canonical question for another popular gotcha, mutable default arguments:

  • 4,116 votes on all questions, 454%
  • 5,060 votes on all answers, 167%
  • 798,817 views, 35%

At the very least, it’s in the same order of magnitude.

Base query found here.

What could be more useful than a warning is actual diagnostics in the error message.

Sounds just as good to me. The trade-off here is probably the ugliness of the implementation (I’m assuming here, but I don’t see how the interpreter would guess “I just crashed due to module name shadowing”, or would you run that logic on every crash?) versus the time saved compared to my proposal.

Since a failed import should (mostly) crash the program anyway, that extra cost doesn’t matter.

I’ll try to do a benchmark, but given the overhead of importing a module, I’m fairly confident that an additional function call, a string comparison, and two set lookups will be a drop in the ocean.


Edit: it turns out the import statement uses a C implementation, not importlib. That might or might not be a problem, but it certainly puts my benchmarking experiment on hold.


I like the idea. But is there any reason why this shouldn’t be a plug-in for flake8 or similar?

I think the audience who benefits most from this kind of warning is quite different from those using tools like flake8. I’m sure we could use it too, sometimes, but it’s most likely to bite a beginner.


FWIW, at least in my experience helping others learn Python and maintaining an IDE (Spyder) that is particularly popular with beginners and students, it seems like every other new user runs into this at some point, and they have to either struggle until they figure it out, search for it, or have someone explain it to them. These are the users least likely to use linters, or to know how to.

While it occasionally occurs with third-party modules, I’ve found it to be disproportionately common with stdlib modules. Moreover, a little quick benchmarking shows it’s around five orders of magnitude more expensive (potentially even six in the worst case), as well as substantially more complex and potentially edge-case-prone, to do the check against third-party modules rather than just stdlib ones:

Just getting the names (not counting the imports) is around 100,000x faster for the stdlib than for third-party modules, since it is a hardcoded constant attribute access rather than a full filesystem search (measured on a machine with an old, slow CPU but a relatively fast SSD, which is close to the best case for third-party modules relative to the stdlib):

$ python -m timeit -n 1 -r 1 -s "import sys" "sys.stdlib_module_names"
1 loop, best of 1: 1.28 usec per loop
$ python -m timeit -n 1 -r 1 -s "import importlib.metadata" "{*importlib.metadata.packages_distributions()}"
1 loop, best of 1: 113 msec per loop

Worst case, on a first run (I’m assuming something isn’t cached), the latter can take over two full seconds on my machine, or around 2 million times slower:

$ python -m timeit -n 1 -r 1 -s "import importlib.metadata" "{*importlib.metadata.packages_distributions()}"
1 loop, best of 1: 2.25 sec per loop

Adding the in check does add a small cost to the former, but it is still in the low-microsecond range, much less than the average import (in the ms range):

$ python -m timeit -n 1 -r 1 -s "import sys" "'mymodule' in sys.stdlib_module_names"
1 loop, best of 1: 2.14 usec per loop
$ python -m timeit -n 1 -r 1 -s "import importlib.metadata" "'mymodule' in {*importlib.metadata.packages_distributions()}"
1 loop, best of 1: 117 msec per loop

Adding the necessary imports to the time counted (which would also themselves incur the same cost, at least with a semi-naive implementation) doubles the time taken by the latter to over 200 milliseconds:

$ python -m timeit -n 1 -r 1 "import sys; 'mymodule' in sys.stdlib_module_names"
1 loop, best of 1: 15 usec per loop
$ python -m timeit -n 1 -r 1 "import importlib.metadata; 'mymodule' in {*importlib.metadata.packages_distributions()}"
1 loop, best of 1: 222 msec per loop

Of course, this all assumes the check is implemented in Python; if it’s implemented in C, the former will be perhaps an order of magnitude or more faster still, while the latter would essentially not benefit at all, as it is constrained by I/O and the existing importlib.metadata.packages_distributions() implementation.

On the whole, just checking stdlib module names and with some other simplifications, a minimal Python implementation could look like this:

import sys
from pathlib import Path

def maybe_shadows(module):
    # Modules without a __file__ (builtins, namespace packages) can't be
    # local files shadowing the stdlib.
    file = getattr(module, "__file__", None)
    return (file is not None
            and Path(file).resolve().parent == Path.cwd()
            and module.__name__ in sys.stdlib_module_names)

The check would also only be run if not sys.flags.safe_path; enabling safe path mode is generally recommended anyway, and it will hopefully become the default eventually.
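
In code, that guard could be as simple as (a sketch, reusing maybe_shadows from above):

import sys

def should_warn(module):
    # With -P / PYTHONSAFEPATH, the script's directory is not prepended to
    # sys.path, so shadowing from the cwd cannot occur and the check is
    # skipped entirely.
    return not sys.flags.safe_path and maybe_shadows(module)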

Therefore, given that the performance impact is only on the order of a 0.1% (1-10 us) slowdown or less in typical import times, even without optimization, it seems worth at least considering to me.


Thanks a lot for the insight and the benchmarks! Good to know that there is such a difference in bang per buck between the stdlib and third-party checks.

@pablogsal Do you think it would be possible to enhance the error message of ImportError in the case where a stdlib module might be shadowed?

I will give it a go when I have some time :+1:


Writing a shadowing check for ImportError specifically would mean that, with a local math.py like this, only some of the failure modes would get the helpful message:

math.py

try:
    from math import ceil
except Exception as e:
    print(type(e))  # ImportError, will get a helpful message.
try:
    import math
    math.ceil(2.3)
except Exception as e:
    print(type(e))  # AttributeError, won't get a helpful message.
                    # could also be a TypeError, ValueError, ... depending 
                    # on what specifically was shadowed in which way

Yeah, it’s hard to do this generally, whether you go for exceptions or warnings.