Unpredictable `__file__` While Manually Importing Modules - A Potential Issue?

Hello there!
I was playing around with imports and realized that importing a module from a path (via importlib.util) makes __file__ a bit “unguessable”. It still points to the same location, of course, but its value changes with the string you use to spell that path ("C:\\Users\\Someone\\file" vs. ".\\file" vs. "./file", etc.). Please have a look at the code below.

this_module.py
import importlib.util, sys, pathlib

def import_from(path):
    """Return the module located at the given path."""
    mod_name = pathlib.Path(path).stem
    spec = importlib.util.spec_from_file_location(mod_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[mod_name] = module
    spec.loader.exec_module(module)
    return module

def msg_print(*messages, name=__name__):
    print("\n" + name)
    print("    ","\n     ".join(messages))
    
if __name__ == "__main__":
    msg_print("message", name="__name__")
    msg_print(f"__file__ is '{__file__}'")

    this_module = import_from("./this_module.py")

    msg_print(f"this_module.__file__ is '{this_module.__file__}'")
    msg_print(f"(__file__ == this_module.__file__) == {__file__ == this_module.__file__}")     

else:
    msg_print(f"__file__ is '{__file__}'")
OUTPUT
__name__
     message

__main__
     __file__ is 'C:\Users\***\this_module.py'

this_module
     __file__ is 'C:\Users\***\./this_module.py'
     # Note the extra "./": that is how I spelled the path ("./this_module.py").
     # But why not a clean, normalized path string?

__main__
     this_module.__file__ is 'C:\Users\***\./this_module.py'

__main__
     (__file__ == this_module.__file__) == False
     # This can cause problems like the comparison above (it may break simple tests).
     # Of course, you can work around it with Path, but why not implement the fix
     # in the core (importlib) itself and prevent the confusion?

I am calling this an issue (or, more likely, an overlooked point) and kindly requesting a small improvement to importlib: make __file__ a “clean” path string, e.g. via Path(x).resolve(), in all appropriate situations to prevent any possible confusion. What are your thoughts?
Thanks.

PS: You can see my second reply in this topic for further ‘information’.

Hi sandraC,
The easiest workaround may be,

def import_from(path)->"Return the module exists in the given path.":
+   path = pathlib.Path(path)
    mod_name = pathlib.Path(path).stem
    spec = importlib.util.spec_from_file_location(mod_name, path)

You should also add,

+   spec.origin = path.parent.absolute()

so that repr(this_module) displays nicely.

See also a similar topic.
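For reference, the workaround above could be assembled into a complete function roughly like this. It is only a sketch: it assumes that resolving the path with pathlib before building the spec is acceptable for your use case (symbolic links are followed), and the error check is an addition that was not in the original snippet.

```python
import importlib.util
import pathlib
import sys


def import_from(path):
    """Import and return the module stored in the file at `path`."""
    # resolve() makes the path absolute and collapses "." and ".." parts,
    # so every spelling of the same file yields the same string.
    resolved = pathlib.Path(path).resolve()
    mod_name = resolved.stem
    spec = importlib.util.spec_from_file_location(mod_name, str(resolved))
    # Added safety check (an assumption, not part of the original snippet):
    # no spec or loader is available for unsupported file suffixes.
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {resolved}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[mod_name] = module
    spec.loader.exec_module(module)
    return module
```

With this version, import_from("./this_module.py") and import_from("this_module.py") should report the same __file__.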

Thank you for your alternative solution and the link (though the original post there was quite different, the replies were helpful). Really appreciated.

My idea was to implement (a version of) your solution in importlib itself. Users who are not aware of this issue may run into big problems because of it, so Python should at least make this case easy (“at least”, because importing from a path shouldn’t be this complicated). Of course, it would be an enhancement even setting aside those problems, which unaware users would need a lot of effort to solve.

Are you referring to importlib — The implementation of import — Python 3.10.4 documentation ?

That’s on purpose, as not everyone wants file paths to be made absolute or wants to pay the cost of doing that work (run python3 -S with a relative path in PYTHONPATH and you will see that those paths are not made absolute). Importlib also deliberately doesn’t provide higher-level utilities, such as making importing from a file easier, because it’s very easy to get them wrong and to make assumptions that don’t apply to all situations.

While I appreciate the idea, I would rather not make this change to importlib. If you would like importing a single file to be as simple as calling a single function, I would suggest creating a package and putting it on PyPI.

Thank you for sharing your ideas.

My main goal wasn’t exactly that (of course, that would also be great), but something smaller.

As I understand it so far, importlib (yes, the standard library module) auto-generates the __file__ (and other) attributes of a module from its module spec object (e.g. __file__ is spec.origin).
But the origins of ModuleSpec objects are customisable: they depend on how you spell the path (I try to explain what I mean below[1]). That is what I do not want, since it may break some simple but important test cases like:

>>> mod1 = importlib.util.spec_from_file_location("module", "dir/./module.py")
>>> mod2 = importlib.util.spec_from_file_location("module", "dir/module.py")
>>> mod1 == mod2
False

[1] BELOW:

import importlib.util as ut
  • When you do spec = ut.spec_from_file_location("module","C:\\Users\\Sandra\\Desktop\\module.py"), spec.origin is:
    "C:\\Users\\Sandra\\Desktop\\module.py"

  • When you do spec = ut.spec_from_file_location("module","./module.py"), spec.origin is:
    "C:\\Users\\Sandra\\Desktop\\./module.py" (which is not the same string, and a bit weird)

  • Or, when you do spec = ut.spec_from_file_location("module",".\\module.py"), spec.origin is:
    "C:\\Users\\Sandra\\Desktop\\.\\module.py" (which is also a different string)

(this is not an absolute vs. relative matter)

I would expect to get the same string from all those examples.
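To make the link between spec.origin and __file__ concrete, here is a small sketch. The helper names file_attr_for and same_file are made up for this post, and the exact origin string you get may vary between Python versions and platforms. It shows that a module created from a spec takes its __file__ from spec.origin, and that a test can compare two spellings by the file they point to rather than by string equality.

```python
import importlib.util
import os


def file_attr_for(path, name="module"):
    """Return (spec.origin, module.__file__) for a spec built from `path`."""
    spec = importlib.util.spec_from_file_location(name, path)
    # module_from_spec() only creates the module object and fills in its
    # attributes; the module's code is not executed here.
    module = importlib.util.module_from_spec(spec)
    return spec.origin, module.__file__


def same_file(path_a, path_b):
    """Compare two paths by the file they point to, not by their spelling."""
    # os.path.samefile() requires both paths to exist.
    return os.path.samefile(path_a, path_b)
```

So a test like __file__ == this_module.__file__ could be written as same_file(__file__, this_module.__file__) instead, which does not depend on how the path was typed.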


My implementation idea is to change importlib._bootstrap_external as follows:

  30  import marshal
+ 31  from pathlib import Path
  [...]    
  781 def spec_from_file_location(name, location=None, *, loader=None,
  782                             submodule_search_locations=_POPULATE):
  [...]
  804     else:
- 805         location = _os.fspath(location)
+ 805         location = Path( _os.fspath(location) ).resolve()
  806         if not _path_isabs(location):
  807             try:
  808                 location = _path_join(_os.getcwd(), location)
  809             except OSError:
  810                 pass

But I am not sure whether this would break something or whether it is a good idea. As I said, it is just an idea.

You could probably get most of the same effect without adding a dependency on pathlib by using os.path.realpath(). Given your examples it seems to make sense to do that.

In that case, I would say you should normalize the paths before you create the module specs. Specifically, module specs are not meant to be “smart”, but simply to store values. The helper functions in importlib.util are designed to pragmatically use information that can be gathered via APIs found in importlib, or that is passed into the functions, to create module specs, without doing any real work on those values. As such, importlib.util is the wrong abstraction layer for normalizing paths as they are passed in; that would be better served by a different library.
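A minimal sketch of normalizing on the caller's side, using os.path.realpath() as suggested earlier in the thread. The wrapper name normalized_spec is hypothetical and not part of importlib.

```python
import importlib.util
import os


def normalized_spec(name, path):
    """Build a module spec from `path` after canonicalizing the path."""
    # realpath() returns an absolute path with "." and ".." removed and
    # symbolic links resolved; no pathlib import is needed.
    return importlib.util.spec_from_file_location(name, os.path.realpath(path))
```

With such a wrapper, normalized_spec("module", "dir/./module.py") and normalized_spec("module", "dir/module.py") should carry the same origin, so the spec comparison from the earlier example no longer depends on the spelling.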

Thanks for your thoughts.

Hmm, you may be right on that point.
But this directly affects __file__ (if you create a module from the spec), which in my opinion should be a bit smarter (that was my main reason for posting the idea).


Actually, I had first thought it would be better to implement something inside importlib._bootstrap, in _init_module_attrs (something directly about __file__).

But some imports were needed, and I doubted whether that was allowed: this module is part of importlib, and there is not even an import keyword in the file (I now suspect my doubt was silly). In the end, I decided to implement it inside spec_from_file_location (I didn’t want to change ModuleSpec itself for some reason) and tried to change __file__ by changing the way the spec is created. I see that this doesn’t reflect the actual idea, and I am sorry for not being clear. I actually wanted to improve the quality of __file__, but presented it as a module spec matter. I probably also didn’t express myself well.

I would still like to see some improvements to __file__ (and to hear your comments on this idea), but I would rather not continue this topic and waste your time here (not sure what you would prefer, though :)).


PS: In the near future, I will definitely consider opening a PR about __file__ (it will probably not change ModuleSpec but __file__ directly; I will also take Guido van Rossum’s reply into account), after learning more about importlib and its structure and mechanisms, which may take quite a while.


Thanks.

__file__ is also just a record of the file path used to import a module, relative or not. In fact, __file__ would not even exist (__spec__ would replace it) if it weren’t so widely used to calculate the location of a file within a package instead of using importlib.resources.
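As a rough illustration of the importlib.resources alternative, the sketch below reads a data file that ships inside a package without touching __file__. The helper name read_package_text is made up for this example, and it assumes Python 3.9 or later, where importlib.resources.files() is available.

```python
import importlib.resources


def read_package_text(package, resource_name):
    """Read a text resource bundled inside `package` (Python 3.9+)."""
    # files() returns a traversable object rooted at the package, so no
    # path arithmetic on __file__ is needed.
    resource = importlib.resources.files(package).joinpath(resource_name)
    return resource.read_text(encoding="utf-8")
```

For a hypothetical package mypackage containing data.txt, read_package_text("mypackage", "data.txt") would replace the usual os.path.join(os.path.dirname(__file__), "data.txt") pattern.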