Cannot load a file that defines a dataclass using SourceFileLoader

Hi,

Let me start by saying that it is entirely possible that I don’t know what I’m doing and that there is a much better and cleaner way to do this :)

A bit of context: I’m working on some updates to my nox-dump tool. It will at some point be more or less obsoleted by a new release of Nox that already has JSON list output, but it may still be useful for people using older versions of Nox. The way nox-dump currently works, it has to make Python actually execute “noxfile.py”, not just parse it, so that the functions declared with the Nox decorators register all the Nox environments in Nox’s internal structures as a side effect.

TL;DR: I need to load a Python file and let the current interpreter evaluate its function definitions. If I do that using SourceFileLoader, then things break if the loaded file contains a dataclass.

There is a minimal reproduction at Peter Pentchev / load-things · GitLab. The way I have currently written the load_the_thing() function produces this output:

[roam@straylight ~/lang/python/misc/py-dataclass-load]$ .tox/functional/bin/load-things loaded_dataclass.py
Traceback (most recent call last):
  File "/home/roam/lang/python/misc/py-dataclass-load/.tox/functional/bin/load-things", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/roam/lang/python/misc/py-dataclass-load/.tox/functional/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roam/lang/python/misc/py-dataclass-load/.tox/functional/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/roam/lang/python/misc/py-dataclass-load/.tox/functional/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roam/lang/python/misc/py-dataclass-load/.tox/functional/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roam/lang/python/misc/py-dataclass-load/.tox/functional/lib/python3.11/site-packages/load_things/__main__.py", line 46, in main
    mod: Final = load_the_thing(path, from_source=from_source)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roam/lang/python/misc/py-dataclass-load/.tox/functional/lib/python3.11/site-packages/load_things/__main__.py", line 32, in load_the_thing
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/roam/lang/python/misc/py-dataclass-load/loaded_dataclass.py", line 10, in <module>
    @dataclasses.dataclass
     ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 1220, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 1210, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 947, in _process_class
    and _is_type(type, cls, dataclasses, dataclasses.KW_ONLY,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 712, in _is_type
    ns = sys.modules.get(cls.__module__).__dict__
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute '__dict__'. Did you mean: '__dir__'?
[roam@straylight ~/lang/python/misc/py-dataclass-load]$

So after a couple of frames related to the way the click library invokes the command-line tool’s main() function, we get to load_the_thing(), it invokes loader.exec_module(), and the dataclass decorator breaks since the module does not seem to be initialized enough.
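The last frame of that traceback can probably be reproduced without dataclasses at all. Here is a minimal sketch, assuming a loader set up through SourceFileLoader, spec_from_loader() and module_from_spec(); the actual load_the_thing() may differ, and the module name unregistered_demo and its one-line source are made up for illustration. The loaded file only asks sys.modules about itself while it is being executed:

```python
import importlib.machinery
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical module name and contents, not taken from the reproduction repo.
MODULE_NAME = "unregistered_demo"
SOURCE = "import sys\nFOUND_SELF = sys.modules.get(__name__)\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "unregistered_demo.py"
    path.write_text(SOURCE, encoding="utf-8")

    loader = importlib.machinery.SourceFileLoader(MODULE_NAME, str(path))
    spec = importlib.util.spec_from_loader(MODULE_NAME, loader)
    mod = importlib.util.module_from_spec(spec)
    # Nothing has put the module into sys.modules at this point.
    loader.exec_module(mod)

# The module could not find itself while it was being executed.
print(mod.FOUND_SELF)
```

This is the same lookup that dataclasses.py does with cls.__module__ in the last frame: the module object exists, but sys.modules.get() has nothing to return for its name.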

So… am I even on the right path? Should I use a different method, not a SourceFileLoader? Or have I encountered some kind of problem that seems to be present at least in Python versions 3.8 through 3.11?

Thanks for reading this far, and thanks in advance for any help or advice!

…so, of course, the minute I posted this, after spending some time on constructing a minimal test case, it kind of hit me… Okay, so it seems to work if I add sys.modules[mod.__name__] = mod immediately before the loader.exec_module() invocation… but is this really what I want to do? :) Do I really want to pollute sys.modules with something that is not really a module as such, although I do want to kind of use it as one? So again, is there a better way to do this whole thing?

…and once again replying to myself: this is exactly what Nox itself does. Oof. OK, so… sorry for the noise, I guess?..
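For anyone finding this later, here is a minimal sketch of that approach. It follows the "Importing a source file directly" recipe from the importlib documentation, which also registers the module in sys.modules before executing it. The helper name load_source_file(), the module name loaded_dataclass_demo and the file contents are hypothetical stand-ins, not the code from the reproduction repo, and removing the entry again on failure is my own addition:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_source_file(name: str, path: Path) -> ModuleType:
    """Execute the file at `path` as a module called `name` and return it."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    mod = importlib.util.module_from_spec(spec)
    # Register first, so that code running during execution (such as the
    # dataclass decorator) can find the module through sys.modules.
    sys.modules[name] = mod
    try:
        spec.loader.exec_module(mod)
    except BaseException:
        # Do not leave a half-initialized module behind on failure.
        sys.modules.pop(name, None)
        raise
    return mod


# Hypothetical stand-in for a file that defines a dataclass.
SOURCE = '''\
from __future__ import annotations

import dataclasses


@dataclasses.dataclass
class Point:
    x: int
    y: int
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "loaded_dataclass_demo.py"
    path.write_text(SOURCE, encoding="utf-8")
    mod = load_source_file("loaded_dataclass_demo", path)

point = mod.Point(1, 2)
print(point)
```

If polluting sys.modules is a concern, one option is to pick a name that cannot clash with a real import and to remove the entry once the side effects you need have happened. Whether that is safe depends on what the loaded file does later, so treat it as a judgment call.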

Try putting top-level code in a module that looks up the module’s own name in sys.modules:

example.py

from sys import modules
print("during module import, I am already present:", modules['example'])

Now when we try it out from the same directory:

$ python
Python 3.11.2 (main, Apr  5 2023, 03:08:14) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import example
during module import, I am already present: <module 'example' from '/path/to/example.py'>
>>> 

I can’t tell you, though, why the import system would rely on the module object already being there after it has already decided to load the file… especially considering that you can replace it, and the import system won’t put the original back: the rest of the file still attaches its attributes to the original module object, which is then discarded:

example.py

from sys import modules
modules['example'] = 1
x = 2

and then

$ python
Python 3.11.2 (main, Apr  5 2023, 03:08:14) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import example
>>> example.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'x'
>>> example
1

(Edit: I think maybe it has something to do with detection of partially-initialized modules from circular imports?)
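The circular-import guess can be checked with a small experiment. This is a sketch with two made-up modules, circ_demo_a and circ_demo_b, written to a temporary directory; the names and contents are assumptions for illustration only. The first module imports the second, which imports the first one back and records what it sees at that moment:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical modules: a imports b, and b imports a right back.
A_SOURCE = "import circ_demo_b\nvalue = 1\n"
B_SOURCE = (
    "import circ_demo_a\n"
    "# Record whether a had finished executing when b imported it.\n"
    "a_had_value = hasattr(circ_demo_a, 'value')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "circ_demo_a.py").write_text(A_SOURCE, encoding="utf-8")
    Path(tmp, "circ_demo_b.py").write_text(B_SOURCE, encoding="utf-8")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        import circ_demo_a
        import circ_demo_b
    finally:
        sys.path.remove(tmp)

# b got the partially initialized module object straight out of sys.modules.
print(circ_demo_b.a_had_value)  # False
# Once the outer import finished, a was complete.
print(circ_demo_a.value)  # 1
```

The inner import does not start executing circ_demo_a a second time because the module is already in sys.modules, which is what keeps the cycle from recursing forever. The import system documentation gives this as the reason the module is registered before its code runs.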