How do I migrate from imp?

hroncok · June 15, 2023, 9:39am

Hello folks. We are upgrading to Python 3.12 in Fedora and I need to port various packages I haven’t written from imp to importlib.

This is an example of such code (from dblatex):

import imp

def load(modname):
    try:
        file, path, descr = imp.find_module(modname, [""])
    except ImportError:
        try:
            file, path, descr = imp.find_module(modname,
                                                [os.path.dirname(__file__)])
        except ImportError:
            raise ValueError("Xslt '%s' not found" % modname)
    mod = imp.load_module(modname, file, path, descr)
    file.close()
    ...

This is what I get (obviously):

Traceback...
    import imp
ModuleNotFoundError: No module named 'imp'

First, I looked at What’s New In Python 3.12 — Python 3.12.1 documentation. It says:

The imp module has been removed. (Contributed by Barry Warsaw in gh-98040.)

The “Porting to Python 3.12” section is silent about this.

In the Python 3.11 documentation for imp (imp — Access the import internals — Python 3.11.7 documentation), I looked up the find_module function. It says:

Deprecated since version 3.3: Use importlib.util.find_spec() instead unless Python 3.3 compatibility is required, in which case use importlib.find_loader(). For example usage of the former case, see the Examples section of the importlib documentation.

But neither of those functions, nor any of the examples, lets me set a custom search path. The only thing that seems to accept a path is also deprecated.

Where do I find guidance on how to move from the simple (but apparently evil) imp module?

AlexWaygood · June 15, 2023, 9:45am

Cc. @vstinner, who I think has been working on improving our docs for migrating from imp to importlib

hroncok · June 15, 2023, 9:47am

So, based on importlib documentation I figured I might be able to replace imp.find_module + imp.load_module with:

spec = importlib.machinery.PathFinder.find_spec(modname, [""])
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
sys.modules[modname] = mod

However, I still don’t know if this:

- is actually correct
- is the one obvious way to do it

(Full patch available in Fedora or upstream.)
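For reference, the dblatex-style search-then-load pattern from the first post could be sketched roughly as below. This is only an illustration under assumptions: the `search_dirs` parameter is hypothetical (it replaces the two nested `imp.find_module()` attempts with a single ordered search), and the module is registered in `sys.modules` before it is executed so that it can find itself there while running.

```python
import importlib.machinery
import importlib.util
import sys


def load(modname, search_dirs):
    """Find modname in search_dirs (in order) and load it.

    Rough stand-in for imp.find_module() + imp.load_module().
    An empty string in search_dirs means the current directory.
    """
    spec = importlib.machinery.PathFinder.find_spec(modname, search_dirs)
    # A spec without a loader is a namespace package (a bare directory);
    # treat that as "not found", since there is no code to execute.
    if spec is None or spec.loader is None:
        raise ValueError("Xslt '%s' not found" % modname)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[modname] = mod
    try:
        spec.loader.exec_module(mod)
    except BaseException:
        # Do not leave a half-initialised module registered.
        sys.modules.pop(modname, None)
        raise
    return mod
```

With this sketch, the original lookup order would be written as `load(modname, ["", os.path.dirname(__file__)])`.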

vstinner · June 16, 2023, 10:13am

Please help me get my proposed importlib.util.load_source() function added to the Python 3.12 stdlib. You can explain your use case in the PR to convince Brett.

vstinner · June 16, 2023, 10:15am

I have the same concern with importlib. The API is not easy to discover or use, so I mostly push random buttons until the code works as I expected, then I don’t touch it ever again.

(on the Internet, nobody knows that I’m a dog!)

vstinner · June 16, 2023, 10:19am

If someone has recipes for porting existing code, please share them in Improve the docs regarding the migration from imp to importlib · Issue #104212 · python/cpython · GitHub. I will try to convert them to actual documentation. In the meantime, you can look at the “deprecation” notes in the Python 3.11 imp documentation.

The problem is that there is no 1-to-1 replacement method. You have to rethink and redesign the code snippet a little bit to adopt the new importlib design which is based on loaders and “spec” objects.

kknechtel · June 16, 2023, 1:24pm

Indeed. Higher-level wrappers are sorely needed. It shouldn’t be complicated to say “here is a path to a .py file; please give me the corresponding module object” (with or without putting it into sys.modules).

Rosuav · June 16, 2023, 2:02pm

If you want that without putting it into sys.modules, that should be pretty easy:

>>> import types
>>> with open("tmp/demo.py", "rb") as f:
...     mod = types.ModuleType("demo")
...     exec(f.read(), mod.__dict__)
... 
>>> mod.f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() missing 1 required positional argument: 'x'
>>> mod.f(42)
hello, world
131

Should be easy enough as a simple function in importlib, although it isn’t technically using the import machinery at all. Let the bikeshedding begin: what SHOULD this be called?
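As a rough sketch of what such a helper might look like as a function (the name `exec_file_as_module` is hypothetical, and this bypasses the import machinery entirely, so no `__spec__`, loader or bytecode caching is involved):

```python
import types


def exec_file_as_module(path, name):
    """Run the source file at path inside a fresh module object.

    The module is returned to the caller and is NOT added to sys.modules.
    """
    with open(path, "rb") as f:
        source = f.read()
    mod = types.ModuleType(name)
    mod.__file__ = path
    # Compile with the real path so tracebacks point at the file.
    code = compile(source, path, "exec")
    exec(code, mod.__dict__)
    return mod
```

Compiling first, rather than passing the raw bytes to `exec()`, is only there to give error messages a useful file name.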

brettcannon · June 16, 2023, 8:39pm

Yes and yes (I think).

Do note that what you’re proposing and what @hroncok is proposing are different things. What you’re proposing is the pure Python file case where you don’t want a normal search performed (i.e. imp.load_module() for just Python code at any file path). What Miro is proposing is search + load (i.e. imp.find_module() + imp.load_module() following import semantics, albeit in specified directories).

And this is why I have not tossed in a ton of helper functions into importlib.util; everyone wants something slightly different. Luckily the abstractions and composability of importlib should be enough that most things are about 4 lines.

And Thomas, as I still think it’s a new feature and thus would need RM approval.

It’s not, hence why it’s documented in 4 lines of code.

People say that, but then everyone wants something slightly different in the various scenarios. For instance, @kknechtel wanted the option to not insert the module into sys.modules. No one has brought up packages versus modules. And this is all over what can be accomplished in 4 lines (yes, I realize not everyone wants to read the docs to understand finders versus loaders, but it’s still at least not a ton of code).

I also want to point out the imp module has been pending deprecation for 10 years and a direct deprecation for 8 years. There hasn’t been a flood of asks until now when people are panicking.

Now, I am totally happy to consider adding things to importlib.util based on what @hroncok and friends discover in their journey of dealing with folks ignoring deprecation warnings for 8 years. But I do want to try and ground it in real-world needs that are common and not on one-off scenarios that can be solved in 4 lines of code, else we are going to end up with a bunch of little functions that don’t get enough usage to warrant me having to maintain them forever (because, let’s be honest, we all know I’m going to be asked to maintain them).

Rosuav · June 16, 2023, 9:05pm

Which is part of what I meant when I said “let the bikeshedding begin”. Name isn’t the only thing that’ll be argued.

But I also want to put it out there that, in some cases, maybe it ISN’T an importlib feature that people want. If it turns out that what you want is 95% handled by a simple exec call, maybe you actually don’t need it to be part of importing.

Exactly. Nobody cares about pending deprecations, it only matters once it’s about to hurt.

brettcannon · June 16, 2023, 10:11pm

One thing I forgot to say about the code sample from @hroncok is it doesn’t handle any module name with . in it correctly (e.g. submodules and subpackages). Key thing is it will assume you’re looking for the tail part of the dotted name in the specified directory, but it won’t set the attribute on the containing package (but the finder should make sure all the spec-related values are specified appropriately).
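A small illustration of that behaviour, using a throwaway directory (the names `pkg`, `sub` and `plugin_dir` are made up for the example): the finder looks only for the tail of the dotted name in the given directory, while the spec still carries the full dotted name and its parent.

```python
import importlib.machinery
import os
import tempfile

with tempfile.TemporaryDirectory() as plugin_dir:
    # Only sub.py exists; there is no "pkg" package anywhere.
    with open(os.path.join(plugin_dir, "sub.py"), "w") as f:
        f.write("VALUE = 1\n")

    spec = importlib.machinery.PathFinder.find_spec("pkg.sub", [plugin_dir])

    found_name = spec.name                      # full dotted name
    found_parent = spec.parent                  # containing package name
    found_file = os.path.basename(spec.origin)  # file located via the tail

print(found_name, found_parent, found_file)
```

Executing such a spec by hand does not import `pkg` or set a `sub` attribute on it, so code that needs dotted names may be better off either rejecting them up front or going through `importlib.import_module()`.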

kknechtel · June 17, 2023, 10:44am

Right:

spec = importlib.util.spec_from_file_location(module_name, file_path)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)

I have code in a current project that incorporates this (minus the sys.modules[module_name] = module part, because I don’t happen to want it in my case).

The point is, it’s clunky and non-obvious. The separation of spec and loader is not useful for the case where no searching is requested. (And indeed, as Chris showed, it can be done with an exec hack instead.) But more importantly, it’s common and expected for the standard library to replace four lines of code with one, and provide a cleaner interface, for use cases that can be reasonably anticipated as common. @rhettinger dedicated several minutes to this in “The Mental Game of Python”.

And really, there’s a lot more than 4 lines of code we might want to replace, to cover the other common use cases. That said, I think those use cases can neatly fit into a single, pragmatic, top-level function interface. Here’s my first cut at it:

import sys
import os.path
from importlib.util import spec_from_file_location, module_from_spec

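def _try_loading(name, paths):
    for path in paths:
        candidate = os.path.join(path, f'{name}.py')
        # spec_from_file_location() does not check that the file exists,
        # so test for it explicitly before building a spec
        if not os.path.isfile(candidate):
            continue
        spec = spec_from_file_location(name, candidate)
        if spec:
            # If a spec was found but the file is invalid, let exceptions propagate
            # (don't keep searching for a different source file that would work)
            module = module_from_spec(spec)
            spec.loader.exec_module(module)
            return module
    raise ModuleNotFoundError(f"No module named {name!r}")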


def _cleanup(name_or_path, paths):
    folder, filename = os.path.split(name_or_path)
    if not folder:
        # TODO: support for dotted package names?
        # Can that even be done if we have opted not to modify sys.path?
        return name_or_path, paths or sys.path
    if paths:
        raise ValueError("redundant path(s) specified")
    name, extension = os.path.splitext(filename)
    if extension != '.py':
        raise ValueError("directly specified file must be a .py file")
    return name, [folder]


def dynamic_import(name_or_path:str, *paths:str, use_cache:bool=False):
    """Dynamically import the specified module from a Python source file,
    directly specifying where to find the file.
    name_or_path -> if it includes a path separator, this is a complete path
                    to the file including its name and extension; *paths must
                    not be provided. Otherwise, it is the name of the module
                    to search for; the filename will be inferred.
    *paths -> if a name was provided, these paths will be searched. Defaults
              to sys.path if a search is required and no paths are provided.
    use_cache -> if set, the name will be looked up in sys.modules before
                 attempting dynamic import, with that result used instead;
                 if dynamic import is attempted and successful, sys.modules
                 will be updated with the result."""
    name, paths = _cleanup(name_or_path, paths)
    if use_cache:
        try:
            return sys.modules[name]
        except KeyError:
            pass # proceed with the actual dynamic import logic
    module = _try_loading(name, paths)
    if use_cache:
        sys.modules[name] = module
    return module

brettcannon · June 20, 2023, 12:15am

Sure, but I don’t know how common that is.

I disagree with that assessment. That line of argument could be used for any four lines of code that exist in the world. My personal guidelines for what should go into Python (stdlib or language) are:

How common is it?
How complex is it to implement for oneself?
Should people even be doing that thing?
Is it actually simpler to have in Python than to leave it to people to implement however they want?
Why doesn’t this make sense up on PyPI?

And the judgment varies depending on a subjective weighing of all of that. This specific topic really varies in almost all 4 areas depending on your view. For instance …

This doesn’t cover the use case that Victor linked to on the issue tracker: they explicitly did not want to assume a .py file extension. There’s also the question as to whether the resulting module should go into sys.modules (not even asking whether you should check there)? So I don’t know if you can claim that this is “common” yet.

I understand why people think their use case is common, but when people start to delve into custom import stuff the commonality diverges really quickly (hence why I designed importlib to be flexible and composable). So I don’t know whether people using imp’s old APIs were doing it because some module was in some odd spot and they didn’t want to tweak sys.path, because the code was in a weird file and they just wanted a way to get a module where they could access the code, or because they used it instead of exec() to simply run code. And I have basically seen examples of all three at this point just in the discussion of this API request. And all of that changes depending on whether you’re wanting to import, load, or execute code.

vstinner · June 20, 2023, 8:19pm

I completed the documentation: What’s New In Python 3.12 — Python 3.13.0a0 documentation

In the PR comments, I wrote recipes for the removed load functions: gh-104212: Explain how to port imp code to importlib by vstinner · Pull Request #105905 · python/cpython · GitHub. I’m not sure if they should be included in the doc.

vstinner · June 25, 2023, 3:25pm

I also added a recipe to replace imp.load_source(): What’s New In Python 3.12 — Python 3.13.0a0 documentation
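For readers who do not want to follow the link, a replacement along the lines of that recipe can be sketched as follows. Treat it as an approximation rather than the exact documented text: passing an explicit `SourceFileLoader` is what lets it work for files that do not end in `.py`.

```python
import importlib.machinery
import importlib.util


def load_source(modname, filename):
    """Load filename as Python source under the name modname."""
    # An explicit loader means the file extension does not matter.
    loader = importlib.machinery.SourceFileLoader(modname, filename)
    spec = importlib.util.spec_from_file_location(modname, filename, loader=loader)
    module = importlib.util.module_from_spec(spec)
    # The module is executed on every call and is not cached.
    # To cache it, add `sys.modules[module.__name__] = module` (and
    # `import sys`) before the exec_module() call below.
    loader.exec_module(module)
    return module
```

Whether to register the result in `sys.modules` is left to the caller, which is one of the points on which use cases in this thread differ.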

I didn’t add recipes for the other load functions and init_builtin(). You can find recipes in my first PR: gh-104212: Explain how to port imp code to importlib by vstinner · Pull Request #105905 · python/cpython · GitHub.

I closed the doc issue. If someone wants a more complete explanation for a specific removed imp function, please open a new issue.

hroncok · July 11, 2023, 10:01am

I’ve made a similar change in Python 3.12 compatibility: Replace the removed imp module with importlib by hroncok · Pull Request #1142 · rpm-software-management/mock · GitHub

         # features later when we prove we need them.
         for plugin in self.plugins:
             if self.plugin_conf.get("{0}_enable".format(plugin)):
-                try:
-                    fp, pathname, description = imp.find_module(plugin, [self.plugin_dir])
-                except ImportError:
+                spec = importlib.machinery.PathFinder.find_spec(plugin, [self.plugin_dir])
+                if not spec:
                     buildroot.root_log.warning(
                         "{0} plugin is enabled in configuration but is not installed".format(plugin))
                     continue
-                try:
-                    module = imp.load_module(plugin, fp, pathname, description)
-                finally:
-                    fp.close()
+                module = importlib.util.module_from_spec(spec)
+                spec.loader.exec_module(module)
+                sys.modules[spec.name] = module
 
                 if not hasattr(module, 'requires_api_version'):
                     raise Error('Plugin "%s" doesn\'t specify required API version' % plugin)

vstinner · July 11, 2023, 10:43am

Usually a module expects to exist in sys.modules while being executed: sys.modules[__name__] in the module should give its own module. So IMO you should store the module in sys.modules before executing it.
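A minimal sketch of that ordering, assuming a spec has already been obtained (the helper name `exec_spec_registered` is hypothetical). The cleanup on failure mirrors what the import system does so that a broken module is not left behind in `sys.modules`.

```python
import importlib.util
import sys


def exec_spec_registered(spec):
    """Create the module for spec, register it, then execute it."""
    module = importlib.util.module_from_spec(spec)
    # Register first: code in the module may look up sys.modules[__name__],
    # and circular imports need to find the partially initialised module.
    sys.modules[spec.name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Remove the half-initialised module again before re-raising.
        sys.modules.pop(spec.name, None)
        raise
    return module
```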

brettcannon · July 11, 2023, 8:42pm

That’s correct to handle circular imports. Relevant code:

github.com/python/cpython/blob/3590c45a3d564b3182ae21d899bae81c49d685a2/Lib/importlib/_bootstrap.py#L817

                module.__package__ = spec.name.rpartition('.')[0]
        except AttributeError:
            pass
    if getattr(module, '__spec__', None) is None:
        try:
            module.__spec__ = spec
        except AttributeError:
            pass
    return module

def _load_unlocked(spec):
    # A helper for direct use by the import system.
    if spec.loader is not None:
        # Not a namespace package.
        if not hasattr(spec.loader, 'exec_module'):
            msg = (f"{_object_name(spec.loader)}.exec_module() not found; "
                    "falling back to load_module()")
            _warnings.warn(msg, ImportWarning)
            return _load_backward_compatible(spec)

    module = module_from_spec(spec)

csm10495 · July 12, 2023, 4:14am

Am I crazy for still calling __import__() like so? Every time I need to import by path, I look at the docs for imp and now importlib, get confused, and think: why can’t I just do something like this?

import sys
import pathlib

def import_file(f):
    p = pathlib.Path(f)
    old_sys_path = sys.path[:]
    # Use the Path object here; the raw argument may be a plain string
    sys.path.insert(0, str(p.parent.resolve()))
    try:
        # p.stem is the file name without its suffix
        return __import__(p.stem)
    finally:
        sys.path = old_sys_path

And wind up doing something like ^

encukou · July 12, 2023, 6:31am

You’re not crazy, but __import__ is discouraged in favour of importlib.import_module. (__import__ does some weird things the import statement needs, or needed in the past. You almost never want the extra complexity.)

Beware the consequences. For example, if you import from a temporary file, the above could add /tmp/ (or C:\TEMP or something) to sys.path, and that’s not good at all.
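A sketch of the same idea with importlib.import_module, under the assumption that temporarily extending sys.path is acceptable in your situation (the helper name `import_file` is taken from the post above; everything else is illustrative). The directory is on sys.path only for the duration of the import, but anything the imported module itself imports during that window can still be picked up from there, and the result stays cached in sys.modules under its bare name.

```python
import importlib
import pathlib
import sys


def import_file(f):
    """Import the file f as a top-level module named after its stem."""
    p = pathlib.Path(f).resolve()
    directory = str(p.parent)
    sys.path.insert(0, directory)
    try:
        return importlib.import_module(p.stem)
    finally:
        # Remove only the entry added above, leaving any other changes
        # made to sys.path during the import untouched.
        try:
            sys.path.remove(directory)
        except ValueError:
            pass
```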

When porting some dependency that used imp and broke in 3.12, we want to match the old behaviour as closely as possible.