Guidance creating custom python importer

djcopley · January 16, 2024, 10:20pm

Hello Python wizards! I’m in the process of developing a pluggable Python framework and have encountered the following issue: plugins are designed to be mix-and-match interchangeable, and it is mandatory for them to include all of their dependencies. The directory for dependencies is specified by the __deps_dir__ magic attribute in the top-level plug#/__init__.py file. Currently, I add each dependencies directory to the Python path. While this works, if two different plugins ship two different versions of the same package, whichever version gets imported first is used by both plugins.
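
To make the collision concrete, here is a minimal sketch that reproduces it with throwaway directories. The module name `fakedep_demo` and the helper `make_deps_dir` are made up for illustration; they stand in for a real dependency such as requests.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_deps_dir(root: Path, plugin: str, version: str) -> str:
    """Create <root>/<plugin>/deps/fakedep_demo.py exposing a VERSION constant."""
    deps = root / plugin / "deps"
    deps.mkdir(parents=True)
    (deps / "fakedep_demo.py").write_text(f"VERSION = {version!r}\n")
    return str(deps)


root = Path(tempfile.mkdtemp())
deps1 = make_deps_dir(root, "plug1", "2.2.0")
deps2 = make_deps_dir(root, "plug2", "2.3.0")

# What the framework currently does: every deps dir goes on sys.path.
sys.path[:0] = [deps1, deps2]

first = importlib.import_module("fakedep_demo")   # as if imported by plug1
second = importlib.import_module("fakedep_demo")  # as if imported by plug2

# The second import is served from sys.modules, so both names point at
# the copy found first on sys.path.
print(first.VERSION, second.VERSION, first is second)  # 2.2.0 2.2.0 True
```

Reordering sys.path would not help: once the name is in sys.modules, every later import of that name returns the cached module.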

Here is an example of what a populated plugin directory might look like:

.
└── plugins
    ├── plug1
    │   ├── __init__.py
    │   ├── deps
    │   │   ├── requests
    │   │   └── requests-2.2.0.dist-info
    │   └── plugin
    │       └── __init__.py
    └── plug2
        ├── __init__.py
        ├── deps
        │   ├── requests
        │   └── requests-2.3.0.dist-info
        └── plugin
            └── __init__.py

In the past, we directed plugin developers to manually vendorize their dependencies. However, after delving deeply into importlib and import hooks, I believe there’s a viable solution. I propose using a custom MetaPathFinder and loader to disambiguate these dependencies, providing each plugin with its own copy of the module.

Here’s what I’m thinking:

1. Traverse the plugins directory, executing each top-level __init__.py file (e.g., plug1/__init__.py and plug2/__init__.py).
   1.1 Store the __deps_dir__ attribute if the module has one.
2. Recursively go through all Python modules in the plugin/ directory and execute each of them.
3. During the execution of each module, the import machinery will trigger with each import. A custom finder intercepts this call, searching in the plugin’s __deps_dir__ (not sure where this will be stored? see caveats).
4. The loader needs to distinguish the module name in sys.modules. My idea is to prepend the plugin name before the imported module. I’ve seen some special handling of dot-separated modules in the importlib source so I’m not sure if using the scheme plug1.requests, etc. might have unintended effects.

Caveats:

- There’s no prescribed way to identify which plugin initiated the import.
  - Maybe a context manager with a ContextVar?
- If two plugins import a module that holds global state, each plugin gets its own copy of that state, so things may get double registered/executed.
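
The global-state caveat can be illustrated with a small sketch. The file `registry_dep.py` and the helper `load_copy` are invented for this example; the point is only that executing the same source under two names yields two independent module objects.

```python
import importlib.util
import tempfile
from pathlib import Path

# A stand-in dependency that keeps module-level state (a registry list).
src = Path(tempfile.mkdtemp()) / "registry_dep.py"
src.write_text(
    "HANDLERS = []\n"
    "\n"
    "def register(name):\n"
    "    HANDLERS.append(name)\n"
)


def load_copy(alias: str):
    """Execute the same source file as a fresh module object named alias."""
    spec = importlib.util.spec_from_file_location(alias, str(src))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # module-level code runs once per copy
    return module


copy1 = load_copy("plug1.registry_dep")
copy2 = load_copy("plug2.registry_dep")

copy1.register("a")
print(copy1.HANDLERS, copy2.HANDLERS)  # ['a'] []
```

Anything the dependency does at import time (registering handlers, patching, opening resources) therefore happens once per plugin, and state set through one copy is invisible to the other.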

I welcome any guidance, recommendations, examples, or alternative approaches you may have! Thanks!

kknechtel · January 17, 2024, 12:01am

I think you may find it easier to just use importlib to create the module instances, and maintain your own module cache (which can then use custom semantics - assuming that you want to cache loaded modules at all!), rather than trying to manipulate the ordinary import process so that the default cache works. Especially if you’re trying to provide for “vendoring”, explicitly allowing multiple versions of the same package to be imported simultaneously, etc.
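
A minimal sketch of that suggestion, under stated assumptions: `PluginModuleCache` is a hypothetical helper, it handles only single-file modules, and the calling code asks the cache for a module instead of writing an ordinary import statement.

```python
import importlib.util
from pathlib import Path


class PluginModuleCache:
    """Per-plugin module cache kept separate from sys.modules (hypothetical)."""

    def __init__(self):
        self._modules = {}  # maps (plugin name, module name) -> module object

    def load(self, plugin, name, deps_dir):
        key = (plugin, name)
        if key in self._modules:
            return self._modules[key]
        path = Path(deps_dir) / f"{name}.py"  # assumption: single-file module
        spec = importlib.util.spec_from_file_location(f"{plugin}.{name}", str(path))
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {name!r} for plugin {plugin!r}")
        module = importlib.util.module_from_spec(spec)
        self._modules[key] = module  # cache before executing, like the import system
        try:
            spec.loader.exec_module(module)
        except BaseException:
            del self._modules[key]  # do not keep a half-initialized module
            raise
        return module
```

Because nothing is written to sys.modules, two plugins can hold different versions under the same module name. Real packages with internal absolute imports would need more than this.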

djcopley · January 20, 2024, 6:03pm

Thanks for your response. Let me just make sure we understand each other correctly. I currently use importlib to load all the top-level plugins, and I agree that in this case it is the right approach. The custom importer is meant to handle the third-party import statements inside the plugins.

As an example, here is a highly reductive view of what the plugin importing looks like:

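import importlib.util
import sys

def load_plugins(plugins):
    for plugin_spec in plugins:
        module = importlib.util.module_from_spec(plugin_spec)
        # Register the module before executing it, as the import system does.
        sys.modules[plugin_spec.name] = module
        plugin_spec.loader.exec_module(module)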

Plugin names are assumed unique, so no special cache is needed to prevent name collisions.

The problem is that each plugin has imports that occur during the exec_module phase. Let’s suppose the plugin being imported is named “plug1” from my initial post. It is going to try to import “requests” when I exec_module and it expects version 2.2 of that library (assume that it is incompatible with version 2.3 even though it probably isn’t in this case).

My current understanding is that a custom importer would be required to change the import behavior for each module and its dependencies. Am I incorrect in this assumption?

kknechtel · January 21, 2024, 3:55am

Ah, that’s definitely different, then. But it’s not clear to me why the overall design of the system should require plugins to do this.

Okay; and what should happen instead if the local environment has an incompatible version of the library? Do you want it to download the other version and… somehow use it without installing it? Figure out some way to “install” it into a separate environment, and then use it somehow? Instruct Pip (remember to call a subprocess and don’t try to use it as a library) to upgrade/downgrade the library? (Should it be restored later? When? It’s not designed to support having multiple versions in parallel, you know…) Something else?

Also, how should your system know what version of the library the plugin wants?

djcopley · January 21, 2024, 5:05am

> Do you want it to download the other version and… somehow use it without installing it? Figure out some way to “install” it into a separate environment, and then use it somehow?

Each plugin brings a copy of their dependencies with them in the “deps” dir. The challenge at hand is to ensure that each plugin is compelled to obtain a distinct copy of the dependencies it brings along.
Referring to the directory tree in my initial post, when plug1 imports “requests,” its version 2.2 copy is bound to its local scope. Similarly, when plug2 imports “requests,” its version 2.3 copy is bound to its local scope.

> Also, how should your system know what version of the library the plugin wants?

Plugins define a magic attr that tells the framework where its dependencies are located.

Some additional context

A typical solution to this issue would involve using pip to identify and install a version that is compatible with all plugins. However, due to distribution constraints, utilizing pip or a package manager is not feasible. Consequently, plugins carry their own dependencies with them.

kknechtel · January 21, 2024, 5:10am

They should be able to solve this trivially by just using relative imports.

However, if you can’t trust the plugin code - if you need to audit what its imports are doing, and potentially re-route them to e.g. use a temporary “sandbox” sys.path - then yes, that’s where a custom import hook would come in, I agree. Maybe. Maybe it’s enough to just hack sys.path according to the magic dependency-location attribute before executing the module, and restore it afterwards. Except you’d need to determine the attribute first…
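
The "hack sys.path and restore it afterwards" idea might look something like the sketch below. `isolated_deps` is a hypothetical name, and the sketch assumes `deps_dir` is an absolute path that has already been determined. It also evicts the freshly imported dependency modules from sys.modules, which goes slightly beyond what is described above but is needed for the next plugin to get its own copy.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def isolated_deps(deps_dir):
    """Put deps_dir first on sys.path for the block, then hide what it loaded."""
    before = set(sys.modules)
    prefix = os.path.join(deps_dir, "")  # trailing separator avoids prefix clashes
    sys.path.insert(0, deps_dir)
    try:
        yield
    finally:
        with contextlib.suppress(ValueError):
            sys.path.remove(deps_dir)
        # Evict modules loaded from deps_dir so the next plugin re-imports
        # its own copy; the plugin keeps the references it already bound.
        for name in set(sys.modules) - before:
            origin = getattr(sys.modules[name], "__file__", None) or ""
            if origin.startswith(prefix):
                del sys.modules[name]
```

The framework would run `plugin_spec.loader.exec_module(module)` inside `with isolated_deps(deps_dir):`. This only covers imports that run while the plugin module is executing; imports deferred into functions and called later would see the normal sys.path, and a dependency already imported by the host is not replaced.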

Just keep in mind that all of this can’t even remotely secure against malice, only against developers who want to write absolute imports for their vendored dependencies.
