As far as I can tell, there is no standard hook for intercepting imports of already-loaded modules. The existing hooks are only invoked when a module hasn’t been loaded yet. I believe that when you do an import foo, the following currently happens:
__import__ is called
sys.modules is checked to see if the module is already loaded
If the above check fails, call import hooks
This unfortunately makes it impossible to use the import hooks to build a dependency graph of modules. You can discover that A depends on B, but then if there is some other module C that depends on B you won’t get a callback because B was already imported, so that dependency will be missed.
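The behaviour described above can be checked with a small experiment. This is only a sketch: RecordingFinder is a hypothetical name, and colorsys is picked simply because it is a small standard library module with no imports of its own. The finder declines every lookup, so normal loading is unaffected; it only counts how often it is asked.

```python
import importlib.abc
import sys


class RecordingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: records every lookup, then defers to the other finders."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None  # decline, so the regular finders load the module


finder = RecordingFinder()
sys.meta_path.insert(0, finder)
try:
    sys.modules.pop("colorsys", None)  # force the first import to be a real load
    import colorsys  # not in sys.modules yet: the finder is consulted
    first = finder.seen.count("colorsys")
    import colorsys  # already in sys.modules: the finder is skipped
    second = finder.seen.count("colorsys")
finally:
    sys.meta_path.remove(finder)

print(first, second)  # 1 1
```

The count stays at 1 after the second import, which is exactly the missing callback for the "C depends on B" case.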
As far as I can tell, you need to either overwrite builtins.__import__ with your own implementation, which is the old practice that the import hook system was introduced to replace, or create a custom dict-like object to replace sys.modules in order to intercept the checks. It’s unclear how robust the latter would be: I suspect the C APIs may bypass a user-defined __getitem__ or __contains__, but I haven’t verified this. It’s definitely off the beaten path.
Would love to hear any feedback/alternatives. I would like this functionality to enable more robust automatic reloading of modules when py files change on disk.
You want to intercept all import statements, therefore overwriting __import__ seems like the perfect and correct solution. __import__ is not deprecated, nor have I ever seen any discussion of it being deprecated.
The import hooks were introduced because overwriting __import__ does not work well with caching and is difficult to compose - neither of which are really relevant concerns for you.
Oh, I overlooked this line when writing my first comment. This is probably a bad idea, and you will probably not be able to make it robust to a relevant degree (e.g. you can’t hunt down all instances of classes from the module or all references to functions that were imported). I don’t see how this feature would be able to help you with this.
What’s your view on conditional imports? For example:
try:
    import fancymodule as mod
except ModuleNotFoundError:
    import basicmodule as mod
Does this depend on basicmodule? If fancymodule exists, the fallback won’t be called on.
If your answer is “no” (that is, you only care about imports that actually happen), then __import__ is (still) the correct choice I believe.
But if the answer is “yes” (you care about imports that might not happen until a function is called, or only in the case of another module being missing, or anything like that), the best solution would be to parse the script to AST and walk it for Import nodes (and ImportFrom nodes).
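A rough sketch of the AST approach, using only the standard ast module. The helper name static_imports is made up for this example, and it reports names as written in the source (relative imports keep their leading dots) without resolving them to files.

```python
import ast


def static_imports(source):
    """Return the sorted module names mentioned by import statements in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; level counts the dots.
            found.add("." * node.level + (node.module or ""))
    return sorted(found)


example = """
try:
    import fancymodule as mod
except ModuleNotFoundError:
    import basicmodule as mod
"""
print(static_imports(example))  # ['basicmodule', 'fancymodule']
```

Both branches of the conditional import are reported, whether or not the fallback would ever run. Imports done dynamically (for example via importlib.import_module with a computed name) are invisible to this kind of scan.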
In this case it’s good enough for my needs. I don’t have to worry about malicious adversaries, and this is to avoid having to fully restart processes (yes, I measured, a lot) in between top-level requests.
Okay, yeah, then it’s hooks or __import__. Since you’re doing something highly unusual (trying to hook import statements that don’t actually import anything), I don’t see a problem with using __import__ - just be aware of the consequences. In this case, my reading of the warning is that you can confuse “code which assumes the default”; but since you’re not actually CHANGING the behaviour of imports (just LOGGING what they do), anything that makes that assumption should still be correct.
Ah, in this case I control PYTHONPATH, so I can use sitecustomize.py to inject my code before .pth files are processed, assuming the imports in them are still processed with __import__?
Don’t some imports go through the builtins.__import__?
I am just interested in this question in the context of security.
And it seems to me that this API is convenient not only for monitoring but also for changing the behavior of all module loading.
Not only. Such calls can do anything in the context of process privileges.
And redefining dunders may be the most harmless of them.
If we are talking about security, then I don’t see any other way out than either using third-party sandboxed environments or interrupting the execution of third-party C code.