I am surprised by the obviously unbounded optimism in this thread.
Let’s take a step back.
Please read to the end before replying. This is a longer text.
Lazy imports: a great thing
I think that most people in the discussion regard lazy imports as a good thing, including myself.
Where we have different opinions is around the method of enabling such lazy imports and in how we weigh the possible problems which may arise from the lazy aspects of imports (exceptions, registrations, cache setups, etc. getting deferred as well).
Getting a good grasp of the latter is important for deciding which approach to take for the former.
Import and lazy binding side effects
First, I think it’s important to agree on the fact that in Python, programmers expect a module to be readily usable after import. This is very much unlike, e.g., C, where you’d always expect to have to call some kind of init function to have the library set up some global state.
With this in mind, it is clear that module import time side effects are in fact more common than not. The side effects could be loading DLLs and setting them up, reading config files or importing config files (in case the configuration uses Python as config language), registering codecs, import handlers and atexit functions, setting up loggers and event loops, creating large mapping dictionaries or long lists of constants / enums, etc.
With the current eager default logic, all of these side effects happen deterministically at import time. You know exactly when and where in your code you have to catch possible exceptions, and you (usually) know how to deal with them at that point in time and at that point in your code.
With lazy imports, the import itself will happen when the module is first used in other code. This could be for an attribute, a function or a class, among other things.
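As a rough illustration of how a deferred registration can go unnoticed, the sketch below approximates lazy imports with the standard library's `importlib.util.LazyLoader`, which postpones running a module's code until the first attribute access; it is not the PEP 690 mechanism. The in-memory `StringLoader`, the module source and the codec name `demo_rot` are all hypothetical. Since `codecs.lookup()` never touches the plugin module, the codec only gets registered once some other code happens to read one of the module's attributes.

```python
import codecs
import importlib.abc
import importlib.util
import sys

# Hypothetical module whose only job is a registration side effect.
PLUGIN_SOURCE = '''
import codecs

CODEC_NAME = "demo_rot"

def _search(name):
    # Map the hypothetical "demo_rot" name onto the built-in rot13 codec.
    return codecs.lookup("rot13") if name == CODEC_NAME else None

codecs.register(_search)
'''


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes module source held in a string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(self.source, module.__dict__)


def lazy_import_from_source(name, source):
    """Create a module whose code only runs on first attribute access."""
    spec = importlib.util.spec_from_loader(name, StringLoader(source))
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # arms the lazy load, runs no module code
    return module


plugin = lazy_import_from_source("codec_plugin", PLUGIN_SOURCE)

# The "import" has happened, but the codec registration has not.
try:
    codecs.lookup("demo_rot")
    registered_before = True
except LookupError:
    registered_before = False

# Touching any attribute runs the module code, including codecs.register().
_ = plugin.CODEC_NAME
registered_after = codecs.lookup("demo_rot").name
print(registered_before, registered_after)  # False rot-13
```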
Example
Let’s take an example and assume the lazy import is for a config module, say `config.py`, which loads a float value from the OS environment:
```python
import os
factor = float(os.environ.get('FACTOR', 1.0))
```
If the OS env var is set to a value which cannot be read as a float, a `ValueError` is raised by the config module.
With the standard import, this `ValueError` would happen when `config` is imported.
With lazy imports, this could well happen in other code that uses the config value, e.g.:
```python
import math
import config

def parse_float(value):
    try:
        return float(value) * config.factor
    except ValueError:
        return math.nan
```
The example shows that the import is no longer deterministic; it is in fact data driven, since the first import of `config` would only happen when the code first needs to parse a float.
Now, if `config.factor` raises a `ValueError` as a result of reading the OS env var, the error would bubble up the stack and cause the above function to return `math.nan` regardless of whether `value` could be parsed or not.
The result is silent data corruption, pretty much the worst thing that can happen in any data processing workflow.
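The scenario can be reproduced with today's Python by using `importlib.util.LazyLoader` as a stand-in for PEP 690 (an approximation, not the PEP's implementation). The sketch below assumes a hypothetical in-memory `StringLoader` in place of a `config.py` file on disk, sets `FACTOR` to junk in the process environment, and loads the same module source once eagerly and once lazily. The eager import fails at the import site, while the lazy one lets `parse_float("2.5")` quietly return `nan` for a perfectly valid input.

```python
import importlib.abc
import importlib.util
import math
import os
import sys

# Stand-in for the config.py module from the text.
CONFIG_SOURCE = '''
import os
factor = float(os.environ.get('FACTOR', 1.0))
'''


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes module source held in a string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None

    def exec_module(self, module):
        exec(self.source, module.__dict__)


def load_config(lazy):
    """Import the config stand-in either eagerly or lazily."""
    spec = importlib.util.spec_from_loader("config", StringLoader(CONFIG_SOURCE))
    if lazy:
        spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules["config"] = module
    spec.loader.exec_module(module)  # eager: runs the code; lazy: only arms it
    return module


os.environ["FACTOR"] = "not-a-float"

# Eager import: the ValueError surfaces right at the import site.
try:
    load_config(lazy=False)
    eager_error = None
except ValueError as exc:
    eager_error = exc

# Lazy import: the "import" succeeds and the error is deferred.
config = load_config(lazy=True)


def parse_float(value):
    try:
        return float(value) * config.factor  # first access runs config's code
    except ValueError:
        return math.nan


result = parse_float("2.5")
print(eager_error is not None, math.isnan(result))  # True True
```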
Depending on how the lazy import logic is coded, the failed import could have various effects, e.g.
- We have a half-loaded `config` module in `sys.modules`, and the next use of `config.factor` would trigger an `AttributeError` or `NameError`, which looks odd to the programmer, since the function reads perfectly fine.
- The lazy loader does not add the `config` module to `sys.modules` and makes another attempt at importing the module, again raising a `ValueError`.
- The lazy loader marks the module as failing and raises, e.g., an `ImportError`. This could then cause other weird effects further up the stack.
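The second and third behaviours can be sketched with a small, purely hypothetical proxy object: `LazyModuleProxy`, `LazyImportFailed` and the `policy` and `importer` parameters are invented for illustration and do not correspond to any real API. The first behaviour (a half-loaded module left in `sys.modules`) depends on loader internals and is not modelled here. A fake importer that always raises `ValueError` stands in for importing `config.py` with a bad `FACTOR` value.

```python
import importlib


class LazyImportFailed(ImportError):
    """Hypothetical error for a module whose lazy import failed earlier."""


class LazyModuleProxy:
    """Hypothetical lazy-import proxy with two failure policies.

    policy="retry": failures are not remembered, so every attribute access
                    re-runs the import and re-raises the original error.
    policy="mark":  the first failure is remembered, and later accesses
                    raise LazyImportFailed without retrying.
    """

    def __init__(self, name, policy="retry", importer=importlib.import_module):
        self._name = name
        self._policy = policy
        self._importer = importer
        self._module = None
        self._failure = None

    def _load(self):
        if self._module is None:
            if self._failure is not None:
                raise LazyImportFailed(
                    f"lazy import of {self._name!r} failed earlier"
                ) from self._failure
            try:
                self._module = self._importer(self._name)
            except Exception as exc:
                if self._policy == "mark":
                    self._failure = exc
                raise
        return self._module

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        return getattr(self._load(), attr)


calls = []


def broken_config_importer(name):
    # Hypothetical stand-in for importing config.py with FACTOR set to junk.
    calls.append(name)
    raise ValueError("could not convert string to float: 'junk'")


def error_names(proxy, attempts=2):
    """Access proxy.factor repeatedly and record which exception each raises."""
    names = []
    for _ in range(attempts):
        try:
            proxy.factor
        except Exception as exc:
            names.append(type(exc).__name__)
    return names


retry_errors = error_names(LazyModuleProxy("config", "retry", broken_config_importer))
mark_errors = error_names(LazyModuleProxy("config", "mark", broken_config_importer))
print(retry_errors)  # ['ValueError', 'ValueError']
print(mark_errors)   # ['ValueError', 'LazyImportFailed']
```

Note that the "retry" policy re-executes the failing module code on every access (two importer calls here), while "mark" executes it only once.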
Mitigating problems
Now, if we know that `config` can fail in this way, we could mark this module as not-safe-for-lazy-imports (NSFLI) by adding the module name to the unsafe set before importing the module, e.g.
```python
sys.lazy_import_unsafe.add('config')
# only the module "config" matches
```
That is, if we control the code containing the parser.
Let’s assume this parser is part of a 3rd party package called fast_csv.
The package author could set the marker in the package’s `__init__.py`:
```python
sys.lazy_import_unsafe.add('fast_csv.config')
# only the module "fast_csv.config" matches
```
or the author could mark all modules in the package as unsafe, because she is also playing other tricks which are not compatible with lazy imports.
She could then let the lazy loader know, by adding the whole package:
```python
sys.lazy_import_unsafe.add('fast_csv.')
# the trailing dot indicates: any module with this prefix matches
```
Now, let’s assume that the package author doesn’t care about lazy imports, but your CLI has to use the package and you still want to benefit from lazy imports.
PEP 690 only provides an all-or-nothing switch, so the CLI would not be able to use the `fast_csv` package and benefit from lazy imports at the same time.
However, with the more flexible approach shown here, you could get around this by putting the marker inside your CLI code after analysis of the `fast_csv` package:
```python
sys.lazy_import_unsafe.add('fast_csv.')
```
Now, what if you don’t want to bother with all this and can accept longer loading times, say, because you are working on a long-running server?
Then you’d simply switch off lazy loading for everything:
```python
sys.lazy_import_unsafe.add('')
# the empty string matches all modules; alternatively, "." could be made
# to have this meaning
```
What if I want to use lazy imports for a specific module or package? Well, we could have a second set, marking safe modules/packages:
```python
sys.lazy_import_safe.add('utils')
# matches just the "utils" module
# unsafe matching overrides safe matching to be on the safe side :-)
```
Aside: The reason for using a set of strings and not regular expressions is to avoid the overhead of loading the `re` module for this purpose.
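A minimal sketch of the matching rules described above, using only plain string operations as the aside suggests. The function names, the `default` parameter (standing in for whatever global setting applies when neither set matches) and the use of plain sets instead of the proposed `sys.lazy_import_safe` / `sys.lazy_import_unsafe` attributes are assumptions made for illustration.

```python
def matches(name, patterns):
    """Return True if module `name` matches any marker in `patterns`.

    - "config"    matches only the module "config"
    - "fast_csv." (trailing dot) matches any module with that prefix
    - ""          matches every module
    """
    if name in patterns or "" in patterns:
        return True
    return any(p.endswith(".") and name.startswith(p) for p in patterns)


def use_lazy_import(name, safe, unsafe, default):
    """Decide whether module `name` should be imported lazily."""
    if matches(name, unsafe):
        return False  # unsafe matching overrides safe matching
    if matches(name, safe):
        return True
    return default  # assumption: fall back to the global setting


# Stand-ins for the proposed sys.lazy_import_safe / sys.lazy_import_unsafe.
lazy_import_safe = {"utils", "fast_csv.config"}
lazy_import_unsafe = {"config", "fast_csv."}

for module in ("config", "fast_csv.config", "fast_csv", "utils", "json"):
    print(module, use_lazy_import(module, lazy_import_safe, lazy_import_unsafe, default=True))
```

Under this literal prefix rule, `'fast_csv.'` does not match the top-level `fast_csv` module itself, so the package’s `__init__` falls through to the default; a real implementation would have to settle that case explicitly.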
What am I trying to say with all this?
The global switch solution proposed by PEP 690 is not adequate to solve real-life issues with lazy imports. The suggested solution should provide a better way to approach those real-life problems or at least hint at a better solution.
- There are cases where you have to use packages which are known not to play well with lazy imports, and it’s quite possible that these are not marked as unsafe by their maintainers.
- There are also cases where you may only want lazy loading to apply to specific modules where you know a lot happens at import time, e.g. a parser module compiling lots of regular expressions, a module loading a fairly large DLL which you don’t always need, or a plugin which pulls in other large packages but is only needed when enabled. As in the above case, it’s possible that the code in question (e.g. the plugin) does not include a marker declaring it safe for lazy imports.
What else can we do to make this safer?
The problem mentioned above, with unexpected exceptions changing the flow of execution, can be mitigated by having the lazy loader catch all such exceptions during the actual import and wrap them in a new `LazyImportError` exception, which inherits directly from `BaseException`.
That way, misinterpretation could not happen that easily, and debugging would be simplified.
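A minimal sketch of the proposed `LazyImportError`. The `run_deferred_import()` hook and `broken_config_body()` are hypothetical stand-ins for the point at which a lazy loader would execute a module's code; no such hook exists today. Because the wrapper derives from `BaseException`, the `except ValueError:` in `parse_float()` no longer swallows the failed import, and the original error stays available as `__cause__`.

```python
import math


class LazyImportError(BaseException):
    """Proposed wrapper for exceptions raised while executing a lazy import.

    Deriving from BaseException (not Exception) means ordinary
    `except ValueError:` or `except Exception:` handlers in user code
    cannot swallow it by accident.
    """


def run_deferred_import(module_name, body):
    """Hypothetical hook: run a lazy import's module code, wrapping failures."""
    try:
        return body()
    except Exception as exc:
        raise LazyImportError(f"lazy import of {module_name!r} failed") from exc


def broken_config_body():
    # Hypothetical stand-in for executing config.py with FACTOR set to junk.
    return float("not-a-float")


def parse_float(value):
    try:
        # Simulates the first access to config.factor triggering the import.
        return float(value) * run_deferred_import("config", broken_config_body)
    except ValueError:
        return math.nan


try:
    parse_float("2.5")
    outcome = "returned a value"
except LazyImportError as exc:
    outcome = f"{type(exc).__name__} caused by {type(exc.__cause__).__name__}"
print(outcome)  # LazyImportError caused by ValueError
```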