This is an idea I have been working on, trying to make a prototype. I think it is relevant to the topic of “Postponed annotations break inspection of dataclasses” discussed on python-dev. Perhaps it is too radical a change, but I think it should at least be considered.
Currently, bytecode evaluation in CPython deals with frames and uses the f_globals and f_builtins properties to handle LOAD_GLOBAL, etc. I wonder: wouldn’t it make more sense to give frames an f_namespace property that replaces those two? In the normal case, f_namespace would be a module object. The module would take care of falling back to a lookup in the builtins.
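To make the idea concrete, here is a minimal Python-level sketch of that fallback behaviour. The class and method names (ModuleNamespace, md_dict, md_builtins, load_global) are my own inventions for illustration, not anything in CPython:

```python
# Hypothetical sketch: a module-like namespace object that handles
# LOAD_GLOBAL-style lookups itself, falling back to builtins.
import builtins


class ModuleNamespace:
    def __init__(self, md_dict, md_builtins=None):
        self.md_dict = md_dict
        # default to the real builtins namespace when none is given
        self.md_builtins = md_builtins if md_builtins is not None else vars(builtins)

    def load_global(self, name):
        # what the frame would delegate to instead of consulting
        # f_globals and f_builtins separately
        try:
            return self.md_dict[name]
        except KeyError:
            try:
                return self.md_builtins[name]
            except KeyError:
                raise NameError(name) from None
```

With this shape, the frame only needs one reference (the namespace), and the globals-then-builtins fallback lives in one place.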
To me, this seems a cleaner design. Maybe we can find ways that the f_namespace object can more efficiently handle LOAD_GLOBAL and friends. E.g. the code object could have an optimized version that uses slot offsets and that code gets used if it matches the module (i.e. the module has matching slots). Since you can exec the code in a different namespace, we have to detect that and switch to the non-slot based lookups. That doesn’t seem too hard to do.
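The slot-offset idea with the exec-in-a-different-namespace guard could look something like the following. This is purely speculative pseudocode in Python (SlottedModule, specialize, and the guard shape are all assumptions, not an actual design):

```python
# Hypothetical sketch of slot-offset specialization.  A code object
# compiled against a module records the slot index the module assigned
# to each global name.  At run time, a guard checks that the current
# namespace is the module the code was specialized for; if so, the
# value is read straight out of the slot, otherwise we fall back to an
# ordinary mapping lookup (the exec-in-a-different-dict case).


class SlottedModule:
    def __init__(self, names):
        self.slot_index = {name: i for i, name in enumerate(names)}
        self.slots = [None] * len(names)

    def set_global(self, name, value):
        self.slots[self.slot_index[name]] = value


def specialize(module, name):
    """Return a LOAD_GLOBAL-like closure guarded on the namespace."""
    slot = module.slot_index[name]

    def load_global(namespace):
        if namespace is module:          # guard: still the same module?
            return module.slots[slot]    # fast path: direct slot read
        return namespace[name]           # slow path: generic lookup

    return load_global
```

The point of the guard is exactly the case mentioned above: if the code gets exec’d in some other namespace, the fast path is skipped and the generic lookup still gives the right answer.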
When dealing with importlib, import hooks, etc., the fact that we execute code in dicts rather than in modules also makes things more complicated, IMHO. There are times when you want to know which module a certain globals dict belongs to. I believe there is no clean solution for that now: you can crawl around in sys.modules to look it up by name, but that’s ugly. If we just had the module in the first place, i.e. exec() took the module as the globals, things would be neater.
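For reference, the ugly crawl looks like this today (the helper name is mine, but the technique is what you are stuck with):

```python
# The current workaround: scan sys.modules for the module whose
# __dict__ is identity-equal to a given globals dict.
import sys


def module_for_globals(globals_dict):
    for mod in list(sys.modules.values()):
        # entries can be None or odd objects; be defensive
        if getattr(mod, "__dict__", None) is globals_dict:
            return mod
    return None
```

If frames carried the module directly, this linear scan over sys.modules would simply disappear.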
Obviously we would have to work hard to maintain backwards compatibility. E.g. exec() would still have to accept a dict as the globals. I think internally, we would wrap the dict in an anonymous module. The C APIs that evaluate code would also still have to accept dicts. However, internally the normal and fast case could be to use f_namespace as a module or module-like object. As I said, I’m trying to prototype this idea. I don’t have all unit tests passing yet but I think there is hope it can work.
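The anonymous-module wrapping could be sketched at the Python level like this (wrap_globals is a hypothetical name; note that the real C-level version would share the dict rather than copy it, since types.ModuleType does not let you substitute its __dict__ from Python):

```python
# Sketch of the proposed compatibility shim: if exec() is given a
# plain dict, wrap it in an anonymous module so the rest of the
# machinery only ever deals with module objects.
import types


def wrap_globals(globals_obj):
    if isinstance(globals_obj, types.ModuleType):
        return globals_obj  # already a module: the fast, normal case
    mod = types.ModuleType("<anonymous>")
    # Python-level approximation: copy; the C version would share.
    mod.__dict__.update(globals_obj)
    return mod


ns = wrap_globals({"x": 10})
exec("y = x + 1", vars(ns))
```

After the exec, the result is visible as an attribute of the anonymous module, just as it would be for a real one.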
I have a proof of concept working now. Currently it is mostly slower and has questionable benefits. However, I think it proves that this change could be made with minimal backwards incompatibility.
Major differences: frames no longer have an f_globals member. Instead, they have an f_namespace which refers to a module. If someone passes a dict to exec() or eval(), we create an anonymous module. That’s a bit of a performance hit, but it doesn’t happen often.
Second, function objects lose func_globals and gain func_namespace. Again, we refer to the module, not the globals dict. We give functions and frames properties (__globals__ and f_globals) that use the module object to get the dict.
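A minimal sketch of that derived property, with FunctionProxy standing in for the real function object (the class is invented; only the __globals__ name matches real functions):

```python
# Sketch: the object stores only func_namespace (a module), and
# __globals__ is derived from it on demand, so existing code that
# reads __globals__ keeps working unchanged.
import types


class FunctionProxy:
    def __init__(self, namespace):
        self.func_namespace = namespace  # the module, not the dict

    @property
    def __globals__(self):
        return vars(self.func_namespace)


mod = types.ModuleType("demo")
mod.x = 3
f = FunctionProxy(mod)
```

Because the property returns the module’s live __dict__, callers see exactly the dict they would have seen before the change.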
eval(), exec() and various functions like PyEval_EvalCode() have been changed to allow ‘globals’ to be either a dict or a module object. Using a module is slightly preferred, as it performs a bit better.
So, what’s the point of all this? I’m not sure yet that my dreamed-of advantages can be realized, but I think it opens the door to some optimizations. E.g. the module object could do some specialization so that LOAD_GLOBAL/LOAD_NAME is faster. Maybe it can keep track of whether all keys are strings. For normal modules (not anonymous ones) it already knows that md_dict is an exact dict object. So it could keep a flag around and do something faster in that case.
Being able to quickly find the module, given either a frame or a function, seems useful. Getting it from sys.modules using __module__ is not a good way.
I didn’t change how f_builtins works. I would like to try adding an md_builtins member to modules and using that instead, removing the f_builtins member from frames.
Thanks for the encouragement, Carol. I have another experiment: remove f_builtins and use the module instead. I’m surprised this seems to work almost perfectly. There is one failure in test_dataclasses that I need to investigate. I think it is related to __builtins__ being either a module or a dict. Not sure.
The way __builtins__ currently works is quite a mess. It would be nice if we could clean it up (obviously without breaking too much real-world code). I’m not sure what a good design would be. Perhaps when you create a module, you could specify an optional builtins keyword to the module __init__. If you don’t specify it, I guess you get the builtins from PyEval_GetBuiltins(). We could find code that actually tries to override builtins and think about how that should work.
EDIT: I should add that, after inlining a few functions, performance seems to be about the same as before all these changes. I still need to run “pybenchmarks”, but based on quick benchmarks it seems nearly the same.
There have been several experiments in the past to make LOAD_GLOBAL faster. It doesn’t make any real-world workload significantly faster. See e.g. https://bugs.python.org/issue1518 and https://bugs.python.org/issue10401.
I don’t understand why people like to push ideas like this. You should just accept that CPython’s current bottlenecks don’t include global variable access.
Hmm, my bad! It seems I’ve responded to a dormant discussion. But why did Discourse put it in the list of “unread” discussions?