Use module as global namespace object

This is an evolution of my idea “Pass module to exec()”. In summary: I think it would be better if Python did not use a dict object as the global namespace for exec/eval. Rather, it should use a module instance as the namespace (also known as the environment in other language implementations).

The change would affect a number of different things, so I’m struggling to flesh out the complete design. I should write a PEP eventually. I think the backwards compatibility impact can be kept quite low, and performance should not really be affected. I also think this change makes things easier for alternative Python implementations. Having dict as the “uber” namespace object in Python has some nice properties, but I think it overly constrains implementation choices. We can mostly have our cake and eat it too. For example, we can still allow dicts to be used as namespaces (passed to exec()) without requiring that namespaces are dicts internally.

Here is a description of my prototype implementation:

  • Replace func_globals on function objects with func_namespace. Rather than pointing to the globals dict, we point to the module. I don’t call it func_module because we already use __module__ as a string containing the module name. So, I use __namespace__ as a property that returns the actual module object. Having this property avoids crawling into sys.modules to look up the module by name. There are a number of places within Python where we do that, and they could be eliminated.

  • For backwards compatibility, functions still have a __globals__ property. It returns the dict from the __namespace__ object.

  • Remove f_builtins and f_globals from the frame object. Replace with f_namespace which is a reference to the module object. This change requires some surgery to ceval but the performance impact seems negligible.

  • This does mean if we want to exec() a code object, we need to have a module for it, not a dict as globals. I address this in two ways. First, globals for existing modules contain a __module_weakref__ entry that points back to the module. So, if we get the dict, we can get back to the proper module. That entry is an internal implementation detail, like __builtins__. Other Python implementations might not have it. For dicts that don’t have __module_weakref__, I create an anonymous module to wrap them in. I think the performance impact of this allocation should be okay since using exec() that way is not fast anyhow.

  • Because f_builtins is gone, the behavior of builtins is slightly changed. When you create a module, the value of the builtins is captured at that point. You can’t replace globals()['__builtins__'] and have ceval pick up that change. Personally I think this is a cleaner design and I think the amount of code affected should be very small.

  • There are a number of cleanups to importlib and related logic that could be done but I haven’t made those. Passing around modules rather than the dict for the module would be cleaner.
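The exec() handling described in the list above can be roughly emulated in pure Python. This is only a sketch on top of today's exec(), which still requires a dict: the helper name `exec_in_namespace` and the module name `"<anonymous>"` are hypothetical, and the prototype presumably wraps the caller's dict directly rather than copying bindings back as done here.

```python
import types


def exec_in_namespace(source, namespace):
    """Run `source` in `namespace`, which may be a module or a plain dict.

    Hypothetical helper: it emulates the proposed calling convention
    using the current exec(), so a dict is still used underneath.
    """
    if isinstance(namespace, dict):
        # A bare dict gets wrapped in an anonymous module.
        module = types.ModuleType("<anonymous>")
        defaults = set(vars(module))  # __name__, __doc__, and friends
        vars(module).update(namespace)
        exec(source, vars(module))
        # Pure Python cannot make the module share the caller's dict,
        # so copy the resulting bindings back (skipping module defaults).
        namespace.update(
            (key, value)
            for key, value in vars(module).items()
            if key not in defaults
        )
        return module
    # A module is used directly as the namespace.
    exec(source, vars(namespace))
    return namespace


legacy_globals = {"x": 2}
wrapper = exec_in_namespace("y = x * 21", legacy_globals)

demo = types.ModuleType("demo")
same = exec_in_namespace("z = 1", demo)
```

After this runs, `legacy_globals["y"]` and `wrapper.y` are both 42, and `same` is the `demo` module itself with `demo.z` set to 1.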

The code implementing this prototype is in my github repo. I have all tests passing except test_gdb. That should be fixable.

I think this change could also lead to a cleaner implementation of PEP 573 – Module State Access from C Extension Methods. If all function objects have a __namespace__ attribute that is the module object, it becomes easy to have a METH_X flag that causes that object to be passed to the function implementation. I have a rough implementation of that idea but it is not in good enough shape to share. I think the idea works.

To me, having all functions refer to their containing modules (via __namespace__) is a nice design. Whether the function is implemented in Python or in C, that reference could exist. The proposal in PEP 573 takes what I consider a more roundabout solution (types referring to containing module, methods referring to their types). The PEP 573 design looks like it works but I think it makes the Python runtime model more complicated.
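To make the difference concrete, here is how a Python function reaches its module today: `__module__` is only a name, so the module has to be looked up in sys.modules, while `__globals__` is the module's dict. The helper `namespace_of` and the module name `demo_mod` are made up for illustration; `namespace_of` stands in for what the proposed `__namespace__` property would return directly.

```python
import sys
import types


def namespace_of(func):
    """Hypothetical stand-in for the proposed func.__namespace__.

    Today the route from a function to its module goes through the
    module name and sys.modules.
    """
    return sys.modules[func.__module__]


# Build a small module by hand and register it so the lookup works.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod
exec("answer = 42\ndef f():\n    return answer\n", vars(mod))

f = mod.f
name_only = f.__module__                 # a string, not the module
shares_dict = f.__globals__ is vars(mod)  # globals is the module's dict
found = namespace_of(f)                  # the module object, found by name
```

With a real `__namespace__` attribute, the last step would not need sys.modules at all.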

Neil Schemenauer wrote:

[quote]
Because f_builtins is gone, the behavior of builtins is slightly changed. When you create a module, the value of the builtins is captured at that point. You can’t replace globals()['__builtins__'] and have ceval pick up that change. Personally I think this is a cleaner design and I think the amount of code affected should be very small.
[end quote]

I think the behavior of changing builtins should basically be considered undefined at this point anyway. Because of this optimization (https://github.com/python/cpython/blob/master/Objects/frameobject.c#L651), a reassigned builtins may or may not be picked up on the next call, depending on whether you’re calling a function in the same module or calling through a function in another module. And a reassignment isn’t picked up immediately anyway because of the caching in the frame object.
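For comparison, the documented way to control builtins with exec() is to put a `__builtins__` entry into the globals dict before the code runs, rather than reassigning it afterwards. A minimal sketch of that (the names `fake_len`, `restricted`, and `plain` are illustrative only):

```python
def fake_len(obj):
    """Illustrative replacement builtin that ignores its argument."""
    return -1


# Builtins supplied up front are the ones the executed code sees.
restricted = {"__builtins__": {"len": fake_len}}
exec("result = len('abc')", restricted)

# Without the key, exec() inserts a reference to the real builtins.
plain = {}
exec("result = len('abc')", plain)
```

Here `restricted["result"]` is -1 and `plain["result"]` is 3, and `plain` gains a `__builtins__` key as a side effect of exec().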

Dino Viehland wrote:

[quote]

I think the behavior of changing builtins should basically be
considered undefined at this point anyway.

[end quote]

__builtins__ is a dunder and a CPython specific implementation detail.
User code should not touch __builtins__, and what about other
implementations?

Neil Schemenauer wrote:

[quote]
In summary: I think it would be better if Python did not use a dict
object as the global namespace for exec/eval. Rather, it should use a
module instance as the namespace (also known as the environment in
other language implementations).

Here is a description of my prototype implementation:
[end quote]

Before discussing the implementation, you ought to give a convincing
reason why this functional change is desirable, and why the costs of
change (changing the implementation, testing the changes, risk of
breaking user code, admitted performance impact) will be worth it.

I might be missing something here, but you seem to be conflating two
independent issues:

  • whether functions refer to their containing module by name
    or reference;

  • what arguments we pass to eval and exec as the globals and/or
    locals parameters.

Since this is a change which will affect user code, you should also
pitch your explanation at users, not just core developers who are deeply
familiar with the internals of how eval, exec and functions work.

Regarding eval and exec, from the perspective of the user, this proposal
seems to want to change code that currently looks like this:

eval(source_or_code, vars(module))

to this:

eval(source_or_code, module)

which to me seems like too small a benefit to care.
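For readers who want to try the two spellings, here is a small sketch of how current CPython treats them. The module name `demo` and the variable names are made up; the second call is expected to fail today because eval() insists on a dict for globals.

```python
import types

module = types.ModuleType("demo")
module.x = 20

# Current spelling: pass the module's dict.
value = eval("x + 1", vars(module))

# Proposed spelling: pass the module itself. Current CPython rejects
# a non-dict globals argument with TypeError.
try:
    eval("x + 1", module)
    module_accepted = True
except TypeError:
    module_accepted = False
```

On current CPython, `value` is 21 and `module_accepted` is False.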

[quote]
__builtins__ is a dunder and a CPython specific implementation detail.
User code should not touch __builtins__, and what about other
implementations?
[end quote]

Good point! It is also documented as a CPython implementation detail: https://docs.python.org/3/library/builtins.html. I’m not sure about other implementations, but IronPython’s support for assigning to it was pretty inconsistent as well.