Use module as global namespace object

This is an evolution of my idea “Pass module to exec()”. In summary: I think it would be better if Python did not use a dict object as the global namespace for exec/eval. Rather, it should use a module instance as the namespace (also known as the environment in other language implementations).

The change would affect a number of different things, so I’m struggling to flesh out the complete design. I should write a PEP eventually. I think the backwards compatibility impact can be kept quite low. Performance should not really be affected. I think this change makes things easier for alternative Python implementations. Having dict as the “uber” namespace object in Python has some nice properties but I think it overly constrains implementation choices. We can mostly have our cake and eat it too. E.g. we can still allow dicts to be used as namespaces (passed to exec()) but internally not require that namespaces are actually dicts.

Here is a description of my prototype implementation:

  • Replace func_globals on function objects with func_namespace. Rather than pointing to the globals dict, we point to the module. I don’t call it func_module because we already use __module__ as a string containing the module name. So, I use __namespace__ as a property that returns the actual module object. Having this property avoids crawling into sys.modules to look up the module by name. There are a number of places within Python where we do that, and they could be eliminated.

  • For backwards compatibility, functions still have a __globals__ property. It returns the dict from the __namespace__ object (see the sketch after this list).

  • Remove f_builtins and f_globals from the frame object. Replace with f_namespace which is a reference to the module object. This change requires some surgery to ceval but the performance impact seems negligible.

  • This does mean if we want to exec() a code object, we need to have a module for it, not a dict as globals. I address this in two ways. First, globals for existing modules contain a __module_weakref__ entry that points back to the module. So, if we get the dict, we can get back to the proper module. That entry is an internal implementation detail, like __builtins__. Other Python implementations might not have it. For dicts that don’t have __module_weakref__, I create an anonymous module to wrap them in. I think the performance impact of this allocation should be okay since using exec() that way is not fast anyhow.

  • Because f_builtins is gone, the behavior of builtins is slightly changed. When you create a module, the value of the builtins is captured at that point. You can’t replace globals()['__builtins__'] and have ceval pick up that change. Personally I think this is a cleaner design and I think the amount of code affected should be very small.

  • There are a number of cleanups to importlib and related logic that could be done but I haven’t made those. Passing around modules rather than the dict for the module would be cleaner.
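As a rough pure-Python sketch (not the actual prototype code; FunctionLike is just an invented stand-in for a function object), the relationship between __namespace__ and the backwards-compatible __globals__ property could look like this:

```python
import sys
import types
import math

class FunctionLike:
    """Invented stand-in for a function object under this proposal."""

    def __init__(self, namespace: types.ModuleType):
        # Proposed: keep a direct reference to the containing module.
        self.__namespace__ = namespace

    @property
    def __globals__(self):
        # Backwards compatibility: code expecting a dict gets the
        # module's dict, derived from __namespace__.
        return vars(self.__namespace__)

f = FunctionLike(math)

# Today, recovering a function's module means crawling sys.modules by name:
assert sys.modules[math.__name__] is math
# Under the proposal, the module would be reachable directly:
assert f.__namespace__ is math
assert f.__globals__ is vars(math)
```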

The code implementing this prototype is in my GitHub repo. I have all tests passing except test_gdb. That should be fixable.

I think this change could also lead to cleaner implementation of PEP 573 – Module State Access from C Extension Methods. If all function objects have a __namespace__ attribute that is the module object, it becomes easy to have a METH_X flag that makes that object be passed to the function implementation. I have a rough implementation of that idea but it is not in good enough shape to share. I think the idea works.

To me, having all functions refer to their containing modules (via __namespace__) is a nice design. Whether the function is implemented in Python or in C, that reference could exist. The proposal in PEP 573 takes what I consider a more roundabout solution (types referring to containing module, methods referring to their types). The PEP 573 design looks like it works but I think it makes the Python runtime model more complicated.


Because f_builtins is gone, the behavior of builtins is slightly changed. When you create a module, the value of the builtins is captured at that point. You can’t replace globals()['__builtins__'] and have ceval pick up that change. Personally I think this is a cleaner design and I think the amount of code affected should be very small.

I think the behavior of changing builtins should basically be considered undefined at this point anyway. Due to this optimization (cpython/Objects/frameobject.c at main · python/cpython · GitHub), you can re-assign builtins and have it picked up on the next call or not, depending on whether you’re calling a function in the same module or calling through a function in another module. And reassigning it isn’t picked up immediately anyway because of the caching in the frame object.
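For illustration, here is a quick experiment along those lines; whether the replacement mapping ever gets consulted depends on that caching and can differ across CPython versions:

```python
import builtins

def probe():
    # Looks up `len` through whatever builtins mapping the frame uses.
    return len([1, 2, 3])

# Build a replacement builtins mapping with a bogus len().
fake = dict(vars(builtins))
fake["len"] = lambda seq: -1

globals()["__builtins__"] = fake

# Depending on when and where the builtins were cached, this may still
# print 3 (the replacement silently ignored) rather than -1.
print(probe())
```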

Dino Viehland wrote:

[quote]

I think the behavior of changing builtins should basically be considered undefined at this point anyway.

[end quote]

__builtins__ is a dunder and a CPython specific implementation detail. User code should not touch __builtins__, and what about other implementations?

Neil Schemenauer wrote:

[quote]
In summary: I think it would be better if Python did not use a dict
object as the global namespace for exec/eval. Rather, it should use a
module instance as the namespace (also known as the environment in
other language implementations).

Here is a description of my prototype implementation:
[end quote]

Before discussing the implementation, you ought to give a convincing
reason why this functional change is desirable, and why the costs of
change (changing the implementation, testing the changes, risk of
breaking user code, admitted performance impact) will be worth it.

I might be missing something here, but you seem to be conflating two
independent issues:

  • whether functions refer to their containing module by name
    or reference;

  • what arguments we pass to eval and exec as the globals and/or
    locals parameters.

Since this is a change which will affect user code, you should also
pitch your explanation at users, not just core developers who are deeply
familiar with the internals of how eval, exec and functions work.

Regarding eval and exec, from the perspective of the user, this proposal
seems to want to change code that currently looks like this:

eval(source_or_code, vars(module))

to this:

eval(source_or_code, module)

which to me seems like too small a benefit to care about.

__builtins__ is a dunder and a CPython specific implementation detail. User code should not touch __builtins__, and what about other implementations?

Good point! It does also seem to be documented as being a CPython implementation detail: builtins — Built-in objects — Python 3.12.1 documentation. I’m not sure about other implementations, but IronPython’s support for assigning it was pretty inconsistent as well.

Are you saying that for backward compatibility, we could just relax the PyDict_Check() and accept either a dict (possibly wrapped in an anonymous module internally) or a module?

Currently, when CPython evaluates bytecode, it needs a dict object for the global variables. I’m suggesting that it use a module object for the globals instead of a dict. E.g. rather than f_globals on the frame object, there would be a reference to the module, e.g. f_namespace. On functions, instead of func_globals, have func_namespace. The previous attributes become properties, for backwards compatibility. That change is not as disruptive as it first appears, potentially does not decrease performance and makes quite a few other things nicer.

One side effect is that exec() could take a module as the globals parameter. In my prototype, I allow dicts as well, for backwards compatibility and then wrap them with an anonymous module object.
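As an illustrative sketch only (wrap_namespace is an invented helper name, and the weakref handling merely mirrors the __module_weakref__ entry described earlier; this is not the prototype’s actual code), the normalization could look roughly like this:

```python
import types

def wrap_namespace(ns):
    """Normalize exec()'s globals argument to a module object (sketch)."""
    if isinstance(ns, types.ModuleType):
        return ns
    # A dict that belongs to a real module could carry a weak reference
    # back to that module, letting us recover it.
    ref = ns.get("__module_weakref__")
    if ref is not None:
        mod = ref()
        if mod is not None:
            return mod
    # Otherwise wrap the dict in an anonymous module. A real
    # implementation would share the dict rather than copy it; since a
    # module's __dict__ can't be replaced from pure Python, this sketch
    # copies the entries instead.
    anon = types.ModuleType("<anonymous>")
    anon.__dict__.update(ns)
    return anon

ns = wrap_namespace({"y": 2})
print(ns.y)   # -> 2: dict entries are now module attributes
```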

Whoa, how did we get this 2-year-old thread suddenly reactivated?

One side effect is that exec() could take a module as the globals parameter. In my prototype, I allow dicts as well, for backwards compatibility and then wrap them with an anonymous module object.

This could be done trivially today by taking the module’s __dict__.

We should have a better motivation than “it probably isn’t slower and a few things might be more elegant”. 🙂

Whoa, how did we get this 2-year-old thread suddenly reactivated?

Because Barry asked a question, probably because I mentioned this idea during the core sprint.

We should have a better motivation than “it probably isn’t slower and a few things might be more elegant”.

I thought I did, but I guess I’m not able to explain it in a way that convinces anyone else it’s an advantage. Making exec() take a module is a minor thing, not the reason to do it.

Did you accidentally or intentionally misquote my statement? If it’s your own opinion, that’s fine but you shouldn’t have quotes around it in that case.

Sorry, I thought I was paraphrasing. The actual quote was “potentially does not decrease performance and makes quite a few other things nicer.” In your OP you also wrote “Performance should not really be affected.” Both of these made me think you were discounting the performance aspect. Sorry for reading too much into it.

I still think you haven’t explained what would be nicer. Your OP mostly describes the implementation and from that it’s hard to infer what becomes nicer (since some things become less nice because of backwards compatibility). Are you primarily referring to PEP 573 being cleaner to implement?

I also assume you did your prototype before we had the inline globals cache. That uses dict versions, and I’m not sure how your implementation would deal with it. (Honestly I didn’t look at the prototype, so I don’t know if you’re using getattr or still reaching into the __dict__.)

At this point writing a PEP would be the next step. I would need to be specific about advantages vs costs. Given we now have the inline globals cache, it is less compelling. I suspect backwards compatibility issues kill the idea. A few things I think would be better: PEP 573, a simpler import.c/importlib, a possible path to faster global variable access (vs the pre-inline-cache state), and easier introspection (e.g. finding the module given a function).

Well, you could rebase your branch onto main, show the benefits (and the lack of perf degradation), and start a discussion on python-dev (still the forum with the larger reach) to see if it’s worth your time writing a PEP and defending it. A PEP with an implementation behind it usually gets taken more seriously than one without. And in this case (for the reasons you mentioned) the implementation ought to be up to date, to compare apples to apples.