PEP 726: Module __setattr__ and __delattr__

This PEP proposes supporting user-defined __setattr__ and __delattr__ methods on modules to extend customization of module attribute access beyond PEP 562.

10 Likes

I would like to be able to define sys.__setattr__() and sys.__delattr__() to prevent users from “corrupting” the sys module. For example, to make sure that sys.modules is always a dict: sys.modules = 123 would raise an error. Or to make sure that some important attributes are not deleted: del sys.excepthook would raise an error. See my issue for details.

This change alone doesn’t prevent other kinds of corruption, like adding types other than str to the sys.path list. But it’s better than nothing. If we can make sure that sys.modules always exists and that its type is always dict, the C code using sys.modules can avoid some error checking and use more efficient code.
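As a rough illustration of the kind of guard described above, here is a minimal sketch. It assumes a stand-in module named fake_sys rather than the real sys, and it uses a types.ModuleType subclass (GuardedModule, a hypothetical name) because module-level __setattr__ is only a proposal in this thread. sys.modules is a dict in CPython, so the check is written against dict.

```python
import types


class GuardedModule(types.ModuleType):
    """Hypothetical module type that protects two attributes."""

    def __setattr__(self, name, value):
        # Reject a replacement of `modules` that is not a dict.
        if name == "modules" and not isinstance(value, dict):
            raise TypeError("modules must be a dict")
        super().__setattr__(name, value)

    def __delattr__(self, name):
        # Refuse to delete an attribute other code relies on.
        if name == "excepthook":
            raise AttributeError("excepthook cannot be deleted")
        super().__delattr__(name)


fake_sys = GuardedModule("fake_sys")  # stand-in, not the real sys module
fake_sys.modules = {}
fake_sys.excepthook = print


def attempt(action):
    """Run action() and report which exception, if any, it raised."""
    try:
        action()
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return "ok"


results = [
    attempt(lambda: setattr(fake_sys, "modules", 123)),
    attempt(lambda: delattr(fake_sys, "excepthook")),
    attempt(lambda: setattr(fake_sys, "modules", {"a": None})),
]
print(results)  # ['TypeError', 'AttributeError', 'ok']
```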

5 Likes

I definitely like the PEP but I am confused by this sentence:

Defining __setattr__ or __delattr__ only affects lookups made using the attribute access syntax; directly accessing the module globals is unaffected, e.g. sys.modules[__name__].some_global = 'spam'.

Does this mean that:

import sys
import mod
mod.x = 42 # goes via mod.__setattr__
sys.modules['mod'].x = 42 # does not go via mod.__setattr__

If that is the case then I am not sure I understand either how this is implemented or what the implications are for all ways of setting a module attribute.

Does __setattr__ only come in to play after the module has finished “executing” so that it does not affect any code within the module body itself?

2 Likes

No, sorry. Actually, I meant here sys.modules['name'].__dict__['some_global'] = 'spam' (or globals()['some_global'] = 'spam' within the module code).

Does this sound better:

?

That’s true. You can bypass __get/set/delattr__ hooks (and setting module __class__) by code like this

# mod.py

def __setattr__(name, value):
    print(f'Setting {name} to {value}...')
    globals()[name] = value

xyz = 123

Okay, then that makes sense and I understand what this means.

Yes, although also showing explicit code is clearer:

mod.__dict__['x'] = 42 # bypasses __setattr__
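A small self-contained check of that bypass, written as a sketch: the hook here lives on a types.ModuleType subclass (LoggingModule, a hypothetical name) as a stand-in for the proposed module-level __setattr__, and calls is just a list used to record which names went through it.

```python
import types

calls = []  # names that went through the hook


class LoggingModule(types.ModuleType):
    def __setattr__(self, name, value):
        calls.append(name)
        super().__setattr__(name, value)


mod = LoggingModule("mod")
mod.x = 1               # attribute syntax: goes through __setattr__
mod.__dict__["y"] = 2   # direct dict write: hook is not called
vars(mod)["z"] = 3      # vars(mod) is the same dict, so also bypassed

print(calls)            # ['x']
print(mod.y, mod.z)     # 2 3
```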

I’m still not sure I understand this. Let me be clearer:

# mod.py

x = 42 # no __setattr__ exists yet, so not called

def __setattr__(name, value):
    raise AttributeError("mod attributes are read-only")

x = 42 # does this call __setattr__?

Thanks. More verbose version: PEP 726: Correct and expand specification by skirpichev · Pull Request #3320 · python/peps · GitHub

No. In case you want to trigger module hooks, you could use

sys.modules[__name__].x = 42

after __setattr__'s definition.
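To make the difference concrete, here is a sketch that executes a module body by hand. It assumes a class-based hook (HookedModule, hypothetical) in place of the proposed module-level function, and a throwaway module name demo_mod. The plain assignment in the body stores straight into the module dict, while the store through sys.modules[__name__] is an attribute assignment and reaches the hook.

```python
import sys
import types

log = []  # (name, value) pairs seen by the hook


class HookedModule(types.ModuleType):
    def __setattr__(self, name, value):
        log.append((name, value))
        super().__setattr__(name, value)


mod = HookedModule("demo_mod")  # hypothetical module name
sys.modules["demo_mod"] = mod

body = """
import sys
x = 42                          # plain global store: hook not called
sys.modules[__name__].x = 43    # attribute store on the module: hook called
"""
exec(body, mod.__dict__)
del sys.modules["demo_mod"]     # clean up the temporary registration

print(log)    # [('x', 43)]
print(mod.x)  # 43
```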

I like it.

Maybe this is so obvious as to not be worth mentioning, but… This makes module __getattr__ less of a standalone method. It now lines up better with the model we use for class definitions. I think that’s a benefit in its own right.

Pure curiosity here, but has there ever been discussion of making sys.modules use a custom type rather than a dict?

It may have been argued in the past that there was no point because it could be replaced by a user at runtime. But if we get this PEP in 3.13, then that’s about to change.

1 Like

This PEP does not in itself prevent anyone from replacing an object if they really want to. A determined monkeypatcher can still use sys.__dict__['modules'] = whatever. The intention here is more to prevent accidentally setting attributes. Otherwise the consenting adults principle still applies.

1 Like

I think it doesn’t line up well: for all objects, magic methods must be defined on the class, except for these three that can be inside the module object. Module-level __getattr__ and __dir__ are special cases here; for a different example, __class_getitem__ has a different name that helps recognize its specialness.

1 Like

Presumably one important case where a module’s __setattr__ and __getattr__ aren’t used is functions and methods defined in that module. Whenever those access a global, they use the module’s __dict__ directly, which is incorporated in the function object as its __globals__ attribute (and cannot be modified).

The interpreter highly optimizes access to such globals, since this includes references to imported functions, classes and constants, and disabling all that and going through a __getattr__ function written in Python would slow down the code unacceptably.

So we are left with these dunders being useful for access of module attributes from outside the module, for anything that uses an attribute on the module object, really. I like Victor’s use case of preventing users from accidentally messing with sys.modules, and also the use case of deprecating obsolescent module attributes.
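The point about functions can be checked with a short sketch. It assumes a types.ModuleType subclass (WatchedModule, hypothetical) as the hook, since that is what works today; the function bump and the global counter are made up for the demonstration.

```python
import types

seen = []  # names that went through the hook


class WatchedModule(types.ModuleType):
    def __setattr__(self, name, value):
        seen.append(name)
        super().__setattr__(name, value)


mod = WatchedModule("watched")
exec(
    "counter = 0\n"
    "def bump():\n"
    "    global counter\n"
    "    counter += 1\n",
    mod.__dict__,
)

mod.bump()
mod.bump()

# The function's globals are the module dict itself, so its writes
# never pass through the module's attribute machinery.
same_dict = mod.bump.__globals__ is mod.__dict__
print(same_dict, mod.counter, seen)  # True 2 []
```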

5 Likes

I think that you summarized the main usage well: controlling how module attributes are read and set from outside the module itself. The limitations should just be well documented.

I’m not surprised that directly accessing globals (the dict) skips the __getattr__(), __setattr__() and __delattr__() module functions.

Just a simple __getattribute__() method on a class can be ignored in different ways, it’s part of the Python semantics:

class MyObj:
    def __init__(self):
        self.attr = "attr"

    def __getattribute__(self, name):
        if name == "__dict__":
            return object.__getattribute__(self, name)
        else:
            return "__getattribute__"

obj = MyObj()
assert obj.attr == "__getattribute__"
assert obj.__dict__["attr"] == "attr"
assert object.__getattribute__(obj, "attr") == "attr"

If you want to skip the special methods, there are always ways. The problem here is not about building a secure sandbox, but customizing the common way to get and set module attributes, to add code in this case for different purposes.

2 Likes

FWIW, visually this also follows the same rules as a class definition.

def __setattr__(...):
    ...
x = 5

Also doesn’t call __setattr__ if you were to tab it over and slap class Foo a line above :wink:
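A quick sketch of that class analogy (Foo and calls are illustrative names only): names bound in the class body go straight into the class namespace, and only instance attribute assignment reaches Foo.__setattr__.

```python
calls = []  # names that went through Foo.__setattr__


class Foo:
    def __setattr__(self, name, value):
        calls.append(name)
        object.__setattr__(self, name, value)

    x = 5  # stored in the class namespace; Foo.__setattr__ is not involved


Foo.y = 6    # class attribute: handled by type.__setattr__, not Foo.__setattr__
foo = Foo()
foo.z = 7    # instance attribute: this is what Foo.__setattr__ intercepts

print(calls)  # ['z']
```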

1 Like

import mod loads mod.py and then calls exec(code, module.__dict__) in importlib’s exec_module(). In the code object, an assignment like x = 5 writes into globals(), which is module.__dict__.

It’s possible to hack the namespace used by exec(). Example which calls the custom __setattr__() on a simple x = 5 assignment in a module:

import textwrap

class ModuleDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._inside = False

    def __setitem__(self, key, value):
        if not self._inside and key != "__setattr__":
            try:
                setattr_func = self["__setattr__"]
            except KeyError:
                pass
            else:
                try:
                    self._inside = True
                    setattr_func(key, value)
                finally:
                    self._inside = False
                return

        super().__setitem__(key, value)

code = textwrap.dedent("""
    def __setattr__(name, value):
        print(f"custom module setattr: set {name} to {value}")
        globals()[name] = value

    print(f"MODULE: globals() type: {type(globals()).__name__}")

    # call __setattr__() defined above
    x = 2
""")

def exec_module(code, module_dict):
    namespace = ModuleDict(module_dict)

    exec(code, namespace)

    module_dict.clear()
    module_dict.update(namespace)

code = compile(code, "filename", "exec")
namespace = {'orig_attr': 1}
exec_module(code, namespace)
namespace.pop('__builtins__', None)
print(f"namespace: {namespace}")

Output:

MODULE: globals() type: ModuleDict
custom module setattr: set x to 2
namespace: {'orig_attr': 1, '__setattr__': <function __setattr__ at 0x7f28991531d0>, 'x': 2}

The problem is that after exec(), module functions will still write into the original dictionary, without going through __setattr__(), since they directly access the module __dict__ via their func.__globals__ attribute.

There are too many direct accesses to a dictionary in Python for best performance. It’s likely that some functions will ignore the wrapper and write directly into the original module dictionary.

Using a custom dictionary type for the module __dict__ would slow down all accesses to module attributes. I don’t think that it’s worth it.

One possible objection is that it’s already possible to customize setting/deleting attributes with the __class__ attribute of the module (as the documentation suggests).

First, this has an impact on other operations with the module, e.g. attribute access.

And this solution is also slower for setting/deleting operations.

# mod1.py

from sys import modules
from types import ModuleType

def __setattr__(name, value):
    globals()[name] = value

# mod2.py

from sys import modules
from types import ModuleType

class _Module(ModuleType):
    def __setattr__(self, name, value):
        super().__setattr__(name, value)

modules[__name__].__class__ = _Module

$ ./python -m timeit -r10 -uusec -s 'import mod1 as m' 'm.x=1'
500000 loops, best of 10: 0.507 usec per loop
$ ./python -m timeit -r10 -uusec -s 'import mod2 as m' 'm.x=1'
200000 loops, best of 10: 1.12 usec per loop

Not sure how far this could be worked around… Here is an attempt: Fast attribute access for module subclasses · Issue #103951 · python/cpython · GitHub

I read the PEP and I don’t find the motivation compelling. The PEP lists a few categories of use cases (e.g., “To prevent setting an attribute at all (i.e. make it read-only)”), but gives no concrete examples. I haven’t personally felt a need for it or heard of a lot of cases where it is useful. Victor gives a good example above with sys, but I’m not sure a single module is enough to justify a language change.

Normally dunders are looked up on the class, not the instance. This PEP would add two exceptions to that rule. We have two similar exceptions already for modules (__getattr__ and __dir__), but any additional exceptions would further muddy the general rule. We shouldn’t do it without a strong motivation.
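For readers less familiar with that rule, here is a small sketch of both halves. Box and the lambda bodies are illustrative only. A dunder stored on an instance is ignored by the built-in len(), whereas a module’s __getattr__ (PEP 562) is picked up from the module’s own namespace.

```python
import types


class Box:
    pass


box = Box()
box.__len__ = lambda: 3  # stored on the instance, not on the class

try:
    len(box)  # len() looks up __len__ on type(box), so this fails
    instance_len_used = True
except TypeError:
    instance_len_used = False

mod = types.ModuleType("m")
# PEP 562: the fallback hook is found in the module's own dict.
mod.__getattr__ = lambda name: f"fallback:{name}"

print(instance_len_used)  # False
print(mod.missing)        # fallback:missing
```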

3 Likes

A tangentially related side note: the steering council just rejected PEP 713 to add __call__ to modules. Our main reason being not enough compelling evidence of seemingly important practical uses for the feature. That bar seems like what I’d expect us to want an answer to for this PEP as well. __getattr__ and __dir__ had specific benefits.

The current 726 text has a Motivation section that covers what one could do with it. But the thing I personally think is currently missing are specific use cases for why it’d be useful. ex: What problems are package or module maintainers having today that would be improved by having this in the future?

Things like your later comment about using the __class__ module attribute to do similar things should be included in the PEP in an Alternatives Considered or Existing Options style section.

3 Likes

Hmm, I think the code example in the Motivation section is very concrete. Or do you mean real-world use cases? Another one was given in the old thread; see the current workaround. The problem is that this solution also slows down attribute access (slower setting/deleting could be expected). The same workarounds with the __class__ attribute could be used for Victor’s examples, but in the case of the sys module the price is even less acceptable…

On the other hand (and this was already mentioned above), this will simplify the mental model of customizing module attribute access, i.e. a very basic thing for modules (cf. something like the __call__ dunder). Currently we have streamlined variants only for reading, but not for setting/deleting (as we do for class instances). Instead, we are now forced to use different patterns to customize attribute access in different scenarios…

Should I list the real-world examples mentioned above in the PEP?
Edit:
Here is an (incomplete) list of examples from a quick GH search:

On the other hand, I have no examples of using other customized dunder methods for modules (except for __call__).

1 Like

One use-case I have where __setattr__ would be useful is for a configuration attribute that at one point had permissible values True, False and 'auto'. 'auto' was deprecated, but the DeprecationWarning is only emitted when someone tries to use a function that depends on that value. With __setattr__, I would be able to emit a warning at the point someone tries to set a deprecated value. One can imagine a case where a downstream tool sets the value, but only a few users ever trigger the behavior, so it might not get caught through normal inspection of test logs.
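A sketch of how such a warning could look, using the module __class__-style approach that works today instead of the proposed module-level hook. ConfigModule, config and use_feature are hypothetical names, not the poster’s actual code.

```python
import types
import warnings


class ConfigModule(types.ModuleType):
    def __setattr__(self, name, value):
        # Warn at the moment the deprecated value is set, not when it is used.
        if name == "use_feature" and value == "auto":
            warnings.warn(
                "use_feature='auto' is deprecated; use True or False",
                DeprecationWarning,
                stacklevel=2,
            )
        super().__setattr__(name, value)


config = ConfigModule("config")  # stand-in for a real configuration module

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config.use_feature = True    # fine, no warning
    config.use_feature = "auto"  # deprecated value, warns immediately

categories = [w.category.__name__ for w in caught]
print(categories)  # ['DeprecationWarning']
```

The value is still stored after the warning, so existing downstream code keeps working while the deprecation becomes visible at the assignment site.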

1 Like

In my mental model of modules, ModuleType was more of a metaclass, and when I write a module I kind of create a

class Foo(metaclass=types.ModuleType):  # conceptual sketch only, not runnable
    def __dir__(self): ...
    ...

Only on import do I create an instance that is cached in sys.modules, but I can still create another instance of the same module. The way it works in CPython is more of an implementation detail.

One of my use-cases is to invalidate cache when some module attribute is changed, for example

# module.py
default_arg2 = 1

_result_cache = {}
def foo(arg1, arg2=None):
    arg2 = default_arg2 if arg2 is None else arg2
    if (result := _result_cache.get((arg1, arg2))) is not None:
        return result
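    result = ...  # slow computation goes here
    return _result_cache.setdefault((arg1, arg2), result)

def __setattr__(name, value):  # module-level hook takes (name, value), no self
    global _result_cache  # rebind the module-level cache, not a local name
    if name == "default_arg2":
        _result_cache = {k: v for k, v in _result_cache.items() if k[1] == default_arg2}
    globals()[name] = value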

Right now I have a module with a class that holds default_arg, the cache and the computation method, plus an instance of that class. I also have foo = class_instance.foo to run the computation, and to change the default I expect users to do module.class_instance.default_arg = .
It was written that way before I knew about the module.__class__ trick, and for my case I would be OK with paying a 2x slowdown for attribute access (it shouldn’t happen often enough to be noticeable for normal workloads), but I can imagine cases where it’s more of a concern.

Well, I think a cache is one case of the more general “a module has some invariant and attribute mutation should keep the invariant”, and I think almost any invariant imaginable for a regular class can exist for a module.