Extend PEP 562 with __setattr__ for modules?

skirpichev · April 6, 2023, 5:26am

Since CPython 3.5 it’s possible to customize setting module attributes by setting the module’s __class__ attribute. Unfortunately, this comes with a measurable speed regression for attribute access:

$ cat b.py
x = 1
$ python -m timeit -r11 -s 'import b' 'b.x'
5000000 loops, best of 11: 48.8 nsec per loop
$ cat c.py
import sys, types
x = 1
class _Foo(types.ModuleType): pass
sys.modules[__name__].__class__ = _Foo
$ python -m timeit -r11 -s 'import c' 'c.x'
2000000 loops, best of 11: 131 nsec per loop

For reading attributes this can be worked around with __getattr__, but there is no similar support for __setattr__. Maybe it makes sense to add it as well (or even __delattr__)? If so, will this require a PEP? (Draft implementation is in GitHub - skirpichev/cpython at module-setattr)

As background, this idea comes from the mpmath issue ENH: guard against incorrect use of `mp.dps`? · Issue #657 · mpmath/mpmath · GitHub. In short, people are trying to set precision attributes not on the context object (mpmath.mp, e.g. mpmath.mp.dps), but on the mpmath module. Probably, this is a very special scenario, but…

brettcannon · April 6, 2023, 10:06pm

To be clear, this is __setattr__ for module objects?

Not necessarily, but since it would potentially impact all modules and the setting of module attributes it could come to that in the end.

oscarbenjamin · April 7, 2023, 12:48am

Yes.

The context here is:

github.com/mpmath/mpmath

ENH: guard against incorrect use of `mp.dps`?

opened 11:30PM - 13 Feb 23 UTC

closed 11:36AM - 05 Jan 24 UTC

mdhaber

enhancement

I am a SciPy maintainer, and we frequently use mpmath to compute reference values. (Thank you for your work! It's very helpful.) A common mistake is for contributors to do, e.g.:

```python3
import numpy as np
import mpmath as mp  # should be from mpmath import mp
mp.dps = 50
a, b = mp.mpf(1e-11), mp.mpf(1.001e-11)
print(np.float64(mp.ncdf(b) - mp.ncdf(a)))  # 3.885780586188048e-15
```

This suffers from catastrophic cancellation just as:

```python3
from scipy.special import ndtr
a, b = 1e-11, 1.001e-11
print(ndtr(b) - ndtr(a))  # 3.885780586188048e-15
```

does, but the fact that `mpmath` is _not_ working with 50 digits is obscured by the conversion to float. (I've run into other, subtler reasons for not noticing the problem, too.) Another example in the wild: https://github.com/scipy/scipy/issues/18088#issuecomment-1712538038

I know that this is user error and not a bug in `mpmath`, but I wonder if it is possible to guard against this. For instance, `mpmath` doesn't currently have a `dps` attribute. Perhaps it could be added as a property, and a warning (or error) could be raised if modification is attempted, directing the user to `mpmath.mp.dps`?

Without going into the details of the issue, it would be useful if there was some way to intercept a user setting an attribute on a module, to either warn or raise an error. For example, if a user does

import mpmath as mp
mp.dps = 100

then that will not work as intended. The correct way is

from mpmath import mp
mp.dps = 100

Here mp is a special non-module object that can interpret its dps attribute and do something useful with it.

In the interest of communicating errors helpfully to users it would be good to be able to make it an error to set an attribute on a module so e.g. mpmath.dps = 100 (where mpmath is a module) could be an error. Currently this is only possible by replacing the module object with a proxy in sys.modules but I expect that being able to protect a module from arbitrary attribute assignment would be a useful feature in general.
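
For illustration, here is roughly what such a guard looks like with what is available today, i.e. swapping the module's class. It is a sketch only: `fakelib` is a hypothetical stand-in for a library module such as mpmath, and `_GuardedModule` and the wording of the message are made up for the example.

```python
import types

# Hypothetical stand-in for a library module such as mpmath.
fakelib = types.ModuleType("fakelib")

class _GuardedModule(types.ModuleType):
    def __setattr__(self, name, value):
        if name == "dps":
            # Point the user at the object that actually honours the setting.
            raise AttributeError(
                f"cannot set {name!r} on module {self.__name__!r}; "
                f"set it on the context object instead, "
                f"e.g. {self.__name__}.mp.dps"
            )
        super().__setattr__(name, value)

fakelib.__class__ = _GuardedModule

fakelib.version = "1.0"  # ordinary assignment still works
try:
    fakelib.dps = 100    # the mistake from the issue
except AttributeError as exc:
    message = str(exc)
print(message)
```

The downside, as discussed at the top of the thread, is that replacing the class also slows down reading attributes from the module.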

methane · April 7, 2023, 1:54am

__getattr__ has very little performance impact because it is called only when the attribute is not found.

On the other hand, __setattr__ would be called for every attribute access. So it might have performance impact like __class__ hack.

I agree that this is a very special scenario.

Rosuav · April 7, 2023, 2:07am

With a class, we can define its slots and then any new attributes are instant errors. What if modules could do the same? It wouldn’t let you fully customize the error (and thus point people to the correct way to do things), but it’d prevent the scenario of “no error, just wrong behaviour” and would give people an error that they can ask about.
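
For reference, this is the class behaviour being referred to. A minimal sketch with a made-up `Settings` class:

```python
class Settings:
    __slots__ = ("dps",)  # only this attribute name is allowed

s = Settings()
s.dps = 50                # declared slot: fine
try:
    s.dsp = 50            # typo: not a declared slot
except AttributeError as exc:
    error = exc
print(type(error).__name__)  # AttributeError
```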

stoneleaf · April 7, 2023, 2:08am

__setattr__ would only be called when an assignment was attempted; __getattribute__ is the one that’s called all the time. (Unless it’s different at the C level.)

skirpichev · April 7, 2023, 2:13am

In the above example (with the __class__ hack) neither __getattr__ nor __setattr__ was actually changed. Yet attribute reading is affected (~2x speed loss).

Only for setting attributes, here is a test on my branch (same timings as without __class__ hack):

$ cat d.py 
x = 1
def __setattr__(name, value):
    if name == 'spam':
        raise AttributeError
    globals()[name] = value
$ python -m timeit -r11 -s 'import d' 'd.x'
5000000 loops, best of 11: 48.8 nsec per loop

methane · April 7, 2023, 2:24am

Sorry, I meant all attribute assignment.

kknechtel · April 7, 2023, 3:39am

This thread looks to me like there might be some miscommunication going on, so I want to try to share my understanding first.

I think that OP is not concerned with slowing down attribute assignment here, because assigning an attribute to an imported module is much less common than looking up an attribute there. The goal is to have some way to intercept attribute assignment so that an exception can be raised to guard against programming errors; of course we should expect this to have some overhead. However, the current best solution for intercepting attribute assignment means that attribute lookup will also be slowed down. (I assume this is because we now have to work through a Python implementation of the module class instead of using a built-in one.)

However, if we add a __setattr__ hook to the C implementation, although that doesn’t impact on attribute lookups (satisfying OP’s use case), it presumably would slow down assigning attributes to modules for everybody. OP isn’t bothered by this in a SciPy or mpmath context, but a change like that needs serious consideration since the overhead cost would be paid even by people who don’t plan to use the hook.

Personally, my intuition is that it might still be worthwhile. It’s hard for me to imagine a popular third-party library that expects the user to set attributes on modules imported from the library, enough times to become a performance concern. On the other hand, a tight loop repeatedly calling some library function is pretty easy to imagine (granted, the user could trivially cache this lookup).

On the other hand, it’s commonly enough incorrect to assign attributes to a module (after import) that I almost wonder if it needs to be supported by default. (If I had to choose, I find it more annoying not being able to set attributes on an ordinary object instance - that also makes it harder to teach beginners about the concept of attributes in general, at least with my teaching style).

Therefore, I like this suggestion, assuming it works as expected. It should address OP’s use case and, if anything, improve performance rather than making it worse. Meanwhile, people who want to do fancy things with setting attributes on modules still have the option of taking the performance hit of a user-defined subclass (and if they want to do really fancy things, the __setattr__ hook is right there for them, because they already defined a class in Python).

For bonus points, a bit of standard library support, along the lines of:

import sys
import types

def module_setattr_hook(func=None):
    """
    A decorator to enable custom logic for setting module attributes.
    Decorate a top-level function in a module, to make it implement
    attribute assignment for the module. It should accept three arguments:
    * self: types.ModuleType - the module being modified.
    * attr: str - the name of the attribute being set.
    * value - the new value for the attribute.
    Alternately, just call `module_setattr_hook()` to make it possible
    to set attributes with no special logic.
    """
    if func is None:
        # bare call: no function to inspect, so use the caller's module
        # (sys._getframe is CPython-specific)
        name = sys._getframe(1).f_globals['__name__']
    else:
        name = func.__module__
    module = sys.modules[name]
    cls = module.__class__
    if cls is types.ModuleType:
        # base type, needs to be swapped with a user type
        cls = type('_UserModule', (types.ModuleType,), {})
        module.__class__ = cls
    # otherwise, just monkey-patch the existing type
    if func is not None:
        cls.__setattr__ = func
    return func

skirpichev · April 7, 2023, 4:41am

The problem is that I don’t see a measurable difference for naive tests with timeit (~same numbers in the main and on the branch):

$ cat b.py 
x = 1
$ python -m timeit -r11 -s 'import b' 'b.x=2'  # name exists in __dict__
5000000 loops, best of 11: 97.4 nsec per loop
$ python -m timeit -r11 -s 'import b' 'b.y=2'
5000000 loops, best of 11: 97.4 nsec per loop

That’s why this obvious concern wasn’t mentioned in my top post:)

The cost is an additional PyDict_GetItemWithError() call to check if there is a __setattr__ helper.

Also, as you pointed out, intensive use of attribute assignment on modules is a very exotic scenario, while reading attributes is not. Unfortunately, the __class__ hack affects the second case even if you are using it only to customize attribute assignment… So a slight speed regression for exotic scenarios seems to be a fair trade-off.

Only partially. It lacks a user-friendly exception message.

kknechtel · April 7, 2023, 6:25am

Well, yes; you’re comparing two use cases within the current existing functionality. If the built-in type had a __setattr__ hook, it would have to be invoked in both cases, and doing so would take more time than the current approach, which doesn’t.

Does the standard AttributeError not explain well enough, given the circumstance?
… Actually, I have some separate proposals there.

methane · April 7, 2023, 7:49am

Not only that. It makes:

Python semantics complex.
CPython and other implementations complex.
Need special care for it when optimizing the interpreter.

Python is a very dynamic language already. I am very conservative about making Python even more dynamic.

Emitting DeprecationWarning and lazy import are useful for many packages. That’s why I like module __getattr__. I don’t know how module __setattr__ is useful for packages other than this specific case.
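
As a reminder of the deprecation use case, a minimal sketch with a hypothetical in-memory module `lib` whose `old_name` was renamed to `new_name` (both names invented for the example):

```python
import types
import warnings

lib = types.ModuleType("lib")  # hypothetical module
lib.new_name = 42

def _module_getattr(name):
    # Only reached when normal lookup fails, i.e. for the removed name.
    if name == "old_name":
        warnings.warn(
            "lib.old_name is deprecated; use lib.new_name",
            DeprecationWarning,
            stacklevel=2,
        )
        return lib.new_name
    raise AttributeError(f"module 'lib' has no attribute {name!r}")

lib.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = lib.old_name
print(value, caught[0].category.__name__)  # 42 DeprecationWarning
```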

skirpichev · April 7, 2023, 8:08am

You meant if the module had a __setattr__ hook, right? That’s true, but we are talking about the cost for people who don’t plan to use the hook. In this case it should be safe to assume there is no such name in the module’s dict. From PEP 562:

This PEP may break code that uses module level (global) names __getattr__ and
__dir__. (But the language reference explicitly reserves all undocumented dunder
names, and allows “breakage without warning”

Well, it certainly doesn’t show the user how to fix the problem.

ntessore · April 7, 2023, 8:09am

Write-protecting module constants could be useful, if __setattr__ was called on all write access.

PS: But more appropriate and useful would be module-level descriptors.

methane · April 7, 2023, 8:13am

How is it important for Python’s future?
I think this change needs a PEP. The PEP can list such use cases.

kknechtel · April 7, 2023, 8:23am

It just occurred to me, trying to control __setattr__ on modules would also cause problems for packages. If I want to import a.b, it will be problematic if a is a module instance and the default module denies (via __slots__) permission to attach any attributes that weren’t defined during a’s creation.
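
The behaviour in question can be seen with any stdlib package. A small sketch using `json` and its `json.tool` submodule: importing the submodule binds it as an attribute on the parent package, which is exactly the kind of assignment a restrictive module type would reject.

```python
import importlib
import sys

pkg = importlib.import_module("json")       # a stdlib package
sub = importlib.import_module("json.tool")  # one of its submodules

# The import system stored the submodule in the package's namespace.
bound = vars(pkg)["tool"]
print(bound is sub, bound is sys.modules["json.tool"])  # True True
```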

Rosuav · April 7, 2023, 9:18am

I think the response to that would be “don’t do that, then”. Use of __slots__ would be incompatible with lazy loading of a package’s modules.

(Eager loading would be no harder to handle than any other names, you just import them all and then __slots__ = tuple(globals()) or whatever the recommended idiom is.)

kknechtel · April 7, 2023, 10:22am

I misunderstood your original suggestion, then. I thought you were proposing to add such a mechanism to the builtin type.

I figured that using __slots__ on non-package modules is already possible, if you derive from types.ModuleType. However, it seems that such a declaration is ignored.
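
A quick way to see this, as a sketch with a made-up `SlottedModule` class. The subclass still accepts arbitrary attributes because the base module type already gives every instance a `__dict__`:

```python
import types

class SlottedModule(types.ModuleType):
    __slots__ = ()  # would forbid new attributes on an ordinary class

mod = SlottedModule("mod")
mod.anything = 1    # still succeeds: the base type provides __dict__
print(mod.anything, hasattr(mod, "__dict__"))  # 1 True
```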

methane · April 24, 2023, 4:35am

After reading PEP 713, I am +1 for both of __call__ and __setattr__.

skirpichev · April 24, 2023, 5:45am

Thanks for the blessing. I’ll try to prepare a PEP. Unfortunately, there is probably no chance now that this could enter 3.12 if accepted…
