__getattr__ is much slower in Python 3.11

I ran this code on Python 3.11.1 and Python 3.10.4:

import time
import sys

class A:
    def foo(self):
        print("Call A.foo!")

    def __getattr__(self, name):
        return 2

    @property
    def ppp(self):
        return 3

class B(A):
    def foo(self):
        print("Call B.foo!")


class C(B):

    def __init__(self) -> None:
        self.pps = 1

    def foo(self):
        print("Call C.foo!")


def main():
    start = time.time()
    for i in range(1, 1000000):
        pass
    end = time.time()
    peer = end - start
    c = C()
    print(f"Python version of {sys.version}")
    start = time.time()
    for i in range(1, 1000000):
        s = c.pps
    end = time.time()
    print(f"Normal getattr spend time: {end - start - peer}")
    start = time.time()
    for i in range(1, 1000000):
        s = c.ppa
    end = time.time()
    print(f"Call __getattr__ spend time: {end - start - peer}")
    start = time.time()
    for i in range(1, 1000000):
        s = c.ppp
    end = time.time()
    print(f"Call property spend time: {end - start - peer}")

if __name__ == "__main__":
    main()

My environment is Debian 10.3, and I got this result:

Python version of 3.11.1 (main, Dec 26 2022, 16:32:50) [GCC 8.3.0]
Normal getattr spend time: 0.03204226493835449
Call __getattr__ spend time: 0.4767305850982666
Call property spend time: 0.06345891952514648

Python version of 3.10.4 (main, Dec 15 2022, 11:24:32) [GCC 8.3.0]
Normal getattr spend time: 0.044233083724975586
Call __getattr__ spend time: 0.3127727508544922
Call property spend time: 0.08991670608520508

As you can see, Python 3.11.1 is faster than Python 3.10.4 for everything except __getattr__.
I compared slot_tp_getattr_hook in typeobject.c between Python 3.11.1 and Python 3.10.4, and changed _PyType_Lookup back to _PyType_LookupId. That did not help; 3.11.1 is still slower.

Does anyone know why? Thanks.

More info.

I found a way to narrow down where the problem is. It requires hacking slot_tp_getattr_hook. I changed the function as follows (note the begin hack and end hack markers):

    if (getattr == NULL) {
        /* No __getattr__ hook: use a simpler dispatcher */
        tp->tp_getattro = slot_tp_getattro;
        return slot_tp_getattro(self, name);
    }
    // begin hack
    PyObject *tp_name = PyType_GetName(tp);   /* new reference */
    int is_hack = strcmp(PyUnicode_AsUTF8(tp_name), "HackGet") == 0;
    Py_DECREF(tp_name);                       /* release it on both paths */
    if (is_hack) {
        return hack_slot_tp_getattr_hook(self, name);
    }
    // end hack
    Py_INCREF(getattr);

Then I added a new function:

static PyObject *
hack_slot_tp_getattr_hook(PyObject *self, PyObject *name)
{
    PyTypeObject *tp = Py_TYPE(self);
    PyObject *getattr, *res;
    getattr = _PyType_Lookup(tp, &_Py_ID(__getattr__));
    if (getattr == NULL) {
        Py_RETURN_NONE;
    }

    Py_INCREF(getattr);
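    /* The result of the generic lookup is discarded on purpose;
       drop its reference so it does not leak when the attribute exists. */
    res = PyObject_GenericGetAttr(self, name);
    Py_XDECREF(res);
    PyErr_Clear();
    res = call_attribute(self, getattr, name);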

    Py_DECREF(getattr);
    return res;
}

Now we can rebuild Python and use this test case:

import time
import sys

NUM = 1000000

class HackGet(object):
    def __init__(self) -> None:
        super(HackGet, self).__init__()
        self.pps = 2

    def __getattr__(self, name):
        return 4

def main():
    start = time.time()
    for i in range(1, NUM):
        pass
    end = time.time()
    peer = end - start
    h = HackGet()
    print(f"Python version of {sys.version}")
    start = time.time()
    for i in range(1, NUM):
        s = h.ppp
    end = time.time()
    print(f"Call hack __getattr__ with error spend time: {end - start - peer}")
    start = time.time()
    for i in range(1, NUM):
        s = h.pps
    end = time.time()
    print(f"Call hack __getattr__ with no error spend time: {end - start - peer}")

if __name__ == "__main__":
    main()

Running this script gives this result:

$ python test_getattr.py
Python version of 3.11.1 (main, Feb 22 2023, 21:14:02) [GCC 8.3.0]
Call hack __getattr__ with error spend time: 0.6272189617156982
Call hack __getattr__ with no error spend time: 0.1733100414276123

From this test, it seems that call_attribute is cheap, but it is expensive when Python 3.11 calls PyObject_GenericGetAttr, finds nothing, and then calls PyErr_Clear.
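
For readers who do not want to rebuild the interpreter, the same path can be pictured at the Python level. This is only a rough model of what the hook does, not the C code itself: hook_model and the two classes are names made up for this sketch, and the absolute timings will not match the C-level numbers above.

```python
import timeit


class Plain:
    def __init__(self):
        self.pps = 1


class WithHook(Plain):
    def __getattr__(self, name):
        return 2


def hook_model(obj, name):
    """Normal lookup first; the __getattr__ hook only on AttributeError."""
    try:
        return object.__getattribute__(obj, name)
    except AttributeError:
        # An exception was created and then thrown away, which is the
        # part that looks expensive in the measurements.
        return type(obj).__getattr__(obj, name)


c = WithHook()
found = timeit.timeit(lambda: hook_model(c, "pps"), number=200_000)
missing = timeit.timeit(lambda: hook_model(c, "ppa"), number=200_000)
print(f"found:   {found:.4f}s")
print(f"missing: {missing:.4f}s")
```

If the reasoning above is right, the "missing" case should be noticeably slower than the "found" case, because it builds and discards an AttributeError on every iteration.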

I made the same modification to Python 3.10.4 and the comparison result is as follows:

$ /usr/local/bin/python3.11 test_getattr.py
Python version of 3.11.1 (main, Feb 22 2023, 21:14:02) [GCC 8.3.0]
Call hack __getattr__ with error spend time: 0.6420018672943115
Call hack __getattr__ with no error spend time: 0.1714766025543213

$ /usr/local/bin/python3.10 test_getattr.py
Python version of 3.10.4 (main, Feb 23 2023, 15:12:13) [GCC 8.3.0]
Call hack __getattr__ with error spend time: 0.4415709972381592
Call hack __getattr__ with no error spend time: 0.1987009048461914

From the above results, it seems the main performance gap between Python 3.10.4 and Python 3.11.1 is in PyObject_GenericGetAttr: Python 3.11.1 is slower when the lookup finds nothing.

I think I finally found the reason. _PyObject_GenericGetAttrWithDict is the key function. The unmodified test result is: spend time: 0.6592750549316406.
If I delete this line:

set_attribute_error_context(obj, name);

The result changes to: spend time: 0.47563862800598145.

And if I delete the code that raises the exception:

// PyErr_Format(PyExc_AttributeError,
//                     "'%.50s' object has no attribute '%U'",
//                     tp->tp_name, name);

it becomes much faster: spend time: 0.19989752769470215.

If Python fails to find an attribute in the normal ways, it returns NULL and raises an exception. But raising an exception has a performance cost. Python 3.11.1 added set_attribute_error_context to support Fine Grained Error Locations in Tracebacks, which makes things worse.
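
The extra context is visible from Python. As far as I can tell from object.c, set_attribute_error_context stores the attribute name and the object on the AttributeError; that reading is my assumption and was not measured above. The Empty class is just a placeholder for this sketch.

```python
class Empty:
    pass


obj = Empty()
try:
    obj.missing
except AttributeError as exc:
    # `name` and `obj` exist on AttributeError from Python 3.10 onward;
    # getattr() with a default keeps this running on older versions.
    err_name = getattr(exc, "name", None)
    err_obj = getattr(exc, "obj", None)

print(err_name, err_obj is obj)
```

When __getattr__ is defined, this information is filled in and then thrown away immediately, because the exception is cleared before the hook runs.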

Back to the question: when we define __getattr__, failing to find an attribute is expected. If we could get that result and then call __getattr__ without any exception handling, it would be faster.
I tried modifying Python 3.11.1 like this:

  1. Add a new function in object.c:
PyObject *
PyObject_GenericTryGetAttr(PyObject *obj, PyObject *name)
{
    return _PyObject_GenericGetAttrWithDict(obj, name, NULL, 1);
}
  2. Change typeobject.c:
    if (getattribute == NULL ||
        (Py_IS_TYPE(getattribute, &PyWrapperDescr_Type) &&
         ((PyWrapperDescrObject *)getattribute)->d_wrapped ==
         (void *)PyObject_GenericGetAttr))
        // res = PyObject_GenericGetAttr(self, name);
        res = PyObject_GenericTryGetAttr(self, name);
    else {
        Py_INCREF(getattribute);
        res = call_attribute(self, getattribute, name);
        Py_DECREF(getattribute);
    }
    if (res == NULL) {
        if (PyErr_ExceptionMatches(PyExc_AttributeError))
            PyErr_Clear();
        res = call_attribute(self, getattr, name);
    }
    Py_DECREF(getattr);
    return res;

After rebuilding Python, it really is faster: spend time: 0.13772845268249512.

Could you open an issue on GitHub?
We would like to fix this, but it is hard to track on a discussion forum.

Thanks.

On my way, with pleasure.
Thank you!

For reference, the GitHub issue is "Improve the Efficiency of Python3.11.1 __getattr__" (python/cpython issue #102213).

This change causes a fatal crash downstream.

See "Maximum recursion depth exceeded in __getattr__()" (python/cpython issue #103272).