Free-Threading instance method slow

Hi,
I have a question about free-threading. I’m not familiar with C or CPython internals, so I thought the best way would be to ask here. My question is: why is calling an instance method from several threads so much slower than calling a plain function?

Here is my benchmark code:

from threading import Thread
import time

class Cache:
    def get(self, i):
        return i

def get(i):
    return i

client = Cache()

def bench_run(runner):
    s = time.monotonic_ns()
    for i in range(500000):
        runner(i)
    d = time.monotonic_ns() - s
    print(f"{d:,}")


def bench_run_parallel(count, runner):
    tl = []
    for i in range(count):
        t = Thread(target=bench_run, args=[runner])
        tl.append(t)
        t.start()

    for t in tl:
        t.join()

if __name__ == '__main__':
    print("no threading class")
    bench_run(client.get)
    print("\nthreading class")
    bench_run_parallel(6, client.get)

    print("\nno threading function")
    bench_run(get)
    print("\nthreading function")
    bench_run_parallel(6, get)

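Here are the results under Python 3.13 with the GIL disabled. I installed it with:

PYTHON_CONFIGURE_OPTS='--disable-gil' pyenv install 3.13-dev

and ran the benchmark with:

PYTHON_GIL=0 python benchmarks/run.py

Each number is the time in nanoseconds that one thread took for its 500,000 calls: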

Python 3.13.0rc1+ experimental free-threading build (heads/3.13:364d366, Sep  2 2024, 11:15:09) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
no threading class
44,847,278

threading class
428,314,746
432,201,700
435,729,388
440,582,477
442,134,883
442,683,555

no threading function
41,123,279

threading function
57,099,611
56,514,552
58,396,146
60,016,758
60,897,913
61,631,976

And here are the results for Python 3.12:

no threading class
32,677,908

threading class
85,582,433
90,624,995
98,322,625
53,627,417
94,458,365
119,078,016

no threading function
31,259,819

threading function
104,334,277
92,584,059
159,836,531
156,181,817
138,702,641
77,736,503
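
To put a number on "much slower", here is a small sketch that compares the mean per-thread time under 6 threads with the single-threaded time, using only the Python 3.13 figures posted above. The variable names and the slowdown helper are made up for this illustration, and taking the mean of the six thread timings is just one reasonable way to summarise them.

```python
from statistics import mean

# Timings in nanoseconds, copied from the Python 3.13 free-threading run above.
single_method_ns = 44_847_278
threaded_method_ns = [428_314_746, 432_201_700, 435_729_388,
                      440_582_477, 442_134_883, 442_683_555]
single_function_ns = 41_123_279
threaded_function_ns = [57_099_611, 56_514_552, 58_396_146,
                        60_016_758, 60_897_913, 61_631_976]


def slowdown(single_ns, threaded_ns):
    """Mean per-thread time with 6 threads, relative to the single-threaded run."""
    return mean(threaded_ns) / single_ns


method_slowdown = slowdown(single_method_ns, threaded_method_ns)
function_slowdown = slowdown(single_function_ns, threaded_function_ns)

print(f"bound method: {method_slowdown:.1f}x")   # prints "bound method: 9.7x"
print(f"function:     {function_slowdown:.1f}x")  # prints "function:     1.4x"
```

So on the free-threaded build each thread needs roughly 9.7x as long for the same 500,000 bound-method calls once six threads run together, against roughly 1.4x for the plain function.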

Whilst it doesn’t answer your question, I would point out that this CPU-bound activity is exactly what multi-threading does not make faster.
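
One thing that might help narrow the question down: in the benchmark every thread calls the same client.get object, which was created once in the main thread. A possible explanation, not confirmed anywhere in this thread, is contention on objects shared between threads. The sketch below is only a diagnostic under that assumption. The make_runner factory, the n parameter, and the worker helper are hypothetical additions, not part of the original benchmark. It times three variants: one shared bound method, a bound method created inside each thread, and a Cache instance created inside each thread.

```python
import time
from threading import Thread


class Cache:
    def get(self, i):
        return i


client = Cache()


def bench_run(make_runner, n=500_000):
    # make_runner is called inside the worker thread, so it decides whether
    # the callable is shared with other threads or created per thread.
    runner = make_runner()
    start = time.monotonic_ns()
    for i in range(n):
        runner(i)
    return time.monotonic_ns() - start


def bench_run_parallel(count, make_runner, n=500_000):
    results = [0] * count

    def worker(idx):
        results[idx] = bench_run(make_runner, n)

    threads = [Thread(target=worker, args=(k,)) for k in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    shared = client.get  # one bound-method object used by every thread
    variants = {
        "shared bound method": lambda: shared,
        "per-thread bound method": lambda: client.get,  # new object per thread
        "per-thread instance": lambda: Cache().get,     # nothing shared but the class
    }
    for name, make_runner in variants.items():
        print(name)
        for elapsed in bench_run_parallel(6, make_runner):
            print(f"  {elapsed:,}")
```

If the three variants come out very different on the free-threaded build, that points at which shared object matters. If they all look the same, the assumption above is probably wrong and the cause lies elsewhere.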