Performance of functools.singledispatch

The functools.singledispatch decorator has some overhead. On my system it is about 300-400 ns per call, estimated with the following code:

code="""
from functools import singledispatch

@singledispatch
def f(x):
    return x+1

def f0(x):
    return x+1
"""

import timeit
t1=timeit.timeit('f(0)', setup=code)
t0=timeit.timeit('f0(0)', setup=code)
print(f'plain {t0}, dispatch {t1}')

The overhead is in the wrapper function

    def wrapper(*args, **kw):
        if not args:
            raise TypeError(f'{funcname} requires at least '
                            '1 positional argument')

        return dispatch(args[0].__class__)(*args, **kw)

and in the dispatch method that it calls.
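
For a rough sense of how much of the overhead is the registry lookup itself, the dispatch method can be timed on its own, separately from the wrapper (a minimal sketch; absolute numbers will vary between machines):

code="""
from functools import singledispatch

@singledispatch
def f(x):
    return x+1
"""

import timeit
# time only the class -> implementation lookup, without the wrapper call
print(timeit.timeit('f.dispatch(int)', setup=code))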

  • Would it be possible to implement the wrapper (and dispatch) in C for performance?
  • The dispatch overhead can be reduced a bit by using functools.cache; a branch with this change is singledispatch_cache (it passes most tests, but 2 are failing). A possible problem with this is that functools.cache does not use weak references, but I do not know whether that is an issue in this case (see the sketch after this list). Would caching the dispatch using the existing CPython mechanisms be an option?
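
For illustration, here is a minimal user-level sketch of the caching idea. It is not the branch itself; it just wraps the public dispatch method with functools.cache, and (like the concern raised above) it keeps strong references to the dispatched classes:

from functools import cache, singledispatch

@singledispatch
def f(x):
    return x + 1

@f.register(int)
def _(x):
    return x - 1

# cache the class -> implementation lookup; unlike the internal
# dispatch_cache this holds strong references to the classes
cached_dispatch = cache(f.dispatch)

def f_cached(x):
    return cached_dispatch(x.__class__)(x)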

I feel like caching is not gonna give us much of an edge for singledispatch. Re-implementing in C is an interesting idea. I can try doing that if core devs agree to it.

A reimplementation in C makes sense; just add it to _functools. I don’t use singledispatch so I have no good intuition about whether a cache makes sense. What tests failed? Could you prove that it matters by writing a small benchmark?


This is also something that I’ve thought about. It would be great to make more use of singledispatch (e.g. see the last part of my comment here) but it is just too slow right now. I agree that a reimplementation in C makes sense. Potentially, if singledispatch were heavily used (as it is in some other languages), it would make sense to give it its own CPython opcodes or something more deeply integrated than just a C implementation for calling the function.

If singledispatch were used heavily, there would be two main things to consider for performance:

  1. Import time (if the singledispatch decorator was used many times during import of commonly used modules).
  2. Per-call overhead.

Your benchmark times the second of those two points but the first is also important (more important for many applications that don’t actually call the function!).
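
For the first point, something like the following gives a rough idea of the per-definition cost (a minimal sketch: the statement re-defines the function and re-registers one type on every iteration):

import timeit

stmt = """
@singledispatch
def f(x):
    return x

@f.register(int)
def _(x):
    return x
"""

# cost of applying the decorator and registering one implementation
print(timeit.timeit(stmt, setup='from functools import singledispatch', number=100_000))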

The overhead identified in the wrapper function should be something that can be made small by a C implementation, but after that the basic dict lookup in dispatch_cache is potentially slow for common use cases. If singledispatch were used heavily, you would probably find that most functions have only a small number of registered types, so it is possible that at the C level something faster than a dict lookup could be used for the dispatch.

I doubt that it will ever be more than relatively niche. Python just has enough patterns already for dispatching function calls.


Which dispatching pattern would you recommend for mathematical functions like sin, cos, exp etc.?

There could be methods like sin or dunders like __sin__, but that opens a can of worms: it means a lot of methods, and they all need to be added to float, int, complex, etc. for the dispatching to be useful.

Using the builtin sin, cos, etc. from the math module, or using a sequence of if statements.
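
For concreteness, a minimal sketch of what the if-statement approach might look like (purely illustrative):

import math, cmath

def sin(x):
    # manual type dispatch: every supported type has to be listed here
    if isinstance(x, complex):
        return cmath.sin(x)
    elif isinstance(x, (int, float)):
        return math.sin(x)
    else:
        raise TypeError(f'unsupported type: {type(x).__name__}')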

The failing tests are in the details below.

A small benchmark:

code="""
from functools import singledispatch

@singledispatch
def f(x):
    return x+1

@f.register(int)
def ff(x):
    return x-1

def f0(x):
    return x+1

f(1), f(1.), f0(1), f0(1.)
"""

import timeit
t1=timeit.timeit('f(0); f(0.)', setup=code)
t0=timeit.timeit('f0(0); f0(0.)', setup=code)
print(f'plain {t0}, dispatch {t1}')

Results in:

plain 0.09603433899974334, dispatch 0.9053597629999786 # master
plain 0.09667534300024272, dispatch 0.7156071749996045 # branch

So there is some value in the cache, but right now an implementation in C seems better.

Output of python Lib/test/test_functools.py
...........................................................................................................................................................................................................F...........F...................................
======================================================================
FAIL: test_cache_invalidation (__main__.TestSingleDispatch.test_cache_invalidation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/eendebakpt/cpython/Lib/test/test_functools.py", line 2330, in test_cache_invalidation
    self.assertEqual(td.get_ops, [list, dict])
AssertionError: Lists differ: [] != [<class 'list'>, <class 'dict'>]

Second list contains 2 additional elements.
First extra element 0:
<class 'list'>

- []
+ [<class 'list'>, <class 'dict'>]

======================================================================
FAIL: test_mro_conflicts (__main__.TestSingleDispatch.test_mro_conflicts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/eendebakpt/cpython/Lib/test/test_functools.py", line 2180, in test_mro_conflicts
    self.assertEqual(g(o), "set")     # because c.Set is a subclass of
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'sized' != 'set'
- sized
+ set


----------------------------------------------------------------------
Ran 251 tests in 0.148s

FAILED (failures=2)

The functions like sin in the math module do not dispatch on the types of their arguments. They coerce everything to float and then return a float result that has only 53 bits of precision and can’t represent intervals, exact results, etc. The reason there are so many different modules that define their own separate sin functions is precisely because math.sin does not do what is needed in many situations. Even the stdlib has its own separate cmath.sin function.

A bunch of if statements is not extensible. The idea is to have cooperation between many different libraries, e.g. there can be libraries that define numeric types and libraries that define generic algorithms for operating on numeric types. For the if/elif approach to work, every library implementing a generic algorithm needs to know about and enumerate all of the possible numeric libraries and types that it might be used with, and needs separate code to handle each one.
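
As a hedged sketch of that cooperative pattern: one library defines the generic function, and a separate library registers its own type without the generic code knowing about it (the Interval type and its naive sin rule here are purely hypothetical):

from functools import singledispatch
import math

# library A: defines the generic algorithm, with a float fallback
@singledispatch
def sin(x):
    return math.sin(x)

# library B: defines a numeric type and registers it,
# without any change to library A
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

@sin.register(Interval)
def _(x):
    # naive enclosure, not a tight interval bound; for illustration only
    return Interval(min(math.sin(x.lo), math.sin(x.hi)),
                    max(math.sin(x.lo), math.sin(x.hi)))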

The potential for what can be done in generic code with singledispatch is so much greater than what is currently possible in Python (e.g. look at what you can do with this in Julia for comparison).