This is also something that I’ve thought about. It would be great to make more use of `singledispatch` (e.g. see the last part of my comment here) but it is just too slow right now. I agree that a reimplementation in C makes sense. Potentially if `singledispatch` was heavily used (as it is in some other languages) it would make sense to have its own CPython opcodes or something more deeply integrated than just a C implementation for calling the function.
If `singledispatch` was used heavily then there are two main things to consider for performance:
- Import time (if the `singledispatch` decorator was used many times during import of commonly used modules).
- Per-call overhead.
Your benchmark times the second of those two points but the first is also important (more important for many applications that don’t actually call the function!).
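As a rough way to look at those two costs separately, something like the sketch below could be used. The names `define_dispatched`, `describe` and `plain_describe` are made up for illustration, and the numbers will vary by machine and Python version, so treat it as a starting point rather than a proper benchmark.

```python
import timeit
from functools import singledispatch


def define_dispatched():
    """Create a singledispatch function with two registrations.

    Timing this approximates what a module pays at import time.
    """
    @singledispatch
    def describe(obj):
        return "object"

    @describe.register
    def _(obj: int):
        return "int"

    @describe.register
    def _(obj: str):
        return "str"

    return describe


def plain_describe(obj):
    """Hand-written isinstance chain, used as a per-call baseline."""
    if isinstance(obj, int):
        return "int"
    if isinstance(obj, str):
        return "str"
    return "object"


describe = define_dispatched()

# Cost paid once per decorated function (the import-time side).
definition_cost = timeit.timeit(define_dispatched, number=1_000) / 1_000

# Cost paid on every call (the per-call side), next to the baseline.
dispatched_call = timeit.timeit(lambda: describe(1), number=100_000) / 100_000
plain_call = timeit.timeit(lambda: plain_describe(1), number=100_000) / 100_000

print(f"definition:      {definition_cost * 1e6:.2f} us per function")
print(f"dispatched call: {dispatched_call * 1e9:.0f} ns")
print(f"plain call:      {plain_call * 1e9:.0f} ns")
```

The definition cost is paid even by programs that never call the function, which is why it matters independently of the per-call figure.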
The overhead identified in the wrapper function should be something that can be made small by a C implementation, but after that the basic dict lookup in `dispatch_cache` is potentially slow for common use cases. If `singledispatch` was used heavily then you would probably find that most functions would have only a small number of registered types, so it’s possible that at the C level something faster than a dict lookup could be used for the dispatch.