Multiple dispatch based on typing.overload

I’ve updated pyright to remove the check for non-empty @overload implementations.

I’ll leave it to others to validate whether any changes are needed for pyre and pytype and coordinate with the maintainers of those type checkers if a change is needed.

2 Likes

I think that this is a mistake: it encourages a feature to be abused for situations for which it was not designed, which will only lead to brokenness in the future.

From what I understand, there are numerous multidispatch libraries that are already using @overload in this manner or are actively exploring the use of it. These libraries currently work (perhaps with some problematic edge cases) with pycharm and mypy but do not work with pyright because of the additional check that it implements. Is that a correct summary of the situation?

I’ll note that there is nothing in PEP 484 that indicates a type checker must implement the check that pyright implements. As I said above, I added it because I saw some pyright users express confusion about the intended use of @overload, and this error message was designed to guide them in the right direction. I don’t feel that strongly about retaining it.

@oscarbenjamin, if I understand you correctly, you would like to see a new stdlib mechanism for multidispatch (functools.dispatch) targeted for inclusion in Python 3.13 or later, and you’d like all type checkers and language servers to implement support for this new mechanism. Presumably, all third-party multidispatch libraries would be deprecated / abandoned after this functionality is standardized. Do I understand that correctly? As a long-term plan, that sounds reasonable, but making this happen will require significant time and effort. Are you signing up to drive that initiative? Just to set expectations, I think it will take at least two years before this functionality is available in the runtime and broadly available in type checkers and language servers.

In the short term, what is your preferred course of action?

  1. Make it clear that @overload should not be used for multidispatch implementations. Discourage any experimentation in that direction, and recommend that mypy, pycharm, etc. implement the same error as pyright to further discourage it.
  2. Stick with the status quo where developers who want to experiment with the use of @overload for multidispatch must use pycharm or mypy but not pyright / pylance.
  3. Remove the check from pyright and allow multidispatch experimentation using the existing @overload mechanism. Don’t recommend (but also don’t forbid) the use of @overload for multidispatch. Allow library authors to decide whether it meets their needs. I admit this solution is not perfect, but it fills a short-term need, and the experimentation could inform the design for the stdlib mechanism.
  4. Some other option?

I don’t think you’ll have success in convincing pycharm, mypy, and the other type checkers to close off this use case, so option 1 probably isn’t viable. Option 2 isn’t appealing to me because it unnecessarily penalizes users of pyright and pylance. Number 3 seems like the most pragmatic option — unless there’s another option that I haven’t considered.

3 Likes

On the other hand, the third option is the one I think is least acceptable. In fact, it isn’t multiple dispatch at all, because you have to specify which choice to make. You might as well simply call my_func_list_int in the example you gave, and not bother with multiple dispatch at all.

I’m more neutral on whether you call it multiple dispatch or something similar like runtime type-based overloading. With only two options, maybe my_func_list_int works. With dozens or hundreds of dispatch options, having them specifiable by type when needed becomes very helpful and allows you to build functions on top of them. Hundreds isn’t hyperbole: I have written code that keeps a registry of type-based runtime dispatches and has several hundred dispatch options. cattrs would be the closest open-source equivalent; it does runtime type selection based on a registry of functions, and you can register many converters.

One more reason why I’m not fond of option 2: with Python’s broad types it’s fairly easy to have one object match multiple types.

class Foo:
    ...

class Bar:
    ...

class Baz(Foo, Bar):
    ...

If you define dispatch for Foo and Bar and pass Baz, which one should it pick? Should it always pick the same one? Sometimes I may view one dispatch as more useful than the other depending on the specific caller. In practice I tend to see this more with protocols than with direct multiple inheritance.
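
For comparison, functools.singledispatch answers this question with the MRO; here is a minimal sketch reusing the classes above (handle is a made-up name):

from functools import singledispatch

class Foo: ...
class Bar: ...
class Baz(Foo, Bar): ...

@singledispatch
def handle(obj):
    raise NotImplementedError(type(obj))

@handle.register
def _(obj: Foo):
    return "Foo"

@handle.register
def _(obj: Bar):
    return "Bar"

# singledispatch always picks the same implementation because it follows
# Baz.__mro__, which lists Foo before Bar:
print(handle(Baz()))  # Foo

It can only lean on the MRO like that because it dispatches on a single argument's class; a multidispatch mechanism has no such single obvious ordering.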

While having a default behavior of not specifying the type is fine for simple cases, you will either have to rule out a lot of types as impossible or difficult to support, or end up with weird rules. There are multiple libraries that try to support runtime isinstance checks on more complex types (typeguard, beartype); they make very different design choices and come with their own restrictions on which types they support.
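
For example, plain isinstance rejects parameterized generics outright, which is part of why those libraries exist:

try:
    isinstance([1, 2, 3], list[int])
except TypeError as exc:
    print(exc)  # isinstance() argument 2 cannot be a parameterized generic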

If the tradeoff of excluding many types (most generics/protocols) is fine, then option 1 is reasonable. Even non-generic, non-protocol types often pose challenges at runtime:

from typing import TypedDict

class RequestData(TypedDict):
    ...

r = RequestData()

At runtime, r is of type dict, not RequestData. It’s possible to use heuristics to identify it, but even TypedDict can quickly lead to challenges when detecting types at runtime.

Questions like this also lead me towards -0 on this feature existing in the standard library. You can implement multi-dispatch-like functions fine in a third-party library, and different libraries are free to make different choices for how to decide which dispatch is used. It can somewhat type check. Not all complex functions need to type check, and there are always escape hatches. For a long time functools.partial didn’t type check well. I’m unsure whether type checkers today even handle functools.singledispatch well, yet people happily use it.

edit: To clarify the cattrs example: this function is similar to option 3 described. You can pass it an object and let it decide the dispatch automatically, but in harder cases you can pass a type explicitly to say which type to treat it as. So for serialization, runtime dispatch is fine in simple cases, while in harder cases the right dispatch is difficult to identify. For deserialization you always pass the expected type. If you always pass the expected type, you can make that type check completely with pyright, with no changes needed to the type system or standard library. With mypy this partially type checks (many types will, some more complex annotations won’t).

I agree that no one should prevent third-party libraries from developing in this area. Regardless of whether a stdlib implementation might exist in the future, there will always be the third-party libraries that already exist.

What I object to here is using @overload specifically rather than some other decorator or something else that is explicitly intended for the purpose of dispatch (perhaps like typing.dispatch that was suggested above).

This is an example where it seems to almost work if @overload is abused for a purpose for which it was not designed, but if you follow that through then everything will end up being based on mis-designed foundations. The typing rules that are defined for @overload are not suitable for multiple dispatch. Allowing or encouraging @overload to be used like this will mean that the unsuitable semantics that currently exist in type checkers for @overload end up as a sort of de facto standard, which then makes it impossible to marry typing with the way that multiple dispatch actually should work at runtime. Regardless of whether there exists a stdlib implementation of multiple dispatch, the right way forward here is to have typing rules that are actually intended to be suitable for multiple dispatch rather than reusing @overload out of convenience.

Another possibility would be to change the rules for @overload. I bet that if you look closely at the corner cases, though, you will see that just as it is undesirable to shoehorn multiple dispatch into the current semantics of @overload, it would also be undesirable to change @overload to match what real implementations of multiple dispatch do, and doing so would break much existing use of @overload.

I have said several times above that the first thing to do here is to explore in detail the semantics that exist in third-party multiple dispatch libraries. In particular the multipledispatch library, which is more mature and far more widely used than the other libraries mentioned here, is the one that should be checked (it doesn’t use type hints because it predates typing, but its semantics for how dispatching works are what should be checked). What is needed is to look at how different multiple dispatch implementations compare with each other and how they compare with the rules for @overload. I am sure that anyone who looks closely at this will see that the rules for @overload do not match what any complete implementation of multiple dispatch does, and also that it cannot work correctly to shoehorn multiple dispatch into those rules.

Here is a basic example of how @overload cannot work correctly. The rules for @overload depend on the “order” of the overloads. How can it then handle a situation where the overloads are in different modules?
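
As a minimal sketch of that order dependence (describe is a made-up function; recall that bool is a subtype of int):

from typing import overload

@overload
def describe(x: bool) -> str: ...
@overload
def describe(x: int) -> str: ...
def describe(x: int) -> str:
    return "bool" if isinstance(x, bool) else "int"

# Type checkers evaluate overloads in declaration order: describe(True)
# matches the bool overload only because it is listed before the int one.
# If the two overloads lived in different modules, "order" would be undefined.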

In multiple dispatch it is very likely that you will want to define different dispatches in different modules. A big part of the point of multiple dispatch when used in a good way is that it can make types extensible from the outside: you can customise behaviour for classes and functions that you do not own and whose methods or code you cannot change directly. In practice this means that your overloads will be distributed across different modules (and different projects etc).

4 Likes

Isn’t that the fundamental design decision for any multiple (or single!) dispatch system? And yes, it’s hard, which is why multiple dispatch is both uncommon and complex, but if you don’t solve it, then IMO you’re not implementing multiple dispatch.

It’s also worth noting that your example is single dispatch, which is already available in the stdlib. The stdlib implementation only dispatches on classes (using a type annotation list[int] gives “TypeError: Invalid annotation for ‘x’. list[int] is not a class”) and the reasons behind that are not just because you can’t tell at runtime if something is a list[int] - it’s also because the resolution rules rely on the MRO.
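
A minimal sketch of that stdlib limitation (norm is a made-up function name):

from functools import singledispatch

@singledispatch
def norm(x):
    raise NotImplementedError(type(x))

@norm.register
def _(x: int):
    return abs(x)

print(norm(-3))  # 3

# Registering a parameterized generic fails at decoration time:
#
#   @norm.register
#   def _(x: list[int]): ...
#
# TypeError: Invalid annotation for 'x'. list[int] is not a class.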

Furthermore, adding a parameter that lets the user choose the resolution doesn’t feel like “multiple dispatch” to me, it just feels like “two implementations that you choose between based on a parameter”. Of course, there are shades of grey here, and an API design may benefit from some level of type-based dispatch without being fully automatic, but that’s not (IMO) what people normally mean when they say “multiple dispatch”.

I think that before anyone starts making changes to the language (or to type checking rules, or any of the other typing-related stuff that has the force of a language change these days…) in order to support multiple dispatch, it’s necessary to come up with a proposal for what multiple dispatch even means in Python. It’s a significantly complex problem (even single dispatch is non-trivial) and while it’s OK for 3rd party libraries to make different trade-offs (in the spirit of experimentation), doing anything at the language level before a common design is established is very risky. And by that I even mean “using the standard @overload decorator”. Ideally, libraries could experiment using a @lib.overload decorator, but as we’ve established, even that requires some sort of meta-decorator to be standardised.

5 Likes

@oscarbenjamin

What I object to here is using @overload specifically rather than some other decorator or something else that is explicitly intended for the purpose of dispatch (perhaps like typing.dispatch that was suggested above).

I think this is very much in line with the original typing PEP 484, which also made some explicit notes about @overload and dispatch:

NOTE: While it would be possible to provide a multiple dispatch implementation using this syntax, its implementation would require using sys._getframe(), which is frowned upon. Also, designing and implementing an efficient multiple dispatch mechanism is hard, which is why previous attempts were abandoned in favor of functools.singledispatch(). (See PEP 443, especially its section “Alternative approaches”.) In the future we may come up with a satisfactory multiple dispatch design, but we don’t want such a design to be constrained by the overloading syntax defined for type hints in stub files. It is also possible that both features will develop independent from each other (since overloading in the type checker has different use cases and requirements than multiple dispatch at runtime – e.g. the latter is unlikely to support generic types).
PEP 484 – Type Hints | peps.python.org

1 Like

Could it be added as typing_extensions.dispatch for experimentation purposes?

I am not sure how the type checkers could handle it properly if the semantics for runtime dispatch are not defined or agreed anywhere. Certainly type checkers should not use the same rules as for @overload, but perhaps there is some space to define more limited or incomplete rules that type checkers could follow while runtime behaviour is explored?

Actually, why do we need any typing changes to experiment with runtime behaviour? functools.singledispatch manages fine with no special typing support, after all.

2 Likes

Type checkers apparently need special support just to be able to ignore things that they don’t understand. In this case, defining two functions with the same name:

from somewhere import dispatch

@dispatch
def f(x: int):
    return x + 1

@dispatch
def f(x: str):
    return x + '1'

$ mypy t.py
t.py:1: error: Cannot find implementation or library stub for module named "somewhere"  [import]
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:7: error: Name "f" already defined on line 3  [no-redef]
Found 2 errors in 1 file (checked 1 source file)

SymPy had hundreds of type: ignore comments for this, which eventually went away by making all multipledispatch functions be called _.

1 Like

The way singledispatch works at type-checking time depends on the type checker. Mypy special-cases it and partly understands it. Other type checkers, I think, mostly don’t understand it, and Any is used as an escape hatch.

I think that’s fine for runtime experimentation. There are other library decorators that can’t be understood in the type system today, and they just fall back to using Any as needed.

Overload is currently special-cased here. Normally, defining the same function name twice is considered an error. The underscore is one valid trick that I think both mypy and pyright support, although I’m unsure whether it’s a documented trick. I think for both mypy and pyright the error has a unique code, so you could disable that specific error (no-redef for mypy, obscuredFunctionDeclaration for pyright).
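
For mypy, that per-line suppression might look like this (somewhere stands in for whichever library provides the decorator, as in the example above):

from somewhere import dispatch

@dispatch
def f(x: int):
    return x + 1

@dispatch
def f(x: str):  # type: ignore[no-redef]
    return x + '1'

Or you can use a small lie and do: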

from typing import TYPE_CHECKING, overload

if TYPE_CHECKING:
    dispatch = overload
else:
    from somewhere import dispatch

Yeah, that’s how singledispatch does it, and it seems like a perfectly reasonable approach while people are still trying to come up with an algorithm that could be included in the stdlib as “the official approach” (which could then gain special typing support).

1 Like

@dispatch(checker=lambda ls: isinstance(ls, list) and (len(ls) == 0 or isinstance(ls[0], int)))
def my_func(ls: list[int]):
    ...

@dispatch
def my_func(ls: int):
    ...

What happens when you add a third, competing dispatch type, like def my_func(ls: list)?

My time spent trying to do stuff like this has shown me that it’s really difficult to come up with a system that makes sure the list[int] implementation is dispatched over list when ls = [1, 2, 3]. Naively, I was originally just letting whichever was defined first be picked. What’s the correct dispatch, though: the most specific? The least specific? Dispatching on nested types like this doesn’t seem straightforward in the slightest.

It’s really easy in the single-dispatch case: take the type and look it up in a dictionary, which is doable in the multidispatch case too. Throw in these nested types, however, and it gets tricky.
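
As a rough sketch of that dictionary approach in the multidispatch case (all names here are made up), keyed on the exact runtime classes of the arguments:

from typing import Any, Callable

_registry: dict[tuple[type, ...], Callable[..., Any]] = {}

def register(*types: type):
    # Map a tuple of argument classes to an implementation.
    def deco(func):
        _registry[types] = func
        return func
    return deco

def dispatch_call(*args: Any) -> Any:
    # Exact-class lookup only: no subclasses, no list[int] vs list.
    return _registry[tuple(type(a) for a in args)](*args)

@register(int, int)
def _(x, y):
    return x + y

@register(str, str)
def _(x, y):
    return x + y

print(dispatch_call(1, 2))      # 3
print(dispatch_call("a", "b"))  # ab

As soon as [1, 2, 3] is supposed to hit a list[int] implementation rather than a plain list one, this exact-class lookup has nothing to key on.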

I’m also not a fan of the beartype methodology of verifying the nested type by only checking the first element. I’ve never understood how anyone can feel comfortable relying on that.

2 Likes

If the checker is a TypeGuard, then the type checker could reasonably do the inference too?

1 Like

In my opinion, it would be incredibly useful to enable some form of type-compliance for (users of) third-party implementations of multiple dispatch. Currently, the only pattern I’m aware of that sort of works uses typing.overload, but this pattern is very far from optimal. It is my impression that the inability to use a multiple dispatch library effectively with mypy or pyright is unappealing and even discouraging.

I agree with @oscarbenjamin that it feels wrong to coerce typing.overload into performing multiple dispatch, which it was not originally designed for.

This would go some way, but the use cases would still be limited because of (1) the inability to specify methods in multiple files and, relatedly, (2) the inability to add additional @overloads after the implementation.

I think I’m more and more gravitating towards a typing.dispatch decorator that abides by the following semantics:

  • All methods decorated with typing.dispatch must have an implementation. This means that there is no separation between abstract specification of methods and the implementation, like there is for typing.overload.

  • Type checkers follow imports to find all methods specified by typing.dispatch. For example, if a.py had

from typing import dispatch

@dispatch
def f(x: int):
    ...

and b.py had

from typing import dispatch
from a import f

@dispatch
def f(x: str):
    ...

print(f("1"))
print(f(1))

then the print statements should match the methods defined in the different files.

This would be far from a perfect solution, but it might already go a long way.

1 Like

You need to study carefully how this can work at runtime before proposing any rules for type checkers. In your example here, the runtime @dispatch mechanism would have no way to know that the function f was imported from a in this module.

Are you proposing here that every function called f that is decorated by @dispatch would be collected globally?

That is not how existing implementations of runtime dispatch work. Even single dispatch does not work like that:

from a import f

@f.register
def _(x: str):
    ...

Here it is the use of @f.register that informs the runtime dispatcher that this is a dispatch that is connected to the same runtime function f that was defined in the other module. This is the same information that the typechecker would need.
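
For completeness, the a.py assumed by that snippet would look something like this:

# a.py

from functools import singledispatch

@singledispatch
def f(x):
    raise NotImplementedError(type(x))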

A problem both at runtime and for type checkers is that a dispatch mechanism might be defined that is not traceable from the imports here. Suppose I add another module c.py with:

# c.py

from a import f

@f.register
def _(x: float):
    ...

Now if the b.py module does not import c then this registration might not actually happen at runtime, so a call to f(2.0) in another module might not be dispatched to the function that is defined in c.py for float. Also, a type checker might have no way to know that it should even look in the c.py module to find the dispatch rule for f(2.0).

The way to resolve this is that you have to guarantee that if both the types in question and the dispatching function have been imported into a module, then all of the relevant dispatch rules will also have been imported. So if you import a dispatching function f from somewhere and define a new type D, then you can implement dispatch for it, but you must do it in the same module or ensure that importing the module imports the dispatch rules:

# d.py

from a import f

class D: pass

@f.register
def _(arg: D):
    ...

# No one else can do "from d import D" without the dispatch
# rule for f with D being defined at runtime.

Alternatively, if you are defining the new dispatch function, then you can implement it for types that are defined elsewhere, but no one else should do that:

# e.py

from typing import Any

from d import D
from somewhere import dispatch  # assuming the same hypothetical decorator as above

@dispatch
def g(arg: Any):
    raise NotImplementedError

@g.register
def _(arg: D):
    ...

# No one else can do "from e import g" without the dispatch
# rule for g with D being defined at runtime.

There is no way to enforce this at runtime in Python, but maybe there is a way that type checkers could help with this.

This is analogous to Rust’s rules for implementing traits for types:

One restriction to note is that we can implement a trait on a type only if at least one of the trait or the type is local to our crate

This restriction is part of a property called coherence, and more specifically the orphan rule, so named because the parent type is not present. This rule ensures that other people’s code can’t break your code and vice versa. Without the rule, two crates could implement the same trait for the same type, and Rust wouldn’t know which implementation to use.

https://doc.rust-lang.org/book/ch10-02-traits.html

The same concept in Julia is referred to under the heading of “type piracy”:
https://docs.julialang.org/en/v1/manual/style-guide/#Avoid-type-piracy-1

In Julia, type piracy is discouraged but also considered something that can be made to work if used with care, so it is mentioned in the style guide rather than being restricted by the rules of the language.

In Python, things like type piracy or monkeypatching can work at runtime but are discouraged and often rejected by type checkers. That means that how things actually work at runtime and what rules type checkers implement are not always the same. It is problematic, though, to have divergence between runtime behaviour and type-checker rules: regardless of whether you think something should be discouraged, we want to be able to have accurate type inference, which cannot work if the typing rules do not match what happens at runtime.

Related to the divergence of types vs runtime, PEP 484, which introduced type hints, effectively declares that int is a subtype of float as a “straightforward shortcut”. That is already problematic for type checking now, but it is also unworkable in the runtime dispatch case, because we definitely do want to be able to define different dispatch rules for int and float at runtime. I suggest that any runtime or type-checking rules for multiple dispatch should not apply this special casing of int and float.
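
A minimal sketch of that divergence (half is a made-up function):

def half(x: float) -> float:
    return x / 2

half(1)  # accepted by type checkers via the PEP 484 int-to-float shortcut

print(isinstance(1, float))    # False: at runtime, int is not a float
print(issubclass(int, float))  # False: a runtime dispatcher can distinguish them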

2 Likes

Also, what about

from typing import dispatch
from a import f as something_else

@dispatch
def f(x: str):
    ...

@dispatch
def something_else(x: str):
    ...

print(something_else("1"))
print(f(1))

Which override gets called?

1 Like

@oscarbenjamin, thank you for your very detailed reply. 🙂

You’re completely right about this. In practice, there obviously is the f.dispatch pattern, where you import a function and then extend it explicitly. I’ve also seen a global pattern, where @dispatch identifies functions by name. In this more global pattern, if

# a.py
from typing import dispatch

@dispatch
def f(x: int):
    pass

# b.py
from typing import dispatch

@dispatch
def f(x: str):
    pass

# c.py
from typing import dispatch

import a
import b

@dispatch
def f(x: float):
    pass

# Now `f` has all three methods for `str`, `float`, and `int`, because `@dispatch`
# aggregates methods by name: `f`.

That is, you’re right.

This would be one possible mechanism.

The global pattern can be generalised to “namespaces”, where dispatch = Dispatcher(), and this @dispatch aggregates methods by the name of the function.
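
A rough single-argument sketch of how such a namespace dispatcher could work; the Dispatcher class and the MRO-based lookup here are my assumptions, not any particular library’s API:

from typing import Any, Callable, get_type_hints

class Dispatcher:
    def __init__(self) -> None:
        # One dispatch table per function name, shared across definitions.
        self._tables: dict[str, dict[type, Callable[..., Any]]] = {}

    def __call__(self, func: Callable[..., Any]) -> Callable[..., Any]:
        hints = get_type_hints(func)
        hints.pop("return", None)
        argtype = next(iter(hints.values()))
        table = self._tables.setdefault(func.__name__, {})
        table[argtype] = func

        def wrapper(arg: Any) -> Any:
            # Walk the MRO so subclasses fall back to parent implementations.
            for cls in type(arg).__mro__:
                if cls in table:
                    return table[cls](arg)
            raise NotImplementedError(type(arg))

        wrapper.__name__ = func.__name__
        return wrapper

dispatch = Dispatcher()

@dispatch
def f(x: int):
    return "int"

@dispatch
def f(x: str):  # shares the dispatch table with the definition above
    return "str"

print(f(1), f("a"))  # int str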

The reason why I didn’t consider runtime initially is that I wasn’t completely sure what would be feasible and sensible from a type checking point of view. For example, the global pattern would be extremely difficult, if not impossible, to implement, because the type checker doesn’t know which files define methods.

I’m suspecting that, in practice, there might always be a discrepancy between runtime and type checking. However, we might be able to mitigate these discrepancies by coming up with a mechanism and associated conventions that are reasonable from both a runtime and a type checking point of view. For example, this sounds very reasonable to me:

The convention could be that, whenever you want to type check a function that uses multiple dispatch, you need to make sure that all methods that you’re using can be found with explicit imports of the form from module import f. The type checker could then dig into the imports and find all relevant methods. Perhaps this runs into computational issues…

It depends on how you approach it. One approach is to say that, regardless of what comes before, whenever you @dispatch a function named f, you add a method to a global function f. To access that global function f, you would need a reference to an f that was previously @dispatched. In your example, with this global mechanism, the first print statement would call the implementation something_else(str), and the second print statement would call the implementation f(int) from a. If a does not implement this, it would raise an error. The f imported from a as something_else wouldn’t do anything, because it would be overwritten by the method for the global version of something_else.