What are the rules telling which two overloaded methods are identical?

FelixLeg · March 27, 2024, 6:39pm

I’m writing this post out of curiosity. A great curiosity that appeared when I’ve discovered the typing module

Let’s say we have a set of @overload-ed methods. How a static type checker can decide that two methods from the set are ambiguous (from the typing POV)? Or rather how to tell that function A having A_p parameters and function B having B_p parameters (and their types) are identical? (I assume that when they are and their return types are different, then the checker tells they are ambiguous)

erictraut · March 27, 2024, 7:41pm

The rules for overlapping overloads are not currently specified in the typing spec. It’s one of many areas that could benefit from clarification in the spec.

The general answer to your question is… if two overloads could both accept the same argument list, then they are partially overlapping. That’s a problem if the two overloads produce different return results. Such a condition is a bug because the implementation of the overloaded function clearly cannot return values of two different types.

If all argument lists accepted by a later overload are also accepted by an earlier overload, then the two overloads completely overlap each other, and the later one is redundant. This is another condition that is probably indicative of a bug in the overload definitions.

The exact algorithm used by type checkers to determine whether two overloads overlap partially or completely is not documented. It’s very complicated in practice because overloads can include class-scoped and function-scoped type variables, *args and **kwargs parameters, parameters with default argument values, etc.

The technique that pyright uses is to determine whether a later overload method is “assignable to” (i.e. a subtype of) an earlier overload, but it treats type variables and unions differently than normal (depending on whether it’s trying to detect partial or full overlap). It also ignores the return type for this assignability check.

I’ll note that this algorithm is expensive because it requires n! type comparisons where n is the number of overloads.

FelixLeg · March 28, 2024, 10:04am

I’ve thought of this overnight and I think I may have found an algorithm to check if a set of overloaded methods aren’t ambiguous

Some assumptions

In the main part of my algorithm description I assumed we have got no other knowledge of an overloaded methods set than their argument names, types and kinds (‘position only’,‘positional’,‘vararg’,‘keyword only’ and ‘keyword vararg’). Also, for simplicity, I assumed method’s arguments have got no default values (later I’ll show a version without this assumption).

Positional arguments

Let’s say we have got this:

class Klass:
	@overload
	def F(a:int, /, b:str)                     #1
	@overload
	def F(a:int, /, b:str, c:list, *args:bool) #2
	@overload
	def F(a:str)                               #3
	@overload
	def F(a:int, b:str, *args)                 #4

(1) From it we create a set of lists containing the types of arguments:

F = {
	[int, str],                    #1
	[int, str, list, (bool, ...)], #2
	[str],                         #3
	[int, str, (Any, ...)]         #4
}

Some notes:

Position only and positional arguments are treated the same and are added in the order they are appearing in the source code,
If an overload contains an *args argument, we make an “infinite” list. I mean we thread the list as if it had an infinite list of the type of *args appended at the end,
Per convention if the *args has got no written type, we give it the Any type

(2) We divide the set into subsets whose the first type in a list is identical in all lists in a subset:

F = {
	{
		[int, str],
		[int, str, list, (bool, ...)],
		[int, str, (Any, ...)]
	},
	{
		[str]
	}
}

(3) We remove the first item in every list (unless the item is our infinite ending):

F = {
	{
		[str],
		[str, list, (bool, ...)],
		[str, (Any, ...)]
	},
	{
		[]
	}
}

If after this one subset contains an empty list that is the only list in this subset, then we mark the owner method of it (in the above case it is #3) as “unambiguous”. We then remove the subset as it won’t be used in our future calculations.

Now we do the steps (2) and (3) again:

F = {
	[],
	[list, (bool, ...)],
	[(Any, ...)]
}

If we are at the situation when there are lists that are either empty or contain an infinite ending and there are more than one of such a list in a subset, then we mark every owner of such lists as “ambiguous”, and remove them from the set.

Now we have got this situation:

F = {
	[list, (bool, ...)]
}

Because it is the only member of the set we mark its owner method as “unambiguous” and finish the algorithm.

In the next post I’ll show how to deal with keyword arguments.

FelixLeg · March 28, 2024, 10:04am

Now we have got these methods:

class Klass:
	@overload
	def F(*, a:int, b:str)                        #1
	@overload
	def F(*, a:int, b:str, c:list, **kwargs:bool) #2
	@overload
	def F(*, a:str)                               #3
	@overload
	def F(*, a:int, b:str, **kwargs)              #4

We start with a set of dictionaries with the name of an argument as a key and the argument’s type as a value. We also make a set of all arguments’ names in every overloaded methods:

F = {
	{"a": int, "b": str},                        #1
	{"a": int, "b": str, "c": list, "**": bool}, #2
	{"a": str},                                  #3
	{"a": int, "b": str, "**": Any}              #4
}
N = {"a", "b", "c"}

(1) We pop one value from the names set and divide the args set into subsets, so each subset has got a dict with value at the key equal to the name:

F = {
	{
		{"a": int, "b": str},
		{"a": int, "b": str, "c": list, "**": bool},
		{"a": int, "b": str, "**": Any}
	},
	{
		{"a": str}
	}
}
N = ("b", "c")

(2) We then remove all values in dicts with keys equal to the name:

F = {
	{
		{"b": str},
		{"b": str, "c": list, "**": bool},
		{"b": str, "**": Any}
	},
	{
		{}
	}
}
N = ("b", "c")

We got an empty dict in one subset which is also the only one dict in that set. We thus mark the owner of this dict as “unambiguous” and we remove the subset.

We do the steps (1) and (2) again:

F = {
	{
		{},
		{"c": list, "**": bool},
		{"**": Any}
	}
}
N = ("c")

If we have got a situation where a subset contains an empty dict or a dict with only "**" key and there are at least two of such dicts, then we mark the owners of such dicts as “ambiguous” and remove them from the subset.

Now we have got this:

F = {
	{
		{"c": list, "**": bool}
	}
}

Again, because it is the only dict in a subset, we mark the owner of the dict as “unambiguous” and finish the algorithm.

FelixLeg · March 28, 2024, 10:05am

Some notes:

If there are “normal” arguments in a method (they are not “position-only”, “keyword-only” or var-args) then such arguments are used in both of the algorithm calculations,
If one calculation marks a method as “unambiguous” and the other as “ambiguous”, then “ambiguous” wins.

What about arguments with default values

To make my algorithm work with such arguments, we have to make these steps:

in the first calculation when we add a list of a method with a default argument, we add two list: one with the argument and another without it. We repeat this as many times as there are such arguments. For all the result lists we assign them the same method owner:

class Klass:
	#...
	@overload
	def F(a:int, b:str = "abc", c:bool = True) #F1
	@overload
	def F(a:int)                               #F2
	#...

#this results in:
{
	... ,
	[int], #owner: F1
	[int, str], #owner: F1
	[int, str, bool], #owner: F1
	[int], #owner: F2
	...
}

If at some point we mark the same method as both “ambiguous” and “unambiguous” then, of course, “ambiguous” wins.

In the second calculation we do something similar but now we add every permutations of “has name x” and “has not name x”:

class Klass:
	#...
	@overload
	def F(a:int, *, b:str = "abc", c:bool = True)

#this results in:
{
	...
	{"a": int},
	{"a": int, "b": str},
	{"a": int, "c": bool},
	{"a": int, "b": str, "c": bool},
	...
}

Again, “ambiguous” wins in the calculation.

erictraut · March 28, 2024, 1:53pm

What you’re describing here are the (very simplified) rules involved in determining whether one callable type is a subtype of another callable type. You should assume that type checkers already know how to determine subtype relationships in the common case (although it would be good to document the subtyping rules for callables). My point is that the algorithm for determining whether two overloads are overlapping should largely leverage other existing type checking algorithms. This is how both mypy and pyright implement such checks.

The algorithm also needs to take into account subtyping of parameter types (which are considered contravariant when considering the subtyping relationship of two callables).

The algorithm also needs to handle TypeVars (with their variance rules), ParamSpecs, and TypeVarTuples used in each signature (and take into account whether they are class-scoped or function-scoped).

FelixLeg · March 28, 2024, 2:35pm

This is an one-night algorithm, it is not yet a production quality one I know I’s asking in the top post about determining if two functions completely overlap, but now I think determining if some methods are ambiguous may be enough. For a static type checker at least.

Well, I said at the beginning I made an assumption about what two types are considered equal by my algorithm. However, I think we may change how that is determined without changing the body of my algorithm at all. If we could turn the relationships between types into a graph-like structure, then, in order to determine if two types are “equal” in my algorithm, we could “just” try to find if a path exist between nodes of those types. I assume now that static type checkers have this thing covered (otherwise they shouldn’t be called type checkers )

Could You elaborate on that? I the way I see the thing, we could threat T in the definition of a method in the same way we treat any other type, like int, str, etc. just without that subtyping calculations. T == T always, I think

What are the rules telling which two overloaded methods are identical?

Some assumptions

Positional arguments

Keyword arguments

Some notes:

What about arguments with default values