Make `@overload` less verbose

Overloading can become extremely verbose very quickly, because every argument has to be repeated in every overload, even though often only the type hints on a few arguments change.

It would be nice to be able to abbreviate overloads by skipping arguments and deferring their type hints. Consider this example from pandas-stubs:

Series.reset_index type hint
    @overload
    def reset_index(
        self,
        level: Sequence[Level] = ...,
        *,
        drop: Literal[False] = ...,
        name: Level = ...,
        inplace: Literal[False] = ...,
        allow_duplicates: bool = ...,
    ) -> DataFrame: ...
    @overload
    def reset_index(
        self,
        level: Sequence[Level] = ...,
        *,
        drop: Literal[True],
        name: Level = ...,
        inplace: Literal[False] = ...,
        allow_duplicates: bool = ...,
    ) -> Series[S1]: ...
    @overload
    def reset_index(
        self,
        level: Sequence[Level] = ...,
        *,
        drop: bool = ...,
        name: Level = ...,
        inplace: Literal[True],
        allow_duplicates: bool = ...,
    ) -> None: ...
    @overload
    def reset_index(
        self,
        level: Level | None = ...,
        *,
        drop: Literal[False] = ...,
        name: Level = ...,
        inplace: Literal[False] = ...,
        allow_duplicates: bool = ...,
    ) -> DataFrame: ...
    @overload
    def reset_index(
        self,
        level: Level | None = ...,
        *,
        drop: Literal[True],
        name: Level = ...,
        inplace: Literal[False] = ...,
        allow_duplicates: bool = ...,
    ) -> Series[S1]: ...
    @overload
    def reset_index(
        self,
        level: Level | None = ...,
        *,
        drop: bool = ...,
        name: Level = ...,
        inplace: Literal[True],
        allow_duplicates: bool = ...,
    ) -> None: ...

The many overloads with many arguments make it hard to see at a glance what is going on. Compare this with the following, with some simplifications:

@overload
def reset_index(
    ...,
    *,
    inplace: Literal[True],
    ...,
) -> None: ...
@overload
def reset_index(
    ..., 
    *,
    drop: Literal[True],
    ...,
) -> Series[S1]: ...
@overload
def reset_index(
    ...,
    *,
    drop: Literal[False],
    ...,
) -> DataFrame: ...
@overload
def reset_index(
    self,
    level: Sequence[Level] | Level | None = ...,
    *,
    drop: bool = ...,
    name: Level = ...,
    inplace: bool,
    allow_duplicates: bool = ...,
) -> None | Series[S1] | DataFrame: ...

This immediately makes it clear which arguments produce which return type. The way this would work is that when a ... is encountered in an @overload, the type hints of the missing arguments are deferred to later overloads. Obviously this is just a very rough idea, but I wonder if other people feel the same.

Another benefit could be with refactoring. If an extra argument is added, we only have to modify the last overload instead of adding it to every existing overload.

While I do like the terseness of this idea, it unfortunately won’t be possible without changes to the syntax, which makes it harder to justify: a bare ... in a signature outside of a function overload would be nonsensical, yet it would still have to be valid syntax everywhere. So I don’t think this makes sense unless overloads become their own AST node with their own grammar rules.

Maybe we can come up with a way that’s syntactically valid, non-ambiguous and still relatively terse.

Maybe something like *_: auto and **__: auto, meaning the elided arguments would be determined by the following overloads. The issue with that approach, however, is that we can’t easily and unambiguously elide positional arguments; maybe we’d need something like _: elide(n) to skip n positional arguments.

There’s an existing thread where I also shared some of my thoughts on how to at least reduce the number of overloads you have to write, due to the complications introduced by arguments that can be either positional or keyword: Some improvements to function overloads

Python 3.3.7+ (default, Jun 21 2022, 18:59:46) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ...
Ellipsis

Problem solved. :slight_smile:

I think the point was that it would have to be valid syntax as part of the grammar for a function header, in the place where currently an identifier is expected.

>>> def test(...): pass
  File "<stdin>", line 1
    def test(...): pass
             ^
SyntaxError: invalid syntax

It occurs to me that if there were some kind of built-in type that represented the arguments to a function, along with a way to collect them (regardless of whether the parameters use *args/**kwargs), then one could use match-case for the dispatch instead of @overload. But maybe that isn’t particularly helpful. (It seems like that approach would involve putting all the implementations inside the same function body, or else writing even more boilerplate.)

Yes, sorry if I wasn’t being clear enough. I didn’t mean that an ellipsis on its own would be nonsensical; I meant that this proposed new syntax would be nonsensical in a function that has no overloads. It would be strange to change the grammar for something that only does something in function signatures with the @overload decorator, unless you also introduced new syntax to specify overloads, so that the grammar change could be applied only to overloads and not to regular function signatures as well.

def foo(...): ...

This would need to be valid syntax, but what would it mean? It could make sense as a shorthand for:

def foo(*_, **__): ...

i.e. accepting any arguments but ignoring them inside the function body. However, it becomes less obvious once you start mixing in other function-signature syntax like /, * and **. It would also be somewhat contradictory with the proposed meaning in overloads, where it is supposed to mean “infer the elided arguments from the other overloads”. I’m not even sure that is actually possible to do completely unambiguously, unless you specify how many arguments were elided, since some overloads may omit a subset of the arguments entirely, or choose different names for the same positional-only argument to provide better documentation.

I believe

def test(...):
    ...

could be valid syntax for the parser, but calling such a function should fail. In practice, the signature object of the parsed function could contain a parameter whose kind has some new special value like PLACEHOLDER. Calling such a function would then fail with a TypeError carrying a message like:

Cannot call a function with a placeholder in its signature

This would make the desired @overload syntax valid, but it could also be used for other purposes.

The ... syntax could have other uses as well. For example, it currently does not seem possible to use the built-in abstract base classes and the typing module to specify a Protocol / ABC requiring that subclasses implement a function foo, without putting any constraints on the signature of foo.

Consequently, libraries that need this, like pytorch, use workarounds like

class Module:
    forward: Callable[..., Any] = _forward_unimplemented

With the Ellipsis syntax, something like the following should be possible:

class Module:
    @abstractmethod
    def forward(...) -> Any: ...

Type checkers could interpret this as the weakest upper bound of all callables, type[forward] = Callable[..., Any]. Similarly, it could enable the specification of partial signatures, which is currently mostly unsupported.
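
For comparison, the Callable-attribute workaround quoted above can already be expressed with a Protocol today; this sketch (class and function names invented for illustration) accepts any forward signature and runs:

```python
from typing import Any, Callable, Protocol

class SupportsForward(Protocol):
    # Callable[..., Any] places no constraint on forward's signature,
    # mirroring the pytorch-style workaround quoted above.
    forward: Callable[..., Any]

class MyModule:
    def forward(self, x: int, scale: float = 1.0) -> int:
        return int(x * scale)

def run(module: SupportsForward) -> Any:
    # The type checker accepts any call shape here, so nothing about the
    # arguments is verified statically -- exactly the drawback discussed.
    return module.forward(2, scale=3.0)

print(run(MyModule()))  # 6
```

The difference the proposed syntax would make is that forward would remain an ordinary method (and e.g. @abstractmethod would apply) rather than a class-level attribute.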
