Type resolvers for custom typing logic

javidcf · March 3, 2024, 3:30am

Consider some list-like type that can be indexed by an int or a slice, giving you either a single element or a subsequence of the object in return, respectively.

class MyList[T]:
    # ...
    def __getitem__(self, index: int|slice) -> T|MyList[T]:
        # ...

The annoying thing is that when I index this list I get a union type back no matter what, even though I know whether I should be getting an element or a subsequence just by looking at the type of the index I am giving.

It would be useful to have a way of explaining this to the type checker. With a placeholder syntax, I am thinking of being able to do, conceptually, something like this:

from typing import TypeResolver

def sequence_indexer(index_type: type, sequence_type: type, element_type: type) -> type:
    if issubclass(index_type, int):
        return element_type
    elif issubclass(index_type, slice):
        return sequence_type
    else:
        raise TypeError

# ...

class MyList[T]:
    # ...
    def __getitem__[Index: (int, slice)](self, index: Index) -> TypeResolver[sequence_indexer, Index, MyList[T], T]:
        # ...

Or maybe sequence_indexer should be a subclass of TypeResolver, or something else entirely. The point is being able to define type-resolving logic for the type checker. Common resolvers like this sequence_indexer could perhaps be provided by the standard library.
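Because the placeholder resolver is ordinary Python, its branching can be exercised at runtime with plain classes. The following is only a sketch of that: it reuses the sequence_indexer function from the example above, passes list and str as stand-ins for the sequence and element types, and leaves out TypeResolver, which does not exist in typing.

```python
def sequence_indexer(index_type: type, sequence_type: type, element_type: type) -> type:
    # Same branching as the placeholder resolver above.
    if issubclass(index_type, int):
        return element_type
    elif issubclass(index_type, slice):
        return sequence_type
    else:
        raise TypeError(f"unsupported index type: {index_type!r}")


# An int index resolves to the element type, a slice index to the sequence type.
int_result = sequence_indexer(int, list, str)
slice_result = sequence_indexer(slice, list, str)
# bool is a subclass of int, so it also resolves to the element type.
bool_result = sequence_indexer(bool, list, str)
```

Note that this only works because int and slice are real runtime classes; anything that exists purely for static checking would need a different mechanism.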

I hope the benefits are obvious. If you could somehow check for Literal, you could even have accurate type resolution for functions that return different types depending on, say, a boolean parameter (when its value is constant, obviously).

Now for the cons or challenges:

This is effectively a way to write little plugins for type checkers. Type checkers generally only check your code rather than run it, so this is asking for something rather different from what they usually do. There are likely many technical issues to address.
Ideally, TypeResolvers should be guaranteed not to break the type checker, even if they are badly written.
Type checkers can usually deal with a module that has syntax errors, but they will not be able to load a type resolver from a module with syntax errors.
A type resolver will likely want to check for types and protocols that are not runtime-checkable. Which is fair, because these resolvers wouldn’t run at runtime. But checking types with issubclass would still fail. I suppose these type checks would need to be done with a special function from typing that does nothing (or raises) at runtime but instructs the type checker to do the check.
As a more philosophical drawback, one could argue that a type resolver is another element of complexity in user code which needs to be maintained and kept “in sync” with the actual implementation, for no actual functionality in the program. And type hinting was never meant to be fully comprehensive anyway. Still, I feel the list indexing example I showed above is compelling.

In fact, as a simpler alternative I would also consider a TypeSwitch construct like the following:

from typing import TypeSwitch

class MyList[T]:
    # ...
    def __getitem__[Index: (int, slice)](self, index: Index) -> TypeSwitch[Index, [int, T], [slice, MyList[T]]]:
        # ...

Although obviously it only covers a particular subset of what an arbitrary type resolver could do, and its actual applicability would need to be evaluated.

EDIT: As quickly pointed out by @MegaIng, typing.overload already addresses the use case I first posted (and completely covers the use of TypeSwitch). Even a function like numpy.unique, which returns different things depending on some boolean arguments, should be possible to annotate correctly this way with Literal. While there are still relevant examples (like struct, mentioned in the same reply), the use case is more niche and the value/complexity ratio of the idea is lower than I initially thought.

MegaIng · March 3, 2024, 4:03am

While a type resolver is an interesting idea for a different use case, the use case you presented here is fully covered by overload:

from typing import overload

class MyList[T]:
    # The overloads map each index type to its own return type.
    @overload
    def __getitem__(self, __i: int) -> T: ...
    @overload
    def __getitem__(self, __s: slice) -> MyList[T]: ...
    def __getitem__(self, index: int|slice) -> T|MyList[T]:
        # ...

In fact, this is what the stdlib type hints do:

github.com/python/typeshed/blob/23daf97ab349b8d7ecf29fe3db288deab826422c/stdlib/builtins.pyi#L998-L1001

    @overload
    def __getitem__(self, __i: SupportsIndex) -> _T: ...
    @overload
    def __getitem__(self, __s: slice) -> list[_T]: ...

Instead, type resolvers would be interesting for something like the stdlib struct module, where the return types can be worked out from the value of the format string. But I can’t imagine a good interface, and I am 90% sure that the approach you listed is too naive and doesn’t fit well with the more complex type system features, let alone the design of the current type checkers.
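To make the struct case concrete, here is a small sketch of how the format string alone fixes the shape of the result. The format strings and values are arbitrary examples chosen for illustration.

```python
import struct

# The format string alone determines the arity and element types of the result.
two_ints = struct.unpack("<ih", struct.pack("<ih", 1, 2))       # an int and a short
one_float = struct.unpack("<f", struct.pack("<f", 1.5))         # a single float
raw_bytes = struct.unpack("<4s", struct.pack("<4s", b"abcd"))   # a 4-byte string
```

Overloads cannot enumerate every possible format string, so a static annotation for unpack has to fall back to a general tuple type unless the checker can evaluate the string itself.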

Jelle · March 3, 2024, 6:21am

I proposed a version of this a few years ago and implemented it in pyanalyze. I opened a thread on the typing-sig mailing list titled “Proposal: Type evaluation functions” and several people explained why this would be hard to implement in some type checkers, so I didn’t press the proposal further.

Pyanalyze still supports it though, and I’ve found it helpful for replacing some complicated sets of overloads. For example, we use it internally to provide better types for a few Pandas functions, like this:

@evaluated
def reset_index(
    __self: pandas.DataFrame,
    level: "Optional[Union[Hashable, Sequence[Hashable]]]" = None,
    *,
    drop: bool = False,
    inplace: bool = False,
    col_level: Hashable = 0,
    col_fill: Optional[Hashable] = "",
) -> Optional[pandas.DataFrame]:
    if inplace is True:
        return None
    elif inplace is False:
        return pandas.DataFrame
    return Optional[pandas.DataFrame]

NeilGirdhar · March 3, 2024, 8:01am

Cool, I proposed something similar too.

javidcf · March 3, 2024, 9:29am

Of course, I felt I was missing something here. The actual use case that triggered this idea was different, but in trying to give a minimal example I overlooked the obvious solution to this.

Yes, the idea is more about the concept itself of “type resolving logic provided by the user” rather than a particular syntax and semantics. I too agree it is difficult to think of a convenient interface for it, and while I don’t know enough about type checkers, I think you are probably right.

The struct use case is a good example - return types that depend on (usually literal) arguments would benefit from this most.