Function overloading (via typing.overload) provides the unique and powerful capability of mapping types (and even combinations of types) to other arbitrary types.
from typing import overload

# Essentially {(str, bytes): bool, (int, bool): tuple[str, float]}
@overload
def func(x: str, y: bytes) -> bool: ...
@overload
def func(x: int, y: bool) -> tuple[str, float]: ...
Currently, this level of expressiveness is limited to function definitions.
I propose generalizing overload mechanics via a new TypeMap special type.
TypeMap maps input type expressions or tuples of type expressions (analogous to overloaded argument types) to output type expressions (analogous to overloaded return types).
Specializations of TypeMaps (i.e. MyTypeMap[SomeType]) will use the same semantics as overload, with type checkers treating the specialized type as the output type (or union of output types) as defined by the map.
Syntax
I propose allowing TypeMaps to be defined via both a functional and class-based syntax.
Functional syntax
MyTypeMap = TypeMap('MyTypeMap', {
    str: list[str],
    bytes: list[int],
})
x: MyTypeMap[str]
reveal_type(x) # revealed type is `list[str]`
y: MyTypeMap[str | bytes]
reveal_type(y) # revealed type is `list[str] | list[int]`

MultiTypeMap = TypeMap('MultiTypeMap', {
    (str, bytes): bool,
    (int, bool): tuple[str, float],
})
z: MultiTypeMap[str, bytes]
reveal_type(z) # revealed type is `bool`
This would be roughly equivalent to:
@overload
def _map_type(a1: str, a2: bytes) -> bool: ...
@overload
def _map_type(a1: int, a2: bool) -> tuple[str, float]: ...
type MultiTypeMap[T1, T2] = TypeOf[_map_type(cast(T1, ...), cast(T2, ...))]
Class-based syntax
A class-based syntax would allow using PEP 695 type variables:
class CollectionTypeMap[T](TypeMap):
    __types__ = {
        (list, T): list[T],
        (set, T): set[T],
    }

# See https://github.com/microsoft/pyright/discussions/4369
class CollectionTypePreservingClass[CollT: (list, set), T]:
    def __init__(self, coll: CollectionTypeMap[CollT, T]) -> None:
        self.coll: CollectionTypeMap[CollT, T] = coll

    def get_one(self) -> T:
        if isinstance(self.coll, set):
            return next(iter(self.coll))
        return self.coll[0]

x = CollectionTypePreservingClass[set, int]({1, 2, 3})
reveal_type(x.coll) # revealed type is `set[int]`
reveal_type(x.get_one()) # revealed type is `int`
This would be roughly equivalent to:
@overload
def _map_type[T](a1: list, a2: T) -> list[T]: ...
@overload
def _map_type[T](a1: set, a2: T) -> set[T]: ...
type CollectionTypeMap[T1, T2] = TypeOf[_map_type(cast(T1, ...), cast(T2, ...))]
Real use cases
1. Tagged-data parsing
This is a common pattern when implementing communication protocols.
from enum import IntEnum
from typing import assert_never

class DataType(IntEnum):
    UINT8 = 0
    UINT64 = 1
    STRING = 2

class Message:
    data_type: DataType
    value: int | str

    def encode(self) -> bytes:
        if self.data_type is DataType.UINT8:
            # Type checker error: Cannot access attribute "to_bytes" for class "str"
            encoded_value = self.value.to_bytes(1)
        elif self.data_type is DataType.UINT64:
            # Type checker error: Cannot access attribute "to_bytes" for class "str"
            encoded_value = self.value.to_bytes(8, 'little')
        elif self.data_type is DataType.STRING:
            # Type checker error: Cannot access attribute "encode" for class "int"
            encoded_value = self.value.encode('utf-8')
        else:
            assert_never(self.data_type)
        return self.data_type.to_bytes(1) + encoded_value
With TypeMap this becomes:
from enum import IntEnum
from typing import Literal, TypeMap, assert_never

class DataType(IntEnum):
    UINT8 = 0
    UINT64 = 1
    STRING = 2

DataTypeMap = TypeMap('DataTypeMap', {
    Literal[DataType.UINT8]: int,
    Literal[DataType.UINT64]: int,
    Literal[DataType.STRING]: str,
})

class Message[DataTypeT: (
    Literal[DataType.UINT8],
    Literal[DataType.UINT64],
    Literal[DataType.STRING],
)]:
    data_type: DataTypeT
    value: DataTypeMap[DataTypeT]

    def encode(self) -> bytes:
        if self.data_type is DataType.UINT8:
            encoded_value = self.value.to_bytes(1)
        elif self.data_type is DataType.UINT64:
            encoded_value = self.value.to_bytes(8, 'little')
        elif self.data_type is DataType.STRING:
            encoded_value = self.value.encode('utf-8')
        else:
            assert_never(self.data_type)
        return self.data_type.to_bytes(1) + encoded_value
2. More compact function overloads
subprocess.Popen is a great example of the limitations of overload even for functions.
Currently, usage such as the following requires an assertion or cast:
proc = Popen(..., stdin=PIPE)
# Type checker error: "write" is not a known attribute of "None"
proc.stdin.write(...)
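A runnable version of that workaround, as a sketch: the child command and the echoed text are made up for the example, and the comments describe the optional stream attributes as declared in the current stubs.

```python
import subprocess
import sys

# Child process that echoes its stdin back in upper case.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
# The stubs declare proc.stdin as optional, so it must be narrowed by hand
# even though stdin=PIPE guarantees a stream at runtime.
assert proc.stdin is not None
proc.stdin.write("hello")
proc.stdin.close()

# The same narrowing is needed again for stdout.
assert proc.stdout is not None
output = proc.stdout.read()
proc.stdout.close()
proc.wait()
```

Both assertions exist only to satisfy the type checker; they can never fail for these arguments.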
This could be handled by the current type system by making Popen generic over StdinT, StdoutT, and StderrT and adding more overloads. But subprocess.pyi already defines six overloads just to handle the different ways to induce text mode. To handle all of the different combinations of stdin/stdout/stderr, there would need to be 36[1] overloads. (And this is only looking at the 3.11+ version.)
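The figure of 36 follows from the arithmetic in the footnote and can be checked by enumeration. The sketch below takes the per-mode overload counts (4 text mode, 1 binary mode) from the footnote as given and only enumerates the pipe combinations.

```python
from itertools import product

STREAMS = ("stdin", "stdout", "stderr")

# Every way of choosing PIPE (True) or not-PIPE (False) for the three streams.
pipe_choices = list(product((True, False), repeat=len(STREAMS)))
with_pipe = [choice for choice in pipe_choices if any(choice)]
without_pipe = [choice for choice in pipe_choices if not any(choice)]

TEXT_MODE_OVERLOADS = 4    # figure taken from the footnote
BINARY_MODE_OVERLOADS = 1  # figure taken from the footnote

# Each pipe combination needs one overload per mode; the no-pipe case needs
# a single overload because text/binary mode does not affect any attribute.
total = (
    len(with_pipe) * (TEXT_MODE_OVERLOADS + BINARY_MODE_OVERLOADS)
    + len(without_pipe)
)
```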
With TypeMap, all of these overloads can be expressed with a single declaration:
# Make PIPE a distinct type
@global_enum
class _PopenFileSpecial(Enum):
    PIPE = -1

PIPE: Final = _PopenFileSpecial.PIPE

TextModeTypeMap = TypeMap('TextModeTypeMap', {
    (Literal[False] | None, Literal[False] | None, None, None): bytes,
    (Literal[True], Literal[True] | None, str | None, str | None): str,
    (Literal[True] | None, Literal[True], str | None, str | None): str,
    (Literal[True] | None, Literal[True] | None, str, str | None): str,
    (Literal[True] | None, Literal[True] | None, str | None, str): str,
    (bool | None, bool | None, str | None, str | None): str | bytes,
})

PipeTypeMap = TypeMap('PipeTypeMap', {
    (Literal[_PopenFileSpecial.PIPE], bytes): IO[bytes],
    (Literal[_PopenFileSpecial.PIPE], str): IO[str],
    (int | IO[Any] | None, str | bytes): None,
})
class Popen[
    AnyStr: (str, bytes),
    StdinT: (IO[bytes], IO[str], None),
    StdoutT: (IO[bytes], IO[str], None),
    StderrT: (IO[bytes], IO[str], None),
]:
    stdin: StdinT
    stdout: StdoutT
    stderr: StderrT

    def __init__[
        UniversalNewlinesT: (Literal[True], Literal[False], None),
        TextT: (Literal[True], Literal[False], None),
        EncodingT: (str, None),
        ErrorsT: (str, None),
        StdinArgT: (int | IO[Any] | None, Literal[_PopenFileSpecial.PIPE]),
        StdoutArgT: (int | IO[Any] | None, Literal[_PopenFileSpecial.PIPE]),
        StderrArgT: (int | IO[Any] | None, Literal[_PopenFileSpecial.PIPE]),
    ](
        self: Popen[
            TextModeTypeMap[UniversalNewlinesT, TextT, EncodingT, ErrorsT],
            PipeTypeMap[
                StdinArgT,
                TextModeTypeMap[UniversalNewlinesT, TextT, EncodingT, ErrorsT],
            ],
            PipeTypeMap[
                StdoutArgT,
                TextModeTypeMap[UniversalNewlinesT, TextT, EncodingT, ErrorsT],
            ],
            PipeTypeMap[
                StderrArgT,
                TextModeTypeMap[UniversalNewlinesT, TextT, EncodingT, ErrorsT],
            ],
        ],
        args: _CMD,
        bufsize: int = -1,
        executable: StrOrBytesPath | None = None,
        stdin: StdinArgT = None,
        stdout: StdoutArgT = None,
        stderr: StderrArgT = None,
        preexec_fn: Callable[[], Any] | None = None,
        close_fds: bool = True,
        shell: bool = False,
        cwd: StrOrBytesPath | None = None,
        env: _ENV | None = None,
        universal_newlines: UniversalNewlinesT = None,
        startupinfo: Any | None = None,
        creationflags: int = 0,
        restore_signals: bool = True,
        start_new_session: bool = False,
        pass_fds: Collection[int] = (),
        *,
        text: TextT = None,
        encoding: EncodingT = None,
        errors: ErrorsT = None,
        user: str | int | None = None,
        group: str | int | None = None,
        extra_groups: Iterable[str | int] | None = None,
        umask: int = -1,
        pipesize: int = -1,
        process_group: int | None = None,
    ) -> None: ...
Required changes to CPython
The only required change to CPython would be adding TypeMap to typing. No special syntax is required. This change is also easily backported via typing_extensions.
[1] (7 combinations of at least one pipe) * (4 text mode overloads + 1 binary mode) + (1 overload for no pipes, text/binary mode irrelevant) = 36.