The concept of associated types has come up in typing discussions for a while, in different forms.
In this discussion I’d like to propose adding associated types to the type system as an extension of overloads.
Specifically, I believe associated types provide a clean solution to two particular limitations of the current type system:
1. Expressing a relationship between the types of two or more fields.
This comes up relatively frequently when implementing decoding, such as for communication protocols.
Example:
from enum import IntEnum
from typing import assert_never

class DataType(IntEnum):
    UINT8 = 0
    UINT64 = 1
    STRING = 2

class Message:
    data_type: DataType
    value: int | str

    def encode(self) -> bytes:
        if self.data_type is DataType.UINT8:
            # Type checker error: Cannot access attribute "to_bytes" for class "str"
            encoded_value = self.value.to_bytes(1)
        elif self.data_type is DataType.UINT64:
            # Type checker error: Cannot access attribute "to_bytes" for class "str"
            encoded_value = self.value.to_bytes(8, 'little')
        elif self.data_type is DataType.STRING:
            # Type checker error: Cannot access attribute "encode" for class "int"
            encoded_value = self.value.encode('utf-8')
        else:
            assert_never(self.data_type)
        return self.data_type.to_bytes(1) + encoded_value
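For comparison, here is roughly what it takes to get this example past a type checker today: the value's type has to be re-asserted in every branch. This is only a sketch. The `@dataclass` decorator, the explicit byte order, and the `isinstance` assertions are my additions for illustration and are not part of the example above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class DataType(IntEnum):
    UINT8 = 0
    UINT64 = 1
    STRING = 2


@dataclass
class Message:
    data_type: DataType
    value: int | str

    def encode(self) -> bytes:
        if self.data_type is DataType.UINT8:
            # Needed only so the checker narrows `value`; the relationship
            # between the two fields is not visible to it.
            assert isinstance(self.value, int)
            encoded_value = self.value.to_bytes(1, "little")
        elif self.data_type is DataType.UINT64:
            assert isinstance(self.value, int)
            encoded_value = self.value.to_bytes(8, "little")
        elif self.data_type is DataType.STRING:
            assert isinstance(self.value, str)
            encoded_value = self.value.encode("utf-8")
        else:
            raise AssertionError(f"unhandled data type: {self.data_type!r}")
        return self.data_type.to_bytes(1, "little") + encoded_value
```

The assertions work, but they restate in every method a relationship that ought to be declared once on the class.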
2. Exponential overload explosion
This limitation is most apparent with subprocess.Popen. Currently, usage such as the following requires an assertion or cast:
proc = Popen(..., stdin=PIPE)
# Type checker error: "write" is not a known attribute of "None"
proc.stdin.write(...)
This could be handled by the current type system by making Popen generic over StdinT, StdoutT, and StderrT and adding more overloads. But subprocess.pyi already defines six overloads just to handle the different ways to induce text mode. To handle all of the different combinations of stdin/stdout/stderr, there would need to be 48 overloads. (And this is only looking at the 3.11+ version.)
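To make the "assertion or cast" concrete, this is a small sketch of the workaround as it looks today. The child command is an arbitrary example of mine (it upper-cases its input), and `sys.executable` is used only to keep it portable.

```python
import subprocess
import sys

# Hypothetical child process: echoes stdin back in upper case.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# The stubs declare both attributes as optional, so each one has to be
# narrowed by hand even though PIPE was passed for both.
assert proc.stdin is not None
assert proc.stdout is not None

proc.stdin.write("hello")
proc.stdin.close()
output = proc.stdout.read()
proc.stdout.close()
returncode = proc.wait()
```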
Proposed solution
Associated types would be defined as class-level PEP 695 type aliases.
class UInt8MessageType:
    type data_type = Literal[DataType.UINT8]
    type value_type = int

class UInt64MessageType:
    type data_type = Literal[DataType.UINT64]
    type value_type = int

class StringMessageType:
    type data_type = Literal[DataType.STRING]
    type value_type = str
When the “root type” is used as the bound of a type variable, the associated types can be accessed via attribute access (similar to ParamSpec’s args and kwargs, but allowing any name).
class Message[
    MessageT: (
        UInt8MessageType,
        UInt64MessageType,
        StringMessageType,
    )
]:
    data_type: MessageT.data_type
    value: MessageT.value_type

    def encode(self) -> bytes:
        if self.data_type is DataType.UINT8:
            # Type of self.value inferred to be int
            encoded_value = self.value.to_bytes(1)
        elif self.data_type is DataType.UINT64:
            # Type of self.value inferred to be int
            encoded_value = self.value.to_bytes(8, 'little')
        elif self.data_type is DataType.STRING:
            # Type of self.value inferred to be str
            encoded_value = self.value.encode('utf-8')
        else:
            assert_never(self.data_type)
        return self.data_type.to_bytes(1) + encoded_value
The root type can also be generic:
class MessageType[DataT: DataType, ValueT]:
    type data_type = DataT
    type value_type = ValueT

class Message[
    MessageT: (
        MessageType[Literal[DataType.UINT8], int],
        MessageType[Literal[DataType.UINT64], int],
        MessageType[Literal[DataType.STRING], str],
    )
]:
    ...
For Popen we can use something like this:
class _PipeSentinel:
    pass

PIPE = _PipeSentinel()

class _PopenIOMode[UniversalNewlinesT, TextT, EncodingT, ErrorsT, StrT]:
    type universal_newlines = UniversalNewlinesT
    type text = TextT
    type encoding = EncodingT
    type errors = ErrorsT
    type string = StrT

type _PopenBinaryMode = _PopenIOMode[Literal[False] | None, Literal[False] | None, None, None, bytes]
type _PopenTextMode = (
    _PopenIOMode[Literal[True], Literal[True] | None, str | None, str | None, str]
    | _PopenIOMode[Literal[True] | None, Literal[True], str | None, str | None, str]
    | _PopenIOMode[Literal[True] | None, Literal[True] | None, str, str | None, str]
    | _PopenIOMode[Literal[True] | None, Literal[True] | None, str | None, str, str]
)

class _PopenStdioPipe[StrT: (str, bytes)]:
    type init_type = _PipeSentinel
    type attr_type = IO[StrT]

class _PopenStdioNoPipe:
    type init_type = int | IO[Any] | None
    type attr_type = None

class _PopenStdios[IoModeT, StdinT, StdoutT, StderrT]:
    type io_mode = IoModeT
    type stdin = StdinT
    type stdout = StdoutT
    type stderr = StderrT
class Popen[
    AnyStr: (str, bytes),
    StdinT: (IO[bytes], IO[str], None),
    StdoutT: (IO[bytes], IO[str], None),
    StderrT: (IO[bytes], IO[str], None),
]:
    stdin: StdinT
    stdout: StdoutT
    stderr: StderrT

    def __init__[
        StdioT: (
            # No pipes mode
            _PopenStdios[_PopenBinaryMode | _PopenTextMode, _PopenStdioNoPipe, _PopenStdioNoPipe, _PopenStdioNoPipe],
            # Binary mode
            _PopenStdios[_PopenBinaryMode, _PopenStdioPipe[bytes], _PopenStdioNoPipe, _PopenStdioNoPipe],
            _PopenStdios[_PopenBinaryMode, _PopenStdioNoPipe, _PopenStdioPipe[bytes], _PopenStdioNoPipe],
            _PopenStdios[_PopenBinaryMode, _PopenStdioPipe[bytes], _PopenStdioPipe[bytes], _PopenStdioNoPipe],
            _PopenStdios[_PopenBinaryMode, _PopenStdioNoPipe, _PopenStdioNoPipe, _PopenStdioPipe[bytes]],
            _PopenStdios[_PopenBinaryMode, _PopenStdioPipe[bytes], _PopenStdioNoPipe, _PopenStdioPipe[bytes]],
            _PopenStdios[_PopenBinaryMode, _PopenStdioNoPipe, _PopenStdioPipe[bytes], _PopenStdioPipe[bytes]],
            _PopenStdios[_PopenBinaryMode, _PopenStdioPipe[bytes], _PopenStdioPipe[bytes], _PopenStdioPipe[bytes]],
            # Text mode
            _PopenStdios[_PopenTextMode, _PopenStdioPipe[str], _PopenStdioNoPipe, _PopenStdioNoPipe],
            _PopenStdios[_PopenTextMode, _PopenStdioNoPipe, _PopenStdioPipe[str], _PopenStdioNoPipe],
            _PopenStdios[_PopenTextMode, _PopenStdioPipe[str], _PopenStdioPipe[str], _PopenStdioNoPipe],
            _PopenStdios[_PopenTextMode, _PopenStdioNoPipe, _PopenStdioNoPipe, _PopenStdioPipe[str]],
            _PopenStdios[_PopenTextMode, _PopenStdioPipe[str], _PopenStdioNoPipe, _PopenStdioPipe[str]],
            _PopenStdios[_PopenTextMode, _PopenStdioNoPipe, _PopenStdioPipe[str], _PopenStdioPipe[str]],
            _PopenStdios[_PopenTextMode, _PopenStdioPipe[str], _PopenStdioPipe[str], _PopenStdioPipe[str]],
        )
    ](
        self: Popen[
            StdioT.io_mode.string,
            StdioT.stdin.attr_type,
            StdioT.stdout.attr_type,
            StdioT.stderr.attr_type,
        ],
        args: _CMD,
        bufsize: int = -1,
        executable: StrOrBytesPath | None = None,
        stdin: StdioT.stdin.init_type = None,
        stdout: StdioT.stdout.init_type = None,
        stderr: StdioT.stderr.init_type = None,
        preexec_fn: Callable[[], Any] | None = None,
        close_fds: bool = True,
        shell: bool = False,
        cwd: StrOrBytesPath | None = None,
        env: _ENV | None = None,
        universal_newlines: StdioT.io_mode.universal_newlines = None,
        startupinfo: Any | None = None,
        creationflags: int = 0,
        restore_signals: bool = True,
        start_new_session: bool = False,
        pass_fds: Collection[int] = (),
        *,
        text: StdioT.io_mode.text = None,
        encoding: StdioT.io_mode.encoding = None,
        errors: StdioT.io_mode.errors = None,
        user: str | int | None = None,
        group: str | int | None = None,
        extra_groups: Iterable[str | int] | None = None,
        umask: int = -1,
        pipesize: int = -1,
        process_group: int | None = None,
    ) -> None: ...
While this doesn’t fully solve the exponential growth problem (since stdin etc. are dependent on the I/O mode), it does allow us to factor out text mode, resulting in 15 “explicit” overloads instead of 48.
(And if we allow type variables to be generic, we can fully eliminate the multiplicative factor here, and have each stdio parameter be generic over the I/O mode.)
We also don’t need to repeat the entire signature for each overload.
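As a sanity check on the counts, the sketch below enumerates the pipe/no-pipe combinations. It assumes the six existing mode overloads mentioned earlier and counts the constraint entries the same way the `StdioT` list does; the variable names are mine.

```python
from itertools import product

STREAMS = ("stdin", "stdout", "stderr")

# Each stream is either piped (True) or not (False): 2**3 = 8 combinations.
pipe_combinations = list(product((False, True), repeat=len(STREAMS)))

# Without factoring, every combination is repeated for each of the six
# existing mode overloads in subprocess.pyi.
existing_mode_overloads = 6
unfactored = len(pipe_combinations) * existing_mode_overloads

# With the I/O mode factored out: one shared "no pipes" entry, plus every
# combination with at least one pipe, once for bytes and once for str.
with_pipe = [combo for combo in pipe_combinations if any(combo)]
factored = 1 + len(with_pipe) * 2

print(unfactored, factored)
```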
Required changes to CPython
The only required language-level change would be adding a __getattr__ method to TypeVar that returns AssociatedTypeAlias (with appropriate __repr__ and its own __getattr__).
For backporting, a new AssociatedType special type can be added to typing_extensions, providing the missing __getattr__:
class Message[
    MessageT: (
        MessageType[Literal[DataType.UINT8], int],
        MessageType[Literal[DataType.UINT64], int],
        MessageType[Literal[DataType.STRING], str],
    )
]:
    data_type: AssociatedType[MessageT].data_type
    value: AssociatedType[MessageT].value_type
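At runtime such a backport could be quite small. The following is a rough sketch of one possible shape, not an actual typing_extensions implementation: `_AssociatedTypePath` is a hypothetical helper name, and the old-style `TypeVar` is used only so the sketch runs on versions without PEP 695 syntax.

```python
from typing import TypeVar


class _AssociatedTypePath:
    """Hypothetical runtime placeholder for AssociatedType[T].a.b ..."""

    def __init__(self, root, path=()):
        self._root = root
        self._path = path

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Private and dunder names are
        # rejected so that copy, pickle and introspection keep working.
        if name.startswith("_"):
            raise AttributeError(name)
        return _AssociatedTypePath(self._root, self._path + (name,))

    def __repr__(self):
        root = getattr(self._root, "__name__", repr(self._root))
        return ".".join((f"AssociatedType[{root}]",) + self._path)


class AssociatedType:
    """Sketch of the special form: subscripting starts an attribute path."""

    def __class_getitem__(cls, type_var):
        return _AssociatedTypePath(type_var)


MessageT = TypeVar("MessageT")


class Message:
    data_type: AssociatedType[MessageT].data_type
    value: AssociatedType[MessageT].value_type
```

All the runtime object has to do is exist and print sensibly in annotations; resolving the path to a real type would stay entirely on the type checker side.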