`TypeMap`: Generalizing `overload` semantics

The unsafety is already there. The fact that overlapping overloads cannot express the way that existing libraries work is a bug in the type system. Those libraries are not going to change their handling of these types:

>>> import numpy as np
>>> np.array([True])
array([ True])
>>> np.array([1])
array([1])
>>> np.array([1.0])
array([1.])
>>> np.array([1.0j])
array([0.+1.j])

Those things are just facts that Python’s type system needs to accommodate.

I think it is better if TypeMap can express an exact mapping between types, as the dict syntax it uses suggests. It can just be a map A -> B, C -> D, ..., and I think my_type_map[A | C] -> my_type_map[A] | my_type_map[C] is really the only symbolic manipulation that is needed.

In practice I would expect to make a really big TypeMap that includes many seemingly redundant subtypes and other things so that it isn’t necessary for the type checker to do much more than map the types directly.
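A runtime analogue of that dict-style reading (with hypothetical stand-in classes for the NumPy scalar types) shows how little machinery the checker would need:

```python
# Stand-ins for np.bool_, np.int_, np.float64 (hypothetical names).
class NPBool: ...
class NPInt: ...
class NPFloat: ...

# A TypeMap read as a literal dict: A -> B, C -> D, ...
SCALAR_MAP: dict[type, type] = {bool: NPBool, int: NPInt, float: NPFloat}

def resolve(types: set[type]) -> set[type]:
    # my_type_map[A | C] distributes to my_type_map[A] | my_type_map[C]:
    # a union of keys simply maps to the union of the mapped values.
    return {SCALAR_MAP[t] for t in types}
```

This is only a sketch of the lookup-and-distribute rule, not of any proposed runtime API.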


I understand, and I agree that this is an issue with the type system. Where I disagree is that this is the place to fix it. I think there are a few possible solutions, but they should be implemented for both overload and TypeMap.

For numpy, I think the better solution would be to introduce ExactInt, ExactFloat, and ExactComplex NewTypes. That would solve the bigger issue that bool/int/float/complex overlap in the first place:

# Inferred as the type of int literals, the return type of `int(...)`,
# and the result of integer math operations between two `ExactInt`s.
# In addition, `type(x) is int` narrows `x` to `ExactInt`.
ExactInt = NewType('ExactInt', int)

# Inferred as the type of float literals, the return type of `float(...)`,
# and the result of math operations that return `float`.
# In addition, `type(x) is float` narrows `x` to `ExactFloat`.
ExactFloat = NewType('ExactFloat', float)

@overload
def to_scalar(x: bool) -> np.bool_: ...
@overload
def to_scalar(x: ExactInt) -> np.int_: ...
@overload
def to_scalar(x: ExactFloat) -> np.float64: ...
@overload
def to_scalar(x: float) -> np.int_ | np.float64 | np.bool_: ...
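The narrowing described in those comments rests on type identity rather than isinstance. A minimal runtime illustration (exact_kind is a hypothetical helper, not part of any proposal):

```python
def exact_kind(x: object) -> str:
    # `type(x) is int` is the runtime "exact" check the NewTypes would
    # piggyback on: unlike isinstance(), it rejects subclasses such as bool.
    if type(x) is bool:
        return "bool"
    if type(x) is int:
        return "ExactInt"
    if type(x) is float:
        return "ExactFloat"
    return "other"
```

Note that isinstance(True, int) is True, while type(True) is int is False, which is exactly the distinction ExactInt needs.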

That’s the problem, though: “exact” doesn’t really exist in the type system.
TypeMap is most useful when used with TypeVars, whose types are usually inferred (and in the case of functions, always inferred). So we’re usually not talking about C[float](), but rather f(x), where x is inferred to be some value assignable to float.
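A minimal illustration of that inference (f is a hypothetical function; the TypeVar syntax is used for runtime compatibility):

```python
from typing import TypeVar

T = TypeVar("T", bound=float)

def f(x: T) -> T:
    # T is solved from the argument at the call site, not written by
    # the caller: passing a bool infers T = bool, not T = float.
    return x

y = f(True)  # a checker infers T as bool (or Literal[True]), not float
```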

The exceptions to this are @final, Literal, and NewType—which is why I’d prefer something like the above example, instead of trying to define what the “exact type” of some arbitrary type-checker-internal type should be.


This would force numpy users to wrap all of their builtin scalars in these NewTypes, which isn’t an option.

There’s another workaround for this: optype.Just (docs), which effectively behaves as an invariant wrapper, but in practice it’s only useful for input positions in stubs. In scipy-stubs I rely on this heavily. But because it relies on behavior left unspecified by the typing spec, using this trick in NumPy would be a bit too risky for my taste: if a major type checker decided to change how it treats it, all numpy users would be affected, we wouldn’t be able to publish a patch release as quickly as for scipy-stubs, and users wouldn’t be able to opt out of these bundled stubs, like they can with scipy-stubs.
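For readers unfamiliar with the trick: an invariant wrapper like this can be emulated with a protocol whose __class__ is declared to be exactly type[T]. This is a sketch of the general technique, not necessarily optype's actual code:

```python
from typing import Protocol, TypeVar

T = TypeVar("T")

class Just(Protocol[T]):
    # Declaring __class__ as exactly type[T] means a subclass instance no
    # longer matches, so Just[float] accepts float but not int or bool.
    # (General trick only; optype's real definition may differ.)
    @property
    def __class__(self) -> type[T]: ...
    @__class__.setter
    def __class__(self, t: type[T], /) -> None: ...
```

It is precisely this reliance on how checkers treat a synthesized __class__ that makes the behavior "unspecified" in the sense described above.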

Why not? TypeMap generalizes @overload, so why not make it even more powerful while we’re at it? It’d still be backwards compatible as far as I can tell, just not forwards compatible, which is of course no problem at all.
Also, I don’t consider this a “fix”, but rather the ideal place to add a feature that many people have been asking for for a long time now.
If you can kill two birds with one stone, why only kill one? At the very least, let’s not repeat the same mistake.

This won’t work for most people with this problem, for the same reasons that negation won’t work for it (and more…). These types need to be accurately represented, rather than the lie of int/float/complex that exists for typing. There’s been significant pushback to this in the past, but the arguments for why an Exact form or negation don’t work for this use case have already been explored in depth. The exact-ness requires re-proving (with a runtime check) at every API boundary that doesn’t itself switch to using Exact in its return types, which is more work than just changing instances of float that are meant to accept int to SupportsFloat (which does not require a runtime check to re-prove).

Sorry if I wasn’t clear, I think these should be added to typing and handled specially by type checkers. Specifically, many instances of int/float can be inferred to be their Exact counterparts (literals, result of int(…)/float(…), mathematical operations on Exact types, etc.). And when you do need to coerce an int to ExactInt, it’s just int(x).
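Worth noting that at runtime this coercion is already exact (MyInt below is a throwaway subclass for illustration):

```python
# The builtin constructors already return the exact builtin type,
# even when given a subclass instance.
class MyInt(int): ...

assert type(int(True)) is int       # bool in, exact int out
assert type(int(MyInt(3))) is int   # int subclass in, exact int out
assert type(float(True)) is float   # bool in, exact float out
```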

Because it requires defining a new set of semantics, and one that is not actually type sound.

True, the overload semantics aren’t entirely type sound either. But adding another, arguably less sound system, seems like a move in the wrong direction.

And given that the semantics of overlapping overloads are currently undefined, I don’t think tying TypeMap to overload prevents us from improving the situation in the future.

On the other hand, there’s a reason overlap semantics haven’t been added to the spec. I really don’t want a potential PEP to get snagged on trying to solve this issue.

But it’s not just a feature, it’s taking a stance on a fairly hairy typing issue. Specifically, a stance that places ease-of-use over type safety. I’m not saying it’s definitely the wrong stance, but I do think that it’s a stance that deserves its own PEP.


There would be some time where APIs would need to be updated, but I think it’s fair to assume that any API that doesn’t feel the need to commit to Exact return types shouldn’t be trusted not to mix ints with floats. And in that case, wrapping your values with float is a reasonable safeguard anyway.

I agree that the current typing semantics for int/float/complex were a big mistake, but I don’t see that changing anytime soon. (It is a great example, though, of choosing convenience over correctness causing a big problem later on…)

I think you are right that it deserves its own PEP and it is reasonable that you don’t want to approach that issue here. What I am not sure about though is if this proposal to match @overload could make that problem harder to resolve in future or might end up making TypeMap itself less useful than it could otherwise be.

It should probably be considered off-topic here so I won’t go into it any further than this but I will just say that ExactFloat doesn’t solve anything. The only reason it might seem better to have ExactFloat rather than just float-means-float is that it sounds like a non-breaking change when you think about bumping your type checker or Python version. ExactFloat isn’t actually better for compatibility though because it isn’t compatible with float so when a library decides to use it that breaks all of their users who are using float in their own annotations. The end result would be that everyone has to use ExactFloat but the breakage would come from newer versions of dependent libraries and stubs (having contradictory annotations during the transition) rather than coming in a newer version of Python or of the type checkers.


From what I can tell the use cases under discussion here are currently unspecified behavior in the overload spec, so I don’t think there’s really anything preventing future proposals from resolving this.

On the other hand, I understand the hesitation around permanently tying TypeMap to possibly undesired semantics, when it could possibly be used to introduce alternative semantics in the future.

As a compromise, I suggest the following:

  • We propose a new OverloadMap type, using the overload-style syntax I proposed above:
    class DataTypeMap(OverloadMap):
      def _(_: Literal[DataType.UINT8, DataType.UINT64]) -> int: ...
      def _(_: Literal[DataType.STRING]) -> str: ...
    
  • We record the concerns brought up here as part of the PEP, including the rejection of the TypeMap name, in order to preserve it for more dict-like semantics in the future.

I think this OverloadMap is different enough from the dict-based TypeMap proposal as to leave room for an alternative system in the future.

At the same time, the closer this proposal sticks to existing semantics, the more likely it is to progress.

I like the idea and this would indeed be a very useful feature.

Coming from a Typescript background, adding an “index”-access to TypedDict feels a bit more natural and versatile than introducing a completely new type map. See Documentation - Indexed Access Types for the TS documentation.

The example above could then look like

class DataType(IntEnum):
    UINT8 = 0
    UINT64 = 1
    STRING = 2

class DataTypeMap(TypedDict):
    UINT8: int
    UINT64: int
    STRING: str

class EnumTypeMap(TypedDict):
    UINT8: Literal[DataType.UINT8]
    UINT64: Literal[DataType.UINT64]
    STRING: Literal[DataType.STRING]

DataTypeT = KeysOf[DataTypeMap] # i.e "UINT8" | "UINT64" | "STRING"

class Message(Generic[DataTypeT]):
    data_type: EnumTypeMap[DataTypeT]
    value: DataTypeMap[DataTypeT]

This example is a bit more verbose, since one cannot use enums as keys in a TypedDict, so one has to take the detour via string-based keys and EnumTypeMap. But indexed access in TypedDicts would enable more use cases, which are not maps from types to types.
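Incidentally, the runtime introspection API already exposes both halves of this proposal, which suggests the static feature is well-defined:

```python
from typing import TypedDict, get_type_hints

class DataTypeMap(TypedDict):
    UINT8: int
    UINT64: int
    STRING: str

# Runtime analogue of the proposed indexed access and KeysOf:
hints = get_type_hints(DataTypeMap)
assert hints["UINT8"] is int                        # DataTypeMap["UINT8"]
assert set(hints) == {"UINT8", "UINT64", "STRING"}  # KeysOf[DataTypeMap]
```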


Would this idea of type mapping help express the following better?

from collections.abc import Awaitable, Callable
from typing import cast as type_cast, overload

import anyio

type AsyncFunc[**P, T] = Callable[P, Awaitable[T]]
# fmt: off
@overload
async def gather[T1, T2](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], /) -> tuple[T1, T2]: ...

@overload
async def gather[T1, T2, T3](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], c3: AsyncFunc[[], T3], /) -> tuple[T1, T2, T3]: ...

@overload
async def gather[T1, T2, T3, T4](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], c3: AsyncFunc[[], T3], c4: AsyncFunc[[], T4], /) -> tuple[T1, T2, T3, T4]: ...

@overload
async def gather[T1, T2, T3, T4, Tn](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], c3: AsyncFunc[[], T3], c4: AsyncFunc[[], T4], /, *cs: AsyncFunc[[], Tn]) -> tuple[T1, T2, T3, T4, *tuple[Tn, ...]]: ...
# fmt: on

async def gather[T](*coros: AsyncFunc[[], T]) -> tuple[T, ...]:  # pyright: ignore[reportInconsistentOverload]
    out: list[T] = [type_cast("Any", None)] * len(coros)

    async def worker(index: int, coro: AsyncFunc[[], T]) -> None:
        out[index] = await coro()

    async with anyio.create_task_group() as tg:
        for i, coro in enumerate(coros):
            tg.start_soon(worker, i, coro)

    return tuple(out)

I was going to reply “probably not”, but you got me thinking, and I realized that recursive OverloadMaps open the door to some C++ template-metaprogramming-style shenanigans:


type AsyncFunc[T] = Callable[[], Awaitable[T]]

# Functional-programming style list,
# to work around the one TypeVarTuple rule
type TypeList[H, T: TypeList] = tuple[H, T] | tuple[()]


class CoroReturn(OverloadMap):
  def _[T](_: AsyncFunc[T]) -> T: ...
  # Required because bounds on TypeVarTuple are not supported yet
  def _(_: object) -> object: ...


class MapCoroReturn(OverloadMap):
  # Initial
  def _[*Ts](
    _: tuple[*Ts],
  ) -> MapCoroReturn[
    tuple[()],
    tuple[*Ts],
  ]: ...
  
  # Intermediate
  def _[
    ProcessedT: TypeList,
    HeadT, 
    *TailT,
  ](
    processed: ProcessedT,
    q: tuple[HeadT, *TailT],
  ) -> MapCoroReturn[
    tuple[CoroReturn[HeadT], ProcessedT],
    tuple[*TailT],
  ]: ...
  
  # Terminal
  def _[
    ProcessedT: TypeList,
  ](
    processed: ProcessedT,
    q: tuple[()],
  ) -> FlattenTypeList[ProcessedT]: ...


class FlattenTypeList(OverloadMap):
  # Initial
  def _[
    TListT: TypeList,
  ](
    tlist: TListT,
  ) -> FlattenTypeList[
    TListT,
    tuple[()],
  ]: ...
  
  # Intermediate
  def _[
    HeadT,
    TailT: TypeList,
    *ProcessedT,
  ](
    tlist: tuple[HeadT, TailT],
    processed: tuple[*ProcessedT]
  ) -> FlattenTypeList[
    TailT,
    tuple[HeadT, *ProcessedT],
  ]: ...
  
  # Terminal
  def _[
    *ProcessedT,
  ](
    tlist: tuple[()],
    processed: tuple[*ProcessedT]
  ) -> tuple[*ProcessedT]: ...


def gather[*Ts](*cs: *Ts) -> MapCoroReturn[tuple[*Ts]]: ...

There are definitely some errors here, and technically this accepts any argument types (with anything that’s not an AsyncFunc becoming just object), but this could allow for some very interesting complex type transformations.
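The value-level version of the same fold makes the Initial/Intermediate/Terminal structure easier to follow. This is a runtime analogue of the TypeList encoding and FlattenTypeList above (hypothetical helper names), not part of the proposal:

```python
def to_typelist(items: tuple) -> tuple:
    # Build a cons-list of (head, tail) pairs terminated by ():
    # ("a", "b") -> ("a", ("b", ())), mirroring the TypeList alias.
    out: tuple = ()
    for item in reversed(items):
        out = (item, out)
    return out

def flatten(tlist: tuple, processed: tuple = ()) -> tuple:
    # Mirrors FlattenTypeList's Terminal and Intermediate cases:
    # each step moves the head onto `processed`, reversing the list.
    if tlist == ():
        return processed
    head, tail = tlist
    return flatten(tail, (head, *processed))
```

Because MapCoroReturn also builds its cons-list by prepending, the reversal in flatten restores the original argument order, just as the type-level version does.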

I’ve gone ahead and written up a draft PEP (rendered, source).

Looking forward to feedback!

Also, if anyone has any tips for next steps (especially getting a sponsor), I’d love some advice 🙂.


I managed to implement an extremely crude POC of this PEP in Pyright.
There are basically no diagnostics for incorrect usage and no edge case handling, but both basic and recursive OverloadedTypes should work.

Example:

from collections.abc import Callable
from typing import OverloadedType, reveal_type


class AppendToTuple(OverloadedType):
    def _[*Ts, U](tup: tuple[*Ts], new: U) -> tuple[*Ts, U]: ...


class MapCallable(OverloadedType):
    """Transforms `tuple[T0, T1, ..., Tn]` into `tuple[Callable[[], T0], Callable[[], T1], ..., Callable[[], Tn]]`."""

    # Initial
    def _[*Ts](ts: tuple[*Ts]) -> MapCallable[tuple[*Ts], tuple[()]]: ...

    # Terminal
    def _[OutT](ts: tuple[()], out: OutT) -> OutT: ...

    # Intermediate
    def _[HeadT, *Ts, OutT: tuple](ts: tuple[HeadT, *Ts], out: OutT) -> MapCallable[
        tuple[*Ts],
        AppendToTuple[OutT, Callable[[], HeadT]],
    ]: ...


def make_callables[*Ts](*args: *Ts) -> MapCallable[tuple[*Ts]]:
    # Bind each arg as a default to avoid the late-binding closure pitfall.
    return tuple((lambda arg=arg: arg) for arg in args)

reveal_type(make_callables(1, 'a', False, (1.0, b'')))

Pyright output:

...
information: Type of "make_callables(1, 'a', False, (1.0, b''))" is "tuple[() -> int, () -> str, () -> bool, () -> tuple[float, Literal[b""]]]"
...
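Incidentally, any runtime implementation of make_callables has to dodge Python's closure late-binding pitfall; a quick demonstration:

```python
# All lambdas in the genexp close over the same loop variable,
# so after exhaustion every call returns the final value.
broken = tuple((lambda: arg) for arg in (1, 2, 3))
results_broken = [f() for f in broken]   # [3, 3, 3]

# Binding each value eagerly, e.g. via a default argument, fixes it.
fixed = tuple((lambda arg=arg: arg) for arg in (1, 2, 3))
results_fixed = [f() for f in fixed]     # [1, 2, 3]
```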

@jorenham What do you think of the current proposal? I imagine the recursive pattern could help with annotating some of numpy’s variadic functions.