`TypeMap`: Generalizing `overload` semantics

The unsafety is already there. The fact that overlapping overloads cannot express the way that existing libraries work is a bug in the type system. Those libraries are not going to change their handling of these types:

>>> import numpy as np
>>> np.array([True])
array([ True])
>>> np.array([1])
array([1])
>>> np.array([1.0])
array([1.])
>>> np.array([1.0j])
array([0.+1.j])

Those things are just facts that Python’s type system needs to accommodate.

I think it is better if TypeMap can express an exact mapping between types as the dict syntax it uses suggests. It can just be a map A -> B, C -> D, ... and I think my_type_map[A | C] -> my_type_map[A] | my_type_map[C] is really the only symbolic manipulation that is needed.

In practice I would expect to make a really big TypeMap that includes many seemingly redundant subtypes and other things so that it isn’t necessary for the type checker to do much more than map the types directly.

1 Like

I understand, and I agree that this is an issue with the type system. Where I disagree is that this is the place to fix it. I think there are a few possible solutions, but they should be implemented for both overload and TypeMap.

For numpy, I think the better solution would be to introduce ExactInt, ExactFloat, and ExactComplex NewTypes. That would solve the bigger issue that bool/int/float/complex overlap in the first place:

# Inferred as the type of int literals, the return type of `int(...)`,
# and the result of integer math operations between two `ExactInt`s.
# In addition, `type(x) is int` narrows `x` to `ExactInt`.
ExactInt = NewType('ExactInt', int)

# Inferred as the type of float literals, the return type of `float(...)`,
# and the result of math operations that return `float`.
# In addition, `type(x) is float` narrows `x` to `ExactFloat`.
ExactFloat = NewType('ExactFloat', float)

@overload
def to_scalar(x: bool) -> np.bool_: ...
@overload
def to_scalar(x: ExactInt) -> np.int_: ...
@overload
def to_scalar(x: ExactFloat) -> np.float64: ...
@overload
def to_scalar(x: float) -> np.int_ | np.float64 | np.bool_: ...

That’s the problem though, “exact” doesn’t really exist in the type system.
TypeMap is most useful when used with TypeVars, whose types are usually inferred (and in the case of functions, always inferred). So we’re usually not talking about C[float](), but rather f(x), where x is inferred to be some value assignable to float.

The exceptions to this are @final, Literal, and NewType—which is why I’d prefer something like the above example, instead of trying to define what the “exact type” of some arbitrary type-checker-internal type should be.

2 Likes

This would force numpy users to wrap all of their builtin scalars in these NewTypes, which isn’t an option.

There’s another workaround for this; optype.Just (docs), which effectively behaves as an invariant wrapper, but in practice it’s only useful for input positions in stubs. In scipy-stubs I heavily rely on this. But because it relies on unspecified (by the typing spec) behavior, using this trick in NumPy would be a bit too risky for my taste. Because if for some reason a major type-checker decides to change how it treats it, all numpy users would be affected, we won’t be able to publish a patch release as quickly as for scipy-stubs, and users wouldn’t be able to opt-out of these bundles stubs, like they could with scipy-stubs.

Why not? TypeMap generalized @overload, so why not make it even more powerful while we’re at it? It’d still be backwards compatible as far as I can tell, just not forwads-compatible, which is of course no problem at all.
Also, I don’t consider this to be a “fix”, but rather as the ideal place to add a feature that many people have been asking for for a long time now.
If you can kill two birds with two stone, why only kill one? At the very least let’s not repeat the same mistake.

This won’t work for most people with this problem, for the same reasons that negation won’t work for it (and more…). These types need to be accurately represented, rather than the lie of int/float/complex that exists for typing. There’s been significant pushback to this in the past, but the arguments for why an Exact form or negation don’t work for this use case already have been explored in depth. The eaxct-ness requires re-proving (with a runtime check) at every api boundary that doesn’t switch to using exact in it’s return types itself, which requires more work than just changing instances of float that are meant to accept int to SupportsFloat (which does not require a runtime check to re-prove)

Sorry if I wasn’t clear, I think these should be added to typing and handled specially by type checkers. Specifically, many instances of int/float can be inferred to be their Exact counterparts (literals, result of int(…)/float(…), mathematical operations on Exact types, etc.). And when you do need to coerce an int to ExactInt, it’s just int(x).

Because it requires defining a new set of semantics, and one that is not actually type sound.

True, the overload semantics aren’t entirely type sound either. But adding another, arguably less sound system, seems like a move in the wrong direction.

And given that the semantics of overlapping overloads are currently undefined, I don’t think tying TypeMap to overload prevents us from improving the situation in the future.

On the other hand, there’s a reason overlap semantics haven’t been added to the spec. I really don’t want a potential PEP to get snagged on trying to solve this issue.

But it’s not just a feature, it’s taking a stance on a fairly hairy typing issue. Specifically, a stance that places ease-of-use over type safety. I’m not saying it’s definitely the wrong stance, but I do think that it’s a stance that deserves its own PEP.

1 Like

There would be some time where APIs would need to be updated, but I think it’s fair to assume that any API that doesn’t feel the need to commit to Exact return types, shouldn’t be trusted to not mix ints with floats. And in that case, wrapping your values with float is a reasonable safeguard anyway.

I agree that the current typing semantics for int/float/complex were a big mistake, but I don’t see that changing anytime soon. (It is a great example, though, of choosing convenience over correctness causing a big problem later on…)

I think you are right that it deserves its own PEP and it is reasonable that you don’t want to approach that issue here. What I am not sure about though is if this proposal to match @overload could make that problem harder to resolve in future or might end up making TypeMap itself less useful than it could otherwise be.

It should probably be considered off-topic here so I won’t go into it any further than this but I will just say that ExactFloat doesn’t solve anything. The only reason it might seem better to have ExactFloat rather than just float-means-float is that it sounds like a non-breaking change when you think about bumping your type checker or Python version. ExactFloat isn’t actually better for compatibility though because it isn’t compatible with float so when a library decides to use it that breaks all of their users who are using float in their own annotations. The end result would be that everyone has to use ExactFloat but the breakage would come from newer versions of dependent libraries and stubs (having contradictory annotations during the transition) rather than coming in a newer version of Python or of the type checkers.

1 Like

From what I can tell the use cases under discussion here are currently unspecified behavior in the overload spec, so I don’t think there’s really anything preventing future proposals from resolving this.

On the other hand, I understand the hesitation around permanently tying TypeMap to possibly undesired semantics, when it could possibly be used to introduce alternative semantics in the future.

As a compromise, I suggest the following:

  • We propose a new OverloadMap type, using the overload-style syntax I proposed above:
    class DataTypeMap(OverloadMap):
      def _(_: Literal[DataType.UINT8, DataType.UINT64]) -> int: ...
      def _(_: Literal[DataType.STRING]) -> str: ...
    
  • We record the concerns brought up here as part of the PEP, including the rejection of the TypeMap name, in order to preserve it for more dict-like semantics in the future.

I think this OverloadMap is different enough from the dict-based TypeMap proposal as to leave room for an alternative system in the future.

At the same time, the closer this proposal sticks to existing semantics, the more likely it is to progress.

I like the idea and this would indeed be a very useful feature.

Coming from a Typescript background, adding an “index”-access to TypedDict feels a bit more natural and versatile than introducing a completely new type map. See Documentation - Indexed Access Types for the TS documentation.

The example above could then look like

class DataType(IntEnum):
    UINT8 = 0
    UINT64 = 1
    STRING = 2

class DataTypeMap(TypedDict):
    UINT8: int
    UINT64: int
    STRING: str

class EnumTypeMap(TypedDict):
    UINT8: DataType.UINT8
    UINT64: DataType.UINT64
    STRING: DataType.STRING

DataTypeT = KeysOf[DataTypeMap] # i.e "UINT8" | "UINT64" | "STRING"

class Message(Generic[DataTypeT]):
    data_type: EnumTypeMap[DataTypeT]
    value: DataTypeMap[DataTypeT]

This is example is a bit more verbose, since one cannot use enums as keys in a TypedDict, so one has to take the detour via string-based keys and EnumTypeMap. But indexed access in TypedDicts would enable more use cases, which are not maps from types to types.

1 Like

Would this idea of type mapping help expressing the following better?

type AsyncFunc[**P, T] = Callable[P, Awaitable[T]]
# fmt: off
@overload
async def gather[T1, T2](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], /) -> tuple[T1, T2]: ...

@overload
async def gather[T1, T2, T3](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], c3: AsyncFunc[[], T3], /) -> tuple[T1, T2, T3]: ...

@overload
async def gather[T1, T2, T3, T4](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], c3: AsyncFunc[[], T3], c4: AsyncFunc[[], T4], /) -> tuple[T1, T2, T3, T4]: ...

@overload
async def gather[T1, T2, T3, T4, Tn](c1: AsyncFunc[[], T1], c2: AsyncFunc[[], T2], c3: AsyncFunc[[], T3], c4: AsyncFunc[[], T4], /, *cs: AsyncFunc[[], Tn]) -> tuple[T1, T2, T3, T4, *tuple[Tn, ...]]: ...
# fmt: on

async def gather[T](*coros: AsyncFunc[[], T]) -> tuple[T, ...]:  # pyright: ignore[reportInconsistentOverload]
    out: list[T] = [type_cast("Any", None)] * len(coros)

    async def worker(index: int, coro: AsyncFunc[[], T]) -> None:
        out[index] = await coro()

    async with anyio.create_task_group() as tg:
        for i, coro in enumerate(coros):
            tg.start_soon(worker, i, coro)

    return tuple(out)

I was going to reply “probably not.” But you got me thinking, and I realized, recursive OverloadMaps open the door to some C++ TMP-style shenanigans:


type AsyncFunc[T] = Callable[[], Awaitable[T]]

# Functional-programming style list,
# to work around the one TypeVarTuple rule
type TypeList[H, T: TypeList] = tuple[H, T] | tuple[()]


class CoroReturn(OverloadMap):
  def _[T](_: AsyncFunc[T]) -> T: ...
  # Required because bounds on TypeVarTuple are not supported yet
  def _(_: object) -> object: ...


class MapCoroReturn(OverloadMap):
  # Initial
  def _[*Ts](
    _: tuple[*Ts],
  ) -> MapCoroReturn[
    tuple[()],
    tuple[*Ts],
  ]: ...
  
  # Intermediate
  def _[
    ProcessedT: TypeList,
    HeadT, 
    *TailT,
  ](
    processed: ProcessedT,
    q: tuple[HeadT, *TailT],
  ) -> MapCoroReturn[
    tuple[CoroReturn[HeadT], ProcessedT],
    tuple[*TailT],
  ]: ...
  
  # Terminal
  def _[
    ProcessedT: TypeList,
  ](
    processed: ProcessedT,
    q: tuple[()],
  ) -> FlattenTypeList[ProcessedT]: ...


class FlattenTypeList(OverloadMap):
  # Initial
  def _[
    TListT: TypeList,
  ](
    tlist: TListT,
  ) -> FlattenTypeList[
    TListT,
    tuple[()],
  ]: ...
  
  # Intermediate
  def _[
    HeadT,
    TailT: TypeList,
    *ProcessedT,
  ](
    tlist: tuple[HeadT, TailT],
    processed: tuple[*ProcessedT]
  ) -> FlattenTypeList[
    TailT,
    tuple[HeadT, *ProcessedT],
  ]: ...
  
  # Terminal
  def _[
    *ProcessedT,
  ](
    tlist: tuple[()],
    processed: tuple[*ProcessedT]
  ) -> tuple[*ProcessedT]: ...


def gather[*Ts](*cs: *Ts) -> MapCoroReturn[tuple[*Ts]]: ...

There are definitely some errors here, and technically this accepts any argument types (with anything that’s not an AsyncFunc becoming just object), but this could allow for some very interesting complex type transformations.

I’ve gone ahead and written up a draft PEP (rendered, source).

Looking forward to feedback!

Also, if anyone has any tips for next steps (especially getting a sponsor), I’d love some advice :slightly_smiling_face:.

2 Likes

I managed to implement an extremely crude POC of this PEP in Pyright.
There are basically no diagnostics for incorrect usage and no edge case handling, but both basic and recursive OverloadedTypes should work.

Example:

from collections.abc import Callable
from typing import OverloadedType, reveal_type


class AppendToTuple(OverloadedType):
    def _[*Ts, U](tup: tuple[*Ts], new: U) -> tuple[*Ts, U]: ...


class MapCallable(OverloadedType):
    """Transforms `tuple[T0, T1, ..., Tn]` into `tuple[Callable[[], T0], Callable[[], T1], ..., Callable[[], Tn]]`."""

    # Initial
    def _[*Ts](ts: tuple[*Ts]) -> MapCallable[tuple[*Ts], tuple[()]]: ...

    # Terminal
    def _[OutT](ts: tuple[()], out: OutT) -> OutT: ...

    # Intermediate
    def _[HeadT, *Ts, OutT: tuple](ts: tuple[HeadT, *Ts], out: OutT) -> MapCallable[
        tuple[*Ts],
        AppendToTuple[OutT, Callable[[], HeadT]],
    ]: ...


def make_callables[*Ts](*args: *Ts) -> MapCallable[tuple[*Ts]]:
    return tuple((lambda: arg) for arg in args)

reveal_type(make_callables(1, 'a', False, (1.0, b'')))

Pyright output:

...
information: Type of "make_callables(1, 'a', False, (1, b''))" is "tuple[() -> int, () -> str, () -> bool, () -> tuple[float, Literal[b""]]]"
...

@jorenham What do you think of the current proposal? I imagine the recursive pattern could help with annotating some of numpy’s variadic functions.

Sorry it took me a while to get to, but I’ve had a look at the PEP and I think it looks pretty good :slight_smile: There are a couple of suggestions that came to mind:

  • You say that this generalizes @overload, but did not show how it does this exactly. For that you could describe the algorithm that coverts an overloaded callable into a non-overloaded one that uses OverloadedType
  • It’s could be worth to describe that this proposal also generalizes TypeVars with constraints.
  • The “overload explosion” has a lot of parameters that don’t directly motivate the problem, that make it difficult to read. A simple function with three boolean flags that influence the output type in a different way should be enough here. In scipy-stubs there are many real-world examples of this that are a bit more compact (in the no. of params), for example logsumexp.
  • In the specification you refer to the overload spec. But as I tried to explain before, the overload spec is not complete. It for example doesn’t specify when exactly two overloads are overlapping. There’s also an issue in the spec that causes overloads to not (always) distribute over union types. So if you have a valid overloaded function like A -> X, B -> Y and A | B -> X | Y | Z, then if the input type is A | B, the return type will be X | Y | Z. If it would instead distribute over unions, then the last overload would never be reached, and the return type would instead be X | Y. This probably cannot be fixed anymore at this point, as that would be backwards incompatible.
    So are you sure you want to build upon the incomplete and sometimes ambiguous overload spec? I think that starting from scratch here would be easier than you might think (especially if you define it in terms of intersection types).
  • What are the valid use locations? The specification talks about generating an overloaded function definition, but if you use it in e.g. a type alias, then that wouldn’t be valid, even though that seems like an important use-case to me. And how about as bounds to generic type parameters, e.g. f[T: MyOverloadedType[str]](x: T)?
  • Your examples don’t work at runtime because they use the same method names. So runtime type-checkers would not be able to use it like this.
  • I’m not a fan of the class-based API because it requires metaclass voodoo to implement. Instead I strongly prefer something that’s more like a TypeAliasType. But not with a dict, but rather a tuple of (*types_in, type_out) pairs. For example Overloaded("Spam", (((str, T), A[T]), ((bytes, T), B[T])), type_params=(T, )). Using a tuple instead of a avoids having to hash the types, which in rare cases might lead to issues (unhashable types can be defined).
  • I’ll probably use it all over the place, so I prefer a more concise name such as TypeMap or Overloaded if you decide to stick with the overload-based definition.
2 Likes

I like the idea, though the syntax seems somewhat overwhelming at the moment.

However, IMHO the statement

There is currently no way to express to type checkers the relationship between data_type and value

(first motivating example) isn’t fully correct if we employ @Tinche’s ideas on Algebraic Data Types. Then, Message is a sum type.

First, let’s define the sum type’s parts, that express the relationship between data_type and value:

class IntMessage:
    data_type: Literal[DataType.UINT8, DataType.UINT64]
    value: int

class StrMessage:
    data_type: Literal[DataType.STRING]
    value: str

Then, let’s change Message to use this. Unfortunately, that needs an extra attribute level (msg) as we cannot make Message a subtype of IntMessage | StrMessage.

class Message:
    msg: IntMessage | StrMessage

    def encode(self) -> bytes:
        if self.msg.data_type is DataType.UINT8:
            # Type checker okay
            encoded_value = self.msg.value.to_bytes(1, 'little')
        elif self.msg.data_type is DataType.UINT64:
            # Type checker okay
            encoded_value = self.msg.value.to_bytes(8, 'little')
        elif self.msg.data_type is DataType.STRING:
            # Type checker okay
            encoded_value = self.msg.value.encode('utf-8')
        else:
            assert_never(self.msg.data_type)
        return self.msg.data_type.to_bytes(1) + encoded_value

This type checks okay on MyPy (1.19.1).

Having said that, I think OverloadedType still can bring value. However, for the PEP I suggest:

  • rephrasing the impossibility to relate data_type to value with current typing means
  • explaining the relationship to sum types in the PEP

For another real-world example where mapping types is only solvable with tons of overloads and a lot of additional book-keeping that could be forgotten. See setuptools commands: Add commands overloads from typeshed by Avasam · Pull Request #5020 · pypa/setuptools · GitHub

2 Likes