Union Type Broadcasting

zhangyx · November 7, 2024, 12:15am

This seems like a low-hanging fruit to me:

# Instead of this
x: list[str] | set[str]
# Why not this?
x: (list | set)[str]

i.e.

Union[A, B][T] = Union[A[T], B[T]]

The most obvious benefit is that you don't have to write str twice in the example above.
This advantage becomes more significant when the type argument is very long.

It also saves you from editing the same type argument in multiple places, which is error-prone.

mikeshardmind · November 7, 2024, 12:26am

There are a couple of potential wrinkles here. If we say it is just broadcasting, then most of those issues don't exist. If it is instead meant to be distributive and also re-expandable (which isn't something you've indicated), the potential issues are the possible variance mismatches, but I think we could iron those out (this already mostly works as expected with other ways of writing it).

I don’t see a strong reason against it if defined as broadcasting, but I’m not super inclined to support adding another way to do this right now unless there’s a benefit beyond a few characters saved.

Python 3.12+ type statements provide some relief from this repetition, as does typing.TypeAlias.

zhangyx · November 7, 2024, 12:46am

Pseudo python code demonstrating how it behaves:

class UnionType(...):
    def __getitem__(self, p: type | tuple[type, ...]):
        # Subscript every member of the union with the same argument(s).
        return Union[tuple(t[p] for t in get_args(self))]

If a member of the union does not accept p, just let it raise an error.

I'm not sure I understood the difference between broadcasting and distributing correctly. But if distributing looks something like this, then it is definitely not what I want:

(tuple[A] | tuple[B])[C] = tuple[A, C] | tuple[B, C]

That goes too far for this syntax to cover.

Edit: On second thought, it could support both broadcasting and distributing at the same time, and the behavior would still be deterministic.

Pseudo python code again:

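def extend(t, p):
    # Normalize the argument(s) to a tuple.
    if not isinstance(p, tuple):
        p = (p,)
    origin = get_origin(t)
    if origin is not None:
        # Already parameterized (e.g. tuple[A]): append p to its arguments.
        return origin[(*get_args(t), *p)]
    else:
        # Bare generic (e.g. list): plain broadcasting.
        return t[p]

class UnionType(...):
    def __getitem__(self, p: type | tuple[type, ...]):
        # Extend or broadcast into each member, then rebuild the union.
        return Union[tuple(extend(t, p) for t in get_args(self))]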

Can you point me to some threads on “variance mismatches”? I'm not familiar with the topic.

mikeshardmind · November 7, 2024, 1:24am

The potential variance mismatches shouldn’t be a real problem, but it’s something that anyone implementing this would need to be aware of.

The easy example here is

x: (list | tuple)[str]

If x is a list[str], the str argument is invariant, but if it's a tuple[str], it is covariant.

So for the full expression (list | tuple)[str], is str invariant or covariant? (Answer: invariant.) And, more difficult: does that pose a problem when narrowing, in the same way that Sequence | MutableSequence can? (Answer: currently unresolved in the type system.)
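To make the variance difference concrete, a small sketch (the function and variable names are illustrative) of how a static type checker treats the two members of `list[str] | tuple[str]` separately. Python does not enforce annotations at runtime, so both calls execute; the difference only shows up under a checker such as mypy or pyright:

```python
def count_list(x: list[object]) -> int:
    return len(x)


def count_tuple(x: tuple[object]) -> int:
    return len(x)


names_list: list[str] = ["a"]
names_tuple: tuple[str] = ("a",)

# Accepted: tuple is covariant, so tuple[str] is a subtype of tuple[object].
count_tuple(names_tuple)

# Flagged by static checkers: list is invariant, so list[str] is not a
# subtype of list[object]. At runtime the call still succeeds.
count_list(names_list)
```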

That said, I think you can sidestep all of those questions, and this isn't really needed:

# 3.12+
from typing import reveal_type
type X[T] = list[T] | tuple[T]  # reduce repetition

def ex(foo: X[str]):
    reveal_type(foo)  # (pyright): Type of "foo" is "list[str] | tuple[str]"

# you can also do these
type Y = X[str]
type Z = list[str] | tuple[str]

Jelle · November 7, 2024, 1:41am

To me this just doesn't feel worth the extra complexity. Your proposal can make some code slightly shorter, but it doesn't really become clearer to readers. There is a cost in making the runtime and type checkers more complex, and in making readers of typed code learn this new syntax.

As Michael noted, you can use type aliases to shorten code that repeats some pattern.