Union Type Broadcasting

Moved from why not…?

This seems like a low-hanging fruit to me:

# Instead of this
x: list[str] | set[str]
# Why not this?
x: (list | set)[str]

i.e.

Union[A, B][T] = Union[A[T], B[T]]

The most obvious benefit is that you don’t have to write str twice in the example above.
This advantage becomes more significant when the type argument is very long.

It also saves you from having to modify the same type argument in multiple places, which is error-prone.


There are a couple of potential wrinkles here. If we say it is just broadcasting, then most of those issues don’t exist. If it is instead meant to be distributive and also possible to re-expand (which isn’t something you’ve indicated), the potential issues are the variance mismatches that become possible, but I think we could iron those out (this already mostly works as expected with other ways of writing it).

I don’t see a strong reason against it if defined as broadcasting, but I’m not super inclined to support adding another way to do this right now unless there’s a benefit beyond a few characters saved.

3.12+ type statements provide some relief from this repetition, as does typing.TypeAlias.

Pseudo python code demonstrating how it behaves:

class UnionType(...):
    def __getitem__(self, p: type | tuple[type]):
        return Union[tuple(t[p] for t in get_args(self))]

– Just let it error out if something inside the union does not accept p.
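To make the pseudo-code above concrete, here is a minimal runnable sketch of the broadcasting rule written as a plain helper function rather than as a change to the union type itself. The name broadcast is hypothetical and not part of typing, and the sketch uses typing.Union instead of the | operator only so that it also runs on older Python versions.

```python
from typing import Union, get_args


def broadcast(union, params):
    """Hypothetical helper: subscript every member of `union` with `params`."""
    # A member that cannot be subscripted (e.g. int) raises TypeError by itself,
    # which matches "just let it error out".
    return Union[tuple(member[params] for member in get_args(union))]


list_or_set_of_str = broadcast(Union[list, set], str)
assert list_or_set_of_str == Union[list[str], set[str]]
```

Something like broadcast(Union[list, int], str) fails with a TypeError, because int[str] is not a valid subscription.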


Not sure if I understood the difference between broadcasting and distributive correctly. But if distributive looks something like this, then it is definitely not what I want:

(tuple[A] | tuple[B])[C] = tuple[A, C] | tuple[B, C]

This has gone too far to be covered by the syntax.


Edit: Upon second thought it could support both broadcasting and distributing at the same time. The behavior should be deterministic.

Pseudo python code again:

def extend(t, p):
    origin = get_origin(t)
    if origin is not None:
        if not isinstance(p, tuple):
            p = (p,)
        return origin[*get_args(t), *p]
    else:
        return t[p]

class UnionType(...):
    def __getitem__(self, p: type | tuple[type]):
        return Union[tuple(extend(t, p) for t in get_args(self))]
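A runnable approximation of the pseudo-code above, again as free functions. The names extend and distribute are hypothetical helpers for illustration, not existing typing APIs, and the unpacking is written with an explicit tuple so that it does not depend on the newer star-in-subscript syntax.

```python
from typing import Union, get_args, get_origin


def extend(member, params):
    """Hypothetical helper: append `params` to the arguments `member` already has."""
    origin = get_origin(member)
    if origin is None:
        # Bare class such as `list`: plain subscription (broadcasting).
        return member[params]
    # Already parameterized, e.g. tuple[int]: rebuild with the extra arguments.
    extra = params if isinstance(params, tuple) else (params,)
    return origin[(*get_args(member), *extra)]


def distribute(union, params):
    """Hypothetical helper: apply `extend` to every member of `union`."""
    return Union[tuple(extend(member, params) for member in get_args(union))]


assert extend(list, str) == list[str]
assert extend(tuple[int], str) == tuple[int, str]
```

With these definitions, distribute(Union[tuple[int], tuple[float]], str) gives the same result as Union[tuple[int, str], tuple[float, str]], which is the "distributive" reading from the earlier example.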

Can you point me to some threads on “variance mismatches”? I’m not familiar with it.

The potential variance mismatches shouldn’t be a real problem, but it’s something that anyone implementing this would need to be aware of.

The easy example here is

x: (list | tuple)[str]

If x is a list[str], the str component is invariant, but if it’s a tuple[str], it is covariant.

So for the full expression (list | tuple)[str], is str invariant or covariant? (Answer: invariant.) The harder question: does that pose a problem when narrowing, in the same way that Sequence | MutableSequence can? (Answer: currently unresolved in the type system.)
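For anyone unfamiliar with the terms, here is a small sketch of the invariant vs. covariant difference. The function and variable names are made up for illustration, and it uses tuple[str, ...] rather than the one-element tuple[str] from the example above. The code runs fine at runtime; the comments describe what a static type checker reports.

```python
def read_all(items: tuple[object, ...]) -> int:
    # Read-only access, so accepting a tuple of any subtype is safe.
    return len(items)


def append_marker(items: list[object]) -> int:
    # Legal for a list[object], but it would put an int into a list[str].
    items.append(0)
    return len(items)


names_tuple: tuple[str, ...] = ("a", "b")
names_list: list[str] = ["a", "b"]

read_all(names_tuple)       # accepted: tuple is covariant in its element type
append_marker(names_list)   # rejected by type checkers: list is invariant
```

The second call is exactly the kind of mistake invariance is there to catch: after it, names_list contains an int even though it is annotated as list[str].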

I think you can sidestep all of those questions, though, and that this isn’t really needed:

# 3.12+
from typing import reveal_type
type X[T] = list[T] | tuple[T]  # reduce repetition

def ex(foo: X[str]):
    reveal_type(foo)  # (pyright): Type of "foo" is "list[str] | tuple[str]"

# you can also do these
type Y = X[str]
type Z = list[str] | tuple[str]

To me this just doesn’t feel worth the extra complexity. Your proposal can make some code slightly shorter, but it doesn’t really become clearer to readers. There is cost in making the runtime and type checkers more complex, and in making readers of typed code learn this new syntax.

As Michael noted, you can use type aliases to shorten code that repeats some pattern.


As I perceive it, (in most cases) the extra performance cost applies only once, at “compile time” (a.k.a. bytecode generation), and only when the user elects to use this syntax. Performance-sensitive code can deliberately avoid it.

Overall I agree with you that it is not worth adding new syntax just to make the code a bit shorter. This proposal can be left as-is until more persuasive use cases come up that demand such syntax.

Thank you so much for this great explanation!

I do have some questions for it. I will try to work them out myself before I come back and ask.

The biggest problem of this proposal is that at least at runtime it conflicts with existing behavior, specifically because the Union type automatically does type variable expansion if (some of) the inner types are generic.

T = TypeVar('T')

ListOrSet: TypeAlias = list[T] | set[T]
ListOrSetOfInt = ListOrSet[int]

For single type variables, this is mostly fine (i.e. the rule described in OP would result in the same type), but not always:

T = TypeVar('T')

ListGenericOrSetOfStr: TypeAlias = list[T] | set[str]
ListOfIntOrSetOfStr = ListGenericOrSetOfStr[int]

If the rule described in OP were to be applied, set[str][int] would be an error.
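A small runnable check of this difference, as a sketch. It spells the alias with typing.Union instead of | so it also runs on older Python versions; the alias name is the one from the example above.

```python
from typing import TypeVar, Union

T = TypeVar("T")

ListGenericOrSetOfStr = Union[list[T], set[str]]

# Existing behavior: only the member that still has a type variable is filled in.
substituted = ListGenericOrSetOfStr[int]
assert substituted == Union[list[int], set[str]]

# Subscripting each member separately, as the rule from OP would, fails on set[str].
try:
    set[str][int]
except TypeError as exc:
    print("per-member subscription fails:", exc)
```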

This would be even worse, IMO to the point of impossibility, if multiple type variables are involved:

U = TypeVar('U')

DictOrReverseDict = dict[T, U] | dict[U, T]

DictOrReverseDict[str, int] == dict[str, int] | dict[int, str]

According to the rules of OP, this would instead be dict[T, U][str, int] | dict[U, T][str, int], which is dict[str, int] | dict[str, int] - since the type variables are filled in by appearance order.
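The two results can be compared directly at runtime. This is only a sketch: it writes the union with typing.Union so it runs on older versions too, and the variable names current and per_member are made up for illustration.

```python
from typing import TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")

DictOrReverseDict = Union[dict[T, U], dict[U, T]]

# Existing behavior: T and U are bound once for the whole union.
current = DictOrReverseDict[str, int]
assert current == Union[dict[str, int], dict[int, str]]

# Rule from OP: each member is subscripted on its own, in its own parameter order.
per_member = Union[dict[T, U][str, int], dict[U, T][str, int]]
assert per_member == dict[str, int]  # both members collapse to the same type
```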

So not only do I think this isn’t really worth the small amount of typing this saves, it also doesn’t work in a consistent way without introducing a lot of decently complex rules.

Note that this means that at least at runtime you could write (list[T] | set[T])[str] - but this is not valid for static type checkers AFAIK. It would probably be annoying to keep track of the scope of these highly temporary type variables, but this[1] AFAICT at least wouldn’t have most of the issues described until now.


Interesting. While playing around I learned that pyright actually already accepts (list | set)[int] and treats it as described in OP.

@erictraut is there a reason for this?


  1. I.e. making static type checkers recognize highly-local type variables ↩︎

This is an interesting example - the order of parameters is not clear.

My local experiment shows that simply swapping the order of the union args changes the order of the parameters.

Alias = dict[U, T] | tuple[T, U]
print(Alias.__parameters__)  # (~U, ~T)

Alias = tuple[T, U] | dict[U, T]
print(Alias.__parameters__)  # (~T, ~U)

Is this behavior even standardized?

Yes, that is expected. And the order is clear and well defined - order of appearance in expression.

Note that there are probably edge cases where this breaks down - but this simple rule is the idea.

Okay so the following will work really close to my OP. And it’s already a valid syntax:

T = TypeVar("T")
(list[T] | set[T])[int] == list[int] | set[int]

Then why not allow omitting the extra declaration and specification of [T] (when possible)?

Interesting +1. My bare metal python (3.13.0) does not allow it. Can you share a link?

  • It is valid code at runtime, but not a valid type expression as far as static type checkers are concerned - they would have to learn about this.
  • “when possible” is doing a lot of heavy lifting - precisely defining when it’s fine to do this is a tricky thing, and probably impossible to do at runtime.

In general, typing, the runtime library, supports a superset of valid type expressions. You should never look at it first when wanting to add new features; always look at mypy and/or pyright (or maybe one of the other static type checkers).

Yes, bare metal python doesn’t support it; I never said that it does. pyright does:

https://pyright-play.net/?reportMissingModuleSource=true&code=GYJw9gtgBALgngBwJYDsDmUkQWEMoAqiApgGoCGIANISQIIA2S5AzgFBsFQC8tCZlABQByAsICUHOgC4%2BxRsxY8ogpi3wAfKC2IxxAbVQwAuhzYATYsCjBB4qAFoAfFBlsoHqADofHEMQA3YnIGAH14fkFbcXEgA

I think this is a bug — a missing check. This isn’t valid type expression syntax according to the typing spec.

I’ve filed a bug report in the pyright issue tracker.
