Syntactic sugar for union cases in match statements

Given these definitions:

from typing import NamedTuple, TypeAlias

class A(NamedTuple):
    x: int
    y: str

class B(NamedTuple):
    x: int
    y: str

it would be nice if I could write the following:

AB: TypeAlias = A | B

match ab:
    case AB(x, y):
        print(f"{x=}, {y=}")

instead of having to write:

match ab:
    case A(x, y) | B(x, y):
        print(f"{x=}, {y=}")

That is, it would be great if the union type was expanded into an or-pattern in match-case.

Motivation: the above example is a bit silly. Where this would in practice be helpful is if you have a union over subclasses from a single base class:

from abc import abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Base:
    x: bool
    y: int | str

    @abstractmethod
    def f(self) -> str:
        raise NotImplementedError()

@dataclass(frozen=True)
class A(Base):
    y: int  # narrowed type

    def f(self) -> str:
        return "I'm class A"

@dataclass(frozen=True)
class B(Base):
    def f(self) -> str:
        return "I'm class B"

# This gives us a type which can be used for exhaustiveness checking.
# We can't use `Base` for that because it might have other subclasses.
AB: TypeAlias = A | B

When we want to distinguish A and B, we can exhaustively check over AB:

def f(ab: AB):
    match ab:
        case A(x, y):
            print(f"an int: {y}")
        case B(x, y):
            print(f"either str or int: {y}")

But we can also match on the whole thing at once:

def f(ab: AB):
    match ab:
        case AB(x, y):
            print(f"either str or int: {y}")

Specification: There would be a runtime error if the __match_args__ of the union elements aren’t all the same.

Other justification: unions already work with isinstance:

isinstance(ab, AB)

so why not case as well?


Naming a TypeAlias makes it reusable, unlike or-patterns. It’s logical, given isinstance. Unions also give a bit more control than searching down the MRO.

Isn’t this extremely niche? Is it very often the case that two child classes have matching members?

Anyway, isn’t this also an unnecessary restriction if you use keyword arguments in the case statement? If you’re going to go this route, then all of the keywords passed to the case statement just need to be an intersection of the members for each element of the union.

I think it would also help if there were motivating examples that match against AB. And for each example, to compare the case where type AB = A | B with having AB as explicit base class.

Right, it probably makes sense to make this less strict. The following seems pretty safe:

from typing import NamedTuple, TypeAlias

class A(NamedTuple):
    x: int  # only `x` this time

class B(NamedTuple):
    x: int
    y: str

AB: TypeAlias = A | B

match ab:
    case AB(x):  # only `x`
        print(f"{x=}")

as syntactic sugar for

match ab:
    case A(x) | B(x, _):  # note the wildcard usage
        print(f"{x=}")

which would work as long as all the __match_args__ start with an identical sequence.

And then you could loosen it even further by allowing different names in the __match_args__:

class A(NamedTuple):
    x: int

class B(NamedTuple):
    x: int

AB: TypeAlias = A | B

match ab:
    case AB(x):
        print(f"{x=}")

That’s probably also fine, though it could lead to hard-to-notice bugs.

I’m not sure I’m understanding correctly, but to re-iterate, the main motivation is as an alternative to the sealed class proposal; as an alternative way of making static exhaustiveness checks work for subclasses.

We can of course define it like this:

@dataclass
class AB:
  x: int  # declared on the base so that `case AB(x)` has a positional field to bind

@dataclass
class A(AB):
  x: int

@dataclass
class B(AB):
  x: int

ab: AB = A(3)

match ab:
  case AB(x):
    print(f"{x=}")

This works, but then when we want to distinguish A and B, static type checkers will complain:

match ab:  # match is not exhaustive
  case A(x):
    print("this is A")
  case B(x):
    print("this is B")

So we would like to use the type A | B, so that exhaustiveness checking works. But once we use a type alias for A | B everywhere, we run into the problem that the alias can’t be used as a class pattern in a case, which is something you might sometimes want to do.

class ExpressionBase:
    def shared_functionality(self): ...

@dataclass
class Name(ExpressionBase):
    name: str

@dataclass
class Operation(ExpressionBase):
    left: "Expression"
    op: str
    right: "Expression"

Expression: TypeAlias = Name | Operation

def f(node: Expression | Statement):
    match node:
        case Expression():  # we want to match on any `Expression` here
            print("it's an expression")
        case Statement():
            print("it's a statement")

def g(expr: Expression):
    match expr:
        case Name(name):
            print(f"{name=}")
        case Operation(left, op, right):
            print("it's an operation")

Hmm, typing out this example, it seems that it’s in practice probably most useful to use the union type alias in a case that does not bind any variables.

So, we should definitely do the thing where it’s fine when all the __match_args__ aren’t perfectly identical.

Right, I wasn’t sure if you had another motivation in mind or not. I understand now why you want this.

I think it’s a lot of effort to fix a minor inconvenience. But I do think case statements that support unions are a cool idea. Maybe it’s worth keeping these in mind in case other use cases pop up over the years?

Assuming you meant to have different names in the two classes, then no, this can’t really be done with the current model of pattern matching since __match_args__ is statically looked up on AB without being able to know the type of ab. Everything else can already be implemented:

class UnionType(type):
    def __init__(self, *_):
        pass

    def __new__(cls, *classes):
        self = type.__new__(cls, 'Union', (), {})
        self.classes = classes
        return self

    def __str__(self):
        return ' | '.join(cls.__name__ for cls in self.classes)

    @property
    def __match_args__(self):
        common_prefix = None
        for cls in self.classes:
            cls_ma = getattr(cls, '__match_args__', ())
            if not cls_ma:
                return ()
            if common_prefix is None:
                common_prefix = cls_ma
                continue
            if len(cls_ma) < len(common_prefix):
                common_prefix, cls_ma = cls_ma, common_prefix
            if common_prefix == cls_ma[:len(common_prefix)]:
                continue
            common_prefix = common_prefix[:next(i for i, (a, b) in enumerate(zip(common_prefix, cls_ma)) if a != b)]
        return common_prefix

    def __instancecheck__(self, instance):
        return isinstance(instance, self.classes)


AB: TypeAlias = UnionType(A, B)

Behaves correctly, setting __match_args__ to the common prefix of all arguments.

Of course, a small change to pattern matching so that it isn’t necessary to subclass and misuse type here would be nice, but otherwise this feature is “pure python”, just requiring implementation in typing and telling type checkers about it.


Ah yes, that makes sense. Ignore that idea then.

Thank you for working this out! Nice to see that this works.

I think adding support for this would be a mistake. The original thread wanted ADTs, and while this solves a thing they wanted while avoiding the issues the sealed decorator proposed, adding an ad-hoc way to solve part of ADTs by special casing unions would create another special case related to typing without fully helping with adding ADTs.

It’s going to be better if those who want ADTs work on full syntax-level support for it that avoids the issues of the sealed decorator and of special-casing unions here.


ADTs are already “solved” with unions. They are called “algebraic” precisely because they let you add arbitrary types together! We shouldn’t be trying to introduce a new way to spell a union that has a different feature set from the existing way to spell unions.

While I would disagree that Unions are ADTs as people are familiar with the term from other languages, I would agree that the problems ADTs help solve are already solvable with things python has.

Unions are a type-system exclusive construct. Yes, you can do a limited number of things with them at runtime, but that’s not their primary purpose and very little exists around that. This is partially intentional, as having runtime behavior on typing features that isn’t clearly separated has led to problems of inconsistency like…

and

Even isinstance’s second parameter accepting a Union has been considered a mistake by a few people in hindsight. This recently came up in another discussion about adding isinstance support to other type-system parts (quoted below).

After some consideration, I’m negative on adding match support to Union but positive on an ADT construct that works with or without the type system and Unions (though it should be compatible for typing users, it should work for non-typing uses given the motivations)


I’m essentially in agreement with everything you said, but I’d add a further proviso here. It feels to me that the way discussions around ADT proposals are framed often takes the form of “here is a useful construct from another language (often a strongly typed one that typically borrows ideas from pure functional programming[1]), let’s add it to Python”. While this isn’t necessarily bad, I think that such proposals would benefit from a much more balanced consideration of how Python currently solves the problems the new proposal is targeting and what new advantages the proposal brings. At this point, Python is perfectly capable of solving most problems, and we should be less interested in what problems a proposal targets, and more interested in why we need a new way of solving those problems. Otherwise we risk adding things just because they are currently trendy.


  1. OK, often Rust…


I believe the only distinction is whether the union is tagged, and you can trivially construct a tagged union from named tuples and untagged unions.

This is a pretty significant difference. You can’t use class methods or constructors on a union, and that’s a good thing. The ADTs of other languages offer construction and the ability to implement methods/traits/etc. for the ADT.

Not in a type safe manner in python, and not ergonomically. You’re better off inverting the problem and switching on the type in code that uses the value, rather than in the container holding the value. But that’s actually possible in python, and we don’t have a compiler doing fancy things with ADTs. We don’t need ADTs to solve problems in python.

from typing import Literal, NamedTuple
import enum

class _(enum.Enum):
    EMPTY = enum.auto()

EMPTY = _.EMPTY

class ADT(NamedTuple):
    tag: type[A] | type[B]
    # Literal[...] needs the enum member spelled out, not the EMPTY alias
    a_value: A | Literal[_.EMPTY] = EMPTY
    b_value: B | Literal[_.EMPTY] = EMPTY

Even with placing EMPTY there, there’s no way to know if a_value or b_value is safe from the tag statically, and you also can’t change that to just be:

class ADT(NamedTuple):
    tag: type[A] | type[B]
    value: A | B

either one would need dependent types.

But you can just do:

value: A | B = some_called_thingamajig()
match value:
    case A():  # class pattern; a bare `case A:` would capture instead of testing the type
        ...
    case B():
        ...

class TagA(NamedTuple):
  a: TypeA

class TagB(NamedTuple):
  b: TypeB

TaggedUnion = TagA | TagB

This is a fully type-safe tagged union, and the pattern is extremely common. It tends to crop up also with TypedDict, the other common building block of ADTs.

That’s not a tagged union, and it’s also completely unnecessary. In that case, you need to switch on the type of the named tuple. Switch on the type of the value instead; it’s available to you.


This doesn’t look like a tagged union to me either. I don’t see any benefit to this construction over just

SomeUnion = TypeA | TypeB

This isn’t the same as an ADT as provided by many compiled languages. You’ve created two structures that are disjoint here and said you have one or the other, not one unified structure with disjoint fields that are checked for and that the language knows about. With this difference, there’s no benefit, and I’m inclined to think there’s no need to support ADTs if there isn’t a stronger motivation than something currently solved like this.

The only reason I could see using this would be if you have some library modeling a web API you have no say over, and the library can’t handle transforming data into a more useful representation, but that’s not a language limitation, that would be a library design choice.

I can’t help that :confused: That’s what a tagged union is - a union with a unique tag attached to each option so you can distinguish when two overlapping types get tagged. For instance, in my example, if you add a type that’s a subclass of both TypeA and TypeB, you can’t tell which branch of TypeA | TypeB it is supposed to be. With the tagged union, you can, because the associated tag is either a or b. (Usually it’s more obvious, two of the types just match for two different tags.)

The Wikipedia page on tagged unions may help.

Now often tagged unions are represented in memory as an explicit type int plus a pointer. That’s not what a tagged union is, though. It’s just a way of storing it.

This is exactly my point, though. We don’t need the tagged union that, say, OCaml provides, because it’s almost never necessary versus a simple union (which OCaml does not have). We already have ADTs fully available in Python.

Right, but it’s not valid to discriminate unions in python based on the presence of attributes, except in the case of runtime checkable protocols. There could be a hypothetical subclass of TypeA that has an attribute b. That’s why this isn’t a tagged union in python.


Ah, I see, you may need to make the tuple classes final to fix that. I forgot about it because pyright stopped needing final for TypedDicts when used in this manner, and that’s how I normally write it. (Commit where pyright changed this behaviour, which I always struggle to track down.)

@final
class TagA(NamedTuple):
  a: TypeA

@final
class TagB(NamedTuple):
  b: TypeB

TaggedUnion = TagA | TagB

I don’t know if this counts as a tagged union or how you’re meant to operate on it, but both this proposal and sealed look rather unergonomic compared to the simplicity of e.g. enums in Rust. That’s what I personally imagine a tagged union to look like in Python - an enum-like construct whose members are instantiable.
