Another attempt at clean sum types/ADTs in Python

(skip to the heading The Proposal if you’re already familiar with the problem)

The Problem

Consider this code in Swift:

enum BgColor {
    case transparent
    case name(String)
    case rgb(Int, Int, Int)
    case hsv(Int, Int, Int)
}

var backgroundColor = BgColor.rgb(39, 127, 168)

switch backgroundColor {
    case BgColor.transparent:
        print("no color")
    case let BgColor.name(colorName):
        print("color name: \(colorName)")
    case let BgColor.rgb(red, green, blue):
        print("RGB: \(red), \(green), \(blue).")
    case let BgColor.hsv(hue, saturation, value):
        print("HSV: \(hue), \(saturation), \(value).")
}

It lets you express precisely that there are 4 possibilities and that different values are associated with each possibility.

In Python, you might express this like this:

background_color = {"type": "rgb", "val": (39, 127, 168)}

match background_color:
    case {"type": "transparent"}:
        print("no color")
    case {"type": "name", "val": color_name}:
        print(f"color name: {color_name}")
    case {"type": "rgb", "val": (red, green, blue)}:
        print(f"RGB: {red}, {green}, {blue}")
    case {"type": "hsv", "val": (hue, saturation, value)}:
        print(f"HSV: {hue}, {saturation}, {value}")

This works if you’re not interested in static type checking (which is fine). If you are, though, you’ll find that the current ways to make this type-safe are not as convenient as the Swift code.

You could type the above code with TypedDict but I think most would probably do it with NamedTuples:

from typing import NamedTuple, TypeAlias

class Transparent:
    pass
class Name(NamedTuple):
    color_name: str
class Rgb(NamedTuple):
    red: int
    green: int
    blue: int
class Hsv(NamedTuple):
    hue: int
    saturation: int
    value: int

BgColor: TypeAlias = Transparent | Name | Rgb | Hsv

background_color: BgColor = Rgb(39, 127, 168)
assert isinstance(background_color, BgColor)

match background_color:
    case Transparent():
        print("no color")
    case Name(color_name):
        print(f"color name: {color_name}")
    case Rgb(red, green, blue):
        print(f"RGB: {red}, {green}, {blue}")
    case Hsv(hue, saturation, value):
        print(f"HSV: {hue}, {saturation}, {value}")

As you can see, this has become very verbose.

You can make it a bit shorter, but it’s still not that elegant:

class Transparent: ...
Name = NamedTuple("Name", [("value", str)])
Rgb = NamedTuple("Rgb", [("value", tuple[int, int, int])])
Hsv = NamedTuple("Hsv", [("value", tuple[int, int, int])])

BgColor: TypeAlias = Transparent | Name | Rgb | Hsv

The Proposal

To solve this problem, I would like to propose a new construct: TypeEnum. It’s like an Enum but instead of its elements being values, they’re types.

(Alternative names for this concept: NamedUnion, TaggedUnion, NamedTupleUnion.)

It works like this:

from typing import TypeEnum

class BgColor(TypeEnum):
    transparent = ()
    name = (str,)
    rgb = (int, int, int)
    hsv = (int, int, int)

background_color = BgColor.rgb(39, 127, 168)
assert isinstance(background_color, BgColor)
assert not isinstance(BgColor.rgb, BgColor)  # different from Enum

match background_color:
    case BgColor.transparent:
        print("no color")
    case BgColor.name(color_name):
        print(f"color name: {color_name}")
    case BgColor.rgb(red, green, blue):
        print(f"RGB: {red}, {green}, {blue}")
    case BgColor.hsv(hue, saturation, value):
        print(f"HSV: {hue}, {saturation}, {value}")

Under the hood, TypeEnum does something like this:

class BgColor:
    transparent = 0
    name = NamedTuple("name", [("item0", str)])
    rgb = NamedTuple("rgb", [("item0", int), ("item1", int), ("item2", int)])
    hsv = NamedTuple("hsv", [("item0", int), ("item1", int), ("item2", int)])

However, this doesn’t include the magic necessary to make isinstance(BgColor.name("cerulean"), BgColor) work.

I’m using NamedTuple here because I need something that will populate __match_args__ for the pattern matching, and NamedTuple is an easy way to get that. The actual implementation probably wouldn’t really be NamedTuple, but it needs to be something that you can pattern-match on.

It would also be nice if there was an option for named fields, but it’s not a must from my side:

from typing import TypeEnum

class BgColor(TypeEnum):
    name = (str,)
    rgb = {"red": int, "green": int, "blue": int}  # named args

background_color = BgColor.rgb(red=39, green=127, blue=168)
print(background_color.red)

This syntax with named fields was actually previously suggested in this mypy issue:

So, what do you think?


Whenever you find yourself switching on related types, you should always consider using polymorphism. In my opinion, your different colors should all inherit from some Color base class, which would have a display method that does what you want it to do here.

Consider what would happen if you added an RGBWithAlpha class. Then, all of your match statements would have to add a case. With polymorphism, you can have RGBWithAlpha decide which behaviour it should inherit from its parent RGB.

Another benefit of polymorphism is that you can define an interface with its appropriate contracts, which can be checked for all subclasses.

Unless you want iterability, I think you should replace NamedTuple with dataclass. At the very least, Name should not be a NamedTuple. Keep APIs as small as possible without sacrificing utility.

Your motivation seems to be that you want to switch on types. Switching on related types is generally a code smell. I think you should use polymorphism instead. I think the match statement is more appropriate for switching on unrelated types.
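
As a rough sketch of what the polymorphic version could look like (the Color base class, the display method and the alpha field are illustrative names, not taken from an existing library):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class Color(ABC):
    """The small interface every color has to implement."""

    @abstractmethod
    def display(self) -> str: ...


@dataclass(frozen=True)
class Transparent(Color):
    def display(self) -> str:
        return "no color"


@dataclass(frozen=True)
class Name(Color):
    color_name: str

    def display(self) -> str:
        return f"color name: {self.color_name}"


@dataclass(frozen=True)
class RGB(Color):
    red: int
    green: int
    blue: int

    def display(self) -> str:
        return f"RGB: {self.red}, {self.green}, {self.blue}"


@dataclass(frozen=True)
class RGBWithAlpha(RGB):
    alpha: float = 1.0

    def display(self) -> str:
        # Reuse the parent's behaviour and extend it.
        return f"{super().display()} (alpha {self.alpha})"


for color in (Transparent(), Name("cerulean"), RGBWithAlpha(39, 127, 168, 0.5)):
    print(color.display())
```

Adding RGBWithAlpha did not require touching any calling code; callers only ever use display.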


I think this is a cool idea. :wink: (Sorry I don’t have more valuable input here as I don’t have a use case for this just yet.)

There are third-party libraries, such as match-variant (dusty-phillips/match-variant on GitHub, “Python variant types that work with match”), that implement this today, albeit without syntax support (disclaimer: I am a co-creator of that project). To get something like this further, either the community is going to need to pick it up and then convince the typing community to support it like enums and dataclasses, or you have to get it into Python itself, and that’s going to take a PEP.


Decades of functional programming experience indicate otherwise. Sum types and match go together like a horse and carriage:

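data Tree a = Node a (Tree a) (Tree a) | Leaf a

-- Returns the subtree rooted at x, or the leaf where the search ends.
find_in_sorted_tree :: Ord a => a -> Tree a -> Tree a
find_in_sorted_tree x t = case t of
  Node y _ _ | y == x -> t
  Node y _ r | y < x -> find_in_sorted_tree x r
  Node _ l _ -> find_in_sorted_tree x l
  Leaf _ -> t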

OOP is nice, but Python is a multiparadigm language and we shouldn’t discount non OO ways of doing things. Especially nice ways that we already have 90% of.


I don’t see what your example has to do with what I wrote? Are you trying to show switching on types being an example of elegant code?

Yes, being elegant, and specifically being a nice complement for match. These are features that were designed to be used together, I’d even say made for each other.

Also, seems like I was tired and forgot the crux of the example. Edited.


Well, it’s a matter of opinion, but I think the structural version is flimsier than the polymorphic version. If you add a type that belongs in your tree, you’ll have to redo all of the calling code. On the other hand, if you have a small interface, you can implement operations like find and replace through that interface. If you add a type that belongs in your tree, you only have to implement the small interface.

Also, while Python may be “multiparadigm”, I think there are some things that are just un-pythonic. For example, chaining: x.some_function().some_other_function().some_third_function(). This is popular in functional languages, but it’s usually bad code in Python.

Python 3.12 can run (and pyright can check) this recursive ADT:

from dataclasses import dataclass

type Exp = Var | Ap | Fn | Let

@dataclass
class Var:
    name: str

@dataclass
class Ap:
    target: Exp
    arg: Exp

@dataclass
class Fn:
    name: str
    ret: Exp

@dataclass
class Binding:
    name: str
    exp: Exp

@dataclass
class Let:
    bindings: list[Binding]
    body: Exp


def format_exp(exp: Exp) -> str:
    match exp:
        case Var(name):
            return name
        case Ap(target, arg):
            return f"({format_exp(target)} {format_exp(arg)})"
        case Fn(name, ret):
            return f"(lambda ({name}) {format_exp(ret)})"
        case Let(bindings, body):
            bindings_text = ' '.join(map(format_binding, bindings))
            return f"(let ({bindings_text}) {format_exp(body)})"


def format_binding(binding: Binding) -> str:
    return f"[{binding.name} {format_exp(binding.exp)}]"


print(format_exp(Var("x")))
print(format_exp(Ap(Var("f"), Var("x"))))
print(format_exp(Fn("x", Ap(Var("f"), Var("x")))))
print(format_exp(Let([
    Binding("x", Ap(Var("f"), Var("x"))),
    Binding("y", Ap(Var("f"), Var("x")))
], Ap(Var("x"), Var("y")))))