A small implementation of Algebraic Data Types

This is split from Syntactic sugar for union cases in match statements

This is becoming off-topic, but I did previously implement such a syntax in pure python. Ofcourse, this is a high-magic version, that, while reading very nicely, probably shouldn’t be used in production, IMO this is closer to Iimagine if people say “ADT support”.

I am personally still in favor of crutches like the one described in this thread unless Core Devs are actually willing to add support for syntax that magic in the language/stdlib itself.

That sounds like “support” is being used as code for “syntactic sugar”? Your example can already be written with enums, unions and dataclasses, so an obvious question would be “why do we need another way to write the same code that makes Python look more like OCAML?” You wouldn’t be able to do a lot of the things you can do with explicit enums and dataclasses (e.g. class methods), so now you’re having to rewrite the whole thing back into regular Python if you need to add one of those. OCAML doesn’t face that problem because it doesn’t have methods.

Ofcourse, support means syntactic sugar. Python is Turing Complete, anything can already be encoded somehow. dataclass “support” is also just syntactic sugar, so are Enums and everything else you mentioned. We don’t need more syntax, but that means that in contrast to enums and dataclasses, ADTs are second class citizens. This isn’t fundamentally bad, but it means that you are saying that they are less valuable than these other abstractions. The syntax show in the linked example is also the most extreme version I can think of. Slightly more reasonable syntax would be something like this:

class Calc(ASD):
    class Code:
        stms: list[Stmt]
    class Stmt: ...
    class Assign(Stmt):
        name: str
        value: Expr
     class Print(Stmt):
        value: Expr
    # ...

This is just an example, slightly variations of it would IMO also be very nice. [1]

This would then allow composing with most other python features I can think of. We (or at least I) are fundamentally only talking about syntactic sugar, and ofcourse the ability to teach this to static type checkers which are too stupid to understand dynamic python code like is required to make any of the above work. [2]

  1. this exact syntax can’t actually be implemented with the current semantics of python in a way such that issubclass(Calc.Code, Calc) is true, which is something I would expect. ↩︎

  2. IMO, a universal plugin system would also be a good investment so that fewer of these features need to be added to the stdlib, but at least some type checker devs are fundamentally opposed to that. ↩︎

Dataclasses and enums are just libraries. They explicitly do not have syntactic support.

1 Like

And I don’t think any suggestion here is requesting syntactic support? OP is requesting a small extension to the Union builtin type (no syntax change required), and my links above are already possible pure-python implementations, similar to enum. But having stdlib support means a proper PEP, proper specs, better interactions with the rest of the typing system, better than anything that any third party library can possibly provide.

Tuples and unions are ADTs, they aren’t “second class”. What is “second class” is a particular way of writing ADTs peculiar to ML. This needs a strong justification before adding all that extra complexity to every static type checker.

ADT is an overloaded term with many different interpretations. In particular, what I mean with ADTs here, is rust-style enums, which in a similar style also exists in haskell. Do you have a better term for this? I don’t actually know or really care about OCaml or other more classical ML languages where you seem to be pulling your definitions from. In fact, I see the value of this feature completely independent of other languages: I have on multiple occasions written families of dataclasses that represent for example the AST of a programming language. Writing dozens of dataclasses in the same file is an annoying task, and then not even gaining the typing benefits like completeness checks without extra work doesn’t make this any more appealing.

One of my motivations/inspirations is the Abstract Grammar as used in the python docs: ast — Abstract Syntax Trees — Python 3.12.2 documentation . Even the python docs seem to be of the opinion that Python doesn’t have a nice and concise way of defining such a type family. (and there is even a generator tool to take such a description and turn it into type objects, although that targets C and not python)

And ofcourse I am using a tagged unions: That is literally the exact implementation strategy behind all python objects in CPython. Specially, tagged union are an implementation strategy (for a kind of ADT), not by itself a type-system feature.

Edit: Also, this is getting really off-topic. If a mods wants to split out the last few messages (probably everything after me posting the github link), I think that would be a good idea.

If you are going to systemically ignore what I am saying, I don’t see much value in continuing this discussion.

1 Like

Aha, that would explain it. I am talking about Algebraic data types, not Abstract data types.

And you did not understand what I am talking about. I am not talking about what the typing site calls Union. And my implementation of ADTs linked above doesn’t use it.

What I am saying is that in CPython PyObject* is a tagged union, with the type of the object acting as a tag. This is undisputable unless you redefine what union means or for some reason disqualify the type from acting as a tag. Which is why I said that you aren’t actually reading what I am saying.

As I said, this has little to do with OP, which is talking about typing’s Union at this point.

Ok, I can work with that. Then is typing.Union not a tagged union?

I assume this explains why your example code has a fallback case despite having exhaustively covered every possibility?

Hm, this is my third rewrite of this paragraph xD. I hadn’t actually fully thought out my position on this.

Yes, it is a tagged union: You can always figure out which branch you are [1]. This is contrast to a C-union which isn’t tagged (by default) which means you don’t know which fields are valid. However, the syntax for the Python union doesn’t quite mean the same as you would expect from proper ADTs, i.e. you are allowed to write A | A and it neither errors (“unique tags required”) nor does it create two distinct branches (Left and Right). I guess this is what I fundamentally mean with ADTs not being first class: There is not simple, quick and very clear way to create runtime and typing distinct branches for the one tagged union python contains.

Well, type checkers don’t like the syntax I am using anyway and I haven’t bothered explaining it to them. The raise at least guarantees that I get a runtime error if I forget something. I could also have used assert False or a similar “unreachable” marker.

  1. Assuming only basic types at least ↩︎

So, as I see it, the features of this sugar are:

  • enumerations via “zero-data” classes instead of singleton objects
  • “enforced” tagged unions
  • types are data-only

Is that right?

Yes, for the exact syntax linked that looks complete. But IMO the conciseness of this is an important factor. If the equivalent to the linked Syntax Definition doesn’t fit in one page it’s IMO too verbose.

1 Like

Given it’s “just” syntactic sugar, I’d suggest making a code generator to take the concise definition and turn it into verbose Python code. That way you get a usable product that works with existing type checkers. Not as ideal as native support, but you’d be able to see if there are any missing language features besides the sugar. If you get a lot of usage, it could then go the route of attrs.

edit or perhaps you can just code generate a stub file for your existing magic implementation? I’ve never tried that.

I am not going to add a compile step to normal python code if metaprogramming features are already an option, I don’t care about static type checking support for this so much that I am willing to add extra code that might go out-of-sync. This also wouldn’t play nicely with IDEs

If there was a way to have a generic config so that type checkers automatically call my code generator if they need it, then I might invest effort into writing one. This is actually an idea I had a few times already, specifically based on the idea that most things people want to do with metaprogramming are things that can already be expressed in pure python code with just a lot longer definitions. AFAIK, this would cover basically all of dataclass_transform

Only by giving them different tags. The Python equivalent of that is a
union of two different classes, that both happen to have an attribute of
type A.

In Python, the class of the union member IS the tag.

1 Like