Atom-like Enums in Python

Pathos315 · March 2, 2024, 1:33pm

Currently, to create an Enum, it looks like this:

from enum import Enum

class Color(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

And to use an Enum member’s value, you need to write Color.RED.value just to get the string "red". It’s a bit cumbersome.

The coding language Elixir employs a data-structure called an atom, written like this → :foobar, which sets the name as its value (i.e. :foobar returns "foobar"), and is immutable/frozen.

I’d like to propose adding atom-like enums to Python.

By writing :red, you would create an object whose name “red” is equal to its value “red”.

This would not replace Enums. It would likely be syntactic sugar for a common enum use case, but I suspect more uses are possible. For more specific uses, Enums would remain.

ericvsmith · March 2, 2024, 1:49pm

Is this really related to enums at all? Are multiple atoms gatherable in a group, like RED, GREEN, and BLUE in your enum are?

Or is this really similar to RED = 'RED'? It seems very similar to a named string constant, although maybe the atom couldn’t be rebound?

Some more details would help.

Rosuav · March 2, 2024, 2:04pm

That actually sounds more similar to PEP 661’s sentinels than to enums, would they be of interest to you?

chepner · March 2, 2024, 2:12pm

The main purpose of an Enum is that Color.RED is a usable value in and of itself, without caring what “underlying” value is associated with it. If your primary interest is in the underlying value, then you may want to use an ordinary class with class attributes instead:

class Color:
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

Here, Color.RED is just another name for the value "red". (type(Color.RED) is still str, not Color). In the enumerated type, Color.RED and "red" are distinct values that are not the same (although Color.RED.value and "red" are the same).

There is also StrEnum, which is sort of a combination of the two approaches. Color is still a distinct type, but it is a subclass of str, so all its values are strs as well.

javidcf · March 3, 2024, 1:55am

auto() provides that functionality for StrEnum types. And you can always use the name property of an enum to get its string representation.

JamesParrott · March 3, 2024, 11:43am

If you want a string literal, then just write a string literal. Both they and atoms are immutable and unassignable. Atoms are bound to get abused, simply to avoid writing quotes around string literals.

If atoms are enum members, then when one is created, firstly some implicit Enum class in that scope needs to be instantiated behind the scenes, for the new atom to be a member of. Then secondly when another atom is defined, the scope’s implicit enum class must be mutated to add the new member (currently Enums are immutable, as one would expect).

I can see atoms help you reason about Elixir code. But we can already reason about Python code by examining the namespace. Normally (without globals or locals hacking) new names are only introduced by the assignment statement (=) or the import statement. Atom syntax would be an extra thing to remember, that doesn’t add much and breaks a fundamental useful rule of thumb.

Elixir uses Enum to mean something far more like a Python Iterable. But in Python “enum” means “one of a finite number of possibilities”, whereas atoms could be any valid Python name, and each defines only one possibility.

A huge benefit of atoms and Python enum members comes from using them in conjunction with type hints. The best practise for Enum function args would be to still define the ‘class’, and then use that as a type annotation.

Does Elixir have an identity (e.g. is) operator? The behaviour of is with enum members in Python has been carefully implemented. It’s easy enough to adjust __eq__ to ensure that, as in Elixir, :red == 'red', but I think the best thing to do would be to have :red is not 'red'. Otherwise I could presumably call any string method on it, e.g. :red.upper(). If :red is 'red', then :red behaves differently from a Python enum member.

Anyway, I may not like the colon syntax currently, and I see problems in breaking immutability, and in using it both as sugar for an enum member and having it be a string literal. But I still like the general idea.

I’d just prefer a const key word. const red = "red" is self explanatory, and more in line with “explicit is better than implicit” than :red

Pathos315 · March 3, 2024, 2:02pm

Some more details would help.

Question: what additional details would be most helpful for developing this idea further? I confess I’m a bit new to proposing ideas of this sort

On gatherability: I could see them as being gatherable, but with tuples instead of an enum class. Currently, via Enums, that would look like:

Color = Enum('Color', ['RED', 'GREEN', 'BLUE'])

In this proposal, it would look like:

Color = (:red, :green, :blue)

But it’d be a bit silly, as Color[0] and :red would both return “red”, just that the latter would have fewer steps.

On the named string constant: the idea is that a : prefix on a name removes the need to assign anything to it. A name that starts with : becomes a global, immutable string whose value is its own name.
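Without new syntax, those semantics can be approximated in current Python. The Atom class below is purely hypothetical (it is not an existing library or part of any PEP). It only sketches "a global, immutable string whose value is its name", with Atom("red") standing in for :red.

```python
class Atom(str):
    """Hypothetical atom-like string: one shared object per name."""

    __slots__ = ()   # no instance __dict__, so no attributes can be added
    _interned = {}   # class-level cache mapping name -> the single Atom

    def __new__(cls, name):
        try:
            return cls._interned[name]
        except KeyError:
            atom = super().__new__(cls, name)
            cls._interned[name] = atom
            return atom

    def __repr__(self):
        # Mimic the proposed literal syntax in the repr only.
        return f":{str(self)}"


red = Atom("red")
same_object = Atom("red") is red   # the same name always yields the same object
equals_name = red == "red"         # the value is the name
shown = repr(red)
```

Unlike an Enum, nothing here restricts which names are valid, which is the "only defining one possibility" concern raised earlier in the thread.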

Pathos315 · March 3, 2024, 2:15pm

I’d just prefer a const key word. const red = "red" is self explanatory, and more in line with “explicit is better than implicit” than :red

I’m open to that, or anything else, as an alternative if it’s deemed more user friendly. And const might win out, as I suspect that Python might adopt some features of Mojo over time.

Really the use case that inspired this idea is that I have a program that needs to determine if an academic paper has a “doi” identifier or an “arxiv” identifier. Setting those two terms up as an Enum — which seems like a good idea as it can only be either of those two — would, to my knowledge, require me calling something like Identifier.DOI.value or Identifier.ARXIV.value. My very naïve understanding is that anytime there’s a . in a variable, that’s slowing things down. It also just looks inelegant.

I’m also learning about Elixir, in an unrelated project. The idea of an Atom, or something Atom-like, strikes me as elegant. Instead of the prior overhead, I could just have :doi or :arxiv and it would return “doi” or “arxiv”, and treat it as though it were an Enum.

But I admit there may be huge gaps in my knowledge of Python, Atoms, and just code generally. What I’m hoping to resolve with this idea might very well be fixed by other means. However, Elixir does have the atom-syntax as a feature, so perhaps — I think — there could be merit to it after all.

JamesParrott · March 3, 2024, 2:32pm

You can test for that like:

if id_str in Identifier.__members__:
    I = Identifier[id_str]

and then e.g. match on I.

jamestwebber · March 3, 2024, 3:38pm

As of Python 3.12:

You don’t even need to use __members__! But in ≤3.11 this is a TypeError.
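Assuming this refers to using in directly on the enum class, a sketch might look like the following. has_value is a hypothetical helper; its except branch is only there so the same code also runs on 3.11 and earlier. On 3.12+ the containment check matches member values ("doi"), whereas __members__ is keyed by names ("DOI").

```python
from enum import Enum


class Identifier(Enum):
    DOI = "doi"
    ARXIV = "arxiv"


def has_value(candidate):
    """Return True if candidate is a member or the value of a member."""
    try:
        # Python 3.12+: a non-member is compared against the member values.
        return candidate in Identifier
    except TypeError:
        # Python <= 3.11: `in` with a non-member raises TypeError,
        # so fall back to an explicit scan of the values.
        return any(candidate == member.value for member in Identifier)
```

With that, has_value("doi") is true and has_value("pmid") is false on either side of the version boundary.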

stoneleaf · March 4, 2024, 1:30am

It sounds like enums are a good solution here, but how you compare with the paper depends on how that paper is represented. Is it a class with various attributes, or a giant blob of text? If a class with attributes, were those attributes created with the enums you defined? For example:

class AcademicPaper:
    def __init__(self, blob_of_text):
        if "identifier: doi" in blob_of_text:
            self.identifier = DOI
        elif "identifier: arxiv" in blob_of_text:
            self.identifier = ARXIV
        else:
            self.identifier = None

In that case, you can test your document with:

ap = AcademicPaper(some_text)
if ap.identifier is ARXIV:
    do_something()

On the other hand, if the AcademicPaper just uses strings for the attributes, then make your enum based on strings and then you won’t need to access the .value:

ap = AcademicPaper(some_text)
if ap.identifier == ARXIV:
    do_something()

A string enum would look like:

from enum import StrEnum, auto  # StrEnum requires Python 3.11+

class Identifier(StrEnum):
    DOI = auto()
    ARXIV = auto()

# put members in global namespace
DOI, ARXIV = Identifier

# quick equality test
assert DOI == 'doi'

Agreed.