Built-in StrEnum

Many Python libraries use magic string constants to represent enums. The fact that enums exist at all, in any language, is evidence of their superiority in these use cases: they are more discoverable, better for type checking, make it easier to guarantee the interface, and are easier to test. Python considers enums worthwhile, which is why they’re in the standard library, but in downstream libraries it’s much more common to see magic string constants. This pattern is also common in the standard library itself, but in much of the core there is good reason for that (code which runs before the import machinery has kicked in, or where you want to avoid importing enum and its dependencies), so I’m happy to let that slide.

Downstream libraries, however, do not have a consistent upgrade path to go from magic strings to enums. Having to support both the new enum and the old string all the way through their library is a pain, and involves double the equality checks. For some libraries, there are so many code snippets available in so many places that you could never get rid of the magic string interface.

For wrapping underlying libraries which use integers for enums, there are IntEnum and IntFlag. These come with a warning against their use because the integers don’t semantically mean anything; they’re just a convenient crutch for languages without good enum support. But they prove that the Python standard library sees the value in wrapping unergonomic, undiscoverable constants in enums, where they’re being used to semantically represent enums anyway.

An Enum subclass which also subclasses str would provide one (“and preferably only one”) way forward, where new code and documentation could refer to the enum and old code/examples would still work. Compatibility would be maintained (even including where those magic strings are passed through to underlying libraries, based on my experience with rust/pyo3), and everyone benefits from the advantages of enums.
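To make the compatibility claim concrete, here is a minimal sketch. FileType and its members are hypothetical names chosen for illustration; the only thing it relies on is mixing str into a plain enum.Enum. It deliberately avoids calling str() on members, which returns the qualified name (FileType.FILE) unless __str__ is overridden.

```python
import enum


class FileType(str, enum.Enum):  # hypothetical enum, for illustration only
    FILE = "file"
    FOLDER = "folder"


# Members are real str instances, so old string-based code keeps working.
is_str = isinstance(FileType.FILE, str)

# Equality with the old magic string holds in both directions.
equal_both_ways = (FileType.FILE == "file") and ("file" == FileType.FILE)

# Ordinary str operations see the underlying value.
joined = "path/" + FileType.FILE
shouted = FileType.FOLDER.upper()

# Old string values can still be mapped back to the enum member.
looked_up = FileType("folder")
```

New code and documentation can refer to FileType.FILE, while callers who still pass "file" get the same behaviour.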

Such a strategy is already possible. Several downstream libraries include their own implementation of StrEnum. The code required is very small, which means many would prefer not to include another dependency for it. The result is dozens of different implementations with slightly different features and inconsistencies, and many people wasting a lot of time and space. The code being so small also means that very little API surface or maintenance overhead would be added to the standard library.

Here is the example I tend to use (originally a fork of the unmaintained StrEnum):

import enum

class StrEnum(str, enum.Enum):
    def __new__(cls, *args):
        for arg in args:
            if not isinstance(arg, (str, enum.auto)):
                raise TypeError(
                    "Values of StrEnums must be strings: {} is a {}".format(
                        repr(arg), type(arg)
                    )
                )
        return super().__new__(cls, *args)

    def __str__(self):
        return self.value

    # The first argument to this function is documented to be the name of the
    # enum member, not `self`:
    # https://docs.python.org/3.6/library/enum.html#using-automatic-values
    def _generate_next_value_(name, *_):
        return name

I believe it to be more complete than StrEnum and fastapi_utils.enums.StrEnum, and more standard and lower maintenance than AnyStrEnum.
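As a usage sketch of the auto() behaviour in the class above: the base class below is a trimmed-down variant (no __new__ check) with the hypothetical name AutoNameStrEnum, and Color is a made-up example enum. It only aims to show that _generate_next_value_ makes each auto() member take its own name as its value.

```python
import enum


class AutoNameStrEnum(str, enum.Enum):  # hypothetical, trimmed-down base class
    # Called by the enum machinery for each auto(); the first argument is
    # the member name, not `self`.
    def _generate_next_value_(name, start, count, last_values):
        return name

    def __str__(self):
        return self.value


class Color(AutoNameStrEnum):  # made-up example enum
    RED = enum.auto()
    GREEN = enum.auto()


red_value = Color.RED.value
green_text = str(Color.GREEN)
matches_plain_string = Color.RED == "RED"
```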

tl;dr Python already accepts that enums are worthwhile, but lots of people use magic string constants where they should use enums; a built-in StrEnum would provide a consistent and easy upgrade path from which everyone benefits.

This is an interesting idea. Can you give an example where *args is more than a single str?

I inherited that from the StrEnum package - I think you’re right, it doesn’t need that. Strictly speaking, the whole __new__ override isn’t needed.

IntEnum silently casts the value to an integer, only throwing an error for a value which can’t be cast. This isn’t too surprising for the case of an integer, where people have e.g. numpy number types, float/int confusion, etc.: not to mention, if you create an IntEnum using a string, you probably meant that string as a number - basically, there’s a limited number of things which can be turned into integers, and most of them are pretty integer-like. On the other hand, everything can be turned into a string, even things which aren’t very string-y, so we might prefer to explicitly check that the passed arguments are strings (and throw the associated error) rather than doing any implicit casting.
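A small sketch of the difference being described. Number, Loose, CheckedStrEnum and Strict are hypothetical names; CheckedStrEnum is a single-argument simplification of the __new__ check from the class earlier in the thread, not a reference implementation.

```python
import enum


class Number(enum.IntEnum):
    ONE = "1"  # accepted: the value is cast with int("1")


class Loose(str, enum.Enum):
    ANSWER = 42  # also accepted: the member silently becomes the string "42"


class CheckedStrEnum(str, enum.Enum):
    def __new__(cls, value):
        # Refuse anything that is not already a str instead of casting it.
        if not isinstance(value, str):
            raise TypeError(
                "Values of StrEnums must be strings: {!r} is a {}".format(
                    value, type(value)
                )
            )
        member = super().__new__(cls, value)
        member._value_ = value
        return member


try:
    class Strict(CheckedStrEnum):
        ANSWER = 42  # rejected while the class body is being processed
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

With the explicit check, the mistake surfaces when the enum is defined rather than later, when "42" turns up somewhere a real string was expected.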

Very good points.

FWIW, when I upgrade from magic constants to Enum, this is how I often do it to maintain backward compatibility:

# old code
FILE = "file"
FOLDER = "folder"


def foo(ft):
    if ft == FILE:
        print("it is a file")
    elif ft == FOLDER:
        print("it is a folder")
    else:
        raise TypeError(f"Invalid file type: {ft}")

# usage
foo(FILE)
foo(FOLDER)
foo("invalid")

Here’s the output:

it is a file
it is a folder
Traceback (most recent call last):
  File "foo.py", line 22, in <module>
    foo("invalid")
  File "foo.py", line 17, in foo
    raise TypeError(f"Invalid file type: {ft}")
TypeError: Invalid file type: invalid

# new code
from enum import Enum

class FileType(Enum):
    file = "file"
    folder = "folder"

FILE = FileType.file
FOLDER = FileType.folder

def foo(ft):
    ft = FileType(ft)
    if ft == FILE:
        print("it is a file")
    elif ft == FOLDER:
        print("it is a folder")
    else:
        raise TypeError(f"Invalid file type: {ft}")  # never reached anymore

# usage
foo(FILE)
foo(FOLDER)
foo("invalid")

Here’s the output:

it is a file
it is a folder
ValueError: 'invalid' is not a valid FileType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "foo.py", line 45, in <module>
    foo("invalid")
  File "foo.py", line 34, in foo
    ft = FileType(ft)
  File "C:\Users\Bruno\AppData\Local\Programs\Python\Python37\lib\enum.py", line 310, in __call__
    return cls.__new__(cls, value)
  File "C:\Users\Bruno\AppData\Local\Programs\Python\Python37\lib\enum.py", line 564, in __new__
    raise exc
  File "C:\Users\Bruno\AppData\Local\Programs\Python\Python37\lib\enum.py", line 548, in __new__
    result = cls._missing_(value)
  File "C:\Users\Bruno\AppData\Local\Programs\Python\Python37\lib\enum.py", line 577, in _missing_
    raise ValueError("%r is not a valid %s" % (value, cls.__name__))
ValueError: 'invalid' is not a valid FileType

Not sure if it covers all your use cases though.

Cheers,

Yes, that’s one way of doing it. IMO it’s not ideal because it requires the cast to FileType everywhere that the enum is passed into a function, potentially adds to or changes the error types thrown by the function, and changes the types of possibly public constants.

The upgrade path isn’t impossible today, it would just be smoother and more consistent with a built-in StrEnum (in my opinion).
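For comparison, here is a rough sketch of the same example with a str-mixin enum, reusing the FileType, FILE, FOLDER and foo names from the post above. The one deliberate change is that foo returns its message instead of printing it, purely so the result is easy to check.

```python
import enum


class FileType(str, enum.Enum):
    file = "file"
    folder = "folder"


# The public constants keep their names and are still str instances.
FILE = FileType.file
FOLDER = FileType.folder


def foo(ft):
    # No FileType(ft) cast: enum members and plain strings both compare fine.
    if ft == FILE:
        return "it is a file"
    elif ft == FOLDER:
        return "it is a folder"
    else:
        # Same exception type as the old code, so callers see no change.
        raise TypeError("Invalid file type: " + ft)


from_enum = foo(FILE)
from_string = foo("folder")
```

The function body is untouched apart from the return statements, and an invalid string still raises the original TypeError rather than a ValueError from the enum lookup.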

Definitely. I just wanted to leave that example because I think it is the cleaner option from the user's point of view; it is less so for the developer because, as you say, every function that deals with the constants now needs updating.

This of course does not invalidate your proposal, it is just another workaround. :+1: