How to reduce Enum.Flag size?

Enum.Flag is very convenient and I used it successfully in a NumPy array, but I realized it took up too much space. To address that, I tried to create a Flag that inherits from np.uint8, but I ran into other problems. Let’s look at an example.

from enum import auto, Flag
import numpy as np

Let’s start with an ordinary Flag:

class Color(Flag):
    RED = auto()
    GREEN = auto()
    BLUE = auto()
color_array = np.array([Color.RED, Color.BLUE])
color_array
array([<Color.RED: 1>, <Color.BLUE: 4>], dtype=object)

The size of one item:

color_array.itemsize
8

The size of the whole array:

color_array.nbytes
16

Now with a flag that inherits from np.uint8:

class ColorNp(np.uint8, Flag):
    RED = auto()
    GREEN = auto()
    BLUE = auto()
colornp_array = np.array([ColorNp.RED, ColorNp.BLUE])
colornp_array
array([1, 4], dtype=uint8)

The size of one item:

colornp_array.itemsize
1

The size of the whole array:

colornp_array.nbytes
2

The size of the array has been greatly reduced! But now, the array is truly an array of np.uint8 and I lost the convenient representation of Flag. Let’s try to fake it:

np.array2string(colornp_array, formatter={"int_kind": lambda x: repr(ColorNp(x))}, separator=", ")
'[<ColorNp.RED: 1>, <ColorNp.BLUE: 4>]'

This simple example works, but if I try to combine colors, it fails:

colornp_array2 = np.array([ColorNp.RED, ColorNp.BLUE, ColorNp.RED|ColorNp.BLUE])
colornp_array2
array([1, 4, 5], dtype=uint8)
np.array2string(colornp_array2, formatter={"int_kind": lambda x: repr(ColorNp(x))}, separator=", ")
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

ValueError: 5 is not a valid ColorNp


During handling of the above exception, another exception occurred:


TypeError                                 Traceback (most recent call last)

Cell In[12], line 1
----> 1 np.array2string(colornp_array2, formatter={"int_kind": lambda x: repr(ColorNp(x))}, separator=", ")


File <__array_function__ internals>:200, in array2string(*args, **kwargs)


File ~/Projects/QTIS/Pandora/pandora2d-working/venv/lib/python3.8/site-packages/numpy/core/arrayprint.py:736, in array2string(a, max_line_width, precision, suppress_small, separator, prefix, style, formatter, threshold, edgeitems, sign, floatmode, suffix, legacy)
    733 if a.size == 0:
    734     return "[]"
--> 736 return _array2string(a, options, separator, prefix)


File ~/Projects/QTIS/Pandora/pandora2d-working/venv/lib/python3.8/site-packages/numpy/core/arrayprint.py:513, in _recursive_guard.<locals>.decorating_function.<locals>.wrapper(self, *args, **kwargs)
    511 repr_running.add(key)
    512 try:
--> 513     return f(self, *args, **kwargs)
    514 finally:
    515     repr_running.discard(key)


File ~/Projects/QTIS/Pandora/pandora2d-working/venv/lib/python3.8/site-packages/numpy/core/arrayprint.py:546, in _array2string(a, options, separator, prefix)
    543 # skip over array(
    544 next_line_prefix += " "*len(prefix)
--> 546 lst = _formatArray(a, format_function, options['linewidth'],
    547                    next_line_prefix, separator, options['edgeitems'],
    548                    summary_insert, options['legacy'])
    549 return lst


File ~/Projects/QTIS/Pandora/pandora2d-working/venv/lib/python3.8/site-packages/numpy/core/arrayprint.py:889, in _formatArray(a, format_function, line_width, next_line_prefix, separator, edge_items, summary_insert, legacy)
    885     return s
    887 try:
    888     # invoke the recursive part with an initial index and prefix
--> 889     return recurser(index=(),
    890                     hanging_indent=next_line_prefix,
    891                     curr_width=line_width)
    892 finally:
    893     # recursive closures have a cyclic reference to themselves, which
    894     # requires gc to collect (gh-10620). To avoid this problem, for
    895     # performance and PyPy friendliness, we break the cycle:
    896     recurser = None


File ~/Projects/QTIS/Pandora/pandora2d-working/venv/lib/python3.8/site-packages/numpy/core/arrayprint.py:853, in _formatArray.<locals>.recurser(index, hanging_indent, curr_width)
    850 if legacy <= 113:
    851     # width of the separator is not considered on 1.13
    852     elem_width = curr_width
--> 853 word = recurser(index + (-1,), next_hanging_indent, next_width)
    854 s, line = _extendLine_pretty(
    855     s, line, word, elem_width, hanging_indent, legacy)
    857 s += line


File ~/Projects/QTIS/Pandora/pandora2d-working/venv/lib/python3.8/site-packages/numpy/core/arrayprint.py:799, in _formatArray.<locals>.recurser(index, hanging_indent, curr_width)
    796 axes_left = a.ndim - axis
    798 if axes_left == 0:
--> 799     return format_function(a[index])
    801 # when recursing, add a space to align with the [ added, and reduce the
    802 # length of the line by 1
    803 next_hanging_indent = hanging_indent + ' '


Cell In[12], line 1, in <lambda>(x)
----> 1 np.array2string(colornp_array2, formatter={"int_kind": lambda x: repr(ColorNp(x))}, separator=", ")


File ~/.local/share/mise/installs/python/3.8.18/lib/python3.8/enum.py:339, in EnumMeta.__call__(cls, value, names, module, qualname, type, start)
    314 """
    315 Either returns an existing member, or creates a new enum class.
    316 
   (...)
    336 `type`, if set, will be mixed in as the first base class.
    337 """
    338 if names is None:  # simple value lookup
--> 339     return cls.__new__(cls, value)
    340 # otherwise, functional API: we're creating a new Enum type
    341 return cls._create_(
    342         value,
    343         names,
   (...)
    347         start=start,
    348         )


File ~/.local/share/mise/installs/python/3.8.18/lib/python3.8/enum.py:670, in Enum.__new__(cls, value)
    665             exc = TypeError(
    666                     'error in %s._missing_: returned %r instead of None or a valid member'
    667                     % (cls.__name__, result)
    668                     )
    669         exc.__context__ = ve_exc
--> 670         raise exc
    671 finally:
    672     # ensure all variables that could hold an exception are destroyed
    673     exc = None


File ~/.local/share/mise/installs/python/3.8.18/lib/python3.8/enum.py:653, in Enum.__new__(cls, value)
    651 try:
    652     exc = None
--> 653     result = cls._missing_(value)
    654 except Exception as e:
    655     exc = e


File ~/.local/share/mise/installs/python/3.8.18/lib/python3.8/enum.py:798, in Flag._missing_(cls, value)
    796 if value < 0:
    797     value = ~value
--> 798 possible_member = cls._create_pseudo_member_(value)
    799 if original_value < 0:
    800     possible_member = ~possible_member


File ~/.local/share/mise/installs/python/3.8.18/lib/python3.8/enum.py:815, in Flag._create_pseudo_member_(cls, value)
    813     raise ValueError("%r is not a valid %s" % (value, cls.__name__))
    814 # construct a singleton enum pseudo-member
--> 815 pseudo_member = object.__new__(cls)
    816 pseudo_member._name_ = None
    817 pseudo_member._value_ = value


TypeError: object.__new__(ColorNp) is not safe, use numpy.uint8.__new__()

This is the same error I get from calling ColorNp(5) directly.

But with a regular Flag it works:

Color(5)
<Color.BLUE|RED: 5>

Can someone explain why there is a TypeError and how I can solve it?

How do you expect ColorNp(ColorNp.RED|ColorNp.BLUE) to behave?

The Enum being worked with doesn’t have an option called Purple.

They’re bit flags.
ColorNp.RED|ColorNp.BLUE = 0b0101.
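To make the bit-flag point concrete, here is a small sketch using the plain Color Flag from the question (not the uint8 variant, which is the one that fails). OR-ing two members sets both bits, and the result still answers membership tests:

```python
from enum import Flag, auto


class Color(Flag):
    RED = auto()    # 0b001
    GREEN = auto()  # 0b010
    BLUE = auto()   # 0b100


# OR-ing two single-bit members gives a combined member with both bits set.
combo = Color.RED | Color.BLUE
print(bin(combo.value))      # 0b101
print(Color.RED in combo)    # True
print(Color.GREEN in combo)  # False
```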

Things go wrong because, in the class with NumPy ints mixed in, _flag_mask_, _singles_mask_ and _all_bits_ are all set to 0 rather than 7, so the enum machinery concludes that 5 is “too big”.

You get similar errors with Color(16).

I don’t know if it’s fixable.


I expect it to behave like a regular Flag:

Color(Color.RED|Color.BLUE)

gives:

<Color.BLUE|RED: 5>

So I expect to get:

<ColorNp.BLUE|RED: 5>

Oh I see. That’s pretty smart; in the docs it looked to me like they combined two lists of enum members. I didn’t realise that OR-ing multiple flags with | also gives members.

Anyway, it should still be possible to bump up the number of bits to however many you need, and map those to the combination of enum flags (and vice versa). The code’s just not quite as elegant.

I don’t know what _flag_mask_ etc. are in your answer; could you elaborate?

Color(16) does not give a similar error: it just says that 16 is not a valid Color.
Indeed, the maximum value I can get is 7:

<Color.RED: 1>
<Color.GREEN: 2>
<Color.GREEN|RED: 3>
<Color.BLUE: 4>
<Color.BLUE|RED: 5>
<Color.BLUE|GREEN: 6>
<Color.BLUE|GREEN|RED: 7>
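A listing like the one above can be produced by looking up every value from 1 to 7 on a plain Color Flag; a value with a bit that no member defines (such as 16) is rejected with a ValueError. This is a sketch; the exact repr text (for example the order of names inside it) may vary between Python versions:

```python
from enum import Flag, auto


class Color(Flag):
    RED = auto()
    GREEN = auto()
    BLUE = auto()


# Every value from 1 to 7 is some combination of the three single-bit members.
combinations = {v: Color(v) for v in range(1, 8)}
for member in combinations.values():
    print(repr(member))  # repr format may differ between Python versions

# 16 sets a bit (0b10000) that no member defines, so the lookup fails.
rejected = False
try:
    Color(16)
except ValueError as exc:
    rejected = True
    print(exc)
```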

There might be some dark magic possible by writing your own __new__, but I can’t figure it out at the moment.

A workaround is to use bare np.uint8 in the arrays and convert to a normal Flag class whenever you need a nice representation; you will have the space savings when you’re storing a large number of values, at the cost of conversions later. You don’t get any nice enum-like behavior when working with the array, of course.
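A minimal sketch of that workaround, assuming a plain (non-uint8) Color Flag: the array stores only the raw bit patterns, one byte each, and a hypothetical helper as_flags decodes them back into Color members when you want to look at them:

```python
from enum import Flag, auto

import numpy as np


class Color(Flag):
    RED = auto()
    GREEN = auto()
    BLUE = auto()


# Store only the raw bit patterns: one byte per element.
stored = np.array(
    [Color.RED.value, Color.BLUE.value, (Color.RED | Color.BLUE).value],
    dtype=np.uint8,
)


def as_flags(arr):
    """Hypothetical helper: decode a uint8 array into Color members for display."""
    # int() turns the NumPy scalar into a Python int before the Flag lookup.
    return [Color(int(v)) for v in arr]


print(stored.nbytes)  # 3
print(as_flags(stored))
```

The conversion is paid only when you decode, so bulk storage and arithmetic stay on the compact uint8 array.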

Hello,

have you considered that you may also need to take into account the total memory usage, which includes the class itself? As an alternative, have you considered using slots to lower the total memory requirements?

The following script compares your original class with an alternative slots-based class. It shows that the size of the flag attribute by itself is not the only thing that determines the total memory footprint.


import sys
from enum import auto, Flag

import numpy as np

class Flags:

    __slots__ = 'GREEN', 'BLUE', 'RED'

    def __init__(self):

        self.GREEN = 0
        self.BLUE = 0
        self.RED = 0

class ColorNp(np.uint8, Flag):

    RED = auto()
    GREEN = auto()
    BLUE = auto()

# Using original class using 'auto' and 'Flag'
print(f'\nSize of ColorNp class: {sys.getsizeof(ColorNp)} bytes')
colornp_array = np.array([ColorNp.RED, ColorNp.BLUE])
print(f'Size of colornp_array.nbytes: {colornp_array.nbytes}')
print(f'Size of colornp_array.itemsize: {colornp_array.itemsize}')

# Using slots
print('\nSize of Flags (slotted) class: ', sys.getsizeof(Flags()))
Color = Flags()
print(f'Size of Color instance: {sys.getsizeof(Color)} bytes')
color_array = np.array([Color.RED, Color.BLUE])
print(f'Size of color_array.nbytes: {color_array.nbytes} bytes')
print(f'Size of color_array.itemsize: {color_array.itemsize}')


Color.BLUE = 1
Color.GREEN = 0
Color.RED = 1

# Use flags via slots
print(Color.BLUE, Color.GREEN, Color.RED)

After running the script, I got the following output:

Original class used for flags:
Size of ColorNp class: 1712 bytes
Size of colornp_array.nbytes: 2
Size of colornp_array: 1

Size of Flags (slotted) class:  56
Size of Color instance (slotted): 56 bytes
Size of color_array.nbytes: 16 bytes
Size of color_array.itemsize: 8
1 0 1

Note the difference in memory footprint between the original class and the slots-based class.

Okay, now compare the size for an array of 10, 100, … 1000000 elements of each type. Does the size of the class still matter?
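A rough sketch of that comparison, assuming a plain Color Flag and filling every element with Color.RED (real data would mix members, but the per-element cost is the same). The object array stores one pointer per element, while the packed array stores one byte; the class itself is paid for once, however long the array is:

```python
from enum import Flag, auto

import numpy as np


class Color(Flag):
    RED = auto()
    GREEN = auto()
    BLUE = auto()


sizes = {}
for n in (10, 100, 1_000, 1_000_000):
    # Object array: each slot holds a pointer to the shared Color.RED member.
    obj = np.empty(n, dtype=object)
    obj.fill(Color.RED)
    # Packed array: each slot holds the raw bit pattern in a single byte.
    packed = np.full(n, Color.RED.value, dtype=np.uint8)
    sizes[n] = (obj.nbytes, packed.nbytes)
    print(f"{n:>9} elements: object={obj.nbytes:>9} B  uint8={packed.nbytes:>9} B")
```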

I am basing my feedback on the OP’s original post, where he stated that he had three flags, not “10, 100, … 1000000 elements of each type.”

FYI:

Using slots makes sense for up to about 76 flags.

Equation for original class:

Total bytes = 1712 + 3 * x 

Equation for slots based class:

Total bytes = 112 + 24 * x  

Above that value, the original class (or some other alternative) is preferable.
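Taking the two cost models above at face value (their constants come from the poster’s measurements and are not re-derived here), the crossover point can be checked with a few lines of Python:

```python
# Cost models quoted above: total bytes as a function of the number of flags x.
def original_cost(x):
    return 1712 + 3 * x


def slots_cost(x):
    return 112 + 24 * x


# The lines cross where 1712 + 3x == 112 + 24x, i.e. x == 1600 / 21.
break_even = (1712 - 112) / (24 - 3)
print(f"break-even at x = {break_even:.2f}")  # break-even at x = 76.19

# Largest whole number of flags for which the slots version is still smaller.
largest_slots_win = max(x for x in range(1, 200) if slots_cost(x) < original_cost(x))
print(largest_slots_win)  # 76
```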

Yes, three flags. But the array of values can be much larger than that! It can contain many instances of each flag (or combinations).

OP, I think you’re mixing things that shouldn’t be mixed.

When you put that Color enum into the NumPy array, NumPy stores pointers to the enum members (dtype=object), which is why each item takes 8 bytes on a 64-bit build.

What’s the use case for wanting to store the enums in a NumPy array? If you really need element-wise bit operations, NumPy has functions for that. You just are not going to get the pretty repr from the enum or the other benefits.

https://numpy.org/doc/2.1/reference/routines.bitwise.html
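A short sketch of those routines on a plain uint8 array. The RED/GREEN/BLUE uint8 constants are illustrative names taken from a regular Color Flag’s .value; only np.bitwise_and, np.bitwise_or and np.invert are used:

```python
from enum import Flag, auto

import numpy as np


class Color(Flag):
    RED = auto()
    GREEN = auto()
    BLUE = auto()


# Plain uint8 bit patterns taken from the Flag members' values.
RED, GREEN, BLUE = (np.uint8(c.value) for c in (Color.RED, Color.GREEN, Color.BLUE))
colors = np.array([RED, BLUE, RED | BLUE], dtype=np.uint8)

# Element-wise membership test: which elements have the RED bit set?
has_red = np.bitwise_and(colors, RED) != 0
# Element-wise union: add GREEN to every element.
with_green = np.bitwise_or(colors, GREEN)
# Element-wise removal: clear the BLUE bit everywhere.
without_blue = np.bitwise_and(colors, np.invert(BLUE))
```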

When I turn to NumPy for speed or memory reasons, I try to avoid putting Python objects in the arrays altogether.

With my NumPy maintainer hat on:

What’s really missing in NumPy is first-class support for categorical data; enums are a natural extension of that concept. Here a categorical is a dtype whose values are one of a fixed, limited set of possible values. Pandas has categoricals, but they’re built “on top” of NumPy, and IMO NumPy should have first-class categorical support.

It’s actually easier now than ever to add first-class categorical support to NumPy, thanks to the new dtype system in NumPy 2.0: Array API — NumPy v2.2 Manual

That said, I’m not aware of anyone working on categoricals at the moment, and for most things you can just use integer arrays and get the same effect with a little more boilerplate and no error checking for invalid values.
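The “integer arrays plus a little boilerplate” approach can look roughly like this. It is a sketch, not a NumPy feature: the categories array, the codes, and the check_codes helper are all illustrative, and the helper stands in for the validation NumPy won’t do for you:

```python
import numpy as np

# The categories live in a small lookup array; the data is just compact integer codes.
categories = np.array(["red", "green", "blue"])
codes = np.array([0, 2, 2, 1, 0], dtype=np.uint8)


def check_codes(codes, n_categories):
    """Hypothetical helper: reject codes that don't name a category."""
    bad = codes >= n_categories
    if bad.any():
        raise ValueError(f"invalid codes at positions {np.flatnonzero(bad).tolist()}")


check_codes(codes, len(categories))
labels = categories[codes]  # fancy indexing decodes the codes back to labels
```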


You can inspect the attributes of a (non-slot) class or object with

class C: pass

o = C()

print(o.__dict__)
print(C.__dict__)

When I inspected the dicts of your Color and ColorNp classes, the former contained '_flag_mask_': 7, '_singles_mask_': 7, '_all_bits_': 7, whilst the latter contained '_flag_mask_': 0, '_singles_mask_': 0, '_all_bits_': 0.

It seems likely that something goes wrong during class creation, either because numpy.uint8 and Flag share a method name or because numpy.uint8 is implemented in C, but I’m not an expert on those things.

You could try setting the attributes manually:

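from enum import auto, Flag
import numpy as np

class ColorNp(np.uint8, Flag):
    RED = auto()
    GREEN = auto()
    BLUE = auto()

# Overwrite the private masks that came out as 0 with all three bits set.
ColorNp._flag_mask_ = 0b0111
ColorNp._singles_mask_ = 0b0111
ColorNp._all_bits_ = 0b0111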

but that’s not guaranteed to work by any means.


What I’d probably do in your situation is create a mapping from uint8 to string (and one from uint8 to Color if you need the members), and just use a plain uint8 array. This can be as simple as

color_repr = [repr(Color(x)) for x in range(8)]  # one entry per 3-bit value, 0 to 7
uint8_array = np.array([1, 4, 5], dtype=np.uint8)
np.array2string(uint8_array, formatter={"int_kind": lambda x: color_repr[x]}, separator=", ")