Pattern matching optimization. Comparison of values specified through "|"

Without further ado, let’s take a look at an example:

import dis, timeit

a = 'lctrl'


def match_case_ors():
    match a:
        case 'ctrl' | 'lctrl' | 'rctrl': ...
        case _: ...


def if_contains():
    if a in {'ctrl', 'lctrl', 'rctrl'}: ...
    else: ...


dis.dis(match_case_ors)
print('-' * 84)
dis.dis(if_contains)

match_case = timeit.timeit(match_case_ors)
if_stmt = timeit.timeit(if_contains)

print('\n'
      f'{match_case = :g}\n'
      f'{if_stmt = :g}\n'
      f'speedup {1. - if_stmt / match_case:.1%}'
      )

We can see that if is ~15-30% (and higher) faster. And the more elements, the faster if works; while the pattern matching gets slower and slower.

The solution is simple. Instead of

  9          14 COPY                     1
             16 LOAD_CONST               1 ('ctrl')
             18 COMPARE_OP               2 (==)
             24 POP_JUMP_IF_FALSE        3 (to 32)
             26 POP_TOP

 10          28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

  9     >>   32 COPY                     1
             34 LOAD_CONST               2 ('lctrl')
             36 COMPARE_OP               2 (==)
             42 POP_JUMP_IF_FALSE        3 (to 50)
             44 POP_TOP

 10          46 LOAD_CONST               0 (None)
             48 RETURN_VALUE

  9     >>   50 COPY                     1
             52 LOAD_CONST               3 ('rctrl')
             54 COMPARE_OP               2 (==)
             60 POP_JUMP_IF_FALSE        3 (to 68)
             62 POP_TOP

 10          64 LOAD_CONST               0 (None)
             66 RETURN_VALUE

Generate bytecode like:

 16           2 LOAD_GLOBAL              0 (a)
             14 LOAD_CONST               1 (frozenset({'ctrl', 'lctrl', 'rctrl'}))
             16 CONTAINS_OP              0
             18 POP_JUMP_IF_FALSE        2 (to 24)

 17          20 LOAD_CONST               0 (None)
             22 RETURN_VALUE

In such cases, it is the use of a set or a frozen set that will give such an increase, otherwise it is not significant.

And since the behavior of the set in this case is predictable and does exactly what we need, it seems to me a good idea to implicitly create a set in such pattern matching.

If someone does not like this option, there is a less optimized, but quite obvious solution.
Instead of POP_JUMP_IF_FALSE, use the opposite POP_JUMP_IF_TRUE, just like lazy or does.

 16           2 LOAD_GLOBAL              0 (a)
             14 LOAD_CONST               1 ('ctrl')
             16 COMPARE_OP               2 (==)
             22 POP_JUMP_IF_TRUE        22 (to 68)
             24 LOAD_GLOBAL              0 (a)
             36 LOAD_CONST               2 ('lctrl')
             38 COMPARE_OP               2 (==)
             44 POP_JUMP_IF_TRUE        11 (to 68)
             46 LOAD_GLOBAL              0 (a)
             58 LOAD_CONST               3 ('rctrl')
             60 COMPARE_OP               2 (==)
             66 POP_JUMP_IF_FALSE        2 (to 72)

 18     >>   68 LOAD_CONST               0 (None)
             70 RETURN_VALUE

Otherwise, using if in such cases is more performant, but not as elegant as pattern matching.

This is my first thread, so I may have misunderstood something. Also, my native language is Russian, not English.

Thanks.

I haven’t got a ton of experience with pattern matching but from what I’ve seen people do with it, and then note that it’s slower, that it’s being used incorrectly… Or at the very least compared incorrectly.

I’m on mobile and cannot check but what’s the generated byte code for this

def if_any_equals():
     if any(a == x for x in ('ctrl', 'lctrl', 'rctrl')):
         ...
     else:
         ...

I think that’s a more valid comparison as you alluded to with your notes about using containment checks in a set

Is there a way to check, without a lot of overhead, that all elements of | are hashable to simply place them in a set and do a containment check for the match-case?

 15           0 RESUME                   0

 16           2 LOAD_GLOBAL              1 (NULL + any)
             14 LOAD_CONST               1 (<code object <genexpr> at 0x0000016F5FD45890, file "D:\Documents\Projects\PycharmProjects\newvertest\main.py", line 16>)
             16 MAKE_FUNCTION            0
             18 LOAD_CONST               2 (('ctrl', 'lctrl', 'rctrl'))
             20 GET_ITER
             22 CALL                     0
             32 CALL                     1
             42 POP_JUMP_IF_FALSE        2 (to 48)

 17          44 LOAD_CONST               0 (None)
             46 RETURN_VALUE

 19     >>   48 LOAD_CONST               0 (None)
             50 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x0000016F5FD45890, file "D:\Documents\Projects\PycharmProjects\newvertest\main.py", line 16>:
 16           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0
              6 LOAD_FAST                0 (.0)
        >>    8 FOR_ITER                15 (to 42)
             12 STORE_FAST               1 (x)
             14 LOAD_GLOBAL              0 (a)
             26 LOAD_FAST                1 (x)
             28 COMPARE_OP               2 (==)
             34 YIELD_VALUE              2
             36 RESUME                   1
             38 POP_TOP
             40 JUMP_BACKWARD           17 (to 8)
        >>   42 LOAD_CONST               0 (None)
             44 RETURN_VALUE

match_case = 0.0502488
if_stmt = 0.378934
speedup -654.1%

As you can see, this implementation slowed down the code by more than 6.5 times, it creates an additional generator object and a call to the built-in function, so this is definitely not what suits us, any is used in other cases (when you have for example a large array of values to be checked, this is definitely not the case).

By the way, you can. Using PyDroid or Termux (mini linux)
dis module for bytecode.

Ok so the generator adds significant overhead. Then the closest if statement equivalent might be

def if_else_eq_or():
    if a == 'ctrl' or a == 'lctrl' or a == 'rctrl':
        ...
    else:
        ...

Which has a lot less flexible than any() but it closer to the match case

Hi @nikdissv-Forever! Thanks taking the time to look into improving match performance.

This is definitely an interesting idea, and we have considered it. Unfortunately, it’s not equivalent to the current behavior: we currently don’t require hashability for matching against value patterns. Checking for containment in a set on the other hand, will raise a TypeError if the subject is unhashable, and also causes problems for subjects with custom __eq__ implementations. This is the same reason why we don’t turn if a in ('ctrl', 'lctrl', 'rctrl'): ... into a frozenset either.

It’s certainly possible to consider adding a “fast path” for subjects that are instances of “native” Python types like int or str. However, that adds more conditional branches, complexity, and code bloat, and might not result in a significant improvement for common cases (especially on newer Python versions, which already contain bytecode-level optimizations for things like conditional branches based on string equality, which your example features).

Just a note: the code you provide here isn’t equivalent either (and it’s probably slower). You’re reloading the global name a for each sub-pattern.

However the bytecode compiler’s decision to inline the return block in the original code is sort of interesting. I’d like to take a look into why the control flow is a bit unusual here.

If the set is replaced with a tuple, the performance improvement weakens (and becomes a bit chaotic), but is still there. This is true even when matching the last element provided, and when failing to match.

It would, however, be a small change in behaviour: instead of just doing an equality check, it would do an identity-or-equality check. Consider:

>>> class Const:
...     nan = float("nan")
... 
>>> match Const.nan:
...     case 'foo' | 'bar' | Const.nan: print("Gotcha")
...     case _: print("Nope")
... 
Nope
>>> Const.nan in ('foo', 'bar', Const.nan)
True

So if a fast path were added, it would have to exclude floats for this reason (unless it’s decided that that change is acceptable, but that’s not just a performance question any more). So I guess the question is: how common is it to match against multiple strings/ints, and would it be worth optimizing this case at the cost of complexity?

I’ll tell you specifically about myself. I needed this approach in one project, but due to performance, I preferred to write it using ifs.


I perfectly understood the problems of the method I proposed
Indeed, the check for the hashability of the object in this case will be superfluous and we most likely will not predict cases when this is the case, this is not optimal.

Later I will experiment with different cases, see which jumps the compiler generates.
So far, it seems to me that the match case does not use lazy optimization at all and always checks all values. Perhaps of course this is not always the case.


Nevertheless, this cannot be left without optimization, because the use of ifs where the matching pattern could be is not preferable.
Does anyone have any ideas?

I managed to get it to do it lazily. All it took was to add a couple of branches. To be fair, what the compiler generated works well this time.

def match_case_ors(key='alt'):  # I decided to transfer it to local variables, it did not give a visible effect.
    match key:
        case ('ctrl' | 'lctrl' | 'rctrl'):
            result = 'control'
        case ('alt' | 'lalt' | 'ralt'):
            result = 'alt'
        case _:
            result = 'default'
    return result

If we rewrite this in ifs:

def if_contains(key='ctrll'):
    # if key in {'ctrl', 'lctrl', 'rctrl'}:
    if key == 'ctrl' or key == 'lctrl' or key == 'rctrl':
        result = 'control'
    # elif key in {'alt', 'lalt', 'ralt'}:
    elif key == 'alt' or key == 'lalt' or key == 'ralt':
        result = 'alt'
    else:
        result = 'default'
    return result

It will be the fastest by 5-10%, this is not such a big difference at all to optimize something in a match case with this.


 # if_contains
 18           2 LOAD_FAST                0 (key)
              4 LOAD_CONST               1 ('ctrl')
              6 COMPARE_OP               2 (==)
             12 POP_JUMP_IF_TRUE        12 (to 38)
             14 LOAD_FAST                0 (key)
 ...
 ...
 19     >>   38 LOAD_CONST               4 ('control')
             40 STORE_FAST               1 (result)

 25          42 LOAD_FAST                1 (result)
             44 RETURN_VALUE
 ...

At the same time, the order of jumps in the match case is slightly different.

 ...
              6 COPY                     1
              8 LOAD_CONST               1 ('ctrl')
             10 COMPARE_OP               2 (==)
             16 POP_JUMP_IF_FALSE        1 (to 20)
             18 JUMP_FORWARD            16 (to 52)
        >>   20 COPY                     1
 ...

As you can see, the match does POP_JUMP_IF_FALSE, and if true then JUMP_FORWARD.
That is, it ends up jumping anyway, whereas if does one jump if true (instead of JUMP_FORWARD).

this gives it a slight difference in speed, as I said, this behavior is quite acceptable, but as you can see it is still not ideal. If someone finds time to fix it, then it’s good, if not, then it’s okay.