Understanding Pattern Matching side effects

I’m in the process of trying to reimplement structural pattern matching for Cython. I’m just trying to understand some of the implications of the specification.

Let’s suppose we have a sequence pattern. As I understand it the implementation can look up the length once, and cache it:

  • “The length of the subject sequence is obtained using the builtin len() function (i.e., via the __len__ protocol). However, the interpreter may cache this value in a similar manner as described for value patterns.”
  • The description for value patterns says: “the interpreter may cache the first value found and reuse it, rather than repeat the same lookup. (To clarify, this cache is strictly tied to a given execution of a given match statement.)” Note that it says “match statement” rather than “case block”, so the cache may span several cases.

However, guards are explicitly allowed to have side effects.

Therefore, if we have code like this:

def f(x):
    match x:
        case [1, _] if (hasattr(x, "pop") and x.pop()):
            return 1
        case [1, _]:
            return 2
        case _:
            return 3

f([1, 0])

What do we consider an acceptable outcome? I can see three options:

  1. return 3 - the second case fails because the length is no longer 2. This is what CPython currently appears to do.
  2. IndexError on case 2. The interpreter uses the cached length and then fails to read the second element.
  3. Hard crash on case 2. The interpreter uses the cached length, and directly accesses the memory of the second element.
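To make options 1 and 2 concrete, here is a hand-desugared sketch of the match statement in plain Python. It is not how CPython or Cython actually compile the statement: the function names are hypothetical, the subject is assumed to be a list (so the sequence type check is omitted), and the cached variant assumes the implementation reads the element behind the wildcard, which a real implementation may well skip.

```python
def guard(x):
    # Same guard as in the example: pops the last element and returns it.
    return hasattr(x, "pop") and x.pop()


def match_fresh_len(x):
    # Option 1: len() is called again for every case.
    if len(x) == 2 and x[0] == 1 and guard(x):
        return 1
    if len(x) == 2 and x[0] == 1:
        return 2
    return 3


def match_cached_len(x):
    # Option 2: len() is called once and reused for every case.
    n = len(x)
    if n == 2 and x[0] == 1:
        _ = x[1]  # assumption: the wildcard element is still read
        if guard(x):
            return 1
    if n == 2 and x[0] == 1:
        _ = x[1]  # the guard shrank the list, so this raises IndexError
        return 2
    return 3


print(match_fresh_len([1, 0]))  # 3
try:
    match_cached_len([1, 0])
except IndexError as exc:
    print("IndexError:", exc)
```

Option 3 corresponds to the second `x[1]` read being done without a bounds check, which is the case a C-level implementation has to guard against.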

Option 3 (a hard crash or segfault) is never an acceptable option. That would be an interpreter bug.

Options 1 and 2 would be acceptable. You should not rely on either behaviour.

When you say that guard clauses are “explicitly” allowed to have side-effects, can you link to the specific documentation you are referring to please?

Options 1 and 2 would be acceptable. You should not rely on either behaviour.

I’m happy with that. I definitely wasn’t planning to rely on it myself.

When you say that guard clauses are “explicitly” allowed to have side-effects, can you link to the specific documentation you are referring to please?

I’m mostly working from the specification PEP, PEP 634 (Structural Pattern Matching: Specification, on peps.python.org), which says:

“Since guards are expressions they are allowed to have side effects.”

FWIW I came to the conclusion that “return 2” would also probably be an acceptable outcome - it’d be OK to determine that the pattern matches before considering the guard on the previous case.
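A rough sketch of how “return 2” could come about, assuming a hypothetical implementation that notices both cases share the pattern [1, _] and tests it once before running any guard. The function names are made up for illustration, and the subject is assumed to be a list.

```python
def guard(x):
    # Same guard as in the example: pops the last element and returns it.
    return hasattr(x, "pop") and x.pop()


def match_patterns_first(x):
    # The shared pattern [1, _] is tested once, before any guard runs.
    matches = len(x) == 2 and x[0] == 1
    if matches and guard(x):
        return 1
    if matches:
        # The earlier pattern result is reused even though the guard
        # has since shortened the list.
        return 2
    return 3


print(match_patterns_first([1, 0]))  # 2
```

No element is read after the guard runs here, so there is no IndexError and no out-of-bounds access, only a result based on the subject’s earlier state.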