Add Structural Pattern Matching to Exception objects

By default, the BaseException object in Python defines an args attribute which is assigned the tuple of the arguments given to Exception objects at instantiation:

>>> err = ValueError("error msg")
>>> err.args
('error msg',)

It doesn’t seem unreasonable to be able to pattern match against that attribute using a match statement. This currently isn’t an option, as Exceptions (to my knowledge) don’t define a __match_args__ class attribute for the underlying infrastructure to hook into.

I’d like to propose adding a __match_args__ = ("args", ) class attribute to the BaseException object, to allow for structural pattern matching of Exception objects.

There are probably multiple ways to do this, but as a shoddy proof-of-concept I went ahead and cloned the cpython repo and added a getter and a setter to the BaseException_getset table under the name __match_args__, where the getter returns an ("args", ) tuple and setter returns -1. Then in ceval.c under the match_class function, there’s a hook that calls the getter function if the __match_args__ attribute is a getset_descriptor object. That function call replaces the match_args value with the returned tuple, and everything there on "just works"™.

The result allows you to use structural pattern matching on returned error objects:

# ./script.py

def fn(x):
    if x > 10:
        return ValueError("x is way too big", 10)
    if x > 5:
        return ValueError("x is too big", 5)
    return x

match fn(11):
    case ValueError((msg, 10)):
        print(f"error 1: {msg}")
    case ValueError((msg, 5)):
        print(f"error 2: {msg}")
    case _:
        print("PASS")
root@7556897c20fe:/code# ./python script.py
error 1: x is way too big

I'm wondering whether there is any interest in adding this type of functionality to the language. Errors as values is a concept well received in other languages such as Rust and Go; it has the potential to benefit static type analysis; and adding __match_args__ functionality to Exceptions would allow developers to elegantly integrate this paradigm into Python's mechanisms for structural pattern matching.



I did some more hacking* in the CPython interpreter’s match_class function. It’s also possible to rig it so that the enclosing tuple isn’t required and the case pattern mirrors how the object is constructed:

results:

def fn(x):
    if x > 10:
        return ValueError("x is way too big", 10)
    if x > 5:
        return ValueError("x is too big", 5)
    return x

match fn(6):
    case ValueError(msg, 10):
        print(f"error 1: {msg}")
    case ValueError(msg, 5):
        print(f"error 2: {msg}")
    case _:
        print("PASS")
root@7556897c20fe:/code# ./python script.py
error 2: x is too big

Another option might be to add a MATCH_EXCEPTION handler like there is for MATCH_CLASS, but that’s beyond my abilities at present, and I would need some guidance on implementing it.

*Emphasis on hacking: it is probably leaking memory, and there are considerations to be made regarding subclasses overriding a default __match_args__ and how that would all work.

Wouldn’t this be good just for try-except? You would have one block for certain exception types, then do a match-case based on the message and arguments. I guess you could just access the message and do a literal match on that if you really wanted to.

Yes, the standard try/except paradigm is an option. However, there are some benefits to not using a two-track exception system: errors are handled locally and stay bound to the context of the stack frame that calls the function reporting the error, and the responsibility for handling the error is explicit in the function signature.

Consider the following:

def fn(x: int) -> int:
    if not x:
        raise ValueError("value falsy")
    return x + 1

if __name__ == "__main__":
    try:
        print(fn(1))
    except ValueError as err:
        print(f"handle error: {err.args[0]}")

This is the best you can do in this instance with the current state of static type analysis, as mypy does not introspect the potential for throwing an error, because it’s non-trivial to do, especially as you increase the complexity of the call stack.

However if you make it easier to localize error handling, you should be able to write your code in a type-safe manner.

def fn(x: int) -> int|ValueError:
    if not x:
        return ValueError("value falsy")
    return x + 1

if __name__ == "__main__":
    match fn(1):
        case int() as x:
            print(x)
        case ValueError(msg):
            print(f"handle error: {msg}")

And to note, while added type-safety is a main benefit, you also get the benefit of controlling how errors are localized, instead of having to worry about catching deeply nested errors which will propagate up the call stack without regard.

Is there anything stopping you from subclassing ValueError and creating a new, unique exception type for this? That’s what exception hierarchies shine at.

If you can’t because you don’t control the code that raises the exception, file a bug report against the library that’s raising it.

The problem with matching on exception text is that it is inherently fragile. ANY change to that text is now a backward compatibility problem - even if it would otherwise be a vast improvement. It also makes localization impossible, although I’m aware that that’s not a goal for many projects.

Sure you can, this is perfectly valid as of 3.10+:

class MyValueError(ValueError):
    __match_args__ = ("args", )

def fn(x: int) -> int|MyValueError:
    if not x:
        return MyValueError("x is falsy")
    return x + 1

match fn(0):
    case MyValueError((msg, )):
        print(f"handle error: {msg}")
    case int(x):
        print(x)

IMO it’s just not elegant. If you were to add proper SPM hooks for exceptions, it would be a lot cleaner. That isn’t really hard, because the .args attribute that you would SPM against is defined in the BaseException class. Furthermore, there would be no need to subclass every exception you want to match, and you could also eliminate the tuple that is currently required.

I’m not sure I follow the fragility argument. It’s just a variable assignment to a string type. It is as fragile as changing any other return value of a function. I don’t think it’s any more fragile than the above example, which is valid now.

That completely misses the benefits of subclassing though :slight_smile: Try this instead:

class FalsyValueError(ValueError): pass

def fn(x: int) -> int:
    if not x:
        raise FalsyValueError("x is falsy")
    return x + 1

if __name__ == "__main__":
    try:
        print(fn(1))
    except FalsyValueError as err:
        print(f"handle error: {err.args[0]}")

This is what Python’s exception handling shines at. You create types, not string values, to carry the information needed to decide whether to handle the error or not. At very worst, use machine-readable codes:

try: open(".......")
except OSError as e:
    if e.errno == ???: ...

rather than matching on precise strings. (Note that the most common OS errors have their own subclasses, like FileNotFoundError, IsADirectoryError, PermissionError, etc; but for all others, you can use the errno to distinguish them.)

If the string description for the error message changes - which has happened many times with Python’s core exceptions - your code is broken.


Structural pattern matching of exceptions would simplify a lot of exception usage if it could be combined with the except statement:

try:
    subprocess.run([myapp], check=True)
except case subprocess.CalledProcessError(returncode=(2 | 3) as ret):
    logging.warning(f"Data processing failed due to {EXIT_CODES[ret]}")
except subprocess.CalledProcessError:
    logging.error("Unexpected error in data processing")

I don’t think he’s talking about matching the text, but about adding the capability to match that an arg exists and to be able to “extract” it using pattern matching.


ISTM that you are trying to emulate Rust-style error handling in Python. Maybe you should take a look at the result package? I know it doesn’t help with your request, but maybe it’ll be useful to you. (I’ve never used it myself, although I’m perfectly happy with how error handling works in Python.)


Right, it’s about the distinction between return-ing a value and raise-ing it. Instead of creating a value that gets placed in a global store outside the call stack, you keep the error within the call stack so as to better control error flow. IMO you get most, if not all, of the benefits here just by streamlining how you would match against Exception objects (I don’t think you necessarily need to wrap the successful case in an Option or a Result or an Ok structure).

I was playing around with the CPython interpreter a little more over the weekend. Another option here would be to create a separate control flow in the match_class function in ceval.c that unpacks the arguments of a match case into the list used for SPM. That way you wouldn’t need to set a __match_args__ variable in the base class; you’d just need a separate control flow for Exception objects that assumes the args attribute packed at object construction is the list of positional arguments to match against.

I’ll be honest, the idea of returning exception values rather than raising them sounds very odd. Unless you’re trying to transplant some other language’s error handling model (such as Rust’s, which is what this proposal sounds like) then I don’t see why you’d be doing that.

And if you’re proposing that Python adds support in its exception classes for a model of exception handling that isn’t how Python does it, then I think you’re going to have a really hard time persuading anyone it’s a good idea…


I understand that it’s not the way people are normally taught to handle exceptions in Python, but even in languages like C++ you see major projects, like Apache Arrow, forgoing the try/catch syntax in favor of returning status objects and using wrappers to emulate errors as values, because they’ve decided that it’s a better way to handle error propagation in their code base.

IMO you get most of the way there simply by making it so the match statement can catch Exception objects in the way they’re typically constructed.

match ValueError("message", 1):
    case ValueError(str() as msg, 1):
        <handle error>

This feels very Pythonic, and it’s only a small change to how the match_class function works in ceval.c.

Not knowing anything about how Apache Arrow does it, I would assume that it’s a better thing to do in C++ because it’s statically typed, which means the compiler can do a bunch of optimization that would not be possible in Python (assuming you do it the Rust way with a Result type). Exception handling in Python is highly specialized to have little overhead in the ‘good’ case and I would wager that such optimizations are currently impossible to do with return-style exceptions.

As with any question of optimization, it’s impossible to know without measuring. But it’s worth noting that exception handling MAY be implemented with nonlocal goto, which completely eliminates the cost of not throwing an exception; but the return values must still be checked.

However, in my opinion, performance is the least significant factor when looking at exceptions and return values. It’s much more about whether the programmer (a) might forget to check something, or (b) is forced to load in unnecessary boilerplate.


I mean, it creates an Exception instance and sticks it in the PyThreadState struct and functions return NULL up the stack until it’s handled somewhere. If it’s just a question of shoving it in the global PyThreadState store vs making it the return value, I doubt there’s a perf. difference.

I actually think this is the best of both worlds because you only need to handle returned error values if the function actually returns it. See:

def fn(x: int) -> int|ValueError: ...

You know exactly what to handle just by looking at the function signature, esp. if you also know it’s not going to raise an exception from within its nested call stack. And if there’s no error value to return, you don’t need to change how you call the fn, so there’s no boilerplate required, because we just return values instead of some Result(value) or Option(value). And with errors as values, you don’t worry as much about forgetting to check whether a function call and its subroutines throw an Exception that you need to handle, esp. since you can’t use something like mypy to assist in finding those.

What I meant with ‘highly specialized’ is that a try-except block has very little overhead in the non-throwing case, due to specializations that I assume are specific to this style of exception raising. I don’t know how fast a match-case exception mechanism could be (and probably shouldn’t speculate too hard), but I doubt a general-purpose tool (match-case) can be as fast as a specialized tool (try-except).

Also, I think your suggestion would not make type checkers happy. Returning union types is very annoying to deal with as a caller, as you are now forced to check the return type to make the type checker happy. Currently, type checkers do not enforce using try-except for functions that could throw, which means you can just ignore exceptions from a typing perspective.

But now, any time you call fn(n), either you have to put a check to see if it returned ValueError, or you risk not noticing the error. That’s what I mean. You have this completely unnecessary boilerplate, or you risk silently ignoring errors.

Yes you’d need a subsequent match fn(n) if your return value is a union, which is the root of this discussion.

I doubt you’d risk silently ignoring errors, because you’d have to actively ignore the function signature, and even then, if you just haphazardly continue execution of val = fn(0) assuming it’s an int instead of a ValueError object, you’re going to throw other errors real quick when that duck doesn’t quack.

In the try/except paradigm, you’re exposed to the same type of risk, except the magnitude is much greater as you’re assuming the risk of the entire call stack under a function call, instead of the direct result of a function call. Errors as values mitigates this risk.