`re.match()`: raise exception if string doesn't match?

Is there any sense in providing a version of re.match() that raises an exception, rather than returning None, when the string doesn’t match the pattern? It’s one of the few places where I find myself repeatedly writing if match is None: rather than except ValueError: or similar. Thanks!

Is there any use-case where such an API would be cleaner? It seems to me that in many applications, matches and mismatches are equally possible, so the if-test feels cleaner to me than the try/except.

I’m guessing the use case would be a situation where you’re already sure that the input matches the pattern, and you just want to extract one or more substrings decided by the pattern; or where if the input doesn’t match you want to reject the input (like the OP suggested).

What makes the API design a bit problematic is that we’d have to fix which exception to raise (ValueError? Something specific to the re module?) or we’d have to specify the exception explicitly in the call, which would make it pretty cumbersome.

If you feel this need regularly you should probably just break down and put a little helper in your personal toolbox.
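A minimal sketch of such a toolbox helper, assuming the caller may pass the exception type explicitly. The name `match_or_raise` and its `exc` parameter are hypothetical, not part of the `re` module. It covers both use cases above (extracting substrings from input you expect to match, and rejecting bad input), and it also shows how an explicit exception argument makes every call a bit longer:

```python
import re


def match_or_raise(pattern, string, flags=0, *, exc=ValueError):
    """Like re.match(), but raise `exc` instead of returning None.

    Hypothetical helper: `exc` lets the caller choose the exception type.
    """
    m = re.match(pattern, string, flags)
    if m is None:
        raise exc(f"{string!r} does not match {pattern!r}")
    return m


# Input we already expect to match: just pull out the substrings.
year, month = match_or_raise(r"(\d{4})-(\d{2})", "2024-05-17").groups()

# Rejecting bad input with a caller-chosen exception type.
try:
    match_or_raise(r"\d+", "abc", exc=LookupError)
except LookupError as e:
    print(e)
```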

I did a fair amount of text file processing (think log file scanning) in my career (which is now in the rear view mirror). I think regular expression matching is more akin to regular string searching than to, say, trapping division by zero. I wouldn’t want to see an exception raised every time a line in a log file didn’t contain the string “WARNING”. Similarly, failure to locate a particular regular expression pattern in a line isn’t much different than that. In modern Python parlance, I think you’d use the walrus operator to capture the re.match result:

if (match := re.match(...)) is not None:
    ...  # process match
else:
    ...  # maybe some non-matching logic

(Apologies for mistakes above. The walrus is new to me, I’m tapping this on my phone and don’t have my usual Emacs syntax highlighting crutch or a REPL handy.)
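A runnable version of that log-scanning loop (Python 3.8+ for `:=`). The assignment expression is parenthesised so that `match` is bound to the match object rather than to the result of `is not None`. The sample lines and the pattern are made up for illustration:

```python
import re

# Made-up log lines for illustration.
lines = [
    "2024-05-17 10:00:01 INFO service started",
    "2024-05-17 10:00:02 WARNING disk 91% full",
    "2024-05-17 10:00:03 INFO heartbeat",
]

warning_messages = []
skipped = 0
for line in lines:
    # The parentheses matter: without them, `match` would be bound to
    # the bool produced by `re.match(...) is not None`.
    if (match := re.match(r"\S+ \S+ WARNING (.*)", line)) is not None:
        warning_messages.append(match.group(1))
    else:
        skipped += 1  # a non-match is ordinary, not exceptional

print(warning_messages, skipped)
```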

A lot of my Python usage tends to be noddy log file parsers, scripts involving shutil, etc. The sort of things you might use Perl for in years gone by.

My thinking here is that Python generally raises an exception when an operation fails, whereas other languages (Lua, JS, Groovy) return null / nil / whatever.

E.g. for finding an item in a list there’s list.index(), which raises ValueError if the item can’t be found, rather than returning None or some sentinel value. str objects have both index() and find() – the latter returns -1 on failure – but I usually see str.index() recommended to beginners to get them into the idea of using exceptions for control flow, and because there’s no equivalent list.find() method.
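For reference, this is how those lookups behave when the item is missing (the values are arbitrary):

```python
items = ["a", "b", "c"]
text = "hello"

# str.find() signals failure with a sentinel value...
print(text.find("z"))  # -1

# ...while str.index() and list.index() raise ValueError.
for lookup in (lambda: text.index("z"), lambda: items.index("z")):
    try:
        lookup()
    except ValueError as e:
        print("ValueError:", e)

# There is no list.find() at all.
print(hasattr(items, "find"))  # False
```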

But with re.match I often just want an exception – any exception – to be triggered when a match fails. Without a None check, that exception is usually AttributeError: 'NoneType' object has no attribute 'groups', which is more cryptic than necessary.
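To reproduce that failure mode, chain `.groups()` directly onto a match that fails (the pattern and input here are arbitrary):

```python
import re

try:
    # re.match() returns None here, so .groups() is looked up on None.
    re.match(r"(\d+)-(\d+)", "no digits here").groups()
except AttributeError as e:
    message = str(e)
    print(message)
```

The error names `NoneType` and `groups` but says nothing about the pattern or the string that failed to match.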

If I had to guess at an exception type:

# re.py

class MatchError(ValueError):
    pass

(the string value passed to match() is the “value” referred to by ValueError)
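A sketch of how such an exception might be used, assuming `MatchError` is defined in user code (the `re` module has no such class; `parse_version` is likewise a made-up example). Because it subclasses `ValueError`, existing `except ValueError:` handlers would still catch it:

```python
import re


class MatchError(ValueError):
    """Hypothetical: raised when a string does not match a pattern."""


def parse_version(text):
    m = re.match(r"(\d+)\.(\d+)", text)
    if m is None:
        # The non-matching string is the "value" in ValueError.
        raise MatchError(f"not a version string: {text!r}")
    return int(m.group(1)), int(m.group(2))


print(parse_version("3.12"))

try:
    parse_version("latest")
except ValueError as e:  # also catches MatchError
    print(type(e).__name__, e)
```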

No need to provide a new version of re.match() and other functions. It is easy to make the code raise an exception:

m = re.match(...) or notmatch()

where notmatch() is defined as:

def notmatch():
    raise ValueError

It is literally a one-line function. Depending on your application, it can raise different types of exceptions with specific messages, take arguments, etc. Since it is so simple and application-specific, it is not worth adding such a function to the stdlib either.
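A slightly extended sketch of the same `or` idiom, where `notmatch()` takes a message (the message argument and `parse_pair` are illustrative additions). It relies on match objects always being truthy, so `notmatch()` is only called when `re.match()` returns `None`:

```python
import re


def notmatch(message="string did not match"):
    # Never returns; it exists only to raise from inside an expression.
    raise ValueError(message)


def parse_pair(line):
    m = re.match(r"(\w+)=(\w+)$", line) or notmatch(f"bad line: {line!r}")
    return m.group(1), m.group(2)


print(parse_pair("key=value"))

try:
    parse_pair("no equals sign")
except ValueError as e:
    print(e)
```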

To me that seems perfectly natural in C, but not in Python. In Python I don’t normally need to check the return value to see if an operation succeeded.

I get your logic, but why not write your own wrapper?

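import re

class MatchError(ValueError):
    """Raised when a string does not match the pattern."""

def rematch(regex, string):
    matched = re.match(regex, string)
    if matched is None:
        raise MatchError(regex, string)
    return matched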

Of course, it could also be added as a keyword argument:

re.match(myregex, mystring, throw=True)