Here’s an interesting regexp that came up recently:
r"\d+\s+"
What’s the big deal? Run it with .match() and it returns “almost instantly” even if the target string doesn’t match.
But run it with .search() on a string like "5" * N (which can’t succeed), and it takes time quadratic in N to fail. N has to be in the thousands, though, before this becomes very noticeable.
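A quick timing sketch makes the contrast visible (the sizes here are just illustrative, not from the original post); doubling N roughly quadruples .search()’s time to fail:

```python
import re
import time

pat = re.compile(r"\d+\s+")

# All-digit input contains no whitespace, so the pattern can never match.
# .match() makes one anchored attempt; .search() keeps retrying.
for n in (2_000, 4_000, 8_000):
    s = "5" * n
    t0 = time.perf_counter()
    pat.match(s)                    # fails almost instantly
    t_match = time.perf_counter() - t0
    t0 = time.perf_counter()
    pat.search(s)                   # fails in time quadratic in n
    t_search = time.perf_counter() - t0
    print(f"N={n:>5}  match: {t_match:.6f}s  search: {t_search:.6f}s")
```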
I don’t believe I’ve ever seen a discussion of this kind of failure mode. I’ll leave it to you to figure out why it happens 😉
This is not a case of “catastrophic backtracking” (which consumes time exponential in N to fail to match); it’s just a consequence of how .search() works. There appears to be nothing you can do to the regexp to make it fast in all cases. Using possessive \d++ instead speeds it up quite a bit, but it’s still quadratic time.
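If you want a nudge toward the “why”: .search() behaves roughly as if it tried an anchored match at every successive start position. A sketch of that equivalence (my illustration only; the real engine does this internally, in C) shows where the ~N attempts, each scanning the remaining digit run, come from:

```python
import re

pat = re.compile(r"\d+\s+")

def search_sketch(pattern, s):
    # Roughly what .search() does: try an anchored match at each
    # successive start index until one succeeds.  On "5" * N every
    # attempt still scans the whole remaining digit run before
    # failing, hence ~N attempts * O(N) work each.
    for start in range(len(s) + 1):
        m = pattern.match(s, start)
        if m:
            return m
    return None

s = "5" * 2_000
assert search_sketch(pat, s) is None
assert pat.search(s) is None
```

Possessive \d++ (Python 3.11+) removes the per-attempt backtracking, but not the per-position retries, which is why it helps the constant factor without changing the quadratic shape.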
The same is true under the very capable regex extension module, which is immune to many attempts to provoke exponential-time behavior. It’s no faster in this case than the core’s re module.
Personally, I almost always use “match” instead of “search”. But then I don’t use regexps to try to do “too much” at a time. I use them more like a flexible lexer, to pick off “the next” token in an input string, typically also passing a “start index” argument to a compiled pattern.
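That lexer-style use might look like this sketch (the token set here is made up purely for illustration):

```python
import re

# Hypothetical token set: integers, identifiers, or single operator chars,
# with optional leading whitespace folded into the pattern.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")

def tokens(s):
    # Pick off "the next" token with an anchored .match() at an explicit
    # start index -- never .search(), so no rescanning of the input.
    pos = 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise SyntaxError(f"bad input at index {pos}")
        yield m.group(1)
        pos = m.end()

print(list(tokens("12 + spam*3")))   # ['12', '+', 'spam', '*', '3']
```

Each call to .match(s, pos) only ever looks at the text starting at pos, so tokenizing the whole string is linear in its length.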
Deprecating match would just annoy people like me a lot 😉