Add "re.prefixmatch()", deprecate "re.match()"

I’ve spent, and continue to spend, time debugging bugs that are fixed by replacing re.match() with re.search(), over and over again.

It’s one of those old confusing quirks for which there’s a (more or less) obvious solution.

Surprisingly, I couldn’t find any existing proposal to deprecate re.match(). So I suggest we discuss it.

Also, perhaps we may want to add re.test(), which returns a bool, for the cases where we don’t need a match object.
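
To make the confusion concrete, here is a minimal sketch of the kind of bug I keep fixing, plus one possible reading of what a re.test() helper could amount to. The strings, pattern, and helper are invented for illustration, not taken from any real codebase:

    import re

    line = "WARNING: disk error on /dev/sda"

    # Buggy: re.match() only tries the pattern at the *start* of the string,
    # so this never finds "error" unless the line happens to begin with it.
    if re.match(r"error", line):
        print("buggy check fired")

    # Intended: re.search() scans the whole string.
    if re.search(r"error", line):
        print("correct check fired")

    # One possible reading of the re.test() idea: just the boolean question.
    def test(pattern, string, flags=0):
        return re.search(pattern, string, flags) is not None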

10 Likes

And another possible source of bugs is when re.match() was used while re.fullmatch() was intended. However, I haven’t encountered this in practice.

3 Likes

To convince people that re.match() should be deprecated, you have to show that the cost of deprecation and replacement is lower than the cost of maintaining the status quo. So my first question to you is: Do you know how extensive the costs are of a change like this? Before even discussing which cost is greater, be sure you actually understand what you’re asking for here. Is there a planned removal for re.match()? If so, how soon? If not, why not?

Have you observed other deprecations and how they have been accepted?

Then, make your case. What are the costs of the status quo? How frequently do you run into this problem yourself? How much code out there is likely to be buggy? Is that code part of maintained software that is likely to be fixed if the deprecation goes through, but isn’t otherwise going to be changed? What are all the costs associated with NOT deprecating this?

You’re going to need to do a lot of work here, a lot of research… but on the plus side, this is a proposal relating to regular expressions, and most of your research is going to be doing regular expression searches through large corpora of code (e.g. searching GitHub) :slight_smile:

3 Likes

Rather than deprecating match and search, why not propose easier-to-remember names as aliases?

For “match”, alias as “startswith”.
For “search”, alias as “contains” (not sure this is the best choice).

5 Likes

A soft deprecation is the obvious choice.

I almost never see a re.match() where the author of the code really intended to match the prefix. What you can often see is re.match(r'^ ...') with an anchor when one needs to match the prefix, because almost nobody who uses it knows the secret that it already anchors the match at the start. And if there was no anchor, it eventually gets debugged and replaced with re.search().

And a clear indication that re.match() wasn’t really intended is regex patterns like r'\b ...' or r'(?<! ... ) ...'. Such code can probably be found on GitHub, including in the commit history. Sure, if the idea gets traction to the point of drafting a PEP, somebody may be inclined to gather the statistics to prove the need for deprecation.
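
As a hypothetical example of that tell-tale pattern (the text and pattern are invented):

    import re

    text = "see the error log for details"

    # A pattern starting with \b only makes sense if the author expects the
    # engine to scan forward, but re.match() pins it to index 0, so this
    # returns None even though "error" is clearly in the text.
    print(re.match(r"\berror\b", text))    # None

    # What was almost certainly meant:
    print(re.search(r"\berror\b", text))   # <re.Match object; span=(8, 13), match='error'>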

I assume you just mean by this that the docs would suggest not to use it. I don’t think that is necessary, but perhaps the docs could be a bit clearer about when to use or not use it. The docs already explain the difference, though, and there is even a search-vs-match section. I don’t generally use regex, but I’m pretty sure that in any situation where I might, I would be much more likely to want fullmatch or search rather than match, and that seems pretty clear from reading the docs.

What exactly would you propose to change about the docs? A PEP is not needed just to add some clarifying text to the docs.

3 Likes

My point is that it isn’t obvious. You may think it’s obvious based on your personal experience, but that’s an argument that has to be made. You cannot assume that the rest of us already agree with you.

But let’s suppose that soft deprecation is all that happens. In other words, there is no date at which the existing API is to be removed. All you’re doing is putting a note in the docs saying “use this alias instead”. Okay. So, suppose you’re developing some software. You have a choice: use re.match(), which will work on all existing versions of Python and all planned future versions as well, or use re.search() with an anchored regex, which will also work on all existing and all planned versions, but will be less efficient. Which do you choose? Does the deprecation make any difference here?

You are, of course, free to replace all uses of re.match in your own code with re.search. That’s fine. Nothing wrong with it. But the deprecation won’t actually add anything to that argument, unless you can show that there is real benefit to be gained here.

Like I said, you’re going to need to do some research here. “I almost never see” isn’t enough of an argument. How many cases of this do you find on GitHub (or some other large corpus of code)? How many major projects have this happening?

No, the time to get those statistics is now. You won’t get traction for any further steps otherwise. And “somebody may be inclined to”? Are you asking someone else to do the work for you? If so, go change this in your own codebase only, and don’t ask for deprecation. If you want to push for a language change, you have to be prepared to do your own research.

3 Likes

There is quite a disappointing number of re.match("^...")s out there: 418k out of 926k uses of re.match() directly on a literal.

But these renames never pay off the cost of changing everything. Even if re.match is only soft-deprecated, there will still be linters and IDEs and drive-by PRs pushing people to change code that has nothing wrong with it.

4 Likes

Here’s an interesting regexp that came up recently:

r"\d+\s+"

What’s the big deal? Run it with .match() and it returns “almost instantly” even if the target string doesn’t match.

But run it with .search() on a string like "5" * N (which can’t succeed), and it takes time quadratic in N to fail. But N has to be in the thousands before this becomes very noticeable.

I don’t believe I’ve ever seen a discussion of this kind of failure mode. I’ll leave it to you to figure out why it happens :wink:

This is not a case of “catastrophic backtracking” (which consumes time exponential in N to fail to match); it’s just a consequence of how .search() works. There appears to be nothing you can do to the regexp to make it fast in all cases. Using a possessive \d++ instead does speed it up quite a bit, but it’s still quadratic time.

Also true under the very capable regex extension module, which is immune to many ways to try to provoke exponential time behavior. It’s no faster in this case than the core’s re module.
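
A rough timing sketch of the effect (the sizes are arbitrary and the exact numbers will vary by machine):

    import re
    import time

    pat = re.compile(r"\d+\s+")

    for n in (10_000, 20_000, 40_000):
        s = "5" * n
        t0 = time.perf_counter()
        pat.search(s)   # fails, and takes roughly 4x longer each time n doubles
        print(n, time.perf_counter() - t0)

    # pat.match(s) on the same strings fails quickly: it only ever tries
    # position 0, so the cost is linear in n instead of quadratic.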

Personally, I almost always use “match” instead of “search”. But then I don’t use regexps to try to do “too much” at a time. I use it more like a flexible lexer, to pick off “the next” token in an input string, typically passing a “start index” argument too to a compiled pattern.
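
For concreteness, a small sketch of that lexer-ish style, using a compiled pattern’s pos argument; the token pattern here is just an invented example:

    import re

    TOKEN = re.compile(r"\s*(\d+|[a-zA-Z_]\w*|[+\-*/()])")

    def tokens(text):
        pos = 0
        while pos < len(text):
            m = TOKEN.match(text, pos)   # match *at* pos; never scan ahead
            if m is None:
                raise SyntaxError(f"bad input at index {pos}")
            yield m.group(1)
            pos = m.end()

    print(list(tokens("12 + spam*3")))   # ['12', '+', 'spam', '*', '3']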

Deprecating match would just annoy people like me a lot :wink:

6 Likes

Could be confusing since str.startswith returns a bool.

2 Likes

Yes, there are at least 2 sources of quadratic slowdown.

This is a matter of optimization of the RegEx engines.

Then again, when one needs to do something that is “too much” for RegEx, they resort to other tools. So nobody is inclined to invest heavily in optimization of the general-purpose RegEx engines in scripting languages.

re.prefixmatch() vs re.match() is a matter of clarity and bug avoidance, and naturally such cases may involve the performance vs correctness tradeoff.

Soft deprecation can be done without a warning, both in the interpreter itself and in linters. It could be understood as “avoid using in new code”.

But WHY avoid using it? You’re proposing creating a new API that won’t work on any older version of Python, which has to compete with an old API that works on all older versions and all new versions. What is the point of avoiding the old API that works just fine, and will continue to work?

Soft deprecation is utterly meaningless unless there is some real benefit to using the new API, and a simple rename seldom achieves that.

2 Likes

While I don’t expect this will make progress, I think you’d have a much better chance of adding a wordier alias for match than deprecating anything (“soft” or not). “match” has been there for 3 decades, and a great many have never had any notable problem with the name. Some people do, especially newbies. But it’s generally a shallow learning curve they quickly climb. Your:

is something I hadn’t heard of before. The lack of “me too!” responses in this topic suggests it’s not part of many others’ experience either.

BTW, the “newbie confusions” fell after a suggestion of mine made many years ago: instead of listing the re module’s functions in alphabetical order, put “search” before “match”. While it’s not how I happen to use the module, I did (& still do) believe most newcomers are looking for “search”.

3 Likes

There was limited support for an alias some years ago in this issue:

2 Likes

See Proposal: re.prefixmatch method (alias for re.match) · Issue #86519 · python/cpython · GitHub.

I do not think this is the way to go, because it would not solve any real problem, but would cause a worldwide code churn on par with 2→3.

A more common error is when re.search() is used instead of `re.match()`. I have encountered this many times. It can go unnoticed for a long time because it “works” if you only use it for expected input and a limited variety of invalid input (even if there may be a small performance impact).

This is a pretty common error too.
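
A made-up sketch of how the search-instead-of-match error can hide (the validation function and inputs are invented for illustration):

    import re

    def looks_like_version(s):
        # Buggy: re.search() accepts the pattern *anywhere* in the string.
        return re.search(r"\d+\.\d+\.\d+", s) is not None

    print(looks_like_version("1.2.3"))             # True, as expected
    print(looks_like_version("rm -rf / # 1.2.3"))  # also True: junk slips through
    # With re.match() (or better, re.fullmatch()) the second call is rejected.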

1 Like

Okay. Me too!

I always have to read the re docs when I use these methods, as I cannot keep the search or match semantics in my head.

Aliases like match_at_start and search_within would work for me.

7 Likes

Replying to multiple comments:

re.match is horrible. Yuk. Confusing. Why do y’all do that?

But lacking a time machine, the “fix” is far worse than the status quo, for all the reasons listed above.

As far as the “disappointing” instances of match('^ ...') go, I’m probably using re.search('^whatever') instead.

  • Not all my regexes are Python – I’m so old I still use sed for quick edits. So consistently using the front anchor ^ is just easier.
  • A typical use case is going to be dominated by the I/O of reading whatever I’m regexing, anyway.
  • My re mental model is: search good, match bad.
  • If I’m processing sufficient volume that I care about the performance of matching prefixes, I’m just using str.startswith.

Here are some benchmarks:

Test                  | 3.8  | 3.9  | 3.10 | 3.11 | 3.12 | 3.13 | 3.14
----------------------+------+------+------+------+------+------+-----
re.match('dog')       | 0.94 | 0.90 | 0.96 | 0.84 | 0.96 | 0.90 | 0.85
re.match('^dog')      | 0.96 | 0.92 | 0.98 | 0.87 | 0.98 | 0.93 | 0.88
re.search('^dog')     | 1.18 | 1.15 | 1.24 | 0.92 | 1.00 | 1.02 | 0.97
str.startswith('dog') | 0.34 | 0.31 | 0.33 | 0.31 | 0.33 | 0.16 | 0.14
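
For reference, a sketch of how numbers of this shape could be gathered with timeit; the test string and iteration count are my own assumptions, not necessarily what produced the table above:

    import re
    import timeit

    TEXT = "dog eat dog world"

    cases = {
        "re.match('dog')":       lambda: re.match("dog", TEXT),
        "re.match('^dog')":      lambda: re.match("^dog", TEXT),
        "re.search('^dog')":     lambda: re.search("^dog", TEXT),
        "str.startswith('dog')": lambda: TEXT.startswith("dog"),
    }

    for name, fn in cases.items():
        print(name, timeit.timeit(fn, number=1_000_000))
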
4 Likes

Me too!

I’m inclined to agree with their opinion; I get suspicious whenever re.match() is used.

When I was a newbie, it took a long time for me to ingrain the difference between re.match(), re.fullmatch(), and re.search(). These days I tend to just ignore re.match() and re.fullmatch() and always use the respective re.search() equivalent, because it seems cleaner to me to have the pattern-matching intent baked into the pattern rather than the method name.

Using the \A and \z anchors, re.search() can do everything re.match() and re.fullmatch() can:

  • re.match(p) → re.search(rf"\A{p}")
  • re.fullmatch(p) → re.search(rf"\A{p}\z")
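
A quick sanity check of those equivalences (written with \Z, the long-standing spelling of the end-of-string anchor in re, in case \z is not available):

    import re

    p = r"\d+"
    for s in ("123", "123abc", "abc123"):
        assert bool(re.match(p, s)) == bool(re.search(rf"\A{p}", s))
        assert bool(re.fullmatch(p, s)) == bool(re.search(rf"\A{p}\Z", s))
        print(s, bool(re.match(p, s)), bool(re.fullmatch(p, s)))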

So my perfect world would have re.search() renamed to the better-sounding ‘re.match()’, would remove re.match() and re.fullmatch() completely, and would not worry so much about the minor performance penalty.

However, I understand that these name changes aren’t really possible at this stage. The current names are not ideal but they’re not terrible, so my vote is to do nothing.

4 Likes

Yeah, name-swapping virtually guarantees that there’s no way to write properly-compatible code, so, that’s basically never gonna happen.

2 Likes

We, as a language, need to stop looking towards the past and look towards what makes it possible to write more clearly understandable code without the need for special domain knowledge and reference manuals. That’s why I put up that issue and PR implementing this in the first place.

We need re.prefixmatch.

re.match’s meaning is a real footgun problem that people continually trip over in Python.

Fixing it does not require getting rid of re.match. All we need is the trivial feature using the proper self-explanatory name (see the PR) to provide a well-lit path via the actually understandable name.

prefixmatch provides a clear way past the footgun. There is no requirement for all existing code to be updated, and there never will be. But complaining that it makes things worse, with some theoretical problem of projects receiving PRs to “fix” things that aren’t broken, is focusing on yesterday instead of the future. Those are non-problems compared to enabling code to be more understandable.
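
For concreteness, the kind of thing the alias amounts to, sketched as a user-level shim (this is not the actual PR, which adds the name to the re module itself):

    import re

    def prefixmatch(pattern, string, flags=0):
        """Match pattern only at the start of string, exactly like re.match()."""
        return re.match(pattern, string, flags)

    print(prefixmatch(r"\d+", "123abc"))   # matches the '123' prefix
    print(prefixmatch(r"\d+", "abc123"))   # None: the string does not start with digits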

28 Likes