Summary
Python should add the optional pos/endpos parameters to the module-level convenience functions re.search, re.match, re.fullmatch, re.findall, and re.finditer. This would enable pos/endpos searching without having to first compile the regex to a pattern. Here’s a sample diff that would match up with the underlying C functionality.
If there’s appetite for this sort of idea, I’d be happy to create an issue on the issue tracker and write the code and tests for it.
Rationale
A number of methods on Python's compiled regex Pattern class support the optional positional arguments pos and endpos:
Pattern.search(string[, pos[, endpos]])
Pattern.match(string[, pos[, endpos]])
Pattern.fullmatch(string[, pos[, endpos]])
Pattern.findall(string[, pos[, endpos]])
Pattern.finditer(string[, pos[, endpos]])
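For readers who have not used these arguments, here is a minimal sketch of what they do on a compiled pattern. The sample string and pattern are made up for illustration; the behavior shown is the documented one, including the fact that pos is not quite the same as slicing the string.

```python
import re

pattern = re.compile(r"\d+")
text = "a1 b22 c333"

# No pos/endpos: scan the whole string and return the first run of digits.
first = pattern.search(text)

# pos=3: start scanning at index 3, skipping over "a1 ".
later = pattern.search(text, 3)

# endpos=6: behave as if the string were only 6 characters long ("a1 b22").
limited = pattern.findall(text, 0, 6)

# Match offsets stay relative to the original string, unlike with slicing.
print(first.group(), later.group(), later.span(), limited)

# '^' still anchors at the real start of the string, not at pos,
# so this search finds nothing even though text[3] is "b".
anchored = re.compile(r"^b").search(text, 3)
print(anchored)
```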
Additionally, Python provides access to these pattern methods as top-level convenience functions in the module itself:
re.search()
re.match()
re.fullmatch()
re.findall()
re.finditer()
However, these top-level convenience functions do not support the optional positional arguments. Anyone who wants to use pos or endpos must first compile a pattern with re.compile() and then call the method with the optional argument.
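As a rough illustration of the current situation (the sample string is invented), the workaround looks like this. The try/except assumes a Python version in which re.search has no pos parameter, which is the case at the time of writing.

```python
import re

text = "a1 b22 c333"

# Current workaround: compile first, then pass pos to the Pattern method.
match = re.compile(r"\d+").search(text, 3)
print(match.group())

# The module-level function does not accept pos, so on current Python
# versions this call raises TypeError instead of searching from index 3.
try:
    re.search(r"\d+", text, pos=3)
except TypeError as exc:
    print("re.search rejected pos:", exc)
```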
But all the convenience functions do is 1) compile the pattern and then 2) call the method. Here’s an example directly from the re.py source:
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)
Looking at the underlying C code for these methods, each one initializes pos and endpos to 0 and PY_SSIZE_T_MAX respectively, and only changes those values if the argument parser detects that pos or endpos was passed.
Example C code from match (indentation adjusted for readability):
static PyObject *
_sre_SRE_Pattern_match(PatternObject *self, PyTypeObject *cls, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
    (...)
    Py_ssize_t pos = 0;
    Py_ssize_t endpos = PY_SSIZE_T_MAX;
    (...)
    pos = ival;
    (...)
    endpos = ival;
    (...)
    return_value = _sre_SRE_Pattern_match_impl(self, cls, string, pos, endpos);
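The effect of those defaults can be checked from Python. This is only a small sketch with an invented pattern and string; it relies on sys.maxsize being the Python-level equivalent of PY_SSIZE_T_MAX, and on the regex engine clamping out-of-range values to the bounds of the string.

```python
import re
import sys

pattern = re.compile(r"b+")
text = "aabbbcc"

# Omitting pos/endpos ...
implicit = pattern.search(text)

# ... behaves like passing the C-level defaults explicitly.
explicit = pattern.search(text, 0, sys.maxsize)

# Out-of-range values are clamped to the string bounds rather than rejected.
clamped = pattern.search(text, -5, 10**6)

print(implicit.span(), explicit.span(), clamped.span())
```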
We could add equivalent functionality to the top level module functions by simply adding two new optional arguments to each of the related functions.
Here’s a sample of what it would look like for match():
import sys

def match(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string, pos=pos, endpos=endpos)
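To try the idea without patching re.py, the same shape can be sketched as standalone wrappers. This is an approximation under stated assumptions, not the proposed patch: it uses the public re.compile() in place of the private _compile helper, and the names search_at and findall_at are hypothetical, chosen only so they do not shadow the real module functions.

```python
import re
import sys


def search_at(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Hypothetical stand-in for re.search() with pos/endpos support."""
    # re.compile() is the public entry point; compiled patterns are cached.
    return re.compile(pattern, flags).search(string, pos, endpos)


def findall_at(pattern, string, flags=0, pos=0, endpos=sys.maxsize):
    """Hypothetical stand-in for re.findall() with pos/endpos support."""
    return re.compile(pattern, flags).findall(string, pos, endpos)


text = "a1 b22 c333"

# Existing call sites keep working because the new arguments are optional.
whole = search_at(r"\d+", text)

# New call sites can pass pos/endpos by keyword without compiling first.
from_three = search_at(r"\d+", text, pos=3)
first_six = findall_at(r"\d+", text, endpos=6)
with_flags = search_at(r"B", text, flags=re.IGNORECASE, pos=1)

print(whole.group(), from_three.group(), first_six, with_flags.span())
```

Placing pos and endpos after flags keeps the existing positional signature intact, so a call such as re.search(pattern, string, flags) would not change meaning.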
And here’s a gist with a full implementation. It’s a very simple change, overall: Adding pos/endpos to re · GitHub
As stated above, if there’s appetite for this sort of idea, I’d be happy to create an issue on the issue tracker and write the code and tests for it.