Inconsistent API in regex module

Hello,

Consider the following code:

import re

RE_STRING = re.compile(R"([0-9]+)@([a-z]+)?")

text = "1234@"

match = RE_STRING.match(text)
search = RE_STRING.search(text)
find_iter = RE_STRING.finditer(text)

find_all = RE_STRING.findall(text)

print(match.groups())  # ('1234', None)
print(search.groups())  # ('1234', None)
print(next(find_iter).groups())  # ('1234', None)

print(find_all[0])  # ('1234', '')

findall is the only call that returns an empty string instead of None for the optional capturing group.

Why doesn’t findall return None for an optional capturing group that did not match?

I assume this is because findall returns a list of tuples of strings or a list of strings, while the other functions return Match objects.

In that case, why doesn’t findall return a list of Match objects as well?

Are there historical reasons for this?

I will speculate that when processing re.findall output, it’s more consistent if all results are of the same type, string in this case. The return value is a list of tuples, whose elements are the matches from the various groups. In larger bodies of text, the call might well generate a list of more than one tuple. Expanding your input a bit:

>>> text = "1234@ 2345@abc 456@x"
>>> RE_STRING.findall(text)
[('1234', ''), ('2345', 'abc'), ('456', 'x')]
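To make the "same type" point concrete, here is a small sketch of how the shape of the findall result depends on the number of capturing groups. The two patterns with non-capturing groups are my own variations for illustration, not from the original post:

```python
import re

text = "1234@ 2345@abc 456@x"

# No capturing groups: a list of whole-match strings.
no_groups = re.findall(r"[0-9]+@(?:[a-z]+)?", text)

# Exactly one capturing group: a list of that group's strings.
one_group = re.findall(r"([0-9]+)@(?:[a-z]+)?", text)

# Two or more capturing groups: a list of tuples of strings.
# A group that did not take part in the match shows up as ''.
two_groups = re.findall(r"([0-9]+)@([a-z]+)?", text)

print(no_groups)   # ['1234@', '2345@abc', '456@x']
print(one_group)   # ['1234', '2345', '456']
print(two_groups)  # [('1234', ''), ('2345', 'abc'), ('456', 'x')]
```

In all three cases every element is a string or a tuple of strings, so code that consumes the result never has to check for None.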

In that case, why doesn’t findall return a list of Match objects as well?

list(RE_STRING.finditer(text)) does this. There wouldn’t be any need for finditer if findall returned Match objects (except to save memory). So I would guess there was a time when there was only findall, and no finditer.
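As a rough sketch of that approach, using the same pattern and the expanded text from earlier in the thread: collecting the Match objects from finditer keeps None for the group that did not participate, and passing a default to groups() appears to reproduce what findall returns.

```python
import re

RE_STRING = re.compile(r"([0-9]+)@([a-z]+)?")
text = "1234@ 2345@abc 456@x"

# A list of Match objects, one per non-overlapping match.
matches = list(RE_STRING.finditer(text))

# groups() reports None for a group that did not participate.
groups = [m.groups() for m in matches]
print(groups)  # [('1234', None), ('2345', 'abc'), ('456', 'x')]

# groups("") substitutes '' for such groups, which gives the
# same tuples as findall for this pattern.
as_strings = [m.groups("") for m in matches]
print(as_strings == RE_STRING.findall(text))  # True
```

So if you need to tell "group matched an empty string" apart from "group did not match at all", finditer is the call to use.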

That’s true for many/most/all(?) *iter functions and methods. That said, a concrete implementation of iterators has been around for about 25 years now (PEP 234). re.finditer was introduced in Python 2.2, but re.findall has been around since Python 1.5.2. See the 2.7 module docs.

(In retrospect, I think it did a disservice to newer users of Python to eliminate the little historical footnotes regarding additions or significant changes which happened in the pre-3.x days. Hopefully the 2.7 docs will remain available in perpetuity.)


Seems likely. The Python Documentation by Version page provides docs way back to the 1.4 version, even though probably (almost?) nobody is using that anymore. And the Ancient Releases page offers version 0.9.1 including documentation. (Oh hey, it’s you!)