Good point. It depends on how we define dunders.
If a dunder is any string with double underscores prepended and appended, then '__' + '' + '__', i.e. four underscores, is the shortest dunder string, and the minimum length is 4.
If a dunder is any string that begins and ends with two underscores, where the leading and trailing pairs may overlap, then '__' and '___' could be considered dunders, and the minimum length is 2.
If a dunder is a non-empty name with two leading and two trailing underscores, then the minimum length is 5.
Normally these cases don’t come up, because we’re talking about dunder methods, or at least identifiers, and the name part is always non-empty, like “add” or “init”. But the IPython environment defines special global variables __, ___ etc., and it is not clear whether they should be filtered out or not.
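To make the difference between the three definitions concrete, here is a minimal sketch that writes each one as a predicate and finds the shortest string it accepts. The function names are hypothetical, chosen only for illustration.

```python
def is_dunder_wrapped(s: str) -> bool:
    """Definition 1: '__' + name + '__', where name may be empty."""
    return len(s) >= 4 and s.startswith('__') and s.endswith('__')

def is_dunder_overlapping(s: str) -> bool:
    """Definition 2: starts and ends with '__'; the two pairs may overlap."""
    return s.startswith('__') and s.endswith('__')

def is_dunder_nonempty(s: str) -> bool:
    """Definition 3: '__' + non-empty name + '__'."""
    return len(s) >= 5 and s.startswith('__') and s.endswith('__')

def shortest(pred) -> int:
    """Length of the shortest all-underscore string that pred accepts."""
    return next(n for n in range(10) if pred('_' * n))

for pred in (is_dunder_wrapped, is_dunder_overlapping, is_dunder_nonempty):
    print(pred.__name__, shortest(pred))
# is_dunder_wrapped 4
# is_dunder_overlapping 2
# is_dunder_nonempty 5
```

Since every accepted string must start and end with '__', an all-underscore string of the minimum length is always accepted, so checking only underscores is enough to find each minimum.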
Possibly some of that concern about regexes comes from the fact that they are frequently overused, especially in the Perl and JavaScript communities. Consequently, many people are very defensive (perhaps too defensive) when they see beginners reaching for regexes when there are faster, better, easier solutions.
Or in the words of Jamie Zawinski: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”
Even expert coders can easily mess up regexes: for instance, a rogue regex once accidentally cut off all of Russia from vast areas of the web.
Some of the issues with regexes include:
- extremely terse, cryptic syntax;
- for anything but trivial problems, regexes are hard to write, hard to read, hard to debug, and hard to maintain;
- consequently, except for the simplest and most trivial problems, regexes often contain bugs and misbehaving corner cases.
(There are others, such as the risk of exponentially poor backtracking performance, and bugs in the regex engine itself.)
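As a small illustration of how easily corner cases slip in, here is a sketch of two well-known surprises in Python's `re` module, using a dunder-style pattern. The example is illustrative, not taken from the original discussion.

```python
import re

# re.match anchors only at the start, so a matching prefix is enough.
print(bool(re.match(r'__.+__', '__init__xyz')))        # True
print(bool(re.fullmatch(r'__.+__', '__init__xyz')))    # False

# '$' matches at the end of the string OR just before a final '\n',
# so this "anchored" pattern still accepts a trailing newline.
print(bool(re.match(r'^__.+__$', '__init__\n')))       # True
# fullmatch (or '\Z' instead of '$') requires the whole string to match.
print(bool(re.fullmatch(r'__.+__', '__init__\n')))     # False
```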
No community is more pro-regex than Perl's, and this is what Perl’s creator, Larry Wall, had to say about regexes while the language was being redesigned as Perl 6:
“… unfortunately, there’s a lot of regex culture that needs breaking.”
It is worth reading Larry Wall’s essay; it makes very good points about the problems with regex syntax as it exists today.
Regexes are an independent programming language, (almost?) as powerful as Python. For example, here is a regular expression that will tell you whether a number n is a prime number or not.
not re.match(r'^1?$|^(11+?)\1+$', '1'*n)
If you are tempted to use it, don’t: it is very inefficient. Short and cryptic but inefficient.
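Here is a sketch of how that one-liner works, wrapped in a function and checked against ordinary trial division. The function names are hypothetical, and the trial-division version is added only for comparison. The regex writes n in unary as n ones, then lets the backtracking engine try every possible divisor length, which is why it is so slow.

```python
import re

def is_prime_regex(n: int) -> bool:
    # n in unary is '1' * n. '^1?$' catches 0 and 1 (not prime).
    # '^(11+?)\1+$' succeeds iff the string splits into two or more equal
    # groups of length >= 2, i.e. iff n has a divisor d with 2 <= d < n.
    return not re.match(r'^1?$|^(11+?)\1+$', '1' * n)

def is_prime_trial(n: int) -> bool:
    # Conventional trial division up to sqrt(n), for comparison.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

primes = [n for n in range(30) if is_prime_regex(n)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
assert primes == [n for n in range(30) if is_prime_trial(n)]
```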
I think that Python regexes are not quite powerful enough to count as a full Turing-complete programming language, but other languages (especially Perl) have even more powerful regexes which are genuinely Turing-complete.
You may have seen the monster regex that validates that email addresses are fully compliant with the standard. No, it is not a joke.
You suggested this regex:
re.fullmatch(r'__.+__', s)
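Before comparing speed, it is worth checking that the regex and the startswith/endswith helper actually agree. A minimal sketch; the two function names are hypothetical wrappers around the regex and the string-method version timed here.

```python
import re

def is_dunder_regex(s: str) -> bool:
    # The suggested regex: '__', at least one character, '__'.
    return re.fullmatch(r'__.+__', s) is not None

def is_dunder_methods(s: str) -> bool:
    # The string-method version, with a length check.
    return len(s) > 4 and s.startswith('__') and s.endswith('__')

cases = ['__init__', '__', '___', '____', '_____', '__a\nb__', 'init', '__init']
for s in cases:
    print(repr(s), is_dunder_regex(s), is_dunder_methods(s))
```

They agree on every case except '__a\nb__': by default `.` does not match a newline, so the regex rejects it while the string methods accept it. Passing `re.DOTALL` would make the regex agree.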
That looks about as short and simple as a regex can be. Let’s see how it performs on my computer. (Your computer may perform differently.)
Here is your regex solution:
[steve ~]$ python3.10 -m timeit -s "import re" "re.fullmatch(r'__.+__', '__abcde__')"
500000 loops, best of 5: 648 nsec per loop
[steve ~]$ python3.10 -m timeit -s "import re" "re.fullmatch(r'__.+__', '__abcde__')"
500000 loops, best of 5: 652 nsec per loop
Here is the solution using startswith and endswith, in a helper function:
[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return len(s) > 4 and s.startswith('__') and s.endswith('__')" "is_dunder('__abcde__')"
1000000 loops, best of 5: 272 nsec per loop
[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return len(s) > 4 and s.startswith('__') and s.endswith('__')" "is_dunder('__abcde__')"
1000000 loops, best of 5: 276 nsec per loop
The regex solution takes more than double the time.
Here is the solution using slicing, with no length check.
[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return s[:2] == s[-2:] == '__'" "is_dunder('__abcde__')"
1000000 loops, best of 5: 233 nsec per loop
[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return s[:2] == s[-2:] == '__'" "is_dunder('__abcde__')"
1000000 loops, best of 5: 232 nsec per loop
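If you want to reproduce the comparison from inside Python rather than from the shell, here is a sketch using the standard `timeit` module. The function names and the `best_ns` helper are hypothetical. Calling through a lambda adds a little overhead, so absolute numbers will not match the command-line runs, and results will vary from machine to machine; only the relative ordering is of interest.

```python
import re
import timeit

def is_dunder_regex(s):
    return re.fullmatch(r'__.+__', s) is not None

def is_dunder_methods(s):
    return len(s) > 4 and s.startswith('__') and s.endswith('__')

def is_dunder_slice(s):
    # No length check, so '__' and '___' are accepted too
    # (the second definition at the top of this post).
    return s[:2] == s[-2:] == '__'

def best_ns(func, arg='__abcde__', number=100_000, repeat=5):
    """Best time per call in nanoseconds, like the CLI's 'best of 5'."""
    times = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(times) / number * 1e9

for f in (is_dunder_regex, is_dunder_methods, is_dunder_slice):
    print(f"{f.__name__}: {best_ns(f):.0f} nsec per call")
```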