Ah, that’s WAY more complicated. I mean, I love messing around with LALR grammars and building expression evaluators with full flexibility (it’s hugely fun!), but that’s really not a replacement for regex.
In the long run, why not? An ideal `compile()` function could detect that the grammar can be implemented as a regular expression, and it would just work like RE.
Once you’ve put all the work into building a nice RE syntax, you may as well build a parser on top of it.
I think we are on topic. This may not be the solution you prefer, but I think it is a reasonable solution.
What you are writing as `str.find((x, y))` can just as easily be written as something like `humre.find(str, humre.either(x, y))`. Why are you opposed to the `humre` solution?
I made a few arguments why I think it’s better:
- it’s more general: you can do anything regular expressions can do, including capturing patterns (which was mentioned earlier in the thread), searching for complicated patterns, etc.
- it fits with the OO principle that methods should be reserved for doing things that cannot be done with the public interface.
I also agree with Serhiy’s arguments against expanding the standard library for this.
Regexes can be compiled to deterministic finite state machines, leading to linear-time performance. (RE2 from Google does that, I believe.)
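To illustrate the idea, here is a hand-rolled sketch (a hypothetical transition table, not RE2’s actual machinery) of the regex `a(b|c)` compiled to a DFA. Matching becomes a single linear pass over the input with no backtracking:

```python
# Sketch: the regex a(b|c) as a DFA transition table.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (1, "c"): 2,
}
ACCEPTING = {2}

def dfa_fullmatch(s):
    state = 0
    for ch in s:  # one linear pass, no backtracking
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition: reject immediately
            return False
    return state in ACCEPTING

assert dfa_fullmatch("ab") and dfa_fullmatch("ac")
assert not dfa_fullmatch("ad") and not dfa_fullmatch("a")
```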
Right, exactly. Which is why regular expressions are a fine solution to your problem, right?
It is “inevitable”.
As much as I appreciate the theory and long-term possibilities, it would be nice to stay pragmatic and realistic here.
Unless, of course, you have a clearly defined goal and a plan of action for how you are going to improve `re` so that its performance will “almost” match the implementations proposed in this thread.
Current implementation: GitHub - nineteendo/cpython at find-tuple
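For reference, a sketch of the proposed semantics (exercised by `find4`/`rfind4` in the benchmark below; the tuple argument only works on that branch, not on stock CPython):

```python
# Proposed behavior on the find-tuple branch, mirroring str.startswith(tuple):
p = r"C:\Users/nineteendo"
assert p.find(("\\", "/")) == 2    # lowest index of any separator
assert p.rfind(("\\", "/")) == 8   # highest index of any separator
```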
Benchmark (compared against the fastest alternative, which is a bit unfair, because that alternative might not be the fastest in other cases):
script

```python
# find_tuple.py
import re

PATTERN = re.compile(r"[\\/]")

def find1(p):
    # Manual linear scan.
    seps = "\\/"
    i, n = 0, len(p)
    while i < n and p[i] not in seps:
        i += 1
    if i == n:
        i = -1
    return i

def find2(p):
    # Compiled regex search.
    match = PATTERN.search(p)
    i = match.start() if match else -1
    return i

def find3(p):
    # Two find() calls, keep the lowest hit.
    sep = "\\"
    altsep = "/"
    i = p.find(sep)
    new_i = p.find(altsep)
    if new_i != -1 and (new_i < i or i == -1):
        i = new_i
    return i

def find4(p):
    # Proposed tuple API (find-tuple branch only).
    seps = ("\\", "/")
    i = p.find(seps)
    return i

def rfind1(p):
    # Manual linear scan from the end.
    seps = "\\/"
    i = len(p) - 1
    while i >= 0 and p[i] not in seps:
        i -= 1
    return i

def rfind2(p):
    # Regex search on the reversed string.
    match = PATTERN.search(p[::-1])
    i = len(p) - 1 - match.start() if match else -1
    return i

def rfind3(p):
    # Two rfind() calls, keep the highest hit.
    sep = "\\"
    altsep = "/"
    i = max(p.rfind(sep), p.rfind(altsep))
    return i

def rfind4(p):
    # Proposed tuple API (find-tuple branch only).
    seps = ("\\", "/")
    i = p.rfind(seps)
    return i
```
```bash
# find_tuple.sh
echo find - best case
find-tuple/python.exe -m timeit -s "import find_tuple; string = r'\/' + 'a' * 998" "find_tuple.find1(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = r'\/' + 'a' * 998" "find_tuple.find2(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = r'\/' + 'a' * 998" "find_tuple.find3(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = r'\/' + 'a' * 998" "find_tuple.find4(string)"
echo find - mixed case
find-tuple/python.exe -m timeit -s "import find_tuple; string = '/' + 'a' * 999" "find_tuple.find1(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = '/' + 'a' * 999" "find_tuple.find2(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = '/' + 'a' * 999" "find_tuple.find3(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = '/' + 'a' * 999" "find_tuple.find4(string)"
echo find - worst case
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.find1(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.find2(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.find3(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.find4(string)"
echo rfind - best case
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 998 + r'\/'" "find_tuple.rfind1(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 998 + r'\/'" "find_tuple.rfind2(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 998 + r'\/'" "find_tuple.rfind3(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 998 + r'\/'" "find_tuple.rfind4(string)"
echo rfind - mixed case
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999 + '/'" "find_tuple.rfind1(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999 + '/'" "find_tuple.rfind2(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999 + '/'" "find_tuple.rfind3(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999 + '/'" "find_tuple.rfind4(string)"
echo rfind - worst case
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.rfind1(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.rfind2(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.rfind3(string)"
find-tuple/python.exe -m timeit -s "import find_tuple; string = 'a' * 999" "find_tuple.rfind4(string)"
```
```text
find - best case - 1.40x faster
2000000 loops, best of 5: 108 nsec per loop
1000000 loops, best of 5: 280 nsec per loop
2000000 loops, best of 5: 122 nsec per loop
5000000 loops, best of 5: 77 nsec per loop
find - mixed case - 1.27x faster
2000000 loops, best of 5: 109 nsec per loop
1000000 loops, best of 5: 279 nsec per loop
2000000 loops, best of 5: 135 nsec per loop
5000000 loops, best of 5: 85.8 nsec per loop
find - worst case - 1.48x faster
5000 loops, best of 5: 48.5 usec per loop
50000 loops, best of 5: 5 usec per loop
2000000 loops, best of 5: 143 nsec per loop
5000000 loops, best of 5: 96.7 nsec per loop
rfind - best case - 1.42x faster
2000000 loops, best of 5: 117 nsec per loop
200000 loops, best of 5: 1.25 usec per loop
2000000 loops, best of 5: 151 nsec per loop
5000000 loops, best of 5: 82.2 nsec per loop
rfind - mixed case - 6.01x slower
2000000 loops, best of 5: 116 nsec per loop
200000 loops, best of 5: 1.25 usec per loop
500000 loops, best of 5: 743 nsec per loop
500000 loops, best of 5: 697 nsec per loop
rfind - worst case - 1.03x faster
5000 loops, best of 5: 47.3 usec per loop
50000 loops, best of 5: 5.84 usec per loop
200000 loops, best of 5: 1.34 usec per loop
200000 loops, best of 5: 1.3 usec per loop
```
`rfind()` seems to be consistently slower than `find()`, which is outside of my control (I haven’t written the helper functions).
That might be one step ahead, but one feature could be useful in `re`: a `REVERSE` flag.
```python
import re
s = 'a' * 50000 + '_' + 'a' * 50000 + '_' + 'a' * 50000

%timeit s.find('_')                               # 807 ns
%timeit s.rfind('_')                              # 20.7 µs
%timeit next(re.finditer(r'_', s)).span()         # 14.7 µs
%timeit next(re.finditer(r'(?s:.*)_', s)).span()  # 36.4 µs
```
`str.rfind` is 20x slower than `str.find` (is this the same for everyone? I am a bit surprised). And there is no straightforward way to do a reverse search in `re`. Is there?
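One workaround (a sketch, not a true reverse search: it still scans the whole string left to right) is to exhaust `finditer()` and keep the last match:

```python
import re

s = 'a' * 50000 + '_' + 'a' * 50000 + '_' + 'a' * 50000

last = None
for last in re.finditer(r'_', s):
    pass  # keep iterating; `last` ends up bound to the final match
print(last.start() if last else -1)  # 100001, same as s.rfind('_')
```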
The third-party `regex` module has the flag, but I’m not sure how it functions.
Are there common use cases for find/rfind/split with multiple separator strings?
Maybe line endings are the most common case. But Python provides `str.splitlines()` already.
If we can focus on single-character separators, we can ignore the Aho-Corasick vs brute force question.
For example, Go provides simple APIs by supporting only chars, e.g. `strings.IndexAny` and `strings.LastIndexAny`.
Well, yeah, Windows paths for example, see here:
And we can still ignore Aho-Corasick when `tuple(seps)` is passed.
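A rough sketch of why the algorithmic question disappears for single-character separators (the `find_any_char` helper is hypothetical; the real implementation would be a C loop):

```python
# Membership in a small set of characters is O(1) per position, so one
# linear pass suffices -- no Aho-Corasick automaton needed.
def find_any_char(p, seps):
    seps = frozenset(seps)
    for i, ch in enumerate(p):
        if ch in seps:
            return i
    return -1

assert find_any_char(r"C:\Users/nineteendo", ("\\", "/")) == 2
```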
- Because it’s an extra package I have to install and learn, and everyone reading the code also has to learn it.
- It is currently implemented on top of `re`, meaning from a performance perspective it is always going to be worse (and based on the chosen syntax, I am not sure it can ever compete).
- Personally, I much prefer something far less verbose than what humre offers.
This is not an OOP principle I have ever heard of, and IMO it’s a bad one that leads to incomplete APIs missing essential features for day-to-day use, and scatters the needed functions across multiple places.
It also goes against past design decisions for Python; see the history of the `string` module.
I think you’ll find it in many OO books. Herb Sutter discusses it here, specifically in the context of a string class.
The suggestion isn’t that essential features be in multiple places. The suggestion is that they should be implemented in free functions rather than methods, e.g. `string.find` rather than `str.find`. And once you’re already importing `string`, you could instead import `re` or something like it.
That’s fair, but then you can use `re` if you don’t want to learn something new.
Post-compile, it is identical. I also really don’t understand this interest in optimization. Are you running on ten megabyte strings? If not, who cares about speed? Nearly all of the time is going to be in setting up the call no matter how fast you make the call.
One very common case is the case-change problem: snake_case to Title Case, etc.
a) split the sentence on multiple separators
b) the separators must be kept in place to be able to join back
Note, `str.split` already offers this, but only for one specific case: whitespace. It checks internally whether a character is one of many. Also, it lacks a `keepsep` flag.
Currently this is usually solved with `re`, but the solution is fairly involved. There is a whole library for it: case-converter · PyPI
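For what it’s worth, `re` can cover requirements (a) and (b) in one call: a capturing group in the pattern makes `re.split()` keep the separators in the result, so the pieces can be joined back. A minimal sketch:

```python
import re

s = "snake_case to Title Case"
# The capturing group makes re.split() include each separator in the result.
parts = re.split(r"([_\s])", s)
print(parts)  # ['snake', '_', 'case', ' ', 'to', ' ', 'Title', ' ', 'Case']
assert "".join(parts) == s  # separators kept, so the string round-trips
```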
Yes, exactly! That is where humre is going to fail for such simple cases compared to `re` or builtins (unless I have a predefined pattern somewhere else, scattering the relevant code throughout the module).
I am unable to reproduce that.
```python
import timeit

setup = "s = 'a' * 50000 + '_' + 'a' * 50000 + '_' + 'a' * 50000"
number = 1000000

# Benchmark for find
find_time = timeit.timeit(
    "s.find('_')",
    setup=setup,
    number=number
)

# Benchmark for rfind
rfind_time = timeit.timeit(
    "s.rfind('_')",
    setup=setup,
    number=number
)

print(f"Time taken for find: {find_time} seconds")
print(f"Time taken for rfind: {rfind_time} seconds")
```
Output:

```text
Time taken for find: 0.711215250999885 seconds
Time taken for rfind: 0.7262046799987729 seconds
```
I mean the setup in the surrounding code, which has to happen no matter what. Whatever setup humre does is still going to be tiny by comparison.
And anyway, the humre compilation typically only happens once for a piece of code, unless you’re searching for different things each time, which isn’t something that’s been illustrated in any examples.
I think in practice these kinds of micro-optimizations don’t matter. Python isn’t the right language if they do matter to you.
Many thanks! I’m running macOS; it could be a bug. Tested on 3.11 & 3.12 with the same result.
Why? Could you elaborate?
Very strange:
script

```bash
# find_tuple.sh
echo best case
find-tuple/python.exe -m timeit -s "string = '/' + 'a' * 999" "string.find('/')"
find-tuple/python.exe -m timeit -s "string = 'a' * 998 + '/'" "string.rfind('/')"
echo worst case
find-tuple/python.exe -m timeit -s "string = 'a' * 1000" "string.find('/')"
find-tuple/python.exe -m timeit -s "string = 'a' * 1000" "string.rfind('/')"
```

```text
best case
10000000 loops, best of 5: 32.3 nsec per loop
5000000 loops, best of 5: 40.1 nsec per loop
worst case
5000000 loops, best of 5: 44.8 nsec per loop
500000 loops, best of 5: 639 nsec per loop
```
If you’re worried about squeezing out every bit of speed, use a language like Rust or C. In Python, I don’t think that optimization is an important consideration compared with the other considerations mentioned in this thread: good interface design and keeping the stdlib simple.
We (not me) were recently investigating whether we could avoid `re` imports because of the import time. Also, because of the setup cost, alternatives beat it for short strings, making it not objectively better. `str.find(tuple)` doesn’t suffer from those problems.
`re` is already 15x slower than `str.find`. How much slower is it going to be with humre overhead?
I think it is very wrong to argue “it is either speed or convenience”.
A good product is both fast and convenient.
Following your logic, three abstractions down nothing will be computable in time anymore.
And that is OK, because it is the wrong language.
So who is going to build the right one?
It is always easy to argue with such clichés.
It is much harder to come up with solutions that are good in all aspects.
Finally, yes, Python is slower than compiled languages.
Does that mean performance does not matter anymore?
Or does it mean that extra effort is needed for it to be competitive?