Add tuple support to more str functions

It’s not only speed, readability also counts: Justifying Python language changes | Curious Efficiency

def find1(p):
    seps = r"\/"
    i, n = 0, len(p)
    while i < n and p[i] not in seps:
        i += 1
    if i == n:
        i = -1
    return i

def find4(p):
    seps = r"\/"
    i = p.find(tuple(seps))
    return i

The first approach is just ridiculous (this isn’t C).


Does someone want to run my benchmark on their PC?

1 Like

If you don’t want to write specialised functions for this, maybe you want to consider the possibility of dividing it into segments of a certain size?

I.e.:

def find(s, subs):
    i, n = 0, len(s)
    while i < n:
        for sub in subs:
            any_find_slice(s, i, min(n, i + constant_partition))
            ...
        if found:
            break
        else:
            i += constant_partition

If you tune constant_partition, it is possible that the cost of the extra C function calls would be justified by the benefit of early stopping, leading to an overall better-balanced solution.
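The sketch above isn’t runnable as written; a self-contained version of the chunked early-stop idea might look like this (hedged: `chunked_find` and the window-extension logic are my reconstruction, and 4096 is an arbitrary placeholder for constant_partition):

```python
def chunked_find(s, subs, chunk=4096):
    """Scan s in fixed-size windows, calling str.find() once per
    substring per window, so a hit near the front stops early."""
    n = len(s)
    start = 0
    while start < n:
        window_end = min(n, start + chunk)
        best = -1
        for sub in subs:
            # extend the bound so matches straddling the window edge are
            # still seen, but none starting at window_end or later
            pos = s.find(sub, start, window_end + len(sub) - 1)
            if pos != -1 and (best == -1 or pos < best):
                best = pos
        if best != -1:
            return best
        start += chunk
    return -1
```

A match found in an early window saves scanning the rest of the string for every remaining substring, which is exactly the early-stop benefit described above.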

I’m unclear why “this API is faster” is being used to justify “it needs to be in cpython”. Plenty of fast APIs are provided by third-party libraries. Performance should be completely irrelevant.

I would argue the burden this proposal needs to shoulder is explaining why this API is of sufficiently wide use and would make code sufficiently more readable that it will benefit being a cpython battery. Nothing else is relevant.

2 Likes

I think this should be raised as a separate issue and not mixed up with this work.

Yeah, that’s why I’m asking someone else to run the benchmark. (I uninstalled my developer tools on Windows).

It seems it is a Mac issue, so someone with Linux or Windows is best.

Create a thread in Help (not in Ideas) if you are willing to pursue this issue. Explain the issue and provide a Python script to conduct the benchmark, rather than a bash script.

I ran it for strings of 1 million characters (rfind() should only be about 144x slower with memrchr()):

find best case - 1.42x faster - regex - 3.65x slower
2000000 loops, best of 5: 109 nsec per loop
1000000 loops, best of 5: 281 nsec per loop
2000000 loops, best of 5: 122 nsec per loop
5000000 loops, best of 5: 77 nsec per loop
find mixed case - 144x slower - regex - 2.53x slower
2000000 loops, best of 5: 111 nsec per loop
1000000 loops, best of 5: 280 nsec per loop
20000 loops, best of 5: 16.2 usec per loop
20000 loops, best of 5: 16 usec per loop
find worst case - no difference - regex - 151x slower
5 loops, best of 5: 50.9 msec per loop
50 loops, best of 5: 4.81 msec per loop
10000 loops, best of 5: 32.1 usec per loop
10000 loops, best of 5: 31.8 usec per loop
rfind best case - 1.42x faster - regex - 8739x slower
2000000 loops, best of 5: 118 nsec per loop
500 loops, best of 5: 728 usec per loop
2000000 loops, best of 5: 152 nsec per loop
5000000 loops, best of 5: 83.3 nsec per loop
rfind mixed case - 5034x slower - regex - 6161x slower
2000000 loops, best of 5: 118 nsec per loop
500 loops, best of 5: 727 usec per loop
500 loops, best of 5: 596 usec per loop
500 loops, best of 5: 594 usec per loop
rfind worst case - no difference - regex - 4.66x slower
5 loops, best of 5: 51.7 msec per loop
50 loops, best of 5: 5.54 msec per loop
200 loops, best of 5: 1.19 msec per loop
200 loops, best of 5: 1.19 msec per loop

OK, re is very bad at this (mostly because it can’t search in reverse).
Speed is compared against the fastest algorithm (that doesn’t cheat).

The find1 function is pretty inefficient. You could give this other one a shot:

def find1(p):
    seps = {'\\', '/'}
    for i, c in enumerate(p):
        if c in seps:
            return i
    return -1
1 Like

Yeah that works, thanks.

I was suggesting that this could be the function you are looking for. It is approximately 1.2 times faster than your find1 function. Its time complexity is O(n) in the worst case, where n is the length of the string.

The time complexity of the algorithm proposed in the pull request is O(nk) in the worst case, where n is the length of the string and k is the length of the tuple.

I am curious about the speed difference for a longer seps tuple. At what tuple length does the proposed algorithm become slower than the find1 function?

def find1(p):
    seps = {'\\', '/'}  # what length?
    for i, c in enumerate(p):
        if c in seps:
            return i
    return -1
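To answer the crossover question empirically, a small harness can time one O(n) pure-Python scan against k sequential C-level str.find() passes for growing k (hedged: `find_set` and `find_sequential` are my own stand-ins, not the patch; the sizes and repeat counts are arbitrary):

```python
import timeit

def find_set(p, seps):
    # one pure-Python pass: O(n) regardless of len(seps)
    sep_set = set(seps)
    for i, c in enumerate(p):
        if c in sep_set:
            return i
    return -1

def find_sequential(p, seps):
    # k C-level scans, narrowing the search bound after each hit: O(n*k)
    best = -1
    for sep in seps:
        pos = p.find(sep, 0, None if best == -1 else best)
        if pos != -1:
            best = pos
    return best

if __name__ == "__main__":
    haystack = "a" * 50_000
    for k in (2, 4, 8, 16, 32):
        # absent characters force the worst case: full scans, no early exit
        seps = tuple(chr(0x100 + j) for j in range(k))
        t_set = timeit.timeit(lambda: find_set(haystack, seps), number=10)
        t_seq = timeit.timeit(lambda: find_sequential(haystack, seps), number=10)
        print(f"k={k:2d}  python-scan {t_set:.4f}s  k-scans {t_seq:.4f}s")
```

The k at which the `k-scans` column overtakes `python-scan` is the crossover point being asked about; it will vary by machine.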

Regex gets absolutely smoked (yes that’s 100 thousand times slower).
If there was a flag to find the last match…

find best case - 1.76x faster - regex - 4.13x slower
1000000 loops, best of 5: 177 nsec per loop
1000000 loops, best of 5: 277 nsec per loop
2000000 loops, best of 5: 118 nsec per loop
5000000 loops, best of 5: 67.1 nsec per loop
find mixed case - 2.64x faster - regex - 4.08x slower
2000000 loops, best of 5: 177 nsec per loop
1000000 loops, best of 5: 7 nsec per loop
20000 loops, best of 5: 16 usec per loop
5000000 loops, best of 5: 67.1 nsec per loop
find worst case - 1.12x faster - regex - 165x slower
5 loops, best of 5: 40.9 msec per loop
50 loops, best of 5: 4.82 msec per loop
10000 loops, best of 5: 32.6 usec per loop
10000 loops, best of 5: 29.2 usec per loop
rfind best case - 1.20x faster - regex - 100,544x slower
2000000 loops, best of 5: 110 nsec per loop
50 loops, best of 5: 9.24 msec per loop
2000000 loops, best of 5: 153 nsec per loop
5000000 loops, best of 5: 91.9 nsec per loop
rfind mixed case - 1.29x faster - regex - 108,696x slower
2000000 loops, best of 5: 110 nsec per loop
50 loops, best of 5: 9.25 msec per loop
500 loops, best of 5: 598 usec per loop
5000000 loops, best of 5: 85.1 nsec per loop
rfind worst case - 1.03x slower - regex - 18.7x slower
5 loops, best of 5: 49.2 msec per loop
10 loops, best of 5: 22.3 msec per loop
200 loops, best of 5: 1.19 msec per loop
200 loops, best of 5: 1.22 msec per loop

Compared once again to the fastest alternative.

Summary of what has been done and current case:

  1. Integration. Given that str.startswith & str.endswith already accept a tuple of substrings, this extension is not something completely new; it follows the design of a couple of existing string methods.
  2. Functionality. There were 2 options for this extension: limit substrings to single characters (the way Go has it, as suggested by @methane: strings package - strings - Go Packages) or support multi-character strings. It so happened that there is an easy and effective way to implement multi-character support while keeping the API (i.e. the tuple argument) in line with str.startswith. This provides complete functionality compared to a single-character implementation, where any future need for multi-string support would require a further extension, with all the issues that brings, on top of keeping backwards compatibility with the existing incomplete addition.
  3. Maintenance. Although there have been comments that the implementation is complex, it does not seem so to me. There were details to sort out along the way, as the bucketing implementation was worked out without any example to follow, but the final result is a well-defined, simple algorithm that handles every case properly and efficiently and reuses the existing single-substring functions. The maintenance cost is therefore much lower than for a newly written algorithm such as Aho–Corasick algorithm - Wikipedia. The equivalent Python implementation is clear and simple:
def find_tuple(s, subs, start=None, end=None):
    CHUNK_SIZE = 10_000
    start = 0 if start is None else start
    end = len(s) if end is None else end
    result = -1
    while result == -1 and start <= end:
        cur_end = start + CHUNK_SIZE
        # `- 1` & `cur_end + len(sub)` down below
        # means minimum 1 character in current chunk
        cur_end -= 1
        for sub in subs:
            sub_end = cur_end + len(sub)
            sub_end = min(sub_end, end)
            pos = s.find(sub, start, sub_end)
            if pos != -1:
                if pos == start:
                    return start
                # match with `cur_end = pos` would be as good
                # `- 1` here to only allow for later match
                cur_end = pos - 1
                result = pos
        start += CHUNK_SIZE
    return result
  4. Performance. Although performance wasn’t the main target, the optimization above ensures that this implementation is at least as fast as any other solution currently available in Python for any case (it is possible, with some effort, to beat it by a small percentage, but unlikely in practice). This means the implemented functionality strongly satisfies “There should be one-- and preferably only one --obvious way to do it.”, while the status quo is that different solutions are needed for different cases to get optimal performance. The suggested implementation performs very well in practice; more complex implementations would only provide marginal improvement at significantly greater maintenance cost.
  5. Readability. All of the alternative solutions are less readable than the suggested implementation; cases and examples where this extension would improve readability have been given earlier in this thread.
  6. Calibration. CHUNK_SIZE was set to 10_000 because a Python call, with all of its overhead, takes about as much time as an actual scan of a 10_000-character string. Smaller sizes bring no significant improvement in short-running cases, while a small negative impact (<5% for CHUNK_SIZE=1000) is observed for full scans of 1M-character strings. Larger sizes hurt mixed cases (where one substring is found early but full scans are still done for all the remaining ones). The selected number is a well-balanced starting value; fine-tuning is straightforward and can always be done later if such a need arises.
  7. Additional information. This solution provides “the first/last match with the lowest/highest STARTING index”. If there is a need, it is easy to get (at extra cost) “the first/last match with the lowest/highest ENDING index” by:
s[::-1].rfind((sub1[::-1], sub2[::-1]))    # first lowest ending index
s[::-1].find((sub1[::-1], sub2[::-1]))    # last highest ending index

So it also provides simple solutions for 2 more cases, as opposed to a design where these 2 were directly reversible and the reversal above would just map them onto each other.
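To make the index arithmetic behind that reversal explicit, here is a hedged sketch using a pure-Python stand-in for the proposed tuple rfind (`tuple_rfind` and `find_lowest_end` are hypothetical helpers of my own, not part of the patch):

```python
# Stand-in for the proposed s.rfind(tuple): the match with the
# highest starting index (hypothetical helper, not the actual patch).
def tuple_rfind(s, subs):
    return max((s.rfind(sub) for sub in subs), default=-1)

# "First match with the lowest ENDING index" via the reversal trick:
# reverse the haystack and every needle, rfind, then map the index back.
def find_lowest_end(s, subs):
    j = tuple_rfind(s[::-1], tuple(sub[::-1] for sub in subs))
    return -1 if j == -1 else len(s) - j  # exclusive end index in s

# "X" ends at index 3, "dY" ends at index 6 -> lowest ending index is 3
assert find_lowest_end("abXcdY", ("X", "dY")) == 3
```

The conversion `len(s) - j` turns a starting index in the reversed string into an (exclusive) ending index in the original one.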

To sum up:
This addition doesn’t overly rely on any single point from the above, but is rather balanced in benefits from each.

1 Like

Is there someone who would like to run the benchmark on Linux? The performance of rfind() is vastly different on macOS.

Could you add this function to the benchmark?

def find_oneliner(p):
    subs = ('\\', '/')
    return next((i for i in map(lambda x: p.find(x), subs) if i != -1), -1)

Doesn’t work, returns wrong result for '/\\':

>>> find_oneliner("/\\")
1

This is good enough:

def find3(p):
    sep = "\\"
    altsep = "/"
    i = p.find(sep)
    new_i = p.find(altsep, 0, None if i == -1 else i)
    if new_i != -1:
        i = new_i
    return i

Also you shouldn’t directly return the value, they’re only in functions to benchmark them.

It should be this other one. Feel free to adjust.

def find_oneliner(p):
    subs = ('\\', '/')
    i = min((i for i in map(lambda x: p.find(x), subs) if i != -1), default=-1)
    return i


print(find_oneliner("/\\"))
print(find_oneliner("0/\\"))
print(find_oneliner("01/\\"))

Output:

0
1
2

You also need real-world usage datasets to benchmark these functions. It’s not sufficient to benchmark only a two-item tuple.

Here is a use case:

As a oneliner, min([i for s in subs if (i := p.find(s)) != -1], default=-1) is faster.

source

# min((i for i in map(lambda x: p.find(x), subs) if i != -1), default=-1)
oneliner1: Mean +- std dev: 759 ns +- 7 ns
# min([i for i in map(p.find, subs) if i != -1], default=-1)
oneliner2: Mean +- std dev: 560 ns +- 9 ns
# min([i for s in subs if (i := p.find(s)) != -1], default=-1)
oneliner3: Mean +- std dev: 482 ns +- 6 ns
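For reference, the three one-liners being timed, reconstructed from the comments above as functions (the default `subs` value is my assumption), agree on the earlier test inputs:

```python
def oneliner1(p, subs=('\\', '/')):
    return min((i for i in map(lambda x: p.find(x), subs) if i != -1), default=-1)

def oneliner2(p, subs=('\\', '/')):
    return min([i for i in map(p.find, subs) if i != -1], default=-1)

def oneliner3(p, subs=('\\', '/')):
    # walrus form: one p.find() call per substring, no lambda overhead
    return min([i for s in subs if (i := p.find(s)) != -1], default=-1)

# all three agree, including the no-match case
for p in ("/\\", "0/\\", "01/\\", "no-separators"):
    assert oneliner1(p) == oneliner2(p) == oneliner3(p)
```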
1 Like

And against find3()? It seems slower, as it searches beyond the first match.
The goal was to compare it against different strategies, not oneliners.