# 1. Work
I managed to combine all the good tricks I found in the library into one dynamic solution, which seems to perform well and eliminates the hard-coded boundaries for algorithm selection.
1. Instead of 3 different implementations, only one (`horspool_find`) is now called, for both forward and reverse search. It dynamically falls back to the solution with assured linear complexity (`two_way_find`) when it predicts that the latter will perform better (see the sketch after this list).
2. Direction-agnostic logic allows `rfind` to use the exact same code as `find`.
3. Added a special case for `n == m` that uses `memcmp`.
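To illustrate the dispatch, here is a minimal Python-level sketch. The real implementation is C code in `Objects/stringlib/fastsearch.h`; the cost model and threshold below are hypothetical, the Horspool scan is reduced to a naive alignment loop, and `two_way_find` is stubbed with the built-in `str.find`/`str.rfind`:

```python
# Hypothetical sketch of the dynamic dispatch, not the actual C code.

def two_way_find(haystack, needle, reverse):
    # Stand-in for the guaranteed-linear Crochemore-Perrin search.
    return haystack.rfind(needle) if reverse else haystack.find(needle)

def horspool_find(haystack, needle, reverse):
    n, m = len(haystack), len(needle)
    work, budget = 0, 4 * n          # hypothetical cost model
    positions = range(n - m, -1, -1) if reverse else range(n - m + 1)
    for i in positions:
        work += m                    # pessimistic cost of this alignment
        if work > budget:
            # Predicted to degrade: hand off to the linear algorithm.
            return two_way_find(haystack, needle, reverse)
        if haystack[i:i + m] == needle:
            return i
    return -1

def find(haystack, needle, reverse=False):
    n, m = len(haystack), len(needle)
    if m > n:
        return -1
    if m == n:                       # new special case: one memcmp-style compare
        return 0 if haystack == needle else -1
    return horspool_find(haystack, needle, reverse)
```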
# 2. Results
The aggregate impact of this change seems to be net positive. It delivers a non-trivial average performance increase, brings the more advanced search algorithms to reverse search, smooths out the performance "surface", and improves the general code structure and documentation.
Benefits:
1. The performance "surface" is much smoother now. Only one piece of logic can cause a step change, and it is dynamic, as opposed to the many hard-coded step changes of the current logic.
2. The direction-agnostic logic works well and removes the strain of keeping two separate implementations in sync.
3. Benchmarks:
* Average **75%** performance increase of `find` on the artificial shuffled-alphabet benchmark.
* Average **34%** performance increase of `find` for real-file searches across different slice lengths.
* Average **247%** performance increase of `rfind` on the artificial shuffled-alphabet benchmark.
Worth noting:
1. Splitting the two directions (forward and reverse) into two separate implementations would give another 10-30% of performance on the tested benchmarks. However, I think the single direction-agnostic implementation is a good trade-off, given the maintainability advantages of this approach.
2. There are areas and cases where the new algorithm performs worse (see the benchmarks). However, they are either not clustered, or where they are clustered, the performance decrease is not substantial.
# 3. Benchmarks
Each benchmark value is computed as follows:
```python
current_runtime = ...  # run time of the current CPython build
new_runtime = ...      # run time of this PR's build
result = (new_runtime - current_runtime) / min(new_runtime, current_runtime)
```
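A negative value therefore means the new build is faster. For example (numbers made up for illustration):

```python
current_runtime, new_runtime = 1.0, 0.8  # seconds, illustrative numbers
result = (new_runtime - current_runtime) / min(new_runtime, current_runtime)
print(result)  # -0.25, i.e. the new build is 25% faster
```

Dividing by the minimum keeps the scale symmetric: a 2x speedup and a 2x slowdown both report as ±100%.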
### 3.1. Artificial dataset via randomized alphabet.
<details><summary>Case Generation Code</summary>
<p>
```python
import random

# shuffled alphabet with Zipf-distributed letter frequencies
alphabet = 'DHUXYEZQCLFKISBVRGNAMWPTOJ'
zipf = [1/x for x in range(1, 1 + len(alphabet))]

def zipf_string(length, seed):
    letters = random.Random(seed).choices(alphabet, weights=zipf, k=length)
    return ''.join(letters)
NLS = [  # needle lengths
    2, 3, 4, 6,
    8, 12, 16, 24,
    32, 48, 64, 96,
    128, 192, 256, 384,
    500, 1000, 10000, 100_000,
]
HSS = [  # haystack sizes
    500, 750, 1000, 1500,
    2_000, 3_000, 4_000, 6_000,
    8_000, 12_000, 16_000, 24_000,
    32_000, 48_000, 64_000, 96_000,
    1_000_000,
]
def generate_benchmarks():
    output = []
    for m in NLS:
        for n in HSS:
            if n < m:
                continue
            for s in (1, 2, 3):
                seed = (s * n + m) % 1_000_003
                needle = zipf_string(m, seed)
                haystack = zipf_string(n, seed ** 2)
                name = f"needle={m}, haystack={n}, seed={s}"
                output.append((name, needle, haystack))
    # PATH: output directory for the generated benchmark module
    with open(f"{PATH}/_generated.py", 'w') as f:
        print("benches = [", file=f)
        for name, needle, haystack in output:
            print(f"    {(name, needle, haystack)!r},", file=f)
        print("]", file=f)
```
</p>
</details>
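The generated cases can then be timed with a harness along these lines (a hypothetical sketch; the module and runner names are assumptions, not part of the PR):

```python
import timeit
from _generated import benches  # module written by generate_benchmarks()

def run(cases, number=100, repeat=5):
    results = {}
    for name, needle, haystack in cases:
        timer = timeit.Timer(lambda: haystack.find(needle))
        # best-of-repeat, normalized to seconds per call
        results[name] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results

for name, seconds in run(benches).items():
    print(f"{name}: {seconds * 1e9:.0f} ns/call")
```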
<details><summary>1.a. Results. Current vs new `str.find/str.count`.</summary>
<p>
<img width="963" alt="Screenshot 2024-06-27 at 15 06 37" src="https://github.com/python/cpython/assets/3577712/812f7a33-3118-460b-aa88-bc6710cde333">
</p>
</details>
Comparison for `len(haystack) == 1000` for `str.find`. The x-axis is `"{needle_len}:{seed}"`; the upper chart is run time, the lower chart is the percentage difference. It depicts the issue this PR is addressing, i.e. big sub-optimal step changes in performance for small input changes.
<img width="607" alt="Screenshot 2024-06-27 at 14 12 55" src="https://github.com/python/cpython/assets/3577712/7a5ea910-3f61-4fcf-9a74-57791c72d788">
<details><summary>1.b. Results. Current vs new `str.rfind/str.rcount`.</summary>
<p>
<img width="597" alt="Screenshot 2024-06-08 at 23 23 17" src="https://github.com/python/cpython/assets/3577712/d00493a1-c58b-4479-9bd1-8f7d2425e5fa">
</p>
</details>
### 3.2. Search for arbitrary chunks in real files.
<details><summary>Case Generation Code</summary>
<p>
```python
from pathlib import Path

CPYTHON_PATH = Path(".")  # assumed: path to a CPython checkout/build
FILES = {
    "c": (CPYTHON_PATH / "Objects" / "unicodeobject.c").read_text(),
    "py": (CPYTHON_PATH / "Lib" / "_pydecimal.py").read_text(),
    "en": (CPYTHON_PATH / "Doc" / "library" / "stdtypes.rst").read_text(),
    "bin": (CPYTHON_PATH / "python.exe").read_bytes(),
}
MS = [10, 15, 20, 30, 40, 60, 80, 120, 160, 240, 320, 640, 1280]
MR = range(12)
def generate_benchmarks():
    results = dict()
    for file_label, haystack in FILES.items():
        n = len(haystack)
        for m in MS:
            for i in MR:
                # deterministic pseudo-random slice start within the file
                stt = (1_000_003 * i) % (n - m)
                needle = haystack[stt:stt + m]
                results[(m, file_label, i)] = haystack, needle
    return results
```
</p>
</details>
<details><summary>2.a. Results. Current vs new `str.find/str.count`.</summary>
<p>
<img width="735" alt="Screenshot 2024-06-27 at 15 06 54" src="https://github.com/python/cpython/assets/3577712/609500cd-7fd9-4276-9569-2c97114aefba">
</p>
</details>
* Issue: gh-119702