find best case
1000000 loops, best of 5: 303 nsec per loop
1000000 loops, best of 5: 357 nsec per loop
1000000 loops, best of 5: 260 nsec per loop
2000000 loops, best of 5: 129 nsec per loop
find mixed case
1000000 loops, best of 5: 301 nsec per loop
1000000 loops, best of 5: 370 nsec per loop
10000 loops, best of 5: 23.9 usec per loop
1000000 loops, best of 5: 240 nsec per loop
find worst case
5 loops, best of 5: 47.2 msec per loop
20 loops, best of 5: 17.1 msec per loop
5000 loops, best of 5: 45.5 usec per loop
10000 loops, best of 5: 34.2 usec per loop
rfind best case
1000000 loops, best of 5: 209 nsec per loop
20 loops, best of 5: 13.7 msec per loop
1000000 loops, best of 5: 265 nsec per loop
2000000 loops, best of 5: 163 nsec per loop
rfind mixed case
1000000 loops, best of 5: 217 nsec per loop
20 loops, best of 5: 13.6 msec per loop
10000 loops, best of 5: 23.8 usec per loop
1000000 loops, best of 5: 275 nsec per loop
rfind worst case
5 loops, best of 5: 85.9 msec per loop
10 loops, best of 5: 37.8 msec per loop
5000 loops, best of 5: 44.6 usec per loop
10000 loops, best of 5: 35.4 usec per loop
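Numbers like the above come from `timeit`; the actual benchmark script is not shown in this thread, so the sketch below is only illustrative (input string, substrings, and the baseline function are my assumptions, not the real harness):

```python
import timeit

# Illustrative worst-case input: neither substring occurs,
# so any candidate implementation must scan the whole string.
haystack = "x" * 1_000_000
needles = ("needle", "thimble")

def find_multi(s, subs):
    # Naive baseline: lowest index at which any substring occurs, -1 if none.
    hits = [i for i in (s.find(sub) for sub in subs) if i != -1]
    return min(hits) if hits else -1

t = timeit.timeit(lambda: find_multi(haystack, needles), number=20)
print(f"20 loops: {t / 20 * 1e3:.1f} msec per loop")
```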
Why don't you just return here? What exactly are you trying to benchmark?
I was talking about different language constructs.
My only concern is the worst case. find shouldn't become a performance trap; it shouldn't be naive.
Take a look at the use case above. It would make perfect sense to use string.find(english_words) there.
I've already provided the alternative in the linked thread. But that's irrelevant; simply put, find should handle these cases if you're going to allow multiple substrings.
It is relevant; we're comparing against alternatives. If I have no alternatives to compare it with, we can't say how much faster or slower it is, only that it appears inefficient, which is subjective.
I'm not saying we shouldn't benchmark this, but I need those alternatives.
This problem is well-known. It has edge cases like the thread I linked, where brute force is going to win.
I repeat, if find were to support multiple substrings as an input, it should handle these edge cases. We can't just say, "Look, find is naive, don't use it."
Could you point me to the exact problem you are referring to? The problem you linked just doesn't match the specification of what find does here. In other words, find aims to return the lowest index of any of the substrings, while the problem you are referring to needs to exit as soon as ANY substring is found.
I would be more than happy to include the case you have in mind if you wrote it down simply and clearly here. If the problem is a subset of what find does (which is the case for the problem in the linked thread), then you need to provide a solution against which to benchmark.
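To make the distinction between the two specifications concrete, here is a small sketch (the function names are mine, purely for illustration):

```python
def lowest_index(s, subs):
    # What a multi-substring find would return: the lowest index
    # at which ANY of the substrings occurs, or -1 if none occur.
    hits = [i for i in (s.find(sub) for sub in subs) if i != -1]
    return min(hits) if hits else -1

def contains_any(s, subs):
    # The linked problem: just report whether ANY substring occurs,
    # which allows an early exit on the first hit.
    return any(sub in s for sub in subs)

print(lowest_index("spam and eggs", ("eggs", "and")))  # 5
print(contains_any("spam and eggs", ("eggs", "and")))  # True
```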
I have reconsidered. Although it might seem that those cases are not covered by the current benchmark, they actually are. There are 3 cases when searching for 2 substrings in a very long string:
1. Both substrings are at the start of the long string. This is the same as a short string with a large tuple; although the tuple here is not that large, it captures the nuance well, and a longer tuple would just scale complexity linearly.
2. One substring is hit early, while the other sits at the back of the very long string. This tests early stopping. Again, more items in the tuple would just scale complexity linearly.
3. There are no hits in the very long string, so the algorithm has to run a full scan. Again, increasing the number of substrings in the tuple would just scale complexity linearly.
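The three cases above can be sketched with illustrative inputs (the substrings and lengths are mine, not the actual benchmark data):

```python
n = 1_000_000
needles = ("a", "b")

case1 = "ab" + "x" * n        # 1. both substrings hit at the very start
case2 = "a" + "x" * n + "b"   # 2. one early hit, the other only at the back
case3 = "x" * n               # 3. no hits at all: forces a full scan

for s in (case1, case2, case3):
    # lowest index of any needle, or -1 if none occur
    hits = [i for i in (s.find(t) for t in needles) if i != -1]
    print(min(hits) if hits else -1)  # 0, 0, -1
```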
So the benchmark suite is complete. And the alternative methods were chosen to be efficient, so that for each benchmark case there is at least one alternative which, in theory, should deal well with it.
If you have a specific case that you can write down (without imports, please), together with an alternative method against which to benchmark, I can run it.
But the method needs to be efficient so that there is actual value in benchmarking: an algorithm written in pure Python using loops has no chance of outperforming the current implementation.
And I would need code that I can run to do this.
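For reference, this is the kind of pure-Python, loop-based approach meant here; it is correct, but it has no realistic chance against a C-level implementation:

```python
def find_any_loop(s, subs):
    # Scan positions left to right; the first position at which any
    # substring starts is, by construction, the lowest index overall.
    for i in range(len(s)):
        for sub in subs:
            if s.startswith(sub, i):
                return i
    return -1

print(find_any_loop("spam and eggs", ("eggs", "and")))  # 5
print(find_any_loop("spam", ("eggs",)))                 # -1
```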