Not if your code does that a lot.
I am out of ideas for now. Explored what I could with my current knowledge.
Will keep using loops for short sequences and any(map(pred, *args_with_needed_repeats))
for long sequences.
But will keep this in mind. If anyone has any ideas, I am open - it would be nice to have non-loop instruments that work efficiently for problems of all sizes.
Final results that I ended up with. any_mapc
is a joint any
and map
python extension with 1 non-iterable extra arg to map
βs function.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 10 repeats, 1,000 times β
β£ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ«
β Units: ns 0 5 10 50 100 1000 β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ«
β any_base β 630 1632 2621 10463 20204 196625 β
β any_true β 719 1500 2328 9017 17323 174015 β
β any_map β 325 1189 1997 8671 17061 173005 β
β any_loop β 134 969 1731 8798 17025 173039 β
β next_impl β 685 1521 2315 8857 17226 172828 β
β any_mapc β 173 1076 1969 9163 18391 186403 β
βββββββββββββ»ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Made a bit more progress and have an ok picture now. I tested my idea above regarding the use of partial
and map
version with only 1 arg, which removes a bit of overhead.
What has been done:
- To make use of existing
functools.partial
I implementedcontains_rev
function with arguments swapped. - Modified
map
(namedmap1
), which only accepts 1 iterable. Removing some complexity improved its performance.
Results:
def any_loop_(maps, key):
for m in maps:
if key in m:
return True
return False
# ------------------------------------
k = 0
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 611
%timeit any(True for el in maps if k in el) # 610
%timeit any(map(contains, maps, repeat(k))) # 358
%timeit any_loop_(maps, k) # 159
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 160
%timeit any(map1(pred, maps)) # 146
# ------------------------------------
k = 5
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 922
%timeit any(True for el in maps if k in el) # 821
%timeit any(map(contains, maps, repeat(k))) # 514
%timeit any_loop_(maps, k) # 321
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 333
%timeit any(map1(pred, maps)) # 281
# ------------------------------------
k = 10
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 1300
%timeit any(True for el in maps if k in el) # 906
%timeit any(map(contains, maps, repeat(k))) # 634
%timeit any_loop_(maps, k) # 471
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 456
%timeit any(map1(pred, maps)) # 405
# ------------------------------------
k = 25
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 2200
%timeit any(True for el in maps if k in el) # 1150
%timeit any(map(contains, maps, repeat(k))) # 1000
%timeit any_loop_(maps, k) # 924
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 930
%timeit any(map1(pred, maps)) # 795
# ------------------------------------
k = 50
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 4.0 Β΅s
%timeit any(True for el in maps if k in el) # 2.0 Β΅s
%timeit any(map(contains, maps, repeat(k))) # 1.5 Β΅s
%timeit any_loop_(maps, k) # 1.6 Β΅s
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 1.7 Β΅s
%timeit any(map1(pred, maps)) # 1.4 Β΅s
# ------------------------------------
k = 100
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 6.0 Β΅s
%timeit any(True for el in maps if k in el) # 3.3 Β΅s
%timeit any(map(contains, maps, repeat(k))) # 2.6 Β΅s
%timeit any_loop_(maps, k) # 2.9 Β΅s
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 3.2 Β΅s
%timeit any(map1(pred, maps)) # 2.6 Β΅s
# ------------------------------------
k = 1000
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 63 Β΅s
%timeit any(True for el in maps if k in el) # 29 Β΅s
%timeit any(map(contains, maps, repeat(k))) # 23 Β΅s
%timeit any_loop_(maps, k) # 28 Β΅s
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 30 Β΅s
%timeit any(map1(pred, maps)) # 26 Β΅s
# ------------------------------------
k = 10_000
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 645 Β΅s
%timeit any(True for el in maps if k in el) # 284 Β΅s
%timeit any(map(contains, maps, repeat(k))) # 230 Β΅s
%timeit any_loop_(maps, k) # 279 Β΅s
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 304 Β΅s
%timeit any(map1(pred, maps)) # 251 Β΅s
# ------------------------------------
k = 100_000
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps) # 5.5 ms
%timeit any(True for el in maps if k in el) # 2.8 ms
%timeit any(map(contains, maps, repeat(k))) # 2.3 ms
%timeit any_loop_(maps, k) # 2.9 ms
pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps)) # 3.0 ms
%timeit any(map1(pred, maps)) # 2.6 ms
Benchmark results:
- Suggested approach performs well for very short lists. Loop is still slightly faster for iterables of size lower than 5. (
any_loop_
function has extra function call (subtract ~50ns to compare). - Suggested approach performs best for medium size iterables - 5-100. (Or 25-100 if predicate can not be pre-stored)
- Suggested approach performs 2nd best for iterables larger than 100.
- One main drawback of suggested approach is that the construction of predicate is costly. The times above excludes that cost, which is ~150ns for this particular problem.
- Suggested approach successfully addressed the indicated problem of βefficient predicated short-circuiting for iterables lower than size 50 without using loopsβ. However, for very short sequences loops would still be most optimal.
Suggested improvements:
- First step to benefit from these findings would be to improve flexibility of
functools.partial
. This would allow efficiently constructing predicates from performant functions that exist in the standard library. Especially the ones inoperator
module. There have been many different attempts to implement more flexible variants ofpartial
, which proves that such functionality is desired. To name a few:- partiell Β· PyPI
- partial-apply Β· PyPI
- partial Β· PyPI
- better-partial Β· PyPI
All of them are implemented in pure python and do not offer desired performance. Proposed changes:
- A simple placeholder functionality would cover most of the needs.
- Ability to logically aggregate partial functions would cover 99% of cases. I.e.
predicate = partial(isinstance, _, str) | partial(lt, _, 5) & partial(gt, _, 2)
- Performance of
map
can be improved by implementing more basic version with 1 argument. It could also be worthwhile exploring whether switching to simplified behaviour if only 1 iterable was provided could benefit performance of existingmap
function.
Summary:
- The benefit of
functools.partial
improvement is extended functionality that enables more efficient usage of existing standard library functions (mainlyoperator
module). - Simplified
map
offers up to 20% better performance for 1-argument cases. - With both improvements implemented, suggested solution to the problem would provide best performance for a certain range of problems and at least 2nd best performance for problem of any size.
I agree with @pf_moore that the such proposal as it is has zero to none success. What I donβt agree with is that itβs not worth exploring this further. This would at least be an interesting exercise.
Although success of going in this direction is uncertain, honest exploration in this direction is beneficial. Results of such endeavour would be valuable regardless of the outcome as long as they bear any significant conclusions.
The way I see it, this is a valid idea. However it is very weak in a sense of 3 aspects:
- Readability & intuitiveness. It just doesnβt read well.
- Integration. It uses function names as keywords. Also, it is very similar to other comprehensions, while differs significantly in what it does. It could potentially be a big strain on a parser to differentiate which one it is (this is my guess, I donβt have much experience there. But this would need to be explored to put such proposal). Also, it would be confusing to the user. Given its functionality is different, its syntax would ideally be sufficiently distinct to not cause any confusions. To cover this properly, some familiarity with comprehension parsing (and with loop parsing) would be essential.
- Flexibility. Comprehension expression is a non-trivial construct. To justify similar introduction, the benefits of its functionality would need to outweigh the cost (which is big on many fronts).
As opposed to this problem:
def any_loop(maps, key):
for m in maps:
if key in m:
return True
return False
It would at least need to address a more general short-circuiting problem.
def short_cicuit(iterator, expression, n):
i = 0
for el in iterator:
if expression(el):
i += 1
if i > n:
break
return copysign(1, i - n)
any(key in m for m in maps): short_circuit(iterator, key in el, 1) > 0
all(key in m for m in maps): short_circuit(iterator, key not in el, 1) > 0
These would be my starting points. Interesting exercise IMO. It is difficult to have both flexible and performant short-circuiting tools. I know numpy
has explored some options and still there is nothing for it.
What I am looking at is improving performance and performant interactions of existing tools. But the scope of improvements is limited.
What you are suggesting has a much bigger potential, but the size of a task is equivalently larger with much greater uncertainty of the outcome.