Builtins.any performance

Stefan2 · May 12, 2024, 8:24pm

Not if your code does that a lot.

dg-pb · May 12, 2024, 9:36pm

I am out of ideas for now. Explored what I could with my current knowledge.

Will keep using loops for short sequences and any(map(pred, *args_with_needed_repeats)) for long sequences.

But will keep this in mind. If anyone has any ideas, I am open - it would be nice to have non-loop instruments that work efficiently for problems of all sizes.

dg-pb · May 12, 2024, 10:29pm

Final results that I ended up with. any_mapc is a joint any and map python extension with 1 non-iterable extra arg to map’s function.

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                  10 repeats, 1,000 times                        ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ Units: ns        0        5       10       50      100     1000 ┃
┃           ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃  any_base ┃    630     1632     2621    10463    20204   196625 ┃
┃  any_true ┃    719     1500     2328     9017    17323   174015 ┃
┃   any_map ┃    325     1189     1997     8671    17061   173005 ┃
┃  any_loop ┃    134      969     1731     8798    17025   173039 ┃
┃ next_impl ┃    685     1521     2315     8857    17226   172828 ┃
┃  any_mapc ┃    173     1076     1969     9163    18391   186403 ┃
┗━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

dg-pb · May 13, 2024, 8:20am

Made a bit more progress and have an ok picture now. I tested my idea above regarding the use of partial and map version with only 1 arg, which removes a bit of overhead.

What has been done:

To make use of existing functools.partial I implemented contains_rev function with arguments swapped.
Modified map (named map1), which only accepts 1 iterable. Removing some complexity improved its performance.

Results:

def any_loop_(maps, key):
    for m in maps:
        if key in m:
            return True
    return False

# ------------------------------------
k = 0
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)             # 611
%timeit any(True for el in maps if k in el)     # 610
%timeit any(map(contains, maps, repeat(k)))     # 358
%timeit any_loop_(maps, k)                      # 159

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                    # 160
%timeit any(map1(pred, maps))                   # 146

# ------------------------------------
k = 5
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)             # 922
%timeit any(True for el in maps if k in el)     # 821
%timeit any(map(contains, maps, repeat(k)))     # 514
%timeit any_loop_(maps, k)                      # 321

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                    # 333
%timeit any(map1(pred, maps))                   # 281

# ------------------------------------
k = 10
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)         # 1300
%timeit any(True for el in maps if k in el) #  906
%timeit any(map(contains, maps, repeat(k))) #  634
%timeit any_loop_(maps, k)                  #  471

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                #  456
%timeit any(map1(pred, maps))               #  405

# ------------------------------------
k = 25
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)         # 2200
%timeit any(True for el in maps if k in el) # 1150
%timeit any(map(contains, maps, repeat(k))) # 1000
%timeit any_loop_(maps, k)                  #  924

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                #  930
%timeit any(map1(pred, maps))               #  795

# ------------------------------------
k = 50
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)         # 4.0 µs
%timeit any(True for el in maps if k in el) # 2.0 µs
%timeit any(map(contains, maps, repeat(k))) # 1.5 µs
%timeit any_loop_(maps, k)                  # 1.6 µs

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                # 1.7 µs
%timeit any(map1(pred, maps))               # 1.4 µs

# ------------------------------------
k = 100
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)         # 6.0 µs
%timeit any(True for el in maps if k in el) # 3.3 µs
%timeit any(map(contains, maps, repeat(k))) # 2.6 µs
%timeit any_loop_(maps, k)                  # 2.9 µs

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                # 3.2 µs
%timeit any(map1(pred, maps))               # 2.6 µs

# ------------------------------------
k = 1000
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)         # 63 µs
%timeit any(True for el in maps if k in el) # 29 µs
%timeit any(map(contains, maps, repeat(k))) # 23 µs
%timeit any_loop_(maps, k)                  # 28 µs

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                # 30 µs
%timeit any(map1(pred, maps))               # 26 µs

# ------------------------------------
k = 10_000
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)         # 645 µs
%timeit any(True for el in maps if k in el) # 284 µs
%timeit any(map(contains, maps, repeat(k))) # 230 µs
%timeit any_loop_(maps, k)                  # 279 µs

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                # 304 µs
%timeit any(map1(pred, maps))               # 251 µs

# ------------------------------------
k = 100_000
maps = [{}] * k + [{k: k}]
%timeit any(k in el for el in maps)         # 5.5 ms
%timeit any(True for el in maps if k in el) # 2.8 ms
%timeit any(map(contains, maps, repeat(k))) # 2.3 ms
%timeit any_loop_(maps, k)                  # 2.9 ms

pred = ftl.partial(contains_rev, k)
%timeit any(map(pred, maps))                # 3.0 ms
%timeit any(map1(pred, maps))               # 2.6 ms

Benchmark results:

Suggested approach performs well for very short lists. Loop is still slightly faster for iterables of size lower than 5. (any_loop_ function has extra function call (subtract ~50ns to compare).
Suggested approach performs best for medium size iterables - 5-100. (Or 25-100 if predicate can not be pre-stored)
Suggested approach performs 2nd best for iterables larger than 100.
One main drawback of suggested approach is that the construction of predicate is costly. The times above excludes that cost, which is ~150ns for this particular problem.
Suggested approach successfully addressed the indicated problem of “efficient predicated short-circuiting for iterables lower than size 50 without using loops”. However, for very short sequences loops would still be most optimal.

Suggested improvements:

First step to benefit from these findings would be to improve flexibility of functools.partial. This would allow efficiently constructing predicates from performant functions that exist in the standard library. Especially the ones in operator module. There have been many different attempts to implement more flexible variants of partial, which proves that such functionality is desired. To name a few:
- partiell · PyPI
- partial-apply · PyPI
- partial · PyPI
- better-partial · PyPI
  All of them are implemented in pure python and do not offer desired performance. Proposed changes:
1. A simple placeholder functionality would cover most of the needs.
2. Ability to logically aggregate partial functions would cover 99% of cases. I.e. predicate = partial(isinstance, _, str) | partial(lt, _, 5) & partial(gt, _, 2)
Performance of map can be improved by implementing more basic version with 1 argument. It could also be worthwhile exploring whether switching to simplified behaviour if only 1 iterable was provided could benefit performance of existing map function.

Summary:

The benefit of functools.partial improvement is extended functionality that enables more efficient usage of existing standard library functions (mainly operator module).
Simplified map offers up to 20% better performance for 1-argument cases.
With both improvements implemented, suggested solution to the problem would provide best performance for a certain range of problems and at least 2nd best performance for problem of any size.

dg-pb · May 13, 2024, 9:29pm

Nice Zombies:

How about for any ... in ... & for all ... in ...? Currently this is a syntax error.

# SyntaxError: invalid syntax. Perhaps you forgot a comma?
print(i == 99 for any i in range(100))
# SyntaxError: invalid syntax. Perhaps you forgot a comma?
print(i >= 0 for all i in range(100))

I agree with @pf_moore that the such proposal as it is has zero to none success. What I don’t agree with is that it’s not worth exploring this further. This would at least be an interesting exercise.

Although success of going in this direction is uncertain, honest exploration in this direction is beneficial. Results of such endeavour would be valuable regardless of the outcome as long as they bear any significant conclusions.

The way I see it, this is a valid idea. However it is very weak in a sense of 3 aspects:

Readability & intuitiveness. It just doesn’t read well.
Integration. It uses function names as keywords. Also, it is very similar to other comprehensions, while differs significantly in what it does. It could potentially be a big strain on a parser to differentiate which one it is (this is my guess, I don’t have much experience there. But this would need to be explored to put such proposal). Also, it would be confusing to the user. Given its functionality is different, its syntax would ideally be sufficiently distinct to not cause any confusions. To cover this properly, some familiarity with comprehension parsing (and with loop parsing) would be essential.
Flexibility. Comprehension expression is a non-trivial construct. To justify similar introduction, the benefits of its functionality would need to outweigh the cost (which is big on many fronts).

As opposed to this problem:

def any_loop(maps, key):
    for m in maps:
        if key in m:
            return True
    return False

It would at least need to address a more general short-circuiting problem.

def short_cicuit(iterator, expression, n):
    i = 0
    for el in iterator:
        if expression(el):
            i += 1
            if i > n:
                break
    return copysign(1, i - n)

any(key in m for m in maps): short_circuit(iterator, key in el, 1) > 0
all(key in m for m in maps): short_circuit(iterator, key not in el, 1) > 0

These would be my starting points. Interesting exercise IMO. It is difficult to have both flexible and performant short-circuiting tools. I know numpy has explored some options and still there is nothing for it.

What I am looking at is improving performance and performant interactions of existing tools. But the scope of improvements is limited.

What you are suggesting has a much bigger potential, but the size of a task is equivalently larger with much greater uncertainty of the outcome.