# Builtins.any performance

This is not a proposal; I would just like to share some information and a graph, and find out what others think.

The reason I chose to spend more time on this is that it is one of the very few cases I have seen where a plain loop ends up being the most performant solution.

I have compared different approaches to using `any` on a sequence whose elements need to be mapped through an operation first.

I was looking at `ChainMap.__contains__` performance and I think it is a good example case where performance of `any` function for short sequences is important.
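For reference, `ChainMap.__contains__` in CPython is essentially the `any_base` pattern benchmarked below (paraphrased from `Lib/collections/__init__.py`; `chainmap_contains` here is just a stand-in name for illustration):

```python
from collections import ChainMap

def chainmap_contains(cm, key):
    # Paraphrase of what ChainMap.__contains__ does: short-circuit
    # over the underlying mappings with a generator expression.
    return any(key in m for m in cm.maps)

cm = ChainMap({'a': 1}, {'b': 2}, {'c': 3})
assert chainmap_contains(cm, 'b') == ('b' in cm)
assert chainmap_contains(cm, 'x') == ('x' in cm)
```

So any overhead in the `any(... for ...)` pattern is paid on every `in` test against a `ChainMap`.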

Tested functions:

``````from itertools import repeat
from operator import contains

def any_base(maps, key):
    return any(key in el for el in maps)

def any_true(maps, key):
    return any(True for el in maps if key in el)

def any_map(maps, key):
    return any(map(contains, maps, repeat(key)))

def any_loop(maps, key):
    for m in maps:
        if key in m:
            return True
    return False
``````

Benchmark numbers are relative to the base case. (Disregard time units)

Interesting fact: for size 1M, `any_true` is the most performant, although it is almost as bad as the base case for small sizes.

So it is just a bit of food for thought. Is it possible to improve the performance of any of the above solutions in C? Do they need to be improved at all? Is it possible to beat the simple loop for small-size problems?
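For reproducibility, here is a minimal sketch of the kind of harness that could produce such numbers (timings below are illustrative only and will vary by machine and Python version; the four functions are repeated so the snippet is self-contained):

```python
from itertools import repeat
from operator import contains
from timeit import timeit

def any_base(maps, key):
    return any(key in el for el in maps)

def any_true(maps, key):
    return any(True for el in maps if key in el)

def any_map(maps, key):
    return any(map(contains, maps, repeat(key)))

def any_loop(maps, key):
    for m in maps:
        if key in m:
            return True
    return False

# Small case: 10 single-key dicts, key found in the last one
# (the worst case for short-circuiting).
maps = [{i: None} for i in range(10)]

for fn in (any_base, any_true, any_map, any_loop):
    t = timeit(lambda: fn(maps, 9), number=10_000)
    print(f'{fn.__name__}: {t:.4f}s')
```

Varying the number of dicts (1, 10, 50, 1M) reproduces the size axis of the graph.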


I feel like we could replace most occurrences of `any_base` in the stdlib with `any_true`.

For `all()` it looks like this (not sure if that's faster):

``````def all_false(maps, key):
    return all(False for el in maps if key not in el)
``````

If `any()` were a keyword, this wouldn't have been an issue, but that's not happening.

Added one more data point with 1M elements. Fewer repetitions, so not as accurate as the one above. It seems that `any_true` performs best for large sequences.

For the problem `any(el > 0 for el in iterable)`, `any_true` outperforms the others at 100K but is in line with them at 1M, so the results are not conclusive for large sizes.

You call `any()` with different arguments. It is not surprising that it has different performance (and may have a different result) for different inputs. The difference lies outside of `any()`.

Of course, you can make `any()` detect that the argument is an instance of GeneratorType, analyze the bytecode of the corresponding generator function, and create a modified generator function and object. That would improve performance by a few percent for large input in this particular case, but it would make it slower in all other cases, not counting the maintenance cost and the risk of affecting other parts of the interpreter.


I recognise that. I am not emphasising `any`, but looking at possible ways to come up with a more performant solution to the indicated problem (regardless of whether it has anything to do with the implementation of `any` or not).

Would it be possible to get the performance of a loop if a joint function were made out of `map` and `any`, with the signature of `map` but implicitly doing `any`?

``````def any_map(func, *args):
``````

Or alternatively, maybe using `any(map())` is fine with more performant predicates?
e.g.:

``````import itertools as itl
import operator as opr

a = [0] * 10 + [1]
%timeit any(map(bool, a))  # 279 ns
%timeit any(map(opr.gt, a, itl.repeat(0)))  # 505 ns
``````

Maybe there exists a way of pre-compiling `gt_0 = lambda x: x > 0` that would bring performance closer to `any(map(bool, ...))`?
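One thing worth trying (a sketch, not a benchmark claim): `functools.partial` over `operator.lt` gives a C-level predicate equivalent to `gt_0`, which avoids the Python-level frame that a lambda sets up on every call. Whether it actually beats the lambda here needs measuring:

```python
from functools import partial
from operator import lt
from timeit import timeit

a = [0] * 10 + [1]

gt_0_lambda = lambda x: x > 0
gt_0_partial = partial(lt, 0)   # lt(0, x) == (0 < x) == (x > 0)

# All three predicates agree on this data.
assert any(map(bool, a)) == any(map(gt_0_lambda, a)) == any(map(gt_0_partial, a))

for name, pred in [('bool', bool),
                   ('lambda', gt_0_lambda),
                   ('partial(lt, 0)', gt_0_partial)]:
    t = timeit(lambda: any(map(pred, a)), number=100_000)
    print(f'{name}: {t:.4f}s')
```

`bool` should remain the cheapest since it involves no comparison at all, but it only coincides with `x > 0` for non-negative numbers.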

`any` and `all` are builtin function names; I don't think there is a chance that they become keywords.

Personally, I don't have issues with the syntax of any of the above (except the loop); I am more concerned with finding a way to improve the performance of what exists.

I wonder if `next` with a default of `False` would be faster? Something like

``````def next_impl(maps, key):
    return next((True for el in maps if key in el), False)
``````

After all, the comparison done at each step by `any` is unnecessary.

Its speed is in line with `any_true`.

Ohhhh right, "at each step" was the clue; there's only ever one unnecessary comparison anyway, so barely anything to save. Is it any significant win at the few-entries end of the scale, where that cost isn't amortised?

Added your proposed function, and a function that checks containment in C directly:

``````static PyObject*
any_key_c(PyObject* self, PyObject* args)
{
    PyObject* func;
    PyObject* iterable;
    PyObject* func_arg = NULL;

    if (!PyArg_ParseTuple(args, "OOO", &func, &iterable, &func_arg))
        return NULL;
    PyObject *it = PyObject_GetIter(iterable);
    PyObject *item;
    if (!it)
        return NULL;
    while ((item = Py_TYPE(it)->tp_iternext(it))) {
        if (PySequence_Contains(item, func_arg)) {
            Py_DECREF(item);
            Py_DECREF(it);
            Py_RETURN_TRUE;
        }
        Py_DECREF(item);
    }
    Py_DECREF(it);
    if (PyErr_Occurred()) {
        if (PyErr_ExceptionMatches(PyExc_StopIteration))
            PyErr_Clear();
        else
            return NULL;
    }
    Py_RETURN_FALSE;
}
``````


This feels more like it's to do with the performance characteristics of `ChainMap` (which I imagine isn't optimised for this use case). Do other containers show similar results?

Yes, it is very specific to `ChainMap`. But I am just using this example to look into the performance of "predicated short-circuiting".

This is not the first time I have replaced `any` with a `for` loop. It becomes highly preferable for sequences of small size (up to 50 elements).

Just to give another example: graph building. There is a node which contains other nodes of different types, and it is often necessary to check whether all sub-nodes are of the same type.

``````class Node:
    def __init__(self, func, args):
        self.func = func
        self.args = args

...
if all(isinstance(a, Atom) for a in node.args):
    # do something simple
...
``````

As in the `ChainMap` case, this has a significant impact when the operation is done many times: on every `__contains__` call for `ChainMap`, and on recursive operations over the graph in the example above.
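The loop version of that check, which the small-size results favour, would look like this (a sketch, with stand-in `Atom` and `Node` classes since the original graph code is not shown):

```python
class Atom:
    pass

class Node:
    def __init__(self, func, args):
        self.func = func
        self.args = args

def all_atoms(node):
    # Loop equivalent of: all(isinstance(a, Atom) for a in node.args)
    for a in node.args:
        if not isinstance(a, Atom):
            return False
    return True

node = Node(None, [Atom(), Atom()])
assert all_atoms(node)

mixed = Node(None, [Atom(), object()])
assert not all_atoms(mixed)
```

For the typical case of a handful of `args`, this skips the generator and `all()` call overhead entirely.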

One thing I can think of: two steps that could improve the performance of the `any_map` solution, which I quite like as it does fairly well for any size now:

1. Implement a more efficient `map` that takes only one iterable argument, so that there is less construction overhead.
2. Implement `partial` in C. Argument skipping would also be needed. I have already implemented `EMPTY` placeholders for `partial` in Python - a useful feature, which I got from the `partial` package on PyPI and some other package. In my experience an `EMPTY` placeholder is much more useful than the `Ellipsis` of the `partiell` package.

Result:

``````pred = c_partial(opr.contains, EMPTY, key)
any(one_arg_map(pred, maps))
``````
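`c_partial` and `one_arg_map` above are hypothetical names. For illustration, here is one way an `EMPTY`-placeholder `partial` could be sketched in pure Python (note: I believe CPython 3.14 added `functools.Placeholder` with similar semantics):

```python
from operator import contains

EMPTY = object()  # sentinel marking a positional slot to fill at call time

def partial_with_placeholders(func, *fixed):
    """Like functools.partial, but EMPTY slots are filled
    left-to-right with call-time positional arguments."""
    def wrapper(*args):
        it = iter(args)
        merged = [next(it) if a is EMPTY else a for a in fixed]
        return func(*merged, *it)  # remaining args appended at the end
    return wrapper

# pred(m) == contains(m, 'key') == ('key' in m)
pred = partial_with_placeholders(contains, EMPTY, 'key')
assert pred({'key': 1})
assert not pred({'other': 1})
```

A C implementation would avoid the per-call list construction, which is exactly the overhead the post is pointing at.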

I don't believe that. It doesn't make sense, and I can't reproduce it:

``````41.8 ms  any(el > 0 for el in iterable)
18.7 ms  any(True for el in iterable if el > 0)
40.4 ms  any(el > 0 for el in iterable)
19.9 ms  any(True for el in iterable if el > 0)
40.5 ms  any(el > 0 for el in iterable)
24.4 ms  any(True for el in iterable if el > 0)
``````
Script


``````from timeit import repeat

setup = 'iterable = [0] * 1_000_000 + [1]'

for code in [
    'any(el > 0 for el in iterable)',
    'any(True for el in iterable if el > 0)',
] * 3:
    t = min(repeat(code, setup, number=1, repeat=25))
    print(f'{t*1e3:4.1f} ms ', code)
``````

Attempt This Online!

How did you measure and what were your results?


Apologies for the confusion: by "others" I meant all other solutions except the base case - `any(el > 0 for el in iterable)` - which I already assumed to be by far the worst for any size.

Anything is technically possible, but I don't think this is likely to get accepted. At an absolute minimum you would need to:

1. Define the semantics much more precisely. At the moment all you have is a vague example. Imagine you were going to implement this and write the spec for what it would do and how it would work.
2. Research the history. There have been many discussions in the past about extending generator expressions and for loops. Read those, and understand why the various proposals didn't get accepted. Work out why your proposal is different, and what about it addresses the problems that caused previous proposals to fail.
3. Find real world examples where your proposal would offer genuine benefits. Remember that the original post in this thread was explicitly presented as some interesting performance results, and not as an actual problem needing a solution.

Even if you do all of this (and it's a lot of work - expect to spend days working on it before you have anything worth posting), the likelihood is that you'll get very little enthusiasm for the idea, and it will ultimately not get accepted (especially if you are not able or willing to implement it yourself, which would involve getting familiar with the interpreter code). If that's not something you are OK with, my advice is to drop the idea.

Also, remember that anyone who reads your proposal and comments on it is spending their free time on it. It's not just your time that gets spent on a proposal.

Personally, I think you should drop the idea. The OP's results sound more like an interesting exercise in optimising Python code, and not something that demands a new language feature.


Your graphs show a difference of about 10% between the worst and best variants. You can get rid of some overhead of the Python interpreter by rewriting your code in C. It is boring and error-prone (in your other example you forgot that `PySequence_Contains()` can fail), so it is better to use a Python compiler, like Cython or Nuitka.

But there's no way a 10% increase is enough to add a new highly specialized built-in function to Python.


Yeah, I know, I saw that `operator.contains` has some overhead to address this.

That is for large iterables.
For an iterable of size 1, max minus min is 40% relative to base; for size 10 it is 30%, and for size 50 it is 25%. And this is pretty much the area I am addressing.

Yeah, I am not proposing to introduce a specialized version for this. I have only checked its performance to see the theoretical lower bound.

But for an iterable of size 1, no one cares about the relative performance because the time to do one iteration is already negligible.

See my earlier reply: Builtins.any performance - #15 by dgrigonis