# `repeat` compound statement

So the situation is: “repeating a block of code n times”.

These are the timings of different approaches.

Code
```python
from itertools import repeat, count, islice

R_NONE = repeat(None)

def test0(n):
    for _ in islice(R_NONE, n):
        pass

def test1(n):
    for _ in repeat(None, n):
        pass

def test2(n):
    for _ in range(n):
        pass

def test3(n):
    i = 0
    while i < n:
        i += 1

def test4(n):
    while n:
        n -= 1

def test5(n):
    ct = count()
    while next(ct) < n:
        pass

ns = [1, 5, 10, 50, 100, 1000]
tests = {
    'irepeat': test0,
    'repeat': test1,
    'range': test2,
    'while': test3,
    'while-': test4,
    'count': test5,
}
```
```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃              5 repeats, 1,000 times                  ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ Units: ns     1     5    10      50     100     1000 ┃
┃           ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃   irepeat ┃ 216   248   301     659   1,139    9,553 ┃
┃    repeat ┃ 214   239   273     541     876    7,054 ┃
┃     range ┃ 192   225   236     518     856   21,247 ┃
┃     while ┃  74   148   260   1,025   2,064   35,814 ┃
┃    while- ┃  76   135   232     903   1,741   31,311 ┃
┃     count ┃ 274   415   566   1,855   3,594   49,131 ┃
┗━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```

The problem is:

• There doesn’t seem to be one best method to do this
• For `n <= 10`, use `while`
• For `10 < n <= 100`, use `for _ in range`
• For `100 < n`, use `for _ in itertools.repeat(None, n)`

The question is:

• Could there be one without introducing new compound statement?
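For what it’s worth, the `range` and `repeat` regimes can already be folded into one helper (the `reps` name and the 100 cut-off are mine, read off the table above); the small-`n` `while` case can’t be wrapped this way, and the call itself adds overhead:

```python
from itertools import repeat

def reps(n):
    # hypothetical helper: pick the cheaper iterable per the timings above
    return range(n) if n <= 100 else repeat(None, n)

assert sum(1 for _ in reps(5)) == 5
assert sum(1 for _ in reps(1000)) == 1000
```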

Why would you care about performance at all when `n` is small? Of course you want the solution that scales well, and that would be `itertools.repeat`, which does not waste time creating any new int objects.

You can achieve similar performance with `range` if you stick to small ints but it’s gonna be ugly and needs extra loop nesting:

```python
for _ in range(n // 256):
    for _ in range(256):
        pass
for _ in range(n % 256):
    pass
```
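The magic number 256 here is CPython’s small-int cache bound: ints in [-5, 256] are pre-allocated singletons, so staying inside that band means the loop never allocates a fresh int object. A quick CPython-specific check:

```python
# CPython implementation detail (not a language guarantee): ints in
# [-5, 256] are cached singletons; larger ints are allocated fresh.
assert int("256") is int("256")      # inside the cache: same object
assert int("257") is not int("257")  # outside the cache: new objects
```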

These timings only matter when your loop body is trivial. Otherwise, who cares about a nanosecond here or there? Write whichever one is the most readable, and move on.


Darn, I thought I could just add another way and run the benchmark, but the benchmarking code is missing…


Apologies for this, but my timers have a lot of local dependencies. If you share your code, I will add it to the benchmark.

It could be inside another loop with many iterations.


Have you never done something small often?

If your repeats are small, it’s not just “a nanosecond here or there” but up to over 100 ns per element.


Some ideas:

```python
for _ in (0,) * n:
    pass

for _ in [0] * n:
    pass

for _ in b'.' * n:
    pass

n += 0.0
while n:
    n -= 1.0
```

[chart: timings of each approach relative to plain `range(n)`]

Ben’s version is plotted as RangeBH, Stefan’s (`b'.' * n`) as rangeSt.

It’s a ratio: each timing divided by that of a simple `range(n)`.


(I’m going to ignore the n == 1 case as almost completely irrelevant; you don’t find yourself writing heaps of loops in tight places where you need to iterate exactly once.)

The most obvious way to write this is `for _ in range(n)`, so I’ll use that as the baseline. The fastest option for small loops is either that, or a while loop, with at most about 18ns per iteration difference. At larger loop counts, other options start to become better, but still with about the same performance advantage - 28ns per iteration. So, how significant is that? How much can you do in 28ns?

Note that from here on, the numbers will be different, since I’m running this on my computer rather than using the table given. I took two of the test cases and added a couple that actually put some work into the loop.

```python
from itertools import repeat

def test1(n):
    for _ in repeat(None, n):
        pass

def test2(n):
    for _ in range(n):
        pass

def test3(n):
    x = 1
    for _ in repeat(None, n):
        x = x * 2 - 1

def dummy(): pass

def test4(n):
    for _ in repeat(None, n):
        dummy()
```

Results:

```
rosuav@sikorsky:~/tmp$ python3 -m timeit -s 'from timings import test1' 'test1(1000)'
100000 loops, best of 5: 2.97 usec per loop
rosuav@sikorsky:~/tmp$ python3 -m timeit -s 'from timings import test2' 'test2(1000)'
50000 loops, best of 5: 5.45 usec per loop
rosuav@sikorsky:~/tmp$ python3 -m timeit -s 'from timings import test3' 'test3(1000)'
50000 loops, best of 5: 8.91 usec per loop
rosuav@sikorsky:~/tmp$ python3 -m timeit -s 'from timings import test4' 'test4(1000)'
20000 loops, best of 5: 10.7 usec per loop
```

The difference between the fastest option given and the simplest to read is about 2.5ns per loop. (My computer’s notably faster than the OP’s, where the difference was about ten times that, but I expect the relative differences to be consistent.) And the difference between that fast one with `pass` as its body, and the same loop header with some very simple arithmetic as the body, is about 6ns per loop. It’s even more obvious with a function call, where you can see three times as much difference between “do nothing” and “call a function that does nothing” as there is between the types of loop headers.

So my conclusion is: It does not matter. The time difference is meaningless. Write the most obvious code: `for _ in range(n):` as it is fast enough.

You can always make microbenchmarks to prove anything you like. But it needs to be put into perspective, otherwise you end up in the position of ranking all cigarettes by how much of each toxin they have, proving that whichever one has the lowest must be the healthiest. [1] Make sure your numbers actually show something useful, otherwise it’s not worth sacrificing readability for a difference you won’t even notice.

1. Yes, that actually happened, and the company in question ran with it in their ads. The actual study was trying to show that they were all equally bad, but numbers being numbers, one of them is bound to be the lowest. ↩︎


Side point: Don’t do this. Please. Make something we can actually run - science is repeatable. The standard library contains a `timeit` module that will serve this purpose very well (see my example). Even if you use your own internal timing library for your initial research, publish something that we can actually test out. Ideally, I should be able to run your exact code completely unchanged, and see the exact same (relative) differences that you see; that’s always the first step, to calibrate the unavoidable differences between our computers.

Plus, that way, anyone can quickly and easily test your results against different CPython versions, different interpreters (how does this go in PyPy, for example?), or different CPU architectures. Hiding it all behind unpublished code forces us to reinvent that ourselves, and then hope that it’s actually comparable.


This one is so far the best. It would be nice to get the 1-5 tail down; that range does become a bottleneck for me from time to time. But I’m not sure that’s possible without implementing a `repeat` statement.

Also:

```python
import array
array.array('i', [0]) * 10_000
# is much faster than
[0] * 10_000
```

That can be helpful from time to time, but for this case `bytes` is even faster.
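Part of why `bytes` wins here: iterating it yields ints in range(256), every one of which sits in CPython’s small-int cache, so the loop allocates nothing per element. A quick (CPython-specific) check:

```python
# Iterating bytes yields small ints, all pre-cached by CPython.
data = b'.' * 3
assert list(data) == [46, 46, 46]  # ord('.') == 46
assert data[0] is data[1]          # same cached int object (CPython detail)
```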


Discourse uses some Bayesian filters, etc. to flag spam and abuse automatically. Maybe it got hung up on your first word there, we don’t have visibility into that. I restored the post.


Then why does `timeit` use `itertools.repeat` instead?


Unusual situations call for unusual considerations. If I had to guess, it would be that you could disrupt the timings by allocating and deallocating integer objects, but using `itertools.repeat` ensures that it’s reusing the same object every time. This is highly unusual and applicable only to that specific situation. I highly doubt that it’s simply “because it’s faster” - if that were the goal, `timeit` itself would have been written in C.
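That reuse is easy to observe: `repeat` hands back the identical object on every iteration, so nothing new is allocated no matter how many iterations run. A small check (the `sentinel` name is mine):

```python
from itertools import repeat

sentinel = object()
it = repeat(sentinel, 3)
# every step yields the very same object; no per-iteration allocation
assert next(it) is sentinel
assert next(it) is sentinel
```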


Now I’m curious how that would be done and whether it would actually be faster. I can even imagine it being slower, due to the back-and-forth between C and Python code. The `itertools.repeat` overhead per iteration in Python code seems to only be around 10 ns (i.e., when I timeit just `'pass'`).


Here is an example, which exposes the significance of this from my POV.

These situations, in my experience, mostly happen in graph/trie frameworks or anything similar that needs to traverse parents (e.g. Python frames). The issue is that such frameworks provide building-block components that should and will be used for different problems. Sometimes the depth is 1, sometimes 3, sometimes 1000.

Below you can see that the difference can be 20-60% in favour of `while` and 17-28% in favour of `repeat` (for the selected functions that are called in loops).

Is it a lot? In the grand scheme of things, nothing major. But say this constitutes 1/3 of the total run time of, say, a query; then 15 improvements like that might double the total speed.

Also, I would like to think that “There should be one-- and preferably only one --obvious way to do it.” still matters.

And finally, a compound `repeat` statement would be faster than any of the methods examined, for any number of repetitions, as all of the mentioned methods carry unnecessary overhead.

```python
import string
from itertools import repeat
from os.path import dirname

PATHS = ['/'.join(string.ascii_letters[i:i+10]) for i in range(20)]

def get_parents_while(paths, lvl=3, func=dirname):
    result = list()
    for path in paths:
        n = lvl
        while n:
            n -= 1
            path = func(path)
        result.append(path)
    return result

def get_parents_range(paths, lvl=3, func=dirname):
    result = list()
    for path in paths:
        for _ in range(lvl):
            path = func(path)
        result.append(path)
    return result

def get_parents_repeat(paths, lvl=3, func=dirname):
    result = list()
    for path in paths:
        for _ in repeat(None, lvl):
            path = func(path)
        result.append(path)
    return result

func = dirname                                  # 350ns / call
%timeit get_parents_while(PATHS, 1)             # 13.3µs
%timeit get_parents_range(PATHS, 1)             # 14.8µs (+11%)
%timeit get_parents_repeat(PATHS, 1)            # 16.6µs (+24%)

%timeit get_parents_while(PATHS, 3)             # 38.6µs
%timeit get_parents_range(PATHS, 3)             # 39.2µs (+ 2%)
%timeit get_parents_repeat(PATHS, 3)            # 40.8µs (+ 6%)

%timeit get_parents_while(PATHS, 1000)          # 8.1ms (+17%)
%timeit get_parents_range(PATHS, 1000)          # 7.3ms (+ 6%)
%timeit get_parents_repeat(PATHS, 1000)         # 6.9ms

func = lambda x: x.rsplit('/', 1)[0]            # 130ns / call
%timeit get_parents_while(PATHS, 1, func)       # 4.7µs
%timeit get_parents_range(PATHS, 1, func)       # 6.5µs (+38%)
%timeit get_parents_repeat(PATHS, 1, func)      # 7.6µs (+60%)

%timeit get_parents_while(PATHS, 3, func)       # 12.6µs
%timeit get_parents_range(PATHS, 3, func)       # 13.8µs (+ 9%)
%timeit get_parents_repeat(PATHS, 3, func)      # 14.8µs (+17%)

%timeit get_parents_while(PATHS, 1000, func)     # 3.2ms (+28%)
%timeit get_parents_range(PATHS, 1000, func)     # 2.9ms (+16%)
%timeit get_parents_repeat(PATHS, 1000, func)    # 2.5ms
```

So ok, the following is very inconvenient, but so far the fastest solution.

```python
RANGES512 = [tuple(range(i)) for i in range(512)]
RNG512 = tuple(range(512))

def f_dgpb2(n):
    if n < 4:
        while n:
            n -= 1
        return
    try:
        for _ in RANGES512[n]:
            pass
    except IndexError:
        rngs = [RANGES512[n % 512]] + [RNG512] * (n // 512)
        for rng in rngs:
            for _ in rng:
                pass
```

There is one obvious way to do it, which is `for i in range(n)`. It’s just that trying to squeeze every possible nanosecond out of things isn’t generally part of “doing the obvious thing” in Python.


Try replacing `tuple(range(i))` with `tuple(j % 4 for j in range(i))` and `tuple(range(512))` with `tuple(range(4)) * 128`.
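With that change, both lookup tables hold only ints from CPython’s small-int cache, so iterating them never touches a freshly allocated object; the modified tables would look like:

```python
# every element is 0..3, i.e. a cached small int in CPython
RANGES512 = [tuple(j % 4 for j in range(i)) for i in range(512)]
RNG512 = tuple(range(4)) * 128  # length 512, values cycling 0..3
```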
