Ok, here are results and my own solutions. I modified the benchmark script slightly, it’s still the same benchmark, but run several times.
1.67 1.66 1.71 hprodh_numpy_accumulate
1.76 1.81 1.79 eendebakpt_numpy_accumulate
2.62 2.63 2.67 eendebakpt_numpy_compress
3.60 3.59 3.62 Stefan2_nested_loops_float
3.85 3.86 3.90 Stefan2_nested_loops
3.94 3.97 3.97 Stefan2_float
4.09 4.09 4.09 Stefan2_hi_lo
4.13 4.15 4.12 eendebakpt_lengthhint_for
4.16 4.17 4.16 eendebakpt_lengthhint
4.30 4.31 4.30 Stefan2_pairs_even
4.38 4.42 4.43 Stefan2_loop
4.41 4.41 4.43 eendebakpt_loop
4.93 4.94 4.99 kapinga_loop
8.02 8.08 8.09 Stefan2_accumulate
Python: 3.13.0 (main, Nov 9 2024, 10:04:25) [GCC 14.2.1 20240910]
Full script and output are at the end of this post.
Now my solutions:
Straightforward loop
My own starting point was the straightforward loop as well:
def maxes_Stefan2_loop(xs):
count = 0
X = 0
for x in xs:
if x > X:
X = x
count += 1
return count
The for x in xs
and if x > X
are rather essential and the X = x
is fast, but the count += 1
has room for improvement.
Floats are faster than ints
This one is dirty, as it can “only” count up to 253 = 9,007,199,254,740,992.
def maxes_Stefan2_float(xs):
count = 0.
X = 0
for x in xs:
if x > X:
X = x
count += 1.
return int(count)
Small ints (-5 to 256) are faster
CPython has them cached and doesn’t create new int objects for them. So I use two counters: one somewhat for the low 8 bits, and one for the higher bits which updates only rarely. The low counter goes backwards because it’s fast to check whether it’s 0.
def maxes_Stefan2_hi_lo(xs):
X = 0
hi = lo = 256
for x in xs:
if x > X:
X = x
lo -= 1
if not lo:
lo = 256
hi += 256
return hi - lo
Update counter only every second time
Count not the individual maxes but pairs of them, so we create a new int object only half as often. Here I use a bool flag to know whether I have seen an even or odd number of maxes.
def maxes_Stefan2_pairs_even(xs):
X = 0
pairs, even = 0, True
for x in xs:
if x > X:
X = x
if even:
even = False
else:
pairs += 1
even = True
return 2 * pairs + (not even)
Two states via nested loops
Again count only every second max. My two loops represent two program states: In the first=outer loop, X
is the max so far and I’m looking for an even larger x
. In the second=inner loop, x
is the max so far and I’m looking for an even larger X
. Whenever a new max is found, I switch to the other state by running into or dropping out of the inner loop. Besides doing count +=
only half as often, I also don’t need an X = x
anymore.
def maxes_Stefan2_nested_loops(xs):
X = 0
count = 0
xs = iter(xs)
for x in xs:
if x > X:
for X in xs:
if X > x:
count += 2
break
else:
return count + 1
return count
I’ve also used this “two states via nested loops” for a few other things, for example for merging two sorted iterables (see the merge_nested_states
function in the code there).
Same but with floats
Just dirty with floats again, and since I count only pairs, it works up to 254.
def maxes_Stefan2_nested_loops_float(xs):
X = 0
pairs = 0.
xs = iter(xs)
for x in xs:
if x > X:
for X in xs:
if X > x:
pairs += 1.
break
else:
return 2 * int(pairs) + 1
return 2 * int(pairs)
Pretty
from itertools import accumulate
def maxes_Stefan2_accumulate(xs):
return len(set(accumulate(xs, max)))
Full script and output
Script
def maxes_kapinga_loop(xs):
n = 1
x_max = xs[0]
for x in xs[1:]:
if x > x_max:
n += 1
x_max = x
return n
def maxes_eendebakpt_loop(xs):
n = 0
x_max = -1
for x in xs:
if x > x_max:
n += 1
x_max = x
return n
import array
import numpy as np
m3 = np.array([-1, -1, -1], dtype=np.int32)
def maxes_eendebakpt_numpy_compress(xs):
a = np.array(array.array('I', xs)).astype(np.int32)
prev = -1
while True:
bb = np.concatenate( (m3, a))
good = a > bb[2:-1]
np.logical_and(a > bb[1:-2], good, out=good)
np.logical_and(a > bb[:-3], good, out=good)
a = a.compress(good)
if a.size == prev:
return a.size
prev = a.size
import numpy as np
import numpy as np
def maxes_eendebakpt_numpy_accumulate(xs):
a = np.array(array.array('I', xs))
b = np.maximum.accumulate(a)
return (np.diff(b)>0).sum() + (a.size>0)
import itertools
def maxes_eendebakpt_lengthhint(xs):
x_max = -1
c = itertools.repeat(None, len(xs))
for x in xs:
if x > x_max:
next(c)
x_max = x
return len(xs) - c.__length_hint__()
def maxes_eendebakpt_lengthhint_for(xs):
x_max = -1
c = itertools.repeat(None, len(xs))
for x in xs:
if x > x_max:
for _ in c: break
x_max = x
return len(xs) - c.__length_hint__()
def maxes_hprodh_numpy_accumulate(xs):
a = np.array(array.array('I', xs))
b = np.maximum.accumulate(a)
return np.count_nonzero(np.diff(b)) + (a.size > 0)
def maxes_Stefan2_loop(xs):
count = 0
X = 0
for x in xs:
if x > X:
X = x
count += 1
return count
def maxes_Stefan2_float(xs):
count = 0.
X = 0
for x in xs:
if x > X:
X = x
count += 1.
return int(count)
def maxes_Stefan2_hi_lo(xs):
X = 0
hi = lo = 256
for x in xs:
if x > X:
X = x
lo -= 1
if not lo:
lo = 256
hi += 256
return hi - lo
def maxes_Stefan2_pairs_even(xs):
X = 0
pairs, even = 0, True
for x in xs:
if x > X:
X = x
if even:
even = False
else:
pairs += 1
even = True
return 2 * pairs + (not even)
def maxes_Stefan2_nested_loops(xs):
X = 0
count = 0
xs = iter(xs)
for x in xs:
if x > X:
for X in xs:
if X > x:
count += 2
break
else:
return count + 1
return count
def maxes_Stefan2_nested_loops_float(xs):
X = 0
pairs = 0.
xs = iter(xs)
for x in xs:
if x > X:
for X in xs:
if X > x:
pairs += 1.
break
else:
return 2 * int(pairs) + 1
return 2 * int(pairs)
from itertools import accumulate
def maxes_Stefan2_accumulate(xs):
return len(set(accumulate(xs, max)))
funcs = [
maxes_kapinga_loop,
maxes_eendebakpt_loop,
maxes_eendebakpt_numpy_compress,
maxes_eendebakpt_numpy_accumulate,
maxes_eendebakpt_lengthhint,
maxes_eendebakpt_lengthhint_for,
maxes_hprodh_numpy_accumulate,
maxes_Stefan2_loop,
maxes_Stefan2_float,
maxes_Stefan2_hi_lo,
maxes_Stefan2_pairs_even,
maxes_Stefan2_nested_loops,
maxes_Stefan2_nested_loops_float,
maxes_Stefan2_accumulate,
]
from itertools import accumulate
from time import perf_counter as time
from statistics import mean, stdev
import random
import sys
# Reproducibility
seed = random.randrange(10**9)
print(f'{seed=}')
random.seed(seed)
# Correctness
for _ in range(10):
xs = tuple(accumulate(random.choices(range(-1, 3), k=10**5), initial=10**6))
expect = funcs[0](xs)
for f in funcs[1:]:
result = f(xs)
if result != expect:
raise Exception(f'{f.__name__} counted {result} instead of {expect}.')
# Speed
timess = {f: '' for f in funcs}
for i in range(5):
xs = tuple(accumulate(random.choices(range(-1, 3), k=10**5), initial=10**6))
times = {f: [] for f in funcs}
def stats(f):
ts = [t * 1e3 for t in sorted(times[f])[:5]]
return f'{mean(ts):5.2f} ± {stdev(ts):4.2f} ms '
for _ in range(100):
random.shuffle(funcs)
for f in funcs:
t = time()
f(xs)
times[f].append(time() - t)
for f in sorted(funcs, key=stats):
print(stats(f), f.__name__.removeprefix('maxes_'))
if i >= 2:
timess[f] += stats(f)[:5]
print()
for f in sorted(funcs, key=stats):
print(timess[f], f.__name__.removeprefix('maxes_'))
print('\nPython:', sys.version)
Attempt This Online!
Output
seed=185826456
1.72 ± 0.04 ms hprodh_numpy_accumulate
1.85 ± 0.03 ms eendebakpt_numpy_accumulate
2.74 ± 0.01 ms eendebakpt_numpy_compress
3.62 ± 0.01 ms Stefan2_nested_loops_float
3.86 ± 0.01 ms Stefan2_nested_loops
3.95 ± 0.02 ms Stefan2_float
4.11 ± 0.01 ms Stefan2_hi_lo
4.13 ± 0.01 ms eendebakpt_lengthhint_for
4.14 ± 0.04 ms eendebakpt_lengthhint
4.29 ± 0.02 ms Stefan2_pairs_even
4.38 ± 0.03 ms Stefan2_loop
4.42 ± 0.02 ms eendebakpt_loop
4.98 ± 0.04 ms kapinga_loop
8.18 ± 0.05 ms Stefan2_accumulate
1.69 ± 0.03 ms hprodh_numpy_accumulate
1.77 ± 0.04 ms eendebakpt_numpy_accumulate
2.64 ± 0.04 ms eendebakpt_numpy_compress
3.60 ± 0.01 ms Stefan2_nested_loops_float
3.88 ± 0.01 ms Stefan2_nested_loops
3.95 ± 0.02 ms Stefan2_float
4.08 ± 0.01 ms Stefan2_hi_lo
4.11 ± 0.02 ms eendebakpt_lengthhint_for
4.15 ± 0.02 ms eendebakpt_lengthhint
4.30 ± 0.01 ms Stefan2_pairs_even
4.39 ± 0.01 ms Stefan2_loop
4.40 ± 0.02 ms eendebakpt_loop
4.94 ± 0.04 ms kapinga_loop
7.98 ± 0.09 ms Stefan2_accumulate
1.67 ± 0.03 ms hprodh_numpy_accumulate
1.76 ± 0.02 ms eendebakpt_numpy_accumulate
2.62 ± 0.02 ms eendebakpt_numpy_compress
3.60 ± 0.02 ms Stefan2_nested_loops_float
3.85 ± 0.02 ms Stefan2_nested_loops
3.94 ± 0.01 ms Stefan2_float
4.09 ± 0.02 ms Stefan2_hi_lo
4.13 ± 0.01 ms eendebakpt_lengthhint_for
4.16 ± 0.01 ms eendebakpt_lengthhint
4.30 ± 0.03 ms Stefan2_pairs_even
4.38 ± 0.02 ms Stefan2_loop
4.41 ± 0.02 ms eendebakpt_loop
4.93 ± 0.06 ms kapinga_loop
8.02 ± 0.01 ms Stefan2_accumulate
1.66 ± 0.01 ms hprodh_numpy_accumulate
1.81 ± 0.02 ms eendebakpt_numpy_accumulate
2.63 ± 0.02 ms eendebakpt_numpy_compress
3.59 ± 0.02 ms Stefan2_nested_loops_float
3.86 ± 0.03 ms Stefan2_nested_loops
3.97 ± 0.02 ms Stefan2_float
4.09 ± 0.02 ms Stefan2_hi_lo
4.15 ± 0.02 ms eendebakpt_lengthhint_for
4.17 ± 0.01 ms eendebakpt_lengthhint
4.31 ± 0.02 ms Stefan2_pairs_even
4.41 ± 0.01 ms eendebakpt_loop
4.42 ± 0.01 ms Stefan2_loop
4.94 ± 0.05 ms kapinga_loop
8.08 ± 0.05 ms Stefan2_accumulate
1.71 ± 0.01 ms hprodh_numpy_accumulate
1.79 ± 0.01 ms eendebakpt_numpy_accumulate
2.67 ± 0.02 ms eendebakpt_numpy_compress
3.62 ± 0.01 ms Stefan2_nested_loops_float
3.90 ± 0.01 ms Stefan2_nested_loops
3.97 ± 0.02 ms Stefan2_float
4.09 ± 0.01 ms Stefan2_hi_lo
4.12 ± 0.03 ms eendebakpt_lengthhint_for
4.16 ± 0.03 ms eendebakpt_lengthhint
4.30 ± 0.02 ms Stefan2_pairs_even
4.43 ± 0.01 ms Stefan2_loop
4.43 ± 0.02 ms eendebakpt_loop
4.99 ± 0.05 ms kapinga_loop
8.09 ± 0.03 ms Stefan2_accumulate
1.67 1.66 1.71 hprodh_numpy_accumulate
1.76 1.81 1.79 eendebakpt_numpy_accumulate
2.62 2.63 2.67 eendebakpt_numpy_compress
3.60 3.59 3.62 Stefan2_nested_loops_float
3.85 3.86 3.90 Stefan2_nested_loops
3.94 3.97 3.97 Stefan2_float
4.09 4.09 4.09 Stefan2_hi_lo
4.13 4.15 4.12 eendebakpt_lengthhint_for
4.16 4.17 4.16 eendebakpt_lengthhint
4.30 4.31 4.30 Stefan2_pairs_even
4.38 4.42 4.43 Stefan2_loop
4.41 4.41 4.43 eendebakpt_loop
4.93 4.94 4.99 kapinga_loop
8.02 8.08 8.09 Stefan2_accumulate
Python: 3.13.0 (main, Nov 9 2024, 10:04:25) [GCC 14.2.1 20240910]