I can just beat the top score if I cheat and use C but only if use clang. It drops to a distant 2nd place with gcc. Frustratingly, most of the time is spent just converting the tuple to an array. I even had to switch it to unsigned integers because the conversion time doubled otherwise.
import io
import array
import os
from cslug import CSlug, ptr
slug = CSlug("maxes-c", io.StringIO("""
#include <stddef.h>
size_t maxes(size_t *xs, size_t size) {
if (!size) return 0;
size_t output = 1;
size_t max = xs[0];
for (size_t i = 1; i < size; i++) {
if (xs[i] > max) {
max = xs[i];
output++;
}
}
return output;
}
"""))
os.environ["CC"] = "clang"
slug.make()
def maxes_bwoodsend_c(xs):
return slug.dll.maxes(ptr(array.array("Q", xs)), len(xs))
With clang:
0.89 ± 0.00 ms bwoodsend_c
1.06 ± 0.00 ms hprodh_numpy_accumulate
1.16 ± 0.01 ms eendebakpt_numpy_accumulate
With gcc:
1.06 ± 0.01 ms hprodh_numpy_accumulate
1.14 ± 0.00 ms bwoodsend_c
1.15 ± 0.00 ms eendebakpt_numpy_accumulate