Micro-optimization challenge: Count increasing maximums

I can just beat the top score if I cheat and use C, but only if I use clang. It drops to a distant 2nd place with gcc. Frustratingly, most of the time is spent just converting the tuple to an array. I even had to switch to unsigned integers because the conversion time doubled otherwise.

import io
import array
import os
from cslug import CSlug, ptr

slug = CSlug("maxes-c", io.StringIO("""
    #include <stddef.h>

    size_t maxes(size_t *xs, size_t size) {
        if (!size) return 0;
        size_t output = 1;
        size_t max = xs[0];
        for (size_t i = 1; i < size; i++) {
            if (xs[i] > max) {
                max = xs[i];
                output++;
            }
        }
        return output;
    }
"""))
os.environ["CC"] = "clang"
slug.make()


def maxes_bwoodsend_c(xs):
    return slug.dll.maxes(ptr(array.array("Q", xs)), len(xs))

With clang:

 0.89 ± 0.00 ms  bwoodsend_c
 1.06 ± 0.00 ms  hprodh_numpy_accumulate
 1.16 ± 0.01 ms  eendebakpt_numpy_accumulate

With gcc:

 1.06 ± 0.01 ms  hprodh_numpy_accumulate
 1.14 ± 0.00 ms  bwoodsend_c
 1.15 ± 0.00 ms  eendebakpt_numpy_accumulate
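
For comparison, the numpy entries above presumably build a running maximum with np.maximum.accumulate and count how often it changes. Here is a rough sketch of that idea (my own reconstruction, not hprodh's or eendebakpt's exact code):

import numpy as np

def maxes_numpy_accumulate(xs):
    a = np.asarray(xs)
    if a.size == 0:
        return 0
    # The running maximum is non-decreasing and changes value exactly where
    # a new maximum appears, so count the changes (plus the first element).
    running = np.maximum.accumulate(a)
    return int(np.count_nonzero(np.diff(running))) + 1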

Not my cup of tea, but interesting 🙂


I suppose you could aggressively multithread this problem…

Given that there are 10_000 numbers, you could calculate the local maximum of each of 100 chunks of 100 numbers. Then you could run over the local maxima to see where a local maximum increases compared to the previous local maximum. If it does not, that chunk can be eliminated from further consideration.

With the remaining chunks you can keep on iterating using smaller chunk sizes until they are of size one, and then just count the chunks for the solution.

For this particular data it will be very inefficient (given the very frequent maximums), but if the input were range(-50, 55) it would be a good way to get a low wall clock time.
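
To make the recursion concrete, here is a single-threaded sketch of that elimination idea (the chunk count, names, and the -inf sentinel are my own choices, not a reference implementation): a chunk is skipped unless its local maximum beats the running maximum carried in from everything before it, and surviving chunks are split further until they are single elements.

def maxes_chunked(xs, parts=100):
    # Count the elements of xs[lo:hi] that exceed `carry`, the maximum of
    # everything before index lo, by splitting the range into up to `parts`
    # chunks and skipping chunks whose local maximum is not a new maximum.
    def count(lo, hi, carry):
        if hi - lo == 1:
            return 1 if xs[lo] > carry else 0
        step = max(1, (hi - lo) // parts)
        total = 0
        for start in range(lo, hi, step):
            end = min(start + step, hi)
            local_max = max(xs[start:end])
            if local_max > carry:        # chunk contains at least one new maximum
                total += count(start, end, carry)
                carry = local_max        # later chunks must beat this to survive
        return total

    return count(0, len(xs), float("-inf")) if xs else 0

The per-chunk max(xs[start:end]) scans at each level are independent of each other, so that is the part that could be handed off to a thread or process pool; the elimination pass over the local maxima stays sequential.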
