Quicker function

Is there a way to make this function quicker? It takes about 3 minutes to run.

import numpy as np

def oorsprong(output,tijd,kolom):
    return [elem - np.mean(baseline(output,tijd,kolom)) for elem in channel(output,kolom)]

The output is an array of 21000 rows, and I think that is the problem. The other functions are below.

def baseline(output,tijd,kolom):
    # first tijd values of column kolom, as floats
    s_20 = []

    for i in range(tijd):
        s = float(output[i][kolom])
        s_20.append(s)
    return s_20

def channel(output,kolom):
    # every value of column kolom, as floats
    chann = []

    for i in range(len(output)):
        chan = float(output[i][kolom])
        chann.append(chan)
    return chann
I think you can take most of the calculation out of the loop, because the baseline mean does not change as you iterate. Compute it once:

def oorsprong(output,tijd,kolom):
    ch = channel(output,kolom)
    bs = baseline(output,tijd,kolom)
    mean = np.mean(bs)  # computed once, outside the loop
    r = [(elem - mean) for elem in ch]
    return r

John
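To see why hoisting the mean matters, here is a minimal sketch that counts how often baseline() runs in each version. The call counter, the names oorsprong_slow/oorsprong_fast, and the synthetic 1000×2 input are assumptions for illustration; baseline and channel are rewritten as comprehensions but return the same values.

```python
import numpy as np

calls = {"baseline": 0}  # hypothetical call counter, for illustration only


def baseline(output, tijd, kolom):
    calls["baseline"] += 1
    return [float(output[i][kolom]) for i in range(tijd)]


def channel(output, kolom):
    return [float(row[kolom]) for row in output]


def oorsprong_slow(output, tijd, kolom):
    # The baseline and its mean are recomputed for every element: O(len(output) * tijd) work.
    return [elem - np.mean(baseline(output, tijd, kolom)) for elem in channel(output, kolom)]


def oorsprong_fast(output, tijd, kolom):
    # The mean is loop-invariant, so compute it once: O(len(output) + tijd) work.
    mean = np.mean(baseline(output, tijd, kolom))
    return [elem - mean for elem in channel(output, kolom)]


data = [[float(r), float(2 * r)] for r in range(1000)]  # synthetic 1000x2 input

calls["baseline"] = 0
slow = oorsprong_slow(data, 20, 1)
slow_calls = calls["baseline"]

calls["baseline"] = 0
fast = oorsprong_fast(data, 20, 1)
fast_calls = calls["baseline"]

print(slow_calls, fast_calls)   # 1000 1
print(np.allclose(slow, fast))  # True
```

With 21000 rows, the slow version rebuilds the baseline list and calls np.mean 21000 times, which is likely where most of the 3 minutes goes.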

Building lists one value at a time with looped .append() calls is slow in Python, because every iteration pays for a method lookup and a method call. That's where comprehensions help most, or .extend() if you already have a list to add to. In this case you don't, so you can just write:

def channel(output,kolom):
    return [float(x[kolom]) for x in output]

On my system that’s about 25-30% faster than your original function, on a 100×10 nested list (100 rows of 10 values).
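For the case where you do already have a list, .extend() accepts any iterable, so a generator expression avoids the per-item .append() call. A minimal sketch; the helper name append_column and the sample rows are made up for illustration.

```python
def append_column(existing, output, kolom):
    """Hypothetical helper: add column `kolom` of `output` (as floats) to `existing`."""
    existing.extend(float(row[kolom]) for row in output)
    return existing


values = [0.5]
rows = [["1", "2"], ["3", "4"], ["5", "6"]]
append_column(values, rows, 1)
print(values)  # [0.5, 2.0, 4.0, 6.0]
```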

Edit: You can benchmark Python code with the timeit module. For example, I determined that my newchannel was about 25% faster by running Python interactively, defining both your version and mine, then creating an input list and calling each function 100,000 times per run for 5 runs, via timeit.repeat():

>>> def channel(...):
>>> def newchannel(...):
>>> d= [[x for x in range(y, y+10)] for y in range(100)]
>>> import timeit
>>> timeit.repeat('channel(d, 3)', number=100000, globals=globals())
[1.0493650209973566, 1.01071211398812, 1.0100015600328334, 1.017461879993789, 1.012951304030139]
>>> timeit.repeat('newchannel(d, 3)', number=100000, globals=globals())
[0.7873937499825843, 0.7495891030412167, 0.7532214020029642, 0.7522081299684942, 0.7529334379942156]
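The interactive session above elides the function bodies. Here is a self-contained script version as a sketch; it uses a smaller number (10,000 calls per run) so it finishes quickly, and reports the minimum of each run list, which the timeit documentation suggests is the most meaningful figure. Absolute times will differ on your machine.

```python
import timeit


def channel(output, kolom):
    # Original version: build the list with repeated .append()
    chann = []
    for i in range(len(output)):
        chann.append(float(output[i][kolom]))
    return chann


def newchannel(output, kolom):
    # Comprehension version
    return [float(x[kolom]) for x in output]


d = [list(range(y, y + 10)) for y in range(100)]  # 100 rows x 10 columns
assert channel(d, 3) == newchannel(d, 3)  # check both give the same result before timing

old = min(timeit.repeat("channel(d, 3)", number=10_000, repeat=5, globals=globals()))
new = min(timeit.repeat("newchannel(d, 3)", number=10_000, repeat=5, globals=globals()))
print(f"append loop: {old:.3f}s  comprehension: {new:.3f}s  ratio: {new / old:.2f}")
```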

Beats guessing.

Edit2: Also, my d could’ve been generated as simply:

>>> d = [list(range(y, y+10)) for y in range(100)]

That, a bit unexpectedly, is roughly 1.7 times as fast as my quick-and-dirty double-comprehension version:

>>> timeit.repeat('d= [[x for x in range(y, y+10)] for y in range(100)]', number=10000)
[0.6352741380105726, 0.6351301579852588, 0.6506627409835346, 0.6374996530357748, 0.6320426259771921]
>>> timeit.repeat('d= [list(range(y, y+10)) for y in range(100)]', number=10000)
[0.375542622001376, 0.37530899205012247, 0.37232113699428737, 0.3739727840293199, 0.37440704699838534]

Since you are using numpy, why not do everything with numpy arrays instead of Python lists? First convert output to a 2-D float numpy array (if it isn’t one already), for example with np.asarray(output, dtype=float).
Then your oorsprong function just becomes:

def oorsprong(output, tijd, kolom):
    return output[:, kolom] - output[:tijd, kolom].mean()

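Lightning fast: the mean and the subtraction run as vectorized numpy operations, with no Python-level loop over the 21000 rows.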
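Putting the conversion and the vectorized function together, a minimal sketch. It assumes output is rectangular (every row has the same number of columns) and that every entry parses as a float; the string-valued sample data is made up to mimic values read from a text file, which the float() calls in the original code hint at.

```python
import numpy as np


def oorsprong(output, tijd, kolom):
    # Subtract the mean of the first `tijd` values of column `kolom` from the whole column.
    return output[:, kolom] - output[:tijd, kolom].mean()


# Assumed input: a list of rows whose entries may be strings.
raw = [[str(i), str(10 * i)] for i in range(21000)]
output = np.asarray(raw, dtype=float)  # one-time conversion to a 2-D float array

result = oorsprong(output, 20, 1)
print(result.shape)  # (21000,)
print(result[:3])    # [-95. -85. -75.]
```

The conversion is done once up front; after that, each call to oorsprong is a couple of array operations.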
