How to delete all the items (in a list) beautifully?

Expanding on this approach, I was interested in how the numbers scale to the somewhat more meaningful case of emptying a large list, and I ended up substantially extending the benchmark script so that all the key parameters are easily visible and tunable, and so that it can be reused in the future to run arbitrary benchmarks over a set of alternatives. For reference, these are my results on an old Windows laptop running the original script:

 22.9 ± 0.0 ns  pass  # as baseline
 83.1 ± 0.0 ns  lst.clear()
108.4 ± 0.1 ns  lst *= 0
138.4 ± 0.3 ns  del lst[:]
153.2 ± 0.3 ns  lst[:] = ()
173.0 ± 0.2 ns  lst[:] = []

Python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:12:32) [MSC v.1929 64 bit (AMD64)]

and my extended version:

 22.9 ± 0.0 ns  pass  # as baseline
 83.1 ± 0.1 ns  lst.clear()
108.6 ± 0.2 ns  lst *= 0
138.4 ± 0.3 ns  del lst[:]
153.1 ± 0.5 ns  lst[:] = ()
173.3 ± 0.3 ns  lst[:] = []

Python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:12:32) [MSC v.1929 64 bit (AMD64)]
Total runtime: 2.56 s

with parameters set to match the original:

OUTER_LOOPS = 100
INNER_LOOPS = 10**4
SAMPLES = 10
UNIT = "ns"
CUSTOM_PARAMS = {"list_len": 5}

With the parameters modified to resemble a more realistic situation in which this is actually likely to matter, i.e. a list with 10 000 items:

OUTER_LOOPS = 100
INNER_LOOPS = 10**2
SAMPLES = 10
UNIT = "us"
CUSTOM_PARAMS = {"list_len": 10_000}

we can see that the difference is entirely fixed per-call overhead, and does not scale at all with list size:

  0.3 ± 0.0 us  pass  # as baseline
 30.0 ± 0.1 us  lst.clear()
 30.0 ± 0.1 us  lst *= 0
 30.1 ± 0.1 us  del lst[:]
 30.2 ± 0.0 us  lst[:] = []
 30.2 ± 0.1 us  lst[:] = ()

Python: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:51:25) [MSC v.1934 64 bit (AMD64)]
Total runtime: 4.89 s

So, this might matter if you’re clearing the same small list thousands to millions of times in a very tight inner loop (even then, the roughly 90 ns gap between lst.clear() and lst[:] = [] above only adds up to about 0.1 s per million calls), but it will be lost in the noise on larger lists or outside such a loop.

Script
import random
import sys
import time
from statistics import mean, stdev
from timeit import timeit

OUTER_LOOPS = 100
INNER_LOOPS = 10**2
SAMPLES = 10
UNIT = "us"
CUSTOM_PARAMS = {"list_len": 10_000}

FUNCTIONS = '''\
lst.clear()
del lst[:]
lst[:] = []
lst[:] = ()
lst *= 0
pass  # as baseline
'''.splitlines()

TEST_CODE = 'for lst in lists: {function}'
SETUP_CODE = 'lst = list(range({list_len})); lists = [lst[:] for _ in range({INNER_LOOPS})]'
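# SETUP_CODE pre-builds INNER_LOOPS copies of the list and TEST_CODE runs the tested
# statement once on each copy, so every execution sees a freshly populated list
# rather than one that an earlier iteration has already emptied.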

UNITS = {
    "s": 1,
    "ms": 1e-3,
    "us": 1e-6,
    "ns": 1e-9,
}

PARAMS_TO_FILL = {
    'OUTER_LOOPS': OUTER_LOOPS,
    'INNER_LOOPS': INNER_LOOPS,
    **CUSTOM_PARAMS}


def generate_stats(f, times):
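    # Convert to the chosen UNIT and report mean ± stdev of the SAMPLES fastest runs.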
    ts = [t / UNITS[UNIT] for t in sorted(times[f])[:SAMPLES]]
    return f'{mean(ts):5.1f} ± {stdev(ts):3.1f} {UNIT} '


def time_functions():
    start_time = time.monotonic()
    times = {f: [] for f in FUNCTIONS}
    for __ in range(OUTER_LOOPS):
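        # Shuffle the candidates each round so that ordering effects average out.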
        random.shuffle(FUNCTIONS)
        for f in FUNCTIONS:
            t = timeit(
                TEST_CODE.format(function=f, **PARAMS_TO_FILL),
                SETUP_CODE.format(function=f, **PARAMS_TO_FILL),
                number=1,
            ) / INNER_LOOPS
            times[f].append(t)

    stats = {f: generate_stats(f, times) for f in FUNCTIONS}
    for f in sorted(FUNCTIONS, key=lambda f: stats[f]):
        print(stats[f], f)

    print('\nPython:', sys.version)
    print(f'Total runtime: {time.monotonic() - start_time:.2f} s')


if __name__ == '__main__':
    time_functions()

It also depends on what the elements are, not just on how many there are. Here are several tests with your list length of 10 000:

With your lst = list(range({list_len})):

  0.1 ± 0.0 us  pass  # as baseline
 16.0 ± 0.2 us  del lst[:]
 16.0 ± 0.4 us  lst.clear()
 16.0 ± 0.4 us  lst[:] = ()
 16.1 ± 0.1 us  lst[:] = []
 17.7 ± 0.2 us  lst *= 0

With lst = {list_len} * [0]:

  0.2 ± 0.0 us  pass  # as baseline
 25.3 ± 0.1 us  lst.clear()
 25.4 ± 0.1 us  lst *= 0
 25.4 ± 0.2 us  lst[:] = ()
 25.4 ± 0.2 us  del lst[:]
 25.5 ± 0.1 us  lst[:] = []

With lst = {list_len} // 4 * [0, 1, 2, 3]:

  0.1 ± 0.0 us  pass  # as baseline
 13.9 ± 0.4 us  lst[:] = []
 14.0 ± 0.2 us  lst.clear()
 14.1 ± 0.3 us  del lst[:]
 14.2 ± 0.2 us  lst[:] = ()
 16.8 ± 0.2 us  lst *= 0

With your lst = list(range({list_len})) but with lists = [[-x for x in lst] for _ in range({INNER_LOOPS})] (that is, [-x for x in lst] instead of lst[:]):

  0.7 ± 0.0 us  pass  # as baseline
107.4 ± 0.3 us  lst.clear()
108.5 ± 0.3 us  lst[:] = ()
108.6 ± 0.5 us  del lst[:]
108.6 ± 0.7 us  lst[:] = []
117.4 ± 0.4 us  lst *= 0

Negative numbers take much longer to delete :stuck_out_tongue_closed_eyes:
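
Presumably the extra time is not about the sign at all: the lst[:] copies share their int objects with the original list, so clearing a copy only decrements reference counts, whereas each [-x for x in lst] copy holds the only reference to its ~10 000 freshly created ints, so clearing it also has to deallocate all of them. A quick way to see the difference in CPython (just a sketch, separate from the benchmark script):

import sys

lst = list(range(10_000))
shared = lst[:]              # same int objects as lst; clearing only drops refcounts
owned = [-x for x in lst]    # fresh int objects that nothing else references

print(shared[1000] is lst[1000])     # True: the copy reuses the original objects
print(sys.getrefcount(lst[1000]))    # at least 3: lst, shared, plus the call argument
print(sys.getrefcount(owned[1000]))  # 2: only this list plus the call argument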

I wonder why the *= solution is slower for me…

You can use %%timeit, the “cell” mode of the IPython timeit magic; the first line is setup code that is not timed:

%%timeit a = list(range(1_000_000))  # this is setup code
a.clear()                            # this is timed
# 15.6 ns ± 0.493 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

Note that you can use stdlib timeit to do this as well:

$ python3 -m timeit -s 'a = list(range(1_000_000))' 'a.clear()' 
20000000 loops, best of 5: 14.8 nsec per loop

So you’re not avoiding it: the setup only runs once, so after the first iteration the list is already empty and you’re mostly just timing clears of an empty list. That’s just bad.

I checked and you are correct; I was under the false assumption that the timeit setup code ran once before "every measurement", but it actually just runs once :wink:

Edit: and now I am wondering if this would be a good addition to timeit – setup code to run before every measurement
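
In the meantime you can emulate per-measurement setup by asking timeit for single-shot runs and repeating the whole call yourself, roughly like this (a minimal sketch; it is essentially what the benchmark script above does with number=1):

from timeit import timeit

# Setup runs once per timeit() call, so with number=1 every measurement
# gets a freshly built list instead of one emptied by the previous run.
samples = [
    timeit('a.clear()', setup='a = list(range(1_000_000))', number=1)
    for _ in range(100)
]
print(min(samples), 'seconds for one clear() of a fresh 1_000_000-item list')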