Finding out which element in a comprehension raised an exception, without making it too slow

For a project I’m working on (typedload), if a list comprehension raises an exception, I need to know which element of the comprehension caused the exception, to pass the information to the user.

I’m using list comprehensions because they are faster than regular for loops, where such a thing would be really trivial.
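Something like this, roughly (with `f` and `items` standing in for the real code):

result = []
try:
    for index, item in enumerate(items, start=1):
        result.append(f(item))
except Exception:
    # index is an ordinary local here, so after the exception it still
    # names the failing position
    print('failed at element', index)
    raise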

So at the moment my best guess is to do the following

[f(j) for i, j in enumerate(items, start=1) if (index := i)]

But it turns out that adding the enumerate generator brings a visible slowdown in my benchmarks.

I also tried removing the condition and instead doing `(index := i, j)[1]`, building a throwaway tuple just to get `j` back out of it, but that was slightly slower.

I’ve also tried replacing the function with a wrapper that increments index, but that’s even slower.

Is it possible to achieve this in pure python?

> For a project I’m working on (typedload),

Interesting looking project!

> if a list comprehension raises an exception, I need to know which
> element of the comprehension caused the exception, to pass the
> information to the user.
>
> I’m using list comprehensions because they are faster than regular for loops, where such a thing would be really trivial.
>
> So at the moment my best guess is to do the following
>
> [f(j) for i, j in enumerate(items, start=1) if (index := i)]
>
> But it turns out that adding the enumerate generator brings a visible slowdown in my benchmarks.

Well, you’re making extra state - there’s going to be a cost. Is it a
serious slowness?

BTW, is that `if (index := i)` a magic trick to set an external variable
`index` to the current enumeration point? (Experiments; ooh, the
comprehension names seem to be locals to the comprehension.)

Is this any better?

 [f(j,i) for i, j in enumerate(items, start=1)]

I appreciate that it means you need to change f().
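For example, something along these lines (`f` here is just a hypothetical stand-in that attaches the index to whatever error the real per-element work raises):

def f(value, index):
    try:
        return int(value)    # placeholder for the real per-element work
    except Exception as exc:
        raise ValueError(f'element {index} is invalid: {value!r}') from exc

items = ['1', '2', 'x', '4']
[f(j, i) for i, j in enumerate(items, start=1)]
# ValueError: element 3 is invalid: 'x'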

I expect that anything you do will be slower than:

 [f(j) for j in items]

If nothing else, `enumerate` is inherently creating the `(i, j)` tuples which are then unpacked into `i` and `j`.

Cheers,
Cameron Simpson cs@cskk.id.au

This is true, although the cost of constructing tuples and discarding them is reduced drastically by the free list (since every one of these is a two-element tuple, they’ll generally all reuse the same memory block, making it mostly just pointer setting and such).

Personally though, I would just turn this into a vanilla for loop. Even if there IS a measurable performance hit, I’d accept that as the price of easier debugging.


enumerate creates only one tuple object there and reuses it throughout.

@LtWorf Sharing your benchmark code comparing your solutions would be useful, so we can easily benchmark alternatives.

Conceptually no, it has to create separate tuples for each returned pair (tuples are immutable); but CPython optimizes the construction and destruction of tuples heavily.

Not in my testing, it doesn’t:

>>> x = enumerate('abc')
>>> a, b, c = next(x), next(x), next(x)
>>> 
>>> a, b, c
((0, 'a'), (1, 'b'), (2, 'c'))

Separate values, because there are separate tuples - and assignment doesn’t copy, so they must have come from the iteration process itself. Aside from that, it’s a bit hard to imagine how a tuple object could get reused with different values - what with being immutable and all. I suppose at the C level, cheating is possible, but.

If you tried something like:

>>> x = enumerate('abc')
>>> id(next(x)), id(next(x)), id(next(x))
(139753995115008, 139753995115008, 139753995115008)

That’s because the tuples aren’t needed after the id call, so they can get ref-count GCd immediately and then the next one is allocated in the same place.

What I’ve tried so far

def f(v):
    return v

timeit [f(i) for i in range(5000)]
# 147 µs ± 530 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

timeit [f(j) for i,j in enumerate(range(5000), start=1) if (index := i)]
# 226 µs ± 1.58 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

timeit [f((index:=i,j)[1]) for i,j in enumerate(range(5000), start=1)]
# 281 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

index = 0
def wrapper(v):
    nonlocal index
    r = f(v)
    index += 1
    return r
    
timeit [wrapper(i) for i in range(5000)]
# 357 µs ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

So the enumerate solution is the fastest I came up with, but in typedload’s benchmark suite, for the “load list of ints” case, it’s about 0.2 seconds slower (it goes from 0.6386 s to 0.8594 s on that test).

Of course in more complicated benchmarks it’s not as noticeable, but I was still surprised at how much overhead enumerate adds, while still being the fastest one I’ve found.

On an unrelated note, doing this makes Cython very, very unhappy (I have a plan to eventually use it).


Do be aware that your figures here are in microseconds. It’s sometimes easy to miss when timeit switches units, but this is a fraction of a millisecond per iteration, and the difference between non-enumerated and enumerated iteration is 79 µs, or about one twelve-thousandth of a second.

I’m aware. But in the typedload benchmark the microseconds add up to 0.2 seconds in the end of the specific benchmark I’m talking about.

(which is this one typedload/perftest/load list of ints.py at master · ltworf/typedload · GitHub)

But that’s not what happens there. Their `for i, j in enumerate` immediately unpacks the tuple and doesn’t keep a reference to it. So whenever the enumerate iterator is asked for its next item, it sees that it has the only reference to the tuple and reuses it (see the CPython source for `enumerate` in Objects/enumobject.c).
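A small CPython-specific demo of that (just illustrative, not proof of the exact mechanism):

x = enumerate('abcdef')

a = next(x)
b = next(x)                        # a is still referenced, so a fresh tuple has to be built
print(a is b)                      # False

del a, b                           # no outside references left
print(id(next(x)) == id(next(x)))  # True on CPython: the cached tuple gets reused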

Well that’s different, not equivalent to the code we were talking about. You do keep a reference there, preventing the optimization.

No, that’s not what happens. The enumerate iterator still has a reference to the tuple, so it doesn’t get GCd (it gets reused instead).

Two alternative solutions:

  1. Instead of copying every i to the outside index and fake-testing its truth, create outside access to i for potential use after an error occurred. (A lambda defined inside the comprehension closes over the comprehension’s local i, and the walrus binds that lambda to last_i in the enclosing scope, so it can still be called after the exception.)

  2. Itertools, to benefit from C speed: compress with a count as the selector, so the counting happens in C as the items are consumed.

Code with demo:

from itertools import compress, count

# Setup
iterable = [True] * 100
iterable[42] = False
def f(v):
    assert v

# Solution 1
try:
    [f(v)
     for _ in [None]
     if (last_i := lambda: i)
     for i, v in enumerate(iterable)]
except AssertionError:
    print('error at', last_i())

# Solution 2
try:
    ctr = count(1)
    [f(v) for v in compress(iterable, ctr)]
except AssertionError:
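    # compress pulled one value from ctr for every item it yielded, and
    # count started at 1, so next(ctr) - 2 is the 0-based index of the
    # item whose f() call raised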
    print('error at', next(ctr) - 2)

Output (Attempt This Online!):

error at 42
error at 42

Benchmark results:

 482.7 ± 3.7 μs  [f(v) for v in iterable]
 485.3 ± 4.6 μs  it = iter(iterable); [f(v) for v in it]
 607.8 ± 6.4 μs  ctr = count(1); [f(v) for v in compress(iterable, ctr)]
 629.2 ± 4.3 μs  [f(v) for _ in [None] if (last_i := lambda: i) for i, v in enumerate(iterable)]
 699.4 ± 7.1 μs  [f(v) for i,v in enumerate(iterable, start=1) if (index := i)]
 938.4 ± 7.5 μs  [f((index:=i,v)[1]) for i,v in enumerate(iterable, start=1)]
Python: 3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.3.0]

(The it = ... solution is from my later “One more” message. Had to omit your wrapper one, somehow it doesn’t work with this timeit usage.)

Benchmark code:

from timeit import timeit
from statistics import mean, stdev
import sys

setup = '''
from itertools import compress, count

def f(v):
    return v

iterable = range(5000)

index = 0
def wrapper(v):
  #  nonlocal index
    r = f(v)
    index += 1
    return r
'''

solutions = [
'''[f(v) for v in iterable]''',
'''[f(v) for i,v in enumerate(iterable, start=1) if (index := i)]''',
'''[f((index:=i,v)[1]) for i,v in enumerate(iterable, start=1)]''',
#'''[wrapper(i) for i in iterable]''',
'''[f(v) for _ in [None] if (last_i := lambda: i) for i, v in enumerate(iterable)]''',
'''ctr = count(1); [f(v) for v in compress(iterable, ctr)]''',
'''it = iter(iterable); [f(v) for v in it]''',
]

times = {s: [] for s in solutions}
def stats(s):
    ts = [t * 1e6 for t in sorted(times[s])[:10]]
    return f'{mean(ts):6.1f} ± {stdev(ts):3.1f} μs '

for _ in range(100):
    for s in solutions:
        t = timeit(s, setup, number=10) / 10
        times[s].append(t)

for s in sorted(solutions, key=stats):
    print(stats(s), s)

print('Python:', sys.version)

Attempt This Online!


Ah, I see what you mean. My point was still that, conceptually, it is always a new tuple; what you’re showing there is a special case of the “CPython optimizes these heavily” situation. In general, these optimizations are done with free lists; the one here is slightly more aggressive in that it’s one very specific cached tuple for this enumeration. It’s still a CPython optimization, not a language feature.

One more, though it needs the iterable to support len or some other way to determine its size. And to find the error index, it iterates over the remaining items:

# Setup
iterable = [True] * 100
iterable[42] = False
def f(v):
    assert v

# Solution 3
try:
    it = iter(iterable)
    [f(v) for v in it]
except AssertionError:
    print('error at', len(iterable) - len(list(it)) - 1)

Attempt This Online!

Benchmark result is included in my previous message, although the lack of an error in your/my benchmark might make this misleading (depends on how realistic that is in your actual usage).

There are more efficient (at least memory-wise) ways to find the length of an iterator than len(list(it)), see ilen. I just kept it simple here for the demo, as your/my benchmark doesn’t have an error anyway.
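For reference, a minimal stand-in with the same effect (not necessarily how `ilen` itself is implemented):

def ilen(iterable):
    # count the remaining items without building a list of them
    return sum(1 for _ in iterable)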

You could also use the iterator’s length hint, but I wouldn’t rely on that for anything important:

except AssertionError:
    print('error at', len(iterable) - it.__length_hint__() - 1)

Attempt This Online!


Right, I took CPython as the context since I think the optimizations you mentioned are also CPython-specific and CPython is what most people use.

Hey! A bit off-topic: you used IPython/Jupyter magics for running this, right?

Thanks, I’m using the itertools one!

I see you’re not actually using a list comprehension but this:

function(f(l, v, t) for v in compress(value, ctr))

This might be faster (using repeat from itertools):

function(map(f, repeat(l), compress(value, ctr), repeat(t)))

Thanks, that is indeed noticeably faster. I will replace it with this one.