# Why does unreached code take linear time?!?

Compare these two functions:

```
def short():
    try:
        if 0 == 1:
            unreached
        raise RuntimeError
    except RuntimeError:
        pass

def long():
    try:
        if 0 == 1:
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
        raise RuntimeError
    except RuntimeError:
        pass
```

The only difference is that `long()` has 100 unreached statements instead of just one. But it takes much longer:

```
 357.0 ±  0.3 ns  short
1355.8 ±  1.5 ns  long

Python: 3.11.4 (main, Jun 24 2023, 10:18:04) [GCC 13.1.1 20230429]
```

And when I increase it to 200 unreached statements, it takes even longer:

```
 358.3 ±  1.3 ns  short
2357.8 ±  2.9 ns  long

Python: 3.11.4 (main, Jun 24 2023, 10:18:04) [GCC 13.1.1 20230429]
```

So it looks like it takes an extra 10 ns for each unreached statement. Why? Shouldn't it just "skip" them all in constant time and be just as fast as `short()`?

Benchmark code (Attempt This Online!):

```
from timeit import timeit
from time import perf_counter as time
from statistics import mean, stdev
import sys

def short():
    try:
        if 0 == 1:
            unreached
        raise RuntimeError
    except RuntimeError:
        pass

def long():
    try:
        if 0 == 1:
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
        raise RuntimeError
    except RuntimeError:
        pass

funcs = short, long

for _ in range(3):
    times = {f: [] for f in funcs}
    def stats(f):
        ts = [t * 1e9 for t in sorted(times[f])[:5]]
        return f'{mean(ts):6.1f} ± {stdev(ts):4.1f} ns '
    for _ in range(100):
        for f in funcs:
            t = timeit(f, number=10**4) / 1e4
            times[f].append(t)
    for f in sorted(funcs, key=stats):
        print(stats(f), f.__name__)
    print()

print('Python:', sys.version)
```

Just a guess, but could the difference be bytecode generation? If you use `py_compile` on each of the functions and then time them, is the result the same?

Might be mostly a Python 3.11 issue.

On mycompiler.io (Python 3.11) it's far worse (I had to reduce `range(100)` to `range(25)` due to their 15-second limit):

```
 317.9 ±  1.7 ns  short
5771.0 ±  3.0 ns  long
Python: 3.11.4 (main, Jun  9 2023, 07:59:55) [GCC 12.3.0]
```

On trinket.io (Python 3.10) it consistently takes only a little longer:

```
 225.6 ±  3.9 ns  short
 242.1 ±  2.2 ns  long
Python: 3.10.9 (main, Jan 23 2023, 22:32:48) [GCC 10.2.1 20210110]
```

On replit.com (Python 3.10), it also consistently takes only a little longer:

```
 309.8 ±  1.5 ns  short
 332.6 ±  1.9 ns  long
Python: 3.10.11 (main, Apr  4 2023, 22:10:32) [GCC 12.2.0]
```

In the disassembly for the two functions, I did find one thing that MIGHT be relevant:

```
>>> dis.dis(short)
  1           0 RESUME                   0

  2           2 NOP

  3           4 LOAD_CONST               1 (0)
              6 LOAD_CONST               2 (1)
              8 COMPARE_AND_BRANCH      39 (==)
             12 POP_JUMP_IF_FALSE        6 (to 26)
...

>>> dis.dis(long)
  1           0 RESUME                   0

  2           2 NOP

  3           4 LOAD_CONST               1 (0)
              6 LOAD_CONST               2 (1)
              8 COMPARE_OP              32 (==)
             12 EXTENDED_ARG             2
             14 POP_JUMP_IF_FALSE      600 (to 1216)
...
```

The larger bytecode means that the conditional jump's target no longer fits in the opcode's normal one-byte argument. It's entirely possible that this has an impact.
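To see that limit in isolation, here is a minimal sketch (a hypothetical function `f`, not the thread's benchmark): each CPython instruction carries a one-byte argument, so once the conditional jump has to skip far enough, the compiler must prefix it with `EXTENDED_ARG`:

```python
import dis

# A hypothetical function whose `if` body is long enough that the
# conditional jump's target no longer fits in the one-byte instruction
# argument, forcing an EXTENDED_ARG prefix.
filler = "        x\n" * 200  # 200 statements to jump over
src = f"def f(a):\n    if a:\n{filler}"
ns = {}
exec(compile(src, "<sketch>", "exec"), ns)

ops = [ins.opname for ins in dis.get_instructions(ns["f"])]
print("EXTENDED_ARG" in ops)
```

The exact number of statements needed to trigger the prefix varies by Python version, since the encoded size of each statement's bytecode differs.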

Iâm not familiar with that and am not completely sure what you mean. Does it differ from the normal compilation when just running the script? ? Iâm testing on the linked âAttempt This On-lineâ site, so you could try your test there.

I get that as well, but with 200 unreached statements it looks equivalent, so it wouldn't explain the additional slowdown there. (Plus I'd be surprised if this small thing had that much impact.)

With 100:

```
              8 COMPARE_OP               2 (==)
             14 EXTENDED_ARG             2
             16 POP_JUMP_FORWARD_IF_FALSE   700 (to 1418)

 19          18 LOAD_GLOBAL              0 (unreached)
```

With 200:

```
              8 COMPARE_OP               2 (==)
             14 EXTENDED_ARG             5
             16 POP_JUMP_FORWARD_IF_FALSE  1400 (to 2818)

 19          18 LOAD_GLOBAL              0 (unreached)
```

Thatâs definitely interesting.

Yeah, but when we're dealing with unexplained performance regressions, ANY potential focal point is worth exploring. Are you able to find the exact point at which it flips to EXTENDED_ARG, and get a performance measurement just below and just above that point? If those have nearly the same slowdown, that would solidly exclude that as a possibility.

For me on ATO, that's between 36 and 37. Runtimes:

```
 726.9 ±  4.5 ns  long36
 736.5 ±  4.1 ns  long37
```
Script (Attempt This Online!):

```
from timeit import timeit
from time import perf_counter as time
from statistics import mean, stdev
import sys

def long36():
    try:
        if 0 == 1:
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached
        raise RuntimeError
    except RuntimeError:
        pass

def long37():
    try:
        if 0 == 1:
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached; unreached
            unreached; unreached; unreached; unreached; unreached; unreached; unreached
        raise RuntimeError
    except RuntimeError:
        pass

import dis
# dis.dis(long36)
# dis.dis(long37)

funcs = long36, long37

for _ in range(3):
    times = {f: [] for f in funcs}
    def stats(f):
        ts = [t * 1e9 for t in sorted(times[f])[:5]]
        return f'{mean(ts):6.1f} ± {stdev(ts):4.1f} ns '
    for _ in range(100):
        for f in funcs:
            t = timeit(f, number=10**4) / 1e4
            times[f].append(t)
    for f in sorted(funcs, key=stats):
        print(stats(f), f.__name__)
    print()

print('Python:', sys.version)
```

For you I guess it's around 256/6 ≈ 42.7, since you jump by 600 over the 100 statements instead of my 700.
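The flip point can also be found programmatically instead of by hand; here's a sketch (a hypothetical helper, without the surrounding try/except) that compiles ever-longer bodies until `EXTENDED_ARG` first appears:

```python
import dis

def first_extended_arg(max_n=300):
    # Compile increasingly long `if` bodies and report the first count of
    # `unreached` statements that makes the compiler emit EXTENDED_ARG.
    for n in range(1, max_n + 1):
        src = f"def f():\n    if 0 == 1:\n        {'unreached; ' * n}\n"
        ns = {}
        exec(compile(src, "<sketch>", "exec"), ns)
        ops = [ins.opname for ins in dis.get_instructions(ns["f"])]
        if "EXTENDED_ARG" in ops:
            return n
    return None

print(first_extended_arg())  # e.g. 37 on this thread's CPython 3.11 build
```

The result depends on how many bytes each statement's bytecode occupies in the given Python version.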

Does this happen if the function doesn't contain a `try`/`except` block? I am wondering if it needs to build and store a larger exception object in the slow case.

Btw, if I replace `raise RuntimeError` with `return`, the difference shrinks a lot (to just ~0.05 ns per unreached statement):

```
  49.4 ±  0.1 ns  short
  54.4 ±  0.0 ns  long

Python: 3.11.4 (main, Jun 24 2023, 10:18:04) [GCC 13.1.1 20230429]
```

Edit: Actually that seems to be a constant 5 ns, not some time "per element". It doesn't get bigger when I increase to 200 or even 300 unreached statements.
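That would fit the idea that the slowdown is tied to the raise/except machinery itself. A rough sketch isolating just that overhead (hypothetical functions, numbers are machine-dependent):

```python
from timeit import timeit

def with_raise():
    # Raise and immediately catch, mirroring the benchmark's structure.
    try:
        raise RuntimeError
    except RuntimeError:
        pass

def with_return():
    return

# Per-call cost in nanoseconds; raising and catching dominates a plain return.
n = 10**5
t_raise = timeit(with_raise, number=n) / n * 1e9
t_return = timeit(with_return, number=n) / n * 1e9
print(f"raise/except: {t_raise:.0f} ns, plain return: {t_return:.0f} ns")
```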

You mean with the try/except not inside the function but in the calling code? Then yes. That's what I had first; the results were similar to what they are now (you can see it in the history of my first post).

I shouldnât have said try/except, I was thinking about the exception itself (as you just posted about). When you raise a RuntimeError it needs to construct the traceback for that and one of them has more stuff in it (perhaps).

How should I check how much stuff is in the traceback? With `except ValueError` (so it doesn't catch) and calling `long()` right after defining it, I'm shown just this little bit:

```
Traceback (most recent call last):
  File "/ATO/code", line 32, in <module>
    long()
  File "/ATO/code", line 29, in long
    raise RuntimeError
RuntimeError
```
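One way to inspect it beyond the printed form is to walk the traceback object directly; a small sketch (with a hypothetical `boom` in place of `long`):

```python
def frames_in_traceback(exc):
    # Walk the traceback chain and record each frame's function name and
    # the size of its compiled bytecode.
    info = []
    tb = exc.__traceback__
    while tb is not None:
        code = tb.tb_frame.f_code
        info.append((code.co_name, len(code.co_code)))
        tb = tb.tb_next
    return info

def boom():
    raise RuntimeError

try:
    boom()
except RuntimeError as e:
    frames = frames_in_traceback(e)

print(frames)  # innermost frame last; includes the module-level frame
```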

I donât really know, just brainstorming what might be happening. I was messing around with this in a notebook and when I do this:

edit: sorry, I should say that I removed the try/except in the functions and just have them raise

```
try:
    long()
except RuntimeError as exc:
    print(exc.__traceback__.tb_frame.f_locals)
```

I can see the entire definition of `long` being stored in there, as a string. But when I do that for `short` in the same notebook it also has the code for `long`, because they're in the same scope. Is `timeit` doing something to isolate them?

edit: this may have been a red herring… my `f_locals` also has stuff like IPython history in it

What Python version do you use? With ATO's 3.11.4, that only shows me this:

```
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f02658a8410>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '/ATO/code', '__cached__': None, 'timeit': <function timeit at 0x7f02658a56c0>, 'time': <built-in function perf_counter>, 'mean': <function mean at 0x7f026571d4e0>, 'stdev': <function stdev at 0x7f026571dd00>, 'sys': <module 'sys' (built-in)>, 'short': <function short at 0x7f0265ac1f80>, 'long': <function long at 0x7f02658a5bc0>, 'exc': RuntimeError()}
```

Manually broken into multiple lines for readability:

```
{'__name__': '__main__',
 '__doc__': None,
 '__package__': None,
 '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f02658a8410>,
 '__spec__': None,
 '__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__file__': '/ATO/code',
 '__cached__': None,
 'timeit': <function timeit at 0x7f02658a56c0>,
 'time': <built-in function perf_counter>,
 'mean': <function mean at 0x7f026571d4e0>,
 'stdev': <function stdev at 0x7f026571dd00>,
 'sys': <module 'sys' (built-in)>,
 'short': <function short at 0x7f0265ac1f80>,
 'long': <function long at 0x7f02658a5bc0>,
 'exc': RuntimeError()}
```

I donât think so (but Iâm not sure what you mean with isolate). And if I instead measure with

```
t0 = time()
for _ in range(10**4):
    f()
t = (time() - t0) / 1e4
```

then it still looks the same (just ~100 ns slower due to `timeit` doing that loop faster):

```
 418.2 ±  3.6 ns  short
1428.7 ± 13.9 ns  long
```

I was using a Jupyter notebook with 3.9, so quite different (I was mistakenly looking at the Jupyter history, which it stores as local vars). But I was looking in the wrong place when I mentioned the code string. The code is still in the traceback (as a code object), in `exc.__traceback__.tb_frame.f_code.co_code`.

Modifying your ATO example (and just doing one rep instead of 10000):

```
...
except RuntimeError as exc:
    print(len(exc.__traceback__.tb_frame.f_code.co_code))
...
```

yields `188 1576`, which is roughly the same order of magnitude as the timing difference. So perhaps constructing the context of the exception object accounts for this timing.

I can confirm. Although on trinket.io, where there wasn't such a large speed difference, I also get very different bytecode lengths: 76 and 472. Also, I think that value (`co_code`) is prepared beforehand, not each time an exception is created.
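Indeed, `co_code` belongs to the function's compiled code object and exists as soon as the `def` runs; nothing has to be called or raised to read it. A quick sketch:

```python
def short():
    try:
        if 0 == 1:
            unreached
        raise RuntimeError
    except RuntimeError:
        pass

# The bytecode is fixed at compile time; we can measure its size without
# ever calling the function or creating an exception.
size = len(short.__code__.co_code)
print(size)
```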


Could someone run the first post's benchmark in equivalently installed regular Python 3.10 and 3.11 on the same machine? (I'm only on a phone, where that's quite difficult.)

the things I do[1]…

On a debian11 VM on GCP:

```
mamba create -n py310 python=3.10
mamba create -n py311 python=3.11
mamba activate py310
python unreached_test.py
mamba deactivate
mamba activate py311
python unreached_test.py
```

outputs:

```
(py310) jupyter@fluent-analysis ~$ python unreached_test.py
 142.3 ±  0.1 ns  short
 161.2 ±  0.1 ns  long

 150.4 ±  0.3 ns  short
 166.4 ±  0.1 ns  long

 150.7 ±  0.1 ns  short
 167.0 ±  0.2 ns  long

Python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
```

and

```
(py311) jupyter@fluent-analysis ~$ python unreached_test.py
 176.8 ±  0.3 ns  short
 643.3 ±  0.4 ns  long

 176.0 ±  0.4 ns  short
 644.0 ±  0.3 ns  long

 176.5 ±  0.4 ns  short
 644.7 ±  0.6 ns  long

Python: 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
```

So this definitely is new to 3.11, although something is different in 3.10 as well…

1. to avoid work ↩︎


Thanks! I think with that, creating a GitHub issue would be appropriate. I might tomorrow, depending on further comments here.

I just noticed it also matters whether I write the 100 unreached statements on the same line or on separate lines. Benchmark with those two extremes (instead of 10 lines of 10 statements each):

```
 356.4 ±  2.9 ns  short
1382.3 ±  4.9 ns  long_one_per_line
2266.7 ±  5.5 ns  long_all_on_same_line

Python: 3.11.4 (main, Jun 24 2023, 10:18:04) [GCC 13.1.1 20230429]
```
Code
```
from timeit import timeit
from time import perf_counter as time
from statistics import mean, stdev
import sys

def short():
    try:
        if 0 == 1:
            unreached
        raise RuntimeError
    except RuntimeError:
        pass

u = '            unreached\n' * 100
exec(f'''
def long_one_per_line():
    try:
        if 0 == 1:
{u}
        raise RuntimeError
    except RuntimeError:
        pass
''')

exec(f'''
def long_all_on_same_line():
    try:
        if 0 == 1:
            {'unreached;' * 100}
        raise RuntimeError
    except RuntimeError:
        pass
''')

funcs = short, long_one_per_line, long_all_on_same_line

for _ in range(3):
    times = {f: [] for f in funcs}
    def stats(f):
        ts = [t * 1e9 for t in sorted(times[f])[:5]]
        return f'{mean(ts):6.1f} ± {stdev(ts):4.1f} ns '
    for _ in range(100):
        for f in funcs:
            t = timeit(f, number=10**4) / 1e4
            times[f].append(t)
    for f in sorted(funcs, key=stats):
        print(stats(f), f.__name__)
    print()

print('Python:', sys.version)
```

Attempt This Online!

Btw `long_one_per_line.__code__.co_code == long_all_on_same_line.__code__.co_code` is true.
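Since the bytecode is identical, the remaining per-function difference lives in metadata such as the line-number table. A sketch reconstructing the two layouts (hypothetical `f`/`g`, without the try/except and with the condition as a parameter) and comparing via `dis.findlinestarts`:

```python
import dis

# Hypothetical reconstruction: the same 100 statements, once spread over
# 100 lines and once on a single line.
ns = {}
exec("def f(a):\n    if a:\n" + "        unreached\n" * 100, ns)
exec("def g(a):\n    if a:\n        " + "unreached; " * 100 + "\n", ns)
f, g = ns["f"], ns["g"]

# Identical instructions, but different line-number tables.
same_code = f.__code__.co_code == g.__code__.co_code
starts_f = [line for _, line in dis.findlinestarts(f.__code__)]
starts_g = [line for _, line in dis.findlinestarts(g.__code__)]
print(same_code, starts_f == starts_g)
```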