Faster DELETE bytecode for exception handlers

When compiling a try statement, I see bytecode like the following:

      12 LOAD_NAME                1 (NameError)
      16 POP_JUMP_FORWARD_IF_FALSE    22 (to 62)
      18 STORE_NAME               2 (eg)
      20 PUSH_NULL
      22 LOAD_NAME                3 (bar)
      24 LOAD_NAME                4 (f)
      26 PRECALL                  1
      30 CALL                     1
      40 POP_TOP
      42 POP_EXCEPT
      44 LOAD_CONST               0 (None)
      46 STORE_NAME               2 (eg)
      48 DELETE_NAME              2 (eg)
      50 LOAD_CONST               0 (None)
 >>   54 LOAD_CONST               0 (None)
      56 STORE_NAME               2 (eg)
      58 DELETE_NAME              2 (eg)
      60 RERAISE                  1

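Bytecode of this shape can be reproduced with the dis module (exact offsets and opcode names vary by CPython version; foo, bar, and f are placeholder names, since the original source is not shown):

```python
import dis

# Hypothetical source: any "except ... as name" clause triggers the
# implicit "name = None; del name" cleanup visible in the disassembly.
src = """
try:
    foo()
except NameError as eg:
    bar(f)
"""

dis.dis(compile(src, "<example>", "exec"))
```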
The LOAD and STORE bytecodes are there only to prevent the DELETE from complaining about an unbound name.
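In source terms, the compiler treats "except E as name: body" roughly as the following sketch of the documented expansion (demo and result are illustrative names, not anything from the original post):

```python
def demo():
    try:
        raise NameError("boom")
    except NameError as eg:
        try:
            result = str(eg)      # the handler body
        finally:
            eg = None             # LOAD_CONST None; STORE_NAME eg
            del eg                # DELETE_NAME eg (cannot fail now)
    return result
```

The rebinding to None guarantees the del never sees an unbound name, and the del itself ensures the exception (and its traceback) is not kept alive after the handler exits.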

The LOAD + STORE + DELETE can be replaced by a single (new) bytecode.

This would remove four bytecodes from any ‘except[*] … as name’ clause.

I realize that a simple DELETE_NAME by itself would not work, because the handler might have deleted the variable. That’s why the preceding LOAD + STORE is necessary.
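A handler is free to delete the bound name itself, so a bare DELETE at the end of the cleanup could raise NameError; the rebinding to None guards against exactly this (hypothetical function for illustration):

```python
def handler_deletes_name():
    try:
        raise ValueError("x")
    except ValueError as eg:
        del eg  # the handler explicitly unbinds the name
    # The compiler's implicit cleanup ("eg = None; del eg") still runs
    # after the handler body; without the rebinding to None, its DELETE
    # would see an unbound name and raise NameError here.
    return "ok"
```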

Replacing the three bytecodes with a single DELETE_NAME_NORAISE would also work.

Likewise for DELETE_FAST_NORAISE, DELETE_DEREF_NORAISE, and DELETE_GLOBAL_NORAISE. Possibly also DELETE_SUBSCR_NORAISE and DELETE_ATTR_NORAISE, if there are other situations where a DELETE is compiled and it is not certain whether the subscript or attribute already exists.
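In Python terms, the proposed *_NORAISE variants would behave like dict.pop with a default: delete the binding if present, silently do nothing otherwise (a hypothetical sketch; none of these opcodes exist):

```python
def delete_name_noraise(ns, name):
    # Sketch of DELETE_NAME_NORAISE semantics: remove the binding if it
    # exists, and ignore the deletion (rather than raising NameError)
    # if it does not.
    ns.pop(name, None)

ns = {"eg": None}
delete_name_noraise(ns, "eg")   # binding removed
delete_name_noraise(ns, "eg")   # already gone: no error
```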

Have you benchmarked it? To me this doesn’t appear to be a performance-critical situation, compared to other things we’re still planning to tackle.
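A rough micro-benchmark of the handler path, including the implicit cleanup, might look like this (a sketch only; results are machine- and version-dependent, so no expected output is given):

```python
import timeit

# Raise and handle an exception; the handled path includes the implicit
# "e = None; del e" cleanup under discussion.
stmt = """
try:
    raise ValueError
except ValueError as e:
    pass
"""

n = 100_000
print(timeit.timeit(stmt, number=n) / n)  # seconds per iteration
```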


Not performance-critical. Just file this away as something to do when you can get around to it. I expect it to be simple to implement the new bytecodes and use them in the compiler. I’d make a PR myself, but like you, I have more important things to do right now.

Adding bytecodes will have side effects from increasing the size of the code in ceval.c, and that could mean the code falls out of the CPU cache.
If that happens, all of Python could slow down.
Without benchmarks on a number of CPU architectures and operating systems it would be hard to know the impact.


In the CPU architectures that I am familiar with, the instruction cache uses 64-byte lines. Only those lines occupied by code that is currently, or recently, executing compete for cache space, and the cache is typically 4-way set-associative, so you’d need 4 other frequently used lines of code mapping to the same set to cause thrashing and cache misses.

The Faster CPython improvements could be said to have the same potential problem, since they increase the size of the code in _PyEval_EvalFrameDefault.