The RESUME opcode has the same line number as the function definition and "breaks" previous behaviour

While playing around with the new CPython 3.11 bytecode, I came across a change in behaviour relative to previous versions, introduced by the RESUME opcode. Before, when determining all the source code lines covered by a code object, the first line number I got was the line after the def for function definitions of the form

def foo():  # no code on this line
    ...

That is, if def foo(): occurred on line 42, then the first line of foo.__code__ would be 43. The new RESUME opcode is placed on the same line as the function definition (or perhaps more correctly I should say on line n - 1, where n represents the first line number as produced by CPython < 3.11).

I have tried to find documentation about RESUME but all I could locate was PEP 626. This new “behaviour” is likely to break any tools that worked under the assumptions of the old behaviour (indeed this broke a project I’m working on). Whilst this is not the end of the world (I am already putting new logic in place to deal with other bytecode differences), it forces introducing new logic to deal with this difference.

Is this new behaviour here to stay forever? Are there cogent reasons for having RESUME on line n - 1, or would it be possible to perhaps move RESUME to line n with a 3.11 patch(!) release?

EDIT:
Fiddling around a bit more, I noticed the same happens with generators/coroutines/… with

  5           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0

(in this example, the first line in the function body is 6). So this is not limited to RESUME.

You should probably report this bug on GitHub as well.

As much as I’d love to have this fixed, I don’t think it’s a CPython bug (not the kind that you would report on the repository). The tools I work on target CPython itself, so they are special in a sense, and things like this are to be expected.

What I’d like to get out of this discussion is more details about this change in behaviour, whether it could be changed to reproduce the old behaviour, or if we should just accept that this will be different from 3.11 forward because of no viable alternatives. My understanding is that bytecode is not part of the public API, and therefore one should not rely on its undocumented properties (like the one about first line numbers as derived from code objects).

The RESUME instruction is special. It tells the interpreter that the execution of a frame is being resumed (or started).
Everything up to the first RESUME is frame set up.

You should ignore all instructions up to and including the first RESUME when inspecting the bytecode.
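
A minimal sketch of that advice, assuming only the standard dis module; the helper name body_lines is hypothetical and not part of any CPython API. On 3.11+ it drops everything up to and including the first RESUME and collects the line numbers of the remaining instructions. Older versions have no RESUME, so it falls back to dis.findlinestarts there.

```python
import dis
import sys


def body_lines(code):
    """Line numbers of the instructions that follow the first RESUME.

    Hypothetical helper, not part of any CPython API.
    """
    if sys.version_info < (3, 11):
        # No RESUME opcode before 3.11, so every line start counts.
        return {line for _, line in dis.findlinestarts(code) if line is not None}

    instructions = list(dis.get_instructions(code))
    # Index of the first instruction after the first RESUME (0 if there is none).
    start = next(
        (i + 1 for i, ins in enumerate(instructions) if ins.opname == "RESUME"),
        0,
    )
    lines = set()
    for ins in instructions[start:]:
        # Instruction.positions exists from 3.11 on; its lineno may be None.
        lineno = ins.positions.lineno if ins.positions else None
        if lineno is not None:
            lines.add(lineno)
    return lines


def code_of_foo(source):
    """Compile `source` and return the code object of the `foo` it defines."""
    namespace = {}
    exec(compile(source, "test.py", "exec"), namespace)
    return namespace["foo"].__code__


print(body_lines(code_of_foo("def foo(a):\n    pass\n")))
```

Because it looks at instructions rather than at the def line, a function whose body sits on the def line still reports that line.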

OOI, what are you trying to achieve?

The tool I’m working on needs to build a line number → function mapping every time a new module is imported (or create it on the spot, if the module is already loaded). I would then get the code object of every function and extract all the line numbers to create the said mapping. In Python 3.11 I now get the extra line number that generally corresponds to the def line. This could be undesirable for a few reasons. For example, whilst this is something that doesn’t affect my tool (yet), the extra line number would overlap with the lines that pertain to the code object of the module itself:

python3.11 -m dis test.py
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (<code object foo at 0x7f0eca5af870, file "test.py", line 1>)
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (foo)
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE

Disassembly of <code object foo at 0x7f0eca5af870, file "test.py", line 1>:
  1           0 RESUME                   0

  2           2 LOAD_CONST               0 (None)
              4 RETURN_VALUE

See how line 1 is now common for both the module code object and the function code object. In contrast, there is no such overlap in previous versions:

python3.10 -m dis test.py
  1           0 LOAD_CONST               0 (<code object foo at 0x7f88c5fd6b80, file "test.py", line 1>)
              2 LOAD_CONST               1 ('foo')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (foo)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object foo at 0x7f88c5fd6b80, file "test.py", line 1>:
  2           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE

For completeness, the code in test.py is simply

def foo(a):
	pass

IMO the pre-3.11 behaviour was cleaner. And whilst it might not be hard to adapt tools that want to support 3.11 to the old behaviour, if needed, I feel this just clutters the logic in such tools.
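
For anyone who wants to reproduce the overlap without reading dis output, here is a small sketch, assuming the same two-line test.py; the helper name line_starts is mine. It compares the line numbers recorded for the module code object with those recorded for the nested function code object.

```python
import dis
import sys
import types

SOURCE = "def foo(a):\n    pass\n"


def line_starts(code):
    """Line numbers recorded in a code object's line table (None entries dropped)."""
    return {line for _, line in dis.findlinestarts(code) if line is not None}


module_code = compile(SOURCE, "test.py", "exec")
# The function's code object is stored among the module's constants.
foo_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

shared = line_starts(module_code) & line_starts(foo_code)
print(sys.version_info[:2], sorted(shared))
```

On 3.11 and later the shared set should contain the def line; on earlier versions it should be empty.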

I’m guessing that you want this for some sort of debugger?

The format of the bytecode has always been an implementation detail. Its purpose is to allow the interpreter to execute Python code in as fast a way as possible.

If you need metadata about code objects, and think it should be generally available, could you open an issue?
IMO, if you need to parse the bytecode for information, that is a flaw in the code object API.

I would like to tell you to use code.co_lines() or code.co_positions(), but that wouldn’t help you, as they do not skip the instructions up to the first RESUME. Maybe they should.

I’m guessing that you want this for some sort of debugger?

Indeed!

I would like to tell you to use code.co_lines() or code.co_positions(), but that wouldn’t help you, as they do not skip the instructions up to the first RESUME.

Since my code needs to support Python 2 at the moment, I have adopted this solution

{instr.lineno for instr in Bytecode.from_code(f.__code__) if hasattr(instr, "lineno")}

where Bytecode comes from the bytecode package. It’s certainly not ideal having to go to an abstract representation of the bytecode to retrieve line number information where one could still just get that from co_lnotab and the new equivalent of that, but the operation is done once and the result cached.

Maybe they should.

I think it would be of help to debugging tools to have this baked into the code object API. I think I could then live with the fact that the internal location information structure has more internal details than those exposed by said API. Maybe something like co_effective_lines? Or pass an optional argument to co_lines, with the default value yielding the old 3.10 behaviour for backwards-compatibility?
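
To make the idea concrete, this is roughly what I imagine such an API returning, written as a standalone function. The name effective_lines is purely hypothetical, and this is only a sketch built on co_lines(): it keeps every (start, end, line) range that extends past the set-up code ending with the first RESUME.

```python
import dis


def effective_lines(code):
    """Hypothetical helper: lines covering bytecode after the first RESUME."""
    if not hasattr(code, "co_lines"):
        # Before 3.10 there is no co_lines() (and no RESUME either).
        return {line for _, line in dis.findlinestarts(code)}

    instructions = list(dis.get_instructions(code))
    body_start = 0  # byte offset where the body begins
    for i, ins in enumerate(instructions):
        if ins.opname == "RESUME":
            # The body begins at whatever follows RESUME, or past the end.
            if i + 1 < len(instructions):
                body_start = instructions[i + 1].offset
            else:
                body_start = len(code.co_code)
            break

    # Keep every range that reaches past the set-up code.
    return {
        line
        for start, end, line in code.co_lines()
        if line is not None and end > body_start
    }


def code_of_foo(source):
    """Compile `source` and return the code object of the `foo` it defines."""
    namespace = {}
    exec(compile(source, "test.py", "exec"), namespace)
    return namespace["foo"].__code__


print(effective_lines(code_of_foo("def foo(a):\n    pass\n")))
```

Working on ranges rather than single instructions means the def line is kept whenever real code shares it with RESUME.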

How do you handle
def foo(): code_on_same_line()
?

Presumably, if the first statement is on the same line as the definition, you need to break at the definition, not code_on_same_line().

If you exclude co.co_firstlineno from the lines of a function, then your code should work the same for 3.10 and 3.11.
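
A rough sketch of that suggestion, assuming the lines are collected with dis.findlinestarts; the helper names are hypothetical.

```python
import dis


def lines_excluding_def(code):
    """Line numbers of `code` with co_firstlineno removed (hypothetical helper)."""
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    lines.discard(code.co_firstlineno)
    return lines


def code_of_foo(source):
    """Compile `source` and return the code object of the `foo` it defines."""
    namespace = {}
    exec(compile(source, "test.py", "exec"), namespace)
    return namespace["foo"].__code__


print(lines_excluding_def(code_of_foo("def foo(a):\n    pass\n")))
print(lines_excluding_def(code_of_foo("def foo(): code_on_same_line()\n")))
```

The two-line function gives the same result on 3.10 and 3.11, while the one-line definition comes out with no lines at all on either version.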

How do you handle
def foo(): code_on_same_line()
?

I have no special handling in place for this case, as it would correctly report that there is code on the def line, which is fine since I currently don’t care about the module code object. Interestingly, in this case there is no line “n - 1” in 3.11:

python3.11 -m dis /tmp/test.py
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (<code object foo at 0x102026a30, file "/tmp/test.py", line 1>)
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (foo)
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE

Disassembly of <code object foo at 0x102026a30, file "/tmp/test.py", line 1>:
  1           0 RESUME                   0
              2 LOAD_GLOBAL              1 (NULL + print)
             14 LOAD_CONST               1 ('hey')
             16 PRECALL                  1
             20 CALL                     1
             30 POP_TOP
             32 LOAD_CONST               0 (None)
             34 RETURN_VALUE

This would definitely cause serious issues with my tool as I would end up injecting bytecode before RESUME, which I suspect is not a good idea. Furthermore, removing co_firstlineno in this case would make the code object appear as if it had no lines.