Editing and executing instructions generated from dis.dis

arunppsg · May 5, 2024, 4:21pm

I am trying to edit bytecode instructions. Here is a simple example where I try to replace every load of the constant 6 with a load of 7.

import dis

def make_new_load(inst):
    new_argval = 7 if inst.argval == 6 else inst.argval
    new_argrepr = '7' if inst.argrepr == '6' else inst.argrepr
    # Instruction is a named tuple, so _replace() copies all the other fields
    # unchanged (including any fields added in newer Python versions)
    return inst._replace(argval=new_argval, argrepr=new_argrepr)

def foo(x):
    x = x + 6
    return x

new_instructions = []
for inst in dis.get_instructions(foo):
    if inst.opname == 'LOAD_CONST':
        inst = make_new_load(inst)
    new_instructions.append(inst)

Now, if I want to execute the new instructions or turn them into a code object, how can I do it? If I could make a code object, I could then use exec to run it.

fonini · May 5, 2024, 5:23pm

I’m not sure if the dis module can help you generate code objects (maybe it can – check the docs at dis — Disassembler for Python bytecode — Python 3.12.3 documentation).

What I know is that you can simply generate code objects with the constructor:

from types import CodeType
new_code_object = CodeType(...)

Mind you, there are a lot of code object parameters you need to keep consistent, depending on what you modify in the original code object. E.g., if you add or remove instructions, you would probably have to recalculate co_stacksize (a quick search tells me that CPython does this in the function calculate_stackdepth here: cpython/Python/flowgraph.c at main · python/cpython · GitHub). But with some effort, it’s doable.
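To get a feel for how many parameters are involved, here is a small sketch that lists the data attributes (the co_* names) of the example's foo code object. It skips methods such as co_lines, and the exact list differs between Python versions (3.11 added co_qualname and co_exceptiontable, for instance), which is part of why calling CodeType by hand is fiddly.

```python
import types

def foo(x):
    x = x + 6
    return x

code = foo.__code__
print(isinstance(code, types.CodeType))  # prints: True

# Collect the co_* data attributes defined on CodeType itself.
# Methods (e.g. co_lines, co_positions) are callable descriptors, so skip them.
fields = sorted(
    name for name, attr in vars(types.CodeType).items()
    if name.startswith("co_") and not callable(attr)
)
print(len(fields), fields)

# One of the values that must be recomputed if instructions are added or removed
print("co_stacksize:", code.co_stacksize)
```

Comparing this list between two interpreters is a quick way to see how the constructor's requirements shift from version to version.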

kknechtel · May 5, 2024, 5:49pm

This approach can’t work for the specific example. The problem is that, when the code loads the value 6, that value doesn’t come from the actual Python bytecode - it comes from data stored elsewhere in the code object. Inside the code object there’s a separate co_code, which is the actual bytecode as a bytes object; then there is co_consts, which is a tuple of constant values used by the code. For the foo function in the example, the tuple will have two values in it: None (CPython versions of this era keep the docstring, or None if there isn’t one, in the first slot) and 6. The LOAD_CONST instruction’s argument is an index into that tuple.

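When dis creates Instruction objects, they represent more than just the bytecode - they look up and reference those stored objects, following the bytecode’s logic. For a LOAD_CONST instruction, arg is the raw index taken from co_code, while argval is the object dis fetched from co_consts; changing argval on an Instruction changes nothing in the underlying bytes.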
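A quick way to see this for yourself is sketched below: print co_consts and compare each LOAD_CONST's raw arg with the argval that dis resolved from it. Depending on the Python version, 6 may not be loaded by LOAD_CONST at all, so no particular output is assumed here.

```python
import dis

def foo(x):
    x = x + 6
    return x

code = foo.__code__
print("co_consts:", code.co_consts)

for inst in dis.get_instructions(foo):
    if inst.opname == "LOAD_CONST":
        # inst.arg is the index stored in the bytecode itself;
        # inst.argval is the object dis looked up in co_consts for display.
        print(inst.offset, inst.arg, inst.argval, code.co_consts[inst.arg])
```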

Building a code object “from scratch” is quite difficult. But as of Python 3.8, code objects have a .replace method that can be used for the kind of changes you want. If you can take the existing co_code and co_consts and compute a new bytes object and a new tuple, you can pass those to .replace to create the new code object.
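For the 6-to-7 example specifically, the bytecode does not even need to change: swapping the entry in co_consts is enough, because every LOAD_CONST keeps pointing at the same index. A minimal sketch, assuming a CPython version where 6 actually lives in co_consts (newer versions may load small integers differently; 3.14 added a LOAD_SMALL_INT opcode for them, in which case there is nothing to patch). replace_constant is a hypothetical helper; CodeType.replace and types.FunctionType are real. Since foo takes an argument, wrapping the new code object in a function is more convenient than passing it to exec.

```python
import types

def foo(x):
    x = x + 6
    return x

def replace_constant(func, old, new):
    """Return a copy of func with `old` swapped for `new` in co_consts.

    co_code is left untouched: every LOAD_CONST whose index pointed at `old`
    now loads `new`, so the bytecode stays valid.
    """
    code = func.__code__
    # Compare types too, so replacing 1 does not also hit True or 1.0
    new_consts = tuple(
        new if type(c) is type(old) and c == old else c
        for c in code.co_consts
    )
    if new_consts == code.co_consts:
        raise ValueError(f"{old!r} is not stored in co_consts")
    new_code = code.replace(co_consts=new_consts)
    # Build a fresh function object around the modified code object
    return types.FunctionType(new_code, func.__globals__, func.__name__,
                              func.__defaults__, func.__closure__)

if 6 in foo.__code__.co_consts:
    patched = replace_constant(foo, 6, 7)
    print(foo(1), patched(1))  # prints: 7 8
```

Anything beyond a like-for-like constant swap (different opcodes, more or fewer instructions) also means rewriting co_code and keeping the other fields consistent with it.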

But as Pedro says, simply replacing these two values doesn’t necessarily result in a valid code object.

dis can help you understand the opcodes represented by the existing bytes, but it isn’t really helpful for turning edited instructions back into bytes. Worse, the bytecode format changes frequently: every minor version should be treated as incompatible with the others, and the changes are usually substantial. You’ll want a third-party library (such as the bytecode package on PyPI) for tasks like this.
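If you do patch bytecode by hand anyway, one low-tech safeguard is to record which bytecode format the patch was written for and refuse to apply it elsewhere. A sketch, assuming the .pyc magic number (importlib.util.MAGIC_NUMBER) is a good enough proxy, since CPython changes it for incompatible bytecode changes; bytecode_fingerprint and check_same_bytecode are illustrative names, not an existing API.

```python
import dis
import importlib.util
import sys

def bytecode_fingerprint():
    """Collect a few values that change when CPython's bytecode format changes."""
    return {
        "implementation": sys.implementation.name,
        "version": sys.version_info[:2],
        # The .pyc magic number is bumped for incompatible bytecode changes
        "magic": importlib.util.MAGIC_NUMBER.hex(),
        # Opcode numbers themselves are not guaranteed stable across versions
        "LOAD_CONST": dis.opmap["LOAD_CONST"],
    }

def check_same_bytecode(recorded_magic):
    """Raise if the running interpreter's bytecode format differs from the recorded one."""
    current = importlib.util.MAGIC_NUMBER.hex()
    if current != recorded_magic:
        raise RuntimeError(
            f"patch was written for magic {recorded_magic}, running {current}"
        )

print(bytecode_fingerprint())
```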