How to handwrite Python Bytecode

Hi! I am a long-time user of Python (though this is my first time in the community) and I want to understand Python bytecode better, and potentially compile some languages to it. I’m really struggling to understand how the actual bytecode file is structured. I have read through the dis library documentation (and some other tutorials), so I understand how the bytecode itself works, but they don’t really cover the bytecode file. I thought that handwriting some Python bytecode would be a good way to understand what is going on, but so far I am very much struggling.

If anybody could point me to a resource, explain it to me, or explain why I could never do this, that would be great!

Thanks in advance.

(Also, sorry if this is the wrong place to post a question like this; I am very new to this community!)

(This question was originally asked in core development, and I was advised that the Users board might be a better place to look.)

Hi glubs,

Bytecode differs between Python versions, which is why it is normally not sensible to distribute just the bytecode.
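To see this on your own interpreter, a small sketch like the one below prints the version-specific magic number that is stamped at the start of every .pyc file, along with the raw bytes of a tiny function. The function `add` is only an illustration; the output differs between Python releases, which is the point.

```python
import importlib.util
import sys


def add(a, b):
    # Illustrative function; any small function works here.
    return a + b


# The 4-byte magic number identifies which bytecode format this interpreter uses.
magic = importlib.util.MAGIC_NUMBER
# co_code holds the raw instruction bytes of the compiled function.
raw = add.__code__.co_code

print("Python version:", sys.version_info[:3])
print("pyc magic number:", magic)
print("raw bytecode:", raw.hex())
```

Running the same script under two different Python versions will usually give different magic numbers and different raw bytes for the same source.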

If you want to “compile” other languages to Python, you should compile to plain Python source code rather than to bytecode, if you ask me :)

Cheers, Dominik

The exact byte-code generated is not documented and subject to change without notice.

Aside from the dis module (disassembler), there is no documentation for what byte-code will be generated by any Python code.
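As a rough illustration of what dis shows, the sketch below lists the instructions of a small example function (`add` is just a placeholder name). The opcode names you get depend on the interpreter version, so treat the output as a snapshot, not a specification.

```python
import dis


def add(a, b):
    # Placeholder function to disassemble.
    return a + b


# Each Instruction carries the byte offset, the opcode name and a readable argument.
for ins in dis.get_instructions(add):
    print(ins.offset, ins.opname, ins.argrepr)

# Collect just the opcode names for a quick overview.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```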

Python .pyc files are written using the marshal module, so you could look at that.

https://docs.python.org/3/library/marshal.html
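As a sketch of how the pieces fit together, assuming CPython 3.7 or later (where the .pyc header is 16 bytes, per PEP 552): a .pyc is a short header followed by a marshalled code object. The source string, the "<handwritten>" filename and the variable names below are made up for illustration.

```python
import importlib.util
import marshal
import struct
import time

source = "answer = 6 * 7\n"
code = compile(source, "<handwritten>", "exec")

# Header layout assumed here (Python 3.7+), all integers little-endian:
#   4 bytes magic number
#   4 bytes flags (0 means "validate against source mtime and size")
#   4 bytes source modification time
#   4 bytes source size
header = importlib.util.MAGIC_NUMBER + struct.pack(
    "<III", 0, int(time.time()) & 0xFFFFFFFF, len(source) & 0xFFFFFFFF
)
pyc_bytes = header + marshal.dumps(code)

# Reading it back: split off the 16-byte header, unmarshal the rest, execute it.
magic, flags, mtime, size = struct.unpack("<4sIII", pyc_bytes[:16])
loaded = marshal.loads(pyc_bytes[16:])
namespace = {}
exec(loaded, namespace)
print(namespace["answer"])
```

Writing `pyc_bytes` to a file with a .pyc extension should give you something the same interpreter version can run or import; a different version is expected to reject the magic number.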

Also there are a few third-party libraries for writing byte-code:

If you want to experiment with how Python’s bytecode works, perhaps you’ll find one of my packages interesting:

It basically allows you to “inline” some bytecode instructions within any function (sort of similar to how asm extensions can be used with certain C compilers). It’s nice because you can rewrite part of a function in raw bytecode, while using normal Python syntax for other parts.

Just to give you a taste, here’s an example from the README. It accepts a sequence of items, and returns a list with each item repeated twice:

def doubled(items):
    out = []
    for item in items:
        out += item, item
    return out

With hax, you can do something like this, which keeps the out list on the stack (instead of in a local variable) and uses the LIST_APPEND op to build it:

from hax import *

@hax
def doubled(items):

    BUILD_LIST(0)

    for item in items:

        LOAD_FAST("item")
        DUP_TOP()
        LIST_APPEND(3)
        LIST_APPEND(2)

    RETURN_VALUE()
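For a smaller experiment that needs only the standard library, the sketch below patches the constant table of an existing code object and leaves the instructions alone. It assumes Python 3.8 or later (for `CodeType.replace`); `greet` and `patched` are hypothetical names used only for this example.

```python
import types


def greet():
    return "hello"


code = greet.__code__
# Swap the constant "hello" for "goodbye"; the instruction bytes stay the same.
new_consts = tuple("goodbye" if c == "hello" else c for c in code.co_consts)
patched = types.FunctionType(code.replace(co_consts=new_consts), globals(), "patched")

print(greet())    # the original function is unchanged
print(patched())  # same instructions, different constant table
```

Editing `co_code` itself works the same way through `replace`, but then the bytes have to be valid for the exact interpreter version you are running.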