Hand-writing Python bytecode

Hi! I'm a long-time Python user (though this is my first time in the community) and I want to understand Python bytecode better (and potentially compile some other languages to it), but I'm really struggling to understand how the actual bytecode file is structured. I read through the dis library documentation (and some other tutorials), so I understand how bytecode works, but they don't really cover the bytecode file itself. I thought hand-writing some bytecode would be a good way to understand what is going on, but so far I'm struggling.

If anybody could point me to a resource, explain it to me, or explain why I could never do this, that would be great!

Thanks in advance.

(also sorry if this is the wrong place to post a question like this, I am very new to this community!)

It is mostly just dumped out via the marshal API, although there’s a handful of extra fields written first:

https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py#L641

I’m not sure there’s a great description of the marshal format beyond the source code:

https://github.com/python/cpython/blob/master/Python/marshal.c
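To make the layout a bit more concrete, here is a rough sketch of reading a .pyc back by hand. It assumes CPython 3.7 or later, where the header is 16 bytes (magic number, a flags word, then the source mtime and source size when the flags are zero), and it forces timestamp-based invalidation so those last two fields mean what the comments say. The file names and the one-line source are made up for the demo.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile

source = b"x = 2\n"  # hypothetical one-line module, just for the demo

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo.py")
    pyc_path = os.path.join(tmp, "demo.pyc")
    with open(src_path, "wb") as fh:
        fh.write(source)
    # Force the timestamp-based header so the layout below applies.
    py_compile.compile(
        src_path,
        cfile=pyc_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as fh:
        data = fh.read()

# Header (assumed 16 bytes, CPython 3.7+): magic, flags, mtime, source size.
magic = data[:4]
flags, mtime, size = struct.unpack("<III", data[4:16])

# Everything after the header is a marshal dump of the module's code object.
code = marshal.loads(data[16:])

namespace = {}
exec(code, namespace)
print(magic == importlib.util.MAGIC_NUMBER, flags, size, namespace["x"])
```

marshal.loads only understands data written by the same Python version, so a .pyc from a different interpreter will generally not load this way.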

I’m not sure if this was the best spot, maybe Users would have been better, but you’re also asking for pretty low-level information.

Thank you for your reply. It has been very helpful. I will open an identical thread in Users to see if I get any more information.

Actually, sorry, I'm kind of curious: if the bytecode file is just a marshal dump of a Python object, where is the bytecode actually used? Is the marshal dump itself the bytecode?

I did take a look at the source code, but I am still learning to read other people's code, so I did not get too much out of it.

Thanks again, you’ve been a real blessing!

The byte code will be re-loaded by unmarshaling it, which will ultimately produce a new code object. That code object then gets executed against a dictionary which gets created for the module scope. A good starting point might be just trying to create your own code objects in Python and then using exec() to run them. A simple example of doing this by letting Python produce the code object would be:

def f():
    global x
    x = 2

d = {}
exec(f.__code__, d)  # run the function's code object with d as its globals
print(d['x'])
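On the "where is the bytecode" question, the following sketch may help: the marshal dump is a serialized code object, and the raw bytecode is one attribute on it (co_code), alongside the constants and names it refers to. The "<demo>" filename and the variable names are just placeholders, and the exact bytes and dis output differ between Python versions.

```python
import dis
import marshal

# Let the compiler produce a module-level code object.
code = compile("x = 2", "<demo>", "exec")

# Round-trip it through marshal, which is roughly what a .pyc body holds.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

print(restored.co_code)    # the raw bytecode bytes (version specific)
print(restored.co_consts)  # constants the bytecode refers to
print(restored.co_names)   # names the bytecode refers to
dis.dis(restored)          # human-readable listing of co_code

# Executing the restored code object fills in the "module" dictionary.
module_dict = {}
exec(restored, module_dict)
print(module_dict["x"])
```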

You can construct your own code objects by doing:

from types import CodeType
CodeType(…)

You can run help(CodeType) to see all of the parameters you need to pass. There are a lot, and they change between Python versions, but you can get ideas for what to pass by looking at the attributes on f.__code__.
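Since the CodeType constructor takes so many arguments, one gentler way to experiment is to start from an existing code object and change a single field. This is only a sketch and assumes Python 3.8 or later, where code objects have a replace() method; greet and farewell are made-up names for illustration.

```python
import types


def greet():
    return "hello"


code = greet.__code__

# The co_* attributes correspond to what the CodeType constructor expects.
fields = sorted(name for name in dir(code) if name.startswith("co_"))
print(fields)

# Swap one constant and keep everything else, including co_code, unchanged.
new_consts = tuple("goodbye" if c == "hello" else c for c in code.co_consts)
patched = code.replace(co_consts=new_consts)

# Wrap the patched code object in a function so it can be called.
farewell = types.FunctionType(patched, globals(), "farewell")
result = farewell()
print(result)
```

The same replace() call accepts co_code, so once the constants and names line up you can try swapping in bytecode you assembled yourself. Nothing validates it, though, and malformed bytecode can crash the interpreter.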