Separation of interpreter and VM?

I am not very familiar with the internals of Python implementations, but as I understand it there are basically two steps in executing a Python program: the code is first compiled into bytecode (cached in the well-known .pyc files), which is then executed by the VM.

One of the big unresolved issues in the Python world is the distribution of applications. There are many approaches, each with its own advantages and disadvantages: wrapping applications in self-contained environments like Docker containers, packing them together with the interpreter into a single binary (e.g. PyOxidizer), etc.

I’ve recently been thinking about one approach, but have no idea if it would be possible, in theory or in practice. Once the code is compiled into bytecode, all it requires is the virtual machine; is there a way to extract just the VM and use it to run the app without the interpreter? If there is, it might be possible to package the bytecode and the VM in an executable binary with a much smaller footprint than including the whole interpreter and the bytecode…

I’m mainly looking to understand if that is possible in theory (i.e. whether there are any conceptual issues in Python as a language that would prevent that separation), and if it is, whether it can be attempted with the existing implementations (primarily CPython). And, of course, if the answer to both is “yes”, has that been attempted in any way?

Or, of course, the whole idea might be completely bonkers for some obvious reasons that I have missed… I’d appreciate any comments. Thank you!

What is an “interpreter” to you and how is it different from a “VM”? Generally those terms are used interchangeably when talking about CPython.

As for your idea, there are already projects which distribute CPython and an app’s source code as an installable thing (e.g. PyInstaller, py2app, etc.). And you already mentioned PyOxidizer which does this.

So is your question specifically about trying to get CPython to take on the responsibility of providing the tooling instead of the community? Or are you asking something else?

One thing you need to factor in: The generation of bytecode is required at runtime. Dataclasses and namedtuple both use exec to compile code that’s generated at runtime, for example. There are many other examples of that.

From a terminology angle, it would probably be better to phrase your idea in terms of splitting the compiler from the VM/interpreter. The interpreter is the thing that executes the bytecode, the parser and compiler turn source code into bytecode.
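To make the compiler/interpreter split concrete, here is a minimal sketch using only builtins and the stdlib dis module. The add function and the "<example>" filename are made up for illustration; this is just one way to observe the two steps separately in CPython.

```python
import dis

# Hypothetical source text for illustration.
source = "def add(a, b):\n    return a + b\n"

# Step 1: the parser and compiler turn source text into a code object
# (the same kind of object that gets cached in a .pyc file).
code = compile(source, "<example>", "exec")

# Step 2: the interpreter executes the code object. No source text is
# needed at this point, only the compiled code.
namespace = {}
exec(code, namespace)
result = namespace["add"](2, 3)
print(result)

# Show the bytecode instructions the interpreter runs for add().
dis.dis(namespace["add"])
```

Note that exec is doing double duty here: given a string it invokes the compiler first, but given a code object it only runs it.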

From a language perspective one barrier to this is that there are language features (exec and eval) which rely on runtime availability of the compiler. And there are widely-used stdlib features (e.g. namedtuple) that rely on these language features. So any code intending to run without a compiler available would need to avoid those features.
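As a rough illustration of why the compiler has to be present at runtime, the following sketch generates a class from a list of field names and compiles it with exec. The make_record_class helper is hypothetical and much simpler than what namedtuple or dataclasses really do; it only shows the general technique of building source text at runtime.

```python
def make_record_class(name, fields):
    """Build a simple class with the given field names at runtime.

    Hypothetical helper for illustration only, not the stdlib implementation.
    """
    args = ", ".join(fields)
    body = "\n".join(f"        self.{field} = {field}" for field in fields)
    source = (
        f"class {name}:\n"
        f"    def __init__(self, {args}):\n"
        f"{body}\n"
    )
    namespace = {}
    # exec() on a string has to parse and compile it first, so this line
    # fails in any runtime that ships without the compiler.
    exec(source, namespace)
    return namespace[name]


Point = make_record_class("Point", ["x", "y"])
p = Point(1, 2)
print(p.x, p.y)
```

Because the source string does not exist until the program runs, there is nothing to precompile ahead of time, which is the barrier described above.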

Perhaps more importantly though, size-wise the parser and compiler just aren’t that big, relative to the interpreter and the standard library. So I don’t think you’d gain all that much by splitting them out. If you’re interested in smaller distributable executables, there’d be more potential win in pushing forward the various perpetually-stalled proposals for a slimmed-down stdlib.

Carl


From a terminology angle, it would probably be better to phrase your idea in terms of splitting the compiler from the VM/interpreter

Yes, this is exactly what I had in mind, sorry for being unclear.

From a language perspective one barrier to this is that there are language features (exec and eval) which rely on runtime availability of the compiler

I expected something like that, but on the other hand I am aware that it is possible to distribute pure bytecode without the original source (the old egg format allowed that, and it’s still possible with wheels). Of course, it might not cover all the use cases, but it might open some possibilities.

there’d be more potential win in pushing forward the various perpetually-stalled proposals for a slimmed-down stdlib

I wonder whether it would be possible to dynamically shed all the stdlib components that are not needed for a certain application? I know that some Linux distros notoriously exclude some of the stdlib packages (e.g. Ubuntu and venv); I wonder if that could be part of the packaging process?

The reply does not seem related to the original point, which was that Python has exec and eval functions that take a string and need the compiler to transform it to bytecode that is then interpreted.

Berislav Lopac wrote:

“it is theoretically possible to distribute pure bytecode”

Not just theoretically possible. Distribution of .pyc files only without their corresponding .py file is supported, but of course they only support a single interpreter minor version.

If you copy a .pyc file from the file system cache __pycache__ to somewhere on the PYTHONPATH, and rename it to a valid module name (keeping the .pyc extension), it is usable as a module.
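A small sketch of the sourceless import described above, assuming CPython. Instead of copying from __pycache__ and renaming, it asks py_compile to write the .pyc directly under a plain module name. The module name greeting_pyc_demo and the temporary directories are invented for the example.

```python
import importlib
import pathlib
import py_compile
import sys
import tempfile

# Hypothetical layout: src/ holds the source, dist/ holds only bytecode.
workdir = pathlib.Path(tempfile.mkdtemp())
src_dir = workdir / "src"
dist_dir = workdir / "dist"
src_dir.mkdir()
dist_dir.mkdir()

source_file = src_dir / "greeting_pyc_demo.py"
source_file.write_text("def hello():\n    return 'hello from bytecode'\n")

# Write the bytecode straight to dist/greeting_pyc_demo.pyc, i.e. a valid
# module name with the .pyc extension and no version tag in the filename.
py_compile.compile(
    str(source_file),
    cfile=str(dist_dir / "greeting_pyc_demo.pyc"),
    doraise=True,
)

# Only dist/ goes on the import path, so the .py file is never seen.
sys.path.insert(0, str(dist_dir))
importlib.invalidate_caches()
greeting = importlib.import_module("greeting_pyc_demo")

message = greeting.hello()
print(message)
print(greeting.__file__)
```

The resulting file is still tied to the interpreter version that produced it, so it would have to be rebuilt for each minor version being targeted.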

"I wonder whether it would be possible to dynamically shed all the stdlib components that are not needed for a certain application?"

I daresay that it is possible. For example, py2exe analyses your application source code to work out the third-party dependencies, so that could be applied to the stdlib as well. py2exe doesn’t do that, but you can manually exclude stdlib files.
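To give a feel for what such an analysis could look like, here is a rough sketch that collects the top-level modules a piece of source imports and subtracts them from a set of stdlib module names. This is not how py2exe works internally; the function names are made up, sys.stdlib_module_names only exists on Python 3.10 and later, and a real tool would also have to follow transitive and dynamic imports.

```python
import ast
import sys


def imported_top_level_modules(source):
    """Return the top-level module names imported directly by the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative ones are skipped.
            names.add(node.module.split(".")[0])
    return names


def unused_stdlib(source, stdlib_names=None):
    """Return stdlib modules the source does not import directly.

    Assumption: falls back to sys.stdlib_module_names (Python 3.10+) when
    no explicit set of names is given.
    """
    if stdlib_names is None:
        stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    return set(stdlib_names) - imported_top_level_modules(source)


# Hypothetical application source.
app_source = "import json\nfrom os import path\nimport requests\n"
used = imported_top_level_modules(app_source)
print(sorted(used))
print(sorted(unused_stdlib(app_source, {"json", "os", "tkinter"})))
```

The result is only a list of candidates for exclusion: modules like json import other stdlib modules themselves, so anything on the list would still need to be checked against the full import graph.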


Yes, this seems like the right approach, and one that could be more streamlined… But if I’m not mistaken, py2exe is only for packaging apps on Windows, correct?

Yeah, I’m just mentally dumping my thoughts on the topic…

I wasn’t suggesting that you use py2exe specifically, just using it as proof that it is possible to eliminate unused libraries.

Of course, I didn’t mean to imply that, sorry. In any case thanks for the pointer!

IIRC, cx_Freeze also tries to strip out everything that never gets imported by the frozen application.
