I am not very familiar with how the internals of Python implementations really work, but as I understand it there are basically two steps in executing a Python program: the source code is first compiled into bytecode (the well-known .pyc files), which is then executed by the VM.
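A minimal sketch of those two steps in CPython, using only the built-in `compile` and `exec` plus the `marshal` and `dis` modules. The source string and the `<example>` filename are made up for illustration; a real .pyc file is a small header followed by a marshalled code object like the one below.

```python
import dis
import marshal

source = "result = 6 * 7"

# Step 1: the compiler turns source text into a code object (bytecode).
code = compile(source, "<example>", "exec")

# The code object can be serialized, which is roughly what a .pyc stores.
payload = marshal.dumps(code)

# Step 2: the interpreter loop executes the (deserialized) code object.
namespace = {}
exec(marshal.loads(payload), namespace)
print(namespace["result"])

# Show the bytecode instructions the VM actually runs.
dis.dis(code)
```

Passing a code object to `exec` skips the compile step, whereas passing a string forces CPython to compile it first.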
One of the big unresolved issues in the Python world is the distribution of applications, and there are many approaches, each with its own advantages and disadvantages: wrapping them up in self-contained environments like Docker containers, packing them together with the interpreter into a single binary (e.g. PyOxidizer), etc.
I’ve recently been thinking about one approach, but have no idea if it would be possible, in theory or in practice. Once the code is compiled into bytecode, all it requires is the virtual machine; is there a way to extract just the VM and use it to run the app without the interpreter? If there is, it might be possible to package the bytecode and the VM in an executable binary with a much smaller footprint than including the whole interpreter and the bytecode…
I’m mainly looking to understand if that is possible in theory (i.e. whether there are any conceptual issues in Python as a language that would prevent that separation), and if it is, whether it can be attempted with the existing implementations (primarily CPython). And, of course, if the answer to both is “yes”, has that been attempted in any way?
Or, of course, the whole idea might be completely bonkers for some obvious reasons that I have missed… I’d appreciate any comments. Thank you!
What is an “interpreter” to you, and how is it different from a “VM”? Generally those terms are used interchangeably when talking about CPython.
As for your idea, there are already projects which distribute CPython and an app’s source code as an installable thing (e.g. PyInstaller, py2app, etc.). And you already mentioned PyOxidizer which does this.
So is your question specifically about trying to get CPython to take on the responsibility of providing the tooling instead of the community? Or are you asking something else?
One thing you need to factor in: The generation of bytecode is required at runtime. Dataclasses and namedtuple both use exec to compile code that’s generated at runtime, for example. There are many other examples of that.
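To illustrate the pattern, here is a simplified sketch of runtime code generation. It is not the standard library's actual implementation, and `make_init` and `Point` are hypothetical names; it only shows why code written this way needs the compiler in the running process.

```python
def make_init(field_names):
    """Build an __init__ function from source text generated at runtime."""
    args = ", ".join(field_names)
    body = "\n".join(f"    self.{name} = {name}" for name in field_names)
    source = f"def __init__(self, {args}):\n{body}\n"
    namespace = {}
    # exec has to parse and compile `source` right now, so the compiler
    # must be available in the running process.
    exec(source, namespace)
    return namespace["__init__"]


class Point:
    # The method below did not exist as bytecode until import time.
    __init__ = make_init(["x", "y"])


p = Point(1, 2)
print(p.x, p.y)
```

Shipping only precompiled bytecode would not help here, because the source for `__init__` is assembled as a string only when the class is created.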
From a terminology angle, it would probably be better to phrase your idea in terms of splitting the compiler from the VM/interpreter. The interpreter is the thing that executes the bytecode; the parser and compiler turn source code into bytecode.
From a language perspective one barrier to this is that there are language features (exec and eval) which rely on runtime availability of the compiler. And there are widely-used stdlib features (e.g. namedtuple) that rely on these language features. So any code intending to run without a compiler available would need to avoid those features.
Perhaps more importantly though, size-wise the parser and compiler just aren’t that big, relative to the interpreter and the standard library. So I don’t think you’d gain all that much by splitting them out. If you’re interested in smaller distributable executables, there’d be more potential win in pushing forward the various perpetually-stalled proposals for a slimmed-down stdlib.
From a terminology angle, it would probably be better to phrase your idea in terms of splitting the compiler from the VM/interpreter
Yes, this is exactly what I had in mind, sorry for being unclear.
From a language perspective one barrier to this is that there are language features (exec and eval) which rely on runtime availability of the compiler
I expected something like that, but on the other hand I am aware that it is theoretically possible to distribute pure bytecode (the old egg format allowed that, and it’s still possible with wheels), without the original source. Of course, it might not cover all the use cases, but it might open some possibilities.
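A minimal sketch of what source-less distribution looks like with CPython's standard `py_compile` and `importlib` modules. The module name `bytecode_only_demo` and the `src`/`dist` directory layout are assumptions made up for this example.

```python
import importlib
import pathlib
import py_compile
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)

    # The source lives in one directory...
    src = tmp_path / "src" / "bytecode_only_demo.py"
    src.parent.mkdir()
    src.write_text("def hello():\n    return 'hello from bytecode'\n")

    # ...and only the compiled .pyc is placed in the directory we "ship".
    dist = tmp_path / "dist"
    dist.mkdir()
    py_compile.compile(str(src), cfile=str(dist / "bytecode_only_demo.pyc"),
                       doraise=True)

    sys.path.insert(0, str(dist))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("bytecode_only_demo")
        message = module.hello()
        loaded_from = module.__file__
    finally:
        sys.path.remove(str(dist))

print(message)
print(loaded_from)
```

Two caveats: a .pyc is tied to the bytecode format of the CPython version that produced it, and any `exec` or `eval` on strings inside the shipped code would still need the compiler at runtime.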
there’d be more potential win in pushing forward the various perpetually-stalled proposals for a slimmed-down stdlib
I wonder whether it would be possible to dynamically shed all the stdlib components that are not needed for a certain application? I know that some Linux distros notoriously exclude some of the stdlib packages (e.g. Ubuntu and venv); I wonder if that could be part of the packaging process?
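As a rough sketch of how a packaging step might work out which modules an application needs, the standard library's `modulefinder` can statically trace imports from a script. The `app.py` script here is a made-up example, and static analysis like this will miss imports that are built dynamically at runtime.

```python
import modulefinder
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A tiny stand-in for the application being packaged.
    script = pathlib.Path(tmp) / "app.py"
    script.write_text("import json\nprint(json.dumps({'ok': True}))\n")

    finder = modulefinder.ModuleFinder()
    # Analyses the import statements; it does not run the script.
    finder.run_script(str(script))
    needed = sorted(finder.modules)

print(len(needed), "modules reachable from app.py")
print("json" in needed)
```

Anything in the stdlib that is not in `needed` is a candidate for being left out of the bundle, subject to the dynamic-import caveat above.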
The reply does not seem related to the original point, which was that Python has exec and eval functions that take a string and need the compiler to transform it to bytecode that is then interpreted.
Yes, this seems like the right approach, and one that could be more streamlined… But if I’m not mistaken, py2exe is only for packaging apps on Windows, correct?