Is it possible to modify the Python interpreter to generate machine code and run directly on the processor without going through the usual process of bytecode compilation and interpretation by the Python Virtual Machine (PVM)? S/N
(I don't know what S/N means.)
There are many existing implementations of Python besides the one you get from python.org. As far as I know, none of them compile directly to machine code, but there are a few that target other VMs. There are also projects that can make a compiled result possible, for example shedskin. It's limited (it can't handle every Python program), but it converts Python code to C++, which can then be compiled normally. There's also Cython, but it's intended for making extension modules rather than replacing the entire program.
None of these work by "modifying the Python interpreter"; the default Python implementation is, from what I can tell, not a useful base for that kind of project.
So do I need to create a compiler to do this? "S/N or Y/N (YES OR NO)"
You would need to make a compiler, yes.
I strongly recommend looking for existing tools that get the result you want, first.
Which programming language should I choose for this low-level work: C++ or Python? Furthermore, is it possible to enhance Python to achieve a hybrid approach, combining high- and low-level features?
The compiler must generate machine code that runs directly on the CPU, right?
I'm not sure it's possible to write a compiler that compiles Python code into an executable, unless that executable also includes a Python interpreter, in which case it basically still uses bytecodes! Consider a simple function like this:
def func(n):
    return n + n
What is the type of n? How can you convert this directly into executable code without knowing what the type of n is? You only get to know this type at runtime, once n is bound to an actual value.
And what is "+"? How "+" is expected to behave actually depends on the type of n…
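To make this concrete, here is a small sketch (my own example, not from the original post) of how the very same "+" in func resolves to different operations depending on the runtime value of n:

```python
def func(n):
    return n + n

# The meaning of "+" is only known once n is bound to a value:
print(func(2))        # integer addition: 4
print(func("ab"))     # string concatenation: "abab"
print(func([1, 2]))   # list concatenation: [1, 2, 1, 2]
```

A static compiler looking only at the source of func cannot pick a single machine instruction for that "+".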
Ok, suppose you think there is a way, and just assume you have somehow converted this into a dynamic link library that generally seems to work (with strings and with integers and what not). Now someone else comes along and wants to use your function. They write a custom class that defines the "+" operator in a very special way (perfectly fine in Python). Can you guarantee that this will still work with your compiled code? You'd have to recompile that script, of course, but now "func" should also be able to work with the custom class. This implies, at the very least, that compilation itself may (in general) become very inefficient: to determine the meaning of "+", the compiler has to take into account all possible uses of "+" (and __add__, __radd__, __iadd__) everywhere it occurs.
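As a sketch of that scenario (the class name and its behavior are my own invention, not from the thread):

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # "+" means something entirely different here than for int or str
        return Money(self.cents + other.cents)

def func(n):
    return n + n

# A compiler that froze "+" into an integer add could not handle this:
print(func(Money(150)).cents)  # 300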
Perhaps there are ways around this. Even so, I think this illustrates that writing this kind of compiler would be far more challenging than writing a compiler for a statically typed, non-interpreted language like C or Rust.
There ARE ways to get the performance of machine code without sacrificing the flexibility. Usually it amounts to a quick check ("are these the data types I expect?") followed by a fast path or a slow path. It's generally impractical to hand-write that sort of thing, but fortunately, that isn't necessary. Give PyPy a try and see what it can do for you!
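A hand-written toy version of that guard-then-fast-path pattern might look like the following (a real JIT like PyPy emits this kind of check as machine code, not Python; the function name here is my own):

```python
import operator

def add_specialized(a, b):
    # Guard: is this the common case we "compiled" for?
    if type(a) is int and type(b) is int:
        # Fast path: in a real JIT this would be a raw machine-level add
        return a + b
    # Slow path: fall back to Python's full dynamic dispatch
    return operator.add(a, b)

print(add_specialized(2, 3))      # fast path: 5
print(add_specialized("a", "b"))  # slow path: "ab"
```

The guard is cheap, so when the types are stable (as they usually are), the fast path dominates.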
If I decide to develop a compiler for Python, which requires a low-level language, the following question arises: is it possible to enhance Python to achieve a hybrid approach, combining high- and low-level features? Later, when creating the compiler at the low level to access hardware directly, the question of this compiler's dependency on the Python interpreter arises. How do these parts interact, and what would be the impact of this dependency on the performance and effectiveness of the compiler in accessing hardware resources? Would it not impact the code to be compiled, or would I perhaps need to make extensive modifications? I must say it is quite confusing.
I have only heard about it; I never tried it.
A compiler, ultimately, is a program that converts information from one form (source code) into another form (machine code). That can be done in a high-level or low-level language, with equivalent functionality. You could, if you wanted to, write a C compiler in Python, or a Fortran compiler in JavaScript. The downside would be that it'd take approximately five centuries to compile Firefox from source.
The hybrid approach you suggest is basically what a JIT compiler will do, though. It starts out in "flexible" mode, but if it notices that there's benefit to be had, it can compile something to machine code, then immediately run it. This is what PyPy does, and it's able to cope with all of the flexibility of Python by simply NOT compiling those parts. Considering that most Python programs use those dynamic features in only a small handful of places (with boring old string/integer/float calculations making up the bulk of processing time), there's still a lot of benefit to be had.
There was one that I looked at a long time ago, but I cannot find its name in my notes.
What it did was convert Python modules into C code that called the libpython APIs. It ran the Python code that I tried faster, but the speedup was very dependent on the use case of the code.
MicroPython and CircuitPython can compile a subset of Python to machine code. They target microcontrollers; they cannot compile for x86-based CPUs.
Maybe Nuitka?
Exactly, they have "already done that".
The JIT is, quite literally, the closest to what I have in mind.
It's still in development, but you might want to look into Mojo. It's intended as a superset of Python with C-esque extensions that can be directly compiled.
That's exactly what I want to do, but directly in Python and without losing its flexibility, libraries, etc.
I donāt understand what the Python interpreter has to do with a hypothetical Python compiler, even if they offer the same C API.
IMHO, Python is so dynamic that there is no JIT compiler that can fully compile it. There will always be trade-offs between the flexibility and ease of use provided by Python's dynamic features and the performance gains achieved by languages that are more statically typed and compiled.
In theory, it is conceptually possible to compile Python entirely into machine code and run it on the processor. However, the complexity of this task is notable, primarily due to the inherently dynamic and flexible nature of the language. We are in a theoretical realm where the feasibility of such compilation sits on the threshold between the possible and the impractical.
The flexibility of Python, which allows dynamic manipulation of types and dynamic code execution, poses significant challenges for generating static code during compilation. This stands in contrast to statically typed, compiled languages, where the program's structure is more predictable at compile time.
While there are implementations like PyPy, which incorporates JIT techniques to optimize performance at runtime, the idea of fully compiling Python to machine code faces obstacles due to the need to preserve the languageās dynamic flexibility.
In summary, while the theoretical possibility exists, fully compiling Python to machine code is challenging in practice. Up to this point, the most effective implementations have adopted JIT approaches to balance the language's flexibility with performance optimizations.
One example is dynamic class definition. For that, you don't necessarily need machine code; a hash-table-like data structure is enough, and CPython already employs this in C. Simply put, what I mean is that you could find yourself interpreting the code at runtime, essentially recreating something akin to CPython.
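For instance, a class can be built at runtime from a plain dict, so attribute lookups have to go through a hash-table structure rather than the fixed offsets a static compiler could emit (the class and attribute names below are hypothetical):

```python
# Build a class at runtime; no "class" statement for it exists in the source.
attrs = {
    "x": 10,
    "double_x": lambda self: self.x * 2,
}
Point = type("Point", (object,), attrs)

p = Point()
print(p.double_x())  # attribute lookup hits a dict at runtime: 20
```

Since Point did not exist until this code ran, a compiler working purely from the source text has nothing static to compile the lookups against.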