Rewriting CPython in assembly (NASM) instead of C – theoretical feasibility?

I have a minimal x86 operating system written in NASM assembly (16/32-bit hybrid). It has a bootloader, FAT12 driver, shell, and basic commands. Everything is in assembly – no C, no libc.

I want to run Python on my OS. Not MicroPython – full CPython.

But instead of porting CPython as-is (which is written in C), I want to rewrite it entirely in assembly (NASM). The C code would be removed completely and reimplemented manually.

I understand CPython is ~2 million lines of C. I understand this is insane. I understand it would take years.

My questions are purely theoretical:

  1. Is there ANY official documentation or research on rewriting CPython in another language?
  2. Would a hand-written assembly interpreter for Python bytecode be feasible in terms of performance? Or would the lack of a proper compiler optimization make it slower?
  3. Has anyone ever attempted something like this before?

I know MicroPython exists. I know about C extensions. I am asking about a pure assembly rewrite for educational purposes and as a long-term challenge.

Thank you.

  1. I don’t know what kind of documentation you mean. There is the language specification, which is all you should need[1]. I don’t know what a guide to rewriting would look like: by definition, you are doing your own thing, and there can be no guide for that.

  2. It might be theoretically possible to make a faster interpreter with hand-written assembly, but it is definitely not feasible. There’s just too much stuff there, and too many use cases to cover. It’s hard enough to accomplish in C.

    2.1. I’ll add that any theoretical performance gains from hand-written AsmPython are going to be totally overwhelmed by the overhead of the OS, unless you’ve already implemented a whole lot of clever tricks there.

  3. There are many other implementations, some of them current and some of them in the past. I don’t know if anyone has tried assembly, because of the answer to 2.


  [1] Although there might be edge cases that are ambiguously specified, and CPython is the reference.
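To make question 2 a bit more concrete: the core of any bytecode interpreter is a fetch-decode-dispatch loop over a value stack, and that loop is what a hand-written assembly version would have to reproduce. Below is a minimal sketch of such a loop. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the `run` function are made up for illustration and are not CPython's real instruction set, which is far larger and also has to deal with objects, reference counting, and exceptions.

```python
# Hypothetical opcodes for illustration only; these are not CPython's.
PUSH, ADD, MUL, HALT = range(4)


def run(program):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        op, arg = program[pc]  # fetch
        pc += 1
        if op == PUSH:  # decode and dispatch
            stack.append(arg)
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


# Encodes the expression (2 + 3) * 4.
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (HALT, None)]
print(run(program))  # prints 20
```

The loop itself is small. The bulk of the work in a real interpreter sits behind each opcode, which is where the "too much stuff" in answer 2 comes from.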

What’s the difference between the assembly you intend to write,
and the machine code gcc would emit, if CPython was compiled with it, targeting the same platform?

A crazy project for sure :smiley:, but might be fun. Something to consider to slash the pain: virtually all C compilers support an option to show the generated assembly code. So compile CPython with a C compiler, and start from that output. I don’t think many (if any) produce NASM output directly, so it reduces the problem to finding (or building your own) tools to convert.

But building your own OS and “no libc” may well be even harder. Python builds on top of OS and libc facilities - and in all they typically require much more code than Python itself.

I haven’t counted, and don’t care to, but various sources on the web say the “~2 million lines of C” figure is about a factor of 4 too large. But some sources don’t count blank or comment lines, and/or don’t count header files. Regardless, it’s “a lot” :wink:.

Note that unlike Python and C, which are general-purpose languages, assembly language is architecture-specific (tied to a particular Instruction Set Architecture). What works on one platform will not work on another. Please be aware of this before you venture on this journey.

The language is probably less significant here than the lack of libc. Rewriting CPython to not require libc is going to be a huge job. Does it have to be CPython specifically, and not your own implementation of Python optimized for your situation?

That sure is a big project to tackle, but to be honest, I’ve thought about doing something similar before too (just making it for my own ISA), and I do think it would be a cool challenge, especially if you are able to add some optimizations for it. I’ve always found it cool how low-level code can be translated to assembly quite easily (using no OOP of course), which is quite fun.

“It depends”. In practice, it’s still the case that portions of speed-critical apps are coded in hand-written assembler. For example, libraries like GMP have assembler cores for speeding bigint arithmetic. Assembler can access important HW features that aren’t exposed by portable C (like a CPU’s “carry” flag, or an instruction to count the number of trailing zero bits in a word).
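As a rough illustration of those two features, here is a sketch that emulates them in Python. The names `add_limbs` and `count_trailing_zeros`, and the choice of 64-bit limbs, are assumptions made for the example and are not taken from GMP. Python ints are arbitrary precision, so the carry has to be extracted by hand with a shift and a mask. On x86, an add-with-carry instruction (`ADC`) hands the carry from one limb to the next directly, and `BSF`/`TZCNT` count trailing zeros in a single instruction.

```python
MASK64 = (1 << 64) - 1  # all ones in one 64-bit "limb"


def add_limbs(a, b):
    """Add two equal-length little-endian lists of 64-bit limbs.

    Returns (result_limbs, carry_out). The carry is recovered with a
    shift, which is the bookkeeping a hardware carry flag does for free.
    """
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & MASK64)  # low 64 bits stay in this limb
        carry = total >> 64            # 0 or 1, passed to the next limb
    return result, carry


def count_trailing_zeros(x):
    """Number of trailing zero bits in a positive integer."""
    if x <= 0:
        raise ValueError("x must be positive")
    # x & -x isolates the lowest set bit; its bit_length gives the position.
    return (x & -x).bit_length() - 1


# The low limb overflows and the carry moves into the high limb.
print(add_limbs([MASK64, 0], [1, 0]))  # prints ([0, 1], 0)
print(count_trailing_zeros(12))        # prints 2, since 12 is 0b1100
```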

And “Intel assembler syntax” doesn’t hide that x86-64 architectures evolve over time, adding more and more instructions and changing, e.g., how many integer multiplies can be in flight simultaneously. The GMP library has different assembler kernels for different major CPU architectures, and for significant variations within a single architecture family.

There’s also that, e.g., major C compilers can now magically “inline” function calls, removing the overheads of function call/return by generating assembler directly at the call site. That’s a trick you can’t automate in by-hand assembler, short of major tricks with macro expansions (so that there’s no “call” to begin with).

On and on.

I started my career working on Cray Research’s FORTRAN compiler. That was an early 64-bit “supercomputer” aimed at peak speed for floating-point code, and introduced “vector” instructions into the mainstream. The compiler was entirely written in Cray assembler. In a way, great fun - but also great pain. It was in many ways a nightmare. You haven’t lived until you try to write an involved text-processing program in assembler for an architecture that didn’t care about text :wink: For example, all the floating-point hardware is close to useless for that, and the smallest addressable storage unit was 64 bits.

By the time I left Cray, the maintenance burden had become unbearable, and a new compiler and OS had been written, in a homegrown variant of Pascal. Very much easier to work with, but also slower compilation times. We knew “by hand” tricks to speed text processing in Cray assembler that were beyond what the Pascal compiler could generate. Which latter improved over time, but we weren’t in the Pascal business, and “fastest compile times on the planet” didn’t sell multi-million dollar machines :wink:

It reminds me of a co-worker who wanted to rewrite Linux in Assembly… :stuck_out_tongue:

I don’t know why you ruled out MicroPython. It’s easier to start with a simpler version of Python. There’s also CircuitPython, which is simpler than MicroPython.

Anyway, AFAIK people usually start with a simpler language, then optimize the critical sections in a more difficult one. And that’s what you can do with C: you can write some parts of the code in Assembly. Why write all the code in Assembly? It’s a footgun.

I think rewriting all the code in Rust would be more interesting. Anyway, an impossible task for a Solo Leveling dev :slight_smile: