Embedding Python in a runtime with no libc

Hi! I’m interested in embedding Python into an almost bare-metal application (it actually runs on a normal x86 Linux machine, but has no access to libc). Basically, this vendor has an application that will take my applications (compiled as .so files) and load them into their runtime. It’s honestly very analogous to a Linux kernel module (e.g. it has an init function that acts as main, and there is no strcpy or fopen).

Is there a way to port Python to a runtime like this? My dream would be to have a Python REPL that could even import packages that I pip install (understanding I’d have to package them in a special way for them to be loaded in memory).

I imagine I’ll have to compile Python without libc and just work through the 100k errors as it complains about not having mmap or fstat or the other thousand libc functions it relies on. Any strategies?
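One concrete piece of the "no mmap" problem is memory allocation: CPython needs malloc, calloc, realloc and free, and a hosted libc normally gets its memory from mmap or brk. A minimal sketch, assuming the library is allowed to reserve a fixed region up front, is a bump allocator over a static buffer. The names `arena_malloc` and `arena_free` and the 1 MiB size are hypothetical, and because a bump allocator never reuses freed memory, a real port would want a proper allocator layered over the same region.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE ((size_t)1 << 20) /* hypothetical 1 MiB budget */

/* Fixed region owned by the library, used because mmap()/brk() are unavailable. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t align_up(size_t n) {
    const size_t a = _Alignof(max_align_t);
    return (n + a - 1) & ~(a - 1);
}

/* Bump allocator: hands out consecutive chunks and never reuses memory. */
void *arena_malloc(size_t size) {
    if (size > ARENA_SIZE) {
        return NULL; /* also prevents overflow in align_up */
    }
    size_t need = align_up(size == 0 ? 1 : size);
    if (need > ARENA_SIZE - arena_used) {
        return NULL; /* arena exhausted */
    }
    void *p = &arena[arena_used];
    arena_used += need;
    return p;
}

/* Freeing is a no-op for a bump allocator. */
void arena_free(void *p) {
    (void)p;
}
```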

CircuitPython or MicroPython perhaps?

What you’d wind up doing for CPython is effectively providing your own libc. So you either need a different Python with minimal requirements (like the bare-metal-focused ones above), or you’re going to pull in some form of libc, whether you write it yourself or statically link one of the smaller non-LGPL libc implementations.
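Writing the libc yourself largely means supplying plain C versions of functions like strcpy. A minimal freestanding sketch of three of them follows; the `mini_` prefix is hypothetical and only keeps the example from clashing with a host libc when compiled normally (in a real libc replacement they would carry the standard names). GCC may emit calls to memcpy, memmove, memset and memcmp even when compiling with -ffreestanding, so those symbols have to exist in any libc-free build.

```c
#include <stddef.h>

/* Copy n bytes; regions must not overlap (same contract as memcpy). */
void *mini_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Length of a NUL-terminated string, not counting the terminator. */
size_t mini_strlen(const char *s) {
    const char *p = s;
    while (*p) {
        p++;
    }
    return (size_t)(p - s);
}

/* Copy src into dst, including the terminating NUL. */
char *mini_strcpy(char *dst, const char *src) {
    char *d = dst;
    while ((*d++ = *src++) != '\0') {
        /* the loop condition does the copying */
    }
    return dst;
}
```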

Also, if anyone has already bundled a WASM runtime for your unusual environment, you could target running within that instead.


I briefly looked at MicroPython, but dismissed it for no reason other than it wasn’t built to solve my problem and feels like it may just attempt to address a lot of problems I don’t have. Not a negative comment on it at all (it looks like a great project!), just not what I needed.

Let’s entertain that I would provide my own libc. Would I just ./configure it with -nostdinc in CFLAGS and -L/path/to/mylibc -lmylibc in LDFLAGS?

Probably something of that shape. This is very much “on your own” territory as no software expects to be built in such an environment so there are probably sharp edges in plumbing the necessary build and linking bits through to be found.


You might want to have a look at a completely statically linked version of CPython. This essentially has the libc code compiled into the executable, so there’s no shared dependency on libc.

The BuildStatically page on the Python Wiki should get you started.

There have been several attempts at projects for having a completely static build of Python, but most have been abandoned.

Note that you most likely won’t be able to import shared Python extension modules, since those will typically need a shared libc as well, but regular Python scripts, modules and packages should work fine.


I’m doubtful that it’ll play well with your embedding/compiled-as-.so requirement, but staticx can take an existing application and embed the libc to make it fully static.

Unfortunately, just statically linking libc won’t work, as the runtime is under a fairly restrictive seccomp configuration. So instead of write(1, "Hello", 5), I would need Python to call myRuntimeWrite(outPtr, "Hello", 5), where outPtr is some state that I need to track (analogous to the handle fopen returns). myRuntimeWrite is linked at load time, when the runtime loads the built .so into memory.
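The redirection described above can be sketched at the libc level: keep a small table mapping integer file descriptors to the runtime’s opaque handles (the outPtr state), and implement write() on top of myRuntimeWrite. Only the name myRuntimeWrite comes from the runtime; its exact signature and return convention here are assumptions. `shim_write`, `shim_install_fd` and the table size are hypothetical, and the body given for myRuntimeWrite is a stand-in so the sketch compiles on its own.

```c
#include <errno.h>
#include <stddef.h>

/* ASSUMED signature: the real myRuntimeWrite is resolved when the vendor
   runtime loads the .so; assumed to return bytes written, or < 0 on error. */
long myRuntimeWrite(void *out, const void *buf, size_t len);

#define MAX_FDS 16 /* hypothetical table size */

/* fd -> runtime handle (the "outPtr" state); NULL means "not open". */
static void *fd_table[MAX_FDS];

/* Associate a file descriptor number with a runtime handle. */
int shim_install_fd(int fd, void *handle) {
    if (fd < 0 || fd >= MAX_FDS) {
        return -1;
    }
    fd_table[fd] = handle;
    return 0;
}

/* write()-compatible shim: returns bytes written, or -1 with errno set.
   errno is assumed to be provided by whatever libc replacement is in use. */
long shim_write(int fd, const void *buf, size_t count) {
    if (fd < 0 || fd >= MAX_FDS || fd_table[fd] == NULL) {
        errno = EBADF;
        return -1;
    }
    long n = myRuntimeWrite(fd_table[fd], buf, count);
    if (n < 0) {
        errno = EIO;
        return -1;
    }
    return n;
}

/* Stand-in for testing only: captures output instead of calling the runtime.
   Remove this when linking against the real runtime. */
static char captured[64];
static size_t captured_len;

long myRuntimeWrite(void *out, const void *buf, size_t len) {
    const char *src = buf;
    size_t room = sizeof captured - captured_len;
    (void)out;
    if (len > room) {
        len = room;
    }
    for (size_t i = 0; i < len; i++) {
        captured[captured_len + i] = src[i];
    }
    captured_len += len;
    return (long)len;
}
```

At startup the init function would install handles for descriptors 0, 1 and 2 before initializing Python, so that anything writing to stdout ends up in the runtime rather than in a real syscall.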

I don’t quite follow. write() is a libc function and so would be compiled statically directly into the binary. It would not show up as a function in your binary for a seccomp configuration to intercept.

Perhaps you are referring to the next lower level: the syscall OS interface ?!

write(), like all other libc functions that need access to the OS, uses the syscall OS API (usually via assembler routines). If you have a seccomp layer preventing such direct access, you will probably need to write a translation layer between the libc you’re using and this seccomp layer.

If you want to go down that path, I’d suggest using musl rather than glibc as your libc, since musl is MIT licensed: musl libc
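A translation layer at the syscall level could look like the following sketch. It assumes the libc’s raw system calls can be rerouted to a C function; in musl they go through small per-architecture inline helpers (for x86_64, in arch/x86_64/syscall_arch.h), which is one plausible place to hook, though this has not been tried here. `shim_syscall` and `runtime_write` are hypothetical names, and returning a negative errno mirrors how raw Linux syscalls report failure.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h> /* SYS_* numbers; a libc-free build can take them from the kernel's <asm/unistd.h> */

/* Hypothetical bridge to the runtime; this stand-in accepts only fds 1 and 2
   and pretends every byte was written. */
static long runtime_write(long fd, const void *buf, size_t len) {
    (void)buf;
    if (fd != 1 && fd != 2) {
        return -EBADF;
    }
    return (long)len;
}

/* Dispatcher the libc's raw-syscall helpers would call instead of executing
   the syscall instruction. Returns >= 0 on success, -errno on failure. */
long shim_syscall(long n, long a1, long a2, long a3) {
    switch (n) {
    case SYS_write:
        return runtime_write(a1, (const void *)a2, (size_t)a3);
    default:
        /* Anything unhandled fails here rather than reaching the kernel. */
        return -ENOSYS;
    }
}
```

Returning -ENOSYS for every unhandled call means an unexpected syscall from Python fails with an error code instead of reaching the kernel, where the seccomp filter would kill the process.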

Sorry, yes that’s what I mean. I could statically link libc, but when I use the write libc function, and it in turn calls the write syscall, the runtime will kill my process.

Ultimately, anything that calls a syscall will not work (there are a few exceptions, but they’re not really relevant here). I think you’re right that I will need to write a shim layer to handle the syscalls myself.

I’m going to explore building Python for the WASM target, and building my own wasi-libc equivalent to handle those functions myself. It seems like a more “correct” solution than trying to slowly work through whatever libc functions Python relies on.
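In the WASM route, Python built for wasm32-wasi talks to the outside world only through WASI imports such as fd_write, so the work moves into implementing those imports on top of the runtime’s own functions. The sketch below is a host-side fd_write under several assumptions: the guest’s linear memory is handed over as a byte pointer plus size (how that happens depends on the WASM runtime you embed), `host_fd_write` and `runtime_write` are hypothetical names, and the error values follow the wasi_snapshot_preview1 errno enum. Each ciovec in guest memory is two little-endian 32-bit fields: a buffer offset and a length.

```c
#include <stddef.h>
#include <stdint.h>

/* Error codes from the wasi_snapshot_preview1 errno enum. */
enum { WASI_ESUCCESS = 0, WASI_EBADF = 8, WASI_EFAULT = 21 };

/* Little-endian 32-bit load/store (WebAssembly memory is little-endian). */
static uint32_t load_u32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical bridge to the runtime (e.g. to myRuntimeWrite); this stand-in
   accepts fds 1 and 2 and counts the bytes it is given. */
static size_t sink_len;
static long runtime_write(uint32_t fd, const uint8_t *buf, uint32_t len) {
    (void)buf;
    if (fd != 1 && fd != 2) {
        return -1;
    }
    sink_len += len;
    return (long)len;
}

/* Host side of fd_write. iovs and nwritten_ptr are offsets into the guest's
   linear memory; every offset is bounds-checked before use. */
uint16_t host_fd_write(uint8_t *mem, uint32_t mem_size, uint32_t fd,
                       uint32_t iovs, uint32_t iovs_len, uint32_t nwritten_ptr) {
    uint32_t total = 0;
    if ((uint64_t)nwritten_ptr + 4 > mem_size) {
        return WASI_EFAULT;
    }
    for (uint32_t i = 0; i < iovs_len; i++) {
        uint64_t entry = (uint64_t)iovs + (uint64_t)i * 8; /* each ciovec is 8 bytes */
        if (entry + 8 > mem_size) {
            return WASI_EFAULT;
        }
        uint32_t buf = load_u32(mem + entry);
        uint32_t len = load_u32(mem + entry + 4);
        if ((uint64_t)buf + len > mem_size) {
            return WASI_EFAULT;
        }
        long n = runtime_write(fd, mem + buf, len);
        if (n < 0) {
            return WASI_EBADF;
        }
        total += (uint32_t)n;
    }
    store_u32(mem + nwritten_ptr, total);
    return WASI_ESUCCESS;
}
```

The bounds checks matter because the guest controls every offset; the same pattern would repeat for the other WASI imports Python needs (fd_read, path_open and so on).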

Thanks for the suggestions everyone (really didn’t expect this much assistance so fast, I love the FOSS community)! If I get this working, I’ll try to make a GitHub repo with my solution.
