Status of WASM in CPython's `main` branch

For those of you who may not know, there is work to get the main branch of CPython to work in WASM unpatched. You can read about it in the Tools/wasm directory on the main branch of python/cpython on GitHub, and see nightly CI for it all at ethanhs/python-wasm on GitHub (build scripts and configuration for building CPython for Emscripten). At this point:

  • Support for Node and browsers via Emscripten is there (i.e. the test suite should pass w/o modification).
  • WASI support is experimental.
  • @ethanhs is working on the Buildbot code so we can get a Buildbot running to classify WASM support as tier 3 for Emscripten under Node (since that’s easy to test) w/ @tiran and myself listed as the contacts.
  • The goal is to reach tier 2 (or even 1) for Python 3.12, maybe even distribute built binaries.
  • I would also like all of this for WASI.

@tiran has a keynote from PyCon DE on this topic, but I don’t know if the video is available yet.

And for those that don’t know, packaging is not handled by the Python core team, so none of these plans include a packaging story.

The recording of my keynote should be available in a month or two. The slides for PyCon DE 2022 keynote and PyCon US 2022 Language Summit are at:

The README.md in the Tools/wasm directory contains build instructions, known limitations/restrictions of the Emscripten platform, and various other pieces of information around CPython on WASM.

wasm32-emscripten builds come in several flavors, with additional feature flags. browser builds use preloading to ship a pre-compiled stdlib bundle as a MEMFS data file. node builds have NODERAWFS enabled to directly access the raw file system of the host. --enable-wasm-dynamic-linking enables dynamic linking, dlopen, and SIDE_MODULE support; the option works with both browser and node builds. --enable-wasm-pthreads enables pthread support and pthread to worker proxying; the option only works for node builds. The combination of dynamic linking + pthreads + libffi is still unstable, though.

WASI builds currently require a third-party library called WASIX to provide stubs for POSIX APIs like pthread. WASI’s sandboxing model and capability-based security concept make testing tricky.
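
Python code that has to behave differently on these targets can branch on sys.platform, which reports "emscripten" or "wasi" on the respective builds. A minimal sketch; the helper name wasm_target is made up for illustration and is not part of any CPython API:

```python
import sys
from typing import Optional


def wasm_target(platform: str = sys.platform) -> Optional[str]:
    """Return "emscripten" or "wasi" on a WASM build, otherwise None.

    `wasm_target` is a hypothetical helper for illustration only. The
    platform argument defaults to the running interpreter's sys.platform.
    """
    return platform if platform in ("emscripten", "wasi") else None


if wasm_target() is None:
    print("not running on a WASM build")
else:
    print(f"running on {wasm_target()}")
```

Taking the platform as a parameter keeps the check testable on a regular desktop interpreter.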

My goal for 3.11 was to upstream all Pyodide downstream patches, automate the build system, and have the test suite pass in Node. IMHO we have accomplished these three main goals.

For 3.12 I’d like to look into better debugging support and stdlib file size. @hoodmane, @rth, and I discussed the matter during the PyCon sprints. In the long run we might want to consider folding some Pyodide features, like the JS <-> Python bridge, into CPython core and invite Hood + Roman as core devs.

Driving the size down would be great! Although it’s already down to around 6MB today?

I was just thinking about the js module and how upstreaming that might make sense since it isn’t making an opinionated API decision by directly exposing JS objects. So SGTM! :slightly_smiling_face:

Just out of curiosity, is this with or without the modules formally deprecated and to be removed in your PEP 594?

It’s with the modules. What I don’t remember from when @tiran last quoted the amount to me is whether it includes the test suite or not.

It’s with some PEP 594 modules. The stdlib bundle omits all network modules and ships pre-compiled PYC files.

$ du --si -c python.wasm python.data python.js 
5.9M    python.wasm
3.3M    python.data
160k    python.js
9.3M    total

# emulate HTTP transport compression
$ gzip python.wasm python.js

$ du --si -c python*.gz python.data 
41k     python.js.gz
2.2M    python.wasm.gz
3.3M    python.data
5.5M    total
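
The same emulation can be done from Python with the standard gzip module, without touching the files on disk. A rough sketch; gzip_size and report are names made up here, and gzip's default compression level will not necessarily match what a given web server or CDN produces:

```python
import gzip
from pathlib import Path


def gzip_size(path):
    """Return (raw_bytes, gzipped_bytes) for one file, compressing in memory."""
    data = Path(path).read_bytes()
    return len(data), len(gzip.compress(data))


def report(paths):
    """Print raw and gzipped sizes per file plus a total, and return the totals."""
    total_raw = total_gz = 0
    for path in paths:
        raw, gz = gzip_size(path)
        total_raw += raw
        total_gz += gz
        print(f"{raw:>12} {gz:>12}  {path}")
    print(f"{total_raw:>12} {total_gz:>12}  total")
    return total_raw, total_gz
```

Calling report(["python.wasm", "python.js"]) in a build directory would print byte counts for the two files that were gzipped above.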

The pyodide bundle is much larger because they ship more stdlib modules, include more extension modules (sqlite, ctypes), and have more features.

$ du --si -c pyodide.asm.*
5,4M    pyodide.asm.data
2,0M    pyodide.asm.js
9,5M    pyodide.asm.wasm
17M     total

$ gzip pyodide.asm.*
$ du --si -c pyodide.asm.*
3,5M    pyodide.asm.data.gz
324k    pyodide.asm.js.gz
3,3M    pyodide.asm.wasm.gz
7,1M    total
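
To put the two bundles side by side, here is the arithmetic on the numbers from the listings above, as a small sketch. Note that python.data was not gzipped in the first listing, so it is counted at its uncompressed 3.3M in both CPython totals:

```python
# Sizes in MB (SI units), copied from the du listings above.
cpython_raw = {"python.wasm": 5.9, "python.data": 3.3, "python.js": 0.160}
# python.data was left uncompressed in the first listing.
cpython_gz = {"python.wasm.gz": 2.2, "python.data": 3.3, "python.js.gz": 0.041}
pyodide_raw = {"pyodide.asm.data": 5.4, "pyodide.asm.js": 2.0, "pyodide.asm.wasm": 9.5}
pyodide_gz = {"pyodide.asm.data.gz": 3.5, "pyodide.asm.js.gz": 0.324, "pyodide.asm.wasm.gz": 3.3}


def total(sizes):
    """Sum the per-file sizes of one bundle."""
    return sum(sizes.values())


raw_ratio = total(pyodide_raw) / total(cpython_raw)
gz_ratio = total(pyodide_gz) / total(cpython_gz)
print(f"uncompressed: {raw_ratio:.1f}x, compressed: {gz_ratio:.1f}x")
# prints "uncompressed: 1.8x, compressed: 1.3x"
```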

Thanks for creating this category and very much looking to tier 2 support for WASM!

I was just thinking about the js module and how upstreaming that might make sense since it isn’t making an opinionated API decision by directly exposing JS objects.

I suppose the issue is that it’s difficult to do a lot of WASM-related work without some kind of JS/Python bridge. At the same time, it is still being actively developed (e.g. significant work will be needed for threading support when it happens), and it’s not just C code: there are also Python and JS libraries which come with it. CPython would likely not need all the functionality (e.g. loading packages), but still, it would basically be upstreaming the core of Pyodide. And then the question is what the advantage would be, for users and contributors, of including all these things in CPython as opposed to it being a standalone dependency one could pull in when building CPython for WASM (which I think is what we would prefer on the Pyodide end so far).

Driving the size down would be great!

I’m very much looking forward to collaborating on this subject. Maybe we should open a separate thread. As far as I can tell, as soon as one starts using packages, the issue is more the init time, due to having to compile all the .so libraries in the browser, than the size by itself.

BTW, Christian, in your above comment you are comparing compressed and uncompressed sizes: Pyodide has more modules (and is likely not as well optimized in that area) but it’s not 3x as heavy :slight_smile: It’s around 7MB when served from the CDN for the base REPL with the stdlib.

Thanks for pointing it out. I have updated my reply to include compressed size.

I think in the long run upstreaming the Pyodide bootstrap would make sense if you are aiming to have Emscripten as a “real” target. I think what is currently in the CPython repo would be a bit like if Python only shipped libpython.a and a demo executable, but expected most people to link their own executable against libpython.a.

As Roman says, the issue is that from js import ___ is bootstrapped pretty deeply into Pyodide’s core systems. I think it is plausible that most of the stuff that we have in our src/js could be cut out, but we still need most of src/py/_pyodide and some of the stuff in src/py/pyodide.

There is a big question of how to organize things. Our current approach is that there are ~4 different ways of getting JavaScript into the final bundle:

  1. EM_JS (and a very small amount of EM_ASM) for short blobs of JavaScript that are helpers for C code
  2. include_js_file.h and an include of a separate js file, for longer blobs of JavaScript that are primarily called from C.
  3. TypeScript source files that we run through the C preprocessor so we can use macros from jsmemops.h to index C structs, for code that is primarily called from JavaScript and calls into C
  4. Normal TypeScript

Examples of (2) are js2python.js and python2js_buffer.js; examples of (3) are pyproxy.ts and error_handling.ts.

All the TypeScript code (other than the main loader in pyodide.ts) is bundled with Rollup and then injected back in with --pre-js.

Then maybe we should ignore what I suggested. :wink: I do still want to get WASI working, and that could be the general focus for CPython on top of building cleanly with Emscripten for Node and the browser.

There are a lot of details between the current state and this, but it seems like it should be possible to refactor the Python/JS bridge to just be an extension module. It mainly just uses the public Python/C API and adds an import hook to make the from js import __ magic work. The tricky bit is probably having enough around and working to load that package at run time: is dynamic linking working well enough in upstream CPython to load an extension module?

Paul has created bindings to the Emscripten and browser APIs that do not need external JS and TypeScript files: emscripten embed.

The tricky bit is probably having enough around and working to load that package at run time – is dynamic linking working enough in upstream CPython to load an extension module?

Dynamic linking in CPython upstream works – but not with threading enabled. Proxy to pthread + side_module sometimes crashes in unpredictable ways. I have not figured out why Python under Node crashes. It looks like WASM memory corruption to me.

Why do we need dynamic linking anyway? What prevents us from linking the JS bridge as a static module?

Why do we need dynamic linking anyway? What prevents us from linking the JS bridge as a static module?

We don’t strictly need it. It might make it easier to ship the Python/JS bridge on a faster schedule than Python itself (the usual “things slow down when they end up in the standard library” problem). But it could still be statically linked even if in a different repo, and the CPython build system already has good support for statically linking modules.
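
One way to see from Python whether a given module ended up statically linked into the interpreter is to look it up in sys.builtin_module_names. A minimal sketch; the helper name is made up for illustration:

```python
import sys


def is_statically_linked(name: str) -> bool:
    """Return True if the module is compiled into the interpreter binary.

    `is_statically_linked` is a hypothetical helper; it only consults
    sys.builtin_module_names and does not import anything.
    """
    return name in sys.builtin_module_names


# "sys" is always built in; other names depend on how the build was configured.
for candidate in ("sys", "_sqlite3", "_ctypes"):
    print(candidate, is_statically_linked(candidate))
```

On a build without dlopen support, every extension module that is available at all has to show up in that tuple.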

Paul has created bindings to emscripten and browser API that do not need external JS and TypeScript files, emscripten embed.

We don’t absolutely need external JS and TypeScript files, but for maintenance reasons, if the project is sufficiently complex, it is easier to manage an external file. For example, pyproxy.ts is 1500 lines of code. Having it as a separate file with TypeScript syntax highlighting that can be handled with TypeScript tooling makes maintenance a lot easier.

refactor the Python/JS bridge to just be an extension module

I still believe that this is impossible.

It mainly just uses the public Python/C API and adds an import hook to make the from js import __ magic work.

The main thing it does is create a foreign function interface to allow JavaScript objects to be used from Python. Without this, it doesn’t even really make sense to from js import blah because the result of an import must be a PyObject* of some sort.

It is true however that if the foreign function interface were compiled into the main module, the from js import blah magic could in principle even be added in a pure Python extension module if you wanted that. Everything in src/py/pyodide could be separated out. I’m just not sure that this is a useful thing to do.
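
To illustrate the pure Python part only, here is a rough sketch of an import hook that makes from js import blah resolve names lazily. It is not Pyodide's implementation: the _FAKE_JS_GLOBALS dict is a hypothetical stand-in for the foreign function interface, which in a real bridge would hand back proxies for live JavaScript objects, and the class names are made up.

```python
import importlib.abc
import importlib.util
import sys
import types

# Hypothetical stand-in for the FFI: a real bridge would look names up on the
# JavaScript global object and wrap the results in proxy objects.
_FAKE_JS_GLOBALS = {"answer": 42, "origin": "https://example.invalid"}


class JsModule(types.ModuleType):
    """Module whose missing attributes are resolved through the (fake) FFI."""

    def __getattr__(self, name):
        try:
            return _FAKE_JS_GLOBALS[name]
        except KeyError:
            raise AttributeError(name) from None


class JsFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Meta path finder and loader that only handles the top-level name "js"."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname == "js":
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return JsModule(spec.name)

    def exec_module(self, module):
        # Nothing to execute: attributes are resolved lazily in __getattr__.
        pass


# Appended last, so a real "js" module on the path would take precedence.
sys.meta_path.append(JsFinder())

from js import answer

print(answer)  # prints 42
```

The hook itself is small; the hard part stays in whatever replaces the dictionary lookup, which is the foreign function interface described above.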

It is true however that if the foreign function interface were compiled into the main module, the from js import blah magic could in principle even be added in a pure Python extension module if you wanted that.

That’s exactly the point. And the first use case is the CPython test suite in the browser (maybe also in Ethan’s buildbot, with some headless browser), as it uses those modules to set things up, speaking directly to the Emscripten “os” API without the need for C or JavaScript.

It would also allow people to have documentation closer to Python use cases, as Emscripten’s is C-oriented.

Side note: those modules, which I’ve slightly modified for 3.11, come from the Panda3D webgl-port engine and have been used and tested since a patched 3.8. They are only 2 commented C files with no deps that are meant to be included directly in Python main.

The test suite URL has changed; it is now https://pygame-web.github.io/archives/0.3.0/testsuite.html#all