Make WASM a 1st class platform in the Python ecosystem

Hello from the Paris JupyterLite workshop.

In the past three days many people who are involved at various levels in the
Python/WASM ecosystem were in the same room: we had representatives from
Pyodide, emscripten-forge, JupyterLite, PyScript, Basthon and other projects
(sorry if I forgot any).

Among the many interesting conversations that we had, we discussed one point
which I think is good to share with the broader Python community:

what can we do to make WASM a 1st class platform in the Python ecosystem?

Before starting, it is probably useful to draw a parallel with existing
architectures:

  • WASM is basically the CPU, i.e. equivalent to x86 or ARM

  • on top of it, we have the “operating system” and/or platform: for WASM we
    have emscripten and WASI, so they are equivalent to e.g. Linux, Windows or
    macOS.

By 1st class platform I mean that it should be easy to build, test and
distribute libraries and apps on Python/WASM.

Having WASM as an officially supported Tier-3 platform is a huge first step,
but in order to have a better developer experience we probably need better
tooling around it.

To better explain what I mean, let me write down some of the things which I
personally think should happen:

  1. It should be easy for library authors to test their code on WASM: ideally,
    they should be able to do something like python-wasm -m pip install ...,
    python-wasm -m pytest, etc. The good news here is that both Pyodide and
    emscripten-forge have written tools which go in this direction. Note that
    for many usages, it is not even required to run them in the browser: it is
    totally possible to run and test the code inside node, from the command
    line.

  2. It should be easy to build and deploy packages. Ideally, it should be
    possible to upload WASM-specific binaries to PyPI and conda-forge.

  3. It should be easy to do the steps above in CI.
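
As a rough illustration of point 1 from a library author's side, here is a minimal sketch of a test module that adapts to a WASM runtime. It assumes that `sys.platform` reports `"emscripten"` or `"wasi"`, as CPython 3.11 builds for those targets do; the helper `running_on_wasm` and the test case are hypothetical names made up for this example, and it uses `unittest` only to stay within the standard library.

```python
import sys
import unittest

# Values of sys.platform reported by CPython 3.11+ builds for WebAssembly.
WASM_PLATFORMS = {"emscripten", "wasi"}


def running_on_wasm(platform: str = sys.platform) -> bool:
    """Return True when the interpreter was built for a WASM target."""
    return platform in WASM_PLATFORMS


class SubprocessTests(unittest.TestCase):
    # Hypothetical test: spawning processes is not available on WASM,
    # so the test is skipped there instead of failing.
    @unittest.skipIf(running_on_wasm(), "no subprocess support on WASM")
    def test_spawn(self):
        import subprocess

        result = subprocess.run(
            [sys.executable, "-c", "print(1)"],
            capture_output=True,
            text=True,
        )
        self.assertEqual(result.stdout.strip(), "1")
```

The same file can then run unchanged on Linux, in node, or in the browser; only the skipped subset differs.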

During the workshop we successfully managed to import packages built by
emscripten-forge into Pyodide, so it seems that we are at a point in which
this scenario is technically possible.

I think that these points are extremely important for the long term
sustainability of the effort: currently, pyodide and emscripten-forge maintain
a big set of patches to be able to build a selected set of packages to WASM,
but if we make it possible to easily test it on CI, I can imagine that as a
community we can start to send these patches upstream.

Obviously it doesn’t mean that individual projects will be forced to accept
these patches, but at least it gives a possible path forward. Basically, we
would end up with a situation similar to what we currently have with Windows:
many libraries are developed on Linux or Mac, but they are willing to accept
contributions to make them work on Windows, so WASM would just be another
instance of this pattern.

So, the first concrete question is: what needs to happen to be able to
upload packages to PyPI?

The second question is broader: I would love to hear general thoughts and
opinions on this, and collect suggestions for additional things which are
needed to make WASM a 1st class platform.


I believe the question about PyPI should be put under Packaging, with a link back to this general question. But I don’t know how many packaging people know enough of what WASM/WASI/emscripten are to even ask good questions.

The CPU/OS analogy seems helpful, but raises the likely naive question “Why 2 OSes?” Who wants to step into the middle of an OS war? Is one likely to win? Or can one write ‘cross-OS’ code and not worry about the split?

Is it even sensible for me to ask whether at least a subset of tkinter could be or will be implemented? To me, as an IDLE maintainer, that helps make something a ‘1st class Python platform’.

Is that https://basthon.fr/ ? I don’t read French, but I see the word “Python”, so I suspect it is.

Get WebAssembly support in CPython to tier 2.

But it’s only for emscripten, right? To my knowledge there’s no WASI support.

Are we talking emscripten and WASI, or just emscripten?

I’m assuming the wheel file format is fine for this. In that case you will need to come up with a wheel tag spec for installers to be able to identify when a wheel file applies to them. And in that situation, you have different needs between emscripten and WASI.

In emscripten’s case, do you make it the libc for the build and that’s how you specify compatibility? Since emscripten’s support for various things is not standardized and thus can break with different SDK versions then I assume this would have to be the case for all of your packages to work together.

For WASI, you’re probably looking at specifying a world or the list of interfaces that are required (see https://youtu.be/phodPLY8zNE for what the “world” terminology means, but it’s basically a collection of interfaces). Now the question is who manages what worlds and/or interfaces are acceptable? And what about worlds/interfaces that overlap? I would assume you would need a way for the WASI runtime to tell you what worlds/interfaces it supports for the installer to know what it can (not) download.

If the wheel format is not okay, then you have to come up with an archive format that the files will be uploaded as first.

One is for the browser and one is not (emscripten is the browser-specific one).

Different target platforms, so there might not be a “winner”. But at least with WASI you can use a browser polyfill and it’s a standard, while emscripten can target WASI but it isn’t a specification.

Typically not (right now).

That would be more of a PyScript question and thus at a higher level since I would characterize emscripten and WASI as a platform and less of an OS, e.g. WASI is more like POSIX for WebAssembly, so there’s no concept of a UI, just syscalls.

https://console.basthon.fr/ shows a Pyodide-based Python ‘sandbox’ in the browser. I’m playing with it now.

@antocuni ‘bas thon’ translates as ‘low thon’ or ‘low python’. I feel I’m missing something. Is ‘bas’ an abbreviation for ‘base’ or something?

Somewhat OT, but some comments you can pass on to any English-speaking basthon dev.

‘permanent’ is misspelled as ‘permanant’ before ‘link’ (in French).

There is no obvious way to properly enter a statement ending in a letter, such as a + b, in the file box.

Pasting a multiline statement with indents, like

for i in range(10_000_000):
    i += 1

in the shell omits the secondary prompt, so it looks like

>>> for i in range(10_000_000):
    i += 1

(I only fixed this for IDLE 1 1/2 years ago.) There is no auto indent after compound statement headers, like ‘if a:’, and tab does not indent either.

** Running the statement above in basthon on Firefox only takes about twice as long as in CPython, same machine. I consider this good. **

I and others routinely put IDLE shell and editor windows side by side. Great that basthon does so. I am curious how one displays anything in the graph view.

I learned at the same workshop as @antocuni that “bas” stands for “bac à sable” in French i.e. sandbox in English.

Also I guess basthon is likely a play on words with baston, since in French “baston” means fight or rather rumble, since “baston” is a bit informal.


Ciao belli!

(For transparency: @antocuni and I are part of the PyScript team at Anaconda - I’m mostly working with MicroPython in a WASM context at the moment, although this will likely change in the new year.)

Some random, high level thoughts that could probably do with refining:

:+1: on web-assembly as basically just another CPU target.

OS - oh my. I think of Emscripten/WASI as something akin to win32, libc, POSIX level abstractions: they are the layer that brokers between our code and “device” capabilities such as a filesystem, network interfaces, and other “hardware”. There’s more subtlety to this context, and things are moving fast in this space… hence Dr.Evil air-quotes around “hardware” since “hardware” can, for instance, mean the browser sandbox.

The abstract notion of “Python”. Folks usually think Pyodide (for good reasons - an extraordinary amount of great work has gone into it) – but it’s not the only “Python” in WASM town. There is, of course, CPython (based on Christian’s wonderful work) and MicroPython (which has a webassembly port, that has been gaining recent attention and work) and Zython (zython.org) - a version of Python built with a Zig based alternative to Emscripten. I can’t help but think that we, as a “scripting” / interpreted language, will also be walking / paving the same cow paths as folks in the Ruby, Lua and similar communities. I mention this diversity of interpreters in the hope we are able to share and build together, rather than re-invent various types of wheels. It isn’t so much a technical challenge as a cultural one requiring empathy, an appreciation of the diversity of the eco-system and an open mind/flexibility to allow us to both contribute and cherry pick from aspects of this area of concern. I hope we can think of ourselves more like musicians in a symphony orchestra (contributing to something greater than the sum of its parts) rather than soloists trying to play over each other. :wink: :musical_note:

TK et al - if we removed tkinter from Python then turtle doesn’t work, and a million teachers and their students will raise their voices in protest (we had to pull a few funky hacks to get tkinter working properly in the version of Python we bundle in the Mu editor, for this very reason… one of our primary groups of users – teachers – told us they wanted turtle). I recently mentioned to @brettcannon that tkinter is to Python UIs as Roman numerals are to counting… i.e. it’s old, and there are perhaps “more modern” or feature-full ways to achieve the same ends, but folks still use it for all sorts of subtle technical, historical and cultural reasons, and we should respect that. YET… while web-assembly may be just another CPU target it’s also often conflated with running in a browser context. Clearly this is not the only way to execute WASM given projects like Wasmtime. But it feels odd to port tkinter to the browser, or create bindings (in some way) so Wasmtime might run a Tk based app, in the same way that it would feel odd to add Roman numeral capabilities to Numpy - I mean, you could do it, but it feels like a strange thing to do. Brett made a fascinating point, during the recent TalkPython podcast, about the scope of Python… “Python” being the language separate to the Python standard library, while acknowledging that they are fundamentally intertwined. I like this, and it is important for several reasons:

  • In a web context, thanks to network costs, you only want to download the things you need. Python in the browser probably doesn’t need most of the standard library. Or, put another way, how do our users get the version of Python + a standard library that they require to fulfil their diverse needs…? In this context “batteries included” isn’t really a feature but a burden. Having a clear story about this will help us better define what we mean by “Python in the browser”. Damien et al over in MicroPython have lots of experience in this area, because you clearly have to make choices about what to include (or at least, how to configure what to include) when you perhaps only have 256k of flash memory and 16k of RAM.
  • I know there are moves afoot to retire aspects of the standard library. As I understand it, these are very conservative in nature and for every “surely we don’t need this any more?” there are a greater number of “but we still use it for X!” answers. I wonder if anyone has explored means of partitioning the existing standard library – or perhaps defining subsets of the standard library – such that stuff isn’t so much thrown away, but thrown into different buckets where each bucket has different support, activity and maintenance characteristics. But this starts to feel like we get to meet…
  • The elephant in the room: packaging. :elephant:

I won’t say much on packaging, except that I think in 2000 years time digital archaeologists will look at us and say something like, “hey, they had the same problems we still have”, in the same way we do when we look at Roman inventions like hypocaustum. I’m reminded of Kierkegaard’s comment that I’d paraphrase to, “packaging isn’t a problem to be solved, but a reality to be experienced”. :slight_smile: There’s a danger, as engineers, that we see everything as a technical problem “to be solved”. I can’t help but think it’s another musical analogy: I like classical, you like jazz and she prefers hip-hop. Let many flowers bloom and perhaps sometimes we should just appreciate the dynamic, diverse and colourful world in which we live while being thankful for our own little corner of it.

TL;DR: (as usual) perhaps the most important challenges we face will be of a cultural rather than technical nature. If we keep this in mind, our “technical” outcomes will be richer, and reflect a more nuanced or enlarged view of the possibilities. Given the crowd on here, I’m definitely teaching granny to suck eggs, but I’m always fascinated by this intersection of technology and culture. :wink:


Very nice piece. Gives me hope.


You missed the pun! :wink:

It already happened in Python 3.11: PEP 594 – Removing dead batteries from the standard library | peps.python.org

It’s been lightly discussed whenever the idea of breaking out the stdlib has been made. Hoping to have a deeper conversation at the language summit at PyCon US 2023.

Wow, tons of interesting topics to talk about. I’m glad that my post raised interest.
Let’s try to summarize/reply to some of the discussions.

WASI vs emscripten vs OS-war

Maybe the comparison with the OS was not the perfect one, sorry if it increased confusion instead of reducing it.
The way I see it is that both emscripten and WASI are “environments” which provide access to various resources, most notably the virtual filesystem. From some point of view they play the role of an OS, from another point of view they play the role of libc (but not only).
E.g., for the FS emscripten provides both the implementation of it (akin to an OS) and POSIX APIs to access it (akin to libc).
A lot of the runtime library of emscripten is implemented in JS, so executables compiled with the emscripten toolchain can run only in the browser or inside node.js.

WASI provides only a POSIX-like API, and it’s up to the host to provide an implementation for it. Because of that, it is possible to run WASI programs “on the server” with other WASM runtimes such as wasmtime, wasmer, etc. Or, as @brettcannon said, it is possible to write a JS implementation of WASI and run it in the browser.

Another big difference is that emscripten provides support for dynamic linking (i.e., dlopen(), but not only) which is needed to actually import extension modules in Python. AFAIK WASI doesn’t support that yet.
Moreover, the emscripten ecosystem implements other well-known libraries on top of the browser, e.g. SDL (using the browser’s canvas to actually draw stuff).
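
To make the dynamic linking difference a bit more tangible, here is a minimal sketch of how a script could probe whether the running interpreter can load compiled extension modules at all. It relies on the assumption that a build without dynamic loading reports an empty `importlib.machinery.EXTENSION_SUFFIXES`; the function name `supports_extension_modules` is made up for the example, and this is a heuristic rather than an official capability check.

```python
import importlib.machinery
import sys


def supports_extension_modules() -> bool:
    """Best-effort check for dynamic loading of extension modules.

    Assumption: a build without dynamic loading reports no extension
    suffixes (for example ".so"), so an empty list means that compiled
    modules would have to be linked statically into the interpreter.
    """
    return bool(importlib.machinery.EXTENSION_SUFFIXES)


if __name__ == "__main__":
    print(sys.platform, "dynamic extension modules:",
          supports_extension_modules())
```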

The end result is that right now, emscripten is a more complete environment, especially if you target the browser, and that’s why pyodide and emscripten-forge use it.

Personally, I think that both environments will survive in the medium term because they solve different needs. Maybe in the long term they might converge, but it’s hard to predict.

I think that emscripten is more mature and probably easier to support right now. So a good strategy could be to support emscripten first, but keeping in mind that at some point we will want to support WASI as well.

WASI worlds

This is a good point. I imagine that in order to fully support the WASI model we will need a more complex dependency system in which:

  1. the host system declares which WASM components/interfaces it provides
  2. each package declares which WASM components it requires and which ones it exports
  3. the installer has complex logic to determine what is needed (e.g., by downloading packages which implement/polyfill WASM components but only if the host system doesn’t offer it natively, etc).
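
The three steps above can be sketched roughly as follows. This is only a toy model under stated assumptions: the `Package` class, the `plan_install` function and the idea of describing interfaces as plain strings are all hypothetical, and nothing here reflects an existing installer or a real WASI metadata format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Package:
    """Hypothetical package metadata: interfaces required and exported."""
    name: str
    requires: frozenset = field(default_factory=frozenset)
    exports: frozenset = field(default_factory=frozenset)


def plan_install(target, host_interfaces, index):
    """Return the names of the packages needed to install `target`.

    Interfaces already offered by the host are used as-is; for every
    other required interface, the first package in `index` exporting it
    is added to the plan as a polyfill.
    """
    to_install = [target.name]
    available = set(host_interfaces) | set(target.exports)
    pending = sorted(target.requires - available)
    while pending:
        interface = pending.pop(0)
        if interface in available:
            continue
        provider = next((p for p in index if interface in p.exports), None)
        if provider is None:
            raise LookupError(f"no provider for interface {interface!r}")
        to_install.append(provider.name)
        available |= provider.exports
        # A polyfill may itself need interfaces the host does not offer.
        pending.extend(sorted(provider.requires - available))
    return to_install
```

For example, with a host that offers only a filesystem interface and an index containing one sockets polyfill, a package requiring both interfaces would resolve to itself plus the polyfill, while the same package on a host that offers both natively would resolve to itself alone.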

But I think this is a much broader discussion than just “let’s publish WASM-emscripten wheels”.

WASM wheels on PyPI

As I said above, I think I would be happy enough with emscripten-only wheels for now.
Also because AFAIK WASI doesn’t support dynamic linking, so it’s not even technically possible to publish wheels for it (is it?)

Yes, I think the wheel format should be good enough, since Pyodide is already using it effectively. Maybe @rth or @hoodmane can comment more on this, though.
But yes, I think it’s probably a good idea to move this part of the conversation to Packaging.

From what I understood the main requirement is to make sure that the dynamic modules are compiled using the very same version of emscripten as the main python executable, since the details about dynamic linking change often between emscripten versions.

More concretely, I think that a good practical way is to declare that a given CPython version supports a precise emscripten version, and when you are building a wheel you need to ensure that you are using the right one.
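
A minimal sketch of what such a declaration could look like from a build tool's point of view, assuming a pin table existed somewhere. The table `PINNED_EMSCRIPTEN`, the function `check_toolchain` and the version number in it are all invented for illustration; no such pin is official.

```python
# Hypothetical pin table: CPython (major, minor) -> emscripten version.
# The 3.11 entry is purely illustrative; no such pin is official.
PINNED_EMSCRIPTEN = {(3, 11): "3.1.26"}


def check_toolchain(python_version, emscripten_version, pins=PINNED_EMSCRIPTEN):
    """Raise if a wheel build would use a different emscripten than the pin."""
    key = tuple(python_version[:2])
    expected = pins.get(key)
    if expected is None:
        raise LookupError(f"no emscripten pin for Python {key}")
    if emscripten_version != expected:
        raise RuntimeError(
            f"expected emscripten {expected}, got {emscripten_version}"
        )
    return expected
```

A wheel builder would call this before compiling and refuse to continue on a mismatch, rather than producing a wheel that fails at import time.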

Tier 2

What is missing for that? Looking at the PEP it seems that the only extra requirement w.r.t tier 3 is to have two core developers taking care of it, and WASM already has both.

Tkinter/IDLE/turtle

In theory, I suppose it should be possible to implement Tk inside the browser and thus run Tkinter/IDLE in WASM. In practice, I suppose that it’s not going to happen unless someone puts serious effort into doing it.

For the specific case of turtle, basthon provides its own implementation which uses SVG to draw the turtle. I tried to open a basthon notebook and run the following code, it seems to just work:

import turtle as t
for i in range(4):
    t.forward(50)
    t.right(90)
t.done()

I think that their turtle module is based on the one implemented by brython, but I admit I don’t really know.

Basthon-specific questions

I am not involved with the project. I pinged some of the maintainers and sent them a link to this discussion, in case they want to intervene.


Correct, WASI does not have a dlopen() equivalent.

And I’m the exact opposite. :grin: We’re using WASI for Experimental - Python for the Web - Visual Studio Marketplace , so that’s where my motivation/interests lie.

Depends on what you want in the wheels. If we allowed for .o files to be shipped instead of .so files then you could do a static link to a Python WASI binary. Toss in freezing Python code and you end up with a self-contained WASI app for Python code.

There’s also the other direction of using WASI to load code into Python itself, e.g. Support WASM wheels on PyPI - #4 by konstin .

Are you saying you would want e.g. CPython 3.11 to always be built with emscripten 3.1.26 (and I don’t know how strict the versioning would have to be as I don’t know what API guarantees emscripten makes)? Or that simply a CPython build would need to be able to declare what version it’s built with as the platform and that there could be various CPython builds out there and thus various wheels for different emscripten versions? And if it’s the former, who decides which version gets used and does the builds? And how long does emscripten maintain older versions, so could CPython 3.11 rely on 3.1.26 or 3.1 working in five years time?

You’re also going to have to deal with emscripten’s various back-ends, like storage. That will change whether the build is for Node or the browser, so it will potentially have to extend beyond just emscripten version depending on what the target outcome is.

This is going to require some PEP to define all of this much like manylinux.

Christian hasn’t been available as of late, so it’s just me right now (and I don’t know if that’s a permanent thing). The SC also has to approve the switch to tier 2, so it also depends on what the SC wants to see happen.

It also doesn’t help that the emscripten builds have gotten a bit of a reputation of being hard to debug when the buildbot picks up a failure (typically lack of stack space). So there might be asks to improve that situation when it comes to other developers trying to debug why their change broke the buildbots to prevent their PR from being reverted in 24 hours. For instance, the emscripten builds were broken for a few days last week due to stack issues; luckily they fixed themselves but people had a hard time getting a build going to try and debug why things failed.

Pyodide is thinking of going in this direction, but I agree that it might not be suitable for CPython as emscripten does not maintain older versions.

In terms of dynamic linking, Emscripten makes no ABI guarantees right now (though it doesn’t break very often). Probably we need to cooperate with Emscripten devs to make progress on it.


That’d be too obvious, even for a dad-joke aficionado like me. :slight_smile:

I’m enjoying this discussion, and can’t help but think we’re in a sort of settling stage as the detritus of colliding computing paradigms form a murky river of web-assembly flotsam and jetsam. Here and in discussions elsewhere, I see so much, “X in WASM is like Y in [some established paradigm, like the OS, or a browser etc…], but not quite.”

Do you know the Zen story about a group of blind monks encountering an elephant for the first time…?

When they compare notes the monk who held the trunk claimed it was a huge snake, the one holding the tail a small snake, the one holding the ears a sort of winged creature and yet another described it as a large column (the legs) with the last monk saying it had a smooth and tapering shell (the tusks). Each monk brought two aspects: their own unique way of describing things, and the actual part of the elephant they encountered. Only by comparing notes and listening to each other did they actually figure out what the elephant was. :elephant:

I guess WASM is the elephant, and we’re Pythonic “monks” describing our little part of the WASM story to each other from the perspective of our own areas of interest. I hope we can, together, realise an enlarged picture of the creature we find before us. :wink:

As case in point: my interest is educational.

Clearly, I see WASM as an opportunity to deliver Python and facilitate empowerment and learning for beginners whose primary computing is via a browser or a device capable of running a browser. Yet there are others who see WASM as an opportunity to connect JavaScript and Python together (for example, plugging browser based interactive data visualisation libraries into Python-in-the-browser-doing-on-the-fly-data-analysis). A third group of folks see WASM as some sort of Python containerisation solution for running untrusted third party code in a sandboxed and tightly controlled environment. And so the list goes on. I’m just enumerating the trunk, tail, ears, legs and tusks of WASM.

All this to say the situation is fluid (the elephant moves!) but together we can figure out how to guide the WASM-elephant to do helpful things for us Pythonic folk. Like the monks, not only should we share notes, opinions and hacks, but we should do so with the enlarged understanding of there being a large[r] metaphorical elephant in the room. :smiley: (See what I did there, @brettcannon…?)

Finally, in this spirit of sharing… I’ve managed to get MicroPython driving and reacting to the DOM. You can try it out here – a goofy English → Pirate speak translator (built on top of the MicroPyScript test harness I built for MicroPython). To see what the end-developer has to code, just “view source” for that page to get a flavour. The actual source code for the DOM interaction aspects is here (worth pointing out, because it’s just pure JavaScript and Python with some shonky message passing, it also works with Pyodide, should work with web-workers, and could work with other interpreted languages compiled to WASM, so long as they implement something equivalent to what is found in polyplug.py in that repository).

IT’S VERY EARLY DAYS and a bit of a hack, so don’t expect much, except for refinement and changes in the new year in terms of both the implementation and the API it provides (I’ve already had Anaconda colleagues provide really useful feedback). It works well enough on mobile browsers, my current north star for navigating these waters. It turns out that print("Hello world") isn’t as exciting for a learner as having something simple running on their primary computing device - like a Pirate-talk app :pirate_flag: .

In the spring I hope to be able to integrate this work into “real” PyScript and start on making app development frameworks that sit on top of the PyScript “platform”, especially so the frameworks are hopefully simple enough for beginners (viz. the experimental PyperCard project I did with a bunch of teenagers based in London, just before COVID).

Season’s greetings folks, and happy new year. Here’s to a Pythonically WASM-ish 2023. :rocket:


Ditto, I just think our delivery mechanisms happen to be different.

I think the key part of this good point is figuring out what the whole-elephant concerns are compared to the elephant-part concerns; while we want to keep the elephant healthy, the trunk has different needs to keep it healthy compared to the tusks. So I think we are at the stage of figuring out where the WASI part and the emscripten part come back together to a common WebAssembly point. Right now it’s looking like packaging is not a common point in terms of key technical details.

Yep! :slightly_smiling_face:

FYI, Nicholas literally means “view source”; the inspector in e.g. Firefox will not display the Python code:

image


Preach it, brother.

Wherever learning, understanding, or the sharing of skills and knowledge is concerned there is an educational angle. While I’m clearly biased, I believe developers should cultivate an interest in educational skills, thinking and reflection because so much of our technical work is orthogonal or informed by these concerns.

This :point_up:, so much this. Worth adding that while there might be “trunk specialists” who geek out over all things proboscis, for them to appreciate the function and use of the trunk they’d need to understand the wider elephant.

As for packaging (see my earlier point WRT hypocaustum), such a situation indicates to me it’s not a technical problem, but a reflection of the different needs of diverse types of coder and technical contexts. A pragmatic response is to let many flowers bloom, and hope folks keep channels of communication open so it’s clear how one or another methodology in this space relates to each other… so end-users can work out what best suits their unique need. I like classical, you like jazz, she likes hip-hop, etc… :wink:

The code is so short it can be quoted in full (see below). For those who are wondering, I created the arrr module as a vehicle for learning with intermediate coders, who may be taking their first steps in writing, releasing, documenting and collaborating on their first Python module.

<form id="inputForm">
  <label for="english">Translate English 🇬🇧 to Pirate speak 🏴‍☠️:</label>
  <input type="text" name="english" id="english"
    placeholder="Type English here..." />
  <p><input type="submit" value="Translate"/></p>
  <div id="output"></div>
</form>

<py-script>
import arrr
from polyplug import plug, update, receive


@plug("#inputForm", "submit")
def handle_form(event):
    """
    Take the English input from the form, turn it into Pirate talk and update
    the DOM with the result.
    """
    english = event.target.find("#english").value
    pirate_text = arrr.translate(english)
    output = event.target.find("#output")
    output.innerHTML = f"<p>{pirate_text}</p>"
    update("#output", output)
</py-script>

(I’m already thinking of changes to this API - for instance, moving the update function to the instance so you’d do: output.update() to commit the change to the DOM.)


In my Firefox 108.0.1, the Console tab shows the content shown above, under ‘Evaluating code’.

Are you saying you would want e.g. CPython 3.11 to always be built with emscripten 3.1.26 […]? Or that simply a CPython build would need to be able to declare what version it’s built with as the platform and that there could be various CPython builds out there and thus various wheels for different emscripten versions? And if it’s the former, who decides which version gets used and does the builds?

As mentioned above, from the Pyodide side, we plan to have one fixed version of Emscripten per CPython version (pyodide#2951), and we will start doing this from Python 3.11. For now, we have an informal agreement with emscripten-forge to also use this Emscripten version, but IMO eventually it would be better if this was decided at the CPython level, the same as it is/was for MSVC.

And how long does emscripten maintain older versions, so could CPython 3.11 rely on 3.1.26 or 3.1 working in five years time?

That’s a very good question and we need to discuss this with Emscripten devs. In the past, we have run Emscripten outdated by 1.5 years without major issues, but 5 years is a long time in this space. Even just installing a 5 year old browser is not very straightforward, as most Linux distributions actively try to prevent users from doing that.

At some level though, if you take manylinux1 (GCC 4.2 from 2007), manylinux2010 (GCC 4.5 from 2010), etc., the included version of the build toolchain is no longer supported either (in the sense that upstream will not accept fixes for it), yet we still can build corresponding wheels. I guess here we could do the same with some self-contained docker images for Emscripten.

In that case you will need to come up with a wheel tag spec for installers to be able to identify when a wheel file applies to them

If we set platform when building CPython (which is already done in 3.11) e.g.

>>> import sysconfig
>>> sysconfig.get_platform()
'emscripten-3.1.14-wasm32'

then using pypa/packaging.tags.sys_tags to determine if a wheel is compatible with a current runtime should just work, as for any other platform. That’s what we are currently doing in micropip.
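
To show the idea without depending on the packaging library, here is a simplified, standard-library-only sketch of the platform part of that check. It is not the actual logic of packaging.tags or micropip: it assumes the platform tag is obtained by replacing `-`, `.` and spaces in the `sysconfig` platform string with `_`, it ignores the interpreter and ABI tags entirely, and the function names and the wheel filename in the usage note are made up.

```python
import re


def platform_tag(platform: str) -> str:
    """Normalize a sysconfig platform string into a wheel platform tag."""
    return re.sub(r"[-. ]", "_", platform)


def wheel_platform(filename: str) -> str:
    """Extract the platform field (the last dash-separated one) of a wheel."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    return filename[: -len(".whl")].split("-")[-1]


def is_compatible(filename: str, platform: str) -> bool:
    """Simplified check: compares platform tags only, not Python/ABI tags."""
    candidates = wheel_platform(filename).split(".")
    return "any" in candidates or platform_tag(platform) in candidates


print(platform_tag("emscripten-3.1.14-wasm32"))  # emscripten_3_1_14_wasm32
```

With this, a hypothetical `example-1.0-cp311-cp311-emscripten_3_1_14_wasm32.whl` would be accepted on the runtime shown above and rejected on one built with a different emscripten version, which is exactly why the choice of version matters.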


I’ve just copy/pasted your paragraph on packaging to a couple of friend chats I’m in. I plan to refer back to it in the future, whenever I encounter someone who is convinced that “there is an ideal solution to packaging, why can’t we just …”

Note that, whatever comes after the “just” will likely already be implemented in some existing packaging system, and found to be either “not quite enough”, or “perfect, unless you have these other requirements…”
