Interpreter dispatch and performance on WebAssembly

@markshannon suggested at PyCon that we might use tail calls for interpreter dispatch to improve interpreter performance in WebAssembly. I ran some experiments based on Eli Bendersky’s 2012 blog post and found a performance benefit of around 12%.
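As a rough illustration of what tail-call dispatch means here (this is not the code used for the measurements in this post), each opcode gets its own function, and every handler ends by calling the next opcode's handler in tail position. The toy accumulator machine, the opcode names, and `run_tail` are all hypothetical. The sketch uses the `musttail` attribute when the compiler reports support for it and otherwise relies on the optimizer; it also assumes trusted bytecode, so opcodes are not bounds-checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_HALT, OP_INC, OP_DEC, OP_DOUBLE, OP_COUNT };

/* Use musttail where available so the call is guaranteed not to grow the stack. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* no guarantee: relies on the optimizer (assumption) */
#endif

/* Every handler shares one signature, which musttail requires. */
typedef int handler_fn(const unsigned char *pc, int acc);
static handler_fn op_halt, op_inc, op_dec, op_double;

static handler_fn *const handlers[OP_COUNT] = {
    [OP_HALT] = op_halt, [OP_INC] = op_inc,
    [OP_DEC] = op_dec,   [OP_DOUBLE] = op_double,
};

/* Fetch the next opcode and jump to its handler; pc is advanced past it. */
#define DISPATCH() MUSTTAIL return handlers[*pc](pc + 1, acc)

static int op_halt(const unsigned char *pc, int acc) { (void)pc; return acc; }
static int op_inc(const unsigned char *pc, int acc) { acc += 1; DISPATCH(); }
static int op_dec(const unsigned char *pc, int acc) { acc -= 1; DISPATCH(); }
static int op_double(const unsigned char *pc, int acc) { acc *= 2; DISPATCH(); }

/* Entry point: run bytecode that must end in OP_HALT (opcodes are not validated). */
int run_tail(const unsigned char *code) {
    assert(code != NULL);
    return handlers[code[0]](code + 1, 0);
}
```

For example, running the program `{ OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT }` through `run_tail` returns 3. Interpreter state travels in function arguments, so a compiler can keep it in registers across handlers.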

With native clang, tail calls have a 10% performance benefit over switch-based dispatch, whereas in Emscripten they have a 12% performance benefit.

With gcc, I reproduce the same 20% performance benefit that Eli Bendersky quotes for computed goto over switch. Interestingly, switch-based dispatch is faster in clang than in gcc, and computed goto is slower, so computed goto has a much smaller 10% benefit over switch in clang. With gcc, tail calls are 8% slower than computed goto, whereas with clang, tail calls are 1% faster than computed goto.
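For reference, the two older dispatch styles being compared look roughly like this. It is a minimal sketch on a hypothetical toy accumulator machine, not the benchmark code behind the numbers above; the opcode names, `run_switch`, and `run_goto` are made up for illustration. Computed goto relies on the GNU "labels as values" extension, so that variant is guarded by `__GNUC__` (which clang also defines).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_HALT, OP_INC, OP_DEC, OP_DOUBLE };

/* Switch dispatch: every instruction goes back through the single switch. */
int run_switch(const unsigned char *code) {
    assert(code != NULL);
    const unsigned char *pc = code;
    int acc = 0;
    for (;;) {
        switch (*pc++) {
        case OP_HALT:   return acc;
        case OP_INC:    acc += 1; break;
        case OP_DEC:    acc -= 1; break;
        case OP_DOUBLE: acc *= 2; break;
        default:        return -1; /* unknown opcode; this error value is an assumption */
        }
    }
}

#if defined(__GNUC__)
/* Computed goto: each handler jumps straight to the next handler's label. */
int run_goto(const unsigned char *code) {
    assert(code != NULL);
    /* Label table indexed by opcode; opcodes are assumed valid (no bounds check). */
    static void *const labels[] = { &&do_halt, &&do_inc, &&do_dec, &&do_double };
    const unsigned char *pc = code;
    int acc = 0;
#define NEXT() goto *labels[*pc++]
    NEXT();
do_inc:    acc += 1; NEXT();
do_dec:    acc -= 1; NEXT();
do_double: acc *= 2; NEXT();
do_halt:   return acc;
#undef NEXT
}
#endif
```

Both functions return 3 for the program `{ OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT }`. The usual explanation for the gap between the two styles is that the switch version adds a range check and funnels every instruction through one indirect branch, while computed goto gives each handler its own; how much that matters clearly depends on the compiler, as the numbers above show.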