[WASI] Importing namespaced 3rd party C extensions (with import cycles)

I’ve been trying to get C extensions working in the WASI build of Python. I started with a simple one that lives at the top level, statically linked it, and registered it with PyImport_AppendInittab in my C wrapper, and it worked just fine. I’ve moved on to a more difficult test: numpy. I’ve been able to compile numpy to Wasm as static archives and link them with a static Python library. However, I’m having a problem importing numpy.core._multiarray_umath.

Importing that library seems to cause a cyclical import. Its module initialization function imports numpy.core._exceptions, which causes some other imports to happen, eventually reaching numpy.core.multiarray, which imports numpy.core._multiarray_umath again. In the standard Python build, this works fine and it just continues, but in the WASI build it fails when it hits that import cycle. I’m not clear on what the standard build does to make this work that the WASI build does not (it likely has to do with static linking, not so much WASI). If anyone has any tips, I’d greatly appreciate it.

That’s great to hear! I have been thinking that might be the best solution for WASI until we get some dynamic linking solution (I’m hoping the component model could get leveraged for this some day).

The code that handles built-in imports is in the BuiltinImporter class. I can’t see anything there that would suggest static linking and built-in imports would cause a failure in a case that otherwise works. Typically, cyclical imports fail because someone is importing a specific object from a module instead of the module itself. The import system puts a module in sys.modules ASAP specifically so that any other import gets the module from there and does not trigger another import.
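To illustrate the sys.modules point, here is a minimal sketch (not from the thread) that writes two pairs of throwaway modules to a temporary directory. All module names are hypothetical. The first pair imports each other as modules, so the cycle resolves through the partially initialized entry in sys.modules; the second pair imports names from each other, so the cycle fails.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module names, chosen only for this illustration.
SOURCES = {
    # Pair 1: each module imports the other *module*; this cycle is harmless.
    "hyp_mod_a": "import hyp_mod_b\nVALUE = 1\n",
    "hyp_mod_b": (
        "import hyp_mod_a\n"
        "SAW_PARTIAL = not hasattr(hyp_mod_a, 'VALUE')\n"
    ),
    # Pair 2: each module imports a *name* from the other; this cycle fails.
    "hyp_cyc_a": "from hyp_cyc_b import Y\nX = 1\n",
    "hyp_cyc_b": "from hyp_cyc_a import X\nY = 2\n",
}


def try_import(name):
    """Return 'ok' if the import succeeds, else the ImportError text."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return f"ImportError: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    for name, source in SOURCES.items():
        Path(tmp, f"{name}.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module_level = try_import("hyp_mod_a")
        # hyp_mod_b ran while hyp_mod_a was only partially initialized.
        saw_partial = sys.modules["hyp_mod_b"].SAW_PARTIAL
        name_level = try_import("hyp_cyc_a")
    finally:
        sys.path.remove(tmp)

print("module-level cycle:", module_level)
print("hyp_mod_b saw a partial hyp_mod_a:", saw_partial)
print("name-level cycle:", name_level)
```

The module-level cycle succeeds even though hyp_mod_b runs before hyp_mod_a has defined VALUE, which is the behavior the standard numpy build relies on.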

Looking at the create_builtin() function, it might be the cause if numpy is using single-phase initialization. If I’m tracing through the code by hand correctly, I think cpython/Python/import.c at cbb0aa71d040022db61390380b8aebc7c04f3275 · python/cpython · GitHub will never succeed for anything that’s being actively imported in NumPy, because cpython/Python/import.c at cbb0aa71d040022db61390380b8aebc7c04f3275 · python/cpython · GitHub is where a module gets inserted into that specific single-phase module cache in create_builtin(). And since that’s past where the module’s init function gets called, it’s going to lead to a stack of calls into create_builtin() where no one has reached a place that can break the cycle.

Any chance you can take your build and run it under a debugger to see if my hunch is correct? If it is, could you please open an issue at Issues · python/cpython · GitHub? I would then also open an issue on NumPy letting them know.

I haven’t had much luck getting a debugger to work, so I’ve had to track things down the old-fashioned way (printf…). The line where things go bad is https://github.com/python/cpython/blob/main/Python/import.c#L2547. The first time this is hit by importing numpy.core._multiarray_umath, it causes the parents to get imported (numpy.core and numpy), which causes the subsequent imports to come around to importing numpy.core._multiarray_umath again. The second time it hits that line in import.c, it fails.
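The parent-first import order described above can be observed without a debugger. This is a minimal sketch, assuming a throwaway package named hyp_pkg (hypothetical, standing in for numpy): a logging finder placed at the front of sys.meta_path records each lookup and whether the import system supplied a path argument.

```python
import importlib
import sys
import tempfile
from pathlib import Path


class LoggingFinder:
    """Meta path finder that records lookups and never finds anything itself."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.calls = []

    def find_spec(self, fullname, path=None, target=None):
        if fullname.startswith(self.prefix):
            # Record the name and whether a parent __path__ was supplied.
            self.calls.append((fullname, path is not None))
        return None  # defer to the remaining finders on sys.meta_path


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package layout: hyp_pkg/core/leaf.py
    core_dir = Path(tmp, "hyp_pkg", "core")
    core_dir.mkdir(parents=True)
    Path(tmp, "hyp_pkg", "__init__.py").write_text("")
    Path(core_dir, "__init__.py").write_text("")
    Path(core_dir, "leaf.py").write_text("")

    finder = LoggingFinder("hyp_pkg")
    sys.path.insert(0, tmp)
    sys.meta_path.insert(0, finder)
    importlib.invalidate_caches()
    try:
        importlib.import_module("hyp_pkg.core.leaf")
    finally:
        sys.meta_path.remove(finder)
        sys.path.remove(tmp)

for name, has_path in finder.calls:
    print(name, "path supplied" if has_path else "path=None")
```

The parents are looked up first, and only the top-level package is looked up with path=None; every submodule lookup receives its parent package's __path__.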

I’ll keep trying to get a debugger on it, and maybe post a repo with my changes and build scripts so that you can try to reproduce it if you have time.

As far as the working C extension goes, it came together relatively easily. I even have exports from the host program that are called by the Python C extension, so it’s demonstrating communication both ways from the host.

Unfortunately, it looks like debugging may not be an option.

https://github.com/bytecodealliance/wasmtime/issues/4669

Have you tried doing the same build to the native CPU of your machine so you can debug that way? In other words don’t build for WASI for the target but your machine’s own CPU, leaving everything else the same?

Can you provide a permalink? That line currently goes to a } in the middle of a function.

Sorry, here’s the link: https://github.com/python/cpython/blob/c6858d1e7f4cd3184d5ddea4025ad5dfc7596546/Python/import.c#L2552.

I’ve been trying to build a native version of Python and numpy, but I’m having trouble getting it to work for some reason that I haven’t figured out yet.

@kesmit I was hitting this last year and I just removed the assert and changed the if above it as in

-            if range_start == range_end {
+            if range_start >= range_end {
                 continue;
             }
-            assert!(range_start < range_end);

Then, with a local build of wasmtime, I was able to do some debugging. It’s not as good as printf, but it did give me faster iteration cycles until I found the places worthy of a printf.

I have no idea what those ranges represent, just saying that it worked for me.

That’s calling cpython/Lib/importlib/_bootstrap.py at 90f1d777177e28b6c7b8d9ba751550e373d61b0a · python/cpython · GitHub.

What’s the exact failure you’re seeing? Are you blowing out your stack due to infinite recursion?

I did get a little further by getting a static x86 build working. I found out that there was a linker option that needed to be added. That got the x86 build working, but unfortunately, it didn’t completely fix the Wasm build (although it is getting a little further). I’m still digging…

@brettcannon I’m struggling to debug the BuiltinImporter code because it is frozen. Is there a way to compile without freezing it?

I thought I would at least be able to edit Lib/importlib/_bootstrap.py with print statements, rebuild Python and they would show up, but that’s not even working so there’s something I don’t understand here.

There’s a regen-all make target to regenerate the frozen copy of importlib. You can also use importlib.import_module() and that uses the .py files.

import wasmtime
from bindings import Udf

store = wasmtime.Store()

module = wasmtime.Module(store.engine, open('s2-udf-python3.10.wasm', 'rb').read())

linker = wasmtime.Linker(store.engine)
linker.define_wasi()

wasi = wasmtime.WasiConfig()
wasi.inherit_stdin()
wasi.inherit_stdout()
wasi.inherit_stderr()
wasi.env = [
    ('PYTHONDONTWRITEBYTECODE', 'x'),
#   ('PYTHONVERBOSE', '5'),
]
store.set_wasi(wasi)

udf = Udf(store, linker, module)

# You *must* call _initialize for wasi to work
udf.instance.exports(store)['_initialize'](store)

udf.exec(store, r'''
import platform
import numpy as np

print('platform', platform.platform())
print('numpy', np.__version__)
print('')

print('a')
a = np.array([2, 3, 4])
print(a)
print(a.dtype)
print('')

print('b')
b = np.array([1.2, 3.5, 5.1])
print(b)
print(b.dtype)
print('')

print('zeros')
print(np.zeros((3, 4)))
print('')

print('ones')
print(np.ones((2, 3, 4)))
print('')

print('empty')
print(np.empty((2, 3)))
print('')

print('int range')
print(np.arange(10, 30, 5))
print('')

print('float range')
print(np.arange(0, 2, 0.3))
print('')

print('linspace')
print(np.linspace(0, 2, 9))
x = np.linspace(0, 2 * np.pi, 100)
f = np.sin(x)
print(f)
print('')

print('a - b')
print(a - b)
print('')

print('b**2')
print(b ** 2)
print('')
''')
platform wasi-0.0.0-wasm32-32bit
numpy 1.24.2

a
[2 3 4]
int32

b
[1.2 3.5 5.1]
float64

zeros
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

ones
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]

empty
[[3.24640964e-283 7.63566008e-227 2.59665904e-225]
 [9.70076201e-225 1.46511924e-225 9.70096813e-225]]

int range
[10 15 20 25]

float range
[0.  0.3 0.6 0.9 1.2 1.5 1.8]

linspace
[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
[ 0.00000000e+00  6.34239197e-02  1.26592454e-01  1.89251244e-01
  2.51147987e-01  3.12033446e-01  3.71662456e-01  4.29794912e-01
  4.86196736e-01  5.40640817e-01  5.92907929e-01  6.42787610e-01
  6.90079011e-01  7.34591709e-01  7.76146464e-01  8.14575952e-01
  8.49725430e-01  8.81453363e-01  9.09631995e-01  9.34147860e-01
  9.54902241e-01  9.71811568e-01  9.84807753e-01  9.93838464e-01
  9.98867339e-01  9.99874128e-01  9.96854776e-01  9.89821442e-01
  9.78802446e-01  9.63842159e-01  9.45000819e-01  9.22354294e-01
  8.95993774e-01  8.66025404e-01  8.32569855e-01  7.95761841e-01
  7.55749574e-01  7.12694171e-01  6.66769001e-01  6.18158986e-01
  5.67059864e-01  5.13677392e-01  4.58226522e-01  4.00930535e-01
  3.42020143e-01  2.81732557e-01  2.20310533e-01  1.58001396e-01
  9.50560433e-02  3.17279335e-02 -3.17279335e-02 -9.50560433e-02
 -1.58001396e-01 -2.20310533e-01 -2.81732557e-01 -3.42020143e-01
 -4.00930535e-01 -4.58226522e-01 -5.13677392e-01 -5.67059864e-01
 -6.18158986e-01 -6.66769001e-01 -7.12694171e-01 -7.55749574e-01
 -7.95761841e-01 -8.32569855e-01 -8.66025404e-01 -8.95993774e-01
 -9.22354294e-01 -9.45000819e-01 -9.63842159e-01 -9.78802446e-01
 -9.89821442e-01 -9.96854776e-01 -9.99874128e-01 -9.98867339e-01
 -9.93838464e-01 -9.84807753e-01 -9.71811568e-01 -9.54902241e-01
 -9.34147860e-01 -9.09631995e-01 -8.81453363e-01 -8.49725430e-01
 -8.14575952e-01 -7.76146464e-01 -7.34591709e-01 -6.90079011e-01
 -6.42787610e-01 -5.92907929e-01 -5.40640817e-01 -4.86196736e-01
 -4.29794912e-01 -3.71662456e-01 -3.12033446e-01 -2.51147987e-01
 -1.89251244e-01 -1.26592454e-01 -6.34239197e-02 -2.44929360e-16]

a - b
[ 0.8 -0.5 -1.1]

b**2
[ 1.44 12.25 26.01]

@brettcannon There was an issue in the Python BuiltinImporter. In this case, the builtin extension was namespaced below numpy.core. The BuiltinImporter does not expect namespaced extensions, and when it sees that the path parameter was supplied, it short-circuits and returns None. The simple fix I put in (which may not be perfect) is to check for a '.' in the import name and to see if it is a builtin name before doing anything else.

class BuiltinImporter:

    ...

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        if '.' in fullname and _imp.is_builtin(fullname):
            return spec_from_loader(fullname, cls, origin=cls._ORIGIN)
        if path is not None:
            return None
        if _imp.is_builtin(fullname):
            return spec_from_loader(fullname, cls, origin=cls._ORIGIN)
        else:
            return None
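The effect of that reordering can be shown with a toy model. This is only a sketch of the control flow, not the real importlib code: BUILTIN_NAMES is a hypothetical stand-in for the interpreter's built-in table, the returned strings stand in for real ModuleSpec objects, and the directory in parent_path is made up.

```python
# Stand-in for the interpreter's built-in table; the names are hypothetical
# examples of what a static build might register.
BUILTIN_NAMES = {"sys", "numpy.core._multiarray_umath"}


def is_builtin(fullname):
    """Toy replacement for _imp.is_builtin()."""
    return fullname in BUILTIN_NAMES


def find_spec_original(fullname, path=None):
    """Original control flow: any non-None path short-circuits to None."""
    if path is not None:
        return None
    if is_builtin(fullname):
        return f"<spec {fullname}>"
    return None


def find_spec_patched(fullname, path=None):
    """Patched control flow: dotted built-in names are checked first."""
    if "." in fullname and is_builtin(fullname):
        return f"<spec {fullname}>"
    if path is not None:
        return None
    if is_builtin(fullname):
        return f"<spec {fullname}>"
    return None


# For a submodule, the import system passes the parent package's __path__.
# This directory is made up for the illustration.
parent_path = ["/hypothetical/site-packages/numpy/core"]
name = "numpy.core._multiarray_umath"

original_result = find_spec_original(name, parent_path)
patched_result = find_spec_patched(name, parent_path)
top_level_result = find_spec_original("sys")

print("original:", original_result)
print("patched: ", patched_result)
print("top-level, original:", top_level_result)
```

Because a submodule lookup always arrives with a path, the original ordering can never report a dotted name as built-in, while top-level names such as sys are unaffected by the change.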

Yeah, that’s a mistake. :sweat_smile: Would you mind opening an issue at Issues · python/cpython · GitHub? I can then fix that upstream.

Here’s the next step: pandas. However, pandas has some much bigger problems than numpy. It only sorta works, and the build takes more hand-holding.

import platform
print(platform.platform())

import pandas as pd
df = pd.DataFrame([[1.234, 2, 'hi'], [3.14, 3, 'bye']], columns=['a', 'b', 'c'])
print('COLUMNS', df.columns)
#print('DTYPES', df.dtypes)
for i, row in enumerate(df.iterrows()):
    print('ROW', i, row[1])

Result:

wasi-0.0.0-wasm32-32bit
COLUMNS Index(['a', 'b', 'c'], dtype='object')
ROW 0 a    1.234
b        2
c       hi
Name: 0, dtype: object
ROW 1 a    3.14
b       3
c     bye
Name: 1, dtype: object

Out of curiosity, are you adding pandas due to demand or because you just want to? I totally get NumPy, as it’s so foundational for the sciences/numerics, but I would be interested to know whether the demand for pandas is there, or whether pure Python alternatives could also work in this instance to save the headache of trying to make it work.

I didn’t want to wait on this, so I opened Allow for built-in modules to be submodules · Issue #102768 · python/cpython · GitHub.