Unstable API for pymain_run_python: run the Python CLI without finalizing the interpreter

hoodmane · January 31, 2024, 6:48pm

What I would like is for pymain_run_python to be switched from a static function to an unstable API.

Pyodide provides a python CLI command that behaves much like the normal python CLI. Its primary purpose is running pytest. Handling async functions in this context is a bit tricky: ordinarily one would do something like:

python -c 'import asyncio; asyncio.new_event_loop().run_until_complete(async_func())'

but run_until_complete cannot currently work in Pyodide (at least unless we use the --experimental-wasm-stack-switching feature). Instead, what the Pyodide CLI makes work is:

python -c 'import asyncio; asyncio.create_task(async_func())'

It does this by calling pymain_run_python directly. We then wait until there are no pending async tasks, or until someone raises SystemExit or KeyboardInterrupt, and at that point we finalize the Python interpreter and exit.
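The wait-then-finalize flow described here can be approximated in plain asyncio. This is only a sketch under stated assumptions, not Pyodide's implementation: run_script is a hypothetical stand-in for pymain_run_python, drain and main are illustrative names, and closing the loop stands in for finalizing the interpreter. The SystemExit and KeyboardInterrupt handling is omitted.

```python
import asyncio


async def async_func(n, results):
    # A toy coroutine that yields to the loop once before finishing.
    await asyncio.sleep(0)
    results.append(n)


def run_script(loop, results):
    # Hypothetical stand-in for pymain_run_python: the "script" only
    # schedules tasks and returns without waiting for them.
    loop.create_task(async_func(1, results))
    loop.create_task(async_func(2, results))


def drain(loop):
    # Keep driving the loop until no unfinished tasks remain.
    while True:
        pending = asyncio.all_tasks(loop)  # contains only unfinished tasks
        if not pending:
            return
        loop.run_until_complete(asyncio.wait(pending))


def main():
    results = []
    loop = asyncio.new_event_loop()
    try:
        run_script(loop, results)
        drain(loop)
    finally:
        # Stand-in for finalizing the interpreter: reached only after draining.
        loop.close()
    return results
```

The point of the ordering in main is that teardown happens strictly after the drain step, which is the property that calling Py_RunMain does not provide.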

Our problem is that pymain_run_python is not a public API. Py_RunMain calls pymain_run_python and then immediately cleans up, but this is not suitable: the pending async tasks will segfault when they try to access nulled-out interpreter fields. For now we just patch out the static keyword and call the function, but I would like to upstream the patch.

Does this sound reasonable?

@vstinner @encukou

steve.dower · January 31, 2024, 7:35pm

(@eric.snow might be interested)

I’d rather have some way to keep the interpreter alive while pending tasks exist. A runtime option to either wait or terminate without waiting seems like something we should have anyway, especially since it segfaults.

(IMHO, all of main.c, which includes Py_RunMain, should be implemented in the interpreter executable, not the shared library. So turning more of our interactive implementation into public shared API feels like the wrong direction. Instead, it ought to be made easier to clone and modify the part that provides the interactive experience, including environment variables and argument parsing.)

vstinner · January 31, 2024, 11:51pm

Why not call PyRun_SimpleString("import asyncio; ...")? It runs the code, and then you can call Py_Finalize(), or call PyRun_SimpleString() again with other code.

hoodmane · January 31, 2024, 11:56pm

> Why not calling PyRun_SimpleString

I want python to otherwise function as a drop-in replacement for the python CLI, but also to have a way to run code that uses asyncio. A similar issue is if someone has a file like:

import asyncio

async def do_stuff():
    ...

if __name__ == "__main__":
    asyncio.new_event_loop().run_until_complete(do_stuff())

and then runs python myfile.py. My current solution makes this work, even though run_until_complete() has been defined to behave the same as asyncio.create_task(): it schedules do_stuff() but cannot block on it.
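To make the "schedules but cannot block" behavior concrete, here is a rough analogue in standard asyncio. It is an illustration only and makes no claim about how Pyodide's event loop is written: ScheduleOnlyLoop and demo are hypothetical names, and the host driving the loop afterwards is simulated with run_forever().

```python
import asyncio


class ScheduleOnlyLoop(asyncio.SelectorEventLoop):
    """Hypothetical loop whose run_until_complete() only schedules."""

    def run_until_complete(self, future):
        # Wrap the coroutine in a Task and return at once, without blocking.
        return asyncio.ensure_future(future, loop=self)


async def do_stuff(log):
    await asyncio.sleep(0)
    log.append("done")


def demo():
    log = []
    loop = ScheduleOnlyLoop()
    try:
        task = loop.run_until_complete(do_stuff(log))
        # Nothing has run yet: the call returned before do_stuff() started.
        scheduled_only = (not task.done()) and log == []
        # Simulate the host driving the loop later, stopping once the task ends.
        task.add_done_callback(lambda _task: loop.stop())
        loop.run_forever()
        return scheduled_only, log
    finally:
        loop.close()
```

With a loop like this, a script that ends right after calling run_until_complete() has done no work yet, so whatever runs the script has to keep the interpreter alive until the task finishes.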

encukou · February 1, 2024, 9:23am

Is there a use case outside Pyodide?
If not, patching static out might be the best option.
What unstable API gives you is stability across patch releases, but if you build CPython yourself, you don’t really need that.
What upstreaming would give you is sharing code across projects, but Pyodide might be a good home for async-friendly CLI on wasm right now.

> I’d rather have some way to keep the interpreter alive while pending tasks exist.

AFAIK, general advice for asyncio is that all tasks should eventually be awaited explicitly: do use run or run_until_complete rather than create_task(...).
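As a small illustration of that advice on ordinary CPython (not on Pyodide, where blocking is the problem under discussion), the sketch below creates tasks and awaits every one of them before asyncio.run() returns. The names async_func, main, and run are placeholders.

```python
import asyncio


async def async_func(n):
    # Toy coroutine: yield to the loop once, then return a value.
    await asyncio.sleep(0)
    return n * 2


async def main():
    # Tasks are created, but each one is awaited before main() returns,
    # so none is still pending when the loop shuts down.
    tasks = [asyncio.create_task(async_func(n)) for n in range(3)]
    return await asyncio.gather(*tasks)


def run():
    # asyncio.run() blocks until main() completes, then closes the loop.
    return asyncio.run(main())
```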

If I understand correctly, the use case is working around an artifact of another workaround – for a WASM issue that will be, in the long term, probably best solved with --experimental-wasm-stack-switching (judging from the option name only). If that’s the case, it’s best kept in Pyodide.

hoodmane · February 1, 2024, 10:45pm

> If I understand correctly, the use case is working around an artifact of another workaround – for a WASM issue that will be, in the long term, probably best solved with --experimental-wasm-stack-switching (judging from the option name only). If that’s the case, it’s best kept in Pyodide.

Yes, I agree with all of this. I have an implementation of greenlet-like semantics using wasm stack switching, and it should be possible to use it to make run_until_complete work correctly, at which point the problem will go away. We would have to require node >= 20 for these features, but this isn’t so bad.

Unfortunately it is taking a long time for JavaScript runtimes to implement stack switching. It will solve a lot of our most intransigent problems when they do.