Well, someone has to design an API to ease the greenlet implementation. So far, nobody has proposed such an API. In the past, some pieces of Stackless Python lived directly in CPython. They seem to be gone now. Or maybe that project was always maintained externally; I’m not sure.
It would even be okay with me for PyState_Restore to consume state_ptr, though maybe someone could some day have a reason to restore the same state twice.
It seems to me that there is pretty complicated logic to keep track of what parts of the stack are evicted after the switch, and this logic is strongly OS / architecture dependent but not dependent on the Python version. And then there is this logic to save and restore the necessary parts of the Python threadstate which is strongly dependent on the Python version but largely OS / architecture independent. It would be nice to see the threadstate code be part of Python itself.
It doesn’t matter how “simple” such an API could appear. The “pretty complicated logic” that is platform and/or version specific internally is the important part. We’ll need an exact definition of its semantics, in terms of what it explicitly does and does not do, with nothing left underspecified and no undefined behavior.
I agree that it’d be nice to have this maintained within CPython, given that multiple projects appear to want to do it. That way we could update it as our internals evolve instead of waiting for external projects to all catch up.
The first step is defining what exactly it is even supposed to be.
Right. Here’s an attempt to describe what should happen. Maybe not quite a definition.
Prior to entering Python frames we record the current stack pointer, stack_start. Then we call switch_stack(). switch_stack() records the current stack position, stack_stop. We save the Python thread state with PyState_Save() and reset it to the “no Python frames” ready state so that it can run other code. We copy the range of stack between stack_start and stack_stop into a buffer, then set the stack pointer back to stack_start and call some other Python code. When we’re done, we copy the original stack data from the buffer back onto the call stack and set the stack pointer back to stack_stop. Then we call PyState_Restore(). Now we can return from switch_stack() back into the originally executing Python context without crashing.
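To make the stack half of that sequence concrete, here is a toy model in C. It is only a sketch: a byte array stands in for the real C stack, and every name in it (fake_stack, saved_slice, evict_slice, reinstate_slice, push_bytes) is made up for illustration. Real code has to move the actual stack pointer with platform-specific asm, which this deliberately leaves out, and it does not touch the Python thread state at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_STACK_SIZE 64

/* Hypothetical model: a byte array stands in for the C stack.
   It grows downward, from the end of the array toward index 0. */
typedef struct {
    unsigned char mem[FAKE_STACK_SIZE];
    size_t sp;                      /* index of the current "stack pointer" */
} fake_stack;

/* Copy of the stack range [stop, start) that was evicted by a switch. */
typedef struct {
    unsigned char buf[FAKE_STACK_SIZE];
    size_t start;                   /* stack_start: recorded before entering Python frames */
    size_t stop;                    /* stack_stop: recorded inside switch_stack() */
} saved_slice;

/* Simulate a function pushing n bytes of frame data onto the stack. */
static void push_bytes(fake_stack *s, unsigned char value, size_t n)
{
    assert(s->sp >= n);
    s->sp -= n;
    memset(s->mem + s->sp, value, n);
}

/* Copy the range between stack_stop (the current sp) and stack_start
   into a buffer, then move sp back to stack_start so other code can
   reuse that part of the stack. */
static void evict_slice(fake_stack *s, size_t stack_start, saved_slice *out)
{
    size_t stack_stop = s->sp;
    assert(stack_stop <= stack_start && stack_start <= FAKE_STACK_SIZE);
    out->start = stack_start;
    out->stop = stack_stop;
    memcpy(out->buf, s->mem + stack_stop, stack_start - stack_stop);
    s->sp = stack_start;
}

/* Copy the saved bytes back to where they came from and set sp back
   to stack_stop. The other code must already have unwound to stack_start. */
static void reinstate_slice(fake_stack *s, const saved_slice *in)
{
    assert(s->sp == in->start);
    memcpy(s->mem + in->stop, in->buf, in->start - in->stop);
    s->sp = in->stop;
}
```

In the sequence described above, evict_slice() would run after the thread state has been saved, and reinstate_slice() would run just before it is restored.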
The goal is that PyState_Save() and PyState_Restore() save whichever parts of the tstate are needed for this to work.
We see in practice that this includes stuff like cframe, use_tracing, recursion_depth, frame, etc.
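As a rough illustration of the semantics I have in mind, here is a mock in C. None of this is real CPython API: mock_tstate is a made-up struct holding only the fields listed above, and mock_state_save() / mock_state_restore() are hypothetical stand-ins for the proposed PyState_Save() / PyState_Restore(). Resetting everything to NULL or 0 is a simplification of the “no Python frames” state, and having restore consume the saved state is just one of the possible choices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a few PyThreadState fields.
   The real struct layout differs between CPython versions. */
typedef struct {
    void *cframe;
    void *frame;
    int use_tracing;
    int recursion_depth;
} mock_tstate;

/* Opaque-ish saved state; 'valid' tracks whether it can still be restored. */
typedef struct {
    mock_tstate saved;
    int valid;
} mock_saved_state;

/* Snapshot the switch-relevant fields, then reset the thread state
   so that it looks like no Python frames are running (simplified). */
static void mock_state_save(mock_tstate *ts, mock_saved_state *out)
{
    out->saved = *ts;
    out->valid = 1;
    ts->cframe = NULL;
    ts->frame = NULL;
    ts->use_tracing = 0;
    ts->recursion_depth = 0;
}

/* Put the snapshot back. Returns 0 on success, -1 if the saved state
   was already consumed by an earlier restore. */
static int mock_state_restore(mock_tstate *ts, mock_saved_state *in)
{
    if (!in->valid) {
        return -1;
    }
    *ts = in->saved;
    in->valid = 0;                  /* consume: a second restore fails */
    return 0;
}
```

If restoring the same state twice turns out to be useful, dropping the valid flag would be the only change needed in this mock.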
But I guess with some asm it would be possible to write a test case that does all of this. And then if this code runs and doesn’t segfault, it would be working.
Note that the code in PyState_Save() and PyState_Restore() is in fact entirely platform independent. It’s everything else that depends on the platform.
To clarify: are you talking about the C stack, or the Python stack?
I personally don’t think Python has any business providing APIs for swapping its own C stack in and out. Assuming you mean the Python stack, there is indeed already a discussion on the bug tracker (and an open PR) for providing this API. I’m sure your input there would be greatly appreciated, since you seem like exactly the sort of user we would define this new API for!