I did encounter it partway through development. My first approach was to de-C-stack as much as possible; then I spotted the stackless references and thought, ‘aha, someone else had a similar idea, nice’. Soon after, I realised it was possible to do coroutines in C using setjmp(), a lightbulb moment, and went down that route to get a complete result rather than a 90% one. The proof-of-concept still has the de-stacking as far as I’d got with it; it’s unnecessary but helpful (it keeps C stack usage down).
You say…
… it is part of the C standard, and has been provided by most C implementations forever, AFAIK (Mac, Linux & Windows for sure, others TBD). The only assumptions made in the proof-of-concept are: setjmp() & longjmp() behave as the standard allows (i.e. the return value is only used in switch & if); the stack is contiguous; and the stack grows down (there is one target where the stack grows up; it’s something I need to address). The only new-to-Python assumption is setjmp() & longjmp(), and I didn’t think assuming a well-supported, standard C feature available on the big 3 was much of a stretch. The growing-down shortcoming will be fixed at some point.
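To make those assumptions concrete, here is a minimal sketch in C. It is not code from the proof-of-concept; `run_dispatch`, `jump_back`, `callee_is_lower` and `stack_grows_down` are names made up for illustration. The first part uses setjmp() only as the controlling expression of a switch, which is one of the contexts the standard permits. The second part is a heuristic probe for stack direction: it compares the addresses of locals in two nested frames, which the C standard does not guarantee to be meaningful, so treat it as an assumption that holds on typical contiguous-stack targets.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

static jmp_buf resume_point;

/* Hypothetical helper: jump back to the setjmp() in run_dispatch(). */
static void jump_back(int code)
{
    longjmp(resume_point, code);
}

/* setjmp() appears only as the whole controlling expression of a switch,
 * so nothing here relies on storing its return value in a variable. */
int run_dispatch(void)
{
    /* volatile: the value must survive the longjmp() calls below */
    volatile int visits = 0;

    switch (setjmp(resume_point)) {
    case 0:                 /* direct return from setjmp() */
        visits++;
        jump_back(1);       /* does not return */
        break;
    case 1:                 /* arrived via longjmp(resume_point, 1) */
        visits++;
        jump_back(2);       /* does not return */
        break;
    case 2:                 /* arrived via longjmp(resume_point, 2) */
        visits++;
        return visits;
    default:
        assert(!"unexpected setjmp() return value");
    }
    return -1;
}

/* Heuristic only: is a local in the callee at a lower address than a
 * local in the caller? Assumes a contiguous stack and no inlining. */
static int callee_is_lower(uintptr_t caller_addr)
{
    volatile char marker = 0;
    return (uintptr_t)(void *)&marker < caller_addr;
}

int stack_grows_down(void)
{
    volatile char marker = 0;
    /* Calling through a volatile function pointer discourages inlining. */
    int (*volatile probe)(uintptr_t) = callee_is_lower;
    return probe((uintptr_t)(void *)&marker);
}
```

Calling `run_dispatch()` walks through all three cases and returns the number of times the switch was entered. `stack_grows_down()` returns 1 on a grows-down target and 0 on a grows-up one, assuming the heuristic holds there.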
Asyncio without function colouring is narrowed down to just the proof-of-concept; please have a look there too. Highlights: the requests library can be async-ready as-is (as can any other socket- and ssl-based library); abstract interfaces stay abstract, so the interface user never needs to know async is happening inside. There’s much more there too.