Hello,
I’m trying to compare sync and async approaches for a blog post. I’ve written a synthetic benchmark function:
    async def realistic_workload(_):
        await sleep(0.01)
        fibb(23)  # Around 5.86 ms
        await sleep(0.02)
        await sleep(0.01)
        fibb(20)  # Around 1.36 ms
        await sleep(0.01)
        # All together: 10 + 5.8 + 30 + 1.3 + 10 ~= 57 ms
        return JSONResponse({"work": "done"})
(I also have a synchronous version.) I want to compare some runtime characteristics of the sync (Gunicorn + Flask) and async (Uvicorn + Starlette) approaches.
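For anyone who wants to reproduce this without the web frameworks, here is a minimal standalone sketch of the two workloads. It assumes `fibb` is a naive recursive Fibonacci and that `sleep` is `asyncio.sleep` in the async version and `time.sleep` in the sync one; the names `async_workload` and `sync_workload` are made up for this sketch, and it returns a plain dict instead of a `JSONResponse`.

```python
import asyncio
import time


def fibb(n: int) -> int:
    """Naive recursive Fibonacci, used only to burn CPU (assumed definition)."""
    return n if n < 2 else fibb(n - 1) + fibb(n - 2)


async def async_workload() -> dict:
    """Async variant: sleeps yield to the event loop, fibb() blocks it."""
    await asyncio.sleep(0.01)
    fibb(23)
    await asyncio.sleep(0.02)
    await asyncio.sleep(0.01)
    fibb(20)
    await asyncio.sleep(0.01)
    return {"work": "done"}


def sync_workload() -> dict:
    """Sync variant: every sleep blocks the whole worker."""
    time.sleep(0.01)
    fibb(23)
    time.sleep(0.02)
    time.sleep(0.01)
    fibb(20)
    time.sleep(0.01)
    return {"work": "done"}
```

Running `asyncio.run(async_workload())` or `sync_workload()` under a timer should give roughly the per-request latency quoted above, although the exact `fibb` timings depend on the machine.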
I’m dedicating 4 logical cores of my 12-core machine to the server and benchmarking with the ab utility. For the sync approach I was well aware I’d need to run a lot of workers, but I was surprised to get the best results with 8 asyncio workers instead of the expected 4. Both the stdlib asyncio loop and uvloop give the same results.
Does anyone have any idea why I need 8 processes to saturate 4 cores? Feels like there’s something blocking in the asyncio workers.
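For context, here is the back-of-envelope model behind my expectation of 4 workers, as a small sketch. It uses only the timings from the workload above and deliberately ignores HTTP parsing, framework overhead and kernel networking, all of which also cost CPU in a real server; the function names are made up for illustration.

```python
# Timings taken from the workload above, in milliseconds.
CPU_MS = 5.86 + 1.36          # fibb(23) + fibb(20)
SLEEP_MS = 10 + 20 + 10 + 10  # the four sleeps
TOTAL_MS = CPU_MS + SLEEP_MS  # latency of one unloaded request
CORES = 4


def per_core_ceiling_rps(cpu_ms: float) -> float:
    """Requests per second one fully busy core can serve in this model."""
    return 1000.0 / cpu_ms


def in_flight_to_saturate(cpu_ms: float, total_ms: float) -> float:
    """Concurrent requests one async worker needs to keep its core busy."""
    return total_ms / cpu_ms


def sync_workers_to_match(cpu_ms: float, total_ms: float, cores: int) -> float:
    """Blocking workers needed to reach the same ceiling on `cores` cores."""
    return cores * total_ms / cpu_ms


if __name__ == "__main__":
    print(f"per-core ceiling: {per_core_ceiling_rps(CPU_MS):.1f} req/s")
    print(f"{CORES}-core ceiling: {CORES * per_core_ceiling_rps(CPU_MS):.1f} req/s")
    print(f"in-flight per worker: {in_flight_to_saturate(CPU_MS, TOTAL_MS):.1f}")
    print(f"sync workers needed: {sync_workers_to_match(CPU_MS, TOTAL_MS, CORES):.1f}")
```

In this idealised model each async worker needs about 8 requests in flight to keep its core busy, so 4 workers would need ab's concurrency (`-c`) to be at least around 32; below that the workers would sit idle regardless of how many there are.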