Picking the number of workers in asyncio


I’m trying to compare sync and async approaches for a blog post. I’ve written a synthetic benchmark function:

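from asyncio import sleep
from starlette.responses import JSONResponse

# fibb() is a CPU-bound helper defined elsewhere (not shown here).
async def realistic_workload(_):
    await sleep(0.01)  # simulated I/O wait
    fibb(23)  # Around 5.86 ms of CPU
    await sleep(0.02)
    await sleep(0.01)
    fibb(20)  # Around 1.36 ms of CPU
    await sleep(0.01)
    # All together: 10 + 5.86 + 30 + 1.36 + 10 ~= 57 ms
    return JSONResponse({"work": "done"})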

I also have a synchronous version of it, and I want to compare some runtime characteristics of the sync (Gunicorn + Flask) and async (Uvicorn + Starlette) setups.
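For context, here is a rough sketch of what the synchronous counterpart could look like. It rests on assumptions: fibb is taken to be a naive recursive Fibonacci, time.sleep stands in for the awaited sleeps, and a plain dict replaces the framework response object.

```python
import time


def fibb(n: int) -> int:
    # Assumed definition: naive recursive Fibonacci, used only to burn CPU.
    return n if n < 2 else fibb(n - 1) + fibb(n - 2)


def realistic_workload_sync() -> dict:
    # Same steps as the async handler, but every sleep blocks the whole worker.
    time.sleep(0.01)
    fibb(23)
    time.sleep(0.02)
    time.sleep(0.01)
    fibb(20)
    time.sleep(0.01)
    # Hypothetical stand-in for the framework's JSON response.
    return {"work": "done"}
```

In this form a worker is occupied for the full duration of each request, which is why the sync setup needs many more workers than cores.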

I’m dedicating 4 logical cores of my 12-core machine to the server, and benchmarking with the ab (ApacheBench) utility. For the sync approach I knew I’d need to run a lot of workers, but I was surprised that I get the best results with 8 asyncio worker processes instead of the expected 4. Both the stdlib asyncio loop and uvloop give the same results.
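To make the expectation concrete, here is a back-of-the-envelope calculation using only the timings from the handler above. It is an idealized sketch: it ignores the CPU spent by the HTTP server, the framework, and the event loop itself, so real numbers will be lower. All variable names are mine.

```python
# Timings taken from the comments in the benchmark handler (milliseconds).
cpu_ms = 5.86 + 1.36           # fibb(23) + fibb(20)
sleep_ms = 10 + 20 + 10 + 10   # the four awaited sleeps
total_ms = cpu_ms + sleep_ms   # wall time of one request
cores = 4

# Upper bound on throughput if every core does nothing but run fibb().
per_core_rps = 1000 / cpu_ms
max_rps = cores * per_core_rps

# Requests that must be in flight per core to keep it busy, since each
# request spends most of its life sleeping.
inflight_per_core = total_ms / cpu_ms
min_concurrency = cores * inflight_per_core

print(f"CPU per request:        {cpu_ms:.2f} ms of {total_ms:.2f} ms")
print(f"Ideal throughput:       {max_rps:.0f} req/s on {cores} cores")
print(f"In-flight per core:     {inflight_per_core:.1f}")
print(f"Client concurrency >=   {min_concurrency:.0f}")
```

Under these assumptions, 4 async workers should be able to saturate 4 cores at roughly 550 req/s, provided the benchmark client keeps around 32 or more requests in flight. The same ratio suggests about 8 blocking workers per core for the sync setup.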

Does anyone have any idea why I need 8 processes to saturate 4 cores? It feels like something is blocking in the asyncio workers.
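One way to test the "something is blocking" suspicion would be to measure event loop lag: a task that sleeps for a fixed interval and records how late it wakes up. The sketch below is standalone and only illustrates the idea. The function names are hypothetical, fibb is again assumed to be a naive recursive Fibonacci, and no server is involved.

```python
import asyncio
import time


def fibb(n: int) -> int:
    # Assumed definition: naive recursive Fibonacci, used only to burn CPU.
    return n if n < 2 else fibb(n - 1) + fibb(n - 2)


async def measure_loop_lag(interval: float = 0.005, samples: int = 20) -> list[float]:
    """Sleep repeatedly and record how late each wake-up is, in seconds."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(time.perf_counter() - start - interval)
    return lags


async def cpu_request() -> None:
    await asyncio.sleep(0.01)
    fibb(23)  # runs on the event loop thread; no other task runs meanwhile


async def run_probe() -> float:
    # Start the lag monitor, then run four CPU-heavy "requests" concurrently.
    monitor = asyncio.create_task(measure_loop_lag())
    await asyncio.gather(*(cpu_request() for _ in range(4)))
    return max(await monitor)


if __name__ == "__main__":
    worst = asyncio.run(run_probe())
    print(f"worst loop lag: {worst * 1000:.1f} ms")
```

Some lag is expected here, because the fibb calls themselves hold the loop. Lag well beyond the known CPU time per request would point at something else blocking.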