Here’s an example distilled from Snex’s runner.py that I linked in the previous post: runner_streamlined.py · GitHub
@kzemek thanks, now it is much clearer.
Do you want to immediately execute all callbacks for result_fut.set_result(data["value"]), the single set_result() call in your snippet?
create_task(..., eager_start=True) should continue working as is, because it is fine to start handle_request() immediately, correct?
@asvetlov Yes, I would replace that set_result() with eager_set_result() and implement a custom EagerStreamReader; the user’s task would then be resumed immediately on read.
Relatedly, if we do implement eager_set_result(), is there any disadvantage to making StreamReader its first stdlib user?
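For illustration, here is a rough sketch of what eager_set_result() semantics could look like: complete the future and run its done-callbacks synchronously, instead of deferring them via loop.call_soon(). The helper name comes from the discussion above, but this particular implementation is my own guess and relies on the private Future._callbacks attribute, so treat it as a model of the idea, not as production code.

```python
import asyncio

def eager_set_result(fut: asyncio.Future, value) -> None:
    """Hypothetical sketch: set the result and invoke done-callbacks
    immediately, in the same loop iteration, instead of letting
    set_result() schedule them with loop.call_soon()."""
    # Peek at the registered callbacks (private attribute; a list of
    # (callback, context) pairs, or None/empty if there are none).
    callbacks = list(fut._callbacks or [])
    for cb, _ctx in callbacks:
        # Detach them so set_result() below has nothing to schedule.
        fut.remove_done_callback(cb)
    fut.set_result(value)  # mark the future as done
    for cb, ctx in callbacks:
        ctx.run(cb, fut)   # run right now, in the captured context

async def main():
    seen = []
    fut = asyncio.get_running_loop().create_future()
    fut.add_done_callback(lambda f: seen.append(f.result()))
    eager_set_result(fut, 42)
    # The callback has already run -- no extra loop turn was needed.
    assert seen == [42]
    return seen

assert asyncio.run(main()) == [42]
```

Note that this is exactly the behavior that skips asyncio's usual "callbacks run on a later loop iteration" guarantee, which is why it needs careful justification.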
Next somebody will ask for the eager version of Queue.get() and so on.
I’m inclined to think that we are going in the wrong direction.
Let’s change our approach.
What do we really need? To execute callbacks scheduled by loop.call_soon() in the same loop iteration, without an extra, relatively expensive epoll() call (I’ll speak for Linux; other platforms don’t change the overall picture).
Now the event loop works like this:

- There is a `_ready` list of callbacks scheduled by `loop.call_soon()` on the previous iteration (maybe the list is empty).
- Handle ready file descriptors, add the corresponding callbacks to `_ready`.
- Handle expired timer handles, add the corresponding callbacks to `_ready`.
- Process all callback handles from the `_ready` list once: manage network activity, run tasks, etc.
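A toy model of this snapshot behavior (the class and its contents are illustrative only, not CPython's actual implementation; I/O polling and timers are elided):

```python
from collections import deque

class ToyLoop:
    """Toy model of asyncio's BaseEventLoop._run_once(): only the
    callback-processing step is shown."""

    def __init__(self):
        self._ready = deque()          # callbacks queued by call_soon()

    def call_soon(self, cb, *args):
        self._ready.append((cb, args))

    def _run_once(self):
        # In the real loop, epoll()/select() results and expired timers
        # feed _ready here. Then a *snapshot* of _ready is processed
        # once; callbacks queued while running are deferred to the
        # NEXT iteration.
        ntodo = len(self._ready)
        for _ in range(ntodo):
            cb, args = self._ready.popleft()
            cb(*args)

order = []
loop = ToyLoop()

def second():
    order.append("second")

def first():
    order.append("first")
    loop.call_soon(second)   # queued mid-iteration...

loop.call_soon(first)
loop._run_once()
# ...so second() only runs on the following iteration:
assert order == ["first"]
loop._run_once()
assert order == ["first", "second"]
```

The deferred second() call is exactly the latency (plus the extra poll) that the proposal wants to remove.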
We can process newly added callbacks in the same loop iteration without waiting for the next loop._run_once() call.
The blind execution of all handles from the _ready list could lead to network starvation, and even to an infinite loop if an executed callback schedules itself for the next iteration.
On my laptop (Linux), getting the current time is extremely fast; it is in the order of magnitude of a regular integer operation.
Thus, we can repeat callback execution from the _ready list in a loop while there are items to process and the total processing time stays below some (very low) threshold.
Decades ago a time() call was much more expensive, but I believe nowadays all modern OSes already have the required optimizations. Sure, we could invent some heuristics for the number of iterations and the number of handled callbacks in each iteration – but if time() is so cheap, I don’t see any reason to do it; a direct bound on execution time works better.
By adding this post-execution step we can eliminate the performance degradation for the described situations, as well as for systems that have a cascade of processing units communicating via asyncio queues – everything is handled in the same loop iteration as long as the total processing time stays moderate.
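Extending the toy model above with the proposed time-bounded drain loop (again, an illustrative sketch with a made-up threshold, not the actual patch):

```python
import time
from collections import deque

MAX_DRAIN_TIME = 50e-6   # hypothetical budget: 50 microseconds

class EagerToyLoop:
    """Toy model of the proposed post-execution step: keep draining
    _ready in the same iteration, bounded by a small time budget so a
    self-rescheduling callback cannot starve I/O forever."""

    def __init__(self):
        self._ready = deque()

    def call_soon(self, cb, *args):
        self._ready.append((cb, args))

    def _run_once(self):
        # In the real loop, epoll()/select() and expired timers would
        # feed _ready here (elided).
        deadline = time.monotonic() + MAX_DRAIN_TIME
        while self._ready:
            ntodo = len(self._ready)            # snapshot, as today
            for _ in range(ntodo):
                cb, args = self._ready.popleft()
                cb(*args)
            if time.monotonic() >= deadline:
                break   # budget spent: return to the poll step

order = []
loop = EagerToyLoop()

def second():
    order.append("second")

def first():
    order.append("first")
    loop.call_soon(second)   # now drained in the SAME iteration

loop.call_soon(first)
loop._run_once()
assert order == ["first", "second"]
```

Because monotonic() is so cheap (see the timeit numbers below in this thread), checking the deadline once per drained batch adds essentially no overhead.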
As a side effect, regular tasks execute immediately after creation, but without breaking asyncio assumptions like eager tasks do.
@Tinche do you see any hidden problem with the proposal?
I don’t outright see anything problematic. It’s an easy enough polyfill: loop_eager_polyfill.py · GitHub
And the difference in Snex benchmarks (M1 Mac):
Benchmark results:

```
##### With input eager_loop #####
Name                                               ips        average  deviation         median         99th %
Snex.pyeval (Elixir->Python->Elixir)           36.39 K      0.0275 ms     ±6.71%      0.0267 ms      0.0323 ms
snex.call (Python->Elixir->Python, 100 times)   0.30 K        3.28 ms     ±1.10%        3.28 ms        3.38 ms

Comparison:
Snex.pyeval (Elixir->Python->Elixir)           36.39 K
snex.call (Python->Elixir->Python, 100 times)   0.30 K - 119.51x slower +3.26 ms

##### With input standard_loop #####
Name                                               ips        average  deviation         median         99th %
Snex.pyeval (Elixir->Python->Elixir)           25.35 K      0.0395 ms     ±5.59%      0.0387 ms      0.0447 ms
snex.call (Python->Elixir->Python, 100 times)  0.177 K        5.66 ms     ±0.74%        5.65 ms        5.77 ms

Comparison:
Snex.pyeval (Elixir->Python->Elixir)           25.35 K
snex.call (Python->Elixir->Python, 100 times)  0.177 K - 143.39x slower +5.62 ms
```
Other curious numbers according to timeit:

- `epoll.poll(0)` – 175 ns
- `select([], [], [], 0)` – 350 ns
- `monotonic()` – 30 ns
- `thread_time()` – 150 ns
- `1+1` (`__add__()`) – 6 ns
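The portable subset of these measurements can be reproduced with the stdlib timeit module; absolute numbers will of course vary by machine and OS (epoll is Linux-only, so it is skipped here):

```python
import timeit

N = 1_000_000
cases = [
    ("monotonic()",   "time.monotonic()",   "import time"),
    ("thread_time()", "time.thread_time()", "import time"),
    ("1+1",           "a + b",              "a = 1; b = 1"),
]
for label, stmt, setup in cases:
    # timeit returns total seconds for N runs; convert to ns per call.
    per_call_ns = timeit.timeit(stmt, setup=setup, number=N) / N * 1e9
    print(f"{label}: {per_call_ns:.0f} ns")
```

Note that timeit's per-loop overhead (a few tens of nanoseconds) is included in these figures, so very cheap operations like `1+1` read somewhat higher than their true cost.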
That said, I guess the 10 µs from the eager loop example is too much.
First of all, I think optimizing the loop or giving users knobs to tweak is much, much better than implementing new APIs that live right on the edge of allowed behavior. We keep code more understandable and potentially benefit more users.
Tweaking loop scheduling is tricky. I’d be in favor of your change, but maybe we can expose some knobs and let users tweak the behavior and thresholds? That way, if we degrade someone’s workload, they at least have a way to restore the old behavior. Some people might want even more aggressive thresholds.
FYI, I’ve created a draft for eager loop implementation.
It is at an early stage; it doesn’t have good test coverage, sanity checks, docs, etc.
Please feel free to comment on the code.
Upd: link to PR Implement eager loop runner by asvetlov · Pull Request #145149 · python/cpython · GitHub