Audit event for time.sleep

Using time.sleep can DoS an async application.
Will McGugan, author of Textual (an async framework), asked what to do about this. Audit events came up. Adding an audit event for time.sleep sounds like a good idea.
But it won’t be a silver bullet:

  • 3.13 is a long way away
  • As Steve mentioned in the Twitter thread, Python audit hooks run for all auditable events,

Maybe adding a “set of interesting events” argument to sys.addaudithook, and doing preliminary filtering in C, would be a good idea?
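For comparison, here is a minimal sketch of what that filtering looks like when done in Python today. The hook is still called for every audited event, which is the overhead the proposal wants to avoid. The event name "example.blocking" is made up for the demo, and "time.sleep" is listed only as the event proposed in this thread, so it may not fire on your Python version.

```python
import sys

# Events we care about. "example.blocking" is a hypothetical custom event;
# "time.sleep" is the event proposed in this thread.
INTERESTING = frozenset({"time.sleep", "example.blocking"})

seen = []

def filtered_hook(event, args):
    # The interpreter calls this for every audited event,
    # so return as early as possible for the ones we ignore.
    if event not in INTERESTING:
        return
    seen.append((event, args))

# Note: audit hooks cannot be removed once added.
sys.addaudithook(filtered_hook)

# Custom events can be raised with sys.audit() to exercise the hook.
sys.audit("example.blocking", 1.5)
sys.audit("example.ignored", "x")
```

After this runs, only the "example.blocking" event is recorded in `seen`; the ignored one still reached the hook but was dropped by the Python-level check.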

I do not think it would help. There are a million other ways to freeze an application.

If you only want to detect the call of time.sleep(), monkeypatch it.
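A rough sketch of such a monkeypatch, assuming you only want to complain when the call happens in a thread that is currently running an asyncio event loop. The wrapper name `checked_sleep` is illustrative, and code that did `from time import sleep` before the patch was installed will not be caught.

```python
import asyncio
import time
import warnings

_original_sleep = time.sleep

def checked_sleep(secs):
    # get_running_loop() raises RuntimeError when no loop runs in this thread.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        pass
    else:
        warnings.warn(
            f"time.sleep({secs!r}) called inside a running event loop; "
            "use 'await asyncio.sleep(...)' instead",
            RuntimeWarning,
            stacklevel=2,
        )
    return _original_sleep(secs)

time.sleep = checked_sleep
```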

The method suggested by Matthew Rocklin looks more universal.

I don’t have an opinion on time.sleep, but having a blessed way of preventing blocking IO in a particular thread (all threads running event loops) would be useful for asyncio apps.

There have been many times where I’ve been unsure whether a library performed blocking io, and when. Having this be an error would have been helpful.

Unfortunately that’s nearly impossible. Coroutines are, by their nature, cooperative, so there’s really no way to block things from being done. The best solution would be some sort of watchdog thread. To expand on what Serhiy linked to from Matthew Rocklin, you can have a simple thread that just does this:

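import asyncio
import time

last_event_loop_time = time.time()

def watchdog():
    # Run this in its own (daemon) thread, outside the event loop.
    while True:
        time.sleep(30)
        if time.time() - last_event_loop_time > 30:
            raise_the_alarm()  # placeholder: define this yourself

async def loop_is_active():
    # Schedule this as a task on the event loop. If the loop is blocked,
    # the timestamp stops advancing and the watchdog notices.
    global last_event_loop_time
    while True:
        await asyncio.sleep(5)
        last_event_loop_time = time.time()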

(Adjust the thresholds as needed, e.g. if even a single second of event loop lag is a problem, reduce those times and pay a bit more overhead.)

What you do in raising the alarm is up to you, but I’d be inclined to look in the direction of asyncio.current_task(), and possibly the Task object’s print_stack() method, although I have yet to use either.
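One caveat: the watchdog runs in a different thread from the event loop, so the alarm has to look at the loop's thread from the outside. As a hedged alternative to the asyncio helpers mentioned above, this sketch captures another thread's current stack with sys._current_frames(), which is a CPython implementation detail. The function name `describe_thread` is made up here.

```python
import sys
import threading
import traceback

def describe_thread(thread_id):
    """Return the current stack of the given thread as a string."""
    # sys._current_frames() maps thread ids to their topmost frame.
    frame = sys._current_frames().get(thread_id)
    if frame is None:
        return "<thread not running>"
    return "".join(traceback.format_stack(frame))

# Example: what is the main thread doing right now?
report = describe_thread(threading.main_thread().ident)
```

An alarm function could log this report for the thread that runs the event loop, which shows where the blocking call is stuck.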


Why do you think it’s nearly impossible? Presumably we’d just need socket.connect to perform a check and raise an exception if the check fails.

You also need to worry about anything else that might block. For example, gethostbyname can (and often will) block; writing to any file descriptor might block; and lots of other things could, under some circumstances, require that something get logged to a file, which could block.

If you ONLY want to force socket connections to check, I’d recommend monkeypatching that, but that isn’t a fully general solution.

Conversely, if you’re wanting to check for accidental bugs (calling time.sleep() when it should have been await asyncio.sleep() etc), static analysis might be sufficient, if you can identify the sorts of problems that you keep running into.
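As an illustration of how small such a check can be, here is a sketch using the standard ast module. It only recognises the literal spelling time.sleep(...) inside an async def, so it misses aliases such as `from time import sleep`, and it also flags calls in plain functions nested inside a coroutine. The function name `find_blocking_sleeps` is made up.

```python
import ast

def find_blocking_sleeps(source):
    """Return line numbers of time.sleep(...) calls inside async def bodies."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.AsyncFunctionDef):
            continue
        for inner in ast.walk(node):
            if (
                isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Attribute)
                and inner.func.attr == "sleep"
                and isinstance(inner.func.value, ast.Name)
                and inner.func.value.id == "time"
            ):
                hits.add(inner.lineno)
    return sorted(hits)

sample = (
    "import time, asyncio\n"
    "async def f():\n"
    "    time.sleep(1)\n"
    "    await asyncio.sleep(1)\n"
    "def g():\n"
    "    time.sleep(2)\n"
)
print(find_blocking_sleeps(sample))  # [3]
```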

While asyncio aims at, well, IO, an event loop and cooperative async functions are also great for animation/simulation.
Textual is a UI framework that uses async under the hood. Additionally, it tries to be beginner-friendly: users don’t need to know anything about async concepts to start using it. If you write a function instead of a coroutine, Textual will just call it.
Unfortunately, as Will found out, beginner users often use time.sleep in their experiments as a substitute for doing work. “Don’t use sleep” is one async concept they need to know about – and there’s no good way to teach them.

Static analysis might help users, but it’s not useful for a framework.

Avoiding other blocking things like gethostbyname is important for IO throughput, but AFAIK isn’t that important for UI: if an animation freezes for a millisecond, you aren’t likely to notice it. And if it does become a problem, the project is more likely to be at a stage where it’s reasonable to reach for static analysis or a profiler.

Also: gethostbyname already has an auditing event! And so does opening files, so an async library might be able to complain about files opened without an async wrapper. (But, a watchdog or profiler might be a better solution for IO.)
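A sketch of how a library might use those existing events, assuming it is enough to record the "open" and "socket.gethostbyname" audit events whenever they fire in a thread with a running event loop. The names `WATCHED` and `blocking_calls` are illustrative, and a real framework would probably warn rather than collect.

```python
import asyncio
import os
import sys

WATCHED = frozenset({"open", "socket.gethostbyname"})
blocking_calls = []

def complain_in_loop(event, args):
    if event not in WATCHED:
        return
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return  # no event loop in this thread, nothing to complain about
    blocking_calls.append(event)

sys.addaudithook(complain_in_loop)

async def careless():
    # A plain open() inside a coroutine: this is what gets recorded.
    with open(os.devnull):
        pass

asyncio.run(careless())
```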


That’s reasonable; in that case, I wouldn’t be averse to the framework monkeypatching a few prime targets like that, to help people notice problems.

Ahh… yes. Up until it’s not a millisecond but several seconds and your entire UI becomes completely nonresponsive. Unfortunately gethostbyname can be quite annoying to solve: your options are either to bypass it and query a DNS server directly (which forfeits the benefits of the hosts file and, where applicable, anything configured in nsswitch), or to spin it off into a thread, which isn’t something people tend to stumble upon as a solution (and has its own overheads and potential issues).
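For reference, the thread option can be fairly short with asyncio.to_thread (Python 3.9+). This is a minimal sketch and the coroutine name `resolve` is made up; asyncio's own loop.getaddrinfo() takes a similar approach by running the lookup in an executor.

```python
import asyncio
import socket

async def resolve(host):
    # Run the blocking lookup in a worker thread so the event loop keeps running.
    return await asyncio.to_thread(socket.gethostbyname, host)

# An IP literal is used so the demo does not depend on DNS being reachable.
address = asyncio.run(resolve("127.0.0.1"))
```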

I’ll admit I’m biased: I’d love to see more async UI/animation frameworks :)
I started this topic to get a reality check before adding the audit event.

So far I think I should add it. Maybe it’ll only help Textual, but on the other hand, audit events are cheap (esp. if unused) and easy to add/maintain.

It’s better than telling people to monkey-patch time.sleep.

And to reword this again: the intent is to teach people to be careful with blocking calls, not to find and avoid them in production code. Get a profiler/watchdog for the latter.