Audit event for time.sleep

encukou · May 29, 2023, 4:15pm

Using time.sleep can DoS an async application.
Will McGugan, author of Textual (an async framework) asked about what to do with this. Audit events came up. Adding an audit event for time.sleep sounds like a good idea.
But it won’t be a silver bullet:

3.13 is a long way away
As Steve mentioned in the Twitter thread, Python audit hooks run for all auditable events.

Maybe adding a “set of interesting events” argument to sys.addaudithook, and doing preliminary filtering in C, would be a good idea?
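For illustration, the filtering can be approximated in pure Python today, although the wrapper is still called for every event, which is exactly the overhead that filtering in C would avoid. A minimal sketch, assuming a hypothetical helper `add_filtered_audit_hook` (not part of `sys`) and made-up event names:

```python
import sys


def add_filtered_audit_hook(hook, interesting_events):
    """Register `hook` so that it only sees the named events (hypothetical helper)."""
    wanted = frozenset(interesting_events)

    def filtering_hook(event, args):
        # Still invoked for every audit event; the filter just returns early.
        if event in wanted:
            hook(event, args)

    # Note: audit hooks cannot be removed once added.
    sys.addaudithook(filtering_hook)


seen = []
add_filtered_audit_hook(
    lambda event, args: seen.append((event, args)),
    {"example.interesting"},  # made-up event name for the demo
)

sys.audit("example.interesting", 1)  # passed on to the hook
sys.audit("example.boring", 2)       # filtered out
```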

storchaka · May 29, 2023, 5:09pm

I do not think it would help. There are a million other ways to freeze an application.

If you only want to detect the call of time.sleep(), monkeypatch it.
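A rough sketch of what such a monkeypatch could look like, assuming the goal is to warn only when the calling thread is running an asyncio event loop. The wrapper name `_checked_sleep` and the warning text are made up for this example:

```python
import asyncio
import time
import warnings

_original_sleep = time.sleep  # keep a reference to the real function


def _checked_sleep(seconds):
    """Warn when called from a thread that is running an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        pass  # no running loop in this thread, so blocking is harmless here
    else:
        warnings.warn(
            "time.sleep() called while an event loop is running; "
            "consider 'await asyncio.sleep()' instead",
            RuntimeWarning,
            stacklevel=2,
        )
    _original_sleep(seconds)


time.sleep = _checked_sleep
```

One limitation: code that ran `from time import sleep` before the patch keeps its reference to the original function and is not caught.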

The method suggested by Matthew Rocklin looks more universal.

Tinche · May 29, 2023, 8:59pm

I don’t have an opinion on time.sleep, but having a blessed way of preventing blocking IO in a particular thread (all threads running event loops) would be useful for asyncio apps.

There have been many times where I’ve been unsure whether a library performed blocking io, and when. Having this be an error would have been helpful.

Rosuav · May 29, 2023, 9:42pm

Unfortunately that’s nearly impossible. Coroutines are, by their nature, cooperative, so there’s really no way to block things from being done. The best solution would be some sort of watchdog thread. To expand on what Serhiy linked to from Matthew Rocklin, you can have a simple thread that just does this:

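import asyncio
import time

# Monotonic clock, so wall-clock adjustments cannot trigger a false alarm.
last_event_loop_time = time.monotonic()

def watchdog():
    # Run this in a separate (daemon) thread, not on the event loop.
    while True:
        time.sleep(30)
        if time.monotonic() - last_event_loop_time > 30:
            raise_the_alarm()  # placeholder: report the stall however you like

async def loop_is_active():
    # Run this as a task on the event loop; it only advances while the loop does.
    global last_event_loop_time
    while True:
        await asyncio.sleep(5)
        last_event_loop_time = time.monotonic()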

(Adjust the thresholds as needed - eg if even a single second of event loop lag is a problem, reduce those times and pay a bit more overhead.)

What you do in raising the alarm is up to you, but I’d be inclined to look in the direction of asyncio.current_task(), and possibly the Task object’s print_stack() method, although I have yet to use either.
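One caveat: the watchdog runs in its own thread, where no event loop is running, so asyncio.current_task() called there raises RuntimeError. As a swapped-in alternative (not the Task-based approach mentioned above), the alarm could dump the Python stack of the event loop's thread via sys._current_frames(). A sketch, where `format_thread_stack` is a hypothetical helper name and the caller is assumed to have recorded the loop thread's ident beforehand:

```python
import sys
import threading
import traceback


def format_thread_stack(thread_id):
    """Return the current Python stack of the given thread as text ('' if unknown)."""
    frame = sys._current_frames().get(thread_id)
    if frame is None:
        return ""
    return "".join(traceback.format_stack(frame))


def raise_the_alarm(loop_thread_id):
    # Show where the event loop thread is stuck right now.
    print("Event loop appears blocked at:", file=sys.stderr)
    print(format_thread_stack(loop_thread_id), file=sys.stderr)
```

The loop thread's ident can be captured with threading.get_ident() from code running on that thread, before the watchdog starts.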

Tinche · May 29, 2023, 10:38pm

Why do you think it’s nearly impossible? Presumably we’d just need socket.connect to perform a check and raise an exception if the check fails.

Rosuav · May 29, 2023, 10:57pm

You also need to worry about anything else that might block. For example, gethostbyname can (and often will) block; writing to any file descriptor might block; and lots of other things could, under some circumstances, require that something get logged to a file or something, which could block.

If you ONLY want to force socket connections to check, I’d recommend monkeypatching that, but that isn’t a fully general solution.

Conversely, if you’re wanting to check for accidental bugs (calling time.sleep() when it should have been await asyncio.sleep() etc), static analysis might be sufficient, if you can identify the sorts of problems that you keep running into.
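As a rough idea of what such a static check might look like, here is a sketch using the standard ast module. It only recognises the literal spelling `time.sleep(...)` inside an `async def`, so aliases and `from time import sleep` slip through; `find_blocking_sleeps` is a name invented for this example:

```python
import ast


def find_blocking_sleeps(source):
    """Return line numbers of `time.sleep(...)` calls inside `async def` bodies."""
    tree = ast.parse(source)
    hits = set()
    for func in ast.walk(tree):
        if not isinstance(func, ast.AsyncFunctionDef):
            continue
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "sleep"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "time"
            ):
                hits.add(node.lineno)
    return sorted(hits)


example = '''\
import asyncio, time

async def handler():
    time.sleep(1)
    await asyncio.sleep(1)

def plain():
    time.sleep(1)
'''

print(find_blocking_sleeps(example))  # only the call inside handler() is flagged
```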

encukou · May 30, 2023, 6:52am

While asyncio aims at, well, IO, an event loop and cooperative async functions are also great for animation/simulation.
Textual is a UI framework that uses async under the hood. Additionally, it tries to be beginner-friendly: users don’t need to know anything about async concepts to start using it. If you write a function instead of a coroutine, Textual will just call it.
Unfortunately, as Will found out, beginner users often use time.sleep in their experiments as a substitute for doing work. “Don’t use sleep” is one async concept they need to know about – and there’s no good way to teach them.

Static analysis might help users, but it’s not useful for a framework.

Avoiding other blocking things like gethostbyname is important for IO throughput, but AFAIK isn’t that important for UI: if an animation freezes for a millisecond, you aren’t likely to notice it. And if it does become a problem, the project is more likely to be at a stage where it’s reasonable to reach for static analysis or a profiler.

Also: gethostbyname already has an auditing event! And so does opening files, so an async library might be able to complain about files opened without an async wrapper. (But, a watchdog or profiler might be a better solution for IO.)
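A sketch of how a library might use the existing `open` and `socket.gethostbyname` audit events for this, assuming it only wants to record calls made while an event loop is running in the calling thread. The names `WATCHED_EVENTS`, `blocking_calls` and `report_blocking_call` are invented for the example:

```python
import asyncio
import os
import sys

WATCHED_EVENTS = frozenset({"open", "socket.gethostbyname"})
blocking_calls = []  # (event, first argument) pairs seen under a running loop


def report_blocking_call(event, args):
    if event not in WATCHED_EVENTS:
        return  # cheap early exit for the vast majority of events
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return  # no event loop running in this thread
    blocking_calls.append((event, args[0] if args else None))


sys.addaudithook(report_blocking_call)  # cannot be removed afterwards


async def main():
    # A plain open() inside a coroutine: recorded by the hook.
    with open(os.devnull, "rb"):
        pass


asyncio.run(main())
```

The hook only records; a real framework would more likely warn or log, taking care that the reporting itself does not open files from inside the hook.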

Rosuav · May 30, 2023, 7:34am

That’s reasonable; in that case, I wouldn’t be averse to the framework monkeypatching a few prime targets like that, to help people notice problems.

Ahh… yes. Up until it’s not a millisecond but several seconds and your entire UI becomes completely nonresponsive. Unfortunately gethostbyname can be quite annoying to solve - your options are either bypass it and directly call on a DNS server (which forfeits the benefits of the hosts file and anything changed in nsswitch (where applicable)), or spin it off into a thread, which isn’t something people tend to stumble upon as a solution (and has its own overheads and potential issues).

encukou · May 31, 2023, 7:51am

I’ll admit I’m biased: I’d love to see more async UI/animation frameworks :)
I started this topic to get a reality check before adding the audit event.

So far I think I should add it. Maybe it’ll only help Textual, but on the other hand, audit events are cheap (esp. if unused) and easy to add/maintain.

It’s better than telling people to monkey-patch time.sleep.

And to reword this again: the intent is to teach people to be careful with blocking calls, not to find and avoid them in production code. Get a profiler/watchdog for the latter.
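For the watchdog side, asyncio's own debug mode already behaves a bit like one: it logs a warning on the "asyncio" logger when a single step of a task or callback takes longer than loop.slow_callback_duration (0.1 seconds by default). A small sketch, where the `ListHandler` class exists only to capture the messages for the demo:

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list (demo helper)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logging.getLogger("asyncio").addHandler(handler)


async def main():
    time.sleep(0.2)  # blocks the loop for longer than the 0.1 s threshold


# debug=True turns on slow-callback reporting, among other checks.
asyncio.run(main(), debug=True)

for message in handler.messages:
    print(message)
```

This reports the stall only after the blocking call has returned, so it helps with noticing the problem rather than preventing it.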