Higher resolution timers on Windows?

Hello,

sorry if I don’t use precise terminology, I am new to asyncio.

Is there a way to have the asyncio runner use higher resolution timers on Windows?

I have a process that interfaces with some hardware. It needs to poll some status at a regular interval (of the order of a second), and react to commands received on a ZeroMQ socket. I realize this with two asyncio tasks: one that does the polling and yields execution to the other awaiting on a future scheduled with loop.call_at(), and a second that awaits on socket.recv(). Reacting to commands takes much less than the polling interval and all works nicely (and I am very happy about how easy it was to code this with asyncio).
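In case it helps to see the structure, here is a rough sketch of what I do (poll_status() and handle_command() are placeholders, and the ZeroMQ socket type and endpoint are just for illustration):

import asyncio
import time

import zmq
import zmq.asyncio

POLL_INTERVAL = 1.0  # seconds


def poll_status():
    # Placeholder for the synchronous hardware poll.
    print('poll at', time.monotonic())


def handle_command(msg):
    # Placeholder for reacting to a command received over ZeroMQ.
    print('command:', msg)


async def poller():
    loop = asyncio.get_running_loop()
    deadline = loop.time()
    while True:
        poll_status()
        deadline += POLL_INTERVAL
        # Yield to the other task until the next polling deadline.
        fut = loop.create_future()
        loop.call_at(deadline, fut.set_result, None)
        await fut


async def command_handler(socket):
    while True:
        msg = await socket.recv()
        handle_command(msg)


async def main():
    ctx = zmq.asyncio.Context()
    socket = ctx.socket(zmq.PULL)
    socket.bind('tcp://127.0.0.1:5555')
    await asyncio.gather(poller(), command_handler(socket))


if __name__ == '__main__':
    # pyzmq requires the selector event loop on Windows.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    asyncio.run(main())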

The only drawback is that the timer used by the asyncio event loop is fairly coarse and I get ~20 ms jitter [1] in the polling interval. This is not terrible but I would like to investigate whether there is room for improvement. I don’t know much about the Windows API, thus I am not sure where the limitation comes from. Searching the 'net I found that Trio may be using higher resolution timers for its event loop, but ZeroMQ does not support async operation with Trio. Also, ZeroMQ forces the use of the asyncio.WindowsSelectorEventLoopPolicy event loop.

One way to reduce the jitter is to schedule the task early and busy-loop on short synchronous sleeps till the polling deadline, but it is a bit ugly and wasteful.
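For reference, the workaround I have in mind looks roughly like this (the 5 ms early wake-up margin is arbitrary, and the final spin blocks the event loop, hence "wasteful"):

import asyncio
import time


async def wait_until(deadline, margin=0.005):
    # Sleep via the event loop until shortly before `deadline` (expressed on
    # the loop's clock), then spin on short blocking sleeps to hit it more
    # precisely.
    loop = asyncio.get_running_loop()
    remaining = deadline - loop.time()
    if remaining > margin:
        await asyncio.sleep(remaining - margin)
    while loop.time() < deadline:
        time.sleep(0.0005)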

Where can I look for a better solution?

Thank you.

Cheers,
Dan

[1] I use ctypes to call NtSetTimerResolution() to increase clock resolution to ~1 ms. If I use regular sync calls I can easily code the polling with an interval accuracy of a couple of ms, thus I suspect that the jitter comes entirely from the timer resolution.

If you use time.sleep(), it has increased resolution since Python 3.11 (see bpo-21302: time.sleep() uses waitable timer on Windows by vstinner · Pull Request #28483 · python/cpython · GitHub)

EDIT: not relevant to the question, as explained by Eryk below

I’ve had no involvement with the development of asyncio, and I don’t use it, so take this with a grain of salt. Anyway, I see that the BaseEventLoop uses a scheduler based on a time() method that calls time.monotonic(), and it also sets self._clock_resolution = time.get_clock_info('monotonic').resolution. In Windows, time.monotonic() uses GetTickCount64(). Regardless of the current timer resolution, the tick count increases in a sequence of 15 ms and 16 ms increments, with an average resolution of 15.625 ms. As an experiment, you could modify BaseEventLoop to call time.perf_counter() in the time() method and set self._clock_resolution = time.get_clock_info('perf_counter').resolution in the __init__() method. The performance counter has a resolution of 1 microsecond or less.
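For reference, the reported resolutions of the clocks involved can be inspected directly; on Windows I'd expect something like ~15.6 ms for 'monotonic' and well under a microsecond for 'perf_counter':

import time

for name in ('monotonic', 'perf_counter', 'time'):
    info = time.get_clock_info(name)
    print(f'{name:>12}: implementation={info.implementation!r}, '
          f'resolution={info.resolution}')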


Davide, gh-89592 is the issue that modified time.sleep() to use a high-resolution timer in newer Windows versions that support CREATE_WAITABLE_TIMER_HIGH_RESOLUTION. But I don’t see how the resolution of time.sleep() is related to the scheduler of the asyncio event loop. On a related note, there are open issues about improving the resolution of time.time() and time.monotonic() in newer versions of Windows that support “precise” time functions, respectively, GetSystemTimePreciseAsFileTime() and QueryUnbiasedInterruptTimePrecise().

Oh right, my bad! I misread the question entirely to be about a precise polling interval (I did not realize I was on Async-SIG).

Thanks Eryk. I see where in BaseEventLoop wake-up timers are coalesced according to the clock resolution. The waiting itself, though, is done on what on Windows boils down to select(). What is the timeout resolution supported by select() on Windows?

Using NtSetTimerResolution() I can improve the resolution of the clock used by time.time() and time.monotonic() to 1 ms, and this is sufficient for my application, thus I don’t see a reason to switch to time.perf_counter(), which would require adding code to offset the returned time to match wall clock time (I need to timestamp the polled data) and (I suspect) logic to handle wrap-around.
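(For completeness, the offsetting I am referring to would be something along these lines: compute a one-time offset between time.time() and time.perf_counter() and add it to each reading. The two clocks drift apart over long runs, which is another reason I would rather avoid it.)

import time

# One-time offset from the perf_counter epoch to wall-clock time.  Timestamps
# computed as perf_counter() + offset are then roughly comparable with
# time.time(), but the two clocks drift apart over long runs.
_offset = time.time() - time.perf_counter()

def wall_timestamp():
    return time.perf_counter() + _offset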

I think that, for most applications, coalescing timers makes a lot of sense, thus I don’t think this is a shortcoming of the asyncio event loop. I am just trying to understand how to customize the event loop to the needs of this use case (while also trying to understand the Windows time API, which I find very confusing).

Thanks Eryk for the hints. I just tried and what you suggested works well. Here is the code:

import asyncio
import time


class HiResSelectorEventLoop(asyncio.SelectorEventLoop):
    def __init__(self):
        super().__init__()
        self._clock_resolution = time.get_clock_info('perf_counter').resolution

    def time(self):
        return time.perf_counter()


class EventLoopPolicy(asyncio.DefaultEventLoopPolicy):
    _loop_factory = HiResSelectorEventLoop


asyncio.set_event_loop_policy(EventLoopPolicy())

To avoid having to deal with time.perf_counter() offset and possible wrap-around (I haven’t checked if this is a thing or not) I simply use time.time() for time-stamping anyway (with the augmented resolution I can get calling NtSetTimerResolution()).

Hmm… I’m doubtful that changing the timer resolution to 1 ms is affecting time.monotonic(), i.e. GetTickCount64(). That runs counter to my experience in Windows NT 3.51 (released in 1995; tested in a VM of course) up through Windows 11. The documentation says it’s “limited to the resolution of the system timer”, but in practice it’s not the current timer resolution, but rather it’s the default timer resolution, in the range of 10-16 ms. In older versions in a VM, I usually see 10 ms. In Windows 10 and 11, I see 15.625 ms, implemented as a sequence of 15 ms and 16 ms increments.

The current timer resolution definitely should affect everything that’s based directly on the interrupt time, such as timeGetTime(), QueryUnbiasedInterruptTime() (Windows 7+), and QueryInterruptTime() (Windows 10+). It should also affect the resolution of Sleep[Ex]() and waiting on standard resolution timers and timeouts in WaitForSingleObject[Ex]() and other wait functions.

In Windows 8+, the current timer resolution also affects GetSystemTimeAsFileTime(), i.e. time.time() in Python. Prior to Windows 8, the system time was updated on a fixed interval, like GetTickCount(). Nowadays it’s updated freely on every timer interrupt. I’m pretty sure this is due to changes that were required to implement GetSystemTimePreciseAsFileTime() (Windows 8), which uses the current performance counter value to improve the precision of the system time value, based on an offset that’s calculated in the kernel. This technique is also used for QueryInterruptTimePrecise() and QueryUnbiasedInterruptTimePrecise() in Windows 10+.


I don’t know much about the (to me very confusing) Windows’ clock API, but experimentally calling NtSetTimerResolution() affects the observed resolution of time.time() and time.monotonic(). Here is how I am testing:

import ctypes
import statistics
import time


ntdll = ctypes.WinDLL('NTDLL.DLL')

NSEC_PER_SEC = 1000000000


def set_resolution_ns(resolution):
    """Set resolution of system timer.

    See `NtSetTimerResolution`

    http://undocumented.ntinternals.net/index.html?page=UserMode%2FUndocumented%20Functions%2FTime%2FNtSetTimerResolution.html
    http://www.windowstimestamp.com/description
    https://bugs.python.org/issue13845

    """
    # NtSetTimerResolution uses 100ns units
    resolution = ctypes.c_ulong(int(resolution // 100))
    current = ctypes.c_ulong()

    r = ntdll.NtSetTimerResolution(resolution, 1, ctypes.byref(current))

    # NtSetTimerResolution uses 100ns units
    return current.value * 100


def set_resolution(resolution):
    return set_resolution_ns(resolution * NSEC_PER_SEC) / NSEC_PER_SEC


def test(n):
    r = []

    for x in range(n):
        t1 = time.time()
        while True:
            t2 = time.time()
            if t2 != t1:
                break
        r.append(t2 - t1)

    print('measured resolution')
    print('    mean: {:.6f} s'.format(statistics.mean(r)))
    print('  median: {:.6f} s'.format(statistics.median(r)))
    print('     min: {:.6f} s'.format(min(r)))
    print('     max: {:.6f} s'.format(max(r)))


def main():
    test(1000)
    rcurr = set_resolution(1e-3)
    print('set system interrupt interval: {:.6f} s'.format(rcurr))
    test(10000)


if __name__ == '__main__':
    main()

And this is what I obtain on Windows 10:

$ python test.py
measured resolution
    mean: 0.010498 s
  median: 0.015619 s
     min: 0.000003 s
     max: 0.015658 s
set system interrupt interval: 0.000997 s
measured resolution
    mean: 0.000983 s
  median: 0.000997 s
     min: 0.000002 s
     max: 0.001114 s

The result that you showed for time.time() is expected in Windows 8+, as discussed in the last paragraph of my previous message. However, I doubt that time.monotonic() changes based on the current timer resolution. It’s based on GetTickCount64(), which always uses the default timer resolution, and thus should return a mean in the range 10-16 ms, likely about 15.625 ms.

Whoops, I misread your reply. Indeed, the resolution of time.monotonic() does not change.

Thank you very much for the detailed reply. I’m looking at this again and reconsidering the best way to solve my time-stamping issue. Would you suggest calling GetSystemTimePreciseAsFileTime() via ctypes as a way to get a high-resolution timestamp instead of messing with the timer resolution?

Here’s a replacement for time.time() that uses ctypes to call GetSystemTimePreciseAsFileTime() (Windows 8+). Take care to avoid mixing ‘precise’ timestamps with those from time.time() that are based on GetSystemTimeAsFileTime(). The precise time may be later than the normal time by several milliseconds since the normal value is updated on the timer interrupt, which can vary from 0.5 ms to 16 ms.

def get_clock_time():
    from time import time, time_ns, get_clock_info
    try:
        from ctypes import WinDLL, byref, c_ulonglong
        GetSystemTimePreciseAsFileTime = (
            WinDLL('kernel32').GetSystemTimePreciseAsFileTime)
    except (ImportError, OSError, AttributeError):
        resolution = get_clock_info('time').resolution
    else:
        def time() -> float:
            """Return the current time in seconds since the Epoch."""
            t = c_ulonglong() # in units of 100 ns
            GetSystemTimePreciseAsFileTime(byref(t))
            # Subtract 116444736000000000 (369 years, 89 leap days) to 
            # translate NT's epoch (1601-01-01) to Unix (1970-01-01).
            return (t.value - 116444736000000000) * 1e-7

        def time_ns() -> int:
            """Return the current time in nanoseconds since the Epoch."""
            t = c_ulonglong() # in units of 100 ns
            GetSystemTimePreciseAsFileTime(byref(t))
            # Subtract 116444736000000000 (369 years, 89 leap days) to 
            # translate NT's epoch (1601-01-01) to Unix (1970-01-01).
            return (t.value - 116444736000000000) * 100

        resolution = get_clock_info('perf_counter').resolution

    return time, time_ns, resolution


time, time_ns, time_resolution = get_clock_time()
del get_clock_time

Here’s a replacement for time.monotonic() based on QueryUnbiasedInterruptTimePrecise() (Windows 10+). The unbiased version is used in order to match the behavior of wait timeouts in Windows 8+, which no longer include time spent in low-power states such as when the system is suspended. Take care to avoid mixing precise counter values with normal values that are based on GetTickCount64(), timeGetTime(), QueryInterruptTime(), or QueryUnbiasedInterruptTime(). The precise value may be larger than the normal value by several milliseconds. Also note that GetTickCount64(), QueryInterruptTime(), and QueryInterruptTimePrecise() are biased by the time spent while the system is suspended or hibernated. I don’t recall whether or not timeGetTime() is biased, but it probably is.

def get_clock_monotonic():
    from time import monotonic, monotonic_ns, get_clock_info
    try:
        from ctypes import WinDLL, byref, c_ulonglong
        apiquery = WinDLL('api-ms-win-core-apiquery-l2-1-0')
        realtime = 'api-ms-win-core-realtime-l1-1-1'
        if not apiquery.IsApiSetImplemented(realtime.encode()):
            raise OSError
        QueryUnbiasedInterruptTimePrecise = (
            WinDLL(realtime).QueryUnbiasedInterruptTimePrecise)
    except (ImportError, OSError, AttributeError):
        resolution = get_clock_info('monotonic').resolution
    else:
        def monotonic() -> float:
            """Monotonic clock, cannot go backward."""
            t = c_ulonglong() # in units of 100 ns
            QueryUnbiasedInterruptTimePrecise(byref(t))
            return t.value * 1e-7

        def monotonic_ns() -> int:
            """Monotonic clock, cannot go backward, as nanoseconds."""
            t = c_ulonglong() # in units of 100 ns
            QueryUnbiasedInterruptTimePrecise(byref(t))
            return t.value * 100

        resolution = get_clock_info('perf_counter').resolution

    return monotonic, monotonic_ns, resolution


monotonic, monotonic_ns, monotonic_resolution = get_clock_monotonic()
del get_clock_monotonic
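If one wanted the asyncio event loop itself to use this clock rather than time.perf_counter(), the same subclassing trick from earlier in the thread should work (untested sketch, assuming the monotonic() and monotonic_resolution names defined above are in scope):

import asyncio

# Untested sketch: make the event loop schedule on the precise unbiased
# interrupt time instead of the perf_counter variant shown earlier.
class PreciseSelectorEventLoop(asyncio.SelectorEventLoop):
    def __init__(self):
        super().__init__()
        self._clock_resolution = monotonic_resolution

    def time(self):
        return monotonic()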

Thank you very much! However, it turns out that on Windows, unless I tweak the timer resolution as discussed, the resolution of the select.select() timeout is the “usual” 15 ms, and this limits the resolution of my asyncio-based polling task :frowning:

I don’t know which system call is used to implement select.select() on Windows, nor whether Windows has an API that offers higher resolution timeouts… I suspect an API with higher timeout resolution must exist, but given my very limited knowledge of the Windows APIs, I don’t know if I want to embark on the task of writing a selectors backend that uses it…

The timer resolution affects the resolution of SleepEx() and the timeout of thread wait functions such as WaitForSingleObjectEx(). If possible, a workaround would be to use WaitForMultipleObjectsEx() and include a handle for a high-resolution timer (the link is to the documentation for driver developers, which has useful implementation details), as we’ve implemented for time.sleep(). That’s not always possible.
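For the record, creating and waiting on such a timer from Python via ctypes looks roughly like this (Windows 10 1803+; error handling and fallback for older systems omitted):

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.CreateWaitableTimerExW.restype = wintypes.HANDLE

CREATE_WAITABLE_TIMER_HIGH_RESOLUTION = 0x00000002
TIMER_ALL_ACCESS = 0x1F0003
INFINITE = 0xFFFFFFFF


def hires_sleep(seconds):
    # Create a high-resolution waitable timer, arm it with a relative due
    # time (negative value, in 100 ns units), and block until it signals.
    handle = kernel32.CreateWaitableTimerExW(
        None, None, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, TIMER_ALL_ACCESS)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        due = wintypes.LARGE_INTEGER(-int(seconds * 10_000_000))
        if not kernel32.SetWaitableTimer(handle, ctypes.byref(due), 0,
                                         None, None, False):
            raise ctypes.WinError(ctypes.get_last_error())
        kernel32.WaitForSingleObject(handle, INFINITE)
    finally:
        kernel32.CloseHandle(handle)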

The Winsock select() function is implemented by the Windows sockets provider function mswsock!WSPSelect(). It translates the select() call into a poll IOCTL, which includes the given timeout. Then it calls NtDeviceIoControlFile() and, if the IOCTL is pending completion, calls NtWaitForSingleObject() to wait forever for the completion event to be signaled. It waits forever because the timeout is handled internally by the device driver that’s used for socket files, which implements the poll IOCTL. Apparently, the timeout is implemented in a way that’s subject to the current timer resolution – which is likely a kernel timer set via KeSetTimer() that executes a DPC (deferred procedure call). So the only way to improve the resolution is via timeBeginPeriod() or NtSetTimerResolution() (undocumented). At best you can get 0.5 ms timer resolution, which isn’t great, and the cost is increased power consumption.
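For completeness, the documented way to request a finer timer resolution is the timeBeginPeriod()/timeEndPeriod() pair from winmm.dll; a minimal ctypes sketch:

import ctypes

winmm = ctypes.WinDLL('winmm')


class timer_resolution:
    # Request a finer system timer resolution (in milliseconds) for the
    # duration of a `with` block, using the documented winmm API.
    def __init__(self, ms=1):
        self.ms = ms

    def __enter__(self):
        winmm.timeBeginPeriod(self.ms)
        return self

    def __exit__(self, *exc):
        winmm.timeEndPeriod(self.ms)
        return False


# e.g. wrap the part of the program that needs finer select() timeouts:
# with timer_resolution(1):
#     asyncio.run(main())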