Bug? Win32 Sleep API is not always accurate below 100ms

From a friend

This is something to be aware of when coding short sleeps – e.g. 100 milliseconds or less – especially if you assume they add up cumulatively, e.g. in a loop.
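To make the cumulative effect concrete, here's a minimal C++ sketch against the Win32 API that times ten nominally 10ms sleeps; on a machine left at the default timer resolution the total typically comes out well over the 100ms you might expect:

```cpp
#include <windows.h>
#include <chrono>
#include <cstdio>

int main()
{
    using clock = std::chrono::steady_clock;

    // Ask for ten 10ms sleeps: nominally 100ms in total.
    const auto start = clock::now();
    for (int i = 0; i < 10; ++i)
        Sleep(10);
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::milliseconds>(clock::now() - start);

    // At the default ~15ms timer resolution this usually prints well over
    // 100ms; with the resolution set to 1ms it lands much closer.
    printf("10 x Sleep(10) took %lld ms\n",
           static_cast<long long>(elapsed.count()));
    return 0;
}
```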

I was trying to figure out why my performance testing of timeslice degradation gave different values on some machines. At first I thought – and stated so, erroneously, this morning – that this was due to the core count, but it turns out to be more subtle than that.

I was measuring a value of 1 on some machines, but on others my baseline value was 3. This was odd, and this morning I thought it was due to core count, but as I said, I was wrong.

If some process on your system has called the multimedia timeBeginPeriod API and set the minimum timer resolution, this global setting affects ALL tasks, including the quantisation of the time taken when the Win32 Sleep API is called. By default this resolution is 15ms, but many systems will have a driver or other process that makes this API call at some point to set the timer resolution to 1ms. This also appears to be an API that does not require elevated privileges, so any user task can just call it.
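For reference, a call looks roughly like this – a minimal sketch using the documented timeBeginPeriod/timeEndPeriod pair from winmm, with illustrative values and error handling:

```cpp
#include <windows.h>
#include <timeapi.h>   // timeBeginPeriod / timeEndPeriod
#include <cstdio>

#pragma comment(lib, "winmm.lib")

int main()
{
    // Request a 1ms global timer resolution. Note this affects the whole
    // system, not just this process.
    if (timeBeginPeriod(1) != TIMERR_NOERROR)
    {
        printf("timeBeginPeriod(1) failed\n");
        return 1;
    }

    Sleep(5);   // now rounds to roughly 5ms rather than ~15ms

    // Every successful timeBeginPeriod call must be matched by a
    // timeEndPeriod call with the same value.
    timeEndPeriod(1);
    return 0;
}
```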

If they do this then sleep resolution is now 1ms – which explains the discrepancy I found. With the default timer resolution a 50ms sleep can get quantised to 60ms, and a 5ms sleep can become 15ms. With the timer set to 1ms, sleep periods are quantised to 1ms. When I say ‘quantised’ I mean ‘munged to approximately something close’, since at 15ms resolution a 15ms sleep doesn’t, as you might expect, take 15ms but instead typically 30-45ms. But – for added fun – this isn’t constant. Sometimes you get 30ms, sometimes 45ms. It jitters. So exactly what algorithm is being used here I couldn’t say. Set the timer period to the default of 15ms, ask for a 14ms sleep, and you might get 15ms, 26ms, 28ms or even 32ms. Do ya feel lucky, punk, as Dirty Harry once said….
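If you want to see the jitter for yourself, a small sketch along these lines times a single Sleep call with the high-resolution performance counter (the MeasureSleepMs helper is just an illustrative name):

```cpp
#include <windows.h>
#include <cstdio>

// Measure how long Sleep(requestedMs) actually takes, in milliseconds,
// using the high-resolution performance counter.
static double MeasureSleepMs(DWORD requestedMs)
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    Sleep(requestedMs);
    QueryPerformanceCounter(&end);
    return (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
}

int main()
{
    // Repeat a few times to see the jitter described above: at the default
    // ~15ms resolution a 14ms request can come back as ~15ms, ~30ms or worse.
    for (int i = 0; i < 5; ++i)
        printf("Sleep(14) actually took %.1f ms\n", MeasureSleepMs(14));
    return 0;
}
```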

Now this is a typical half-baked Microsoft API, and it has the wonderful property that the number of calls to timeBeginPeriod must apparently match the number of calls to timeEndPeriod – which means any process that terminates without a matching end call can leave the global timer resolution set unexpectedly. Furthermore, I couldn’t locate an API that would tell a process what the current timer resolution was, so that it could put it back the way it found it. It’s kinda like the old joke about playing a country music song backwards: you get your dog, your wife and your pickup truck back… but in this case how long you have to play the record is undefined.
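One way to keep the begin/end calls balanced, at least within your own code, is a simple scope guard; the TimerResolutionGuard class below is a hypothetical sketch, not anything the API itself provides:

```cpp
#include <windows.h>
#include <timeapi.h>
#pragma comment(lib, "winmm.lib")

// Scope guard that keeps timeBeginPeriod / timeEndPeriod calls balanced even
// if the protected code returns early or throws.
class TimerResolutionGuard
{
public:
    explicit TimerResolutionGuard(UINT periodMs)
        : m_period(periodMs),
          m_active(timeBeginPeriod(periodMs) == TIMERR_NOERROR)
    {
    }

    ~TimerResolutionGuard()
    {
        if (m_active)
            timeEndPeriod(m_period);   // issue the matching end call
    }

    TimerResolutionGuard(const TimerResolutionGuard&) = delete;
    TimerResolutionGuard& operator=(const TimerResolutionGuard&) = delete;

private:
    UINT m_period;
    bool m_active;
};

int main()
{
    TimerResolutionGuard guard(1);  // request 1ms resolution for this scope
    Sleep(5);                       // sleeps close to 5ms while the guard lives
    return 0;                       // destructor issues the matching timeEndPeriod
}
```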

[Out of curiosity I wondered what dreadful thing would happen if code just called timeBeginPeriod in a loop for a minute or two, but I couldn’t find any evidence of a handle leak, so presumably there’s a finite-depth ‘stack’ involved here; how deep that might be, I couldn’t say.]

Because your test system may well have the multimedia timers set to high resolution – about half my lab VMs did, probably due to some random piece of software being installed – you need to be careful when assuming that short sleep times will be accurate on other devices. By the time you reach 100ms the error and jitter are down to around 10-15%, so in most cases this isn’t a big deal, but it is certainly worth bearing in mind.
