Windows Sleep Microseconds for Low Latency Thread Communication

Posted on 2012-04-05
Last Modified: 2013-08-07
Is there any way I could suspend the thread for 1-2 microseconds? (Note 1 microsecond = 1/1000 miliseond). I know this question has been asked but so far there is no good answer for this. Here is my situation why I need this:

I have a server with several threads running to serve client's requests. Requests are sent by other thread through shared memory which occurs occasionally. However, once request is sent it needs to be processed ASAP. So, I need a fast way to notify one of the serving threads when request is queued in shared memory.

Windows event object was used but it takes 6-7us from SetEvent() to WaitForSingleObject() return on my machine (tried to set process/thread priority but still not much improvement). I tried to use a busy loop to let the serving threads keep pooling the memory which lower the latency to 1-2us, which is good enough, but it burns the CPU while the requests are only sent like once per minutes. If I could insert a micro/nano second sleep into the loop I could at least get my CPU free while keep the latency low.

I would be glad if anyone could suggest me another way to do the thread communication with latency lower than 2us. Thanks
Question by:codeblue229
  • 2
  • 2
LVL 16

Expert Comment

ID: 37815902
Have you tried Sleep(0) ?

It doesn't sleep for any given set of time.  But what it is supposed to accomplish is to give up the remainder of its time slice to allow other processes access to the CPU.

Author Comment

ID: 37816351
Similiar to windows API "SwitchToThread()", it yields execution to other thread but the CPU would still at peak, because the thread is not actually in sleep mode the CPU will resume execution of the thread and never go idle.
LVL 22

Accepted Solution

ambience earned 500 total points
ID: 37817506
High resolution has always been a problem under Windows and in general there is no guaranteed way to achieve microsecond level precise sleep for durations < 1ms. Windows 7 has User Mode Scheduling as described here I'm absolutely unsure whether thats relevant or whether it could achieve higher performance compared to the system scheduler but apparently its designed to serve that purpose.

But even that would not guarantee anything because
By default the thread quantum on Windows NT based systems is about 100 milliseconds (I believe for servers). This means that a thread can “hog” the CPU for up to 100 milliseconds before another thread has a chance to be scheduled and actually execute.

BTW, why is it absolutely necessary to achieve switching time of <2us? How much time does the worker take to process and send back a reponse (if at all)? Do you think 6us of swtiching delay can be included in the time it takes to process requests?

If you are going to say no the response must be sent within 2us (for example) then its perhaps better not to have worker threads?


On a different note, if you are running on multiprocessor machine then perhaps you can assign different processor affinities to your IO and worker threads such that none would compete for the same processor. It would work because even if the threads quantum expires it would get assigned again to the same processor and wont have to wait. That would only peak one of the CPUs and the other one would be IO bound. I haven't tried such an arrangement but it might work.

Author Comment

ID: 37818226
I didn't aware when they introduced the User Mode Scheduling but this surely worth trying, although it is only supported in 64bit applications. Actually I am not quite sure about why it takes 6-7us to do the switching on my fairly idle machine. Whether it is normal context switching overhead or it's the implementation of windows event-wait mechanism? Having a chance to manage the scheduling might help finding it out. Thanks!

The worker threads simply call a blocking API, which immediately send request to another server and then block until server responds. It is the time when the request reach the server matters. There are definite number of API instance (each with different user logon) making up the throuttle rate serving a burst of several requests. That's why it has to be done in worker threads.
LVL 22

Expert Comment

ID: 37818831
But still trying to optimize for 3-4us - is this effort worth it? Does the rest of the code provide guaranteed bounded response times? I mean given delays can occur anywhere and a thread can hog all other threads, is this delay the only major obstacle?

Featured Post

Ransomware: The New Cyber Threat & How to Stop It

This infographic explains ransomware, type of malware that blocks access to your files or your systems and holds them hostage until a ransom is paid. It also examines the different types of ransomware and explains what you can do to thwart this sinister online threat.  

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

It’s been over a month into 2017, and there is already a sophisticated Gmail phishing email making it rounds. New techniques and tactics, have given hackers a way to authentically impersonate your contacts.How it Works The attack works by targeti…
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.

778 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question