dcdillon
asked on
C++ Volatile Read/Write
I want to do something along the lines shown below where each thread is executing on its own CPU.
What I expect this code to do is: thread1 sets doJob = true (provided doJob is currently false); thread2 calls job() once it sees doJob = true and then sets doJob = false; after that, of course, thread1 sets doJob = true again and the process repeats.
The failure would be that the while loops checking the value of doJob outside the critical section would never end because of stale data.
I believe that the volatile keyword will prevent the compiler from caching the value of doJob, and that my lock/unlock pair will guarantee that the other CPU's cache is invalidated, so the value is loaded appropriately and everything executes as expected. The penalty here, I believe, is the possibility of spurious wakeups (of either thread) and an extra check on doJob, which I am willing to suffer.
Can anyone help me to confirm that my analysis is correct/incorrect?
#include <pthread.h>

volatile bool doJob = false;
pthread_spinlock_t lock;   // assumed to be initialized elsewhere with pthread_spin_init()

void job()
{
    // do some work
}

void thread1()
{
    while (true)
    {
        // spin until thread2 has finished the previous job
        while (doJob)
        {
        }
        pthread_spin_lock(&lock);
        if (!doJob)
        {
            doJob = true;
        }
        pthread_spin_unlock(&lock);
    }
}

void thread2()
{
    // keep spinning and checking when you should execute
    // job()
    while (true)
    {
        while (!doJob)
        {
        }
        pthread_spin_lock(&lock);
        if (doJob)
        {
            job();
            doJob = false;
        }
        pthread_spin_unlock(&lock);
    }
}
There is a current discussion here, some of which may interest you. https://www.experts-exchange.com/questions/26488392/Atomic-i-in-C.html
ASKER
I want them to be very busy. I just need to confirm that this will work as I expect.
ASKER
While I understand that this particular situation would work with atomic operations on the doJob variable, the actual case in which I want to use it is more complex. I have tested a version that locks around every check of doJob, but the performance I can achieve with the pattern shown above is significantly better than always protecting doJob with a synchronization object.
I like the performance I can achieve with the code the way it is; I just hope to confirm that I cannot get into a situation where one thread has changed the value of doJob and the other is tight-looping (outside the lock) and gets stuck due to cached values.
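For reference, a minimal sketch of the atomic-flag variant mentioned above (my illustration only, assuming a C++11-capable compiler; names mirror the original code, and it only covers this simplified case, not the more complex real one):

#include <atomic>

std::atomic<bool> doJob(false);

void job();  // does some work, as in the original

void thread1()
{
    while (true)
    {
        // wait until thread2 has consumed the previous request
        while (doJob.load(std::memory_order_acquire))
        {
        }
        doJob.store(true, std::memory_order_release);
    }
}

void thread2()
{
    while (true)
    {
        // wait until thread1 has posted a request
        while (!doJob.load(std::memory_order_acquire))
        {
        }
        job();
        doJob.store(false, std::memory_order_release);
    }
}

Because each thread only flips the flag in one direction and the loads/stores are atomic with acquire/release ordering, no separate lock is needed for this simple handshake.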
>> my lock/unlock pair will guarantee that the other CPU's cache is invalidated
Offhand, I didn't see this. Is this an assumption or do you have a reference to confirm this?
In one OS where two processes shared memory, it wasn't enough just to use volatile in the compiler. We also had to mark the shared memory as volatile so as to disable CPU cache. (Before doing that, the results in the debugger might show that 1+1=3.)
ASKER
It's an assumption I am making, and it is essentially the crux of my question. I simply don't know the answer, nor have I been able to find any documentation to lead me to it. I know that volatile isn't enough, but I suspect that wrapping any changes to the value in a well-documented lock will make it work. In testing it works fine, but you know as well as I do that testing it 1 million or 10 million or 20 billion times proves nothing.
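(For what it's worth, one implementation-specific way to make the barrier explicit instead of relying on the lock would be GCC's __sync_synchronize() builtin, which issues a full memory barrier. The sketch below is only my illustration of that idea, with hypothetical names, not code from this question.)

volatile bool flag = false;  // hypothetical flag, separate from the doJob above

void post()
{
    flag = true;
    __sync_synchronize();    // full memory barrier after the store
}

bool poll()
{
    __sync_synchronize();    // full memory barrier before the load
    return flag;
}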
If you have a debugger where you can run the two threads, then by using breakpoints you can control the timing. You can test your assumption by setting a value in one thread and seeing whether it changes in the other thread. Just make sure that the two threads are on different CPUs.
I would have thought that the cache would not be invalidated merely by using those locks. I also did a quick look and found nothing to support that assumption.
ASKER
Alright. I'll give that a try and let you know what I see.
My understanding is that making doJob volatile tells the compiler not to simplify while(doJob){} down to while(true){}. It says nothing about invalidating any CPU caching of the value of doJob - that may well be what happens in practice, but I think you may be relying on an implementation-specific feature. Another thing you ought to test in this experiment is the effect of different compiler optimization levels, because -O0, -O1, -O2, and -O3 may each produce different results.
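To make that concrete, here is a small illustrative sketch (hypothetical names, not from this question) of the kind of difference the optimizer can make:

bool plainFlag = false;             // non-volatile: the load may be hoisted out of the loop
volatile bool volatileFlag = false; // volatile: every iteration performs a real load

void spinOnPlain()
{
    // at higher optimization levels this can be compiled as if the flag never changes
    while (plainFlag)
    {
    }
}

void spinOnVolatile()
{
    // the compiler must re-read volatileFlag on every pass
    while (volatileFlag)
    {
    }
}

Comparing the generated assembly for the two loops at -O0 and -O2 typically shows the hoisting directly.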
ASKER CERTIFIED SOLUTION
- Try using pthread_cond_wait instead
For the same reason, the spin lock is a bit rude for my taste, but since you presumably want a dedicated CPU for each thread, you could remove the initial while loops and use your spin-locks alone to protect the shared variable.
Thread1 would 'spin' its CPU the whole time thread2 was running on the other CPU - not much different from what you have now.
A normal blocking mutex would be more friendly in a shared environment, but perhaps you intend to 'own' two CPUs for the price of one for research purposes... If thread1 is simply waiting for the job being done by thread2 to complete, it makes more sense for it to go fetch me a beer while it waits for a pthread_cond_signal!
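A minimal sketch of that condition-variable approach (my illustration only, assuming plain pthread_mutex_t / pthread_cond_t; names mirror the original code):

#include <pthread.h>

bool doJob = false;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void job();  // does some work, as in the original

void thread1()
{
    while (true)
    {
        pthread_mutex_lock(&mtx);
        while (doJob)                  // wait until thread2 has finished the last job
            pthread_cond_wait(&cond, &mtx);
        doJob = true;                  // post the next job
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mtx);
    }
}

void thread2()
{
    while (true)
    {
        pthread_mutex_lock(&mtx);
        while (!doJob)                 // sleep until thread1 posts a job
            pthread_cond_wait(&cond, &mtx);
        job();
        doJob = false;                 // tell thread1 it can post again
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mtx);
    }
}

pthread_cond_wait atomically releases the mutex while sleeping and re-acquires it on wakeup, so neither thread burns a CPU while it waits.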