For NickRepin only: "Problem with a spin lock using InterlockedExchange between processes"

MFC Programming Topic Area/"Problem with a spin lock using InterlockedExchange between processes": http://www.experts-exchange.com/jsp/qShow.jsp?ta=mfc&qid=10236408 
stefanr Asked:

NickRepin Commented:
Ok, I'm pretty sure now.

Here is the code for InterlockedExchange in my SP6 kernel32.dll:

mov ecx,pVar
mov edx,lNew
mov eax,[ecx]
loop:
nop
cmpxchg [ecx],edx
jne loop
ret

Strange enough...

ntoskrnl.exe also contains InterlockedExchange:

xchg pVar,lNew
ret

The Intel manual says that to make cmpxchg atomic, you have to place a lock prefix before the instruction. There is no lock prefix in kernel32 (there is a nop instead). It is definitely a bug!
On the other hand, according to Intel, xchg asserts the lock signal regardless of whether the lock prefix is present. Moreover, Intel *suggests* using xchg for synchronisation purposes!

Amazingly, the Win2000 kernel32.dll does contain the lock prefix!

I.e., NT internally uses the correct version of InterlockedExchange (ntoskrnl.exe) while your app uses the buggy one (kernel32).


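To make your program work correctly and faster, replace InterlockedExchange() with the following code:

   PVOID addr = &Your_lock_variable;
   DWORD result = TRUE;   // New value to store (TRUE or FALSE)

   __asm {
      mov edx,addr        // edx = address of the lock variable
      mov eax,result      // eax = new value
      xchg eax,[edx]      // atomically swap; eax now holds the old value
      mov result,eax
   }
   // Now you can compare result (the previous value) with FALSE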

   
  As you can see, this code is smaller than the kernel32.dll version and faster, since it avoids the expensive call/ret instructions.
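For comparison, here is a minimal portable sketch of the same test-and-set idea, written with std::atomic instead of inline assembly. It is an illustration only, not the kernel32 code or anything from stefanr's project: the SpinLock type and its member names are made up here, and exchange() simply stands in for the xchg instruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical type for illustration; exchange() plays the role of xchg.
struct SpinLock {
    std::atomic<long> locked{0};   // 0 = free, 1 = held

    // One atomic swap: store 1 and inspect the value that was there before.
    bool try_lock() {
        return locked.exchange(1, std::memory_order_acquire) == 0;
    }

    // Spin until the swap reports that the lock was free.
    void lock() {
        while (!try_lock()) {
            // busy-wait; real code would usually pause or yield here
        }
    }

    void unlock() {
        // Releasing a lock that is not held is a caller bug.
        assert(locked.load(std::memory_order_relaxed) != 0);
        locked.store(0, std::memory_order_release);
    }
};
```

Here try_lock() corresponds to comparing `result` with FALSE in the snippet above: a return value of true means the previous value was 0, so the caller now holds the lock.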

 I'm only 99.9% sure, so I'm posting this as a comment rather than an answer. If it helps, you can accept my comment; otherwise you can delete this Q.
   



nietod Commented:
>> to make cmpxchg atomic, you have to place
>> lock prefix before this instruction
It is atomic in the sense that it cannot be interrupted in the middle of the instruction. So on a single-processor system the lock prefix is never needed. You only need the lock prefix on a multi-processor system.

jhance Commented:
It would be interesting to look at this same section of code on a multi-CPU installation of NT. It wouldn't surprise me if the nop were replaced with a lock in that code. The lock is very "expensive" in terms of performance, and replacing it with a nop makes sense on a single-CPU machine.

This may be one reason why you can't just drop a 2nd CPU into an installed NT box without having all manner of trouble.

nietod Commented:
Almost certainly true. An occasional lock is not that "expensive", but it is entirely wasted on a single-CPU machine.

NickRepin Commented:
First, the Q is about a multiprocessor PC.

Second, I was not absolutely sure about kernel32 and asked stefanr to send me his version of the dll (I have not received it yet). That's why I posted a comment here, not an answer.

Yesterday I found only one kernel32 among the WinNT installation files. I checked the already installed dll and found the 'nop' instructions. I thought there would have to be two versions of kernel32, one for single-processor and one for multiprocessor PCs.

Well, I was wrong. It seems that Windows setup patches kernel32 during installation: it replaces lock with nop on a single-processor PC. Until proved otherwise, it is still possible that stefanr's kernel32 contains nop instead of lock.
It's easy to check by stepping into InterlockedExchange with a debugger.

On the other hand, IMHO, the InterlockedExchange code looks too strange. Why do they use such complex code with a loop instead of just xchg? And, according to stefanr's experience, it does not work as well as expected.

So I suggested that he try the fastest and safest solution, a single xchg instruction.

Any objections to xchg?



NickRepin Commented:
BTW, do the clients and server run on the same PC or on different ones? If on different ones, then InterlockedExchange() is the wrong choice altogether.

NickRepin Commented:
Also, I assume that the other parts of the server and clients work OK, i.e. they don't overwrite the wrong memory, pair each UnlockHeap call only with a successful LockHeap, etc.

nietod Commented:
>> Why they use so complex code with loop
>> instead of just xchg
An xchg is sufficient when there are only two states to the semaphore, i.e. locked and unlocked, and when you don't need to record "who" has locked the semaphore (i.e. it is up to the "locker" to remember that it has locked the semaphore). cmpxchg is used when the semaphore will record the identity of "who" locked the semaphore, like a thread ID, for example.

>> complex code with loop instead of just xchg
Well, you need the loop to wait until the lock succeeds. You are proposing that the wait (or test in any case) occurs outside of the assembly code, but you have it in your design too. That is another advantage of cmpxchg: it does the initial testing too.

>> Any objections against xchg?
It's tried and true; it just doesn't record the "identity of the locker", but that can be remedied with more code.

Interestingly, there does appear to be a bug in the cmpxchg code (unless they also patch the jmp instruction, but that seems unlikely). The code should jump back to where edx is loaded. This is because if the lock fails, edx will be loaded with the current "lock's ID", so edx will be corrupted for the 2nd try.

NickRepin Commented:
Sorry, nietod, but I don't agree with you.

Please explain: what is the functional difference between the InterlockedExchange code with cmpxchg (see my first comment) and a single xchg instruction?

<<cmpxchg is used when the semaphore will record the identity of "who" >>

Sorry, what are you talking about? Don't forget that we are discussing the particular case of InterlockedExchange.

<<it just doesn't record the "identity of the locker", but that can be remedied with more code. >>

Sorry, where do you see the 'identity of the locker' in InterlockedExchange?

<<An xchg is sufficient when there are only two states to the semaphore, i.e. locked and unlocked >>

Why?! xchg can be used with 32-bit values, i.e. 2^32 states.

<<Interestingly there does appear to be a bug in the cmpxchg code (unless they also patch the jmp instruction, but that seems unlikely).  >>

Sorry again, but I'm afraid you have misunderstood something about cmpxchg. edx is not affected by the cmpxchg instruction. Only the destination operand and EAX are.

stefanr Author Commented:
I have mailed the KERNEL32.DLL file now, even though it seems that the original mail address was somewhat misspelled.
I have the same error even with the new code. I'm not sure if the loading of MSIDLE.DLL is a cause of the error, or if it has something to do with ASSERT, but it always accompanies the assert in debug mode, indicating that the values in the header are not as expected. I have also tried writing out the header when the error occurs. According to that, it sometimes has a value of 0 in dwThreadId and nLockCount, indicating that the UnlockHeap function has updated those values successfully but not yet updated the bLocked field, and sometimes dwThreadId is a valid thread ID of another thread, indicating that the UnlockHeap function has not yet been called, or has not started to update the structure fields. Very strange!
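To make the two observed states easier to reason about, here is a hedged sketch of a recursive spin lock whose header has the three fields named above. The real LockHeap/UnlockHeap code is not shown in this thread, so everything here is an assumption made for illustration: the LockHeader layout, the TryLockHeap/UnlockHeap signatures, and passing the thread ID in as a parameter. It uses portable std::atomic rather than InterlockedExchange.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical header layout modeled on the field names mentioned above.
struct LockHeader {
    std::atomic<long> bLocked{0};             // spin flag, 0 = free
    std::atomic<unsigned long> dwThreadId{0}; // owner ID, 0 = no owner
    long nLockCount = 0;                      // recursion depth, touched only by the owner
};

// Try once to take the lock for thread `tid` (must be nonzero).
bool TryLockHeap(LockHeader& h, unsigned long tid) {
    assert(tid != 0);
    // Only the owner ever stores its own ID, so seeing it means re-entry.
    if (h.dwThreadId.load(std::memory_order_relaxed) == tid) {
        ++h.nLockCount;
        return true;
    }
    if (h.bLocked.exchange(1, std::memory_order_acquire) != 0) {
        return false;                         // held by another thread
    }
    h.dwThreadId.store(tid, std::memory_order_relaxed);
    h.nLockCount = 1;
    return true;
}

// Returns false in the situation the ASSERT in UnlockHeap guards against.
bool UnlockHeap(LockHeader& h, unsigned long tid) {
    if (h.bLocked.load(std::memory_order_relaxed) == 0 ||
        h.dwThreadId.load(std::memory_order_relaxed) != tid) {
        return false;                         // not locked, or locked by someone else
    }
    if (--h.nLockCount == 0) {
        h.dwThreadId.store(0, std::memory_order_relaxed); // clear ownership first
        h.bLocked.store(0, std::memory_order_release);    // reset the flag last
    }
    return true;
}
```

With this ordering, a thread that does not hold the lock can legitimately see 0 in dwThreadId and nLockCount while bLocked is still set, or see another thread's ID. Both states are only an error when they are seen by a thread that believes it is the current holder, which would mean mutual exclusion itself has failed.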

nietod Commented:
>> we discuss particular case of
>> InterlockedExchange.
I'm talking about the use of the cmpxchg and xchg instructions in general. With xchg, you set the semaphore to either a locked state or an unlocked state, but there are only these two values. With cmpxchg you set the semaphore to a value that identifies "who" has locked the semaphore. This cannot be done safely with just xchg.

>> Why?! xchg can be used with 32-bit
>> values. ie 2^32 states.
Consider this approach. The resource is unlocked when the semaphore is 0 and is locked when a thread stores its ID in the semaphore. Now, to lock the semaphore you exchange your thread ID with the current value of the semaphore. If the value you get "back" is 0, the semaphore was unlocked, so now you've locked it. But if the value you get back is not 0, the semaphore was already locked and you've failed. But you've also altered the state of the semaphore: it records the wrong thread ID now. So now what? Do you store the original thread ID back in the semaphore? What if the other thread unlocks the semaphore before you restore its ID? Then you've made a mess. That thread thinks it has unlocked the semaphore, so it won't try again, but the semaphore is in a locked state.

cmpxchg works by not changing the semaphore's value if it is locked.
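The difference described here can be sketched in portable C++, with exchange() standing in for xchg and compare_exchange_strong() for a locked cmpxchg. The function names and the use of plain integers as thread IDs are assumptions for illustration only.

```cpp
#include <atomic>
#include <cassert>

// 0 means "unlocked"; any other value is the (hypothetical) ID of the holder.
using Semaphore = std::atomic<unsigned long>;

// Plain swap: reports failure correctly, but by then it has already
// overwritten the real holder's ID with the caller's ID.
bool try_lock_xchg(Semaphore& s, unsigned long id) {
    assert(id != 0);
    return s.exchange(id) == 0;
}

// Compare-and-swap: writes `id` only if the semaphore is currently 0,
// so a failed attempt leaves the recorded holder untouched.
bool try_lock_cmpxchg(Semaphore& s, unsigned long id) {
    assert(id != 0);
    unsigned long expected = 0;
    return s.compare_exchange_strong(expected, id);
}

// Release only if `id` really is the recorded holder.
bool unlock_owner(Semaphore& s, unsigned long id) {
    unsigned long expected = id;
    return s.compare_exchange_strong(expected, 0UL);
}
```

If thread 7 holds the semaphore and thread 9 calls try_lock_xchg, the call fails but the semaphore now reads 9. With try_lock_cmpxchg the same failed attempt leaves it at 7.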

>> edx is not affected by cmpxchg instruction.
>> Only destination operand and EAX does.
Right. I had to look it up. I was doing it from memory and thought that when the comparison fails, the source is loaded with the destination value. It is the accumulator that is loaded with the destination value. But consider the effect of that! The next cmpxchg will operate with the value that was stored in the semaphore. That's a bug. With cmpxchg, the accumulator should be loaded with the value that indicates that the semaphore is unlocked. But on the 2nd try it will be loaded with a value that indicates that the semaphore is locked! So there is a good chance it will succeed (do the exchange) the 2nd time. That makes no sense. The code is a mess.

NickRepin Commented:
To nietod:

Sure, cmpxchg and xchg are different instructions. I totally agree with you in general. But I was talking about a particular case: InterlockedExchange() and a single xchg are functionally equal.

I agree that cmpxchg itself may be more useful than xchg. But this concrete case, InterlockedExchange, contains too much excess code that can be replaced with a single xchg.

Sorry, nietod, you are wrong.

1) EAX is loaded simply because it must contain the return value of InterlockedExchange().

2) The loop exits if eax==[ecx], and in that case edx is stored into [ecx]. So eax contains the old value of [ecx], and [ecx] contains the new value from edx.

3) If eax!=[ecx], eax is loaded with the new value of [ecx] (changed by somebody else) and cmpxchg is executed again. Go to step 2.
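The three steps above can be sketched in portable C++ as an exchange built from a compare-and-swap retry loop. This is an illustration of the technique, not the kernel32 source: the function name is made up, and compare_exchange_strong() stands in for a locked cmpxchg.

```cpp
#include <atomic>
#include <cassert>

// Emulates an atomic exchange with a compare-and-swap retry loop
// (hypothetical name; mirrors the structure of the disassembly above).
long exchange_via_cas(std::atomic<long>& target, long new_value) {
    // The comparison with a single hardware instruction only makes sense
    // if the atomic is implemented without an internal lock.
    assert(target.is_lock_free());

    long seen = target.load();   // like: mov eax,[ecx]

    // On failure, compare_exchange_strong copies the current value of
    // `target` into `seen`, just as cmpxchg reloads EAX, so the next
    // attempt compares against fresh data.
    while (!target.compare_exchange_strong(seen, new_value)) {
        // another writer changed the value between the load and the swap
    }
    return seen;                 // old value, which is what the caller gets back
}
```

The result is the same as a single target.exchange(new_value): the new value is stored unconditionally and the previous value is returned. That is the sense in which the loop and a single xchg are functionally equal.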

NickRepin Commented:
To stefanr:

If the new code with xchg doesn't help, then the problem is obviously not in InterlockedExchange itself (by the way, I've checked your dll; it's OK and contains the lock prefixes).
Especially bearing in mind that the problem happens on a single-processor PC as well.

Anyway, once we find the source of the bug, you can use xchg instead of InterlockedExchange(): it will be several times faster and will let you eliminate the excessive if() statements.

Could you please answer the following questions:

1) Are clients and server running on THE SAME computer?
2) What is the exact code of Create/OpenFileMapping() and MapViewOfFileEx() for both client and server?
3) You mentioned an assert(). Do you mean the following check in **UnlockHeap**():

if (!m_lpHeader->bLocked || (m_lpHeader->bLocked && ::GetCurrentThreadId() != m_lpHeader->dwThreadId))

4) You said it appears in release mode only. Could you compile your files with Lock() and Unlock() in release mode with /FAcs and send me the resulting .cod file?
     

Regarding dll relocation...

The problem may appear simply because of the stress caused by the dll relocation (context switches etc). You can try running other big programs like MS Word at the same time and see what happens.

Or maybe an MMF is not really suitable as shared memory for synchronization purposes (it is possible :( , at least I did not find an exact reference to this in the MSDN library. MS says that you can use a mutex or an event, but says nothing about InterlockedExchange). Suppose Windows expects you to use other methods to synchronize access to an MMF (events, mutexes etc.), but not InterlockedExchange. In that case, during dll relocation, it could remap the physical storage of the MMF without regard for processes that try to write to the same address at the same time.

On the other hand, MS says that InterlockedExchange may be used by different processes if the variable is in shared memory. I cannot recall another way to allocate shared memory other than a dll shared segment.

Maybe it makes sense to try another type of shared memory to make sure that the problem is not in the MMF itself?
It shouldn't be too hard. It can probably be done in a few minutes.

Build the following dll.
Link your client and server with shared.lib.

In client/server, obtain the address of shared variable:

  LONG& bLock=*getAddr();

Then use this bLock in calls to InterlockedExchange()


//----------
// Shared.dll
//----------
#include <windows.h>

//-----------
// Global shared data.
//-----------
#pragma data_seg(".shared")
LONG bLocked=0;  
#pragma data_seg()
#pragma comment(linker,"/SECTION:.shared,RWS")

//-----------
__declspec(dllexport) PLONG getAddr()
{
   return &bLocked;
}










jkr Commented:
As I stated in the MFC thread, I'd prefer synchronization objects such as semaphores. As far as I followed the discussion, using 'Interlocked*()' on MMF areas is quite 'esoteric', and it reminds me of the 'memory interlock' techniques (what operation is guaranteed to be atomic?) in the early days of parallel processing...

yoffe Commented:
I am curious whether using Interlocked* in this manner (instead of Mutex, Evt, etc) is truly faster, and if so, by how much.

~Yoffe

NickRepin Commented:
Sure, it has to be much faster (maybe hundreds of times): compare one assembler instruction with creating a named NT object (event, semaphore, mutex) and then calling a wait function such as WaitForSingleObject(), which uses a timer etc etc etc .... Don't forget about security checking.

stefanr,

Before using bLocked, try:

  VirtualLock(&bLocked,sizeof(bLocked));

You can also try

  VirtualLock(&m_lpHeader->bLocked,sizeof(m_lpHeader->bLocked));

Just in case...

stefanr Author Commented:
NickRepin, your suggestion to use a variable in a shared section of the DLL looks promising. I have tested it on the single-processor computer and have not had the problem so far. If it holds on my dual-processor machine too, you will have earned the points.

As for your questions:

1) The applications I use for testing run on the same computer.

2) I have a (CFile-derived) class that opens and maps the MMF, where the relevant code is this:

BOOL CMemMappedSwapFile::Open(LPCTSTR lpszFileName, UINT nOpenFlags, UINT nMaximumSize, CFileException * pError)
{
   // [snipped code; locks this object; checks and initializes variables]

   SECURITY_DESCRIPTOR sd = { 0 };
   if (!::InitializeSecurityDescriptor(&sd, SECURITY_DESCRIPTOR_REVISION))
   {
      return FALSE;
   }
   // Set the DACL to allow EVERYONE
   ::SetSecurityDescriptorDacl(&sd, TRUE, NULL, FALSE);

   // map modeNoInherit flag
   SECURITY_ATTRIBUTES sa;
   sa.nLength = sizeof(sa);
   sa.lpSecurityDescriptor = &sd;
   sa.bInheritHandle = (nOpenFlags & modeNoInherit) == 0;

   // map creation flags
   DWORD dwCreateFlag;
   if (nOpenFlags & modeCreate)
   {
      if (nOpenFlags & modeNoTruncate)
         dwCreateFlag = OPEN_ALWAYS;
      else
         dwCreateFlag = CREATE_ALWAYS;
   }
   else
      dwCreateFlag = OPEN_EXISTING;

   // attempt file creation
   BOOL bCreateFileMapping = (modeCreate == (nOpenFlags & modeCreate));
   HANDLE hFile = NULL;
   if (bCreateFileMapping)
   {
      // CreateFileMapping; specifies the protection desired for the file view, when the file is mapped.

      hFile = ::CreateFileMapping(HANDLE(0xFFFFFFFF), &sa, dwProtect, 0, nMaximumSize, lpszFileName);
   }
   else
   {
      // OpenFileMapping; specifies the access to the file-mapping object.

      hFile = ::OpenFileMapping(dwAccess, FALSE, lpszFileName);
   }

   // [snipped code; error checking]

   m_hFile = (HFILE)hFile;
   m_lpFileView = lpFileView;
   m_cbFileView = nMaximumSize;
   m_bCloseOnDelete = TRUE;

   return TRUE;
}

3) I have changed the code around a lot (adding checksums etc.), but essentially it is that test that asserts.

4) Sorry, I seem to have forgotten to mention that, since my initial tests, the error has also occurred in debug mode. It just seems less likely to occur in debug mode (probably because of the speed).

I have tried simulating the message sending procedure in a single application to see what happened. I had one thread that used the allocator to produce messages, saving the handles in a list and signalling a router thread. The router thread (which also created and initialized the MMF at startup) removed the handles from the list and sent them (using PostThreadMessage) to a consumer thread, which checked the memory and freed the handle. That worked without a problem.

It would be interesting to hear a theory about the difference, if there is any, between a shared variable in an MMF and one in a shared DLL data segment. At least InterlockedExchange seems to believe there is one.

nietod Commented:
nick, what a weird way to use the instruction.  Hard to follow all that.  In the end you have to wonder why they aren't just using xchg?

NickRepin Commented:
stefanr,

I asked you about Create() and Map() only because I want to see the flags you pass to them.

CreateFileMapping must be used with PAGE_READWRITE.
Also try specifying SEC_NOCACHE (it would be interesting to see what happens).

MapViewOfFile and OpenFileMapping must be called with FILE_MAP_ALL_ACCESS or FILE_MAP_WRITE.

nietod,

<<Hard to follow all that>>
<<In the end you have to wonder why they aren't just using xchg?>>
Exactly.

As I mentioned above, there is an InterlockedExchange in ntoskrnl.exe and ntkrnlmp.exe.
Its code:
   xchg edx,[ecx]
   mov  eax,edx
   ret

There is an InterlockedCompareExchange() near InterlockedExchange() in kernel32. I've taken a look at it as well. The code is the same, except that there is no jump instruction. It seems that the MS programmer just forgot about xchg and tried to solve the task with cmpxchg.

stefanr Author Commented:
Oh, I see! Yes, the flags end up as

dwProtect = PAGE_READWRITE;
hFile = ::CreateFileMapping(HANDLE(0xFFFFFFFF), &sa, dwProtect, 0, nMaximumSize, lpszFileName);

dwAccess = FILE_MAP_ALL_ACCESS;
LPVOID lpFileView = ::MapViewOfFile(hFile, dwAccess, 0, 0, 0);

on the server side, and

dwAccess = FILE_MAP_ALL_ACCESS;
hFile = ::OpenFileMapping(dwAccess, FALSE, lpszFileName);
LPVOID lpFileView = ::MapViewOfFile(hFile, dwAccess, 0, 0, 0);

on the client side.

The interesting part is that
dwProtect = PAGE_READWRITE | SEC_NOCACHE;

causes an error in CreateFileMapping with GetLastError returning 87 (ERROR_INVALID_PARAMETER).

I'm still working with the shared DLL section, but I am pretty sure now that it will succeed. The tricky part is to make it as flexible as having the lock variable in the MMF itself (using several heaps at once and such stuff), and to make it work when using the release and debug versions of the DLL simultaneously (in different applications, of course).

NickRepin Commented:
<<causes an error in CreateFileMapping with GetLastError returning 87. >>

It's necessary to use this:

PAGE_READWRITE | SEC_NOCACHE | SEC_COMMIT

I'm not sure about the result. I have not used it before, but it will be interesting to see whether it affects the problem or not. In any case, it probably won't be suitable for the final release, at least for the whole heap, because it's slow (according to the MS docs). On the other hand, it could be fast enough for a small MMF that contains only the header, not the heap itself.

<<I am pretty sure now that it will succeed>>
I was also pretty sure when I added the first comment here :)

<<The tricky part is to make is as flexible as having the lock variable in the MMF itself >>
I don't see any problems... It should be even easier than with an MMF. The only problem is that you can use only a fixed number of lock variables in a shared section. But maybe 1000 will be enough: that's only 4K in size. With the xchg instruction you can use a byte instead of a DWORD, so 1000 vars will take only 1K. Also, with a byte there is no need to worry about alignment.

stefanr Author Commented:
OK, PAGE_READWRITE | SEC_NOCACHE | SEC_COMMIT worked, but had no effect on the error. It still seems that an MMF cannot contain a SpinLock variable.
I still think it's more complicated to use an array of lock variables. I have implemented it like this (in an additional DLL that, unlike the normal DLL, is not specific to Debug or Release):

// .H
namespace NSAllocationHeader
{
struct CAllocationHandlerDllLock
{
   LONG bLocked;  // Locking flag variable, used by InterlockedExchange.
   LONG bAquired; // Flag that tells if this element is used by an instance of CAllocationHandler.
};

const int ALLOCATION_HANDLER_LOCK_SIZE = 256;
const TCHAR ALLOCATION_HANDLER_MUTEX_NAME[] = _T("AllocationHandlerDllLockMutex");

extern YFCDATA_API CAllocationHandlerDllLock g_rgDllLock[];
}

// .CPP

namespace NSAllocationHeader
{
#pragma data_seg("YfxShared")
CAllocationHandlerDllLock g_rgDllLock[ALLOCATION_HANDLER_LOCK_SIZE] = { 0 };
#pragma data_seg()
#pragma comment(linker, "/section:YfxShared,rws")
}

.. . .

// Instance initialization

   CMutex mutex(FALSE, NSAllocationHeader::ALLOCATION_HANDLER_MUTEX_NAME);

   // Local block to iterate through AllocationHandlerDllLock.
   {
      CSingleLock lock(&mutex, TRUE);

      BOOL bAquired = FALSE;
      for (int i = 0; !bAquired && i < NSAllocationHeader::ALLOCATION_HANDLER_LOCK_SIZE; i++)
      {
         if (!NSAllocationHeader::g_rgDllLock[i].bAquired)
         {
            m_lpHeader->nDllLockIndex = i;
            TRACE(_T("CAllocationHandler::InitializeHeap: Found free lock NSAllocationHeader::g_rgDllLock[%ld]\n"), m_lpHeader->nDllLockIndex);

            NSAllocationHeader::g_rgDllLock[m_lpHeader->nDllLockIndex].bAquired = bAquired = TRUE;
            NSAllocationHeader::g_rgDllLock[m_lpHeader->nDllLockIndex].bLocked = TRUE; // Array is already locked with mutex.

            m_lpLocked = &NSAllocationHeader::g_rgDllLock[m_lpHeader->nDllLockIndex].bLocked;
         }
      }

      if (!bAquired)
      {
         return FALSE;
      }
   }

   m_lpHeader->nInstanceCount = 1; // Initial instance count when initialized.

.. . .

// Instance termination

   if (NULL != m_lpHeap)
   {
      CAllocationHandlerLock lock(this); // Gain exclusive access to the heap.

      m_lpHeader->nInstanceCount--;
      if (0 == m_lpHeader->nInstanceCount)
      {
         CMutex mutex(FALSE, NSAllocationHeader::ALLOCATION_HANDLER_MUTEX_NAME);

         // Local block to iterate through AllocationHandlerDllLock.
         {
            CSingleLock lockArray(&mutex, TRUE);

            NSAllocationHeader::g_rgDllLock[m_lpHeader->nDllLockIndex].bAquired = FALSE;
            // NSAllocationHeader::g_rgDllLock[m_lpHeader->nDllLockIndex].bLocked is automatically set to FALSE by the lock variable.
         }
      }
   }

Another, although minor, drawback is that it is now impossible to link all files statically, if desired.

NickRepin Commented:
Does it mean that the shared section works FINE?

Well, yes, additional code is required to share locks between several heaps.

You can use the following, which is slightly simpler:

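struct Hdr {
  LONG bAq;    // TRUE while this slot is acquired by a heap
  LONG bLock;  // the spin-lock flag itself
};

// Shared section
Hdr hdr[100] = {0};  // init all with FALSE


DWORD ReqHdr()  // Returns array index, or MAXDWORD if there are no free hdrs
{
   for (DWORD i = 0; i < 100; i++) {
      if (!hdr[i].bAq) { // Seems free
         // Atomically claim the slot; the old value tells us whether we won.
         if (FALSE == InterlockedExchange(&hdr[i].bAq, TRUE)) {
            // Ok, acquired
            return i;
         }
      }
   }
   return MAXDWORD;
}


void FreeHdr(DWORD i)
{
   hdr[i].bAq = 0;
   // or
   // InterlockedExchange(&hdr[i].bAq, FALSE);
}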

<<Another, although minor, drawback is that it is now impossible to link all files statically, if desired>>

What do you mean? You can export data, not only functions.
// in dll
__declspec(dllexport) CAllocationHandlerDllLock g_rgDllLock[ALLOCATION_HANDLER_LOCK_SIZE];
// in app (imported data needs dllimport)
extern __declspec(dllimport) CAllocationHandlerDllLock g_rgDllLock[];

stefanr Author Commented:
I have now tested the shared data section over a longer period, and even though it really works better, it doesn't solve the problem. I suspect that since the DLL is small compared to the MMF itself, it is less likely that the DLL memory is "moved" (or whatever happens) than the MMF memory. I didn't manage to get the locking error on the single-processor machine during a longer testing period, but it did occur on the dual-processor machine after a somewhat longer run than usual.
I have also tried using the VirtualLock function to make sure (?) that the MMF memory is not swapped out, but it did not help. Maybe I just don't know how to use it correctly; I am not sure that I computed the amount of memory to lock correctly. I used the Performance Data to guess how much memory the application needed, and then added the size of each MMF heap used. In this particular case it came to around 11-12 MB.

What I meant by linking statically is creating an application that does not need any custom DLLs (other than the system and perhaps MFC DLLs).

NickRepin Commented:
I have nothing more to add.

It seems the statement in the MS manual that the Interlocked functions can be used by threads of different processes is simply wrong.

Use a mutex.

NickRepin Commented:
More suggestions...

Try using xchg with a byte operand instead of LONG. Maybe the problem is alignment, although I don't think so.

I think the following will help.

Once you have created the thread that will use InterlockedExchange, call SetThreadAffinityMask(thread,1). Check the return value to make sure it works.
You have to do it for both client and server.
This will ensure that the threads using InterlockedExchange run on the same CPU.

Another Q is how it affects performance. Hard to say, but don't forget that there are many other threads in the system besides yours, so it may be acceptable.

You can also try this:

  DWORD oldm=SetThreadAffinityMask(GetCurrentThread(),1);
  ASSERT(oldm); // make sure it works

  // Probably, Sleep(0) is necessary here - I don't know - try with it and without.

  Sleep(0);

  .... Execute LockHeap() or UnlockHeap() code here .....

  SetThreadAffinityMask(GetCurrentThread(),oldm);


I.e., change the affinity mask on the fly.

NickRepin Commented:
I don't think that VirtualLock will help, but if you try it, it's enough to lock only the 'lock' variables.

eg,

CAllocationHandlerDllLock g_rgDllLock[ALLOCATION_HANDLER_LOCK_SIZE] = { 0 };

VirtualLock(g_rgDllLock,sizeof(g_rgDllLock))

Do it for both client and server.


I think that SetThreadAffinityMask is more promising.

stefanr Author Commented:
At last it seems that I have something that works well enough! Yes, a combination of a shared data section in a DLL and SetThreadAffinityMask did the trick, even on the dual-processor machine. It did not crash during the test even without the Sleep(0). I do not for a second believe that it will always work, but as long as a failure is highly improbable I think we have to be satisfied.
It seems that there is some performance penalty. I will examine whether the performance still suffices, whether the penalty can be dealt with, or whether the requirements can be eased in this respect (I don't have high hopes for that last point, though).

You finally got your points!

stefanr Author Commented:
BTW, I found the following sentence in the article "An Atomic Counter for Guaranteed Thread Safety" written by John M. Dlugosz:

". . . InterlockedExchange . . .
As an aside, I was surprised to find that the Microsoft compiler for x86 doesn't emit inline code for these. Although they could be optimized down to a single machine instruction, you instead call a small function. Furthermore, _the_InterlockedExchange_is_implemented_so_poorly_that_I_had_to_get_a_second_opinion_to_even_be_sure_it_was_correct_."

!

NickRepin Commented:
<<I do not for a second believe that it will always work>>
With SetThreadAffinityMask it should work even with a file mapping.