gil_mo

Looking for "DLL local storage" mechanism...

Similar to TLS, I'm looking for some method to have a global set of variables that are defined identically in a few DLLs, but have different addresses.

For example, consider a.dll, b.dll and lib.dll, all loaded into memory.

I need both a.dll and b.dll to have an exported global, say, char fname[10] recognized in lib.dll; in a.dll fname=="a.txt", in b.dll fname=="b.txt".
Both DLLs call at some point a lib.dll function, that uses the global to, say, output some data to the file fname.

Note: unlike the TLS case, all of this might be running on a single thread, so TLS is ruled out. That means that I need some mechanism such that whenever the process "switches" DLL, a new set of physical memory pages is mapped to the correct address range.

Any solutions?
jkr

The behaviour you want to realize is the default for DLLs that are mapped into a process' address space (unless you use a shared segment explicitly). BUT: Exporting a symbol with the same name from two DLLs will only work if they're loaded dynamically...
gil_mo

ASKER

jkr,
The symbols have the same name but a different value. The symbol must be recognized by the lib.dll and 'receive' the correct value depending on the calling DLL.
Can a.dll/b.dll drop the info into a collection point, and tell lib.dll how to get it?  e.g., a memory mapped file; a.dll puts the data in the (well-defined) MMF, and tells lib.dll where in MMF to read it?  If you aren't a stickler for performance, any ol' file could do, too.

Another possible direction to look would be to not use DLL calls - instead use some other IPC like pipes or mailslots, or CORBA or COM.

Just shootin' out ideas as fast as I can think of 'em - no guarantees they'll make your life any easier. :-)
You could also consider doing something like so:

Have a function in LIB.DLL that the other DLLs (a.dll and b.dll, etc) must call.  Have this function take the file name as a parameter and have the function return a handle.

Have the LIB.DLL function allocate a structure that holds the file name and maybe the file handle and possibly other house keeping information.

Then, when a.dll or b.dll needs to have something written to its file, lib.dll can resolve the appropriate file name.



For example:

in the .h file for the lib.dll

typedef struct _ClientInfo
{
    char FileName[MAX_PATH];
    HANDLE hFile;  // use FILE * fpFile if using stdio.h

    // any other client specific info

} CLIENTINFO, * PCLIENTINFO;

In Lib.DLL
add

static PCLIENTINFO pClientInfo = (PCLIENTINFO) NULL;

then add some functions

DWORD WINAPI ClientInitialize(char * FileName)
{
PCLIENTINFO TempClientInfo = (PCLIENTINFO) NULL;


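    // create the client structure
    TempClientInfo = (PCLIENTINFO) malloc(sizeof(CLIENTINFO));

    if(TempClientInfo)
    {
        memset(TempClientInfo, 0, sizeof(CLIENTINFO));
        // remember the file name; the memset above keeps it NUL-terminated
        strncpy(TempClientInfo->FileName, FileName, MAX_PATH - 1);
        pClientInfo = TempClientInfo;
    }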


    // Attempt to open the file or whatever initialization it needs

    return 0;  // or an error code
}

DWORD WINAPI ClientTerminate(void)
{
   if(pClientInfo)
   {
       // close the file
       // release any memory in the
       // ClientInfo structure (if necessary)

       // free the ClientInfo structure itself and reset the pointer
       free(pClientInfo);
       pClientInfo = (PCLIENTINFO) NULL;
   }


   return 0;  // or an error code
}

Now your functions in LIB.DLL can use the stored client information like so:

DWORD WINAPI yourAPIfunction(param, param2, parametc)
{
  // do whatever the function does

  // if it needs to write to a file
  //  just access pClientInfo

}


Each using application would have its own pClientInfo static global.


Now, if as your question indicates there is only one process but several of its DLLs are going to use lib.dll, then you would have to:

Not use the static implementation above,
but return the pClientInfo pointer back to the caller from ClientInitialize and have the clients pass this pointer into the lib.dll calls as a parameter.  You would also have to pass the pointer into the ClientTerminate call to free the structure and avoid a memory leak.
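
A minimal sketch of that handle-passing variant, in portable C++ so it can be tried outside a DLL. The names ClientInfo, Calc and the in-memory log are assumptions made for illustration; the string stream only stands in for the real a.txt / b.txt file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical per-client context; the stream stands in for the real log file.
struct ClientInfo {
    std::string fileName;
    std::ostringstream log;   // assumption: in-memory stand-in for a.txt / b.txt
};

// Called once by each client DLL; the returned pointer is the "handle".
ClientInfo* ClientInitialize(const std::string& fileName) {
    ClientInfo* ctx = new ClientInfo;
    ctx->fileName = fileName;
    return ctx;
}

// Every lib function receives the caller's context explicitly.
double Calc(ClientInfo* ctx, double a, double b) {
    assert(ctx != nullptr);
    double result = a * b;
    ctx->log << "Calc result: " << result << '\n';
    return result;
}

// Releases the context; the pointer must not be used afterwards.
void ClientTerminate(ClientInfo* ctx) {
    delete ctx;
}
```

Because the structure is allocated and freed inside the library, each client only ever holds an opaque pointer and never needs to know its layout.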






Can't pass a char* - you'll get the pointer, but it's only valid in its original address space.

There may be a way to pass a string by value...?  I don't know of any, since C++ likes to handle arrays as pointers to memory.  At the least, you'd have to know how big your buffer was beforehand.  I suppose you could impose limitations on the string length, and pass that many chars to the function as params - reassemble them on the other side. Man, that'd be ugly.

Anyway, if you get to where you can pass the string "directly" - by value, or by getting both dlls looking in the same address space - then you don't need the extra call - just pass your string to the "real" function.

An anecdote - many years ago, I was on a project where we needed to pass a file name between two processes on NT.  We didn't really understand MMFs, but we did understand message passing.  Since we knew what our file extension was, and since we were using old DOS 8.3 file names (this was NT 3.5 - maybe even earlier), and since in Win32 both WPARAM and LPARAM were 32 bits.... we just packed the first four characters in WPARAM and the second four in LPARAM, and broadcast a registered message that our other program was listening for.

Ah, the good ol' days... :-)

(Disclaimer: I in no way condone pulling that kind of crap!)
azami,
You can pass the char * pointer to the DLL, because both the process calling the DLL function and the associated DLL instance itself are mapped to the same process space.

This is not a problem.

Also, any memory allocated in the DLL belongs to and is addressable by the calling process - because only a process owns memory.  Provided, of course a pointer is returned.
I just re-read the question, and I think I may have misinterpreted it to begin with.

If the address space is not the issue - that is, if lib.dll has no trouble seeing something in a.dll's address space - then this "global" data should not be global, but rather part of a "context".  Best would be for each dll to pass its context when calling lib.dll's entry points. If for some reason that is not feasible, then lib.dll can have a global context (used by all its functions), and a switch(context) entry point that a.dll/b.dll can call to "switch" it.
On re-reading, wylliker's comment is exactly the same as my last one... Client_Info is the "context", ClientInitialize() is the "switch".
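
A rough sketch of the "global context plus switch entry point" idea, assuming a single thread as in the question. Context, SwitchContext and CurrentLogName are hypothetical names, not anything exported by a real library.

```cpp
#include <cassert>
#include <string>

// Hypothetical context record; in the thread's terms this is CLIENTINFO.
struct Context {
    std::string logName;
};

// lib.dll's single "current" context (assumption: one thread only).
static const Context* g_current = nullptr;

// The "switch" entry point: a client calls this before using the library.
// Returns the previously active context so the caller can restore it.
const Context* SwitchContext(const Context* next) {
    const Context* previous = g_current;
    g_current = next;
    return previous;
}

// Any lib function can now find the active log name without extra parameters.
const std::string& CurrentLogName() {
    assert(g_current != nullptr);   // a client must have switched in first
    return g_current->logName;
}
```

The cost is that every client has to remember to switch before calling in, which is exactly the manual step the asker hopes to avoid.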
gil_mo

ASKER

I'm afraid my initial point had been missed; maybe I should have made myself more clear.
I compared the required mechanism to TLS, since the TLS system handles context switching *automatically*, without having the programmer intervene, initiate forced calls or pass mandatory handles.

Ok, here's the whole picture of what I'm trying to implement:
For debugging purposes, both a.dll and b.dll (and c.dll and d.dll etc.) each create a log file upon initialization (a.txt, b.txt etc.). Now lib.dll, in many of its functions / methods, prints out some information to the caller's log file. One of the obstacles is, the process is not created by me (namely, all ?.dll files are plug-ins) so I can't have the main application handle anything.

Thus, a typical scenario would be this:
1. The main app calls one of b.dll's many functions.
2. b.dll calls one of lib.dll's many functions, say, calc(), to perform some mathematical calculation.
3. The calc() function now prints out to b.txt: "Calc result: 0.99052" .

Determining the file for routing the text output from within lib.dll is the tough part. I can, of course, add an additional param (the file handle) to ALL calls in a.dll, b.dll etc., and to ALL function headers in lib.dll, but this is absurd.

Please help!

(Note: *why* I use this method for debugging is not the
gil_mo

ASKER

oops:
(Note: *why* I use this method for debugging is not the debate here).
:)
If I correctly understood you,

???.dlls call function in lib.dll and this function must know which dll called it, right? Or, at least, this function must know some unique id (file handle) associated with the dll-caller.

The safest and most portable way is to add an additional param.

There is also another way.

Here is the idea. When ?.dll calls function in lib.dll, it places the return address on the stack. This address is in the address space of ?.dll. It is possible to get the handle of the ?.dll using this address.

Next, lib.dll calls, say, GetProcAddress(hCallerDll,"GetFileHandle"). Then it calls GetFileHandle() - that's all.

To make the task simpler, functions in lib.dll must not be class members and must use the _cdecl or _stdcall convention.

If you want details, I can give them to you for 400 pts+grade A.

I should note that my code will work on Intel x86 processors only.

> Determining the file for routing the text output
 > from within lib.dll is the tough part.

Yeah, it is: you haven't explained those rules to us.

Are you saying that you want LIB.DLL to write to A.TXT when A.DLL calls its interfaces, and B.TXT when B.DLL calls its interfaces?

..B ekiM
gil_mo

ASKER

Mike,
Indeed, and I gave an example in my previous comment (see the "typical scenario" paragraph). b.dll serves as an example, and you can replace b.dll/b.txt with any x.dll/x.txt .
> and you can replace b.dll/b.txt with any x.dll/x.txt .

OK. That bit wasn't clear to me.

You can't have what you want. Win32 doesn't implement call-context specific storage. In fact, I know of no operating system that does, except Win16. There, you could get the SS you had (before you reinitialized it) and learn what module owned you. Even there, that only told you which EXE called you--not which DLL.

If you call into a function, the function doesn't know who called it. You can go digging on the stack, and then try to find the stack frame for the return address, but that's going to be very expensive at runtime. It also means that you'll have to build debug information into your executables.

Per-instance variables are marked in your DLL to be per instance of the DLL's mapping into an EXE. The one-name rule in C++ means that you can't have value dependent on locality.

MFC suffers from this same lack, by the way. It's desirable for MFC to have state information for any executable _and_ any DLL implemented with MFC. But when a cross-module call happens, there's no automatic way to know what the state is. If you examine the state management code in MFC, you can see that we ended up using explicit macros to invoke objects that have stateful constructors and destructors. Then, to query information about the current state, we'd call into functions which would find the active state block and return the information requested.

I think your only viable solution is to implement a similar mechanism. For call-context state, I think you're going to have to implement something that logs the known state at each function in a stack. Then, walk the stack backward to find out what the most recent calling module of interest is.
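
As an illustration only, here is a small portable sketch of that kind of mechanism: an object whose constructor activates a module's state and whose destructor restores the previous one, plus a query function. ModuleState, ScopedModuleState and ActiveModuleState are made-up names, and this is not MFC's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-module state block (MFC's real one holds far more).
struct ModuleState {
    std::string logName;
};

// The active state for the calling thread; null until a module activates one.
thread_local const ModuleState* t_activeState = nullptr;

// Placed at the top of every exported function of a client module, much like
// an explicit macro would be. Constructor activates, destructor restores.
class ScopedModuleState {
public:
    explicit ScopedModuleState(const ModuleState* state)
        : previous_(t_activeState) {
        assert(state != nullptr);
        t_activeState = state;
    }
    ~ScopedModuleState() { t_activeState = previous_; }

    ScopedModuleState(const ScopedModuleState&) = delete;
    ScopedModuleState& operator=(const ScopedModuleState&) = delete;

private:
    const ModuleState* previous_;
};

// What a lib function would call to find out whose log to write to.
const ModuleState* ActiveModuleState() { return t_activeState; }
```

Since each guard remembers the state it replaced, nested cross-module calls unwind correctly, which gives the effect of a stack of known states without walking the real call stack.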

..B ekiM
Hmm.  Since you have control over all the modules in question, you might consider adding a chkstk routine or a profiling prolog. That can get you a hook to the call of each function, where you might manage some state data.

..B ekiM
<<You can go digging on the stack, and then try to find the stack frame for the return address, but that's going to be very expensive at runtime. It also means that you'll have to build debug information into your executables. >>

Wrong.... Sorry, IMHO...

You can get return address as fast as 1 assembler instruction. W/o debug info.

Here is the small test program

http://skyscraper.fortunecity.com/gigo/311/dlltest.zip

lib.dll contains func() which prints file name of the module which called func().

func() contains only 7 lines, including the calls to GetModuleFileName() and printf();

main.exe calls a.dll, b.dll, c.dll, x.dll and func().

a...x.dll call func();

func() correctly prints names of all modules.

The source code of func() I offer as answer. It is plain C, portable on all Intel CPUs.
func() retrieves the caller's module handle using  2 statements only! I think it will be fast enough at run-time...
gil_mo

ASKER

Mike,

Win32 *does* support a call-context specific storage for TLSs! That means that there IS such a mechanism that is managed by the system on a thread-level basis. Won't it be possible to implement such a mechanism on a DLL-level?

About your suggestions: would they require touching every x.dll call to the lib.dll, meaning that hundreds of calls to lib.dll in many dlls have to be modified in some way? And, specifically how could this be implemented?

NickRepin,
I understand that you're aiming at having your pic up on the left :)

I guarantee to gladly hand the offered points to whoever gives a satisfactory answer; you are, however, obscuring it for some reason.
I also suspect that your solution does not cover the cases where there are nested calls, e.g.:
a.dll calls CalcAll() in lib.dll,
lib.dll's CalcAll() calls CalcSum() and CalcDiv(), each need to access a.txt ...
The nesting may be up to any level.
gil_mo> storage for TLSs

Thread-local storage is just that--thread local. You can't use thread local storage to decide who called you _unless_ you can guarantee that you're always making calls from a given module on one thread.

If you can make such a guarantee, then you can use TLS.  My understanding was that any thread could call something in any DLL, and at any time that implementation can call some other implementation.  

That is, you're saying that if you have this set of calls:

This is the simple case:

      Main.EXE calls A.DLL:Foo()
      A.DLL:Foo() calls A.DLL:Print()
      Print prints A.DLL

But it's my understanding that you want to trace the call stack:

      Main.EXE calls A.DLL:Foo()
      A.DLL:Foo() calls B.DLL:Print()
      Print prints A.DLL

and

      Main.EXE calls A.DLL:Foo()
      A.DLL:Foo() calls B.DLL:Bar()
      B.DLL:Bar() calls C.DLL:Bang()
      C.DLL:Bang() calls C.DLL:Print()
      Print prints A.DLL

Or am I still misunderstanding your request?

 nick_repin> func() retrieves the caller's module handle
 nick_repin> using 2 statements only! I think it
 nick_repin> will be fast enough at run-time...  

As you [should] know, the number of lines of code has no relationship  to runtime efficiency. If you've solved gil_mo's problem (the way I understand it), I'm quite eager to find out how you've done it in only two lines, though. (And I'm eager to figure out why LIB.DLL is more than 50K in size if it only has two lines of code in it! Looks like it's also got the C runtimes statically linked, but that still does not seem enough for 50K.)

Sure, you can get the return address for your own function--just look at [ebp+4] (if you've got thiscall or _stdcall).  But how do you chain to more nested calls? What if they're not of the same calling convention?

..B ekiM

First, nested calls are something new uncovered here. Maybe new conditions will appear while we continue to talk here?

To mikeblas: sure, I know [all].
If I said 2 lines to get the caller's module handle, then it's really only two lines. And the size of the dll doesn't matter in this case. I offer not a dll, but source code. And the source code will be translated to maybe 30-50 bytes.

I just wanted to object against your statements that it is slow, requires debug info, etc, etc.

Both lines of my code don't use the C run-time library. One line is a C statement (not assembler) translated into 2-4 assembler instructions, the second one is the WinAPI call. Result - the caller's module handle. I mentioned above that the function must be declared as __stdcall or __cdecl.
There is also 3rd line - local variable. It costs absolutely nothing at runtime.
I should note that my code doesn't depend on ebp. It is portable on all x86 compilers. If you know, MS VC can, for example, address parameters using esp and not ebp.

To gil_mo. I don't want to uncover my solution until I'm sure that you will give points to me.

Without nested calls, my solution works fine. With nested calls, you have to add the new parameter to the nested calls. There is no other way.

At least, with my solution, you have to modify only nested functions in lib.dll. No modifications required for ?.dlls.



Ok, I have solution for nested calls.
Summary:

1) Following calls allowed:
main.exe -> ... -> ?.dll -> lib.dll.Func1() -> lib.dll.Func2() -> lib.dll.Func3() ->...
main.exe -> ... -> ?.dll -> lib.dll.Func2() -> .....
main.exe -> ... -> ?.dll -> lib.dll.....() -> .....

Ie, lib.dll makes nested calls only to functions located in lib.dll.

2) All functions located in lib.dll have at least one parameter and __stdcall or __cdecl calling convention.

3) Any function in lib.dll can get the module handle of the ?.dll (i.e. the DLL which called lib.dll) by using my solution, no matter whether the call is nested or not.

4) The solution is portable across all compilers, x86 CPUs and multithreaded environments, and is written in plain C.

5) To detect the caller, all functions inside lib.dll must contain the following line:
lib.dll.Func??? ()
{
   GET_CALLER_INFO();
   ... code...
}

GET_CALLER_INFO is a macro that calls function GetCallerInfo()

6) Here is the GetCallerInfo template:

global  gvar1;
global  gvar2;
GetCallerInfo(param)
{
   local var1;
   WinAPI_Call();

   local var2;
   if(gvar1==var1)
     var2=gvar2;
   else
      var2=gvar2=var1;

   // Var2 is the caller's module handle.
   // For test purpose, print name.            
   char buf[MAX_PATH+1];
   GetModuleFileName(var2,buf,MAX_PATH);
   cout<<buf<<endl;
}


Overhead:

GET_CALLER_INFO macro expands to ~15 bytes - just a call to GetCallerInfo. It is possible to declare GetCallerInfo() with __forceinline and save some execution time.

GetCallerInfo() doesn't use the C run-time lib and (w/o the calls to GetModuleFileName() and cout<<) takes only <80 bytes and makes 1 call to the Windows API.  If lib.dll will be used in a single-thread environment only, the size of the code will be less.

Of course, it all takes some not significant execution time. But the only solution with minimum overhead is the additional parameter for all [hundreds] functions inside lib.dll.

Well, so all of the above is the answer to gil_mo's question (of course, if there are no additional requirements :) ).
I'll place the source code here for 500 points+grade A.
That's all.

Sorry, I increased price just because I spent too much my expensive time on this question.

Also it seems that this Q is really hard - my opponent worked for 7 years on Microsoft, while I'm just a self-proclaimed expert.
gil_mo

ASKER

Mike, possibilities are:

1. Main.exe calls lib.dll:Print() (denoted Main.exe->lib.dll:Print()), Print prints to lib.txt.
2. Main.exe -> x.dll:Foo() -> lib.dll:Print(), Print prints to x.txt (x can stand for any of a, b, c, …)
3. Main.exe -> x.dll:Foo() -> lib.dll:Calc() -> lib.dll:Output() -> lib.dll:Print(), Print prints to x.txt .
4. x.dll will *not* call another ?.dll .

Time consuming is really not the issue since this is merely for debug versions and all additions will be "#ifdef _DEBUG"'d .

NickRepin > "nested calls are something new uncovered here."

Nick, my initial Q was about having a context-dependent global variable. All the rest are examples of how this global would be used. As I understand, implementing a TLS-like mechanism on a DLL level basis is unfeasible; thus your solution would be fine.

Regarding your conditions:
1. You did not mention the direct main.exe -> lib.dll:Func() but I assume that's included. Indeed, lib.dll makes nested calls only within lib.dll .
2. "All functions [must?] have at least one param etc." : highly inconvenient but can be arranged (this makes it hard to remove for release versions; why is it necessary?). Calling conventions are as required.
5. Mandatory call to GET_CALLER_INFO is fine.

Now, a 500A is fine with me; since I only have 80 points left in my deposit, I can give you the additional 200A via another account and a dummy Q. For settling this please contact me at gil@kswaves.com .
I'm not sure whether the dummy account is ethical or not.

Ok,

1) main.exe -> lib.dll:Func()
included

2) All functions MUST have at least
one param. You'll see why.

To avoid modification of x.dlls,
you can change declarations from
   int a()
to
   int a(int dummy=0)



Details few mins later.
380 pts + grade A will be enough.

// lib.dll
#include <windows.h>
#include <iostream>
using namespace std;

HINSTANCE hDll;
//--------------------------------------
BOOL WINAPI DllMain(HINSTANCE hinstDLL,DWORD fdwReason,LPVOID)
{
   if(fdwReason==DLL_PROCESS_ATTACH)
      hDll=hinstDLL;
   return TRUE;
}
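//--------------------------------------
// On x86 with __cdecl/__stdcall, the return address sits on the stack
// just below the first parameter, so read the DWORD before it.
#define GET_CALLER_INFO(parm) GetCallerInfo(PVOID(*(LPDWORD(&parm)-1)))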
__declspec(thread) HINSTANCE hCaller;
//--------------------------------------
void __stdcall GetCallerInfo(PVOID retAddr)
{
   MEMORY_BASIC_INFORMATION mbi;
   VirtualQuery(retAddr,&mbi,sizeof(mbi));

   HINSTANCE hMod;
   if(PVOID(hDll)==mbi.AllocationBase) {
      // Nested call.
      hMod=hCaller;
   }
   else {
      // We are called from another module.
      hMod=hCaller=HINSTANCE(mbi.AllocationBase);
   }

   char buf[MAX_PATH+1];
   GetModuleFileName(hMod,buf,MAX_PATH);
   cout<<buf<<" ("<<retAddr<<")"<<endl;
}
//--------------------------------------
__declspec(dllexport) void __cdecl func3(char parm,...)
{
   cout<<"Func3() ";
   GET_CALLER_INFO(parm);  // Have to specify 1st parameter name.
}
//--------------------------------------
__declspec(dllexport) void __cdecl func2(double parm1,float parm2)
{
   cout<<"Func2() ";
   GET_CALLER_INFO(parm1);  // Have to specify 1st parameter name.

   func3('a');
}
//--------------------------------------
__declspec(dllexport) void __cdecl func(DWORD parm)
{
   cout<<"Func() ";
   GET_CALLER_INFO(parm);  // Have to specify 1st parameter name.

   func2(0,1);
}

ASKER CERTIFIED SOLUTION
NickRepin

gil_mo

ASKER

<Rejection for increasing the pts>
gil_mo

ASKER

Well done, Nick!

Two main problems:

1. For a nested func() in lib.dll, one has to place a GET_CALLER_INFO in *every* lib function calling func().

2. Having to specify the first param as GET_CALLER_INFO's argument makes it less natural to use. Isn't the return address accessible using anything else but a first parameter?
> my opponent worked for 7 years on Microsoft,

Wow! Do you think a prickish little comment like that is really appropriate? First, I'm not your "opponent". Second, no matter how long someone works someplace, they're only exposed to some subset of technology. (Believe it or not, in almost eight years, I've scraped by without writing a VDD for Win32! _Astounding_, isn't it?)

OTOH, the question is pretty difficult. Calling VirtualQuery() is a great idea, and I didn't think of that. I'm still don't see how it works for more than one level of calls; but since gil_mo accepted your answer, I'm now not sure I assumed the same requirements for "nested calls" that gil_mo had.

Of course, I wasn't trying to get the calling module name--the question was about "storage", which to me meant that an arbitrary data block needed to be associated with each calling context. That is, that more, arbitrary information besides the module name was necessary. But, again, if just having the module name is good enough for gil_mo's purpose, that's great.

 gil_mo> this makes it hard to remove for release versions; why is it necessary?).

It's not so hard, is it? It's not so clean, but maybe you can fold this same idea into a macro that calls your debug-output function(s).

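#ifdef _DEBUG
// plain forms: the marker is the only parameter / argument
#define DEBUG_MARKER_PARAM   float f
#define DEBUG_MARKER         0.0
// trailing-underscore forms carry the comma for when more follow
#define DEBUG_MARKER_PARAM_  float f,
#define DEBUG_MARKER_        0.0,
#pragma warning(disable:4011)      // er, is that the right one?
#else
#define DEBUG_MARKER_PARAM
#define DEBUG_MARKER
#define DEBUG_MARKER_PARAM_
#define DEBUG_MARKER_
#endif

//NOTE: no comma here! (it is part of the macro)
void myOtherFunc(DEBUG_MARKER_PARAM_ int n, int x)
{
}

void myFunc(DEBUG_MARKER_PARAM)
{
   // ... code here ...
}

int main()
{
   int n = 0, x = 0;
   myFunc(DEBUG_MARKER);

   //NOTE: no comma here! (it is part of the macro)
   myOtherFunc(DEBUG_MARKER_  n, x);
   return 0;
}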

..B ekiM
mikeblas, I'm really sorry for my stupid comment about you. I did not intend to offend you. Sorry...

<<I'm still don't see how it works for more than one level of calls>>
It works if nested calls are inside lib.dll only. We check the caller's module handle. If it is different from lib.dll's own, then the call came from outside and we store the caller's handle in a special variable (hCaller). If both handles are equal, we just take the value of hCaller.
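
The decision itself can be tried out with a small portable sketch. Here plain integers stand in for the module handles (in the real code they are the AllocationBase values that VirtualQuery() reports for a return address), and ModuleId, kLibModule and ResolveCaller are hypothetical names.

```cpp
#include <cassert>

// Stand-ins for module handles; in the real code these are the
// AllocationBase values that VirtualQuery() reports for a return address.
using ModuleId = int;
const ModuleId kNoModule = 0;
const ModuleId kLibModule = 1;   // plays the role of hDll

// Last module outside lib.dll that called in on this thread (hCaller).
thread_local ModuleId t_externalCaller = kNoModule;

// immediateCaller is the module that owns the return address of the
// current lib function. Returns the module whose log should be used.
ModuleId ResolveCaller(ModuleId immediateCaller) {
    assert(immediateCaller != kNoModule);
    if (immediateCaller == kLibModule) {
        // Nested call from inside lib.dll: reuse the remembered caller.
        return t_externalCaller;
    }
    // Call from another module: remember it for nested calls that follow.
    t_externalCaller = immediateCaller;
    return immediateCaller;
}
```

Note that the remembered value is only as fresh as the last outside call that ran the check, so every lib function that can be entered from another module has to perform it.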

Regarding "storage". Of course, there is no such thing as "DLL local storage". And we try to simulate it for our particular case. Module handle is a must.
Suppose, caller dll (x.dll) must share file handle.

------
//x.dll
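HANDLE hFile;
// exported so that lib.dll can look it up by name
extern "C" __declspec(dllexport) HANDLE GetFile() { return hFile; }

// lib.dll
.... Get caller's handle ...
typedef HANDLE (*GETFILEFN)();
GETFILEFN gfn=(GETFILEFN)GetProcAddress(hCaller,"GetFile");
if(gfn) hFile=gfn();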

-----

To be continued....
<<For a nested func() in lib.dll, one has to place a GET_CALLER_INFO in *every* lib function calling func()>>

If nested funcs are not to be called from outside, then the following is possible:

__declspec(thread) HANDLE hLogFile;
GetCallerInfo() {
   VirtualQuery(...)
   hMod=mbi.AllocationBase;
   hLogFile=.......
}

FunctionWhichIsCalledFromAnotherDll()
{
   GET_CALLER_INFO()
   Use hLogFile as log handle
   ... nested calls here ...
}

SubroutineWhichIsCalledFromThisDllOnly()
{
   Use hLogFile as log handle
   ... nested calls here ...
}

Ie, top-level function assigns value for hLogFile, all subroutines then use this variable (it is safe, because var has "thread" attribute).


<<Having to specify the first param as GET_CALLER_INFO's argument makes it less natural to use. Isn't the return address accessible using anything else but a first parameter>>

It is the most portable and safe way.
It is not necessary to add special dummy parameter if function already has at least one param. You can use just ordinary 1st parameter of the function.

Well, I have one more idea. Have to check it...





Option A.
---------

You have to insert following pragma to the top of lib.cpp:

#pragma optimize("y",off)

and change macro:

#define GET_CALLER_INFO()  \
   PVOID _temp_;         \
   __asm {mov eax,[ebp+4]}  \
   __asm {mov _temp_,eax}  \
   GetCallerInfo(_temp_);


It seems  Option B is not safe.
Well, for debug purpose Opt A will be enough.



*******
mikeblas, my apologies again for my stupid comment.

*******
> mikeblas, my apologies again for my stupid comment

Thanks.

In the meantime, have you thought of interviewing at MS?

..B ekiM
gil_mo

ASKER

Nick,
Great macro to solve the dummy param nuisance! I wish I could grant you some more points, but I'm broke ;)
I suggest to add curly braces at the beginning and end of the block replacing GET_CALLER_INFO; No need for parens after GET_CALLER_INFO; and, no spaces allowed after the escape chars!

NickRepin: <<If nested funcs are not to be called from outside>>
.... I can't have this constraint.

Mike,
You are right about the idea of having a storage space other than the module handle; That was my original wish. But since this seems unfeasible, the module handle serves just fine and can be used to access larger portions of data in the module.

Well, any further improvements are welcome!
You mean work? It's interesting... Yes, I'm looking for a job. If it is something serious, could you contact me at nick@earthspeak.net ?
The comment above is to Mike.

To gil_mo:
<<I suggest to add curly braces at the beginning and end of the block replacing ... >> 

Now it is your problem :)

I'm afraid, it's impossible to improve the idea any more...

I should note that macro with assembler code may affect compiler's optimization. It's ok in debug version, but may be unacceptable in release one.