gunn

Obtaining stack information at a program crash?

We have a Win32 application (no MFC) that runs on Win9x/NT4. It's built using MSVC++6, mainly on an NT4 SP5 machine.

Every once in a while, a user will complain about a program crash to desktop. We currently have a signal handler function, but all it does is give them a "SIGSEGV() Program Termination error" before the program goes away. I'm trying to find a better way to present them with more information, which they can then hopefully relay to us, the developers, so it's easier to debug and fix here.

Are there any functions available that would help us? I was browsing through functions like StackWalk() and AfxDumpStack (<-- MFC though). I don't know how to implement them though, and maybe they're not the best solution.

Any ideas, sample code?

Thanks
PinTail

What's wrong with the stack trace provided by Dr. Watson?
ASKER CERTIFIED SOLUTION
jkr
This solution is only available to members.
gunn

ASKER

What's Dr. Watson? No one has ever mentioned that, and I've never seen it...?!?

I'll give this code a try once I'm back at work in a couple weeks ;)
gunn

ASKER

How would I call this function from my signal_handler() function?

DbgIntelStackWalk   (   HANDLE  hProcess,   PDBGTHRD    pDbgThrd)

I see you pass in the handle of the process (doable), but I'm not sure what to pass in as the 2nd argument...your structure that you also included....or what to fill this structure exactly with.

Thanks for that code btw, looks like it could be very useful.
Well, the process handle can be obtained via 'GetCurrentProcess()'. The problem is indeed the thread whose stack you want to examine. There are basically two options:

1. Start another thread that performs the stack unwinding. In this case, the struct has to be populated with the primary thread ID and its handle.

2. Do it from your primary thread. Here, the calls to 'SuspendThread()' and 'ResumeThread()' should be omitted, and the struct members can be set with 'GetCurrentThread()' and 'GetCurrentThreadId()'
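
Option 2 can be sketched roughly as follows. The struct layout is the one the asker posts later in the thread; the helper name DescribeCurrentThread is made up for illustration, and the block is guarded with #ifdef _WIN32 because it only builds on Windows.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>

// Thread bookkeeping struct, as declared in the asker's later sample.
typedef struct _tagDBGTHRD {
    DWORD               dwThreadId;
    HANDLE              hThread;
    struct _tagDBGTHRD* pNext;
} DBGTHRD, *PDBGTHRD;

// Option 2: describe the calling thread itself (hypothetical helper).
DBGTHRD DescribeCurrentThread()
{
    DBGTHRD t;
    t.dwThreadId = GetCurrentThreadId();
    t.hThread    = GetCurrentThread();  // pseudohandle meaning "the calling thread"
    t.pNext      = NULL;                // single entry, no list
    return t;
}
#endif
```

Because the thread describes itself here, nothing needs to be suspended or resumed around the stack walk.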
gunn

ASKER

jkr,

I've gotten back to this at work now and have a couple of more questions.

After thinking about this, and if I recall correctly, when my signal_handler() function gets called after a program termination error, I think the stack has already changed from what it was at the point of the crash. Maybe it doesn't do this, but maybe so? In any case, what I'm after is a stack view, taken when the program crashes, that shows where the crash occurred.

So will this require a separate thread? I'm not too familiar with how to code a separate thread....
Have you bothered to look at the information on Dr. Watson?
gunn

ASKER

I tried your link, but couldn't really find anything useful in the entries on the page. I still have never used it, or seen it. This code snippet sounds like the way to go if possible. Then I also won't have to worry about someone else's machine configuration, or whether they have Dr. Watson installed or running.

Can you maybe point me to a better page that discusses exactly what it is that it does? Thanks.
>>After thinking about this, and if I
>>recall correctly, when my
>>signal_handler() function gets called
>>after a program termination error,
>>I'm thinking that it changes the
>>stack (from the point at what it
>>crashed).

How do you set the 'signal_handler()'?
gunn

ASKER

It is set by a call to 'signal(SIGABRT, signal_handler);'. There are a few of these that trap certain signals. So anytime one of these happens, it calls a function to display a fatal error warning and then quits.

I tested this and it looks like when it jumps to this signal_handler, the stack gets wiped out from what caused the signal in the first place. So that's not gonna do me ;(

I need a way to call these stack functions when the error occurs so a user can tell me *exactly* what caused it. Hmmmmm.
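
To make the limitation concrete: a handler installed with signal() is handed only the signal number, with no register context or faulting address to start an unwind from. A minimal portable sketch, assuming made-up names (g_lastSignal, RecordSignal, RaiseAndObserve) and using raise() as a stand-in for a real crash:

```cpp
#include <cassert>
#include <csignal>

// Last signal number seen by the handler; 0 means "none yet".
static volatile std::sig_atomic_t g_lastSignal = 0;

// A signal handler receives the signal number and nothing else:
// no CONTEXT record, no faulting address, no stack pointer.
extern "C" void RecordSignal(int sig)
{
    g_lastSignal = sig;
}

// Installs the handler, raises SIGABRT artificially, restores the
// previous handler, and returns what the handler saw.
int RaiseAndObserve()
{
    void (*previous)(int) = std::signal(SIGABRT, RecordSignal);
    assert(previous != SIG_ERR);
    std::raise(SIGABRT);              // handler runs, then raise() returns
    std::signal(SIGABRT, previous);
    return g_lastSignal;
}
```

All the handler can report is which signal fired, which matches the bare "Program Termination error" message described in the question.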
In order to achieve this, you'll have to set an exception handler (via 'SetUnhandledExceptionFilter()'). This filter receives a pointer to an 'EXCEPTION_POINTERS' struct that holds a pointer to the 'CONTEXT' record needed to unwind the stack. The same information is available through 'GetExceptionInformation()', but only inside the filter expression of a '__try'/'__except' block, not from your signal handler.
I would still like to know what is wrong with using the Dr. Watson postmortem debugger. It ships as standard on Windows NT and Win9X.

I have validated the link that I posted to you previously; it should return a whole list of related documents at Microsoft's web site.

Dr. Watson is a very well known tool used by Windows developers since Win3.X
gunn

ASKER

jkr,
Could you show me how to simply set the 'SetUnhandledExceptionFilter()' so it will compile? I can't quite figure it out (new to me).

Thanks,

Tom
Hmm, what difficulties are you experiencing?

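LONG WINAPI __XceptFilter   (   EXCEPTION_POINTERS* pExp)
{
// ... (inspect pExp->ExceptionRecord and pExp->ContextRecord here)

    return EXCEPTION_EXECUTE_HANDLER; // report, then let the process terminate
}

// ...

SetUnhandledExceptionFilter ( __XceptFilter);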
gunn

ASKER

Duh. The 'WINAPI' was my problem. I was getting this "can't convert parameter 1...blah, blah, blah" error where it mentioned __cdecl and __stdcall.

I've never had to use them (or the WINAPI) before, so it hung me up for a bit.

Thanks for the quick response! I'm gonna work on this and see if this'll do what I want.
gunn

ASKER

Ok, I'm slowly moving along ;)

Trying to implement your above stack walk function; where is GetLogicalAddress() defined? It does not appear in my MSDN Library help. I don't know the header or library it is contained in.

I'm assuming I'm going to have to install a component of the Windows SDK...

It's from an MSJ column by Matt Pietrek:

//==============================================================================
// Given a linear address, locates the module, section, and offset containing
// that address.
//
// Note: the szModule parameter buffer is an output buffer of length specified
// by the len parameter (in characters!)
//==============================================================================
BOOL GetLogicalAddress(
        PVOID addr, PTSTR szModule, DWORD len, DWORD& section, DWORD& offset )
{
    MEMORY_BASIC_INFORMATION mbi;

    if ( !VirtualQuery( addr, &mbi, sizeof(mbi) ) )
        return FALSE;

    DWORD hMod = (DWORD)mbi.AllocationBase;
    if ( !hMod )
        return FALSE;

    if ( !GetModuleFileName( (HMODULE)hMod, szModule, len ) )
        return FALSE;

    // Point to the DOS header in memory
    PIMAGE_DOS_HEADER pDosHdr = (PIMAGE_DOS_HEADER)hMod;

    // From the DOS header, find the NT (PE) header
    PIMAGE_NT_HEADERS pNtHdr = (PIMAGE_NT_HEADERS)(hMod + pDosHdr->e_lfanew);

    PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION( pNtHdr );

    DWORD rva = (DWORD)addr - hMod; // RVA is offset from module load address

    // Iterate through the section table, looking for the one that encompasses
    // the linear address.
    for ( unsigned i = 0;
          i < pNtHdr->FileHeader.NumberOfSections;
          i++, pSection++ )
    {
        DWORD sectionStart = pSection->VirtualAddress;
        DWORD sectionEnd = sectionStart
                    + max(pSection->SizeOfRawData, pSection->Misc.VirtualSize);

        // Is the address in this section???
        if ( (rva >= sectionStart) && (rva <= sectionEnd) )
        {
            // Yes, address is in the section.  Calculate section and offset,
            // and store in the "section" & "offset" params, which were
            // passed by reference.
            section = i+1;
            offset = rva - sectionStart;
            return TRUE;
        }
    }

    return FALSE;   // Should never get here!
}

(Excuse the formatting ;-)
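
The section search at the heart of GetLogicalAddress() can be tried out without any PE headers. The sketch below is a portable stand-in, not the original code: SectionInfo and FindSection are hypothetical names, and it treats each section as a half-open range (rva < end) where the function above uses <=.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for IMAGE_SECTION_HEADER (hypothetical type).
struct SectionInfo {
    std::uint32_t virtualAddress; // RVA where the section starts
    std::uint32_t sizeOfRawData;  // size of the section on disk
    std::uint32_t virtualSize;    // size of the section in memory
};

// Finds the section containing `rva`. On success stores the 1-based
// section number and the offset within that section, and returns true.
bool FindSection(const std::vector<SectionInfo>& sections, std::uint32_t rva,
                 std::uint32_t& section, std::uint32_t& offset)
{
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const std::uint32_t start = sections[i].virtualAddress;
        const std::uint32_t end =
            start + std::max(sections[i].sizeOfRawData, sections[i].virtualSize);
        if (rva >= start && rva < end) {
            section = static_cast<std::uint32_t>(i + 1);
            offset  = rva - start;
            return true;
        }
    }
    return false; // rva is not inside any listed section
}
```

With two sections starting at RVA 0x1000 and 0x2000, an RVA of 0x2010 resolves to section 2, offset 0x10, which the "%04X:%08X" format used later in the thread would print as 0002:00000010.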
gunn

ASKER

Ahhhhh, so that's where it's at :)

Ok, so compiled and linked. Ran it. I found that my process handle (hProcess) isn't getting saved when I call your DbgIntelStackWalk() function. Meaning, that when I use hProcess = GetCurrentProcess() in the exception filter (that I set using SetUnhandledExceptionFilter() ), it is valid. But when this handle is passed to the DbgIntelStackWalk(), inside of it, the handle is now -1!??? So then, the call to SymGetSymFromAddr() fails. I get an error code in GetLastError() of 6, invalid handle.

{
hProcess = GetCurrentProcess();

DbgIntelStackWalk( hProcess, ... );
}

ULONG DbgIntelStackWalk( HANDLE hProcess, ... ) {

 < hProcess is now -1!>
}

How can this be!?
 
'GetCurrentProcess()' returns a pseudohandle, so it won't work - use 'GetCurrentProcessId()' and 'OpenProcess()' to retrieve a 'real' handle...
gunn

ASKER

Actually, even if I call hProcess = GetCurrentProcess() in the very top of my main() routine, it returns -1 (0xffffffff). GetLastError then returns 0 though. Not sure what is going on.
gunn

ASKER

Ahh, ok. I just tried GetCurrentProcessId() and it returned valid, so I started looking into what you just suggested. I'll give that a go.

Slow, but chugging along...thanks again.
As I said (from the docs):

"A pseudohandle is a special constant that is interpreted as the current process handle. The calling process can use this handle to specify its own process whenever a process handle is required. Pseudohandles are not inherited by child processes.

This handle has the maximum possible access to the process object. For systems that support security descriptors, this is the maximum access allowed by the security descriptor for the calling process. For systems that do not support security descriptors, this is PROCESS_ALL_ACCESS. For more information, see Process Objects.

A process can create a “real” handle to itself that is valid in the context of other processes, or that can be inherited by other processes, by specifying the pseudohandle as the source handle in a call to the DuplicateHandle function. A process can also use the OpenProcess function to open a real handle to itself."
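
The DuplicateHandle() route mentioned in the quoted documentation might look like the sketch below. The helper name DuplicateCurrentProcessHandle is made up for illustration, and the block is guarded with #ifdef _WIN32 because it only builds on Windows. It also explains the -1 reported earlier: that value is the pseudohandle constant itself, not a failure code.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>

// Returns a real, closable handle to the current process, or NULL on
// failure. The caller is responsible for calling CloseHandle() on it.
HANDLE DuplicateCurrentProcessHandle()
{
    HANDLE self = GetCurrentProcess();   // pseudohandle, shows up as -1
    HANDLE real = NULL;

    // Source process, source handle and target process are all "this process".
    if (!DuplicateHandle(self, self, self, &real,
                         0, FALSE, DUPLICATE_SAME_ACCESS))
        return NULL;

    return real;
}
#endif
```

Unlike the pseudohandle, the duplicated handle has to be released with CloseHandle() once the stack walk is finished.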
gunn

ASKER

I'm using the GetCurrentProcessId() and OpenProcess() which seems to be giving me a valid handle back that I then pass into the stack unwinding function. (Usually the HANDLE = 88 as int)

But, the call to SymGetSymFromAddr() is still not working; GetLastError() is still returning 6 (Invalid Handle). Arrghhhh.

Any ideas now? My sample program is very small, so I'll post if necessary.
gunn

ASKER

Here is the sample program; hope it copies in right.

// Exception.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <string.h>   // strlen
#include <windows.h>

#include <imagehlp.h>

typedef struct  _tagDBGTHRD
{
    DWORD                   dwThreadId;
    HANDLE                  hThread;
    struct  _tagDBGTHRD*    pNext;

}   DBGTHRD,    *PDBGTHRD;


ULONG UnwindStack( HANDLE hProcess, PCONTEXT ctx, PDBGTHRD dbgThread );
BOOL GetLogicalAddress( PVOID addr, PTSTR szModule, DWORD len, DWORD& section, DWORD& offset );
LONG WINAPI XceptFilter(  EXCEPTION_POINTERS *pExp );

int main(int argc, char* argv[])
{
  PTOP_LEVEL_EXCEPTION_FILTER filter;
  char *ps=NULL;

  filter = SetUnhandledExceptionFilter( XceptFilter );

  // Create an exception to test out code
  strlen( ps );

  printf( "We are continuing after exception\n" );

  return 0;
}


// The exception filter called when something bad happens
LONG WINAPI XceptFilter( EXCEPTION_POINTERS *pExp )
{
  PDBGTHRD pDbgThread;

  DWORD code  = pExp->ExceptionRecord->ExceptionCode;
  DWORD flags = pExp->ExceptionRecord->ExceptionFlags;
  PVOID addr  = pExp->ExceptionRecord->ExceptionAddress;
  DWORD parms = pExp->ExceptionRecord->NumberParameters;

  // Print out some info
  printf( "Hit the exception!!\nCode is %lx\nFlag is %lx\nAt address %p\n", code, flags, addr );
  printf( "Number of parameters: %lu\n", parms );

  // Get a handle to the current process
  DWORD pID       = GetCurrentProcessId();
  HANDLE hProcess = OpenProcess( PROCESS_ALL_ACCESS, TRUE, pID );

  // Print out handle to process for debugging
  printf( "hProcess in XceptFilter = %d\n", hProcess );

  pDbgThread               = (PDBGTHRD)malloc( sizeof(DBGTHRD) );  // size of the struct, not of the pointer
  pDbgThread->dwThreadId = GetCurrentThreadId();
  pDbgThread->hThread    = GetCurrentThread();  

  // Go and display the stack.
  UnwindStack( hProcess, pExp->ContextRecord, pDbgThread );

/* possible return values. EXCEPTION_CONTINUE_SEARCH=0, EXCEPTION_CONTINUE_EXECUTION=-1
   EXCEPTION_EXECUTE_HANDLER=1 */
    return 0;
}

ULONG UnwindStack( HANDLE hProcess, PCONTEXT pCtx, PDBGTHRD pDbgThrd )
{
    STACKFRAME          sf;
    DWORD               dwDisplacement;
    PIMAGEHLP_SYMBOL    pimgSym;
    int                  nErr;

    printf( "\nStack trace:\n" );

    // Print out handle to make sure its the same...
    printf( "hProcess in UnwindStack = %d\n", hProcess );

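    pimgSym               = (PIMAGEHLP_SYMBOL) malloc( sizeof(IMAGEHLP_SYMBOL) + 256 );
    ZeroMemory( pimgSym, sizeof(IMAGEHLP_SYMBOL) + 256 );
    pimgSym->SizeOfStruct  = sizeof(IMAGEHLP_SYMBOL);   // struct size only; the name buffer is covered by MaxNameLength
    pimgSym->MaxNameLength = 256;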

    ZeroMemory( &sf,  sizeof(STACKFRAME) );

    sf.AddrPC.Offset    =   pCtx->Eip;
    sf.AddrPC.Mode      =   AddrModeFlat;
    sf.AddrStack.Offset =   pCtx->Esp;
    sf.AddrStack.Mode   =   AddrModeFlat;
    sf.AddrFrame.Offset =   pCtx->Ebp;
    sf.AddrFrame.Mode   =   AddrModeFlat;

    // Walk through the stack, displaying info
    while( TRUE ) {
      if( !StackWalk( IMAGE_FILE_MACHINE_I386,
          hProcess,
          pDbgThrd->hThread,
          &sf,
          pCtx,
          0,
          SymFunctionTableAccess,
          SymGetModuleBase,
          0 ) )  
            break;
      
      if( 0 == sf.AddrFrame.Offset )   // Basic sanity check to make sure
          break;                       // the frame is OK.  Bail if not.
      
      printf( "ThreadID: %u ThreadHandle:0x%x AddrPC.Offset: %08X  AddrFrame.Offset: %08X  ",  
          pDbgThrd->dwThreadId,
          pDbgThrd->hThread,
          sf.AddrPC.Offset,
          sf.AddrFrame.Offset );

      SetLastError( 0 );
      printf( "\nProcess being passed in: %d\n", hProcess );

      if( !SymGetSymFromAddr( hProcess, sf.AddrPC.Offset, &dwDisplacement, pimgSym ) )  {
          char acModule[MAX_PATH] = "\0";
          DWORD dwSection          =   0;
          DWORD dwOffset          =   0;
          
          // Something happened; take alternative route.
          nErr = GetLastError();

          // Print error
          printf( "\nSymGetSymFromAddr() failed. nErr: %d\n", nErr );

          GetLogicalAddress( (PVOID) sf.AddrPC.Offset,
            acModule,
            MAX_PATH,
            dwSection,
            dwOffset
            );
          
          printf( "%04X:%08X %s\n", dwSection, dwOffset, acModule );
      }
      else
          // This is where we want to be.
          printf( "%hs+%X\n", pimgSym->Name, dwDisplacement);
    }
   
    free( pimgSym );
       
    printf( "trace complete.\n" );
   
    return( 0 );
}
You did call 'SymInitialize( hProcess, NULL, FALSE);', didn't you?
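
The sample program above never initializes the symbol handler, which would be consistent with the error 6 it reports. A minimal sketch of the missing step follows, under these assumptions: the helper names InitSymbols and ShutdownSymbols are made up, the header is <dbghelp.h> where the thread uses <imagehlp.h>, the #pragma is the MSVC-specific way to link the library, and the third argument is TRUE so the library enumerates the already-loaded modules itself (with FALSE, as in the line above, modules would have to be registered separately, for example with SymLoadModule()). The block is guarded with #ifdef _WIN32 because it only builds on Windows.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")  // MSVC-specific: link the symbol library

// Initializes the symbol handler. Call once, before StackWalk() and
// SymGetSymFromAddr(), and pass the same hProcess value to those calls.
bool InitSymbols(HANDLE hProcess)
{
    // NULL search path: use the default symbol search locations.
    // TRUE: enumerate the modules already loaded in the process.
    return SymInitialize(hProcess, NULL, TRUE) != FALSE;
}

// Releases the resources held by the symbol handler for hProcess.
void ShutdownSymbols(HANDLE hProcess)
{
    SymCleanup(hProcess);
}
#endif
```

In the sample this would mean calling InitSymbols( hProcess ) in XceptFilter() just before UnwindStack(), and ShutdownSymbols( hProcess ) once the trace has been printed.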