bleech677

ASP and App Pool Crashes in IIS 6

Environment: Windows Server 2003, IIS 6 with ASP and SQL Server 2005

We've been having these app pool crashes for the longest time. We've tried enabling debugging and using Debugging Tools for Windows (MS). The furthest we've gotten is a message telling us that the heap was corrupted - but nothing useful. This is not isolated to a single ASP page. We've created several different app pools for the websites and it has helped a bit, but it drives me nuts to see these crashes.

It is not a hardware problem, as it happens on 5+ different servers that are in different locations in the US. We've also dug through the code to make sure it is clean - and it is as far as we can tell.

Has anyone ever gotten to the bottom of an app pool crash?
// Samples for the error messages
 
Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'GAB' terminated unexpectedly. The process id was '4468'. The process exit code was '0xc0000005'.
Device: DFW-WEB2
Category: Server
Error Condition: Critical
Generated at: May 18,2009 11:58:28 AM
 
Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'ICS' terminated unexpectedly. The process id was '4972'. The process exit code was '0xc0000005'.
Device: DFW-WEB2
Category: Server
Error Condition: Critical
Generated at: May 18,2009 11:27:29 AM
 
Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'Demos.Vii' terminated unexpectedly. The process id was '36576'. The process exit code was '0xc0000005'.
Device: CAS-WS01
Category: Server
Error Condition: Critical
Generated at: May 13,2009 10:01:58 AM
 
Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'Integrity' terminated unexpectedly. The process id was '5540'. The process exit code was '0xc0000005'.
Device: DFW-WEB1
Category: Server
Error Condition: Critical
Generated at: Apr 21,2009 10:34:54 AM
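
With crashes spread over several pools and servers, it can help to tally the alerts before digging into dumps. The following is a minimal Python sketch, assuming the alerts are available as plain text in the format shown above; the function name `summarize_crashes` and the `SAMPLE` text are illustrative only. The exit code 0xc0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION.

```python
import re
from collections import Counter

# Subset of NTSTATUS values; 0xC0000005 is STATUS_ACCESS_VIOLATION.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

# Matches the W3SVC event 1009 message text shown in the samples above.
EVENT_RE = re.compile(
    r"application pool '(?P<pool>[^']+)' terminated unexpectedly\. "
    r"The process id was '(?P<pid>\d+)'\. "
    r"The process exit code was '(?P<code>0x[0-9a-fA-F]+)'"
)

def summarize_crashes(log_text):
    """Count crashes per (app pool, exit code name) pair."""
    counts = Counter()
    for match in EVENT_RE.finditer(log_text):
        code = int(match.group("code"), 16)
        name = NTSTATUS_NAMES.get(code, hex(code))
        counts[(match.group("pool"), name)] += 1
    return counts

# Two of the alerts from this post, used as sample input.
SAMPLE = (
    "Message: ID=1009 Source=W3SVC Type=2 Message=A process serving "
    "application pool 'GAB' terminated unexpectedly. The process id was "
    "'4468'. The process exit code was '0xc0000005'.\n"
    "Message: ID=1009 Source=W3SVC Type=2 Message=A process serving "
    "application pool 'ICS' terminated unexpectedly. The process id was "
    "'4972'. The process exit code was '0xc0000005'.\n"
)

for (pool, code_name), n in sorted(summarize_crashes(SAMPLE).items()):
    print(pool, code_name, n)
```

If one pool dominates the tally, that is a hint about which application exercises the faulty code most often; it does not by itself identify the component.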


meverest

Hi,

these sorts of errors can be caused by a wide variety of events - making it very hard to provide a one-answer-suits-all kind of advice.

So rather than try to write a few paragraphs of detail, allow me to direct you to a discussion that treats this matter very well indeed.  And probably better than I can too! ;-)

http://blogs.msdn.com/david.wang/archive/2005/08/29/HOWTO_Understand_and_Diagnose_an_AppPool_Crash.aspx

Cheers.
bleech677

ASKER

Yeah, I probably should have mentioned that the David Wang article is where I started - that's what got me to the heap corrupted message. It's a little scary, because the MS software that we enabled to diagnose the problem ended up taking down the server after 2 days.

I'm looking for maybe someone who has diagnosed and fixed a similar problem in the past -
Just read further in the post ... apparently the accepted solution did not resolve the problem ... might be worth trying though to see if it helps or makes any difference.  If it does, then this might help narrow down where to go to identify the cause of the problem.
Honestly I think it is the ActivePDF software we use to output PDF - but they have been no help to us
I looked at the link - None of these web servers are DCs

Anyone ever been able to trace a problem at the heap level?

http://windowsitpro.com/article/articleid/22275/heap-corruption-part-1.html
Is this embedded in the app or is this a different web app on the same server?
If it is a different app on the same server then try creating a new application pool for it.  I believe this will isolate its execution and memory allocation from other applications in different app pools.  If this is the offending app then this should resolve the problem for your other apps.
ActivePDF is 3rd party software - it is invoked with Server.CreateObject(). It's not a web application itself, so I can't isolate it to its own app pool. I have, however, given all these web apps their own pools - the worst offender for these crashes makes the heaviest use of the ActivePDF objects.
I don't know if this will help but you could try looking at the example code on the following page and see if maybe there is something you are missing in your code ... http://www.activepdf.com/support/knowledgebase/viewKB.cfm?tk=kb&id=10543

A mis-referenced variable or something that is not implemented per their methodology MIGHT cause a problem.  Changing the order in which the code processes objects and does things could make a difference, given the description of the heap error in the article you referenced.
Hi,

>> was supposed to diagnose the problem ended up taking down the server after 2 days

you should never run diagnostic services on a production web server! :-o

at least not for two days anyway ;-)

Is this tool DebugDiag?  If you look through the crash dump output, you should soon be able to see whether it is ActivePDF - which I consider relatively likely.  It sure looks suspicious the way you have described it.

I guess the real issue then is what to do about it - if the vendor won't fix it or support it properly, it may be a good idea to seek alternatives.

Also, to be sure of it, consider using a load test tool to hammer the ActivePDF object if you can, and see how often it dies.  Take a look at WAST (the Web Application Stress Tool):

http://www.microsoft.com/downloads/details.aspx?familyid=e2c0585a-062a-439e-a67d-75a89aa36495&displaylang=en

Cheers.
meverest - I will look into it. I know it was a bad idea to put this tool on the live server, but IT decided to go with it. Luckily we convinced them to put it on only 1 of the 2 servers in the load balance / failover config.

Here is one of the crash minidumps processed by WinDbg. At this point I'm trying to figure out how to use / interpret the info:

*****************************************
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(12f8.1d4c): Access violation - code c0000005 (first/second chance not available)
eax=7767c30c ebx=00080000 ecx=00000004 edx=3d1b001f esi=7767c33c edi=7767c334
eip=7c82a0d0 esp=0447f7d8 ebp=0447f9f4 iopl=0         nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010216
ntdll!RtlAllocateHeap+0x1f5:
7c82a0d0 884706          mov     byte ptr [edi+6],al        ds:0023:7767c33a=69
0:038> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: kernel32!pNlsUserInfo                         ***
***                                                                   ***
*************************************************************************
*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: kernel32!pNlsUserInfo                         ***
***                                                                   ***
*************************************************************************

FAULTING_IP:
ntdll!RtlAllocateHeap+1f5
7c82a0d0 884706          mov     byte ptr [edi+6],al

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 7c82a0d0 (ntdll!RtlAllocateHeap+0x000001f5)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000001
   Parameter[1]: 7767c33a
Attempt to write to address 7767c33a

DEFAULT_BUCKET_ID:  HEAP_CORRUPTION

PROCESS_NAME:  w3wp.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

WRITE_ADDRESS:  7767c33a

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

ADDITIONAL_DEBUG_TEXT:  Enable Pageheap/AutoVerifer

FAULTING_THREAD:  00001d4c

PRIMARY_PROBLEM_CLASS:  HEAP_CORRUPTION

BUGCHECK_STR:  APPLICATION_FAULT_HEAP_CORRUPTION

LAST_CONTROL_TRANSFER:  from 776bcfce to 7c82a0d0

STACK_TEXT:  
0447f9f4 776bcfce 00080000 00000000 00000004 ntdll!RtlAllocateHeap+0x1f5
0447fa08 776bcf3b 77796784 00000004 0447fa48 ole32!CRetailMalloc_Alloc+0x16
0447fa18 4a7160a7 00000004 00125b08 77d045b0 ole32!CoTaskMemAlloc+0x13
0447fa48 4a7ac2f1 00000004 00125b08 00125b44 comsvcs!SafeMalloc+0x12
0447fa80 4a7950a9 00000001 00000000 0447fabc comsvcs!Array<IContextNotify *>::setSize+0x77
0447fa90 4a7997c0 0447faac 0447fadc 0447fb0c comsvcs!Array<IContextNotify *>::append+0x12
0447fabc 4a75579e 00125b08 00000000 709e0009 comsvcs!CUserProps::SetProperty+0xb6
0447faec 709ea549 00000001 000bfcbc 709e0009 comsvcs!CContext::SetProperty+0x81
0447fb24 709ea486 000bfcbc 70a33008 021a1f28 asp!CViperActivity::BindToThread+0x54
0447fb40 709e26f3 01ee1e90 025a20c8 025a2270 asp!ViperAttachIntrinsicsToContext+0x61
0447fb98 709e244a 00000000 00000000 0012ee90 asp!CHitObj::ViperAsyncCallback+0x30e
0447fbb4 4a77b5ea 02222098 0008bcb0 0447fd74 asp!CViperAsyncRequest::OnCall+0x92
0447fbd0 77720d30 0012ee90 000d52d8 00000000 comsvcs!CSTAActivityWork::STAActivityWorkHelper+0x32
0447fc1c 777217dc 00000000 000d52d8 4a77b5b8 ole32!EnterForCallback+0xc4
0447fd7c 776f03b4 0447fc54 4a77b5b8 0012ee90 ole32!SwitchForCallback+0x1a3
0447fda8 7769c194 000d52d8 4a77b5b8 0012ee90 ole32!PerformCallback+0x54
0447fe40 7772433a 0008bcb0 4a77b5b8 0012ee90 ole32!CObjectContext::InternalContextCallback+0x159
0447fe60 4a77b78c 0008bcb0 4a77b5b8 0012ee90 ole32!CObjectContext::DoCallback+0x1c
0447fecc 4a77bcf2 0010c218 0010c1f8 000e1a44 comsvcs!CSTAActivityWork::DoWork+0x12d
0447fee4 4a77c7de 0012ee90 00000001 0010c1f8 comsvcs!CSTAThread::DoWork+0x18
0447ff04 4a77cabf 00000000 018f2460 019d6cb8 comsvcs!CSTAThread::ProcessQueueWork+0x37
0447ff84 77bcb530 0010c1f8 00000000 00000000 comsvcs!CSTAThread::WorkerLoop+0x190
0447ffb8 77e64829 019d6cb8 00000000 00000000 msvcrt!_endthreadex+0xa3
0447ffec 00000000 77bcb4bc 019d6cb8 00000000 kernel32!BaseThreadStart+0x34


SYMBOL_NAME:  heap_corruption!heap_corruption

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: heap_corruption

IMAGE_NAME:  heap_corruption

DEBUG_FLR_IMAGE_TIMESTAMP:  0

STACK_COMMAND:  ~38s; .ecxr ; kb

FAILURE_BUCKET_ID:  HEAP_CORRUPTION_c0000005_heap_corruption!heap_corruption

BUCKET_ID:  APPLICATION_FAULT_HEAP_CORRUPTION_heap_corruption!heap_corruption

Followup: MachineOwner
---------

Hi,

OK - the first obvious detail provided there is that the fault is thrown inside ntdll.dll, a core system DLL that implements the native Windows API, including the user-mode heap manager.  You can see it in two places:  at the very top of the dump ("FAULTING_IP:") and also at the top of the call stack listing.  The call stack lists the thread's entry point at the bottom, followed by each function called in turn, with the most recent call at the top.  You will see that every frame belongs to a system module: ASP (asp.dll), COM+ (comsvcs.dll) and COM (ole32.dll).  There are no third-party or database (OLE DB) modules on this stack.

The problem is apparently thrown during a memory allocation function (ntdll!RtlAllocateHeap+0x1f5).  The error detail says that it is a heap corruption issue - that could be caused by faulty hardware (RAM) but is more likely caused by some code writing to memory it does not own, for example a buffer overflow or a write to a block that has already been freed.

Since a PDF module is often just a virtual printer driver, this still *could* be caused by the ActivePDF component you suspect, but unfortunately, unless the name of that binary shows up in the stack listing, we can neither 'confirm nor deny' at this stage.
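
To check whether a suspect binary appears on a faulting stack, the STACK_TEXT section of the dump above can be scanned mechanically. This is a rough Python sketch, assuming the 32-bit frame layout shown in that listing (ChildEBP, RetAddr, three argument words, then `module!symbol`); the names `modules_on_stack`, `non_system_modules` and the `SYSTEM_MODULES` set are made up for illustration, and the set is hand-picked for this dump rather than a complete list. Bear in mind that with heap corruption the faulting stack usually shows the code that tripped over the damage, not the code that caused it, so an all-system stack does not clear any component.

```python
import re

# One stack frame: ChildEBP RetAddr Arg1 Arg2 Arg3 module!symbol+offset
FRAME_RE = re.compile(
    r"^[0-9a-f]{8} [0-9a-f]{8} (?:[0-9a-f]{8} ){3}"
    r"(?P<module>\w+)!(?P<symbol>\S.*)$"
)

# Assumption: a hand-picked set of Windows modules seen in this dump.
SYSTEM_MODULES = {"ntdll", "ole32", "comsvcs", "asp", "msvcrt", "kernel32"}

def modules_on_stack(stack_text):
    """Return module names in order of first appearance, top frame first."""
    seen = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("module") not in seen:
            seen.append(match.group("module"))
    return seen

def non_system_modules(modules):
    """Return the modules that are not in the hand-picked system set."""
    return [m for m in modules if m.lower() not in SYSTEM_MODULES]

# A few frames copied from the dump in this thread.
SAMPLE_STACK = """\
0447f9f4 776bcfce 00080000 00000000 00000004 ntdll!RtlAllocateHeap+0x1f5
0447fa08 776bcf3b 77796784 00000004 0447fa48 ole32!CRetailMalloc_Alloc+0x16
0447fa48 4a7ac2f1 00000004 00125b08 00125b44 comsvcs!SafeMalloc+0x12
0447fa80 4a7950a9 00000001 00000000 0447fabc comsvcs!Array<IContextNotify *>::setSize+0x77
0447fb24 709ea486 000bfcbc 70a33008 021a1f28 asp!CViperActivity::BindToThread+0x54
0447ffb8 77e64829 019d6cb8 00000000 00000000 msvcrt!_endthreadex+0xa3
0447ffec 00000000 77bcb4bc 019d6cb8 00000000 kernel32!BaseThreadStart+0x34
"""

mods = modules_on_stack(SAMPLE_STACK)
print("modules:", mods)
print("non-system:", non_system_modules(mods))
```

Running the same check across several dumps is more telling than a single one: a third-party module that keeps turning up on other threads or in other crashes is a stronger lead.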

Probably your best way forward now is to try to confirm your 'suspicions' - try to hammer a script that does PDF creation using the WAST tool, making sure that you generate PDF output of various sizes (including very large and very small).
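
If WAST is not to hand, the same idea can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the test page URL and its `pages` query parameter are hypothetical stand-ins for whatever script drives the PDF component, and it should be pointed at a test server, never production.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_status(url, timeout=30):
    """Return the HTTP status for url, or None if the connection failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # pull the whole PDF so the server finishes the job
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 500 or 503 while the pool is recycling
    except OSError:
        return None  # connection reset, refused or timed out

def hammer(base_url, page_counts, rounds=10, workers=8, fetch=fetch_status):
    """Request PDFs of varying sizes concurrently and collect failures.

    'pages' is a hypothetical query parameter controlling output size.
    """
    urls = [f"{base_url}?pages={n}" for _ in range(rounds) for n in page_counts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, urls))
    failures = [(u, s) for u, s in zip(urls, statuses) if s != 200]
    return len(urls), failures

# Example call (hypothetical test URL, uncomment to run):
# total, failures = hammer("http://test-server/makepdf.asp", [1, 50, 5000])
# print(total, "requests,", len(failures), "failures")
```

Matching the failure timestamps against the W3SVC 1009 events then shows whether the crashes follow the PDF load or occur independently of it.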

If at all possible, try migrating just one of your apps to an alternative product, and make sure it is in a unique app pool - then if that pool stops showing up that error, then you have more clues to work with.

Unfortunately, there is no simple answer in these situations.

Cheers!


Heap errors are always caused by code running inside the process, so I agree it is likely the component you suspect.

You mentioned load balancing.  If this component is storing state in memory and you are flipping sessions between servers that could be creating a lot of problems.

Have you tried setting affinity for the load balancing so a session returns to the same server?  That might fix the problem.
tedbilly: the load balancer is a smart balancer type - once it directs you to a server behind it, you stay on that server for the duration of the session.

meverest: yours has been the most helpful post thus far - I have ruled out hardware as the culprit. We actually switched to this PDF software from another product, and the other software had similar session loss problems - so maybe it's not the PDF software after all.

Does anyone think they could guide me in analyzing the crash dump? I would open that as a separate, concurrent question and credit you with both parts as an answer - I really think this question warrants more than 2000 points. I am far from an expert in this area, but I do have a CompSci degree and with that a knowledge of data structures and OS concepts.

I have some other crash dumps with similar output and a few others with pointer errors - this kind of makes me wonder what is flawed with Windows memory management...

I've used the debug tool before and had good results, but this time it is a heap corruption issue and a lot tougher to get to the bottom of it. Apparently though, these things are solvable

Thanks
ASKER CERTIFIED SOLUTION
meverest
Thank you gentlemen - Management did not respond to our request to have MS figure this out for us, so I guess we will just have to live with it.

These kinds of things just flare up my minor OCD condition ;)