Solved

ASP and App Pool Crashes in IIS 6

Posted on 2009-05-18
Last Modified: 2013-11-05
Environment: Windows Server 2003, IIS 6 with classic ASP and SQL Server 2005

We've been having these app pool crashes for the longest time. We've tried enabling debugging and using the Debugging Tools for Windows (MS). The furthest we've gotten is a message telling us that the heap was corrupted, but nothing more useful than that. The crashes are not isolated to a single ASP page. We've created several different app pools for the websites and it has helped a bit, but it drives me nuts to see these crashes.

It is not a hardware problem, as it happens on 5+ different servers that are in different locations in the US. We've also dug through the code to make sure it is clean - and it is, as far as we can tell.

Has anyone ever gotten to the bottom of an app pool crash?
// Samples for the error messages
 

Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'GAB' terminated unexpectedly. The process id was '4468'. The process exit code was '0xc0000005'.

Device: DFW-WEB2

Category: Server

Error Condition: Critical

Generated at: May 18,2009 11:58:28 AM
 

Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'ICS' terminated unexpectedly. The process id was '4972'. The process exit code was '0xc0000005'.

Device: DFW-WEB2

Category: Server

Error Condition: Critical

Generated at: May 18,2009 11:27:29 AM
 

Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'Demos.Vii' terminated unexpectedly. The process id was '36576'. The process exit code was '0xc0000005'.

Device: CAS-WS01

Category: Server

Error Condition: Critical

Generated at: May 13,2009 10:01:58 AM
 

Message: ID=1009 Source=W3SVC Type=2 Message=A process serving application pool 'Integrity' terminated unexpectedly. The process id was '5540'. The process exit code was '0xc0000005'.

Device: DFW-WEB1

Category: Server

Error Condition: Critical

Generated at: Apr 21,2009 10:34:54 AM
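
For anyone who wants to tally alerts like these per app pool, here is a minimal Python sketch. It assumes the alert text has been exported to a plain string in exactly the format quoted above; the function names are made up for illustration. The only exit code it names is 0xc0000005, which is the NTSTATUS value for an access violation (STATUS_ACCESS_VIOLATION).

```python
import re
from collections import Counter

# Matches the W3SVC event 1009 wording quoted in the samples above.
EVENT_1009 = re.compile(
    r"application pool '(?P<pool>[^']+)' terminated unexpectedly\. "
    r"The process id was '(?P<pid>\d+)'\. "
    r"The process exit code was '(?P<code>0x[0-9a-fA-F]+)'"
)

# Only the exit code seen in this thread; extend as needed.
KNOWN_EXIT_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_crashes(text):
    """Return one dict per event 1009 message found in text."""
    crashes = []
    for match in EVENT_1009.finditer(text):
        code = int(match.group("code"), 16)
        crashes.append({
            "pool": match.group("pool"),
            "pid": int(match.group("pid")),
            "exit_code": code,
            "meaning": KNOWN_EXIT_CODES.get(code, "unknown"),
        })
    return crashes


def crashes_per_pool(crashes):
    """Count crashes by application pool name."""
    return Counter(crash["pool"] for crash in crashes)
```

Feeding it a few weeks of alerts should show at a glance which pool is the worst offender, which is useful input for deciding what to isolate first.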

Question by:bleech677
 
LVL 37

Expert Comment

by:meverest
Hi,

these sorts of errors can be caused by a wide variety of events - making it very hard to provide a one-answer-suits-all kind of advice.

So rather than try to write a few paragraphs of detail, allow me to direct you to a discussion that treats this matter very well indeed.  And probably better than I can too! ;-)

http://blogs.msdn.com/david.wang/archive/2005/08/29/HOWTO_Understand_and_Diagnose_an_AppPool_Crash.aspx

Cheers.
 
LVL 3

Author Comment

by:bleech677
Yeah, I probably should have mentioned that the David Wang article is where I started - that's what got me to the heap corrupted message. It's a little scary, because the MS software that we enabled to diagnose the problem ended up taking down the server after 2 days.

I'm looking for someone who has diagnosed and fixed a similar problem in the past.
 
LVL 22

Expert Comment

by:cj_1969
 
LVL 22

Expert Comment

by:cj_1969
Just read further in the post ... apparently the accepted solution did not resolve the problem ... might be worth trying though to see if it helps or makes any difference.  If it does, then this might help narrow down where to go to identify the cause of the problem.
 
LVL 3

Author Comment

by:bleech677
Honestly, I think it is the ActivePDF software we use to output PDFs - but the vendor has been no help to us.
 
LVL 3

Author Comment

by:bleech677
I looked at the link - None of these web servers are DCs

Anyone ever been able to trace a problem at the heap level?

http://windowsitpro.com/article/articleid/22275/heap-corruption-part-1.html
 
LVL 22

Expert Comment

by:cj_1969
Is this embedded in the app or is this a different web app on the same server?
If it is a different app on the same server then try creating a new application pool for it.  I believe this will isolate its execution and memory allocation from other applications in different app pools.  If this is the offending app then this should resolve the problem for your other apps.
 
LVL 3

Author Comment

by:bleech677
ActivePDF is 3rd-party software - it is invoked with Server.CreateObject(). It's not a web application itself, so I can't isolate it in its own app pool. I have, however, given all these web apps their own pools - the worst offender for these crashes makes the heaviest use of the ActivePDF objects.
 
LVL 22

Expert Comment

by:cj_1969
I don't know if this will help but you could try looking at the example code on the following page and see if maybe there is something you are missing in your code ... http://www.activepdf.com/support/knowledgebase/viewKB.cfm?tk=kb&id=10543

A mis-referenced variable or something that is not implemented per their methodology MIGHT cause a problem.  Changing the order in which the code processes objects and does things could make a difference, given the description of the heap error in the article you referenced.
 
LVL 37

Expert Comment

by:meverest
Hi,

>> was supposed to diagnose the problem ended up taking down the server after 2 days

you should never run diagnostic services on a production web server! :-o

at least not for two days anyway ;-)

Is this tool DebugDiag?  If you look through the crash dump output, you should soon be able to see whether it is the ActivePDF component - which I consider relatively likely.  It sure looks suspicious the way you have described it.

I guess the real issue then is what to do about it - if the vendor won't fix it or support it properly, it may be a good idea to seek alternatives.

Also, to be sure of it, consider using a load test tool to hammer the ActivePDF object if you can, and see how often it dies.  Take a look at WAST (the Microsoft Web Application Stress Tool):

http://www.microsoft.com/downloads/details.aspx?familyid=e2c0585a-062a-439e-a67d-75a89aa36495&displaylang=en

Cheers.
 
LVL 3

Author Comment

by:bleech677
meverest - I will look into it. I know it was a bad idea to put this tool on the live server, but IT decided to go with it. Luckily I convinced them to put it on only 1 of the 2 servers in the load balance / failover config.

Here is one of the crash minidumps processed by WinDbg. At this point I'm trying to figure out how to use / interpret the info:

*****************************************
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(12f8.1d4c): Access violation - code c0000005 (first/second chance not available)
eax=7767c30c ebx=00080000 ecx=00000004 edx=3d1b001f esi=7767c33c edi=7767c334
eip=7c82a0d0 esp=0447f7d8 ebp=0447f9f4 iopl=0         nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010216
ntdll!RtlAllocateHeap+0x1f5:
7c82a0d0 884706          mov     byte ptr [edi+6],al        ds:0023:7767c33a=69
 
LVL 3

Author Comment

by:bleech677
0:038> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: kernel32!pNlsUserInfo                         ***
***                                                                   ***
*************************************************************************

FAULTING_IP:
ntdll!RtlAllocateHeap+1f5
7c82a0d0 884706          mov     byte ptr [edi+6],al

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 7c82a0d0 (ntdll!RtlAllocateHeap+0x000001f5)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000001
   Parameter[1]: 7767c33a
Attempt to write to address 7767c33a

DEFAULT_BUCKET_ID:  HEAP_CORRUPTION

PROCESS_NAME:  w3wp.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

WRITE_ADDRESS:  7767c33a

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

ADDITIONAL_DEBUG_TEXT:  Enable Pageheap/AutoVerifer

FAULTING_THREAD:  00001d4c

PRIMARY_PROBLEM_CLASS:  HEAP_CORRUPTION

BUGCHECK_STR:  APPLICATION_FAULT_HEAP_CORRUPTION

LAST_CONTROL_TRANSFER:  from 776bcfce to 7c82a0d0

STACK_TEXT:  
0447f9f4 776bcfce 00080000 00000000 00000004 ntdll!RtlAllocateHeap+0x1f5
0447fa08 776bcf3b 77796784 00000004 0447fa48 ole32!CRetailMalloc_Alloc+0x16
0447fa18 4a7160a7 00000004 00125b08 77d045b0 ole32!CoTaskMemAlloc+0x13
0447fa48 4a7ac2f1 00000004 00125b08 00125b44 comsvcs!SafeMalloc+0x12
0447fa80 4a7950a9 00000001 00000000 0447fabc comsvcs!Array<IContextNotify *>::setSize+0x77
0447fa90 4a7997c0 0447faac 0447fadc 0447fb0c comsvcs!Array<IContextNotify *>::append+0x12
0447fabc 4a75579e 00125b08 00000000 709e0009 comsvcs!CUserProps::SetProperty+0xb6
0447faec 709ea549 00000001 000bfcbc 709e0009 comsvcs!CContext::SetProperty+0x81
0447fb24 709ea486 000bfcbc 70a33008 021a1f28 asp!CViperActivity::BindToThread+0x54
0447fb40 709e26f3 01ee1e90 025a20c8 025a2270 asp!ViperAttachIntrinsicsToContext+0x61
0447fb98 709e244a 00000000 00000000 0012ee90 asp!CHitObj::ViperAsyncCallback+0x30e
0447fbb4 4a77b5ea 02222098 0008bcb0 0447fd74 asp!CViperAsyncRequest::OnCall+0x92
0447fbd0 77720d30 0012ee90 000d52d8 00000000 comsvcs!CSTAActivityWork::STAActivityWorkHelper+0x32
0447fc1c 777217dc 00000000 000d52d8 4a77b5b8 ole32!EnterForCallback+0xc4
0447fd7c 776f03b4 0447fc54 4a77b5b8 0012ee90 ole32!SwitchForCallback+0x1a3
0447fda8 7769c194 000d52d8 4a77b5b8 0012ee90 ole32!PerformCallback+0x54
0447fe40 7772433a 0008bcb0 4a77b5b8 0012ee90 ole32!CObjectContext::InternalContextCallback+0x159
0447fe60 4a77b78c 0008bcb0 4a77b5b8 0012ee90 ole32!CObjectContext::DoCallback+0x1c
0447fecc 4a77bcf2 0010c218 0010c1f8 000e1a44 comsvcs!CSTAActivityWork::DoWork+0x12d
0447fee4 4a77c7de 0012ee90 00000001 0010c1f8 comsvcs!CSTAThread::DoWork+0x18
0447ff04 4a77cabf 00000000 018f2460 019d6cb8 comsvcs!CSTAThread::ProcessQueueWork+0x37
0447ff84 77bcb530 0010c1f8 00000000 00000000 comsvcs!CSTAThread::WorkerLoop+0x190
0447ffb8 77e64829 019d6cb8 00000000 00000000 msvcrt!_endthreadex+0xa3
0447ffec 00000000 77bcb4bc 019d6cb8 00000000 kernel32!BaseThreadStart+0x34


SYMBOL_NAME:  heap_corruption!heap_corruption

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: heap_corruption

IMAGE_NAME:  heap_corruption

DEBUG_FLR_IMAGE_TIMESTAMP:  0

STACK_COMMAND:  ~38s; .ecxr ; kb

FAILURE_BUCKET_ID:  HEAP_CORRUPTION_c0000005_heap_corruption!heap_corruption

BUCKET_ID:  APPLICATION_FAULT_HEAP_CORRUPTION_heap_corruption!heap_corruption

Followup: MachineOwner
---------

 
LVL 37

Expert Comment

by:meverest
Hi,

OK - the first obvious detail provided there is that the fault is raised inside ntdll.dll, the core user-mode system library that, among other things, implements the Windows heap manager.  You can see it in two places:  at the very top of the dump ("FAULTING_IP:") and also at the top of the stack listing.  The stack listing shows the thread's entry point at the bottom, followed by each function called in turn, with the most recent call at the top.  You will see that the calls in between are COM (ole32), COM+ (comsvcs) and ASP (asp) functions; no third-party module appears in this thread's stack.

The problem is apparently thrown during a memory allocation function (ntdll!RtlAllocateHeap+1f5).  The error detail says that it is a memory corruption issue - that could be caused by faulty hardware (RAM) but is more likely caused by some code improperly writing to memory it does not own, for example a buffer overflow or a write through a pointer that has already been freed.

Since a PDF module is often just a virtual printer driver, this still *could* be caused by the ActivePDF component you suspect, but unfortunately unless the name of that binary shows up in the stack listing, we can neither 'confirm nor deny' at this stage.
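
Checking a stack listing for unfamiliar binaries can be automated across many dumps. The Python sketch below is only an illustration: it assumes the STACK_TEXT has been copied out of the `!analyze -v` output as plain text in the 32-bit layout shown above (five hex columns, then `module!symbol`), and the `SYSTEM_MODULES` set is my own list of the Windows/IIS components that appear in this particular dump, not a complete inventory.

```python
import re

# One frame per line: ChildEBP, RetAddr, three args, then module!symbol+offset.
FRAME = re.compile(
    r"^(?:[0-9a-f]{8}\s+){5}(?P<module>[^!\s]+)!(?P<symbol>.+)$",
    re.IGNORECASE,
)

# Assumption: the components seen in this dump that ship with Windows/IIS.
SYSTEM_MODULES = {"ntdll", "ole32", "comsvcs", "asp", "msvcrt", "kernel32"}


def modules_in_stack(stack_text):
    """Return module names in order of first appearance, top frame first."""
    seen = []
    for line in stack_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            name = match.group("module").lower()
            if name not in seen:
                seen.append(name)
    return seen


def third_party_modules(stack_text, system=SYSTEM_MODULES):
    """Return the modules in the stack that are not in the system set."""
    return [name for name in modules_in_stack(stack_text) if name not in system]
```

Keep in mind that an empty result does not clear a component: heap corruption is usually detected some time after the bad write, often on an unrelated thread, so the culprit may have long since left the stack.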

Probably your best way forward now is to try to confirm your suspicions - try to hammer a script that does PDF creation using the WAST tool, making sure that you generate PDF output of various sizes (including very large and very small).
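
If WAST is not to hand, the same idea can be roughed out in a few lines of Python. This is a stand-in sketch, not a replacement for a real load tool: the `hammer` and `http_fetch` names are made up here, and the URLs you pass in (for example a test page with a hypothetical parameter that controls the size of the generated PDF) are yours to supply. Point it at a test server, not production.

```python
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url, timeout=30):
    """Fetch url and return the HTTP status code (raises on errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def hammer(urls, fetch=http_fetch, workers=8, rounds=10):
    """Request every URL `rounds` times across `workers` threads.

    Returns a Counter of outcomes: status codes for completed requests
    and exception class names for requests that failed.
    """
    def attempt(url):
        try:
            return fetch(url)
        except Exception as exc:
            # Dropped connections and HTTP error responses both land here.
            return type(exc).__name__

    jobs = [url for _ in range(rounds) for url in urls]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(attempt, jobs))
```

Run it against one small and one very large PDF request, then compare the failure counts with the W3SVC 1009 events logged during the run.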

If at all possible, try migrating just one of your apps to an alternative product, and make sure it is in a unique app pool - then if that pool stops showing up that error, then you have more clues to work with.

Unfortunately, there is no simple answer in these situations.

Cheers!


 
LVL 51

Expert Comment

by:tedbilly
Heap errors are always caused by code running inside the process, so I agree it is likely the component you suspect.

You mentioned load balancing.  If this component is storing state in memory and you are flipping sessions between servers that could be creating a lot of problems.

Have you tried setting affinity for the load balancing so a session returns to the same server?  That might fix the problem.
 
LVL 3

Author Comment

by:bleech677
tedbilly: the load balancer is a smart balancer - once it directs you to a server behind it, you stay on that server for the duration of the session.

meverest: yours has been the most helpful post thus far - I have ruled out hardware as the culprit. We actually switched to this PDF software from another product, and the other software had similar session loss problems - so maybe it's not the PDF software after all.

Does anyone think they could guide me in analyzing the crash dump? I would open another concurrent question and credit you with both parts as an answer - I really think this question warrants more than 2000 points. I am far from an expert in this area, but I do have a CompSci degree and with it some knowledge of data structures and OS concepts.

I have some other crash dumps with similar output and a few others with pointer errors - this kind of makes me wonder what is flawed with Windows memory management...

I've used the debug tool before and had good results, but this time it is a heap corruption issue and a lot tougher to get to the bottom of. Apparently, though, these things are solvable.

Thanks
 
LVL 37

Accepted Solution

by:
meverest earned 400 total points
g'day,

I don't think this is necessarily a flaw in Windows memory management - unless you consider supporting legacy systems to be a flaw ;-)

I expect that it is pretty hard to keep a platform open to third party software titles without risking that they will try writing to null pointers or (worse) uninitialised buffers etc.  That is most likely the cause of the kinds of errors you are seeing.

If you want to accurately track down the culprit, then you'll need to be doing some serious debugging so that when the crashdump report quotes a memory address location, then you'll at least be able to finger the process that did it.

And that's not something that you want to be doing on a production system either! ;-)

I suspect that this is something that you will have a hard time getting done by offering points in a public forum - perhaps you need to bite the bullet and engage the services of a professional services provider in this field.  If it's not worth spending money on, then it probably isn't that critical anyhow, I suppose.

Cheers!
 
LVL 51

Assisted Solution

by:tedbilly
tedbilly earned 100 total points
I agree with meverest.  Your time is worth money and something like this can drag out for a long time.  The problem is that the crash dump shows the symptom, not the disease, because the thread that is crashing is nested deep within the IIS application pool.

It's like trying to figure out the make of car that ran over a squirrel on the road.

The only way I've managed to fix issues like this in the past is by subdividing the application into application pools (divide and conquer) and the process of elimination.  The one problem is that if sessions are stored in memory, you first have to move the sessions to an IIS state service before you can subdivide the application strategically.

If you divide and conquer you could find the offending page quite quickly.  For example, if you split 20 pages into two application pools, then if one half crashes you are down to 10 pages.  Subdivide that and you are down to five, subdivide again and you are down to 2 or 3 ...
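
The halving described above is an ordinary bisection, sketched here in Python. The `crashes` callback is hypothetical: it stands in for "move this group of pages into its own app pool, wait, and report whether that pool crashed". The sketch also assumes exactly one page is at fault and that it crashes reliably within the observation window, which may well not hold for intermittent heap corruption.

```python
def find_offender(pages, crashes):
    """Narrow `pages` down to a single offender by repeated halving.

    `crashes(group)` must return True if the pool holding `group` crashed.
    Returns (offending_page, number_of_rounds).
    """
    candidates = list(pages)
    if not candidates:
        raise ValueError("need at least one page")
    rounds = 0
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half, second_half = candidates[:middle], candidates[middle:]
        # If the first pool stayed up, the offender must be in the second.
        candidates = first_half if crashes(first_half) else second_half
        rounds += 1
    return candidates[0], rounds
```

With 20 pages this takes 4 or 5 rounds, so the real cost is the waiting time per round rather than the number of splits.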
 
LVL 3

Author Closing Comment

by:bleech677
Thank you, gentlemen - Management did not respond to our request to have MS figure this out for us - I guess we will just have to live with it.

These kinds of things just flare up my minor OCD condition ;)