Gunwant Saini
asked on
IIS Pool Crashing
Hello Experts,
We are facing a peculiar problem with IIS.
First, let me explain the scenario.
We host an ASP.NET site using .NET 4 on IIS 8 on Windows Server 2012, with SQL Server 2012 as the database.
The website/portal is hosted on 4 servers behind a hardware load balancer.
It's a simple website that asks the user for information to fill in a form and lets them upload some documents, including a photo/image, which are saved in the database as byte arrays.
It generally works properly, but for the past few days the IIS application pool has been crashing on its own, and we don't know whether this is related to the code or to something else. We have verified the code and it works properly, but we are still unable to diagnose what or where exactly the problem is.
Restarting IIS with iisreset or recycling the pool resets the website, and then it works properly again.
We have tried DebugDiag and WinDbg, but were unable to identify the root cause of the crash.
The crash occurs randomly.
On the local development server the website works properly, but on the production machines the problem occurs at random intervals.
Please help and suggest.
Thanks
ASKER
Hello,
Actually, the website/portal is running in its own unique pool, in InProc mode by default.
I also viewed the event logs at the time of the crash, and the entries logged are:
"A worker process '3324' serving application pool 'DefaultAppPool' failed to stop a listener channel for protocol 'http' in the allotted time. The data field contains the error number." with Event ID: 5138
And then this one:
"A process serving application pool 'DefaultAppPool' exceeded time limits during shut down. The process id was '3324'." with Event ID: 5013
Actually, the error number is not present in the data field either.
I also tried DebugDiag and WinDbg, but was unable to extract meaningful information about the crash.
I know that the problem is in the code, but I am unable to find the problematic code that causes the crash.
Please help.
Hi
The errors concern the DefaultAppPool; using it is not best practice.
If this is by choice, please check whether the DefaultAppPool is configured to use .NET 4
(in IIS 7 it is configured to use .NET 2.0 by default).
Better practice would be to create a new application pool named after the website, for easier recognition and troubleshooting.
For each application under this website you would create another app pool.
Example:
www.website.com would use app pool www.website.com created for .NET 4
www.website.com\portal would use app pool www.website.com.portal created for .NET 4
www.website.com\portal\V2 would use app pool www.website.com.portal.V2 created for .NET 4
If you keep to this principle, you will see that it becomes a lot easier to troubleshoot code failures.
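The naming convention in the example above is mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the function name `pool_name_for` is made up here, and it only derives the name; it does not create anything in IIS.

```python
def pool_name_for(app_path: str) -> str:
    """Derive an application pool name from a site/application path.

    Follows the convention from the example: path separators become
    dots, so every application maps to one predictable pool name.
    """
    # Accept both "\" (as written in the example) and "/" as separators.
    parts = [p for p in app_path.replace("\\", "/").split("/") if p]
    return ".".join(parts)


if __name__ == "__main__":
    for path in (r"www.website.com",
                 r"www.website.com\portal",
                 r"www.website.com\portal\V2"):
        print(f"{path} -> {pool_name_for(path)}")
```

With one pool per application, the pool name that shows up in an event log entry identifies the failing application directly.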
BTW: How did you configure DebugDiag? Crash only, or did you create rules?
ASKER
Hi,
Correct, we have verified that it is set to .NET 4. As per my understanding, after IIS is installed on Windows Server 2012, it is configured for .NET 4 by default.
We have created a separate application pool per website, but the problem is that this "DefaultAppPool" pool is crashing frequently.
We have tried going through the logs, but were unable to identify the root cause.
Sometimes the pool gives a 503 error; sometimes it hangs, i.e. the site becomes unresponsive.
Thanks for the same.
Hi again,
OK, in IIS 8 the standard pools are created for .NET 4, lesson learned.
So DefaultAppPool is bound to a website; under IIS -> Application Pools you can see which website it is bound to, correct?
How did you configure DebugDiag? Crash only, or did you create rules?
ASKER
Hi,
Only 1 website is configured in the DefaultAppPool.
In DebugDiag, I created rules to identify the crashes. As a result, DebugDiag created lots of files in the configured folder, but they led to nothing: they are small files of around 250MB, which I don't think are, or include, the crash dumps related to IIS.
Please let us know how to track an IIS app pool crash, a hang, or an unresponsive site through DebugDiag, so that the problem can be analysed properly.
ASKER
Hi,
I am also listing some of the Event Viewer entries logged on the production servers.
1. A process serving application pool 'DefaultAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '1064'. The data field contains the error number. : Event ID 5011
2. Application pool 'DefaultAppPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool. : Event ID 5002
3. A process serving application pool 'DefaultAppPool' failed to respond to a ping. The process id was '9120'. : Event ID 5010
4. A process serving application pool 'DefaultAppPool' exceeded time limits during shut down. The process id was '1880'. : Event ID 5013
5. A worker process '1880' serving application pool 'DefaultAppPool' failed to stop a listener channel for protocol 'http' in the allotted time. The data field contains the error number. : Event ID 5138
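To see which failure mode dominates, entries like the five above can be tallied by event ID. The following is a minimal Python sketch, assuming the entries have been pasted into a list of strings in the same "... : Event ID NNNN" form used in this post. The short labels are paraphrased from the quoted messages, and the names `EVENT_LABELS` and `summarize` are made up for illustration.

```python
import re

# Short labels paraphrased from the messages quoted in the post.
EVENT_LABELS = {
    5002: "pool automatically disabled after a series of failures",
    5010: "worker process failed to respond to a ping",
    5011: "fatal communication error with WAS",
    5013: "worker process exceeded time limits during shut down",
    5138: "worker process failed to stop a listener channel in time",
}

EVENT_ID = re.compile(r"Event\s*ID\s*:?\s*(\d+)", re.IGNORECASE)
PROCESS_ID = re.compile(r"process(?: id was)? '(\d+)'")

SAMPLE = [
    "1. A process serving application pool 'DefaultAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '1064'. The data field contains the error number. : Event ID 5011",
    "2. Application pool 'DefaultAppPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool. : Event ID 5002",
    "3. A process serving application pool 'DefaultAppPool' failed to respond to a ping. The process id was '9120'. : Event ID 5010",
    "4. A process serving application pool 'DefaultAppPool' exceeded time limits during shut down. The process id was '1880'. : Event ID 5013",
    "5. A worker process '1880' serving application pool 'DefaultAppPool' failed to stop a listener channel for protocol 'http' in the allotted time. The data field contains the error number. : Event ID 5138",
]


def summarize(lines):
    """Map each event ID found in `lines` to the set of process IDs seen."""
    summary = {}
    for line in lines:
        event = EVENT_ID.search(line)
        if not event:
            continue  # not an event line
        pids = summary.setdefault(int(event.group(1)), set())
        pid = PROCESS_ID.search(line)
        if pid:
            pids.add(int(pid.group(1)))
    return summary


if __name__ == "__main__":
    for event_id, pids in sorted(summarize(SAMPLE).items()):
        label = EVENT_LABELS.get(event_id, "unknown")
        print(f"{event_id}: {label} (pids: {sorted(pids)})")
```

Grouping by process ID shows, for instance, that events 5013 and 5138 here both belong to worker process 1880, i.e. a single shutdown that timed out rather than two separate failures.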
Hi
The steps Microsoft provides to create rules and generate a dump file are documented in this Microsoft link.
ASKER
Sir,
I have done all of that and gone through all the links, but I still don't know what exactly I am missing. I am unable to accurately identify the root cause of the pool crash and hang.
Please suggest.
ASKER
Sir,
Please let me know why identifying and resolving issues with IIS is so complex, because nowadays it is crashing frequently. I am unable to make progress on resolving the issue.
Please help.
Hi again,
I know troubleshooting application pools can be hard; therefore it is best practice not to use the DefaultAppPool but to create a unique application pool for each website and for the applications below it. This way you can examine the logs more accurately.
So find out which site/application is using the DefaultAppPool and give it a unique pool.
ASKER
Sir,
Please believe me, I have done that too. A screenshot is attached for your reference.
AppPool.png
Hi
Under DefaultAppPool it says the number is 1. Which application is it bound to?
ASKER
Sir,
The one that is crashing, where the problem is, and which I am trying to debug for the pool crash.
Thanks for your time.
Hi
I had to re-read this thread and now I wonder: does this happen on all 4 servers or just one?
ASKER
Sir,
All of them, randomly and all of a sudden: Server1, Server2, Server3, or Server4.
Then we either recycle the app pool or execute iisreset, which gets the site running normally again.
Hi again,
OK, so it is an application issue for sure. Can you export some errors from Event Viewer from around the last time this happened and post them here?
If you are able, I would also like to see the IIS website logs from around that time, to check whether it might be one particular call that is failing.
Of course, you can obfuscate private info in the logs.
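One way to check for a single failing call is to summarise the IIS website log per URL. The sketch below is a hedged illustration in Python: it assumes the site writes W3C-format logs and that the `cs-uri-stem`, `sc-status`, and `time-taken` fields are enabled. The sample lines and the function name `summarize_iis_log` are invented for the example and are not taken from the asker's servers.

```python
from collections import defaultdict

# Invented sample in W3C log layout; real logs usually carry more fields.
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2014-01-01 10:00:00 GET /Default.aspx 200 120
2014-01-01 10:00:01 POST /Upload.aspx 500 30015
2014-01-01 10:00:02 POST /Upload.aspx 200 8000
""".splitlines()


def summarize_iis_log(lines):
    """Per URL: request count, 5xx count, and slowest time-taken (ms)."""
    fields = []
    stats = defaultdict(lambda: {"requests": 0, "errors_5xx": 0, "max_ms": 0})
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blanks, or rows before a header
        row = dict(zip(fields, line.split()))
        entry = stats[row["cs-uri-stem"]]
        entry["requests"] += 1
        if row["sc-status"].startswith("5"):
            entry["errors_5xx"] += 1
        entry["max_ms"] = max(entry["max_ms"], int(row["time-taken"]))
    return dict(stats)


if __name__ == "__main__":
    for url, entry in sorted(summarize_iis_log(SAMPLE_LOG).items()):
        print(url, entry)
```

A URL that accounts for most of the 5xx responses, or whose `time-taken` is far above the rest around the time of a hang, would be the "one particular call" to look at first.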
ASKER
Sir,
The logs from the Event Viewer are already posted above. There's no private info there. I am desperate to resolve this issue, which may or may not be occurring in the code.
Here are some of the logs:
1. A process serving application pool 'DefaultAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '1064'. The data field contains the error number. : Event ID 5011
2. Application pool 'DefaultAppPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool. : Event ID 5002
3. A process serving application pool 'DefaultAppPool' failed to respond to a ping. The process id was '9120'. : Event ID 5010
4. A process serving application pool 'DefaultAppPool' exceeded time limits during shut down. The process id was '1880'. : Event ID 5013
5. A worker process '1880' serving application pool 'DefaultAppPool' failed to stop a listener channel for protocol 'http' in the allotted time. The data field contains the error number. : Event ID 5138
OK,
One thing you can try is ADPlus. How to get it and how to work with it is described here.
Also, are you able to show the output from WinDbg?
ASKER
Sir,
I tried that too: I installed everything and followed all the steps before posting here. What actually happens is that sometimes the whole of IIS goes down: the pool is running but the site is not responding, i.e. it becomes unresponsive, and then ADPlus yields nothing. It did create dumps, but those do not provide information about the CLR stack etc.
So if you load those dumps into WinDbg, isn't there any lead?
Next, are there signs of memory leaks in the error logs (besides what you posted here)?
ASKER
Hi,
I have since deleted the file where I had written the call stack from the memory dumps, so I don't have it; I will have to regenerate it.
BTW, how do I identify memory leaks through the error logs? I have searched Google, but could not find out how to identify memory leaks.
Please suggest.
Hi
You would filter on the service, e.g. W3SVC, and look for entries like this:
Event Type: Information
Event Source: W3SVC
Event Category: None
Event ID: 1077
Date: Date
Time: Time
User: N/A
Computer: ComputerName
Description:
A worker process with process id of '1234' serving application pool 'DefaultAppPool' has requested a recycle because it reached its virtual memory limit.
Did you have your website validated? Perhaps this link to validator.w3.org can help if there is an obvious mistake in the code.
ASKER
Sir,
I have checked for Event ID 1077 in Event Viewer on one of the servers; there is no such entry, and even the text "its virtual memory limit" is not logged.
What does that signify?
We have not validated the site, but the issues are not cropping up on the development and test servers.
Thanks
Hi
No memory leak is good. It points me in the direction of wicked SQL queries: the test environment is not showing any signs of bad coding, but it is also not heavily tested. Or did you actually perform a realistic load test? (I guess not, so you haven't seen a DoS due to bad SQL queries.)
In other words, if one query takes 2 seconds to execute and it is kicked off 3 times per second, what will eventually happen?
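The arithmetic behind that question can be sketched in Python. With 3 queries arriving per second and each one running for 2 seconds, about 3 x 2 = 6 queries must be executing at the same time just to keep up. The number of queries the database can actually run concurrently (`concurrent_slots` below) is an assumed figure for illustration, not something measured on the asker's servers.

```python
def required_concurrency(arrivals_per_sec, query_secs):
    """Average number of queries in flight needed to keep up."""
    return arrivals_per_sec * query_secs


def backlog_after(seconds, arrivals_per_sec=3.0, query_secs=2.0,
                  concurrent_slots=4):
    """Queries left waiting after `seconds` of steady load.

    `concurrent_slots` is a hypothetical limit on how many queries
    the database can execute at once.
    """
    completions_per_sec = concurrent_slots / query_secs
    growth_per_sec = max(0.0, arrivals_per_sec - completions_per_sec)
    return growth_per_sec * seconds


if __name__ == "__main__":
    print("needed in flight:", required_concurrency(3.0, 2.0))
    for slots in (4, 6):
        waiting = backlog_after(60, concurrent_slots=slots)
        print(f"{slots} slots -> {waiting:.0f} queries waiting after 60 s")
```

Under these assumptions, anything below 6 concurrent executions means the queue of waiting queries grows without bound, and the requests waiting on those queries pile up on the web server as well.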
There are two things you can do. First, have a tool like WAPT simulate, in the test environment, the same number of users you serve in production (on one server, that is) and monitor the SQL behaviour (run a live trace with Activity Monitor). *Pretty sure it will break.*
Second, monitor your production database and look for execution plans that are way too expensive, and also watch SQL Server's Activity Monitor.
For me the conclusion (for now) is that you have a website which, under load, creates a DoS attack on your own database, under which it cracks.
ASKER
Hello Sir,
Agreed. For the past 2-3 days we have been spending time optimizing the SQL queries, and we have improved the performance. Since then, for whatever reason, the crashes are occurring less often than before.
Still, I have tried certain steps using cdb.exe to generate IIS hang dumps. The dumps are created, but how do I then identify the block of code where the problem is?
I have opened the dumps in WinDbg, and all of them point to ntdll.dll as the place where the crash/issue is occurring. Now I am confused as to whether the problem is in IIS or in ntdll.dll, i.e. the Windows OS.
Thanks for your time.
Hi again,
So after profiling, performance has become quicker and more steady; you are on the right trail.
About ntdll.dll, I tend not to worry, as it is a user-mode DLL which tends to generate a lot of false positives. In the cases where it was correctly complaining, I noticed out-of-memory errors.
Summing up, if IIS is generating too much noise on the database, then there is only one way: profile, profile, profile until you have sanitised all rogue queries.
If you now monitor SQL usage in Activity Monitor, it will show you the most expensive queries, from which you can determine which application is issuing them.
ASKER
Hello Sir,
BTW, please let me know the steps we can follow to identify the exact cause of the problem, even if we think the queries being executed on the DB server have been optimized.
How do we identify what is slowing down the page responses on IIS, even after applying all the configurations?
I mean, how do we identify the function, the call stack, why the page is slow, etc.?
I am done with DebugDiag, WinDbg, and ADPlus, as they require their own expertise.
Please suggest.
My gut feeling is that the issue is not on your IIS side; your SQL transactions are taking too long. Please check the SQL logs, locks, etc.
ASKER
We used various methods, such as reducing code complexity and rectifying the logic, to resolve the pool crash. It was also somewhat related to the DB.
Did you create unique application pools, or are all running in the default one?