Solved

DL380 Random Reboot after Win2003 SP2 is Applied

Posted on 2009-04-07
Last Modified: 2012-05-06
Almost every morning, 3 of my DL380 servers reboot. This didn't start until after we applied Win 2003 SP2. I have other DL380s that are rock solid and are still running only SP1. There is little to no information in the Event Log on the servers before the reboots happen. (In fact, I don't think they are rebooting cleanly, but rather just blue screening and coming back up.) The only thing I see in the Event Log is "The previous system shutdown at 5:27:59 AM on 4/7/2009 was unexpected." EventID: 6008.
I have this happening on 3 servers since applying SP2. One of them has an upgraded CPU and memory, so I don't think this is hardware related.
I've also tried the "Disabling the RSS and TCP Offload" workaround for the NICs that some people have suggested, and that didn't resolve the problem. All 3 servers have the most recent BIOS firmware; one is running the newest HP Network Config Utility and the other two are using NCU 7. Two of the servers are single-CPU 2.8 GHz machines and the third is a 3.06 GHz. All of the servers have 4 GB of memory, but only one has newly replaced memory.
All of these servers are production systems and I need this resolved ASAP. I would rather not roll back Win2003 SP2 because I don't know how badly it's going to screw up these boxes. I know this is a widely known issue and that HP isn't going to give me much support. Does anyone have any suggestions for a workaround here? Please help.
Question by:Infinityinfo
5 Comments
 
LVL 2

Expert Comment

by:potva03
ID: 24090968
Which generation of servers are these, and which version of Windows is installed?

It is always recommended to update the PSPs and firmware before updating OS service packs.
Is it rebooting instantaneously?
Any ASR or POST errors?

Try updating the PSPs and firmware to the latest version supported by HP.

We can try updating the firmware, and if that does not resolve the issue, roll back SP2, then install the PSPs and then the drivers. Don't install the latest PSP; install 8.15.

 
LVL 4

Accepted Solution

by:madzanta (earned 500 total points)
ID: 24104960
I would suggest you begin by updating your servers using the PSP (ProLiant Support Pack).
When you have done this you could deactivate ASR and activate a full memory dump.
Then the next time your server crashes it will generate a dump file, which can be very useful
when troubleshooting BSODs.


Upgrade server using PSP
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=316529&prodNameId=3288130&swEnvOID=1005&swLang=8&mode=2&taskId=135&swItem=MTX-e397839de9eb40508728fb40ff 

Deactivate ASR (Automatic Server Recovery) from the BIOS and/or the HP SMH (System Management Homepage).

Make sure your pagefile is large enough for the dump (the amount of RAM you have + 1 MB)
Right-click My Computer -> Properties -> Advanced -> Performance Settings -> Advanced -> Virtual memory Change -> enter the desired value (the MS recommendation of 1.5 times your RAM would be fine here)
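
As a quick check of the sizing rule above, here is a minimal sketch that works out both figures for a given amount of RAM. The function name is made up for illustration; the only inputs are the two rules already stated (RAM + 1 MB as the minimum, 1.5 times RAM as the recommended value).

```python
def pagefile_sizes_mb(ram_mb):
    """Return (minimum, recommended) pagefile sizes in MB for a complete memory dump.

    Hypothetical helper: it only encodes the two rules from the post above.
    """
    minimum = ram_mb + 1           # amount of RAM + 1 MB
    recommended = ram_mb * 3 // 2  # 1.5 times RAM
    return minimum, recommended


# A server with 4 GB (4096 MB) of RAM, as in the question
print(pagefile_sizes_mb(4096))  # (4097, 6144)
```

So for the 4 GB servers in the question, anything from 4097 MB upward should hold the dump, and 6144 MB matches the 1.5x recommendation.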

Activate full dump
Right-click My Computer -> Properties -> Advanced -> Startup and Recovery Settings -> under "Write debugging information" choose Complete memory dump
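
If you have several servers to change, the same setting can probably be applied through the registry instead of the dialog. The sketch below only builds the text of a .reg file; it assumes the usual CrashControl key, where CrashDumpEnabled = 1 selects a complete memory dump (2 is kernel, 3 is small) and AutoReboot controls the restart after a crash. The function name and defaults are made up for illustration, so compare the output against a server you configured by hand before importing it anywhere, and expect to need a restart for it to take effect.

```python
CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"


def crash_dump_reg_text(dump_type=1, auto_reboot=True):
    """Build .reg file text for the crash dump settings (hypothetical helper).

    dump_type: 1 = complete memory dump, 2 = kernel dump, 3 = small dump.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + CRASH_CONTROL_KEY + "]",
        '"CrashDumpEnabled"=dword:%08x' % dump_type,
        '"AutoReboot"=dword:%08x' % (1 if auto_reboot else 0),
    ]
    # .reg files conventionally use Windows line endings
    return "\r\n".join(lines) + "\r\n"


print(crash_dump_reg_text())
```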

Download and install the Debugging Tools for Windows from Microsoft
http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx


Now, the next time your server BSODs it will generate a dump file called memory.dmp, which will be placed in your Windows directory.
So after the crash, restart your server and launch WinDbg (part of the debugging tools you installed earlier).
Now go to File -> Symbol File Path and enter srv*c:\temp\symbols*http://msdl.microsoft.com/download/symbols
Then go to File -> Open Crash Dump, open memory.dmp, and once it has loaded type !analyze -v at the command prompt.
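
The menu steps above can also be done in one go from the command line, since WinDbg accepts -y (symbol path), -z (dump file) and -c (initial command). This is a rough sketch that just assembles that command; the function name is made up, and it assumes windbg.exe is on the PATH and that the dump sits in C:\WINDOWS, so adjust both for your install.

```python
def windbg_command(dump_path, symbol_cache=r"c:\temp\symbols", windbg_exe="windbg.exe"):
    """Build the argument list to open a crash dump and run !analyze -v.

    Hypothetical helper; windbg_exe and symbol_cache are assumptions.
    """
    # Same symbol path as entered under File -> Symbol file path
    symbol_path = "srv*%s*http://msdl.microsoft.com/download/symbols" % symbol_cache
    return [windbg_exe, "-y", symbol_path, "-z", dump_path, "-c", "!analyze -v"]


cmd = windbg_command(r"C:\WINDOWS\MEMORY.DMP")
print(cmd)
# To launch it on the server: import subprocess; subprocess.call(cmd)
```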

When the analysis is done, look for PROCESS_NAME and/or IMAGE_NAME and see what they say.
Hopefully you will find the exe of some software or a driver you are using. Now you can start
working on the software or hardware that might be causing your problems.
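
With three servers crashing, it can help to save each !analyze -v output to a text file and pull out those two lines automatically so you can compare them. A minimal sketch follows, assuming the output has lines of the form "NAME:  value"; the function name is made up, and the sample text is invented for illustration (exampledrv.sys is not a real driver and this is not output from these servers).

```python
import re


def extract_fields(analyze_output, fields=("PROCESS_NAME", "IMAGE_NAME")):
    """Return a dict of the requested 'NAME:  value' lines found in the text."""
    found = {}
    for name in fields:
        pattern = r"^%s:[ \t]*(\S.*?)[ \t]*$" % re.escape(name)
        match = re.search(pattern, analyze_output, re.MULTILINE)
        if match:
            found[name] = match.group(1)
    return found


# Invented sample in the assumed format, NOT real output
SAMPLE = """\
PROCESS_NAME:  System
IMAGE_NAME:  exampledrv.sys
"""

print(extract_fields(SAMPLE))
```

If all three servers report the same IMAGE_NAME, that driver is the first thing to update or roll back.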


Good luck and I hope it helps.
 
LVL 3

Expert Comment

by:SimonL-UK
ID: 24179371
There is a known issue with the Smart Array 5/6 drivers when used in conjunction with the Microsoft Storport driver.
You need to update the Smart Array driver (available from HP) and the Storport driver from Microsoft (http://support.microsoft.com/kb/932755)

HTH
 
LVL 1

Author Comment

by:Infinityinfo
ID: 24193639
Thanks so much for the help and suggestions, fellas. I am in the process of trying all of these recommendations. I have this happening on 3 of my DL380 G3 servers, and after updating the PSP it seems to be resolved, but I will wait a couple more weeks before I close this thread because of what others have experienced with this issue. Some have said it sometimes takes several days for the servers to start rebooting again, so I just want to make sure I have the issue nailed down. Again, thanks so much.
 
LVL 1

Author Closing Comment

by:Infinityinfo
ID: 31567685
Thanks so much for the help. I believe we are out of the woods. The servers haven't rebooted in almost a month now, and updating the PSP, firmware and drivers for nearly everything on those DL380s seems to have resolved the issue. Much appreciated.

Question has a verified solution.
