Robert Ener

asked on

HP server reboots randomly

I have a client with an HP server running Windows Server 2003 SP2. Over the last couple of days it has started to reboot randomly, and I have not been able to pinpoint the problem. I looked through the event logs and finally came across some entries that mention a reboot, but this is the first time I have worked on an HP server, and they refer to Automatic Server Recovery. I have included the event log info below:

The system has rebooted from a Automatic Server Recovery (ASR) event.
 
User Action
Determine the nature of the Automatic Server Recovery (ASR) event, and take corrective action.
 
WBEM Indication Properties
AlertingElementFormat: 0 0 (Unknown)
AlertType: 5 0x5 (Device Alert)
Description: "The system has rebooted from a Automatic Server Recovery (ASR) event."
EventCategory: 16 0x10 (System Power)
EventID: "1"
ImpactedDomain: 4 0x4 (System)
IndicationIdentifier: "{CE107FEA-511F-4CF2-8F5F-15225481014E}"
IndicationTime: "20120711083732.586000-300"
NetworkAddresses[0]: "192.168.100.13"
OSType: 69 0x45 (Microsoft Windows Server 2003)
OSVersion: "5.2.3790"
PerceivedSeverity: 5 0x5 (Major)
ProbableCause: 111 0x6f (Timeout)
ProbableCauseDescription: "ASR Reboot Occurred"
ProviderName: "HP Recovery"
RecommendedActions[0]: "Determine the nature of the Automatic Server Recovery (ASR) event, and take corrective action."
Summary: "ASR reboot occurred"
SystemCreationClassName: "HP_WinComputerSystem"
SystemFirmwareVersion[0]: "2009.07.10"
SystemFirmwareVersion[1]: "2008.11.01"
SystemGUID: "30303734-3436-5355-4539-32344E374433"
SystemModel: "ProLiant ML370 G5"
SystemName: "server.LawOfficeofJorgeRangel.local"
SystemProductID: "470064-774"
SystemSerialNumber: "USE924N7D3"
TIME_CREATED: 129864874881019004 0x1cd5f6a68156c7c

Any help would be great.
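
For anyone correlating this alert with other logs: the two timestamps in the indication above use different encodings. `IndicationTime` appears to be a CIM datetime string (local time followed by a UTC offset in minutes), and `TIME_CREATED` looks like a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC). A minimal Python sketch for decoding both, under those assumptions; the function names are hypothetical and not part of any HP tool:

```python
from datetime import datetime, timedelta, timezone


def parse_cim_datetime(value: str) -> datetime:
    """Parse a CIM datetime such as '20120711083732.586000-300'.

    Assumed layout: yyyymmddHHMMSS.ffffff followed by a signed
    UTC offset expressed in minutes.
    """
    stamp, offset_minutes = value[:21], int(value[21:])
    naive = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
    return naive.replace(tzinfo=timezone(timedelta(minutes=offset_minutes)))


def filetime_to_utc(ticks: int) -> datetime:
    """Convert 100-ns ticks since 1601-01-01 UTC to an aware datetime."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    # datetime resolves microseconds, so the last tick digit is dropped.
    return epoch + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    indication = parse_cim_datetime("20120711083732.586000-300")
    created = filetime_to_utc(129864874881019004)
    print("IndicationTime (UTC):", indication.astimezone(timezone.utc))
    print("TIME_CREATED   (UTC):", created)
```

With both values in UTC, the reboot can be lined up against UPS logs, the Integrated Management Log, or anything else that records a time.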
ASKER CERTIFIED SOLUTION
jgerbasi
My first thought is that this server is 6 years old and needs to be replaced, or at least the motherboard does. Have you run the diagnostics?
The latest BIOS is P57 (July 2001) and can be found here:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3279719&prodTypeId=15351&prodSeriesId=1121474&swLang=8&taskId=135&swEnvOID=1005#12212
There is also a critical update for the Smart Array controller.
Member_2_231077

ASR simply restarts the server if it hangs (you can disable it in the BIOS), so you need to find out why it is hanging. Is it on a UPS? If so, is it a pure sine wave UPS? Does the server have dual PSUs, and if so, are they connected to the same power source?
The iLO driver and firmware will most likely be the ones to upgrade, if possible.
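
To illustrate the idea behind an ASR-style reset: it behaves like a watchdog timer that the OS must keep resetting, and when the OS hangs the timer runs out and the hardware reboots the box. This would also fit the `ProbableCause: 111 0x6f (Timeout)` field in the alert. A rough Python sketch of the concept only; the class, the function, and the 600-second timeout and 60-second heartbeat values are assumptions for illustration, not HP's actual implementation or settings:

```python
from typing import Optional


class WatchdogTimer:
    """Conceptual watchdog: fires if it is not reset within timeout_s."""

    def __init__(self, timeout_s: int, now: int = 0) -> None:
        self.timeout_s = timeout_s
        self.last_reset = now

    def reset(self, now: int) -> None:
        # A healthy OS driver would call this periodically.
        self.last_reset = now

    def expired(self, now: int) -> bool:
        return now - self.last_reset >= self.timeout_s


def first_expiry(timeout_s: int, heartbeat_s: int,
                 hang_at: Optional[int], horizon_s: int) -> Optional[int]:
    """Return the second at which the watchdog fires, or None if it never does.

    hang_at is the (hypothetical) moment the OS stops responding.
    """
    watchdog = WatchdogTimer(timeout_s)
    for t in range(horizon_s + 1):
        os_alive = hang_at is None or t < hang_at
        if os_alive and t % heartbeat_s == 0:
            watchdog.reset(t)
        if watchdog.expired(t):
            return t  # this is where the hardware would reset the server
    return None


if __name__ == "__main__":
    print("healthy OS:", first_expiry(600, 60, None, 3600))
    print("hang at 130 s:", first_expiry(600, 60, 130, 3600))
```

The point is that the reboot itself is only the symptom: the timer fires well after the hang, so the cause has to be looked for in whatever happened before the logged reboot time.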

Also check this post:
https://www.experts-exchange.com/questions/24123630/HP-ML350-Automatic-Server-Reboot-ASR-troubleshooting-0x2D.html
Do you have a RAID 5 volume, and if so, do you have a backup battery in place? We had a similar issue with a 350 G6 with 3 disks in RAID 5 on the onboard 410i Smart Array controller. It seems that this is only supported when a battery is in place. The server ran perfectly for 6 months, then suddenly started to shut down and restart.

KG
I find that one hard to believe; the cache battery is only used when power is lost. If fitting a battery really fixed your problem, KG, it is more likely that it just helped a few components survive a momentary dip in power, with the real culprit being the mains feed.
I totally agree. However, this is the conclusion HP reached after inspecting the various logs we had to send them. And as a matter of fact, when you start to look for info on the HP site, there is some conflicting information on the commercial pages for this server. One page mentions that the 410i supports RAID 5, yet another document says that RAID 5 is only supported when the backup battery is in place. Very odd indeed.

Nevertheless, HP found nothing wrong with the hardware as is, and since the battery had a long delivery time, we replaced the complete server and put the disks in. It has now been running for 2 months (again) with no problems. The problem server is running with a battery and 3 disks in RAID 5 on our test bench, and has not faulted either. Of course, this server is not doing any real work there, yet when it had the problems it would also fail in the middle of the night with no workload, not even the backup.

So, as Holmes said: once you have eliminated every other possibility, the one that remains, however hard to believe, must be the cause.

I also suspected the motherboard or the drive cage backplane, but both are running without problems.

Oh, and the UPS is a Smart-UPS 1500. No outage whatsoever...

KG
If you have another UPS, I would try swapping it in. I had a user with an HP xw4600 workstation that just shut down randomly every day or two, with no errors whatsoever. We ran all the diagnostics and they all passed.
Then the lights flickered and he went down again, so we swapped the Smart-UPS, even though it had shown no "change battery" or any other warning. He hasn't gone down again for over a week.

ASKER

I will let y'all know what happens. I have it scheduled for next week.