joshcallahan1

HP Proliant possible HD failure

I have been having some problems with our server (HP ProLiant ML570 G3 running SBS 2003) rebooting randomly at night. While trying to pinpoint the issue, I first noticed an error in the Event Viewer:

Error  source: LSI_SCSI
The driver detected a controller error on \Device\RaidPort0.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

This led me to run HP Insight Diagnostics which is telling me that one of the two OS drives is failing in its RAID 1 array.  My questions are how can I be sure the drive is bad (before I spend money to have it replaced)? The Online LEDs are solid on both drives with the Activity LEDs flashing again on both drives.  How can I be sure the drive has failed?

Also, is there a possibility this could be causing the random server restarts? Is there any way someone can point me to a particular log file or place where I can gather more info as to why the server would be restarting on its own?

Also worth noting: we recently started backing up with Symantec BESR (Backup Exec System Recovery) to an external USB drive. Although BESR reports all backups thus far as successful, I think the reboots could possibly have something to do with the scheduled backups it's running.

Would appreciate any and all help.

Thanks
lnkevin

My questions are how can I be sure the drive is bad (before I spend money to have it replaced)?

When HP diagnostics report a failed drive, that is basically the time to spend the money. You can also double-check in the ACU: if the drive shows a red X (failed) status there, it's confirmed.

Also is there a possibility this can be causing the random server restarts?  

It could be the main reason for the server restarts. If you have a failed drive in RAID 1, your OS is still running, but in degraded mode. You don't want to wait until the second drive fails and you have to rebuild the whole server.

K
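
On the question of where to look for the cause of the restarts: on Windows Server 2003 the System event log is usually the first place to check. The sketch below is only an illustration, assuming the System log has already been exported and parsed into records with hypothetical `event_id`, `source`, and `time` fields (those field names and the sample data are assumptions, not something taken from this server). It filters for the entries typically tied to restarts: 6008 (unexpected shutdown), 1074 (restart requested by a process or user), and 1001 from Save Dump (reboot after a bugcheck).

```python
# Event IDs commonly associated with restarts in the Windows System log.
RESTART_EVENT_IDS = {
    6008: "previous shutdown was unexpected",
    1074: "restart or shutdown requested by a process or user",
    1001: "rebooted from a bugcheck (source: Save Dump)",
}


def restart_events(records):
    """Return (time, event_id, meaning) tuples for restart-related records.

    `records` is an iterable of dicts with the hypothetical keys
    'event_id', 'source' and 'time'.
    """
    hits = []
    for rec in records:
        event_id = int(rec["event_id"])
        if event_id in RESTART_EVENT_IDS:
            hits.append((rec["time"], event_id, RESTART_EVENT_IDS[event_id]))
    return sorted(hits)


# Made-up sample records, for illustration only.
sample = [
    {"event_id": "11", "source": "LSI_SCSI", "time": "2009-06-01 02:13"},
    {"event_id": "6008", "source": "EventLog", "time": "2009-06-01 02:20"},
]
for hit in restart_events(sample):
    print(hit)
```

If only 6008 entries show up with no 1074 before them, the restarts were probably not requested by software, which would point toward hardware, power, or a crash rather than a scheduled task.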
I think the reboots could possibly have something to do with the scheduled backups its running

A BESR backup does not cause a reboot unless you configure it to. By default, you can schedule it to back up your image in the background without stopping any services or rebooting the server.

K
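
One way to test the suspicion that the reboots line up with the scheduled backups is to compare the two sets of timestamps. This is a rough sketch only: the function name, the two-hour window, and the sample times are all assumptions for illustration; the real backup start times and durations would have to come from the BESR job history, and the reboot times from the event log.

```python
from datetime import datetime, timedelta


def reboots_during_backups(reboot_times, backup_starts, window=timedelta(hours=2)):
    """Return the reboot times that fall within `window` after any backup start."""
    return [
        reboot
        for reboot in reboot_times
        if any(start <= reboot <= start + window for start in backup_starts)
    ]


# Made-up timestamps, for illustration only.
backups = [datetime(2009, 6, 1, 1, 0), datetime(2009, 6, 2, 1, 0)]
reboots = [datetime(2009, 6, 1, 2, 20), datetime(2009, 6, 3, 14, 5)]
print(reboots_during_backups(reboots, backups))
```

If most reboots land inside a backup window, the backup job (or the extra disk I/O it puts on a marginal drive) is worth a closer look; if they do not, the backups are probably unrelated.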
Hypercat (Deb)
If you did not install the Server Administrator software (not Insight Manager, which personally I dislike intensely, but the Server Administrator program supplied by HP), you should do so. This software has a much simpler interface than Insight Manager and will show you exactly what's going on with your RAID array.  Once you confirm that the drive is failed (which is a high probability and is also highly likely to be causing your reboots), replace it ASAP and rebuild the array.  Again, the Server Administrator program will allow you to manage that process.

You can generally trust the OEM diagnostic tools. What I mean is, if the drive was under warranty and you told HP that the diags failed, HP would send you another drive. You could shut down the machine, pop the suspect drive out and run on the 'good' one and see if your problem stops. That won't help the event logs however as the fact that the drive is out will generate its own logs. If you choose to reinsert the drive later, do so with the machine up.



I apologize for any duplicate posts by our proxy server

Justen

joshcallahan1

ASKER

lnkevin,

Thank you for your response.  I tried running the ACU to confirm but it told me that "ACU is already running as an application" and upon searching for an answer it seems that ACU has problems running in IE8 which of course I have.  I will try and roll back to IE7 after hours as I'm pretty sure uninstalling IE8 will require a reboot which I  cannot do at this moment.  

Also what other indicators can I look for to assure myself the server is running in degraded mode?  

I do not understand why the LEDs on the drives are not showing signs of a fault. Also, the Internal System Health LED on the front of the server is green. Any insight as to why this is the case?

 I will uninstall IE8 tonight and confirm what ACU is reporting about the status of the HDs.  Thanks again
hypercat,

I have a web based app called HP System Management Homepage...is this what you are referring to?  If it isn't could you possibly post a link or something to get me a download of the software you are talking about?  

Whatever the System Management Homepage is ISN'T showing any HD failures.  It is showing these items as failed:

 Home -> Management Processor -> Embedded NEC98431
 Home -> System -> Power Subsystem
 Logs -> Integrated Management Log  
 
Ironically the Integrated Management Log section is showing:

Insight Diagnostics Note: Physical Hard Drive 6, Controller Slot 0-Diagnosis: Failed

Which is probably what I saw in Insight Diagnostics.  So I'm still unsure.  And again,  why are the LEDs not amber or showing some other indication that the drive is failing?  
Josh
There are other monitors built into the disk drive itself that detect predictive failures, and these diagnostics will report on them BEFORE the disk fails and the light comes on. That's why you can still use it: it has not failed yet.



I apologize for any duplicate posts by our proxy server

Justen
Whoo Boy, is my face red!! I was thinking about the Dell System Administrator utility which, of course, will not work with an HP Proliant server...too bad. Once I started looking for the software, I realized what I had done... We use both Dell and HP servers at our sites, so I can excuse myself on account of working with both manufacturers at the same time. Sorry for the confusion - all on my part!!
On the Proliant, what you want is the HP Array Config Utility (ACU), which I see you already tried but had a problem with IE8.  Once you roll back to IE7, you should be good to go using the ACU to manage your array.
Deb
 
JustenC:  I guess that makes sense.  I would still like to see something other than the HP Insight Diagnostics tell me that the drive is bad.  I'll worry about that after a reboot and a peek with ACU.  

hypercat:  Ha!  No worries.  At least you recognized the problem before you linked me to any software downloads  ^_-  

So here is where I am at: I am going to try to roll back to IE7, which I am pretty sure will require a reboot. While I'm doing that, I will see whether the RAID controller or the BIOS reports anything during startup and check any health statuses there. Once I boot back up, I will further confirm the drive's state with ACU.

Thanks to all who have responded.  I will keep you all posted.  
Ok, sorry about the delay.  I was able to roll back to IE7 and successfully ran the ACU.  This Array Configuration Utility reported no errors nor did the RAID controllers during POST.

So at this point Insight Diagnostics is reporting a failure but nothing else is. After looking around in the HP forums, it seems I am not the only one with this dilemma. If no one else has any suggestions, I will close the thread and split points up accordingly. Thanks for all the responses thus far.
Sounds good, thanks. I would still keep a drive on hand if you can locate one. One more thing you might check is whether the firmware on the controller and on the hard drive itself is up to date.
Josh - Sounds like a good plan to keep an extra on hand, since this is RAID1.  I would trust the HP ACU but keep an eye on it. It might be a transient problem that will become more persistent over time.
Cheers!
Deb
ASKER CERTIFIED SOLUTION
lnkevin
This solution is only available to members.
We don't have Service Pack 1 installed, just SP2, and SBS 2003 needs both installed. We'll install it and see if it works then. Thanks,

Josh
The link for the solution doesn't work properly:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3716247&prodTypeId=18964&prodSeriesId=3716246&swLang=13&taskId=135&swEnvOID=1005


Product not found
Important Note      Not the product you are looking for? If you cannot find your product on this site, go to HP Support Center - Hewlett Packard Enterprise .
This operation requires that you select a valid HP Inc. product.