Solved

HP server: Raid 5 - predictive failure

Posted on 2010-09-18
Last Modified: 2012-08-13
HP ProLiant ML350 G5 running Windows Server 2003 with RAID 5 (5 hard disks). I checked the System Management Homepage and saw that Port 1 | Box 1, Bay 3 is in predictive failure, and its light was actually flashing orange. I replaced the hard disk, but it is still doing the same thing and showing the same error. It also shows SCSI Bus Faults: 7, Hard Read Errors: 28536.

I checked Bay 4 and Bay 5. Even though the status says OK, I also see some errors there:
Bay 4: Hard Read Errors: 80, Recovery Write Errors: 15

Bay 5: Hard Read Errors: 49, Recovery Write Errors: 9

I also ran HP Insight Diagnostics and found errors on hard disks 3, 4 and 5. Please check the screenshots:

Any ideas? (Screenshots attached: system homepage, insight 1, insight 2, insight 3.)
Question by:okamon
 
LVL 3

Expert Comment

by:rajkumartech
ID: 33710212
Try updating the firmware and the BIOS to the latest versions.
 
LVL 5

Expert Comment

by:sosinc3
ID: 33710238
Before you update firmware or BIOS of any kind on a questionable drive system, I would look at the backplane (where the drives connect to the system) and your RAID array controller. I would shut the system down, pull the drives out one by one, or mark them, so you can keep them in order, blow some air into the drive cage, and reseat them. Take the array controller out and reseat it. Better yet, if you can get a replacement RAID array controller of the same model, replace it. Then power the system back up and see what you are up against. If the array starts to rebuild, let it finish before you do anything else. Once you have a healthy RAID, do any firmware/BIOS updates. If you update the firmware on your RAID array controller, BE SURE to also update the drivers in Windows before booting up with the new firmware. Otherwise, your system may not boot.
 
LVL 55

Expert Comment

by:andyalder
ID: 33710816
Hard Read Errors: 28536 is unacceptable; the other two aren't so bad. But if that is the replacement drive then it doesn't make much sense, since it should have started off at a count of zero unless it was replaced with an equally faulty drive. DF072A9844 doesn't come up as having any firmware apart from the initial release HPD0, but strangely it's only listed under Integrity servers, not ProLiants, and then only in one single document on HP's site. It's got 20 months of service hours on it, so it isn't new. Maybe it's the stats for the old one that you've posted.

If you search the link below, you'll see that this drive model isn't listed at all, whereas if you just search it for DF072A you'll find 6 other drives that must be pretty similar, and they are at firmware HPD7 or above.

I would question where this replacement disk came from. I suspect it may be a model that HP has pulled or forgotten existed; there just aren't enough references to that model number on HP's website to give me confidence in it.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3182562&prodTypeId=329290&prodSeriesId=1157688&swLang=8&taskId=135&swEnvOID=1005 lists all the drive firmware for Proliants and that disk isn't there.
 

Author Comment

by:okamon
ID: 33711817
The hard disk was ordered from HP, and I have already updated the firmware and driver. It didn't help.

I just noticed that all the other hard disks are model DF072A8B56, but the one in Bay 3 is DF072A9844. Can a different hard disk model cause the problem?
 
LVL 55

Expert Comment

by:andyalder
ID: 33711847
It's not so much the different part number as the fact that this particular disk doesn't have firmware available for it. I suspect HP have sent you one they've "tested" rather than a new one. They could have just made a mistake by fixing it but forgetting to clear the S.M.A.R.T. data back down to zero. Phone them up first thing and tell them they've sent you a pup.
 

Author Comment

by:okamon
ID: 33729996
I will order another new hard disk, try it again, and post the result here.
 

Author Comment

by:okamon
ID: 33740590
Hi, I have replaced the hard drive (hard disk 3), and this time it shows as passed in the Array Configuration Utility and the System Management Homepage. In HP Insight Diagnostics the new hard disk now shows as passed, but hard disks 4 and 5 still show errors. Are these errors normal? Can I ignore them?
 
LVL 5

Expert Comment

by:sosinc3
ID: 33741095
It seems the errors are pointing toward an expired warranty, which is not something to worry about. However, it is also reporting errors beyond threshold. Where are these drives coming from? Are they used? If it says they are beyond warranty, I can't imagine they are coming direct from HP. You may need to replace these other drives as well, but one at a time, giving the array time to rebuild fully.
 
LVL 55

Expert Comment

by:andyalder
ID: 33742064
Disks 4 and 5 will show as bad until replaced, because the diagnostics say their hard read and write error counts are above threshold.
 

Author Comment

by:okamon
ID: 33749068
So are you guys saying the 2 hard disks are going to fail soon?
 
LVL 55

Accepted Solution

by:
andyalder earned 500 total points
ID: 33751999
They won't necessarily fail, but they have bad blocks that the drive hasn't mapped out, so you may get read failures. If you value your data, I would replace them (but not at the same time). You may just get away with the server being under warranty if you're lucky.
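
To illustrate why the disks should not be swapped at the same time, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It is a toy model, not how a Smart Array controller is implemented: the block contents and the names `xor_blocks` and `rebuild` are made up for illustration, and real RAID 5 rotates the parity block across the disks, which is ignored here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 5-disk set: four data blocks plus one parity block.
# The block contents are arbitrary illustrative values.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]


def rebuild(stripe, missing):
    """XOR the surviving blocks; `missing` is a set of lost disk indexes."""
    survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
    return xor_blocks(survivors)


# One disk lost: XOR of the four survivors is exactly the missing block.
one_lost = rebuild(stripe, {2})

# Two disks lost: XOR of the three survivors is only the XOR of the two
# missing blocks mixed together, so neither block can be recovered.
two_lost = rebuild(stripe, {2, 3})

print(one_lost == stripe[2])
print(two_lost in (stripe[2], stripe[3]))
```

During a rebuild every surviving disk has to be read in full, so an unreadable block on a second disk at that moment has the same effect, for that stripe, as the two-disk case above. That is the reason to let each rebuild finish before touching the next drive.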