Solved

HP server: RAID 5 - predictive failure

Posted on 2010-09-18
Last Modified: 2012-08-13
HP ProLiant ML350 G5 running Windows Server 2003 with RAID 5 (5 hard disks). I checked the System Management Homepage and saw that port 1 | box 1, bay 3 shows a predictive failure, and the drive's light was actually flashing orange. I replaced the hard disk, but it is still doing the same thing and showing the same error. It also shows SCSI Bus Faults: 7, Hard Read Errors: 28536.

I checked bay 4 and bay 5. Even though the status says OK, I also see some errors there:
Bay 4: Hard Read Errors: 80, Recovery Write Errors: 15

Bay 5: Hard Read Errors: 49, Recovery Write Errors: 9

I also ran HP Insight Diagnostics and found errors on hard disks 3, 4 and 5. Please check the screenshots:

Any ideas? Screenshots: system homepage, insight 1, insight 2, insight 3
Question by:okamon
11 Comments
 
LVL 3

Expert Comment

by:rajkumartech
ID: 33710212
Try updating the firmware and installing the latest BIOS.
 
LVL 5

Expert Comment

by:sosinc3
ID: 33710238
Before you update firmware or BIOS of any kind on a questionable drive system, I would look at the backplane (where the drives connect to the system) and your RAID array controller. I would shut the system down, pull the drives out one by one (or mark them so you can keep them in order), blow some air into the drive cage, and reseat them. Take the array controller out and reseat it. Better yet, if you can get a replacement RAID array controller of the same model, replace it. Then power the system back up and see what you are up against. If the array starts to rebuild, let it finish before you do anything else. Once you have a healthy RAID, then do any firmware / BIOS updates. If you update the firmware on your RAID array controller, BE SURE to also update the drivers in Windows before booting up with the new firmware. Otherwise, your system may not boot up.
 
LVL 55

Expert Comment

by:andyalder
ID: 33710816
Hard Read Errors: 28536 is unacceptable; the other two aren't so bad. But if that is the replacement drive then it doesn't make much sense, since it should have started off at a count of zero unless it was replaced with an equally faulty drive. DF072A9844 doesn't come up as having any firmware apart from the initial release, HPD0, but strangely it's only listed under Integrity servers, not ProLiants, and then only in one single document on HP's site. It's got 20 months of service hours on it, so it isn't new. Maybe it's the stats for the old one that you've posted.

If you search the link below then you'll see that that drive model isn't listed at all, whereas if you just search it for DF072A you'll find 6 other drives that must be pretty similar and they are at HPD7 or above firmware.

I would question where this replacement disk came from. I suspect it may be a model that HP has pulled or forgotten existed; there just aren't enough references to that model number on HP's website to give me confidence in it.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3182562&prodTypeId=329290&prodSeriesId=1157688&swLang=8&taskId=135&swEnvOID=1005 lists all the drive firmware for Proliants and that disk isn't there.
 

Author Comment

by:okamon
ID: 33711817
The hard disk was ordered from HP, and I have already updated the firmware and driver. That didn't help.

Now I've just noticed that all the other hard disks are model DF072A8B56, but the one in bay 3 is DF072A9844. Can a different hard disk model cause the problem?
1.JPG
2.JPG
3.JPG
4.JPG
5.JPG
 
LVL 55

Expert Comment

by:andyalder
ID: 33711847
Not so much the different part number as the fact that that particular disk doesn't have firmware available for it. I suspect HP have sent you one they've "tested" rather than a new one. They could have just made a mistake by fixing it but forgetting to clear the S.M.A.R.T. data back to zero. Phone them up first thing and tell them they've sent you a pup.
 

Author Comment

by:okamon
ID: 33729996
I will order another new hard disk, try again, and post the result here.
 

Author Comment

by:okamon
ID: 33740590
Hi, I have replaced the hard drive (hard disk 3), and this time it shows as passed in the Array Configuration Utility and the System Management Homepage. In HP Insight Diagnostics the new hard disk now shows as passed too, but hard disks 4 and 5 still show errors. Is that normal? Can I ignore the errors?
1.JPG
2.JPG
 
LVL 5

Expert Comment

by:sosinc3
ID: 33741095
It seems the errors are pointing toward an expired warranty, which is not something to worry about. However, it is also reporting errors beyond threshold. Where are these drives coming from? Are they used? If they are reported as beyond warranty, I can't imagine they are coming direct from HP. You may need to replace these other drives as well, but one at a time, giving the array time to rebuild fully.
 
LVL 55

Expert Comment

by:andyalder
ID: 33742064
4 and 5 will show as bad until replaced, as it says the read and write hard error count is above threshold.
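
To make the "above threshold" point concrete, here is a rough Python sketch that ranks the drives by the counters posted in the question. The `DriveCounters` class, the `drives_over_threshold` helper and the default limit of zero are assumptions made for illustration only; the thread does not state the threshold that HP Insight Diagnostics actually applies, and the bay 3 figures are the ones posted before the second replacement.

```python
from dataclasses import dataclass


@dataclass
class DriveCounters:
    """Error counters for one bay (hypothetical structure, not an HP API)."""
    bay: int
    hard_read_errors: int
    recovery_write_errors: int = 0
    scsi_bus_faults: int = 0


# Counter values exactly as posted in the question.
REPORTED = [
    DriveCounters(bay=3, hard_read_errors=28536, scsi_bus_faults=7),
    DriveCounters(bay=4, hard_read_errors=80, recovery_write_errors=15),
    DriveCounters(bay=5, hard_read_errors=49, recovery_write_errors=9),
]


def drives_over_threshold(drives, max_hard_read_errors=0):
    """Return drives whose hard read error count exceeds the limit, worst first.

    The default limit of 0 is an assumed placeholder, not HP's real threshold.
    """
    flagged = [d for d in drives if d.hard_read_errors > max_hard_read_errors]
    return sorted(flagged, key=lambda d: d.hard_read_errors, reverse=True)


if __name__ == "__main__":
    for drive in drives_over_threshold(REPORTED):
        print(f"bay {drive.bay}: {drive.hard_read_errors} hard read errors")
```

Sorting worst first simply mirrors the order in which the drives were dealt with in this thread: bay 3 first, then bays 4 and 5.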
 

Author Comment

by:okamon
ID: 33749068
So are you guys saying the two hard disks are going to fail soon?
 
LVL 55

Accepted Solution

by:
andyalder earned 500 total points
ID: 33751999
They won't necessarily fail, but they have bad blocks that the drive hasn't mapped out, so you may get read failures. If you value your data I would replace them (but not at the same time). You may just get away with the server being under warranty if you're lucky.
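
The "not at the same time" rule can be sketched as a small loop: swap one drive, wait until the array reports healthy again, and only then move on to the next bay. Everything here is hypothetical: `replace_one_at_a_time`, the `swap_drive` and `array_status` callbacks, the `"OK"` / `"Rebuilding"` status strings and the `FakeArray` stand-in are illustrative names, not HP tooling. On a real server the swap is done by hand and the status would come from the Array Configuration Utility or the System Management Homepage.

```python
import time


def replace_one_at_a_time(bays, swap_drive, array_status,
                          poll_seconds=60.0, sleep=time.sleep):
    """Replace drives strictly one by one, waiting for each rebuild to finish.

    swap_drive(bay) and array_status() are caller-supplied callbacks;
    array_status() is assumed to return "OK" once the array is healthy.
    """
    replaced = []
    for bay in bays:
        # Never pull a drive while the array is degraded or rebuilding.
        while array_status() != "OK":
            sleep(poll_seconds)
        swap_drive(bay)
        # Wait for the rebuild onto the new drive to complete.
        while array_status() != "OK":
            sleep(poll_seconds)
        replaced.append(bay)
    return replaced


class FakeArray:
    """Toy stand-in for a RAID 5 array, used only to exercise the loop."""

    def __init__(self, polls_per_rebuild=3):
        self.polls_per_rebuild = polls_per_rebuild
        self.remaining = 0
        self.log = []

    def swap(self, bay):
        self.log.append(f"swap bay {bay}")
        self.remaining = self.polls_per_rebuild

    def status(self):
        if self.remaining > 0:
            self.remaining -= 1
            return "Rebuilding"
        return "OK"


if __name__ == "__main__":
    array = FakeArray()
    # Skip real waiting in the demo by passing a no-op sleep.
    done = replace_one_at_a_time([4, 5], array.swap, array.status,
                                 sleep=lambda _seconds: None)
    print(done, array.log)
```

The reason for gating on a healthy status is that RAID 5 tolerates only one missing member; pulling bay 5 while bay 4 is still rebuilding would take the array offline.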