Solved

HP server: Raid 5 - predictive failure

Posted on 2010-09-18
Last Modified: 2012-08-13
HP ProLiant ML350 G5 running Windows Server 2003 with RAID 5 (5 hard disks). I checked the System Management Homepage and saw that the drive at port 1 | box 1, bay 3 is in predictive failure, and its light was actually flashing orange. I replaced the hard disk, but it is still doing the same thing and showing the same error. It also reports SCSI Bus Faults: 7, Hard Read Errors: 28536.

I checked bay 4 and bay 5. Even though their status says OK, I see some errors there as well:
Bay 4: Hard Read Errors: 80, Recovery Write Errors: 15

Bay 5: Hard Read Errors: 49, Recovery Write Errors: 9

I also ran HP Insight Diagnostics, and it found errors on hard disks 3, 4 and 5. Please check the screenshots.

Any ideas?

[Screenshots: system homepage, insight 1, insight 2, insight 3]
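For anyone comparing counters like these across bays, here is a minimal Python sketch. The figures are the ones quoted in the question. The dictionary layout and the function name `rank_by_hard_read_errors` are illustrative assumptions, not part of any HP tool. No pass/fail threshold is applied, because none is stated in this thread.

```python
# Counters as quoted in the question. Bay 3's recovery write error count
# was not posted, so it is recorded as None rather than guessed.
counters = {
    "bay3": {"hard_read_errors": 28536, "recovery_write_errors": None},
    "bay4": {"hard_read_errors": 80, "recovery_write_errors": 15},
    "bay5": {"hard_read_errors": 49, "recovery_write_errors": 9},
}


def rank_by_hard_read_errors(stats):
    """Return bay names sorted from most to fewest hard read errors."""
    return sorted(
        stats,
        key=lambda bay: stats[bay]["hard_read_errors"],
        reverse=True,
    )


for bay in rank_by_hard_read_errors(counters):
    print(bay, counters[bay]["hard_read_errors"])
```

The ranking only shows which drive stands out; whether a given count is acceptable is a judgement the diagnostics tool or the vendor has to make.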
Question by:okamon
11 Comments
 
LVL 3

Expert Comment

by:rajkumartech
ID: 33710212
Try updating the firmware and the BIOS to the latest versions.
 
LVL 5

Expert Comment

by:sosinc3
ID: 33710238
Before you update firmware or BIOS of any kind on a questionable drive system, I would look at the backplane (where the drives connect to the system) and your RAID array controller. I would shut the system down, pull the drives out one by one (or mark them so you can keep them in order), blow some air into the drive cage, and reseat them. Take the array controller out and reseat it. Better yet, if you can get a replacement RAID array controller of the same model, replace it. Then power the system back up and see what you are up against. If the array starts to rebuild, let it finish before you do anything else.

Once you have a healthy RAID, then do any firmware/BIOS updates. If you update the firmware on your RAID array controller, BE SURE to also update the drivers in Windows before booting up with the new firmware. Otherwise, your system may not boot up.
 
LVL 55

Expert Comment

by:andyalder
ID: 33710816
Hard Read Errors: 28536 is unacceptable; the other two aren't so bad. But if that is the replacement drive then it doesn't make much sense, since it should have started off at a count of zero unless it was replaced with an equally faulty drive. DF072A9844 doesn't come up as having any firmware apart from the initial release, HPD0, but strangely it's only listed under Integrity servers, not ProLiants, and then only in one single document on HP's site. It's got 20 months of service hours on it, so it isn't new. Maybe it's the stats for the old one that you've posted.

If you search the link below you'll see that that drive model isn't listed at all, whereas if you just search it for DF072A you'll find 6 other drives that must be pretty similar, and they are at HPD7 or above firmware.

I would question where this replacement disk came from. I suspect it may be a model that HP has pulled or forgotten existed; there just aren't enough references to that model number on HP's website to give me confidence in it.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3182562&prodTypeId=329290&prodSeriesId=1157688&swLang=8&taskId=135&swEnvOID=1005 lists all the drive firmware for ProLiants and that disk isn't there.
 

Author Comment

by:okamon
ID: 33711817
The hard disk was ordered from HP, and I have already updated the firmware and driver. It didn't help.

I have just noticed that all the other hard disks are model DF072A8B56, but the one in bay 3 is DF072A9844. Can a different hard disk model cause the problem?
1.JPG
2.JPG
3.JPG
4.JPG
5.JPG
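A quick way to spot a mismatched drive like this is to compare model strings across bays. The Python sketch below uses the two model numbers from this post. The bay numbering 1 to 5 and the helper name `odd_bays` are assumptions made for illustration, and the model strings would have to be copied by hand from the management pages.

```python
from collections import Counter

# Models as described in the post: every bay holds DF072A8B56 except bay 3.
# Bay numbers 1-5 are assumed from "RAID 5 (5 hard disks)".
models = {
    1: "DF072A8B56",
    2: "DF072A8B56",
    3: "DF072A9844",
    4: "DF072A8B56",
    5: "DF072A8B56",
}


def odd_bays(models_by_bay):
    """Return bays whose model differs from the most common model."""
    common_model, _ = Counter(models_by_bay.values()).most_common(1)[0]
    return sorted(
        bay for bay, model in models_by_bay.items() if model != common_model
    )


print(odd_bays(models))
```

A differing model is only a prompt to check firmware availability for that drive; it does not by itself show that the drive is faulty.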
 
LVL 55

Expert Comment

by:andyalder
ID: 33711847
Not so much the different part number, but that that particular disk doesn't have firmware available for it. I suspect HP have sent you one they've "tested" rather than a new one. They could have just made a mistake by fixing it but forgetting to reset the S.M.A.R.T. data to zero. Phone them first thing and tell them they've sent you a pup.
 

Author Comment

by:okamon
ID: 33729996
I will order another new hard disk, try again, and post the result here.
 

Author Comment

by:okamon
ID: 33740590
Hi, I have replaced the hard drive (hard disk 3), and this time it shows as passed in the Array Configuration Utility and the System Management Homepage. In HP Insight Diagnostics the new hard disk now shows as passed, but hard disks 4 and 5 still show errors. Is the error normal? Can I ignore it?
1.JPG
2.JPG
 
LVL 5

Expert Comment

by:sosinc3
ID: 33741095
It seems the errors point toward an expired warranty, which is not something to worry about. However, it is also reporting errors beyond threshold. Where are these drives coming from? Are they used? If they are reported as beyond warranty, I can't imagine they are coming direct from HP. You may need to replace these other drives as well, but one at a time, giving the array time to rebuild fully.
 
LVL 55

Expert Comment

by:andyalder
ID: 33742064
Disks 4 and 5 will show as bad until replaced, as it says the read and write hard error count is above threshold.
 

Author Comment

by:okamon
ID: 33749068
So are you guys saying the 2 hard disks are going to fail soon?
 
LVL 55

Accepted Solution

by:
andyalder earned 500 total points
ID: 33751999
They won't necessarily fail, but they have bad blocks that the drive hasn't mapped out, so you may get read failures. If you value your data I would replace them (but not at the same time). You may just get away with the server being under warranty if you're lucky.
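The "not at the same time" advice can be written down as an ordered checklist. The Python sketch below is only a planning aid under stated assumptions: the function name `replacement_plan` is hypothetical, it does not talk to the array controller, and confirming that a rebuild has finished still has to be done by hand in the management tools.

```python
def replacement_plan(bays):
    """Build an ordered checklist: swap one drive, then wait for the
    array rebuild to finish before touching the next drive."""
    steps = []
    for bay in bays:
        steps.append(f"Replace the drive in bay {bay}")
        steps.append(f"Wait until the array rebuild for bay {bay} has finished")
    return steps


# Bays 4 and 5 are the two drives still flagged in this thread.
for step in replacement_plan([4, 5]):
    print(step)
```

The point of interleaving the steps is that a RAID 5 array tolerates only one missing drive, so the second swap must not start while the first rebuild is still running.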