Solved

HP server: RAID 5 - predictive failure

Posted on 2010-09-18
Last Modified: 2012-08-13
HP ProLiant ML350 G5 running Windows Server 2003 with RAID 5 (5 hard disks). I checked the System Management Homepage and saw that the drive at port 1, box 1, bay 3 shows a predictive failure, and its light was actually flashing orange. I replaced the hard disk, but it is still doing the same thing and showing the same error. It also shows SCSI Bus Faults: 7, Hard Read Errors: 28536.

I checked bay 4 and bay 5. Even though their status says OK, I see some errors there too:
Bay 4: Hard Read Errors: 80, Recovery Write Errors: 15

Bay 5: Hard Read Errors: 49, Recovery Write Errors: 9

I also ran HP Insight Diagnostics and found errors on hard disks 3, 4 and 5; please check the screenshots:

Any ideas? (Attached screenshots: system homepage, insight 1, insight 2, insight 3)
Question by:okamon
11 Comments
 
LVL 3

Expert Comment

by:rajkumartech
ID: 33710212
Try updating the firmware and installing the latest BIOS.
 
LVL 5

Expert Comment

by:sosinc3
ID: 33710238
Before you update firmware or BIOS of any kind on a system with a questionable drive subsystem, I would look at the backplane (where the drives connect to the system) and your RAID array controller. I would shut the system down, pull the drives out one by one (or mark them so you can keep them in order), blow some air into the drive cage, and reseat them. Take the array controller out and reseat it. Better yet, if you can get a replacement RAID array controller of the same model, replace it. Then power the system back up and see what you are up against. If the array starts to rebuild, let it finish before you do anything else. Once you have a healthy RAID, do any firmware/BIOS updates. If you update the firmware on your RAID array controller, BE SURE to also update the drivers in Windows before booting up with the new firmware. Otherwise, your system may not boot.
 
LVL 56

Expert Comment

by:andyalder
ID: 33710816
Hard Read Errors: 28536 is unacceptable; the other two aren't so bad. But if that is the replacement drive, it doesn't make much sense, since it should have started at a count of zero unless it was replaced with an equally faulty drive. DF072A9844 doesn't come up as having any firmware apart from the initial release, HPD0, and strangely it's only listed under Integrity servers, not ProLiants, and then only in one single document on HP's site. It has 20 months of service hours on it, so it isn't new. Maybe the stats you've posted are for the old drive.

If you search the link below, you'll see that this drive model isn't listed at all, whereas if you search it for just DF072A you'll find 6 other drives that must be pretty similar, and they are at firmware HPD7 or above.

I would question where this replacement disk came from. I suspect it may be a model that HP has pulled or forgotten existed; there just aren't enough references to that model number on HP's website to give me confidence in it.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3182562&prodTypeId=329290&prodSeriesId=1157688&swLang=8&taskId=135&swEnvOID=1005 lists all the drive firmware for Proliants and that disk isn't there.
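
To put the three counters side by side, here is a minimal Python sketch that simply ranks the bays by the counts quoted in the question. The `DriveStats` class and the function name are hypothetical, made up for illustration; no HP threshold is assumed, because none is given in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DriveStats:
    """Hypothetical container for the counters quoted in the question."""
    bay: int
    hard_read_errors: int
    recovery_write_errors: Optional[int] = None  # None = not quoted in the thread


# Counts copied from the question; bay 3 is the drive flagged as predictive failure.
DRIVES = [
    DriveStats(bay=3, hard_read_errors=28536),
    DriveStats(bay=4, hard_read_errors=80, recovery_write_errors=15),
    DriveStats(bay=5, hard_read_errors=49, recovery_write_errors=9),
]


def rank_by_hard_read_errors(drives: List[DriveStats]) -> List[DriveStats]:
    """Return the drives sorted from most to fewest hard read errors."""
    return sorted(drives, key=lambda d: d.hard_read_errors, reverse=True)


ranked = rank_by_hard_read_errors(DRIVES)
# How much worse the top drive is than the next one down.
ratio = ranked[0].hard_read_errors / ranked[1].hard_read_errors

for d in ranked:
    writes = "n/a" if d.recovery_write_errors is None else d.recovery_write_errors
    print(f"Bay {d.bay}: hard read errors={d.hard_read_errors}, "
          f"recovery write errors={writes}")
print(f"Bay {ranked[0].bay} has about {ratio:.0f}x the hard read errors "
      f"of bay {ranked[1].bay}")
```

On these numbers bay 3 is more than two orders of magnitude worse than bay 4, which is why it stands out from the other two.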

Author Comment

by:okamon
ID: 33711817
The hard disk was ordered from HP, and I have already updated the firmware and driver. That didn't help.

I just noticed that all the other hard disks are model DF072A8B56, but the one in bay 3 is DF072A9844. Can a different hard disk model cause the problem?
(Attachments: 1.JPG, 2.JPG, 3.JPG, 4.JPG, 5.JPG)
 
LVL 56

Expert Comment

by:andyalder
ID: 33711847
It's not so much the different part number as the fact that that particular disk doesn't have any firmware available for it. I suspect HP have sent you one they've "tested" rather than a new one. They could simply have made a mistake by fixing it but forgetting to reset the S.M.A.R.T. data to zero. Phone them up first thing and tell them they've sent you a pup.
 

Author Comment

by:okamon
ID: 33729996
I will order another new hard disk, try again, and post the result here.
 

Author Comment

by:okamon
ID: 33740590
Hi, I have replaced the hard drive (hard disk 3), and this time it shows as passed in the Array Configuration Utility and the System Management Homepage. In HP Insight Diagnostics the new hard disk now shows as passed, but hard disks 4 and 5 still show errors. Is this error normal? Can I ignore it?
(Attachments: 1.JPG, 2.JPG)
 
LVL 5

Expert Comment

by:sosinc3
ID: 33741095
It seems the errors are pointing toward an expired warranty, which is not something to worry about. However, it is also reporting errors beyond threshold. Where are these drives coming from? Are they used? If it says they are beyond warranty, I can't imagine they are coming direct from HP. You may need to replace these other drives as well, but one at a time, giving the array time to rebuild fully after each one.
 
LVL 56

Expert Comment

by:andyalder
ID: 33742064
Disks 4 and 5 will show as bad until they are replaced because, as it says, their hard read and write error counts are above threshold.
 

Author Comment

by:okamon
ID: 33749068
So are you guys saying the two hard disks are going to fail soon?
 
LVL 56

Accepted Solution

by:andyalder (earned 2000 total points)
ID: 33751999
They won't necessarily fail, but they have bad blocks that the drive hasn't mapped out, so you may get read failures. If you value your data I would replace them (but not at the same time). If you're lucky, you may just get away with the server still being under warranty.
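
The "not at the same time" rule matters because RAID 5 only tolerates the loss of one drive; pulling a second disk before the first rebuild finishes would take the array down. A minimal Python sketch of that ordering follows. The function name and the status string `"ok"` are assumptions made for illustration and are not the output of any HP tool.

```python
from typing import List


def next_action(array_status: str, pending_bays: List[int]) -> str:
    """Decide the next safe step when replacing several RAID 5 drives.

    array_status is a hypothetical label: "ok" means the array is fully
    healthy; anything else (for example "rebuilding") means it is not.
    """
    if not pending_bays:
        return "nothing left to replace"
    if array_status != "ok":
        # RAID 5 survives only one missing drive, so never pull a second
        # disk while the array is degraded or still rebuilding.
        return "wait: array is not healthy yet, do not pull another drive"
    return f"replace the drive in bay {pending_bays[0]}"


# Bays 4 and 5 are the ones still reported above threshold in this thread.
print(next_action("ok", [4, 5]))
print(next_action("rebuilding", [5]))
print(next_action("ok", []))
```

In practice the status would come from whatever the Array Configuration Utility or System Management Homepage reports; the point is only that each swap waits for a healthy array.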