Solved

HP DL585 G6 Disk Failure Question

Posted on 2010-08-13
Last Modified: 2012-05-10
Hey all,

We just rolled out 30+ new DL585 G6s (with the most recent firmware). They are set up as follows:

1. Disk Bays 1-2: RAID 1 (2 x 146 GB disks, 15k)
2. Disk Bays 3-8: RAID 5 (6 x 146 GB disks, 10k)

These all run Server 2003 R2 Enterprise, 64-bit. For some reason, on almost all the servers, the disk in Bay 3 (the first disk of the second array) keeps going bad. We've replaced some disks, moved others around, etc., which sometimes works for a day or two, but then the disk goes bad again. We figure it can't be that we have a bad disk in the same bay of every server. HP doesn't seem to know at this point either. We even tried a different array config (that is, we changed the second array to RAID 10)... no luck.

I figured this might be a known issue, but I haven't found anything. Any ideas?

Thanks.
Question by:exadmin2006

Expert Comment

by:dlethe
ID: 33429105
Specifically, what make/model of disk?

Accepted Solution

by:dlethe
ID: 33429130
If this is all HP kit, and so under HP warranty, then I would just demand that HP come out and fix it. Geez, you bought, what, $100,000 worth of hardware? Make it their problem: talk to the regional service manager if you have to, and get them to send out a team to make it right, or tell them to send somebody to deinstall it and take it back. This is unacceptable.

Author Comment

by:exadmin2006
ID: 33429239
Good point... the disk models are:

First array (good array): 146GB 2-port SAS 15k EH0146FARWD
Second array (failing): 146GB 2-port SAS 10K DG146BB976

Not sure of the make (like Seagate, etc.) as I don't have access.

Expert Comment

by:dlethe
ID: 33429426
Well, they are HP disks, so at least you aren't dealing with third-party drives, and HP is on the hook. What you can do is:
1) Check to see if the firmware is old, and upgrade. The HP support site will have upgrades and, more importantly, release notes. There are ALWAYS bugs in disk (and, for that matter, controller) firmware, so make sure everything is current.

2) If you have nothing else to do in the interim, you can get yourself a JBOD SAS controller (you can't do this with the HP controllers) and run some extreme diagnostics that will tell you the exact nature of what is going on. That has a cost associated with it, though, especially if you don't have a JBOD controller and a way to hook up the drives. Instead, look at all the event logs in the controller. They won't give you much, but it might be enough. SAS drives present a great deal of reportable information, dozens of fields, and the totals are kept in non-volatile memory within the disks, so you could take a few drives that failed and run the software on a JBOD controller. (Look at http://www.santools.com/smart/unix/manual and go to the log pages for SAS disks.)

This is from the site to give you an idea of what the disks will report, and I'm just scratching the surface, as you can also run self-tests, get link speeds, and verify data. So if you run diagnostics on some of the disks that failed and see the nature of the errors (if any), this will tell you whether you just have bad luck with some disk drives. Or maybe the disks are perfectly fine and pass all diagnostics. If so, blame the controller or backplane.

 Write errors corrected with possible delays: 0 [4]
 Total Write errors: 0 [4]
 Write errors corrected: 0 [4]
 Times correction algorithm processed (on Writes): 0 [4]
 Bytes processed (on Writes): 353948013568 [8]
 Unrecovered errors (on Writes): 0 [4]
 Read errors corrected without substantial delay: 605260 [4]
 Read errors corrected with possible delays: 9 [4]
 Total Read errors: 0 [4]
 Read errors corrected: 605269 [4]
 Times correction algorithm processed (on Reads): 605996 [4]
 Bytes processed (on Reads): 652188835328 [8]
 Unrecovered errors (on Reads): 727 [4]
 Verify errors corrected without substantial delay: 590 [4]
 Verify errors corrected with possible delays: 0 [4]
 Total Verify errors: 0 [4]
 Verify errors corrected: 590 [4]
 Times correction algorithm processed (on Verifys): 590 [4]
 Bytes processed (on Verifys): 0 [8]
 Unrecovered errors (on Verifys): 0 [4]
 Total Non-medium errors: 0 [4]
 Current temperature +/- 3 degrees C: 32
 Reference temperature +/- 3 degrees C: 68
 Background scanning status: 8
 Number of background scans performed: 35
 Background scan percentage completed: 35
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Invalid dwords:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Running disparity errors:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Loss of dword syncs:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Reset problems:  0
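If you end up collecting this kind of output from several failed drives, a small script can do the first pass of the triage described above. The following is only a rough sketch in Python: it assumes the report text looks exactly like the sample lines quoted here, and the function names (`parse_log_page`, `triage`) and the three verdict labels are made up for illustration, not part of any vendor tool. The rule it applies (link errors point away from the disk, unrecovered errors point at the media, nothing at all points at the controller or backplane) is just the reasoning from this comment written down, not a definitive diagnostic.

```python
import re

# Matches counter lines such as " Unrecovered errors (on Reads): 727 [4]".
# The trailing bracketed number seen in the sample output is optional.
LINE = re.compile(r"^\s*(?P<name>.+?):\s+(?P<value>\d+)(?:\s+\[\d+\])?\s*$")

# A few lines copied from the sample report quoted in this thread.
SAMPLE = """\
 Total Write errors: 0 [4]
 Unrecovered errors (on Writes): 0 [4]
 Read errors corrected: 605269 [4]
 Unrecovered errors (on Reads): 727 [4]
 Unrecovered errors (on Verifys): 0 [4]
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Invalid dwords:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Loss of dword syncs:  0
"""


def parse_log_page(text):
    """Return a dict mapping each counter name to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def triage(counters):
    """Classify one drive as 'link', 'media', or 'clean' (hypothetical labels)."""
    # Any SAS phy counter above zero suggests a problem on the link,
    # i.e. cabling, backplane, or controller rather than the platters.
    link = sum(v for k, v in counters.items() if k.startswith("SAS Phy"))
    # Unrecovered read/write/verify errors suggest a problem on the disk itself.
    media = sum(v for k, v in counters.items()
                if k.startswith("Unrecovered errors"))
    if link:
        return "link"
    if media:
        return "media"
    return "clean"


if __name__ == "__main__":
    print(triage(parse_log_page(SAMPLE)))
```

Run against the sample lines, this reports `media`, because of the 727 unrecovered read errors. If the drives pulled from Bay 3 all came back `clean`, that would support looking at the controller or backplane instead of the disks.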