Solved

HP DL585 G6 Disk Failure Question

Posted on 2010-08-13
Last Modified: 2012-05-10
Hey all,

We just rolled out 30+ new DL585 G6s (with the most recent firmware). They are set up as follows:

1. Disk Bays 1-2: RAID 1 (2x 146GB disks, 15k)
2. Disk Bays 3-8: RAID 5 (6x 146GB disks, 10k)

These all run Server 2003 R2 Enterprise, 64-bit. For some reason, on almost all of the servers, the disk in Bay 3 (the first disk of the second array) keeps going bad. We've replaced some disks, moved others around, etc., which sometimes works for a day or two, but then the disk goes bad again. We figure it can't be that we have a bad disk in the same bay of every server. HP doesn't seem to know at this point either. We even tried a different array config (that is, converted the second array to RAID 10), with no luck.

I figured this might be a known issue, but I haven't found anything. Any ideas?

Thanks.
Question by:exadmin2006
LVL 47

Expert Comment

by:David
ID: 33429105
Specifically what make/model of disk?
 
LVL 47

Accepted Solution

by:
David earned 2000 total points
ID: 33429130
If this is all HP kit and under HP warranty, then I would just demand that HP come out and fix it. Geez, you bought, what, $100,000 worth of hardware? Make it their problem. Talk to the regional service manager if you have to, and get them to send out a team to make it right, or tell them to send somebody to deinstall it and take it back. This is unacceptable.
 

Author Comment

by:exadmin2006
ID: 33429239
Good point. The disk models are:

First array (good array): 146GB 2-port SAS 15k EH0146FARWD
Second array (failing): 146GB 2-port SAS 10K DG146BB976

Not sure of the make (like Seagate, etc.), as I don't have access.
 
LVL 47

Expert Comment

by:David
ID: 33429426
Well, they are HP disks, so at least you aren't dealing with third-party drives, and HP is on the hook. What you can do is:
1) Check whether the firmware is old, and upgrade it. The HP support site will have upgrades and, more importantly, release notes. There are ALWAYS bugs in disk (and, for that matter, controller) firmware, so make sure everything is current.

2) If you have nothing else to do in the interim, you can get yourself a JBOD SAS controller (you can't do this with the HP controllers) and run some extreme diagnostics that will tell you the exact nature of what is going on. That has a cost associated with it, though, especially if you don't have a JBOD controller and a way to hook up the drives. Instead, look at all the event logs in the controller. They won't give you much, but it might be enough. SAS drives present a great deal of reportable information, dozens of fields, and the totals are kept in non-volatile memory within the disks, so you could take a few drives that failed and run the software on a JBOD controller. (Look at http://www.santools.com/smart/unix/manual, and go to the log pages for SAS disks.)

This is from the site, to give you an idea of what the disks will report. I'm just scratching the surface, as you can also run self-tests, get link speeds, and verify data. If you run diagnostics on some of the disks that failed and see the nature of the errors (if any), this will tell you whether you just have bad luck with some disk drives. Or maybe the disks are perfectly fine and pass all diagnostics. If so, blame the controller or backplane.

 Write errors corrected with possible delays: 0 [4]
 Total Write errors: 0 [4]
 Write errors corrected: 0 [4]
 Times correction algorithm processed (on Writes): 0 [4]
 Bytes processed (on Writes): 353948013568 [8]
 Unrecovered errors (on Writes): 0 [4]
 Read errors corrected without substantial delay: 605260 [4]
 Read errors corrected with possible delays: 9 [4]
 Total Read errors: 0 [4]
 Read errors corrected: 605269 [4]
 Times correction algorithm processed (on Reads): 605996 [4]
 Bytes processed (on Reads): 652188835328 [8]
 Unrecovered errors (on Reads): 727 [4]
 Verify errors corrected without substantial delay: 590 [4]
 Verify errors corrected with possible delays: 0 [4]
 Total Verify errors: 0 [4]
 Verify errors corrected: 590 [4]
 Times correction algorithm processed (on Verifys): 590 [4]
 Bytes processed (on Verifys): 0 [8]
 Unrecovered errors (on Verifys): 0 [4]
 Total Non-medium errors: 0 [4]
 Current temperature +/- 3 degrees C: 32
 Reference temperature +/- 3 degrees C: 68
 Background scanning status: 8
 Number of background scans performed: 35
 Background scan percentage completed: 35
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Invalid dwords:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Running disparity errors:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Loss of dword syncs:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Reset problems:  0
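
For anyone who wants to triage a pile of dumps like the one above, here is a minimal Python sketch that parses the "name: value [size]" lines and gives a rough first guess at where to look. The function names `parse_counters` and `triage` are hypothetical, and the classification rules are assumptions made for illustration (nonzero phy counters read as a link problem, nonzero unrecovered errors read as a disk problem). They are not taken from HP or from the tool's documentation, so treat the result as a starting point only.

```python
import re

# Matches lines such as "Unrecovered errors (on Reads): 727 [4]".
# The trailing "[n]" field is optional because some lines omit it.
COUNTER_RE = re.compile(r"^\s*(?P<name>.+?):\s+(?P<value>\d+)(?:\s+\[\d+\])?\s*$")

# Phy-level counter names as they appear at the end of the "SAS Phy" lines.
LINK_SUFFIXES = (
    "Invalid dwords",
    "Running disparity errors",
    "Loss of dword syncs",
    "Reset problems",
)

# A few lines copied from the dump in this thread.
SAMPLE = """\
 Unrecovered errors (on Writes): 0 [4]
 Read errors corrected: 605269 [4]
 Unrecovered errors (on Reads): 727 [4]
 Unrecovered errors (on Verifys): 0 [4]
 Current temperature +/- 3 degrees C: 32
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Invalid dwords:  0
 SAS Phy #0 (50-00-C5-00-06-94-BF-FD) - Reset problems:  0
"""


def parse_counters(text):
    """Return {counter name: integer value} for every line that matches."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


def triage(counters):
    """Rough classification; the rules here are illustrative assumptions."""
    link_errors = sum(
        value for name, value in counters.items() if name.endswith(LINK_SUFFIXES)
    )
    unrecovered = sum(
        value
        for name, value in counters.items()
        if name.startswith("Unrecovered errors")
    )
    if link_errors > 0:
        return "link errors: look at the backplane, cabling, or controller"
    if unrecovered > 0:
        return "media errors: the disk itself looks suspect"
    return "clean counters: look at the controller or backplane"


if __name__ == "__main__":
    print(triage(parse_counters(SAMPLE)))
```

Run against the drives pulled from Bay 3 on several servers, a result that comes back clean on every disk would point away from the drives and toward the controller or backplane, which matches the reasoning above.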
