Solved

Powervault 220s - Three Hard Drives Failed

Posted on 2009-12-28
Last Modified: 2013-11-14
We had three (3) hard drives fail on us over the weekend. The hard drives are configured within a Dell Powervault 220s disk array. Fortunately, we were able to rebuild the hard drives successfully, and there have been no more failures since the rebuild. Does anyone know why three hard drives would fail at one time? Do you think it is the disk array? Below is what was taken from the event logs. Any help diagnosing the cause of the failure would be appreciated:

Event Type:      Error
Event Source:      mraid35x
Event Category:      None
Event ID:      9
Date:            12/25/2009
Time:            12:15:15 PM
User:            N/A
Computer:      <ComputerName>
Description:
The device, \Device\Scsi\mraid35x2, did not respond within the timeout period.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 00 00 10 00 01 00 6a 00   ......j.
0008: 00 00 00 00 09 00 04 c0   .......À
0010: 01 01 00 50 00 00 00 00   ...P....
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 03 00 00 00 01 00 00 00   ........
0030: 00 00 00 00 07 00 00 00   ........


Event Type:      Warning
Event Source:      Server Administrator
Event Category:      Storage Service
Event ID:      2049
Date:            12/25/2009
Time:            12:16:52 PM
User:            N/A
Computer:      <ComputerName>
Description:
Physical disk removed:  Physical Disk 0:1 Controller 1, Connector 0


Event Type:      Error
Event Source:      Server Administrator
Event Category:      Storage Service
Event ID:      2048
Date:            12/25/2009
Time:            12:17:29 PM
User:            N/A
Computer:      <ComputerName>
Description:
Device failed:  Physical Disk 0:2 Controller 1, Connector 0


Event Type:      Warning
Event Source:      Server Administrator
Event Category:      Storage Service
Event ID:      2123
Date:            12/25/2009
Time:            12:17:30 PM
User:            N/A
Computer:      <ComputerName>
Description:
Redundancy lost:  Virtual Disk 1 (Virtual Disk 1) Controller 1 (PERC 4/DC)


Event Type:      Warning
Event Source:      Server Administrator
Event Category:      Storage Service
Event ID:      2057
Date:            12/25/2009
Time:            12:17:30 PM
User:            N/A
Computer:      <ComputerName>
Description:
Virtual disk degraded:  Virtual Disk 1 (Virtual Disk 1) Controller 1 (PERC 4/DC)


Event Type:      Error
Event Source:      Server Administrator
Event Category:      Storage Service
Event ID:      2048
Date:            12/25/2009
Time:            12:17:30 PM
User:            N/A
Computer:      <ComputerName>
Description:
Device failed:  Physical Disk 0:12 Controller 1, Connector 0


Event Type:      Warning
Event Source:      Server Administrator
Event Category:      Storage Service
Event ID:      2049
Date:            12/25/2009
Time:            12:18:07 PM
User:            N/A
Computer:      <ComputerName>
Description:
Physical disk removed:  Physical Disk 0:1 Controller 1, Connector 0



Question by:illfusion82
10 Comments
 
LVL 10

Accepted Solution

by:
LMiller7 earned 672 total points
ID: 26133421
For 3 hard drives to fail at the same time would be an enormous coincidence. It is highly probable that there was some outside cause for these failures. This could include a power supply over voltage or a failure in the cooling system.
 
LVL 14

Expert Comment

by:charlestasse
ID: 26133425
Can you please provide the firmware and driver versions for the controller card, and the firmware version on the hard drives?
 
LVL 11

Assisted Solution

by:gikkel
gikkel earned 664 total points
ID: 26133456
Looks like power issues...are you using a UPS with a line conditioner?  Do you have redundant power supplies?  Unless someone was messing around with the drive bays, it would be highly unusual for 3 devices to fail within a minute and a half.  Could also be the backplane (or any other type of hardware failure).  If it's under warranty, try for a new unit.  Otherwise, just replace the power supply unit and monitor the situation.  Make sure your backups are in working order :)
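As a rough check on the timing, the timestamps in the event log from the question can be compared directly. The sketch below is only an illustration: the event list is transcribed by hand from the posted log, and the helper names (`parse`, `seconds_between`) are made up for this example.

```python
from datetime import datetime

# (time, event ID, description) transcribed from the event log in the question.
EVENTS = [
    ("12:15:15 PM", 9, "mraid35x2 did not respond within the timeout period"),
    ("12:16:52 PM", 2049, "Physical disk removed: 0:1"),
    ("12:17:29 PM", 2048, "Device failed: 0:2"),
    ("12:17:30 PM", 2123, "Redundancy lost: Virtual Disk 1"),
    ("12:17:30 PM", 2057, "Virtual disk degraded: Virtual Disk 1"),
    ("12:17:30 PM", 2048, "Device failed: 0:12"),
    ("12:18:07 PM", 2049, "Physical disk removed: 0:1"),
]


def parse(time_text):
    """Parse a 12-hour clock time on the date shown in the log (12/25/2009)."""
    return datetime.strptime("12/25/2009 " + time_text, "%m/%d/%Y %I:%M:%S %p")


def seconds_between(first, last):
    """Whole seconds elapsed between two time strings."""
    return int((parse(last) - parse(first)).total_seconds())


# Disk-level events only: 2048 (device failed) and 2049 (physical disk removed).
disk_events = [e for e in EVENTS if e[1] in (2048, 2049)]
disk_span = seconds_between(disk_events[0][0], disk_events[-1][0])
total_span = seconds_between(EVENTS[0][0], EVENTS[-1][0])

print(f"Disk events span {disk_span} s; whole sequence spans {total_span} s")
```

On the posted events, the disk-level events fall within 75 seconds of each other, and the whole sequence, starting from the first controller timeout, spans 172 seconds.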

 
LVL 1

Author Comment

by:illfusion82
ID: 26133544
The firmware version for the controller (PERC 4/DC) is 352D and the driver version is 6.46.2.32. I am not sure where to find the firmware and driver version of the HDD.
 
LVL 1

Author Comment

by:illfusion82
ID: 26133576
We do have redundant power supplies and a UPS. I checked Dell OpenManage and did not see any other failures besides the hard drive failures.
 
LVL 14

Expert Comment

by:charlestasse
ID: 26133730
If you look at the physical disk entry you will find the model and revision.
Highlight the PERC 4, click on the Information / Configuration link at the top, then click the pulldown for the controller tasks. Export the log file (it puts it in the Windows directory), look for the most recent file, and post it here please.
 
LVL 1

Author Comment

by:illfusion82
ID: 26134216
Attached is the requested log file.
lsi-1228.log
 
LVL 1

Author Comment

by:illfusion82
ID: 26134258
The model of the hard drives is Maxtor ATLAS10K5_146SCA and the revision is JNZ6. We also have two other model hard drives in the array: SEAGATE ST336607LC Rev. DS09  and MAXTOR   ATLAS10K5_73SCA  Rev. JNZM
 
LVL 14

Assisted Solution

by:charlestasse
charlestasse earned 664 total points
ID: 26134428
Yes, you are running JNZ firmware. This firmware can cause drives to go offline due to communication time-outs.
The controller log shows Timing Wheel expired notifications along with bus resets:
12/15 20:33:14: Ch <0> TM Wheel <0> Slots: [0]=1 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 Cur= [4]
12/15 20:33:18: CQM: Timing wheel expired - Chip 0
12/15 20:33:18: Ch <0> TM Wheel <0> Slots: [0]=1 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 Cur= [0]
I can't tell how long this has been going on, as the log only goes back to 12/15, but there was a massive communication issue on 12/25.
This was the cause of your multiple drive issue.
Good news is that there is a firmware update that resolves these issues:
 
http://ftp.us.dell.com/scsi-drv/DELL_SCSI-HARD-DRIVE-FIRMWAR_A09_R182831.exe
Download and run this on your local computer and follow the directions to burn the program to CD.
Boot your server to this CD and let it run in unattended mode; it will update the firmware on all of the hard drives.
*******Caution*****
Best practice is to always, ALWAYS have a solid backup before updating any firmware or drivers.
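
For anyone who wants to see how far back the time-outs go in their own exported controller log, a small script can count the "Timing wheel expired" lines per day. This is only a sketch: the attached lsi-1228.log is not reproduced in this thread, so the sample text is just the three lines quoted above, the "MM/DD HH:MM:SS: message" layout is inferred from those lines alone, and the function name `count_timing_wheel_expiries` is made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the controller log excerpt quoted in this thread.
SAMPLE_LOG = """\
12/15 20:33:14: Ch <0> TM Wheel <0> Slots: [0]=1 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 Cur= [4]
12/15 20:33:18: CQM: Timing wheel expired - Chip 0
12/15 20:33:18: Ch <0> TM Wheel <0> Slots: [0]=1 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 Cur= [0]
"""

# Assumed layout "MM/DD HH:MM:SS: message", inferred from the quoted lines only.
LINE_RE = re.compile(r"^(\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}): (.*)$")


def count_timing_wheel_expiries(log_text):
    """Return a Counter mapping 'MM/DD' to the number of expiry lines that day."""
    per_day = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and "Timing wheel expired" in match.group(3):
            per_day[match.group(1)] += 1
    return per_day


counts = count_timing_wheel_expiries(SAMPLE_LOG)
for day, n in sorted(counts.items()):
    print(f"{day}: {n} timing wheel expiry line(s)")
```

To run it against a real export, read the file into a string and pass that in place of `SAMPLE_LOG`; lines that do not match the assumed layout are simply skipped.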
 
LVL 1

Author Comment

by:illfusion82
ID: 26134736
Thanks, I will update the firmware and let you know how it goes.