sagetechit
asked on
ARECA 1220 FAILED DRIVE/Degraded Volume?!
We have an Areca 1220 RAID card with a RAID 6 array containing 8 drives. The card started beeping a week ago, and when we checked, the array was degraded. It reported that Ch01 had failed, so we pulled that drive and put in a new one. It rebuilt for about two and a half days, and then it failed. We tried again and it failed again. We checked the log, and the only thing we can see is that IDE Channel 3 is consistently getting a Reading Error. What should we do?
ASKER
How do I get the block number? Once I have it, how do I write to that block? I checked the event log and this is all I get:
Time Device Event Type Elapse Time Errors
2009-09-28 07:51:25 Proxy Or Inband HTTP Log In
2009-09-26 06:05:35 ARC-1220-VOL#00 Complete Rebuild 045:31:39
2009-09-26 03:22:51 IDE Channel 3 Reading Error
2009-09-26 03:22:42 IDE Channel 3 Reading Error
2009-09-26 03:22:34 IDE Channel 3 Reading Error
2009-09-26 03:22:25 IDE Channel 3 Reading Error
2009-09-25 23:52:40 IDE Channel 3 Reading Error
2009-09-25 23:52:25 IDE Channel 3 Reading Error
2009-09-25 18:24:02 IDE Channel 1 Device Failed
2009-09-25 18:24:02 Raid Set # 00 RaidSet Degraded
2009-09-25 18:24:02 ARC-1220-VOL#00 Volume Degraded
2009-09-25 10:02:36 Proxy Or Inband HTTP Log In
2009-09-25 09:56:58 Proxy Or Inband HTTP Log In
2009-09-25 07:50:09 IDE Channel 3 Reading Error
2009-09-25 07:50:00 IDE Channel 3 Reading Error
2009-09-25 07:49:52 IDE Channel 3 Reading Error
2009-09-25 07:49:39 IDE Channel 3 Reading Error
2009-09-25 07:49:31 IDE Channel 3 Reading Error
2009-09-24 08:46:13 Proxy Or Inband HTTP Log In
2009-09-24 08:44:44 Proxy Or Inband HTTP Log In
2009-09-24 08:32:11 IDE Channel 3 Reading Error
2009-09-24 08:32:03 IDE Channel 3 Reading Error
2009-09-24 08:31:55 IDE Channel 3 Reading Error
2009-09-24 08:30:00 ARC-1220-VOL#00 Start Rebuilding
2009-09-24 08:30:00 ARC-1220-VOL#00 Abort Rebuilding 000:02:32
2009-09-24 08:29:59 Raid Set # 00 Rebuild RaidSet
2009-09-24 08:29:59 IDE Channel 1 Device Inserted
Darn, I was hoping the event log would give you the block number. Once you have it, you could use a binary editor, or write a simple program, to write to that block. I have access to the Areca API, as I am a RAID architect, and could write a program to force a repair, but I would have to charge you. You should be able to get a utility from Areca that gives you the exact block number, so my advice is to contact them directly. It sucks when a rebuild dies due to a bad block and there is no obvious way to force a write and move on. The data in that block is lost anyway.
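To make "write a simple program to write to it" concrete, here is a minimal sketch in Python. It is an illustration only, not an Areca tool: it assumes the target is visible to the OS as a file or raw device, which may not be true for a member disk sitting behind a RAID controller. The function name, the path, and the 512-byte block size are all hypothetical placeholders. Overwriting a block destroys whatever was in it, so try it on a scratch file first.

```python
import os

def overwrite_block(path, block_number, block_size=512):
    """Overwrite one block of `path` with zeros and return bytes written.

    Hypothetical helper: `path` may be a scratch file or a raw device the OS
    exposes. A 512-byte block size is an assumption, not an Areca value.
    """
    offset = block_number * block_size
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    flags = os.O_WRONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        os.lseek(fd, offset, os.SEEK_SET)       # seek to the start of the block
        written = os.write(fd, bytes(block_size))  # write block_size zero bytes
        os.fsync(fd)                            # push the write to the device
    finally:
        os.close(fd)
    return written
```

On a real disk, the point of the write is to give the drive a chance to remap the unreadable sector. Whether that is possible through the controller is something only Areca can confirm.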
Based on the log, you have numerous read errors. You could have XOR errors, hard read errors, or a combination of both, so recovery has to be done meticulously and requires somebody who knows what they are doing. Since the problem is happening in the middle of a RAID 6 rebuild, you can forget using one of those off-the-shelf RAID reconstruction software packages. The RAID 6 rebuild uses a proprietary layout and algorithms that mark what has and hasn't been done, and you will really muck things up if you even try to go down that path.
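If you want to see at a glance which channel the read errors are coming from, a rough sketch like the one below tallies them from a pasted event log. It assumes each line has the shape shown in the log above (timestamp, device, event type). The names `LINE`, `SAMPLE`, and `tally_read_errors` are made up for this example, and the sample holds only a few lines copied from the log.

```python
import re
from collections import Counter

# Matches lines such as "2009-09-26 03:22:51 IDE Channel 3 Reading Error".
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (IDE Channel \d+) (.+)$"
)

# A few lines copied from the event log above, not the full log.
SAMPLE = """\
2009-09-28 07:51:25 Proxy Or Inband HTTP Log In
2009-09-26 03:22:51 IDE Channel 3 Reading Error
2009-09-25 18:24:02 IDE Channel 1 Device Failed
2009-09-24 08:32:11 IDE Channel 3 Reading Error
"""

def tally_read_errors(log_text):
    """Count 'Reading Error' events per IDE channel."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(3).startswith("Reading Error"):
            counts[m.group(2)] += 1
    return counts

if __name__ == "__main__":
    for channel, n in tally_read_errors(SAMPLE).most_common():
        print(channel, n)
```

Feed it the whole exported log instead of `SAMPLE` to get the real counts.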
Best to contact Areca and hope they have a solution for you. The alternatives are all expensive unless you just want to either recover from backup, or ghost the system as it is to a backup, rebuild the array from scratch, then restore.
(My opinion is that you will ultimately have to get some scratch disks, do a ghost backup, rebuild the array with a full, data-destructive initialization, then restore. No matter what you do, do it quickly: lose one more disk in the interim and you have 100% data loss.)
ASKER
Since the errors come from drive channel 3, could I take that drive out, along with channel 1 (the one it said failed), and rebuild both at once? It says we can lose up to 2.
ASKER CERTIFIED SOLUTION
In future, make sure you regularly run data consistency checks/restores! Had you been doing this, it is likely this never would have happened.