Solved

Areca ARC-1882-ix24 "Failed Migration" status and "Lost Rebuilding/Migration LBA" while migrating RAID 5 to RAID 6

Posted on 2014-12-11
Last Modified: 2015-01-20
Hi mates, experts,

I have a problem in my lab. I'm really frustrated that I can't solve it on my own, and Areca support can't help either.

It's a custom Supermicro server running ESXi, with a ZFS NAS as a VM using VMDP, with these specs:
- Supermicro motherboard (server) X8DAH+-F
- Areca RAID ARC-1882-ix24 with 4GB cache and BBU, with 4x SSD in RAID 10, 6x SATA in RAID 6, and 4x SATA in RAID 5 (the set that has now failed; the problem occurred during migration to RAID 6 after adding 1 disk, so there are 5 disks in the faulted RAID 5/6)
- 2x 6-core low-voltage CPUs, 144GB RAM, 1 quad-port Ethernet NIC and 1 FC 4Gbps card
- ESXi v5.5 (booting from a local SSD; it's a very old installation that has been upgraded many times) with the Areca RAID passed through via VMDP to a Nexenta 3.x ZFS NAS VM with 24GB of RAM and ZFS deduplication on the failed volume... :((
- 460W server/rack PS. Yes, I know it's a little too weak (the server consumes 300W at idle and 400W under high load)
- NAS VM,

Everything was working fine until I had the wise idea of migrating the 4-disk RAID 5 to RAID 6. In general there's no big objection to doing that online, as long as the RAID 5 dedup volume isn't used for a while (it held only VM templates, other images and backups, which is why it was deduplicated). So I added 1 SATA HDD and started the migration from RAID 5 to RAID 6. After some hours the whole server just hung. That doesn't happen often, but I recognized it as an error on the Areca RAID: the controller reacts like that when it hits a real error or a huge problem with disk timeouts. So I restarted the server. Everything works fine except that RAID 5/6 migration... :(
It stopped with a "Failed Migration" status in the RAID hierarchy, and I also see "Lost Rebuilding/Migration LBA" in the event log.
From my investigation, the cause of the fault (and of the server hang) was most probably the new disk, which doesn't seem to work well (I tested it in another server, where it occasionally connects and disconnects), and/or too little power or too long a power cable to that new disk. I'm not fully sure which of these caused the problem, but the current state is that the volume on that RAID 5/6 is in failed status, the RAID Set is in Failed Migration status, and a "Lost Rebuilding/Migration LBA" entry appeared once (during the hang) in the Areca RAID controller's event log.
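
For reference, here is a minimal sketch of what this migration changes on paper. It assumes equal-size 2 TB members (the ST2000DM001 drives named later in the thread) and ignores any controller metadata overhead; the function names are made up for illustration and have nothing to do with the Areca firmware.

```python
# Parity overhead per RAID level, in whole disks (standard definitions).
PARITY_DISKS = {"raid5": 1, "raid6": 2}


def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for equal-size members, ignoring metadata overhead."""
    parity = PARITY_DISKS[level]
    if disks <= parity:
        raise ValueError("not enough disks for this RAID level")
    return (disks - parity) * disk_tb


def tolerated_failures(level: str) -> int:
    """Number of member disks that can fail without losing the array."""
    return PARITY_DISKS[level]


before = usable_tb("raid5", 4, 2.0)  # 4-disk RAID 5 before the migration
after = usable_tb("raid6", 5, 2.0)   # 5-disk RAID 6 after the migration
print(before, after, tolerated_failures("raid5"), tolerated_failures("raid6"))
```

Under these assumptions the usable capacity stays at 6 TB; the added disk only buys tolerance for a second drive failure, and only once the migration has completed.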

So here's what I tried.
As an Areca KB article says and Areca support suggested, I tried the following:
- power off the server
- disconnect all HDDs from the Areca RAID controller
- power on the server and go into the Areca BIOS
- power off the server
- connect only the faulted RAID 5/6
- try to start the OS and finish the migration; it didn't work

Second try:
- in the Areca Web GUI, under Rescue RAID Set, I typed RESCUE
- power off the server
- disconnect all HDDs from the Areca RAID controller except the faulted RAID 5/6
- power on the server and boot the OS
- start the NAS VM; the RAID 5/6 migration did not start :(
- in the Areca Web GUI, under Rescue RAID Set, I typed SIGNAT (I know it's not required at that point, but I did it to be sure)
- repeat the boot and migration-start check procedure; it didn't work :(

I also tried replacing the 5th (faulted) disk with a new one. Hmm, nothing worked at all :((
The current status is that I have only 4 disks, the original ones that worked fine as RAID 5. The volume under the OS is in faulted status, the RAID Set is in Failed Migration status, and I have no idea what to check next...

I know it will be hard to save this RAID, but maybe we can share our knowledge about what could still be done (and what was done wrong), why such an error occurs, and how to investigate it in the future.

thanks
NTShad0w
Question by:Dawid Fusek
 
LVL 30

Expert Comment

by:pgm554
ID: 40495459
What are the make and model of drives being used?
If you're using hardware RAID, using anything less than server-class drives is a mistake.
Non-server drives have a higher bit error rate and issues with time-limited error recovery.
 
LVL 5

Assisted Solution

by:Dawid Fusek
Dawid Fusek earned 0 total points
ID: 40495499
Hi mate,

Because it's only my lab, and because I've had a lot of these drives for some years, I use ST2000DM001 (in the RAID 5/6 that failed; I know these drives are highly unreliable), and for the good RAID 6 RAID Set I use ST3000DM001.
So I know and understand that the problem most probably comes from a faulty non-server-class drive (or a lack of PS power).
I'm just looking for any possibility to repair/save that RAID Set, because according to official support it should be possible.
These disks have been working in that server with the Areca for 3 years now. I know they sometimes drop; if one drops too often, I replace it with a newer one. I also know, and really don't recommend, using these drives directly under ZFS, because ZFS really doesn't like the errors these disks sometimes generate on the bus or internally (I observe them in the Areca HW RAID log during the weekly Volume Set Check). On HW RAID, though, it has worked quite OK for years with hand-selected drives. So I know, but I don't care too much. I asked the question because I'm looking for any solution that shows me how to repair that problem (if possible).

best regards
NTShad0w
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 40496336
"but on HW RAID it works quite ok"

Well, the issue with desktop drives on HW RAID is that when they attempt an error recovery, it can take up to 15 seconds on a desktop drive, and a hardware RAID controller will flag the drive as bad.
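
The timeout mismatch described above can be sketched as a simple comparison. The 15-second figure comes from the comment itself; the 8-second controller limit and the 7-second capped recovery time for a drive with time-limited error recovery are illustrative assumptions, not values read from the Areca firmware or from these Seagate drives.

```python
def controller_drops_drive(recovery_s: float, controller_timeout_s: float) -> bool:
    """True if the drive stays busy longer than the controller is willing to wait."""
    return recovery_s > controller_timeout_s


CONTROLLER_TIMEOUT_S = 8.0  # assumption: hypothetical controller limit
DESKTOP_RECOVERY_S = 15.0   # worst case quoted in the comment above
TLER_RECOVERY_S = 7.0       # assumption: capped recovery on a server-class drive

for name, recovery in [("desktop", DESKTOP_RECOVERY_S), ("TLER", TLER_RECOVERY_S)]:
    dropped = controller_drops_drive(recovery, CONTROLLER_TIMEOUT_S)
    print(f"{name}: dropped={dropped}")
```

With these numbers only the desktop drive gets flagged, even though the underlying media error may be exactly the same on both drives.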

If you're using straight ZFS, with no hardware RAID, this wouldn't matter, as ZFS rewrites a bad sector as soon as it detects one.

I think you're just in a situation where too many double bit parity errors killed your RAID in the middle of a rebuild.
 
LVL 5

Author Comment

by:Dawid Fusek
ID: 40496984
Hmm,

in theory that may happen (double-bit parity errors during RAID migration), but it would hardly hang the server. In my experience this server hangs when the Areca RAID controller itself hangs, or when the PS is too weak (with too many HDDs), not because of any kind of parity errors on the HDDs... and the Areca RAID controller doesn't usually hang when parity errors occur.

So from my perspective the question is still open.

regards
NTShad0w

 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 40497042
A migration is the same as a rebuild.
The problem with desktop drives is that with HW RAID 5 you can have two drives time out at different times, and the controller will get stuck during a rebuild.
Lose 2 drives in 5 and data is history.
I've got an Intel server chassis with a 375 watt supply that will handle 6 drives, so unless it's a cheap supply, it should be OK.
 
LVL 5

Accepted Solution

by:Dawid Fusek
Dawid Fusek earned 0 total points
ID: 40497941
Honestly, I haven't investigated how RAID level migration on Areca is really done, or how safe it is (as my example shows, it's not as safe as it should be). The Areca RAID controller also didn't log 2 drive timeouts in the event log, just one, for the "new" drive that the RAID migration was running onto...

As for the PS: it's a server-class PS, so low quality isn't the problem, but by my calculations it's a little too weak for this config (it should be close to 550W to have a 15-20% power reserve) when running the server with 10 SATA HDDs and 4 SATA SSDs plus the rest of the hardware. 460W is very close to the maximum draw of this config under full load (probably around 450W).
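
The power budget above can be checked with a short calculation. This is only a sketch: it assumes "reserve" means the fraction of the PSU rating left unused at peak load, and it takes the roughly 450W peak estimate from this comment at face value.

```python
def required_psu_watts(peak_load_w: float, reserve_fraction: float) -> float:
    """PSU rating needed so that reserve_fraction of it stays unused at peak load."""
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return peak_load_w / (1.0 - reserve_fraction)


def headroom_fraction(psu_w: float, load_w: float) -> float:
    """Fraction of the PSU rating still unused at the given load."""
    return (psu_w - load_w) / psu_w


PEAK_LOAD_W = 450.0  # estimated maximum draw from the comment above
PSU_W = 460.0        # rating of the installed PSU

for reserve in (0.15, 0.20):
    print(f"{reserve:.0%} reserve -> {required_psu_watts(PEAK_LOAD_W, reserve):.1f} W")
print(f"headroom with the 460W PSU: {headroom_fraction(PSU_W, PEAK_LOAD_W):.1%}")
```

Under that definition a 15-20% reserve works out to a rating between roughly 529W and 563W, in line with the "close to 550W" estimate, while the installed 460W unit leaves only about 2% headroom at the estimated peak.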

regards
NTShad0w
 
LVL 5

Author Comment

by:Dawid Fusek
ID: 40505917
Any other ideas on the main subject (possibilities for reanimating that RAID 5/6), mates?

regards
NTShad0w
 
LVL 5

Author Comment

by:Dawid Fusek
ID: 40553247
Also, for any RAID or data recovery, the best tool, which I used yesterday to recover this array (and get back my VMs' data!!!), is the DiskInternals VMFS Recovery tool from here:
http://www.diskinternals.com/download/

Good to know for future use; it's an incredibly powerful tool (but not cheap, 700 USD).

regards
NTShad0w
 
LVL 5

Author Closing Comment

by:Dawid Fusek
ID: 40559351
Thanks for giving it a try, and for your time, mate.

regards
NTShad0w