Solved

RAID1 Drive Fail. Replaced Drive. Something's still wrong.

Posted on 2011-02-25
Last Modified: 2012-05-11
Windows 2003 server with 6 disks set up in RAID1. In the middle of the workday yesterday it suddenly went down. I looked at the monitor... flashing cursor on black screen.

I rebooted. While booting, the computer displays some info about the RAID volumes, and this time it showed a failed-drive error or something like that. The failed drive was a member of the boot volume. I decided to replace both drives in the failed array because it's always been my understanding that having two identical hard drives is best (and they're small SATA drives, so cheap). So I popped in both drives, went to the RAID manager and set up a new RAID1 volume using the two blank disks. Then, using a separate computer, I cloned the original, working drive to one of the blanks that I had just set up in RAID1. After cloning, I popped the new disk back in the server (with its mate still attached as well). The computer boots and passes the RAID status check, but then just gets "Error Loading Operating System". I wasted 2 hours trying to get to the recovery console so I could run FIXMBR, then I finally gave up on that and decided to try booting with all but one boot drive unplugged.

Worked... got into Windows. Upon logging in I got a few errors about corrupt log files (random log files too, like one that belongs to the OS and one that is from a phone monitoring program we have installed). It suggested I run chkdsk, and I was nervous so I obeyed. Ran chkdsk /r, rebooted, made sure the disk check started and then left for the night.

I came back in this morning and the server was once again stuck at a black screen with a blinking cursor. Rebooted and watched for RAID errors. Boot volume was failing (not just one disk but the volume itself). Went into the config menu, was prompted to fix the error... fixed error... Rebooted. Got into Windows.

I installed the Windows-based "Intel Matrix Storage Console" so I could get some info. See attached...

Also, no idea why there's suddenly a missing drive????? That should be totally unrelated. Yes, I already made sure it's connected.

Coworkers are here so I can't really work on it until tomorrow but in the meantime I'd LOVE some help. I truly apologize for length and sloppiness of this post. I'm not sure exactly what details are most important, so I gave them all.

Thanks a million!!
imsc.JPG
Question by:jpfulton
10 Comments
 
LVL 47

Expert Comment

by:dlethe
ID: 34979818
You screwed up. You don't clone RAID drives, because there is metadata on them, which is going to include information about all the members of the array, boot order, serial numbers and so on. So the metadata has the serial # of the disk you got rid of.

No elegant way to repair this other than to use a scratch drive to strip off the metadata, boot from a non-RAID clone, back up, build the RAID, and restore onto the RAID.
 
LVL 2

Expert Comment

by:BITCooler
ID: 34980147
dlethe is right.

When the 2nd disk of your RAID 1 mirror failed, you should have replaced only the failed drive to see whether a rebuild would a) complete successfully, and b) solve your initial issue.

Sorry for your misfortune.
 

Author Comment

by:jpfulton
ID: 34980175
Is there no way for me to go back to the original 1st disk of the mirror and then rebuild on to a 2nd hard drive? It seems like the metadata on that 1st drive would still all be good, no?
 

Expert Comment

by:fabrimago
ID: 34980608
Hi jpfulton, you did make some mistakes by following the wrong drive-failure procedure.

That said, since the operating system is still alive, you should have two paths to reach your goal. It would be very useful for all of us to know the exact topology of your system, so we can give better help for your specific case.
From the picture I can see you have three arrays of disks, each one composed of two physical disks.

Array_000 : The ROOT volume shows a failed hard drive (failed but present); both drives are manufactured by SEAGATE. I suppose this is your faulty disk.
Array_001 : The storageRAID volume shows a missing disk, which should be a Western Digital like its twin drive on Port 1.
Array_002 : currently working; I suppose it provides extra storage as described.

The drive you highlighted is failing but present.

Can you please let me know the correct topology of your system, confirming and adding detail to my message? Consider that your second SEAGATE drive (port 2) is not actually a member of your ROOT volume, since I read that you never reconstructed the array.

Regarding the metadata for that volume, it's corrupted because the physical drives are different.

Try booting with only the working disk inserted for Array_000 and check the integrity of your array in the Intel storage console (the secondary disk should be "not present"). Then hot-add your secondary disk and try to reconstruct the array; this will also update the metadata for that volume.

 

Author Comment

by:jpfulton
ID: 34980740
Topology:
Array_0000 - Windows knows this as drive C. This is where Windows is installed. The disk marked Port 0 is the one I cloned to. The one marked Port 2 is essentially empty... That's what I want to rebuild to. Both of the disks that are now in the ROOT volume are brand new. The old disks are sitting on my desk.

Array_0001 - Windows knows this as drive R. This is a storage drive only. I have no idea why it says missing hard drive. I just grabbed the serial for it and I'm checking into it as we speak. I had no problems with this at all yesterday. It's somewhat possible I accidentally unplugged it, but I already checked for that once.

Array_0002 - Windows knows this as drive S. This is extra storage. I do nightly backups of drives C and R... the backups are stored here until they are copied to external storage.

"Consider that your second SEAGATE drive (port 2) is not actually a member of your ROOT volume, since I read that you never reconstructed the array."

Maybe I'm misunderstanding this, but I actually DID reconstruct the array with the brand new drives... however then I cloned onto one of them using an image of the old drive... perhaps that completely negates the reconstructed array and maybe that's what you mean?

 
LVL 47

Expert Comment

by:dlethe
ID: 34980845
You reconstructed a range of physical blocks, and just got REALLY lucky that the intel controller is pretty stupid. Other controllers would have barfed all over what you did and never would have booted in the first place.

So if you have 50 blocks of metadata, and the disk is 1000 blocks total (just to use easy numbers to illustrate) ... then your boot block is at physical block #50, and the partitioning thinks the disk is 950 blocks total, with the highest block # being 949 (everything starts at zero).

You need to shift it over 50 blocks to the left, and you will end up, in this case, with 50 unused blocks at the end of the drive.
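
To make the arithmetic concrete, here is a minimal Python sketch of the mapping described above. The numbers (50 metadata blocks, 1000 blocks total) are only the illustrative figures from this comment, not real Intel metadata sizes, and the function name `logical_to_physical` is made up for the example.

```python
# Illustrative numbers from the comment above; NOT real controller values.
METADATA_BLOCKS = 50   # blocks assumed to be reserved ahead of the data
DISK_BLOCKS = 1000     # total blocks on the hypothetical disk


def logical_to_physical(logical_block,
                        metadata_blocks=METADATA_BLOCKS,
                        disk_blocks=DISK_BLOCKS):
    """Map a block number as the partitioning sees it to a physical block."""
    usable_blocks = disk_blocks - metadata_blocks  # 950 in this example
    if not 0 <= logical_block < usable_blocks:
        raise ValueError("logical block outside the usable range")
    # Everything the OS sees is shifted right by the metadata size.
    return logical_block + metadata_blocks


print(logical_to_physical(0))    # boot block lands on physical block 50
print(logical_to_physical(949))  # highest logical block lands on 999
```

Going the other way (stripping the metadata) means subtracting the same offset, which is the "shift 50 blocks to the left" mentioned above.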

See, it is non-trivial unless you can purchase something like runtime.org's RAID Reconstructor, or use Linux + dd and do a raw copy with an offset.
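
As a rough sketch of what a "raw copy with offset" amounts to, the Python below copies one image file to another while dropping a number of leading blocks. It is roughly what `dd` does with its `skip=` operand. The 512-byte block size, the function name `copy_skipping_blocks`, and the idea that the metadata sits in the first blocks are all assumptions for illustration; check where and how large the controller's metadata really is, and practice on image files rather than a live disk.

```python
BLOCK_SIZE = 512  # assumed sector size; verify for the actual drive


def copy_skipping_blocks(src_path, dst_path, skip_blocks,
                         block_size=BLOCK_SIZE, chunk_blocks=2048):
    """Copy src to dst, dropping the first `skip_blocks` blocks of src.

    Returns the number of bytes written to dst.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Jump past the blocks we want to strip off (the assumed metadata).
        src.seek(skip_blocks * block_size)
        while True:
            chunk = src.read(chunk_blocks * block_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

With the easy numbers from above, `copy_skipping_blocks("old.img", "new.img", 50)` on a 1000-block image would leave a 950-block image whose first block is the old physical block #50 (the file names are placeholders).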




 

Author Comment

by:jpfulton
ID: 34980987
dlethe, thanks for the reply. I read through it and I'm going to work on understanding it, but first let me mention this... not sure if this is relevant or if I've made progress... see attachment...

I right-clicked the drive in Array_0000 Port 2 and selected "Mark As Normal"... it gave me an option to rebuild. I clicked Okay. So it's rebuilding right now... 56% complete, 36 minutes left.

As far as Port 3 in Array_0001, I located the drive, unplugged it, pulled it out, plugged it into a new SATA power connector and re-plugged it.

So, am I good now probably?
imsc-1.JPG
 
LVL 47

Expert Comment

by:dlethe
ID: 34983289
You very well could be. The firmware is overwriting that disk and has (presumably) written correct metadata on it.
Now there is no way of knowing for sure that this is correct, though it is likely that it is ... but if this was my computer, I would take the opportunity to do a full backup while it is still online. Then, once the backup is done and the rebuild has completed, shut it down.
 - Make sure both disks are set up in the boot path properly in the BIOS. The disk you are currently booted from is primary; the one it rebuilt is secondary.
 - Fix if not correct, then boot. Then run the Intel program and make sure it doesn't see any problems.

If it comes up and sees no problems, then it would be prudent to yank the primary disk while it is hot and make sure that you lose no data and the system stays online. If you took a full backup and know for a fact that you can do a restore, then you can safely test. If you have never tested a restore, then I would not risk it; I would accept that there is a small possibility that the RAID can't handle a drive failure and plan a weekend of testing sometime in the future.
 

Accepted Solution

by:
jpfulton earned 0 total points
ID: 35008059
I got it. I pretty much did exactly what I already discussed in this thread. Once everything appeared right and without error I tested all of the drives by pulling data cables one at a time while the server was running. Everything is A-Ok! According to everyone here it looks like I got lucky.
 

Author Closing Comment

by:jpfulton
ID: 35045506
Sorry... no points awarded because I didn't get any info here that I needed short of "you screwed up"