Reginald Meyer

Recovering data on a Raid 5 after OS drive failure

I have an older server, a Dell PowerVault NF500.  It was running Windows Storage Server 2003.  A week back the drive holding the OS failed.  There are two drives on one controller (the C: drive) and three drives on another controller.  The data is held on a RAID 5 array with three drives.  I simply want to get the data off the RAID 5 array and onto a backup disk. I am guessing the two drives holding the OS were a RAID 0. How do I go about getting the server back up without risking the array with the data?  I can put another drive in there and install an OS on it, but will it see the data array afterwards?  Will the install leave the RAID 5 alone?

I have never had to do this in the past and I cannot afford to make a mistake.  I would love to get some advice and queue up a conversation with someone who has experienced this before.

Reggie
Wakeup

I am assuming that the RAID 5 array is, as you say, on another controller?
So assuming that the RAID 5 array didn't go bad or fail, the RAID should still be intact.
I believe installing the OS on the C: drive (assuming the drives are good) would be fine.  I also would hesitate to run RAID 0 on a server OS.  It might give you a bit of speed and more space, but if (as you can see) one drive fails, you will have to reinstall the OS and rebuild.  If you run RAID 1 and one of the drives fails, you should be able to continue as you were, and then rebuild to a new drive.

Anyway, once you install the new OS, as long as you do not tell it to overwrite, format, or change anything on the RAID 5, you should be completely fine.
Reginald Meyer

ASKER

Wakeup,

Thanks, do you think it matters what OS I use?


I will put the disk in and let you know what I run into.


I am guessing about the Raid 0 since the server has been around a while and that both drives seem to have failed at the same time.
Member_2_231077

NF500 only has one controller and the two OS drives were mirrored when Dell installed the OS AFAIK. It's only a PowerEdge 2950 with slightly tweaked BIOS to display a different logo during boot.

As it's only DR you can put one drive in it in place of the two failed OS ones, configure the drive during POST under the PERC utility [ctrl-R] as a single drive volume (RAID0) then install the OS on it. You will probably have to delete the old array in the PERC utility first - don't delete the wrong one.

When installing Windows you must not install it on your data drives but you should be able to tell which is which because of the different sizes.

Alternatively you can remove the current drives, install the OS onto a new drive and then plug the old ones back in and import the foreign config but I prefer the first method unless you have another PowerEdge with PERC in it that you want to move the drives to.

Can you confirm the config via the CTRL-R utility? Pretty sure most NF500s had a PERC/5i in them.
noci
The OS and really vital data should be on RAID1 (mirrored drives).
RAID 0 (striping without parity) is only a speed option for data that can be lost without incurring cost.
Super computing datacenters use that option to get massive data streams. Any other use is ill-advised.
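To put rough numbers on the RAID 0 versus RAID 1 versus RAID 5 trade-off discussed here, a minimal Python sketch follows. The function name `raid_summary` and the 146 GB drive size are illustrative assumptions, not details of this server, and it only covers the simple textbook cases.

```python
def raid_summary(level, n_drives, drive_gb):
    """Return (usable_gb, drive_failures_survived) for a simple array.

    Textbook figures only; real controllers reserve some space for metadata.
    """
    if level == 0:
        if n_drives < 1:
            raise ValueError("RAID 0 needs at least 1 drive")
        # Striping: all capacity is usable, but any single failure loses the volume.
        return n_drives * drive_gb, 0
    if level == 1:
        if n_drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        # Mirroring: capacity of one drive, survives loss of all but one copy.
        return drive_gb, n_drives - 1
    if level == 5:
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of capacity goes to parity; survives one failure.
        return (n_drives - 1) * drive_gb, 1
    raise ValueError("only RAID 0, 1 and 5 are covered in this sketch")


# Hypothetical two-drive OS volume with 146 GB drives (size is an assumption).
for level in (0, 1):
    usable, tolerated = raid_summary(level, 2, 146)
    print(f"RAID {level}: {usable} GB usable, survives {tolerated} drive failure(s)")
```

The point for an OS volume is the second number: a two-drive RAID 0 tolerates no failures at all, while a two-drive RAID 1 keeps running on the surviving disk.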
Thanks for the comments.  Just realizing my spare drives will not work in this server.   At least the power pins are different.  

Question for AndyAlder: are you saying I could pull the 3 drives out of the older server and put them into another server that has the same controller?  I have a 2950 and a 2850 mothballed in racks right now.  I will confirm the controller.
I believe that the drives can be pulled out and inserted into a different server that is potentially still up and running.  You may have to reconfigure the RAID controller on the spare 2950 or 2850 to see the RAID 5 from the 3 drives.  I am not sure whether it makes any difference if you plug the drives into the RAID controller in the same order as in the troubled NF500.  But if you start running into trouble, and the RAID controller wants to reconfigure, reformat, or reinitialize the drives, be wary, as that could ruin your attempt at recovery.  You obviously don't want to format.  You just want to let the RAID controller know that you have a RAID 5 and have it configured for that.  I am sure someone will chime in shortly.  I haven't messed with the RAID controller on these units before.  And to be honest it has been a while since I have messed with RAID in general.  I am going from memory and I am getting old! :)
You will have to verify the RAID controller in this NF500 and the other two PowerEdges. The 2950/NF500 could have a PERC 5/i or PERC 6/i, but they could also have the non-RAID SAS 5/i. There were several generations of 2950; I think the NF500 was a 2950 III, but it's easier for you to boot into the BIOS than it is for me to check. If you have the serial numbers you can enter them all into the Dell support site to get a bill of materials.

When a new disk is introduced to a Dell PERC it reads the metadata (data stored on the disk defining its role in an array) and gives you two options: delete or import the "foreign" disks. If you import all 3 disks (or even just two of them, though that is not advisable) it merges the configuration so both arrays are seen. The boot volume may get set to the wrong one by default but that is changeable. If possible you should use the same slots, but drive roaming is allowed.

Can you do that live? Yes in theory but I wouldn't gamble.

First though you need to boot them up, see what controllers you have, and post it here. Older controllers don't necessarily import arrays from newer ones, since they can't cope if the metadata has features that did not exist when they were made.
You can do it the other way around as well: delete the faulty boot volume on the NF500, put a working disk with Windows installed from a PowerEdge/PERC into the NF500, boot up and import at the PERC BIOS error message, and it will probably boot up and the RAID 5 will show up as well. Just don't delete the foreign config or you'll be paying for data recovery.
Boot the system using a live CD or Windows 2008, make sure you have the PERC 6/i or H700 RAID controller driver on a USB stick, and make sure you also attach a USB drive to which you want to copy the data.

enter the repair option, windows CMD and...

If you've not done anything to the drive, then during bootup enter the RAID controller utility and look at the log to see which drive was kicked out and why; then you can try forcing the drive back online.  If it was merely kicked out because it did not respond in a timely fashion, forcing it back online could get the system to boot, unless, as you noted, the drive actually is dead.
Hi,

If you simply want to recover the data from your RAID 5 without taking any risk of losing data, I suggest you use the RAID recovery software from Stellar. The software recovers complete data from RAID 0, 5 and 6. You can download the demo version of the software from here
Everyone,

Sorry for the delay.  I finally got a replacement drive and I will be putting it in today and installing the OS on the drive.  I am planning to simply pull out the 3 drives in the array while I install the OS.  Once the server is back up I am going to install the Stellar software.  I will give an update later today.
You have to be careful with the outlined process; the RAID config is stored both on the disks and on the controller.

Personally, I would not be pulling the three RAID 5 drives. Have you had a chance to boot the system and copy/back up the data on this RAID 5 volume? (this is the first thing that should be done to preserve.......)

but would make sure that during the OS install, to carefully...

If you pull the drives (unseated, not removed completely, and only while the system is off!!!!!), then clear the config on the controller, add the new drive and install the OS, you would need to go into the RAID controller BIOS at the following bootup to import the RAID volume back.  This process is where the danger lies.

The alternative is leaving them in place, adding the new drive, getting into the controller utility (Ctrl-R or Ctrl-M, depending on your situation), then making sure that the single drive you added is the primary from which the system will boot.

..then cautiously ....
once the server is back up I am going to install the Steller software.  I will give an update later today.>>>>>>>>>>

Reginald Meyer, You can check this User Guide in case if you face any issue in using the software or just comment or DM me.


Thanks!!
You cannot use destriping software on a RAID controller, it needs a non-RAID HBA to read each individual disk. If you do not feel safe importing the array to a different controller then boot with a Linux CD as Arnold suggested.
AndyAlder and experts, I pulled the RAID 5 disks out, replaced the old disk, installed the OS, and we are up and running.  Do you not think I can simply push the RAID 5 disks back in and start the server?
That's what I would do, just make sure you import them if the server says they are foreign disks because if you erase the metadata you will need de-striping software.
NO, you should boot the system into the controller BIOS, then insert the RAID 5 drives and import the config from the drives. If you plug them in while the system is running, they will be seen as having a foreign config. it is more risky to .. the issue is that you have a drive on the controller that is a stand alone ......

Risks abound .......

The main thing you hope for is that importing the config from the RAID 5 will re-establish the RAID 5 (make sure the drives are placed in the same slots from which they were removed) and that it does not alter the ability of the system to boot from the other drive.....
Arnold, can you explain this: "boot the system into the controller bios"?  Is this an option as the server boots, or something I have to do while in Windows?
I think Arnold misread what you said and thinks you were going to insert them live. Putting them in with the server off and then powering it on will give you a prompt to import or erase the foreign disks.
AndyAlder, do you believe I will get the prompt or have to go into a menu?
Andy, I do not believe I misread. The drives can be inserted while the system is powered on; if they are inserted while the system is in the OS, as you noted, the drives will be seen as foreign because their config does not match the configuration on the controller.

I do not trust the prompt, as the options are to continue or go to setup. I have myself, in a rush, made the wrong choice a long time ago.
So in these cases, during the system's bootup, press Ctrl-R or Ctrl-M to get into the RAID controller config.

While in the controller BIOS/settings you can then do several things, including inserting the three RAID 5 drives and telling the controller to import/read in the config from the drives....
User generated image
This is the screen I get.  Something seems missing, but are you saying at this point push the drives back in?
Should there not be two controllers here?
It would be very strange to have two controllers.

It's your call on whether to put the disks in hot or cold as Albert and I disagree on that one. Either works, you can even put them in with the OS running and import them using OMSA.
Killing me smalls!  Ok I am going to bring down the server and put them in and then bring them back up.  Should I go into the config when I reboot?
If I lose this data I am going to be in trouble!
I appreciate the help guys.
Put the disks back in and booted up and the utility looked exactly the same so I let it boot into the OS.  It does not see the drive of course.  

So back in the config there is an import option under Controller 0. If I choose import I get the screen below.
User generated image
Check all the disks are there in the right hand pane of the screen behind the yes/no box (so press no for now), if they are all there then repeat and press yes.
So this is what I want to see right?  Hitting yes will not erase the data correct?

User generated image
All 3 are there so hitting "yes" will import the disks and preserve the data on them.
Got an error 1 out of 1 failed to import. Any idea?
If you can't import them then maybe it is time to use a software de-striper, but you need a non-RAID HBA for that.

If you have time to experiment then remove all the drives, boot to the bios RAID utility and erase the controller, then plug these 3 in on their own and the controller ought to automatically import them but the controller has to be the same as the one they were on before, preferably the same version of firmware or a later version is normally OK.
So pull out the drive I have the OS on and then pull the 3 drives I am hoping data exist on and do the above then push back in all 4 drives?  

This server had 5 drives on it.  Is there a chance that the 5 drives were one array and not what I thought originally with the first 2 RAID 1 (Mirror) .  The first two drives 0 and 1 were both blinking red when I discovered the server was down.  2,3,4 are all green.  If two drives failed on a RAID 5 I am dead in the water, right?

A software de-striper is Stellar?
Stellar or RAID Reconstructor, there are probably a few others too. All work fairly similarly, take a raw image of each drive, make educated guesses as to how the data is laid out on them (drive order and stripe element size) then check if there's valid data and if not they make another educated guess. But you have to image the drives first and to read them you need a non-RAID SAS host bus adapter.
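The "educated guess" loop described above can be sketched in a few lines of Python. This is only a toy illustration, not how Stellar or RAID Reconstructor are implemented: the function names (`stripe`, `destripe`, `guess_layout`), the parity rotation (last disk first, moving down one disk per row) and the validity check are all assumptions made for the example. Real tools also try several parity layouts and work on raw images read through a non-RAID HBA.

```python
from itertools import permutations


def stripe(data, n, chunk):
    """Build n toy RAID 5 member images from data (assumed parity rotation)."""
    row_bytes = chunk * (n - 1)
    if len(data) % row_bytes:
        raise ValueError("data length must be a whole number of stripe rows")
    disks = [bytearray() for _ in range(n)]
    for r in range(len(data) // row_bytes):
        row = data[r * row_bytes:(r + 1) * row_bytes]
        chunks = [row[i * chunk:(i + 1) * chunk] for i in range(n - 1)]
        parity = bytes(chunk)  # all zeros to start with
        for c in chunks:
            parity = bytes(a ^ b for a, b in zip(parity, c))
        parity_disk = (n - 1) - (r % n)  # assumed rotation, varies by controller
        it = iter(chunks)
        for d in range(n):
            disks[d] += parity if d == parity_disk else next(it)
    return [bytes(d) for d in disks]


def destripe(images, chunk):
    """Reassemble the data from member images given in slot order."""
    n = len(images)
    out = bytearray()
    for r in range(len(images[0]) // chunk):
        parity_disk = (n - 1) - (r % n)
        for d in range(n):
            if d != parity_disk:  # skip the parity chunk in each row
                out += images[d][r * chunk:(r + 1) * chunk]
    return bytes(out)


def guess_layout(images, chunk_sizes, looks_valid):
    """Try every drive order and chunk size until looks_valid(data) passes.

    Returns (order, chunk, data) or None. looks_valid is a caller-supplied
    check; a real tool would look for filesystem structures instead.
    """
    for chunk in chunk_sizes:
        for order in permutations(range(len(images))):
            data = destripe([images[i] for i in order], chunk)
            if looks_valid(data):
                return order, chunk, data
    return None
```

A weak validity check (for example, only looking at the first sector) can accept a wrong drive order, which is why the real tools show a file preview before asking you to commit to a layout.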
Before proceeding to answer your question:
when I get to a server that does not boot and has drives blinking red,
what I do is get into the RAID controller utility (Ctrl-R or Ctrl-M).

Once there, you can access the log to see what happened.
The drive could have been kicked out for a delayed response, in which case the last drive that was kicked out of the array, the one preventing the bootup, can be forced back online to get to the point just before the failure. Once that is done, the other drive can be replaced in the hope that the rebuild of the array will complete before the drive dies.
This will get the system back up and operational.
The other possibility is that the last rejected drive actually died, while the earlier one being kicked out went unnoticed.



Now back to your question. You do not touch the existing functional drive. It has the configuration that the controller currently has, which is a single volume on a single drive.
While in the controller setup, you need to import the RAID 5 configuration from the three drives.
The process is to insert at least two of the drives that made up the RAID 5 (inserting all three avoids the rebuild of the RAID 5 that would be triggered when the third drive is inserted later) and then tell the controller to import the config from the drives.
On completion, provided the existing single-drive volume and the RAID 5 you are adding do not use the same volume name, the controller will indicate that it now has two volumes.
Once it does that, make sure the original single-drive volume is still shown as the boot volume, and you should be able to boot the system and access the data on the RAID 5 volume, provided it was not restricted/encrypted in the prior setup.
Arnold, thank you for that information.  Currently I have a new drive in the server with OS installed on that drive (drive 0).  I had taken out the 3 drives I believed were part of the array to do the install.  


I am trying to figure out what to do next from your reply.  As you can see from above responses I can see the 3 drives I believe are part of the array in the configuration utility as Foreign but I cannot import them. I am worried the error is caused by the fact that the 3 drives are not the complete part of the array.

I can put all 5 drives back in, I imagine, if that is what you are suggesting.  

All I need is to recover the data off the RAID5 drives.  Unless I am totally wrong about the initial configuration.  I cannot risk losing the data.

I am now considering sending them to a recovery service.
All I need is to recover the data off the RAID5 drives.  Unless I am totally wrong about the initial configuration.  I cannot risk losing the data.

I am now considering sending them to a recovery service.
>>>>>>>>


As I suggested above, you can recover data off the RAID 5 drives with Stellar Phoenix Windows Data Recovery - Technician. In case you are worried about how much data the software will recover, the demo version provides a preview window that shows all the data the software can recover. If you are satisfied with the recovery results you can go for the full version.
The issue is that your RAID volume was not faulty; the RAID was optimal. The options provided were to boot off an OS disk or Linux disk from which the RAID volume is accessible and copy the data off.
The other was to add a drive to the system, boot with the OS installer, and make sure to install the OS on the new single-drive volume, but you went and pulled the existing drives, and ....

Go into the RAID controller while the three drives are not plugged in. What does the config reflect? Does it list the prior RAID 5 with failed components?

Technically, sending it to a recovery service is unnecessary as long as you have a record of which slots these drives were in.
UNDER NO CIRCUMSTANCES APPROVE IF PROMPTED TO INITIALIZE THE ARRAY: DECLINE DECLINE DECLINE.

One option: boot the system into the RAID controller utility and pull the OS drive (meaning unlatch it so it detaches from the controller). Make sure no drive is illuminated/powered/connected, then clear the RAID controller config.
Insert the RAID 5 volume drives and have the config imported from the drives. See if it now reflects the RAID 5 as optimal. If you have a destination to which you can copy the data, boot the system using a CD/DVD and copy the data off.

The other option/step:
then insert the OS drive and import its config (make sure the single-drive volume is the designated boot logical volume).
Now you have two logical volumes in optimal state.
Boot the system into your installed OS and you will now see the RAID 5 volume/data presented and accessible within the OS.

(if for some reason a process similar to yours was undertaken and now I am tasked with completing it, I would try to get the data off between the two steps using clonezilla or any other Linux OS/LiveCd before trying to get the system back up with the import of the config from the two sets of drives; better safe than sorry)
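For the "get the data off first" step, here is a rough Python sketch of copying a mounted volume to a backup disk and checking every file afterwards. It assumes the RAID 5 volume is already mounted read-only under a live environment; the function names and any paths you pass in are hypothetical, and dedicated tools such as Clonezilla (mentioned above) do this job at the disk level instead.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, block=1 << 20):
    """Hash a file in 1 MiB blocks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src_root, dst_root):
    """Copy every file under src_root into dst_root, then compare checksums.

    Returns the list of source files whose copy did not match.
    dst_root must not be located inside src_root.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    mismatches = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also keeps timestamps where possible
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(src)
    return mismatches
```

An empty list back from `copy_and_verify` means every copied file hashed the same as its source; anything listed should be copied again before the array is touched further.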
Arnold,  I took out all the drives and cleared the configuration.  So it now says "no configuration" and I then inserted back in just the 3 drives.  It clearly sees the 3 Seagate drives and labels them as 'Foreign' but when I go to F2 on the controller and then down to 'Foreign drives" and import I get an error immediately saying failed.  Any more thoughts?  Thanks for the help.
They have to be reinserted in the same slots that they were before you pulled them.
I.e. DriveA was in slot 2
DriveB was in Slot3
DriveC was in slot5

If you took them out without noting which is which, you have 3 drives and 3 slots.
I think you have the following options.
Label each drive DriveA, DriveB, DriveC, then try each arrangement:

Slotposition2   Slotposition3   Slotposition4
DriveA          DriveB          DriveC
DriveA          DriveC          DriveB
DriveB          DriveA          DriveC
DriveB          DriveC          DriveA
DriveC          DriveA          DriveB
DriveC          DriveB          DriveA
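The six arrangements above are just the permutations of three drives over three slots. If you want a printed checklist to tick off, a small Python sketch like this generates it; the drive labels and slot numbers are taken from the table, and the function name `slot_assignments` is made up for the example.

```python
from itertools import permutations


def slot_assignments(drives, slots):
    """Return every way of placing the drives into the slots, one dict per try."""
    return [dict(zip(slots, order)) for order in permutations(drives)]


if __name__ == "__main__":
    tries = slot_assignments(["DriveA", "DriveB", "DriveC"], [2, 3, 4])
    for number, layout in enumerate(tries, start=1):
        placed = ", ".join(f"slot {s}: {d}" for s, d in layout.items())
        print(f"Try {number}: {placed}")
```

With five drives instead of three the same call yields 120 arrangements, which is a good reason to label drives with their slot before pulling them.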

You should be in the logical volume view to see whether it can see the volume it now sees as foreign.

The config on the drives identifies that they are a RAID, and each drive identifies itself and where the others should be.


Once you have at least two in the correct positions the volume should be brought back.
Try pulling these three, clearing the config, reinserting the OS drive in slot 0, and repeating the import of the config from disk (since it is a single drive it should bring it in).
So when I put the OS disk back in it sees it just fine even after I had cleared the configuration.  Do you think I may have not actually cleared the configuration?
Also the 3 drives in what I hope was the RAID 5 are in the exact slots they came out of.
When the card is clear it automatically imports the config from the disks without any prompt normally as there is no conflict between card and disk metadata.

Maybe putting all 5 disks back would work (including the faulty OS ones). Assuming they are all the same type/capacity, we don't really know which ones were the OS and which the data, as drive roaming is allowed.

A trial version of a software de-striper would at least confirm which disks are which but you'll need an HBA for that.
AndyAlder and Arnold, thanks so much, I have solved the issue.  I put in all the drives except drive 0, which was blinking orange.  Drive 1 was blinking green to orange.  I noticed this when I put it back in by itself and realized the drive may not be totally gone. I put disks 1-4 in together, restarted, booted to a Linux boot disk, and I was able to see the drive!  It was one RAID 5 across all 5 disks, apparently.  I got super lucky that drive 1 is not yet fully "gone", so it recognized the RAID 5 drive.  I am in the midst of backing up my data to an external drive right now.  This experience has me working on my server documentation and my disaster recovery SOP.
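Why four out of five disks was enough: RAID 5 keeps one XOR parity block per stripe row, so any single missing block can be recomputed from the others, but two missing blocks cannot. A minimal Python sketch of that arithmetic, with made-up block contents and hypothetical function names:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block of a stripe row from all the survivors.

    surviving_blocks must hold every other block of the row, parity included.
    """
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    # One stripe row of a 5-disk RAID 5: four data blocks plus parity.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = xor_blocks(data)
    # Pretend the disk holding the first block is dead.
    survivors = data[1:] + [parity]
    print(rebuild_missing(survivors))
```

With a second disk gone there would be two unknowns in the same XOR equation, which is why a degraded array should be backed up before anything else is attempted.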
ASKER CERTIFIED SOLUTION
Member_2_231077
