Lorenzo Cricchio


Foreign Disks RAID 5 recovery?

Need some super duper help. Client's server seems to be going through hard drives at an alarming rate. HDs go down or go offline, and when it goes down critically, I have to recover. I replaced one of the 5 RAID HDs with a new drive set as a global spare; however, the system isn't absorbing it into the RAID array. Disk group 1 (RAID 1, 2 disks, 0:0 and 0:1) is doing fine. The remaining 5 are the issue.

When one of the hard drives in the RAID 5 failed, I recommended to the client that they get a new drive. We have a new one; however, 2 drives have now gone down. I forced the last one to fail back online and restarted the server, thinking the global spare would get absorbed into the array. For a good while, the hard drives looked like they were rebuilding and the one I forced back online stayed online... until it failed. What's funny is I could see the files and folders and move to a different folder, but when I tried to fix the Exchange DBs (there are 3), eseutil stated the files weren't there. I could see them, but couldn't copy, open or do anything with them.

Looking at the server, there were 2 amber lights out of the 5 drives that make up the RAID 5 array. Knowing I couldn't really do anything else, and that includes making a backup of anything there (I have a good backup minus the Exchange DBs), I had to restart. Now, upon restarting and going into the PERC controller, 2 drives are "foreign", 2 are OK, and 1 is the global spare (STILL!!!). Why didn't it get put into the RAID 5 array when the system was restarted as "degraded"???
I don't know; however, now I am stuck with 2 online, 2 missing (turned into foreign) and 1 global spare STILL for the array I need to get at.
What do I do? I read something about clearing the "foreign" configuration, but I was looking to salvage the array and the mailbox DBs prior to wiping the array and making another one. I truly was under the impression I swapped the drives correctly and didn't move any around to different slots, but who knows.

 Any thoughts?  Thanks in advance!


 Lorenzo
Mal Osborne

Bit difficult to answer without specifics, but maybe the following will help.

1. Power. Sometimes a marginal PSU will cause drives to be marked as bad sporadically.
2. Cooling. Check you have adequate airflow, and the drives are not overly hot.
3. Cabling. If this is SCSI, check cable type, length, and termination. Using the wrong type of  cable or incorrect termination can do this.
4. Drive firmware. Some drives have firmware issues that cause sporadic failure.
Lorenzo Cricchio

ASKER

I can appreciate that going forward, however, will that help in my current situation as described above?
There is an automatic rebuild option on your controller, and it's probably turned off. You need to reboot or access the lights-out console or remote administration card in order to issue a rebuild. However, a RAID-5 with 4-2=2 healthy disks is not merely degraded; it is in a state that needs a rebuild. A drive must be rebuilt from parity, and then a hot spare can be added to restore the parity stripe. 2 drives hosed means the OS will not see the array as a volume. What's worse, if your controller presented the degraded array to the OS as writeable, automatic disk repair features may have nuked 1/3 of the file table.

I'd recommend, if you have a backup, nuking the two drives and building a RAID-1 if they'll suffice for size, then restoring your data onto that. If not, I'd recommend using a product like R-Studio to take a bare-metal image of each drive and rebuild the array logically, avoiding any more mechanical trauma.
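
To make the parity point above concrete, here is a minimal Python sketch of how a single-parity array such as RAID 5 recovers one missing member per stripe by XOR, and why two missing members cannot be recovered. It is an illustration under assumptions, not how the PERC firmware or any recovery product works internally: the function names, the 4-byte blocks and the fixed parity position are all made up for the example, and real controllers rotate parity across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Return a copy of one stripe with its single missing block (None) filled in.

    Hypothetical helper: with one parity block per stripe, only ONE missing
    block can be recomputed. Two or more missing blocks raise ValueError.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one member missing: parity cannot rebuild this stripe")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks (data and parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Illustrative stripe on a 4-drive array: 3 data blocks plus 1 parity block.
data = [b"mail", b"box1", b"edb!"]
stripe = data + [xor_blocks(data)]

one_lost = [stripe[0], None, stripe[2], stripe[3]]
two_lost = [stripe[0], None, None, stripe[3]]
```

Calling `rebuild_stripe(one_lost)` gives back the original stripe, while `rebuild_stripe(two_lost)` raises, which is the 4-2=2 situation described above. Working on images of the drives rather than the drives themselves is what keeps this kind of logical rebuild from adding mechanical wear.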
Well... as I was researching, the recommendation was to configure or import the "foreign" disks, and I did just that. The array came back up as degraded. While it's up, I am copying the Exchange EDB files onto a USB drive so I have them just in case and can work on them too. Needless to say, the copy is taking a long time, as they total over 100 GB. One of the EDB files is undergoing an eseutil run and is at 25%; the others I think I will have to start when it's done. If I get the last EDB file over, I will use eseutil to bring back the other EDB file, as some of the log files were deleted when I forced one of the hard drives back online the first time around. I know the client has new HDs coming in; they just can't get here quick enough.
The OpenManage server program won't start, as I imagine it was also damaged by the disk check, so I am unsure whether a simple reinstall will take care of that.
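
Before running a long repair on each copied EDB, it can help to check the database header first. The sketch below is only an illustration in Python: it builds the `eseutil /mh` (header dump) command line and pulls the `State:` line out of the output. The function names are hypothetical, and the sample header text in the usage note is abbreviated rather than captured from this server.

```python
import re


def header_dump_command(edb_path):
    """Build the argument list for `eseutil /mh <database>` (database header dump)."""
    return ["eseutil", "/mh", edb_path]


def shutdown_state(header_text):
    """Return the value of the `State:` line from `eseutil /mh` output, or None."""
    match = re.search(r"^\s*State:\s*(.+?)\s*$", header_text, re.MULTILINE)
    return match.group(1) if match else None
```

On the server, the command list could be passed to `subprocess.run(..., capture_output=True, text=True)` and the resulting `stdout` handed to `shutdown_state`. A header containing a line such as `State: Dirty Shutdown` generally means the database still expects its log files to be replayed, which matters here because some logs were lost.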
You don't mention the operating system of the server. Nonetheless, you just need to re-install the OpenManage agent on the server. If it's VMware, it's going to force a host reboot.
Thanks for the input. The OS is Windows SBS 2008.

The system came up when I imported the config to get rid of the "foreign" disks. It was up and running through the night, and the spare was flashing as much as the other drives that were up in the system, so I am assuming it was brought into the RAID 5 array. Just this morning when I checked on it, the array had gone down with the same 2 drives, 03 and 04: 03 failing (flashing green/amber) and 04 flashing fast amber.
There SHOULD be 3 good drives in the array (it is made up of 4) and it SHOULD be running AOK. Before I bring it down, why isn't the spare being brought into the mix? Why is it still a spare? Why do fools fall in love? Of course, I am assuming it is still a spare before I bring it down.

As far as OMSA goes, does it rely on IIS? Something somewhere isn't running, because when I tried to run it, IE said the page cannot be displayed. I am assuming that AD is AOK also. I can log in, and can use DNS and DHCP. It's just the data drive, which went down, that I am having difficulty with.

 Any ideas?
I tried to run OMSA Live, but am unsure what to boot to or how to log in. It asks for a username and password, but I don't know which username or password it's looking for. I used the domain\admin user with the correct password and it fails. The backup isn't as complete as I want it, which is further exacerbated by the client having countless folders within folders and super long file names, which makes it a nightmare to copy files to an alternate location while the server is up. The paths to some of the files are so long that they kill the copy, and there are several hundred folders. The backup software won't run because of some files that were truncated by a chkdsk... oy.
I was able to get the spare into the array, which then failed the rebuild. Before trashing the array, I need a good copy. HDs are on the way, and I have an open slot for a new volume where I can copy the data, with some difficulty.
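
For the over-long paths that kill the copy, one approach is to list the offenders up front so they can be shortened or copied separately. The Python sketch below is a hedged illustration: the helper names are made up, the 260-character figure is the classic Windows `MAX_PATH` limit, and whether a given copy tool honours the extended-length prefix is something to verify on the server itself.

```python
import os

MAX_PATH = 260  # classic Windows path limit, counting the terminating NUL


def find_long_paths(root, limit=MAX_PATH):
    """Yield (length, path) for each file or folder under root at or over the limit."""
    for folder, subfolders, files in os.walk(root):
        for name in subfolders + files:
            full = os.path.join(folder, name)
            if len(full) >= limit:
                yield len(full), full


def extended_path(path):
    """Return the Windows extended-length form of an absolute path.

    Drive paths gain a leading backslash-backslash-question-mark-backslash
    prefix; UNC paths (\\\\server\\share) use the same prefix followed by UNC.
    """
    if path.startswith("\\\\?\\"):
        return path  # already in extended-length form
    if path.startswith("\\\\"):
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path
```

Running something like `sorted(find_long_paths("D:\\Data"), reverse=True)` (the drive letter is an assumption) would show the worst paths first, and `extended_path` shows the prefixed form that Windows file APIs accept for paths beyond the classic limit.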
Try root as the username and calvin as the password.
OK... UPDATE: I was able to get a replacement drive in there, force the last disk back online, and ultimately rebuild the array. WOO HOO!
Of course, in doing that, some files were damaged: not the OS, but some installed apps that were on the data drive, like the backup software.
I now have to recover the EDB... and that will be my next question.
One option for recovering data from a RAID array is Stellar Phoenix Windows Data Recovery-Technician. It can rebuild a virtual array even if you don't remember the RAID parameters. Once the RAID has been rebuilt, you can recover the data. A demo version is available from the official website: https://www.stellarinfo.com/windows-raid-recovery.php