Foreign Disks RAID 5 recovery?

Need some super duper help. My client's server seems to be going through hard drives at an alarming rate. Drives go down or offline, and when the array finally goes critical, I have to recover it. I replaced one of the 5 drives in the RAID 5 group and set the new one as a global spare; however, the system isn't absorbing it into the array. Disk group 1 (RAID 1, 2 disks, 0:0 and 0:1) is doing fine. The remaining 5 are the problem. When one of the drives in the RAID 5 failed, I recommended that the client get a new drive. We have a new one; however, 2 drives have now gone down. I forced the last one that failed back online and restarted the server, thinking the global spare would get absorbed into the array. For a good while, the drives looked like they were rebuilding, and the one I forced back online stayed online... until it failed.

What's funny is that I could see the files and folders and move between folders, but when I tried to fix the Exchange databases (there are 3), eseutil said the files weren't there. I could see them, but couldn't copy, open, or do anything with them. Looking at the server, there were 2 amber lights on the 5 drives that make up the RAID 5 array. Knowing I couldn't really do anything else, including making a backup of anything there (I have a good backup minus the Exchange DBs), I had to restart. Now, after restarting and going into the PERC controller, 2 drives are "foreign", 2 are OK, and 1 is the global spare (STILL!!!). Why didn't it get put into the RAID 5 array when the system restarted as "degraded"???
 I don't know; however, now I am stuck with 2 online, 2 missing (turned into foreign), and 1 global spare STILL for the array I need to get at.
  What do I do? I read something about clearing the "foreign" configuration, but I was looking to salvage the array and the mailbox DBs before wiping the array and making another one. I truly was under the impression I swapped the drives correctly and didn't move any around to different slots, but who knows.

 Any thoughts?  Thanks in advance!


 Lorenzo
Lorenzo Cricchio (President) asked:

Mal Osborne (Alpha Geek) commented:
It's a bit difficult to answer without specifics, but maybe the following will help.

1. Power. Sometimes a marginal PSU will cause drives to be marked as bad sporadically.
2. Cooling. Check that you have adequate airflow and that the drives are not overly hot.
3. Cabling. If this is SCSI, check cable type, length, and termination. Using the wrong type of cable or incorrect termination can do this.
4. Drive firmware. Some drives have firmware issues that cause sporadic failures.
Lorenzo Cricchio (President, author) commented:
I can appreciate that going forward; however, will that help in my current situation as described above?
Chris H (Infrastructure Manager) commented:
There is an option on your controller to rebuild automatically, and it's probably turned off. You need to reboot, or access the lights-out console or remote administration card, in order to issue a rebuild. However, you are not merely degraded: in a 4-disk RAID 5 with 2 failed, only 4-2=2 disks are healthy. You're at a rebuild state. The failed drive must be rebuilt from parity, and then a hot spare can be added to restore the parity stripe. With 2 drives hosed, the OS will not see the array as a volume. What's worse, if your controller presented the degraded array to the OS as writable, automatic disk repair features may have nuked 1/3 of the file table.

If you have a backup, I'd recommend nuking the two drives, building a RAID 1 if they'll suffice for the size, and restoring your data onto that. If not, I'd recommend using a product like R-Studio to take a bare-metal image of each drive and rebuild the array logically, avoiding any more mechanical trauma.
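Why two failed drives take the whole volume down follows from how single parity works. The sketch below is a simplified illustration, not how a PERC controller actually lays data out: it models one stripe with the parity block in a fixed position (real RAID 5 rotates parity across members) and uses made-up 4-byte blocks. The XOR of every member of a stripe is zero, so any one missing member can be recomputed from the survivors, but two missing members cannot.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return one stripe: the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Fill in a missing member (None) of a single-parity stripe.

    Because data XOR parity == 0, one missing member equals the XOR of the
    survivors. Two or more missing members are unrecoverable.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members missing; single parity recovers at most 1")
    result = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        result[missing[0]] = xor_blocks(survivors)
    return result


# Illustrative 4-disk RAID 5 stripe: 3 data blocks + 1 parity block
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

one_lost = list(stripe)
one_lost[1] = None
print(rebuild(one_lost) == stripe)

two_lost = list(stripe)
two_lost[1] = two_lost[2] = None
try:
    rebuild(two_lost)
except ValueError as err:
    print(err)
```

With one member missing the stripe comes back intact; with two missing there is simply not enough information left, which is why imaging the drives and reassembling the array logically (as with R-Studio) is about salvaging whatever stripes are still readable rather than "repairing" parity.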
Lorenzo Cricchio (President, author) commented:
Well... as I was researching, the recommendation was to configure or import the "foreign" disks, and I did just that. The array came back up as degraded. While it's up, I am copying the Exchange EDB files onto a USB drive so I have them just in case and can work on them there. Needless to say, the copy is taking a long time, as they total over 100 GB. One of the EDB files is undergoing an eseutil run and is at 25%; I think I will have to start the others when it's done. If I get the last EDB file over, I will use eseutil to bring back the other EDB file, as some of the log files were deleted when I forced one of the hard drives back online the first time around. I know the client has new HDs coming in; they just can't get here quickly enough.
The OpenManage Server Administrator program won't start. I imagine it was also damaged by the disk check, so I am unsure whether a simple reinstall will take care of that.
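Before trusting a 100 GB copy pulled off a degraded array, it can be worth confirming the copy is byte-for-byte what was read. A minimal sketch, assuming Python is available on a machine that can see both the source and the USB drive; the function names are hypothetical helpers, not part of any Exchange or Dell tooling.

```python
import hashlib
import os


def sha256_of(path, chunk_size=8 * 1024 * 1024):
    """Hash a file in 8 MiB chunks so a huge EDB never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches(src, dst):
    """True if dst has the same size and SHA-256 hash as src.

    The size comparison is cheap and catches truncated copies before
    either file is read in full.
    """
    if os.path.getsize(src) != os.path.getsize(dst):
        return False
    return sha256_of(src) == sha256_of(dst)
```

Usage would look like `copy_matches(r"D:\path\to\mailbox.edb", r"F:\backup\mailbox.edb")`, where both paths are placeholders. Two caveats: re-reading the source puts more load on an array that is already struggling, and a matching hash only proves the copy equals what was read, not that the database inside is consistent. That second question is still eseutil's job.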
rindi commented:
First of all, always make sure you only use disks that are certified for use with your server and RAID controller. It is best to get them directly from the server manufacturer for that server model, as they will have the proper firmware. Never use consumer-grade disks; use enterprise-class disks built for RAID.

RAID 5 is outdated and unreliable. It was something you would have used 10 or 20 years ago, when disks had low capacity and were expensive; RAID 5 let you maximize total capacity. Today's disks have very large capacities, so there is no need to squeeze the most capacity out of as few disks as possible with an unreliable array type. Today you can easily use RAID 1 or 6 and still have plenty of space. Besides that, rebuilding a RAID 5 array takes a lot of time, and during that time the other disks in the array are under high stress, so the risk of another disk failing during the rebuild is very high. This is less of a problem with RAID 6, which has better redundancy (it survives two disk failures). So when you set up an array, don't choose RAID 5.

RAID, of course, isn't an excuse not to back up your system. Backup is still the most important part of any IT setup: backup, backup, backup.
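The capacity-versus-redundancy trade-off rindi describes can be put in numbers. A small sketch, assuming equal-size disks, treating RAID 1 as an n-way mirror, and using 2000 GB disks purely as an example (not the client's actual drive size):

```python
def raid_summary(level, disks, disk_gb):
    """Return (usable_gb, failures_always_survived) for RAID 1, 5 or 6.

    RAID 1 is treated as an n-way mirror; RAID 5 spends one disk's worth
    of space on parity and RAID 6 spends two.
    """
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb, disks - 1
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb, 1
    if level == 6:
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_gb, 2
    raise ValueError(f"unsupported RAID level: {level}")


for level in (5, 6):
    usable, survives = raid_summary(level, 4, 2000)
    print(f"RAID {level}: {usable} GB usable, survives {survives} disk failure(s)")
```

With four disks, RAID 6 gives up one more disk's worth of space than RAID 5 and in exchange survives a second failure, which is exactly the situation this array ran into.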
Chris H (Infrastructure Manager) commented:
If that is the case, it sounds like you had a 3+1 RAID 5 and it had already failed over to one of the hot spares. More than likely, one of the 3 failed, and the hot spare failed during the rebuild; that's why you have a two-disk degraded array.
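One reason a second problem tends to surface during a RAID 5 rebuild is that the controller must read every surviving disk end to end. A rough back-of-the-envelope sketch, assuming unrecoverable read errors (UREs) occur independently at a drive's rated rate; the 1e-14 and 1e-15 per-bit figures are commonly quoted datasheet values used here only as illustrative assumptions, and real drives often do better than spec. This models read errors only, not a drive failing outright.

```python
import math


def p_read_error_during_rebuild(surviving_disks, disk_tb, ure_per_bit):
    """Probability of at least one URE while reading every surviving disk
    in full, assuming independent errors at the rated per-bit rate.

    P(no error) = (1 - p) ** bits; log1p/expm1 keep this accurate for tiny p.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # decimal TB -> bits
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical example: 3 surviving 2 TB disks in a 4-disk RAID 5
for rate in (1e-14, 1e-15):
    p = p_read_error_during_rebuild(3, 2, rate)
    print(f"URE rate {rate:g}/bit: {p:.1%} chance of hitting an error")
```

The point is the order of magnitude, not the exact figure: under these assumptions, a rebuild reads tens of terabits, so even small per-bit error rates add up.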
Chris H (Infrastructure Manager) commented:
You don't mention the operating system of the server. Nonetheless, you just need to reinstall the OpenManage agent on the server. If it's VMware, it's going to force a host reboot.
Lorenzo Cricchio (President, author) commented:
Thanks for the input. The OS is Windows SBS 2008.

The system came back up when I imported the configuration to get rid of the "foreign" disks. It was up and running through the night, and the spare was flashing as much as the other drives in the system, so I am assuming it was brought into the RAID 5 array. Just this morning when I checked on it, the array went down with the same 2 drives, 03 and 04: 03 failing (flashing green/amber) and 04 flashing fast amber.
 There SHOULD be 3 good drives in the array (it is made up of 4), and it SHOULD be running AOK. Before I bring it down: why isn't the spare being brought into the mix? Why is it still a spare? Why do fools fall in love? Of course, I am assuming it is still a spare before I bring it down.

As far as OMSA goes, does it rely on IIS? Somewhere, something isn't running, because when I try to open it, IE says the page cannot be displayed. I am assuming AD is AOK too: I can log in and can use DNS and DHCP. It's just the data drive that went down that I am having difficulty with.

 Any ideas?
Chris H (Infrastructure Manager) commented:
No, OMSA doesn't use IIS, thank God. It uses Java web or something: the executable service on Windows actually hosts the web page, and I think it uses SSL over port 1311.

You have bad logic written to your RAID, and it's not passing consistency checks. It's either a double fault or a puncture, but it's quite apparent. Stop trying to rebuild it... kill it and start over. I'd even push to replace the RAID card and all the drives, and make sure to update the server with the latest drivers and firmware from the hardware vendor's repository moving forward.
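If Chris H is right that the OMSA service hosts its own web page over SSL on port 1311, a quick way to tell "the service isn't listening" apart from "IE can't render the page" is a plain TCP connection test. A minimal sketch; the port default comes from Chris H's recollection and should be treated as an assumption, and the function name is hypothetical.

```python
import socket


def port_is_listening(host, port=1311, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something accepts connections on the port; it
    does not prove the OMSA web UI itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or the host name didn't resolve
        return False
```

Running `port_is_listening("localhost")` on the server itself: `False` suggests the OMSA service isn't running (consistent with needing the reinstall Chris H mentions); `True` points back toward the browser or certificate side.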

Lorenzo Cricchio (President, author) commented:
I tried to run OMSA Live, but I am unsure what to boot to or how to log in. It asks for a username and password, but I don't know which username or password it's looking for. I use the domain\admin user with the correct password and it fails. The backup isn't as complete as I want it, which is further exacerbated by the client, who has countless folders within folders and super long file names. That makes it a nightmare to copy files to an alternate location while the server is up: the paths to some of the files are so long they kill the copy, and they have several hundred folders. The backup software won't run because of some files that were truncated by a chkdsk. Oy.
  I was able to get the spare into the array, which then failed the rebuild. Before trashing the array, I need a good copy. HDs are on the way, and I have an open slot for a new volume where I can copy the data to, with some difficulty.
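The copies that die on long paths are most likely hitting the classic Win32 MAX_PATH limit of 260 characters, which includes the terminating null, so 259 visible characters. Since the destination root also counts toward the length, it helps to list offenders as they would appear at the destination before starting the copy. A minimal sketch, assuming Python runs on a machine that can read the data volume or its share; the function and the example paths are hypothetical, and some tools and directory APIs are stricter than 259.

```python
import os

MAX_VISIBLE = 259  # Win32 MAX_PATH (260) minus the terminating null


def find_long_paths(src_root, dest_root=None, limit=MAX_VISIBLE):
    """Yield (length, path) for every file or folder under src_root whose
    path, rebased onto dest_root (or src_root if None), exceeds limit."""
    base = src_root if dest_root is None else dest_root
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), src_root)
            target = os.path.join(base, rel)
            if len(target) > limit:
                yield len(target), target
```

For example, `sorted(find_long_paths(r"D:\Data", dest_root=r"F:\Copy\Data"), reverse=True)` would list the worst offenders first, giving a short list of folders to rename or shorten before the bulk copy.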
Chris H (Infrastructure Manager) commented:
Try root as the username and calvin as the password.
Lorenzo Cricchio (President, author) commented:
OK... UPDATE: I was able to get a replacement drive in there, force the last disk back online, and ultimately rebuild the array. WOO HOO!
  Of course, in doing that, some files were damaged: not the OS, but some apps installed on the data drive, like the backup software.
  I now have to recover the EDB... and that will be my next question.
Marshal Hubs (Email Consultant) commented:
The best possible way to recover data from a RAID array is using Stellar Phoenix Windows Data Recovery-Technician. It will rebuild a virtual array even if you don't remember the RAID parameters. Once the RAID has been rebuilt, you can easily recover all the data. Download the demo version from the official website here: https://www.stellarinfo.com/windows-raid-recovery.php