SBS 2008 won't fully boot after raid failure, get splash screen, brief black screen with arrow, then reboots

Dell PE T310
PERC S300 (Yes, I know this card sucks)

I came in 2 days ago and the server was down. I checked the drive status, which showed 2 of 3 drives in the RAID 5 array down (drives 0 and 1). I rebooted the server, which then showed that 1 drive was out. I went into the BIOS, which reported that drive 0 is "Ready", drive 1 is "Online", and drive 2 is "Spare" (which it should never have been). I allowed it to continue to boot without changing anything.

Windows boots!!  But, after about 5 minutes it crashes and reboots, at which point it performs a disk check and whatnot.  After it cycles through this twice, Windows now refuses to boot all the way.  Now, when it attempts to boot, I get the regular splash screen with the moving bar for about 3 minutes, then I get a black screen with a mouse pointer on it.  The mouse pointer is moveable with the mouse, but after about 1 minute the system crashes and begins a reboot.  There are no error messages or BSOD - just a reboot.

I called Dell and went through all of their troubleshooting steps, which didn't get very far before they said we need 2 new disks and a controller card, and that we will need to start from scratch and rebuild from a backup. I went through the motions to get the new hardware sent out, but I continue to attempt to save this thing.

Now, I am able to get into the Recovery Console - so this is a good start. The first thing I did was back up all the data on the server with Robocopy to our external HDD (our backups hadn't run in a couple of weeks). I was able to get all the data, and I also backed up the sysvol, Exchange database and other items that seem important.

Next, I attempted to use various tools to fix the installation. I still hadn't changed anything with the drives even though I have the new hardware on site. My goal at this point is to get Windows to boot and run a system image backup that I can restore to a fresh array. I first tried startrep, then chkdsk /f, then sfc in offline mode, and finally chkdsk /r. None of these got SBS to boot all the way, and I'm still stuck with the same boot characteristics.

So, I pulled out drive 0 and replaced it, since that drive was definitely dead. I brought up the BIOS and it showed the new disk as a NON-RAID disk with its own virtual drive on it. I deleted this virtual drive, which changed its status to READY, at which point I added it as a Global Hot Spare. Now the drive status had changed from above to: drive 0 - spare, 1 - online, 2 - spare. Obviously you can't have a RAID 5 set with 2 spares and 1 online drive. I'm guessing the PERC adapter freaked out when 2 drives failed, and set the remaining good drive to a spare. I unassigned drive 2 from being a spare, at which point I was pleasantly surprised to see it return to "online" status. Now it looked like I was getting somewhere. The status now is: drive 0 - spare, 1 - online, 2 - online. I'm very unhappy to see that there is no "rebuild" functionality in the BIOS, but from reading online, the array should start rebuilding once Windows has booted and the software RAID can do its thing. This is a downer because I still can't get into Windows (still the same characteristics). HOWEVER - when Windows is booting, I see it accessing all three drives, even the new one with nothing on it, which leads me to believe that the Windows boot is getting far enough for the array to automatically start rebuilding. But then it crashes and reboots.

Discouraged, I next boot into the recovery environment again.  I notice that even while in the recovery environment it appears that the array is rebuilding from the light activity on the front of the drives.  I decide to call it a night and let it do what it's doing....  This may be good!!  

I woke up and saw that the rebuild appeared to be done, but now drive 1 was blinking orange. I rebooted and stopped in the RAID BIOS screen to see what's up. It shows that 1 of my virtual drives (not C: though) is rebuilt to normal, but my C: drive is still degraded. I didn't change anything and allowed it to continue booting, but again - same characteristics. During boot, drive 1 turns green and, judging from the lights, most of the boot is occurring from that drive; midway through the boot that drive turns orange again. The system continues to have the same boot characteristics. I decided to pull disk 1 from the array and reboot. This time it booted partially and then BSOD'd.

I place disk 1 back in the array and am back to the same situation.

1.  Has anyone had this type of situation happen, and were you able to get out of it without rebuilding from scratch?

2.  I am able to back up data in the recovery window.  If I can't get out of this, what should be backed up to help with a successful rebuild of SBS?  So far I have backed up all user data, the Exchange database, the Exchange folder from C:\program files, sysvol, and inetpub.  

Thanks for reading this whole thing and I hope it makes sense.
Asked by: aliveone76

Seth Simmons (Sr. Systems Administrator) Commented:
i don't think you will be able to manage to recover successfully from this
having a double fault in a raid 5 is pretty much the end of it - been there, done that
had this happen several years ago on a PE 2650 that was a domain controller and wasn't pretty.  after getting the drives replaced, rebuilt the domain controller (i had others in the environment to replicate from)
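
To show why a double fault is fatal where a single fault is not, here is a minimal sketch of the XOR parity idea behind RAID 5. It assumes a hypothetical 3-drive stripe with two data blocks and one parity block; the block contents and the `xor_blocks` helper are made up for illustration and say nothing about how the PERC S300 actually lays out data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 3-drive RAID 5 set: two data blocks plus parity.
d0 = b"ACTIVE-D"
d1 = b"IRECTORY"
parity = xor_blocks([d0, d1])

# Single failure (d0 lost): XOR of the survivors gives the missing block back.
rebuilt_d0 = xor_blocks([d1, parity])
print(rebuilt_d0 == d0)

# Double failure (d0 and d1 lost): parity alone is ambiguous, because a
# different pair of data blocks produces exactly the same parity block.
mask = b"\x01" * 8
other_d0 = xor_blocks([d0, mask])
other_d1 = xor_blocks([d1, mask])
print(xor_blocks([other_d0, other_d1]) == parity)
```

Both checks print `True`: one missing block can always be recomputed, but with two missing blocks the surviving one cannot tell which data was originally there.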

in your case, copying data to an external drive is good for some data; though the flat exchange files probably won't do much of any good.  is this the only server or is there another domain controller?  if not, do you have a system state backup?  if you don't then you will have user/computer issues if you can't restore AD.

if you can do it, i'd get a 4th drive and build as raid 6.  i typically build bare metal domain controllers only with raid 1 but if this is the only domain controller then you would want the additional redundancy - else it becomes a single point of failure
Nick Rhode (IT Director) Commented:
I had to go out to a company that had the same issue.  The problem was that the 1 good drive they thought they had actually had a defect: it would sync with another disk to rebuild, fail, rebuild, fail, over and over.  Much like your situation.  They had no image backups and their tapes had been failing for 4 months.  I had to restore an image I had taken of the company 6 months earlier (I worked for a consulting company; they had decided they no longer needed our service and downsized to save money).  In other words, you would most likely have to start from scratch, dump your data back in, join the systems again and so forth.  I tried a few recovery programs with little success. We did manage to get the OS to boot, but the data was mostly corrupted because of the failure in the RAID.

From what you described above it might be more than just a simple disk failure; it seems like you have trouble with the RAID controller.  It's a bad spot to be in, and hopefully you have a reliable backup or some way to do a bare-metal restore of your existing backup (if you did an image).  I hope it all works out for you.
aliveone76 (Author) Commented:
I do have system state backups from June.  Would I be able to recover AD from those?  I can rebuild Exchange from the OSTs on the computers.  I'm not worried about Exchange.

I just want to save AD and not have to rebuild the environment.  However, I feel like there's a timestamp issue with recovering AD from that long ago...  no?
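
For anyone weighing the same question: the usual rule of thumb is that a system state backup should be younger than the forest's tombstone lifetime. A rough sketch of that check is below. It assumes a tombstone lifetime of 180 days, which is a common default but can be 60 days on forests that started life on older versions, so the real value should be read from the directory rather than trusted from this snippet. The function name and the dates are hypothetical, since no year is given here.

```python
from datetime import date, timedelta

# Assumption: 180 days is a common default; some older forests use 60.
ASSUMED_TOMBSTONE_DAYS = 180

def backup_is_within_tombstone(backup_date, restore_date,
                               tombstone_days=ASSUMED_TOMBSTONE_DAYS):
    """Return True if the backup is younger than the tombstone lifetime."""
    return restore_date - backup_date < timedelta(days=tombstone_days)

# Hypothetical dates for illustration only.
print(backup_is_within_tombstone(date(2012, 6, 15), date(2012, 9, 1)))   # 78 days old
print(backup_is_within_tombstone(date(2012, 6, 15), date(2013, 1, 15)))  # 214 days old
```

Under the 180-day assumption the first backup passes the check and the second does not; with a 60-day lifetime both would fail.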

Thanks for the input!!
aliveone76 (Author) Commented:
Also - this is SBS 2008, and my client only has the 1 server.
Nick Rhode (IT Director) Commented:
SBS is a lot pickier and more finicky than a standard Server 2008 install.  Even if you did get it up and running, you might be in for quite the ride.  This is just in theory, and at this point you are looking at time.  Do you gamble more time on a recovery attempt, or do you use that time to get the organization back up and running?

How many systems/users are in your environment?