RAID5 drive failure, rebuilt OK, now Windows won't even boot the Recovery Console, hangs examining disk

Got a video editing system here which suffered a RAID5 drive failure at the same time as a serious software crash (the Avid software hung and locked the machine up completely).
It's an Adaptec 2820SA RAID card with 6 SATA disks: a 5-disk array plus a hot spare.

I came to the machine in a powered-off state, powered it up and loaded the RAID BIOS, at which point it began rebuilding the array onto the hot spare. After several hours it completed and gave the RAID status as "optimal". I've double- and triple-checked the RAID config and it's correct: it used to consist of drives 0,1,2,3,4 as the array and 5 as the spare. It now consists of 0,1,5,3,4 as the array, with drive 2 failed.
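For anyone wondering why the controller could rebuild onto the spare at all, here is a minimal sketch of RAID 5 XOR parity reconstruction. It assumes a simplified stripe of four data blocks plus one parity block with made-up byte values; real controllers rotate parity across the disks and work on much larger chunks, so this illustrates the principle, not how the 2820SA firmware is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe, missing_index):
    """Recompute the block at missing_index from all surviving blocks (data + parity)."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# Hypothetical 5-disk stripe: four data blocks plus one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing the block on disk 2 and regenerating it for the spare.
rebuilt = rebuild_missing(stripe, 2)
print(rebuilt == stripe[2])
```

Parity regenerates whatever bytes were on the stripe, so a rebuild can finish as "optimal" even if the filesystem on top was already damaged: RAID health and filesystem integrity are separate questions.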

After the rebuild completed we rebooted. Windows begins to load, but after the XP logo appears all disk activity stops. The moving bar underneath the logo keeps moving, but nothing happens. It's been left in this state for about 15 minutes with no progress.

The same happens in Safe Mode: all (I think) of the drivers load, it gets as far as agp440.sys at least, and that's the end of all disk activity.

So I've tried booting the Recovery Console off a Windows CD, given it the RAID driver off a floppy (it won't even recognize the array without that), and it gets as far as "examining 1525579MB disk 0 at id 0 on bus 4 on aac..." and hangs there.

Where do I go from here? Format/reinstall is not simply a last resort, it is absolutely not an option. There must be a way to get Windows to recognize this disk; after all, RAID 5 is supposed to protect us against drive failure!
Statick001 asked.
JasperIAM commented:
Any support from your hardware manufacturer? That's where I would start.
Statick001 (author) commented:
Unfortunately no. They won't let me through the phone system without a TSID number, which has been temporarily misplaced.

Registering on their site with the hardware's serial number will generate a TSID; however, their site is terrible. I've been trying to register for over an hour and the pages keep timing out.

Their email support similarly demands this stupid number that they are unable to provide me. And the clock is ticking: this system requires 24/7 uptime.
Statick001 (author) commented:
Sorry, I didn't make that clear: it's Adaptec I'm trying to contact. Either their site is having serious difficulties today, or it's just overwhelmingly bad. Either way, I cannot get through to support without this number that only their broken website can give me.

Statick001 (author) commented:
OK, got the TSID.

I'll post back if they're no help.
JasperIAM commented:
Make sure you tell them a critical server is down due to their hardware malfunctioning. Gripe a little to get better support. In a situation like this, hammer away at them till it's resolved. Then spread the word about their responsiveness. It's amazing how many people read reviews before buying.
Statick001 (author) commented:
OK, I spoke to them and the guy was pretty helpful. I did tell him their support website was almost completely offline... but anyway.

As far as he was concerned, the fact that the RAID BIOS reports the array as "optimal" and that Windows does begin to load means the data is all intact and everything is as it should be in terms of the RAID. If Windows got corrupted by the crash that happened at the same time, that is a different matter and not one he could help me with (he did suggest phoning Microsoft tech support).

However, he did also suggest this:
The problem seems to be that Windows won't mount the array in its early stages of booting, and because I'm stuck behind very simple interfaces (either booting Windows in normal or Safe Mode, or booting the Recovery Console off a CD) there's little I can do to force the issue. Indeed, when I boot the Recovery Console and present it with the drivers on a floppy, it gives me 2 choices of driver, one called something like "Adaptec 2820SA SATA2 RAID driver" and the other something much simpler, like "Adaptec RAID driver". The first choice, being the exact match for our hardware, doesn't install, so we have to use the second, which apparently is a much more basic driver.

Anyway, his idea was to make a parallel installation of Windows on a spare HD, separate from the array, and install the full Adaptec drivers and management software there. From there I should be able to mount the array and repair whatever corruption is causing the problem.

So I'm now installing Windows onto an old 8GB HD I had lying around; hopefully this will let me mount the drive and repair whatever's gone wrong.

Statick001 (author) commented:
He did assure me that our 1.4TB of data would be intact once we'd managed to mount the drive, which is a relief, and that if for some reason it was not, that would definitely be a support issue and they would work to resolve it.
JasperIAM commented:
That's the kind of support we like. Keep us updated.
Statick001 (author) commented:
Latest:
The Adaptec card won't install on a clean Windows installation. I suspect the hardware may be faulty. The drivers install, but then the Windows "Found New Hardware" icon in the system tray never goes away. Also, attempting to open any further windows (My Computer, MMC, etc.) fails, although the mouse responds and the Start menu still opens. Shutting down also seems to be impossible (nothing actually happens), so I have to hit the manual reset button.
THEN, after doing this, the system won't boot; instead I have to boot to "Last Known Good", which means the drivers aren't installed.

The Adaptec site is still shockingly poor. In emails I was sending and receiving 2 years ago, when we last tried to contact their support, I mentioned to someone THEN that the Adaptec support site was continually timing out and of no use.

So meanwhile I'm waiting for the USA to wake up so I can call their support line. Here in the UK it's 2:20pm, so in 40 minutes the line should be open. If their hardware turns out to be faulty, I certainly won't be replacing it with another Adaptec card. If they can't be bothered to run their support site properly, they've lost me as a customer for life.

Statick001 (author) commented:
Also, if I try to boot the new parallel installation after failing to install the drivers for the Adaptec card, the symptoms are similar to when I try to boot the original RAID array: in Safe Mode, the list of drivers gets to the same place and then there's no further disk activity.

I'm now thinking ahead to the implications of replacing the RAID controller with a different make. I know the RAID settings: the stripe size, disk order, etc. If I enter that information into a new RAID controller, is it likely to be a smooth transition?
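Whether another controller can pick up the array depends on more than stripe size and disk order. The minimal sketch below maps a logical byte offset to a member disk under two common RAID 5 parity rotations, left-symmetric and left-asymmetric. Both layouts, the 64 KiB chunk size, and the helper names (`locate`, `MEMBER_PORTS`) are assumptions for illustration; the thread doesn't say which rotation the 2820SA uses. `MEMBER_PORTS` reflects the post-rebuild member order 0,1,5,3,4 described above, assuming the spare on port 5 now occupies array position 2.

```python
# Hypothetical mapping from RAID 5 array position to physical port,
# assuming the rebuilt member order 0,1,5,3,4 from the post.
MEMBER_PORTS = [0, 1, 5, 3, 4]


def locate(offset_bytes, n_disks, chunk_bytes, layout="left-symmetric"):
    """Map a logical byte offset to (array position, byte offset on that member).

    Assumes a plain RAID 5 with one parity chunk per stripe row and ignores
    any controller metadata area reserved on each disk.
    """
    data_disks = n_disks - 1
    chunk = offset_bytes // chunk_bytes      # logical chunk number
    within = offset_bytes % chunk_bytes      # offset inside that chunk
    row = chunk // data_disks                # stripe row the chunk lives in
    parity_disk = (n_disks - 1) - (row % n_disks)  # parity moves right-to-left
    slot = chunk % data_disks                # index among the row's data chunks
    if layout == "left-symmetric":
        # Data starts on the disk after parity and wraps around.
        disk = (parity_disk + 1 + slot) % n_disks
    elif layout == "left-asymmetric":
        # Data fills disks left to right, skipping the parity disk.
        disk = slot if slot < parity_disk else slot + 1
    else:
        raise ValueError(f"unknown layout: {layout}")
    return disk, row * chunk_bytes + within


if __name__ == "__main__":
    CHUNK = 64 * 1024  # hypothetical stripe (chunk) size
    for name in ("left-symmetric", "left-asymmetric"):
        disk, member_offset = locate(4 * CHUNK, 5, CHUNK, name)
        print(f"{name}: chunk 4 -> position {disk} "
              f"(port {MEMBER_PORTS[disk]}), offset {member_offset}")
```

With the same chunk size and member order, the two layouts put logical chunk 4 on different disks, so a controller using a different rotation would read scrambled data even with every visible setting matched. Hardware RAID controllers also generally store array metadata in a vendor-specific format, so a different make may not recognize the existing array at all.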

I'm terrified that if something goes wrong there I'll totally bork the RAID and make myself extremely unpopular.
Statick001 (author) commented:
Right, contacted UK support after the site woke up and gave me the number (40 minutes of refreshing).

While the support site has been utterly useless, the telephone support has been exemplary.

With the drives removed, we had no trouble installing the driver and management console for the controller card, so that ruled out any fault there. The next step was to assume one of the drives was causing the driver to hang, so we rebooted and tried with each drive in place individually, with no other drives in. Unfortunately it booted every time, so it didn't appear to be a fault with any of the drives either.

So I removed all the drives from the system, made a nice tower out of them on the bench with DVD cases in between them so the electronics wouldn't short-circuit, and reconnected the entire array directly to the controller card, bypassing the backplane.

And the system booted, Windows mounted the array, and all the data was as we expected!

So we have a failed backplane. And again, Adaptec's telephone support has been superb throughout this. Shame I cannot say the same for the website: specifically, any pages at ask.adaptec.com (which is where most of the important info is, such as contact numbers, the knowledge base, etc.) just time out continually, which really added to the frustration of this experience.
