Ehab Salem asked:

RAID Controller corrupts data after reboot

A Windows Server 2003 SP1 machine (a mail server running Exchange 2003 SP2) on an IBM x236 with a ServeRAID controller and a RAID 5 configuration (4 drives + 1 hot spare) is behaving oddly.
This server has been running for almost three years now without a problem.

Suddenly, it rebooted, one of the drives was marked defunct, and the hot spare went online. After the reboot it ran chkdsk, saying drive G had errors. Once Windows had booted, I found drive G full of chk files (.00) and recovered folders; among these files was the Exchange edb (the stm was gone). I formatted the partition, restored the information store from a backup, and everything went fine.
I reinserted the defunct HD (as it looked OK after testing) and it started to rebuild; then, after about 10 minutes, the server suddenly rebooted. Then the odd stuff started.
The server was back in the state it was in after the first reboot: the same chk.00 files, a corrupted edb, no stm.
I removed the partition completely, reformatted, and restarted the server; again the chk files appeared (this time without the edb)!
Once more I removed the partition, reformatted, and restarted; this time I found it blank.
I restored Exchange from backup again, and everything worked fine.
I rebooted using the ServeRAID Support CD, installed a new HD, brought it online, and it rebuilt successfully. I started the server and it looked OK.
To be sure, I restarted the server again. Without running any chkdsk, it moved all the edb and stm files to a recovered folder. When I moved them back to their original location, the database did not mount. I restored from backup and it is running again, but I don't know what will happen if the server reboots!
Please help?
HiS_SlyneSS

Is your boot disk on the same RAID card? It sounds like the disk you reinserted after pulling it out is faulty. Try rebooting with it removed.

Sly
Ehab Salem

ASKER

I called an IBM authorized service engineer and he said:
1- Never re-use a drive that was marked bad by the controller even if it passes all tests.
2- The problem is likely caused by an HD failure that corrupted the stripe of the array.
3- Suggested: reboot, and copy the configuration from the controller to the hard drives.
If this does not work, the array must be deleted and rebuilt from scratch.
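
To make point 2 more concrete, here is a minimal sketch of RAID 5 style XOR parity in Python. It is only an illustration of the general principle, not of how the ServeRAID firmware works internally; the block contents and the helper names (`xor_blocks`, `rebuild`) are made up for the example. It shows why a rebuild that reads a member holding stale data produces wrong data without raising any error.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks):
    """Reconstruct the one missing block of a stripe from all the others."""
    return xor_blocks(surviving_blocks)

# One stripe across 4 drives: 3 data blocks + 1 parity block
# (hypothetical contents, chosen only for illustration).
data = [b"MAIL", b"BOX1", b"EDB!"]
parity = xor_blocks(data)

# Healthy case: the drive holding data[1] is lost, the rest are consistent.
recovered = rebuild([data[0], data[2], parity])
assert recovered == b"BOX1"

# Stale case: one surviving member holds old data that no longer
# matches the parity, so the rebuilt block is silently wrong.
stale = b"OLD?"
bad = rebuild([stale, data[2], parity])
assert bad != b"BOX1"
```

If something like this happened on the array, it would fit with the advice in point 1: a drive that passes its tests may still hold blocks that no longer agree with the rest of the stripe.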

I cannot do this now; it is only possible at the weekend.
But if anyone comes up with a solution, it is welcome, as rebuilding the array means restoring a mail server completely from backup, and that is a big headache.
ASKER CERTIFIED SOLUTION
Ehab Salem