Replay of Transaction Logs stops and crashes MAD.EXE
Posted on 2006-07-15
We are in the process of moving from Exchange 5.5 to Exchange 2003. To make a long story short, the 71GB priv.edb IS became unusable due to a RAID 5 error. Our backup is quite old, about 5 weeks, but we do have all the transaction logs between that backup and the crash. There are about 2,900 of them as it's an active server. I added new storage to the server to hold all these transaction logs for recovery, as the pre-crash location didn't have enough space. I then made a new partition to hold the restored IS and mounted it with the same drive letter as the failed RAID array, and restored the backup using Backup Exec 9.1. The log files start to replay, but about 75% of the way through I get a Dr. Watson on MAD.EXE. I then built a recovery server per the MS instructions for restoring a single mailbox and did the same thing, thinking that the original server, being old and underpowered, might not have had the resources to complete the whole recovery with so many log files. The recovery process crashed MAD.EXE in the exact same spot.
I restored to the original server again but this time moved out all logs just prior to the one that was replaying during the crash. This time it crashed MAD.EXE but at a different logfile. I did use the eseutil utility to look at the header of what I thought was the corrupt logfile (from the original restore) and it read it in fine.
I then restored to the recovery server again but this time left the log file directory empty, so that Backup Exec would restore the full backup of the database to its consistent state as of that backup and not replay any other log files. This worked fine and the IS opened with no problem. However, it's missing 5 weeks of mail, of course.
Has any expert run into issues like this before, where a good backup cannot be brought current due to issues with the transaction logs? All the logs are there, as I ran MS's script from the DR white paper to check the sequence. Is it possible that there's some sort of corruption in the IS from the backup that is preventing full replay of all the logs? Would having the transaction log directory on a different path from the one used at backup time have any effect (I adjusted the path in RegEdit before the restore)? Is there any utility I can run on an unmounted IS PRIOR to beginning the replay of transaction logs that might assess (and fix?) corruption in the restored IS?
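For anyone who wants to double-check the log sequence independently, here is a rough sketch of that kind of check. This is not the Microsoft script from the white paper, just a minimal Python illustration. It assumes the logs follow the usual edbXXXXX.log naming with a five-digit hexadecimal generation number; the function name find_log_gaps is made up for this example.

```python
import re

# Assumption: log files are named edbXXXXX.log, where XXXXX is a
# five-digit hexadecimal generation number. The current edb.log has
# no number and is deliberately ignored by this pattern.
LOG_NAME = re.compile(r"^edb([0-9a-f]{5})\.log$", re.IGNORECASE)

def find_log_gaps(filenames):
    """Return the hex generation numbers missing between the lowest
    and highest numbered log file in the given list of names."""
    generations = sorted(
        int(match.group(1), 16)
        for match in (LOG_NAME.match(name) for name in filenames)
        if match
    )
    if not generations:
        return []
    present = set(generations)
    return [
        f"{gen:05X}"
        for gen in range(generations[0], generations[-1] + 1)
        if gen not in present
    ]
```

Calling find_log_gaps(os.listdir(log_dir)) on the log directory (after import os) should return an empty list when the sequence is unbroken. It only checks file names, so it says nothing about whether the contents of any individual log are damaged.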
There is one error in the event viewer after the replay starts where it says something about an I/O size discrepancy in the IS. This comes up during the transaction log replay but it continues on fine for quite a while after that up until the crash of MAD.EXE, which occurs after perhaps 2200 log files have been replayed (the I/O size error shows up after about the 20th log file has replayed).
Tried to call Microsoft but support is discontinued for 5.5. Ideally, I need to get the production 5.5 server running so I can move the final 20 or so mailboxes off it and onto the Exchange 2003 server, even if it means that those moved mailboxes are missing data. Once those mailboxes are moved I then hope to be able to use Exmerge to get the missing data back and merged into the mailboxes on Exch 2003.
Please help me if you can. I will make it worth your while points-wise and will send an Amazon GC to whoever can help me through this. This is an urgent issue.