Exchange 2007 Cluster - Outage Causing "Split Brain" - Lost Emails
Posted on 2012-03-28
We have a two-node Exchange 2007 cluster (CCR) that suffered an uncontrolled outage on one of its nodes today at 1 pm. We have five Information Stores, and four of them came up on the passive node with no problem. The fifth appeared to work at first, but we then realized that it was missing the last week's emails.
On Node B (originally passive, now active), that Information Store showed as "Initializing", and its log files had not been replicated from Node A for over a week.
We dismounted the broken Information Store, suspended replication for it, and moved it back to Node A (using the -IgnoreDismounted switch). That worked: all emails from before the outage were there again, but none from between 1 pm and 5 pm, when we took the server down again for maintenance. We then realized that Node A was creating new log files that conflicted with the log files on Node B. At this point I think the two copies of the Information Store have diverged into a "split brain" state. We took a backup and a snapshot of the .edb and log files on both Node A and Node B before we started troubleshooting, so we can roll back.
What are our options? Is the best approach to get Node A working with data up to 1 pm, re-seed Node B, restore the backup of Node B's .edb and log files into a DR environment, and then export the mail received since 1 pm to PST files to give to the users? Are there better options? Could we use a Recovery Storage Group?