EXCHANGE 2010 - UNABLE TO MIGRATE DATABASE AVAILABILITY GROUP TO ANOTHER SERVER AFTER REBOOT

One of the advantages of Exchange 2010 is that the active mailbox databases in a Database Availability Group (DAG) can be moved between the servers running the Mailbox role without user interruption or downtime. This is commonly done to perform maintenance, such as installing the monthly Microsoft security patches. A typical environment configured for high availability, whether physical or virtual, is similar to the configuration below:

Two Windows Server 2008 R2 servers running Exchange 2010 SP2 with the CAS/Hub Transport roles and Windows NLB configured.

Two Windows Server 2008 R2 servers running the Exchange 2010 SP2 Mailbox role.

During these moves and reboots in July and August 2013, the databases would not move back to one of the mailbox servers. After a few days of working with Microsoft in July 2013, Microsoft Support recommended restarting the Microsoft Exchange Transport service on each of the CAS servers. This appeared to correct the problem, but no root cause was provided. Microsoft Support said it was apparently a “fluke” and that if it happened again, the problem would be investigated further.

This month, August 2013, while a colleague was installing Microsoft security patches, the problem recurred. When trying to move the databases back to the mailbox server, the following error was displayed and logged in Event Viewer under “MSExchange Management”:

Another symptom of this problem appears in the Exchange Management Console. On the CAS server, under Toolbox > Queue Viewer, the queue to the mailbox server in question is not connected and remains in a persistent “Retry” status.

According to Microsoft Support, connections on TCP port 4757 were being reset: one mailbox server was sending a “SYN” packet to the other mailbox server, but the server in question was answering with a TCP reset (RST) instead of completing the handshake.
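
To an application, a SYN answered by a reset surfaces as “connection refused”. The Python sketch below is an illustration only, not part of Microsoft's troubleshooting: it attempts a TCP connection and classifies the result, so the behavior can be checked from the other mailbox server. The function name check_tcp_port is hypothetical, and a packet capture remains the authoritative way to see the SYN/RST exchange itself.

```python
import socket


def check_tcp_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake with host:port and classify the outcome.

    "open"    - the SYN was answered with SYN/ACK and the connection completed
    "refused" - the SYN was answered with a reset (RST); the OS reports this
                as ConnectionRefusedError
    "timeout" - nothing came back within `timeout` seconds (e.g. a firewall
                silently dropped the SYN)
    Other failures (DNS errors, unreachable networks) are raised unchanged.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
```

For example, check_tcp_port("MBX2", 4757), where MBX2 is a hypothetical name for the server in question, would be expected to return "refused" while the resets described above are occurring.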

As part of the troubleshooting process, Microsoft requested the output of the command “netsh int tcp show global”, run from the command prompt on each server. This command displays the global settings that control how Windows Server 2008 handles TCP traffic. Of particular interest were the “Chimney Offload State” and “Receive-Side Scaling State” settings. Comparing the output from each server showed that these settings were not consistent.
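
Comparing this output by eye across several servers is error-prone. As a minimal sketch, assuming each server's output has been saved to a text file (for example with netsh int tcp show global > %COMPUTERNAME%.txt), the Python below parses the “Label : value” lines and reports every setting whose value differs between servers. The function names are hypothetical, and the sample output is illustrative rather than captured from the servers in this article.

```python
from typing import Dict


def parse_tcp_global(output: str) -> Dict[str, str]:
    """Parse the 'Label : value' lines of `netsh int tcp show global` output."""
    settings = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # headings and the dashed separator contain no colon
            settings[label.strip()] = value.strip()
    return settings


def find_inconsistencies(outputs: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Map each setting that differs between servers to {server: value}.

    `outputs` maps a server name to that server's raw netsh output.
    A setting absent from a server's output is reported as "<missing>".
    """
    parsed = {server: parse_tcp_global(text) for server, text in outputs.items()}
    labels = sorted({label for settings in parsed.values() for label in settings})
    differences = {}
    for label in labels:
        values = {server: s.get(label, "<missing>") for server, s in parsed.items()}
        if len(set(values.values())) > 1:
            differences[label] = values
    return differences


# Illustrative output for two hypothetical servers (not real captures).
MBX1 = """TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
"""
MBX2 = """TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
"""

print(find_inconsistencies({"MBX1": MBX1, "MBX2": MBX2}))
# {'Chimney Offload State': {'MBX1': 'automatic', 'MBX2': 'disabled'}}
```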

Based on this output, Microsoft Support suggested running the following commands on ALL Exchange servers:

netsh int tcp set global chimney=disabled
netsh int tcp set global autotuninglevel=disabled
netsh int tcp set global rss=disabled
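
After the commands have been run, the same “netsh int tcp show global” output can be used to confirm that each server actually picked up the change. This Python sketch assumes the Windows Server 2008 R2 labels “Chimney Offload State”, “Receive-Side Scaling State” and “Receive Window Auto-Tuning Level”; the function name unapplied_settings is hypothetical.

```python
from typing import Dict

# Expected state after the three netsh commands. The labels are assumed to
# match the Windows Server 2008 R2 wording of `netsh int tcp show global`.
EXPECTED = {
    "Chimney Offload State": "disabled",
    "Receive-Side Scaling State": "disabled",
    "Receive Window Auto-Tuning Level": "disabled",
}


def unapplied_settings(output: str) -> Dict[str, str]:
    """Return {label: current value} for each expected setting not yet disabled."""
    current = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # only 'Label : value' lines carry settings
            current[label.strip()] = value.strip().lower()
    return {
        label: current.get(label, "<missing>")
        for label, wanted in EXPECTED.items()
        if current.get(label) != wanted
    }
```

An empty result means all three settings report disabled; any remaining entry names the setting whose command still needs to be run on that server.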

In making this recommendation, Microsoft Support referenced Knowledge Base article 951037 (http://support.microsoft.com/kb/951037), which describes how these settings affect the way Windows Server 2008 handles TCP Chimney Offload with other programs.

After the commands were run on each server, the databases were moved between the mailbox servers with no issues. Although a reboot was not required after running the commands, Outlook clients experienced erratic connection problems several hours later, and rebooting each of the servers corrected this problem. Therefore, a reboot of each server is recommended after running these commands.
