
Couldn't switch the mailbox into Sync Source mode

Hello

We are migrating to O365 with a hybrid setup.
The hybrid servers are Exchange Server 2016 Cumulative Update 8 (CU8).
The source servers are mixed: mostly Exchange 2016, plus some Exchange 2010.
In an average batch of 100 users, approximately 10 fail with the following error:

Error: Couldn't switch the mailbox into Sync Source mode.
This could be because of one of the following reasons:
Another administrator is currently moving the mailbox.
The mailbox is locked.
The Microsoft Exchange Mailbox Replication service (MRS) doesn't have the correct permissions.
Network errors are preventing MRS from cleanly closing its session with the Mailbox server. If this is the case, MRS may continue to encounter this error for up to 2 hours - this duration is controlled by the TCP KeepAlive settings on the Mailbox server.
Wait for the mailbox to be released before attempting to move this mailbox again.

Upon deeper investigation, this seems to be related to WCF (Windows Communication Foundation):

FailureType : CommunicationWithRemoteServiceFailedTransientException
FailureHash : e76c
FailureCode : -2146233088
MapiLowLevelError : 0
FailureSide : Source
FailureSideInt : 1
ExceptionTypes : {MRS, MRSTransient, Transient, Exchange}
ExceptionTypesInt : {10, 11, 2, 1}
WorkItem : IncrementalSync
Message : Communication with remote service 'https://hybrid.ommited/EWS/mrsproxy.svc hybryd server FQDN (15.1.1415.3 ServerCaps:, ProxyCaps:, MailboxCaps:, legacyCaps:0FFD6FFFBF5FFFFFCB07FFFF)' has
failed. --> The call to 'https://hybrid.ommited/EWS/mrsproxy.svc hybryd server FQDN (15.1.1415.3 ServerCaps:, ProxyCaps:, MailboxCaps:, legacyCaps:0FFD6FFFBF5FFFFFCB07FFFF)' timed out.
Error details: The request channel timed out while waiting for a reply after 00:00:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time
allotted to this operation may have been a portion of a longer timeout. --> The request operation did not complete within the allotted timeout of 00:00:50. The time allotted to this operation may have been a
portion of a longer timeout. --> The request channel timed out while waiting for a reply after 00:00:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the
Binding. The time allotted to this operation may have been a portion of a longer timeout. --> The request operation did not complete within the allotted timeout of 00:00:50. The time allotted to this
operation may have been a portion of a longer timeout.

Where do I find the MRSProxy config file to check the default bindings?
Is anyone else facing this problem?

The workaround is to perform a database failover: when the move is initiated, the mailbox state flag is set to 'InTransit' in the Information Store's memory, and the failover clears it.
https://blogs.msdn.microsoft.com/brad_hughes/2016/12/16/source-mailbox-already-being-moved-errors-while-moving-mailboxes/

Note that for Exchange 2010 the configuration is set to maintain session affinity; in other words, once a connection to a 2010 CAS is established, it keeps communicating with that server.

Looking at the move request (Get-MoveRequestStatistics with the -IncludeReport parameter) shows:

Transient error CommunicationWithRemoteServiceFailedTransientException has occurred. The system will retry
The job has been paused temporarily because the mailbox is locked. The job will attempt to continue again after
(This repeats in a loop until a permanent exception is thrown.)

Saif Shaikh

To fix the issue, create a registry value as described in the article below:
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-2000-server/cc957549(v=technet.10)

1) On both Exchange servers, create a DWORD value named KeepAliveTime with the value 1800000 under the following path:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
2) Reboot both servers.
3) Resume the failed batches.
4) If other batches fail with the same error again, increase the value further.
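Step 1 can be scripted instead of edited by hand. Below is a minimal Python sketch using the standard-library winreg module, which exists only on Windows. It assumes it is run from an elevated prompt on each on-premises Exchange server; the helper names (validate_keepalive_ms, set_keepalive_ms, get_keepalive_ms) are hypothetical, and the reboot and batch resume in steps 2 and 3 still have to be done separately.

```python
import sys

try:
    import winreg  # standard library, available only on Windows
except ImportError:
    winreg = None

# Registry path and value name from step 1 (under HKEY_LOCAL_MACHINE).
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "KeepAliveTime"

# A REG_DWORD holds an unsigned 32-bit integer.
DWORD_MAX = 0xFFFFFFFF


def validate_keepalive_ms(ms):
    """Check that a KeepAliveTime value (milliseconds) fits in a DWORD."""
    if not isinstance(ms, int) or isinstance(ms, bool):
        raise TypeError("KeepAliveTime must be an integer number of milliseconds")
    if not 0 < ms <= DWORD_MAX:
        raise ValueError(f"KeepAliveTime must be between 1 and {DWORD_MAX} ms")
    return ms


def set_keepalive_ms(ms):
    """Create or overwrite the KeepAliveTime DWORD (requires admin rights)."""
    validate_keepalive_ms(ms)
    if winreg is None:
        raise OSError("winreg is only available on Windows")
    # CreateKeyEx opens the key, creating it if it does not exist yet.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, ms)


def get_keepalive_ms():
    """Return the configured KeepAliveTime, or None if the value is absent."""
    if winreg is None:
        raise OSError("winreg is only available on Windows")
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__" and sys.platform == "win32":
    set_keepalive_ms(1800000)  # value from step 1
    print(get_keepalive_ms())
```

Validating before writing avoids storing a value that does not fit in a REG_DWORD; reading the value back confirms the change before the reboot.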

Saif Shaikh

This value has to be created on the on-premises mailbox servers.

belyo belev

ASKER

I forgot to mention these settings:

On the source servers (HUB/CAS/MBX), TCP KeepAliveTime is 300000 ms (5 minutes).
On the hybrid servers, it is 1800000 ms (30 minutes).
The default value is 2 hours.
So far, I don't believe the hybrid servers' TCP KeepAliveTime matters.
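KeepAliveTime is expressed in milliseconds, which makes the values easy to misread. The small Python sketch below (hypothetical helpers, not part of any Exchange tooling) converts the values quoted above to minutes; the 2-hour default corresponds to 7200000 ms, which also matches the "up to 2 hours" window in the MRS error message.

```python
MS_PER_MINUTE = 60_000


def ms_to_minutes(ms):
    """Convert a TCP KeepAliveTime value from milliseconds to minutes."""
    return ms / MS_PER_MINUTE


def minutes_to_ms(minutes):
    """Convert minutes to the millisecond value stored in the registry."""
    return int(minutes * MS_PER_MINUTE)


# Values mentioned in this thread.
SETTINGS_MS = {
    "source servers (HUB/CAS/MBX)": 300_000,
    "hybrid servers": 1_800_000,
    "default (2 hours)": 7_200_000,
}

if __name__ == "__main__":
    for name, ms in SETTINGS_MS.items():
        # e.g. "hybrid servers: 1800000 ms = 30 min"
        print(f"{name}: {ms} ms = {ms_to_minutes(ms):g} min")
```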

Saif Shaikh

Yes, it matters: this is a known issue, and the hybrid servers' TCP KeepAliveTime does matter.

This is the only resolution to the problem: add the required registry value, and every time batches go into a failed state, increase the value, reboot (a reboot is required after changing it), and then resume the batches.
