Solved

Exchange SCR Replication ReplayQueueLength very high

Posted on 2013-01-09
Last Modified: 2013-01-18
I am running Exchange 2007 SP3 RU9 in a CCR active-passive cluster. I recently set up another server for SCR, using the following cmdlet: Enable-StorageGroupCopy -id "StorageGroup" -StandByMachine "SCR-Server" -ReplayLagTime 0.0:0:0

I have 20 storage groups and had successfully seeded 16 of them. All 16 had been replicating without a problem, CopyQueueLength was 0 and ReplayQueueLength was 50. A few days ago I noticed the ReplayQueueLength for almost all of them was in the thousands. The logs are being copied, but the SCR server does not seem to be truncating them fast enough.

We had a flurry of activity on our Exchange server last week: a number of mailbox moves, plus a large amount of public folder data was moved into shared mailboxes. Could this have caused the high ReplayQueueLength? Do I simply need to wait for the SCR server to "catch up"?

I really don't want to reseed because that process takes up to 12 hours per mailstore. The SCR server is in the Amazon cloud.

I attached a screenshot of the current status.

Any help is greatly appreciated. Thank you.

SCR status
Question by:cyberleo2000
Expert Comment

by:imkottees
ID: 38758820
Hi,

Yes, it is most likely caused by the heavy activity or a slow network.

I would suggest waiting and letting it complete, but keep an eye on it.

FYI: I used to see copy queue lengths of around 50,000 or more in my environment.
Expert Comment

by:Amit
ID: 38759213
Expert Comment

by:Manpreet SIngh Khatra
ID: 38759358
What is -TruncationLagTime set to?

http://technet.microsoft.com/en-us/library/bb676502(EXCHG.80).aspx
-TruncationLagTime   This parameter is used to specify the amount of time that the Microsoft Exchange Replication service should wait before truncating log files that have been copied to the SCR target computer and replayed into the copy of the database. The time period begins after the log has been successfully replayed into the copy of the database. The format for this parameter is (Days.Hours:Minutes:Seconds). The maximum allowable setting for this value is 7 days. The minimum allowable setting is 0 seconds, although setting this value to 0 seconds effectively eliminates any delay in log truncation activity. After the value for this parameter is set, it cannot be changed without disabling and then enabling SCR.

- Rancy
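The Days.Hours:Minutes:Seconds format and the 0-second to 7-day range quoted above can be checked before running Enable-StorageGroupCopy. The Python sketch below is a hypothetical helper (parse_lag_time and check_truncation_lag are illustrative names, not part of Exchange). It assumes the days part may be omitted and that hours/minutes/seconds follow normal clock ranges; it only models the limits as quoted, not Exchange's own parser.

```python
import re
from datetime import timedelta

# Hypothetical helper: accepts "Days.Hours:Minutes:Seconds" (e.g. "0.0:0:0")
# and, as an assumption, also "Hours:Minutes:Seconds" with the days omitted.
_LAG_RE = re.compile(r"^(?:(\d+)\.)?(\d+):(\d+):(\d+)$")

# Upper limit quoted in the thread for -TruncationLagTime (minimum is 0 seconds).
MAX_TRUNCATION_LAG = timedelta(days=7)


def parse_lag_time(text: str) -> timedelta:
    """Convert a lag string into a timedelta, rejecting malformed values."""
    match = _LAG_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in Days.Hours:Minutes:Seconds form: {text!r}")
    # A missing days group comes back as None and is treated as 0 days.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError(f"hours/minutes/seconds out of range: {text!r}")
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def check_truncation_lag(text: str) -> timedelta:
    """Parse the value and enforce the quoted 0-second to 7-day range."""
    lag = parse_lag_time(text)
    if lag > MAX_TRUNCATION_LAG:
        raise ValueError(f"TruncationLagTime may not exceed 7 days: {text!r}")
    return lag
```

For example, the ReplayLagTime used in this thread, "0.0:0:0", parses to a zero timedelta, while "8.0:0:0" is rejected as a truncation lag.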
Author Comment

by:cyberleo2000
ID: 38760596
To amitkulshrestha: which checkpoint file gets deleted? I'm assuming only the one on the target SCR node?

To Rancy: see the original question. I did not specify a TruncationLagTime value, so the default is used, which I believe truncates the logs immediately after they are replayed. I used this command: Enable-StorageGroupCopy -id "StorageGroup" -StandByMachine "SCR-Server" -ReplayLagTime 0.0:0:0

Thank you.
Expert Comment

by:Manpreet SIngh Khatra
ID: 38762596
I don't think so. Why don't we run the command to confirm the truncation lag time?

- Rancy
Author Comment

by:cyberleo2000
ID: 38764052
ReplayQueueLength is the number of log files waiting to be committed to the Exchange database, i.e. replayed. So my problem is that the SCR node is not replaying log files fast enough. I'm just trying to find out whether I have a problem or simply have to be patient and wait for the log files to be replayed.
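The distinction between a copy backlog and a replay backlog can be sketched in a few lines of Python, assuming you already have the CopyQueueLength and ReplayQueueLength values for a storage group copy from the status output. classify_copy is a hypothetical name, and the 50-log baseline is the healthy steady state described in this thread, used here as an assumption rather than a hard threshold.

```python
# Healthy SCR steady state reported in this thread: ReplayQueueLength of about 50.
SCR_REPLAY_BASELINE = 50


def classify_copy(copy_queue: int, replay_queue: int,
                  baseline: int = SCR_REPLAY_BASELINE) -> str:
    """Hypothetical triage of one storage group copy from its two queue lengths."""
    if copy_queue > 0:
        # Logs have not reached the SCR target yet: look at the copy/network side.
        return "copy backlog"
    if replay_queue > baseline:
        # Logs arrived but are waiting to be replayed: look at the target's resources.
        return f"replay backlog ({replay_queue - baseline} logs over baseline)"
    return "healthy"
```

With CopyQueueLength 0 and ReplayQueueLength 8000, as reported later in this thread, this returns "replay backlog (7950 logs over baseline)", pointing at the SCR node rather than the network.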
Expert Comment

by:Amit
ID: 38764067
Create a test DB and check how fast it replicates.
Expert Comment

by:Manpreet SIngh Khatra
ID: 38764086
I believe that with SCR, the last 50 logs are not replayed.

- Rancy
Expert Comment

by:imkottees
ID: 38764101
Expert Comment

by:Manpreet SIngh Khatra
ID: 38764136
Thanks, imkottees, for the articles :)

Cyberleo: if it's the same, it's fine; if not, it may be something to check. I guess you have more DBs; do all of them have the same issue, or only one or a few?

- Rancy
Author Comment

by:cyberleo2000
ID: 38764361
All the replicating DBs have large ReplayQueueLengths, ranging from a few hundred to over 8000, and all of them have a CopyQueueLength of 0. The logs are copying with no problem; the issue seems to be that the SCR server is not replaying logs fast enough. SCR replication was working just fine for about a month or two: CopyQueueLengths were all 0 and ReplayQueueLengths were all 50.

Then we had a flurry of activity: mailbox moves, public folder data being transferred to shared mailboxes, etc. Is it coincidence, or did this increase in activity cause more logs to be generated than the SCR node can handle?

If it's a matter of waiting for the SCR node to catch up and replay all the extra logs, OK, no problem, I can wait. But if there is a problem, I don't want to wait for it to get worse.

Reseeding takes too long, and I have 20 stores, so that's really a last-resort, worst-case solution.

Thank you.
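Whether waiting helps comes down to rates: the backlog only shrinks while the SCR node replays logs faster than the active node generates them. A minimal Python sketch of that reasoning follows; hours_to_catch_up is a hypothetical helper, and both per-hour rates are inputs you would have to measure yourself (they are not values from this thread).

```python
from typing import Optional


def hours_to_catch_up(backlog: int, generated_per_hour: float,
                      replayed_per_hour: float, baseline: int = 50) -> Optional[float]:
    """Hypothetical model: hours until ReplayQueueLength falls back to the baseline.

    Returns None when the target replays no faster than the source generates logs,
    in which case waiting alone will never clear the backlog.
    """
    excess = backlog - baseline
    if excess <= 0:
        return 0.0  # already at or below the normal SCR lag
    net_drain = replayed_per_hour - generated_per_hour
    if net_drain <= 0:
        return None  # the queue holds steady or keeps growing
    return excess / net_drain
```

A None result suggests the SCR node lacks the resources to keep up, rather than the backlog being a temporary spike that time will clear.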
Expert Comment

by:Amit
ID: 38764370
Wait and watch is the right choice for now, as you were doing a lot of activity.
Expert Comment

by:Manpreet SIngh Khatra
ID: 38764401
"Is it coincidence, or did this increase in activity cause more logs to be generated than the SCR node can handle?" I don't think so, but the weekend is a good time for it to catch up, and logs should be purged after the backup on the active CCR node.

I personally don't think it's an issue. As Amit said, it's best to wait until this weekend, as I guess there is too much activity during the week with what you just mentioned.

- Rancy
Author Comment

by:cyberleo2000
ID: 38778578
So I waited about 5 days, and the ReplayQueueLength values in my latest SCR replication status are still far too high; they should be around 50 (see attachment). At this point I'm thinking that maybe the SCR node doesn't have enough resources to process the logs fast enough. The server specs are: a Xeon E5-2665 processor @ 2.4 GHz and 34.1 GB of memory @ 2.64 GHz. I would think that this is enough.
scrstatus.jpg
Expert Comment

by:Manpreet SIngh Khatra
ID: 38779011
I do agree, but what if we suspend all of them and leave only 2-3 SGs with SCR? Does the ReplayQueueLength then stay as low as 50?

- Rancy
Author Comment

by:cyberleo2000
ID: 38779333
I guess that's my next test. Right now I am waiting to hear back from a server engineer who is checking the box for any I/O errors or latency of any kind.

Expert Comment

by:Manpreet SIngh Khatra
ID: 38781476
Perfect, please do keep us updated. I'm getting my hands on SCR/CCR again after a long time :)

- Rancy
Author Comment

by:cyberleo2000
ID: 38787877
I suspended replication on all storage groups except 16, which had a ReplayQueueLength of over 15000. The queue has gone down to 10000 in about 15 minutes, so maybe the problem is that the SCR node doesn't have enough resources to keep up.

One thing I noticed is that when I suspend replication for a storage group, the replaying of logs to the database on the SCR node also stops. Why would that be? Why wouldn't the SCR node continue to replay logs until the configured limit was reached (in my case, 50 logs)? That does not make much sense.
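Using the two readings above (over 15000 logs, then 10000 about 15 minutes later), the net drain rate and a rough time back to the 50-log baseline can be worked out as below. The helper names are hypothetical, and the estimate assumes the net rate stays constant while the other storage groups remain suspended; once they resume and share the node's disks, the rate would likely drop.

```python
def drain_rate_per_minute(start_len: int, end_len: int, minutes: float) -> float:
    """Net logs removed from the replay queue per minute between two readings."""
    return (start_len - end_len) / minutes


def minutes_to_baseline(current_len: int, rate_per_minute: float, baseline: int = 50) -> float:
    """Minutes until the queue reaches the baseline if the net rate holds steady."""
    if rate_per_minute <= 0:
        raise ValueError("queue is not draining")
    return max(current_len - baseline, 0) / rate_per_minute


# Readings from this thread: about 15000 logs, then 10000 roughly 15 minutes later.
rate = drain_rate_per_minute(15000, 10000, 15)  # about 333 logs per minute
eta = minutes_to_baseline(10000, rate)          # about 30 minutes
```

Because both readings are of the same queue, the rate is already net of any new logs arriving for that storage group during the interval.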

Accepted Solution

by:Manpreet SIngh Khatra
ID: 38789972
"Why would that be?" Suspending means telling the service not to copy any more logs. Replay is done by the ESE engine as part of that same process, so by suspending you stop the initial process and replay along with it.

- Rancy
Author Comment

by:cyberleo2000
ID: 38792975
I figured that. I just wish the software offered a way to pause replication but continue to replay logs. Anyhow, it turns out that the problem is low resources on the SCR node, specifically disk I/O. I have a server engineer looking at moving the data to faster disks. Of course, that means convincing my company to spend money :)

Thank you for the help. Greatly appreciated. I will try to "pay it forward".
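For a sense of scale of the disk work behind a large ReplayQueueLength: Exchange 2007 transaction log files are 1 MB each, so the excess logs translate directly into log data the SCR node still has to read and apply. The sketch below uses a hypothetical helper, backlog_gib, with the roughly 15000-log reading from this thread as sample input. Treat it as a lower bound on the I/O, since replay also performs random database page writes.

```python
from typing import Iterable

LOG_FILE_MB = 1  # Exchange 2007 transaction log files are 1 MB each


def backlog_gib(replay_queue_lengths: Iterable[int], baseline: int = 50) -> float:
    """Approximate GiB of log data still waiting to be replayed across storage groups."""
    excess_logs = sum(max(q - baseline, 0) for q in replay_queue_lengths)
    return excess_logs * LOG_FILE_MB / 1024


# Sample input: the ~15000-log ReplayQueueLength reported for one storage group.
print(round(backlog_gib([15000]), 1))  # prints 14.6
```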