  • Status: Solved
  • Priority: Medium
  • Security: Public

Exchange SCR Replication ReplayQueueLength very high

I am running Exchange 2007 SP3 RU9 in a CCR active-passive cluster. I recently set up another server for SCR. I used the following cmdlet: Enable-StorageGroupCopy -id "StorageGroup" -StandByMachine "SCR-Server" -ReplayLagTime 0.0:0:0

I have 20 storage groups and had successfully seeded 16 of them. All 16 had been replicating without a problem, CopyQueueLength was 0 and ReplayQueueLength was 50. A few days ago I noticed the ReplayQueueLength for almost all of them was in the thousands. The logs are being copied, but the SCR server does not seem to be truncating them fast enough.

We had a flurry of activity on our Exchange server last week: a number of mailbox moves, plus a large amount of public folder data was moved into shared mailboxes. Could this have caused the high ReplayQueueLength? Do I simply need to wait for the SCR server to "catch up"?

I really don't want to reseed because that process takes up to 12 hours per mailstore. The SCR server is in the Amazon cloud.

I attached a screenshot of the current status.

Any help is greatly appreciated. Thank you.

SCR status
cyberleo2000
Asked:
1 Solution
 
imkottees (Senior Messaging Engineer) commented:
Hi,

Yes, it is most likely because of too much activity or a slow network.

I would suggest waiting and letting it complete, but keep an eye on it.

FYI: I used to see copy queue lengths of 50k and more in my environment.
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
What is set for -TruncationLagTime?

http://technet.microsoft.com/en-us/library/bb676502(EXCHG.80).aspx
-TruncationLagTime   This parameter is used to specify the amount of time that the Microsoft Exchange Replication service should wait before truncating log files that have been copied to the SCR target computer and replayed into the copy of the database. The time period begins after the log has been successfully replayed into the copy of the database. The format for this parameter is (Days.Hours:Minutes:Seconds). The maximum allowable setting for this value is 7 days. The minimum allowable setting is 0 seconds, although setting this value to 0 seconds effectively eliminates any delay in log truncation activity. After the value for this parameter is set, it cannot be changed without disabling and then enabling SCR.

- Rancy
 
cyberleo2000 (Author) commented:
To amitkulshrestha: which checkpoint file is deleted? I'm assuming only the one on the target SCR server node?

To Rancy: see the original question. I did not add a TruncationLagTime value, so it is using the default, which I believe should truncate the logs immediately after they are replayed. I used this command: Enable-StorageGroupCopy -id "StorageGroup" -StandByMachine "SCR-Server" -ReplayLagTime 0.0:0:0

Thank you.
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
I don't think so. Why don't we run the command to confirm the truncation lag time?

- Rancy
 
cyberleo2000 (Author) commented:
ReplayQueueLength is the number of log files waiting to be committed to the Exchange database, i.e., replayed. So my problem is that the SCR node is not replaying log files fast enough. I'm just trying to find out whether I have a problem or whether I simply have to be patient and wait for the log files to be replayed.
 
Amit (IT Architect) commented:
Create one test DB and check how fast it replicates.
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
I guess with SCR it doesn't replay the last 50 logs.

- Rancy
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
Thanks, imkottees, for the articles :)

Cyberleo: if it's the same, that's fine; if not, it may be something to check. I guess you have more DBs: do all of them have the same issue, or only one or a few?

- Rancy
 
cyberleo2000 (Author) commented:
All the replicating DBs have large ReplayQueueLengths, ranging from a few hundred to over 8000, and all of them have a CopyQueueLength of 0. The logs are copying with no problem. The issue seems to be that the SCR server is not replaying logs fast enough. SCR replication was working just fine for about a month or two: CopyQueueLengths were all 0 and ReplayQueueLengths were all 50.
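
The reasoning here (logs arrive fine, but replay lags behind) can be written down as a tiny check. This is only an illustrative Python sketch, not Exchange tooling: the function name `classify_backlog` is made up, and the `baseline` default of 50 is taken from the steady-state ReplayQueueLength reported in this thread. It also treats any non-zero copy queue as a copy-side problem, which is a simplification.

```python
def classify_backlog(copy_queue: int, replay_queue: int, baseline: int = 50) -> str:
    """Guess which replication stage is falling behind.

    copy_queue   -- CopyQueueLength reported for the storage group
    replay_queue -- ReplayQueueLength reported for the storage group
    baseline     -- assumed steady-state replay queue (50 in this thread)
    """
    if copy_queue > 0:
        # Logs are not reaching the target: look at the network or the source.
        return "copy"
    if replay_queue > baseline:
        # Logs arrive but are not replayed fast enough: look at the target.
        return "replay"
    return "healthy"


print(classify_backlog(0, 8000))
print(classify_backlog(0, 50))
```

With the numbers from this thread (copy queue 0, replay queue in the thousands), the sketch points at the replay side, i.e. the SCR target itself.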

Then we had a flurry of activity: mailbox moves, public folder data being transferred to shared mailboxes, etc. Is it coincidence, or did this increase in activity cause more logs to be generated than the SCR node can handle?

If it's a matter of waiting for the SCR node to catch up and replay all the extra logs, OK, no problem, I can wait. But if there is a problem, I don't want to wait for it to get worse.

Reseeding takes too long, and I have 20 stores, so that's really a last-resort, worst-case solution.

Thank you.
 
Amit (IT Architect) commented:
Wait and watch is the right choice for now, as you were doing a lot of activity.
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
"Is it coincidence, or did this increase in activity cause more logs to be generated than the SCR node can handle?" - I don't think so, but the weekend is a good time, and logs should purge after a backup on the active CCR node.

Personally, I don't think it's an issue. As Amit also said, the best option is to wait for this weekend, since I guess there has been too much activity over the entire week with what you just mentioned.

- Rancy
 
cyberleo2000 (Author) commented:
So I waited about 5 days. My latest SCR replication status is still way too high; ReplayQueueLength should be around 50. See attachment. At this point I'm thinking that maybe the SCR node doesn't have enough resources to process the logs fast enough. The specs of the server are as follows: processor is a Xeon E5-2665 @ 2.4 GHz and memory is 34.1 GB @ 2.64 GHz. I would think that this is enough.
scrstatus.jpg
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
I do agree, but what if we suspend all of them and leave only 2-3 SGs with SCR: does the queue stay as low as 50?

- Rancy
 
cyberleo2000 (Author) commented:
I guess that's my next test. Right now I am waiting to hear back from a server engineer who is checking the box for any I/O errors or latency of any kind.
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
Perfect, please do keep us updated. I'm getting my hands on SCR-CCR after a long time now :)

- Rancy
 
cyberleo2000 (Author) commented:
I suspended replication on all storage groups except 16, which had a ReplayQueueLength of over 15000. The queue has gone down to 10000 in about 15 minutes, so maybe the problem is that the SCR node does not have enough resources to keep up.
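
For a rough idea of how long the remaining backlog would take, the drop from about 15000 to 10000 in roughly 15 minutes can be extrapolated. This is a back-of-the-envelope Python sketch, not anything Exchange provides: the helper name `minutes_to_drain` and the `target` of 50 are assumptions, and it presumes the net replay rate stays constant, which it may not once other storage groups are resumed.

```python
from typing import Optional


def minutes_to_drain(start_len: int, end_len: int, elapsed_min: float,
                     target: int = 50) -> Optional[float]:
    """Estimate minutes until the replay queue reaches `target`.

    Uses the net drain rate between two queue samples and assumes it
    stays constant, even though new logs keep arriving from the source.
    """
    rate = (start_len - end_len) / elapsed_min  # net logs replayed per minute
    if rate <= 0:
        return None  # queue is flat or growing, so there is no finite estimate
    return max(end_len - target, 0) / rate


# Approximate figures from the post above: 15000 -> 10000 in about 15 minutes.
eta = minutes_to_drain(15000, 10000, 15)
print(f"about {eta:.0f} more minutes at the current rate")
```

Taking two samples of ReplayQueueLength a known time apart and feeding them in gives a quick way to tell "slowly catching up" from "falling further behind" (the function returns `None` in the latter case).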

One thing I noticed is that when I suspend replication for a storage group, the replaying of logs into the database on the SCR node also stops. Why would that be? Why wouldn't the SCR node continue to replay logs until the configured limit is reached (in my case, 50 logs)? That does not make much sense.
 
Manpreet SIngh Khatra (Solutions Architect, Project Lead) commented:
"Why would that be?" - Suspend means asking it not to copy any logs. Replay is done by the ESE engine once that copy is done, so you do stop the initial process.

- Rancy
 
cyberleo2000 (Author) commented:
I figured that. I just wish the software were designed so that there was a way to pause replication but continue to replay logs. Anyhow, it turns out that the problem is low resources on the SCR node, specifically disk I/O. I have a server engineer looking at moving the data to faster disks. Of course, that means convincing my company to spend money :)

Thank you for the help. Greatly appreciated. I will try to "pay it forward".
Question has a verified solution.
