Gordon Clee asked:

Active Directory ERROR_REPLICA_SYNC_FAILED_THE REPLICATION OPERATION FAILED TO ALLOCATE MEMORY

I am having a problem with AD replication at one of our branch offices that I hope an expert can help me with.

FIRST THE NETWORK DESCRIPTION:
We have numerous branch offices, each with a Windows 2003 DC replicating with a Windows 2003 DC at head office. The schema has been upgraded to Windows 2008R2 and there is one Windows 2008R2 DC in the forest. The branches are connected to the Internet by cable, DSL, or terrestrial wireless (whatever is the most practical method for the site). Each branch has a VPN gateway appliance that creates an IPSec VPN connection to head office.

NOW THE SPECIFIC PROBLEM:
The branch DC has two drives which are mirrored with Windows software mirroring. One drive failed, so we disconnected it and brought the server up on the remaining drive. Around the same time the branch DC stopped replicating the domain partition, while the Configuration, Schema, and ForestDNSZones partitions are replicating OK. The error that replmon gives is:
"There was an error during queuing the synchronization. The error code was: ERROR_REPLICA_SYNC_FAILED_THE REPLICATION OPERATION FAILED TO ALLOCATE MEMORY."
The advice in the error log is to restart the DC and try again, but this did not help (I have done it twice).

This is a terrestrial wireless connection with higher latency than our typical branches, so I thought it might be a Kerberos timeout problem. I attacked this by adding the registry value HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters\KdcWaitTime set to 30 seconds (the default is 5). This made no difference.

There are lots of entries in the event logs, but I need some expert guidance to figure out what to try next.

Here are excerpts from the event logs...

APPLICATION LOG:

Event Type:      Error
Event Source:      Userenv
Event Category:      None
Event ID:      1041
Date:            9/30/2011
Time:            9:28:15 AM
User:            NT AUTHORITY\SYSTEM
Computer:      <branch DC>
Description:
Windows cannot query DllName registry entry for {7B849a69-220F-451E-B3FE-2CB811AF94AE} and it will not be loaded. This is most likely caused by a faulty registration.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

DIRECTORY SERVICE LOG:

Event Type:      Warning
Event Source:      NTDS KCC
Event Category:      Knowledge Consistency Checker
Event ID:      1925
Date:            9/30/2011
Time:            10:04:22 AM
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:      <branch DC>
Description:
The attempt to establish a replication link for the following writable directory partition failed.
 
Directory partition:
DC=<domain>
Source domain controller:
CN=NTDS Settings,CN=<another branch DC that is not routable>,CN=Sites,CN=Configuration,DC=<domain>
Source domain controller address:
5b43d6cf-0c12-440d-84d6-cb9169984d73._msdcs.<domain>
Intersite transport (if any):
CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=<domain>
 
This domain controller will be unable to replicate with the source domain controller until this problem is corrected.
 
User Action
Verify if the source domain controller is accessible or network connectivity is available.
 
Additional Data
Error value:
1722 The RPC server is unavailable.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
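
One way to act on the "Verify if the source domain controller is accessible" advice in event 1925 is a quick TCP reachability test. The following is a minimal Python sketch, assuming the RPC endpoint mapper on the source DC listens on TCP port 135; the function name is hypothetical and not part of any Microsoft tool.

```python
import socket

# TCP port of the RPC endpoint mapper on a Windows DC.
RPC_ENDPOINT_MAPPER_PORT = 135


def tcp_port_open(host, port=RPC_ENDPOINT_MAPPER_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling tcp_port_open() with the source DC's address (for example the GUID-based _msdcs name shown in the event) would return False when the DC is not routable, as is the case for the DC named above. A True result only shows that the endpoint mapper answers through the VPN; replication also uses dynamically assigned RPC ports that this check does not cover.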

-------------------------------------------------------------------

Event Type:      Error
Event Source:      NTDS KCC
Event Category:      Knowledge Consistency Checker
Event ID:      1311
Date:            9/30/2011
Time:            10:01:34 AM
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:      <branch DC>
Description:
The Knowledge Consistency Checker (KCC) has detected problems with the following directory partition.
 
Directory partition:
CN=Configuration,DC=<domain>
 
There is insufficient site connectivity information in Active Directory Sites and Services for the KCC to create a spanning tree replication topology. Or, one or more domain controllers with this directory partition are unable to replicate the directory partition information. This is probably due to inaccessible domain controllers.
 
User Action
Use Active Directory Sites and Services to perform one of the following actions:
- Publish sufficient site connectivity information so that the KCC can determine a route by which this directory partition can reach this site. This is the preferred option.
- Add a Connection object to a domain controller that contains the directory partition in this site from a domain controller that contains the same directory partition in another site.
 
If neither of the Active Directory Sites and Services tasks correct this condition, see previous events logged by the KCC that identify the inaccessible domain controllers.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

FILE REPLICATION SERVICE LOG:

Event Type:      Error
Event Source:      NtFrs
Event Category:      None
Event ID:      13568
Date:            9/29/2011
Time:            12:12:07 PM
User:            N/A
Computer:      <branch DC>
Description:
The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR.
 
 Replica set name is    : "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"
 Replica root path is   : "c:\windows\sysvol\domain"
 Replica root volume is : "\\.\C:"
 A Replica set hits JRNL_WRAP_ERROR when the record that it is trying to read from the NTFS USN journal is not found.  This can occur because of one of the following reasons.
 
 [1] Volume "\\.\C:" has been formatted.
 [2] The NTFS USN journal on volume "\\.\C:" has been deleted.
 [3] The NTFS USN journal on volume "\\.\C:" has been truncated. Chkdsk can truncate the journal if it finds corrupt entries at the end of the journal.
 [4] File Replication Service was not running on this computer for a long time.
 [5] File Replication Service could not keep up with the rate of Disk IO activity on "\\.\C:".
 Setting the "Enable Journal Wrap Automatic Restore" registry parameter to 1 will cause the following recovery steps to be taken to automatically recover from this error state.
 [1] At the first poll, which will occur in 5 minutes, this computer will be deleted from the replica set. If you do not want to wait 5 minutes, then run "net stop ntfrs" followed by "net start ntfrs" to restart the File Replication Service.
 [2] At the poll following the deletion this computer will be re-added to the replica set. The re-addition will trigger a full tree sync for the replica set.
 
WARNING: During the recovery process data in the replica tree may be unavailable. You should reset the registry parameter described above to 0 to prevent automatic recovery from making the data unexpectedly unavailable if this error condition occurs again.
 
To change this registry parameter, run regedit.
 
Click on Start, Run and type regedit.
 
Expand HKEY_LOCAL_MACHINE.
Click down the key path:
   "System\CurrentControlSet\Services\NtFrs\Parameters"
Double click on the value name
   "Enable Journal Wrap Automatic Restore"
and update the value.
 
If the value name is not present you may add it with the New->DWORD Value function under the Edit Menu item. Type the value name exactly as shown above.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
netjgrnaut

Fix the Journal Wrap Error first.  

I think the rest of your problems will clear up after that.  If not, post back which errors remain.

Good luck!
Yes, as suggested above, event ID 13568 states that the branch office server's replica set is in the journal wrap error state.
Steps to recover from journal wrap:
1. Click Start, Run, and type regedit.
2. Expand HKEY_LOCAL_MACHINE and go down the key path "System\CurrentControlSet\Services\NtFrs\Parameters".
3. Double-click the value name "Enable Journal Wrap Automatic Restore" and update the value to 1.
4. Run "net stop ntfrs" followed by "net start ntfrs" to restart the File Replication Service.

If the value name is not present, you may add it with the New->DWORD Value function under the Edit menu. Type the value name exactly as shown above: "Enable Journal Wrap Automatic Restore".

Regards,
Abhijit Waikar.

Once you have performed the above steps, change the value of "Enable Journal Wrap Automatic Restore" back to 0 or delete it, then wait 10 minutes and check the File Replication Service log. FRS event 13516 will be generated, which means everything is fine.
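
The registry change and service restart described above can also be scripted. The following is a rough Python sketch that only builds and prints the commands by default; it assumes the standard reg.exe and net.exe tools are on the path, and the function names are hypothetical. Nothing is executed unless dry_run is set to False on the DC itself.

```python
import subprocess

NTFRS_PARAMS_KEY = r"HKLM\System\CurrentControlSet\Services\NtFrs\Parameters"
VALUE_NAME = "Enable Journal Wrap Automatic Restore"


def set_restore_flag(enabled):
    """Build the reg.exe command that sets the DWORD value to 1 or 0."""
    return ["reg", "add", NTFRS_PARAMS_KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", "1" if enabled else "0", "/f"]


def recovery_commands():
    """First phase: enable the flag, then restart the File Replication Service."""
    return [
        set_restore_flag(True),
        ["net", "stop", "ntfrs"],
        ["net", "start", "ntfrs"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: prints what would be executed without changing anything.
run_all(recovery_commands())
```

After the SYSVOL share is back, running set_restore_flag(False) through run_all() resets the value to 0, matching the warning in event 13568 about leaving automatic recovery enabled.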
Gordon Clee (ASKER)

I have done the "recover from journal wrap" procedure suggested by netjgrnaut, and NtFrs event 13568 has stopped. I got NtFrs event 13565, which tells me to wait until the SYSVOL share is recreated. I'm not sure how long this should take, but I am done for the day. I will check again on Monday and report back.
Thanks.
Check that the SYSVOL and NETLOGON shares are available on the server. If they are, then all is well. Also confirm with the dcdiag /q command.
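
The share check can be done by reading the output of "net share" on the DC. The following is a small Python sketch of that idea, assuming the share name is the first whitespace-separated field on each line; the function name, the sample text, and the example.local path in it are hypothetical.

```python
REQUIRED_SHARES = ("SYSVOL", "NETLOGON")


def missing_shares(net_share_output):
    """Return the required share names that do not appear in `net share` output."""
    present = set()
    for line in net_share_output.splitlines():
        fields = line.split()
        if fields:
            # Only the first column is the share name; paths are ignored.
            present.add(fields[0].upper())
    return [name for name in REQUIRED_SHARES if name not in present]


# Hypothetical, trimmed sample of `net share` output for illustration only.
sample = r"""
Share name   Resource                                        Remark
-------------------------------------------------------------------
NETLOGON     C:\WINDOWS\SYSVOL\sysvol\example.local\SCRIPTS  Logon server share
Public       D:\Public
"""
print(missing_shares(sample))  # -> ['SYSVOL']
```

An empty list means both shares are present. The sample deliberately has SYSVOL only inside a path, which is why the sketch compares the first column rather than searching the whole line.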

The JRNL_WRAP_ERROR was resolved by steps suggested in the error log.

The main problem of AD replication still remains. I have worked to free up some system resources by deleting unused printers. The page file size is 1536 MB and there is over 6 GB free on drive C:

After restarting the server, Task Manager reports physical memory usage as:
  Total 1046592 K
  Available 558604 K
  System Cache 411876 K
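
For reference, the Task Manager figures above are in K. A short Python sketch of the conversion to MB, assuming 1 MB = 1024 K (the variable and function names are illustrative):

```python
KB_PER_MB = 1024

# Values copied from the Task Manager figures above, in K.
memory_kb = {"total": 1046592, "available": 558604, "system_cache": 411876}


def to_mb(kb):
    """Convert a value in K to MB."""
    return kb / KB_PER_MB


for name, kb in memory_kb.items():
    print(f"{name}: {to_mb(kb):.0f} MB")

# Fraction of physical memory reported as available after the restart.
available_share = memory_kb["available"] / memory_kb["total"]
print(f"available: {available_share:.0%} of physical memory")
```

This works out to roughly 1022 MB total with about 53% available, so the machine has about 1 GB of RAM and roughly half of it is free right after a restart.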

Here are the Directory services events:

Event Type:      Warning
Event Source:      NTDS General
Event Category:      Replication
Event ID:      1079
Date:            10/3/2011
Time:            2:26:14 PM
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:      <branch DC>
Description:
Internal event: Active Directory could not allocate enough memory to process replication tasks. Replication might be affected until more memory is available.
 
User Action
Increase the amount of physical memory or virtual memory and restart this domain controller.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

------------------------------------------

Event Type:      Error
Event Source:      NTDS Replication
Event Category:      Replication
Event ID:      1084
Date:            10/3/2011
Time:            2:26:14 PM
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:      <branch DC>
Description:
Internal event: Active Directory could not update the following object with changes received from the following source domain controller. This is because an error occurred during the application of the changes to Active Directory on the domain controller.
 
Object:
CN=<workstation name>
Object GUID:
8966644e-9771-426f-8d2d-7a87ffd397b1
Source domain controller:
f8c2a429-c194-48ba-ba76-834eda423ada._msdcs.windows.arbormemorial.ca
 
Synchronization of the local domain controller with the source domain controller is blocked until this update problem is corrected.
 
This operation will be tried again at the next scheduled replication.
 
User Action
Restart the local domain controller if this condition appears to be related to low system resources (for example, low physical or virtual memory).
 
Additional Data
Error value:
8446 The replication operation failed to allocate memory.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
ASKER CERTIFIED SOLUTION
netjgrnaut

This solution is only available to members.

Yes, the server has only 1 GB of RAM. It seemed like enough 5 years ago...
I increased the page file to 4096 MB and restarted. No improvement.
Tomorrow I will try dcpromo to reinstall AD.
You may have to do a metadata cleanup, if the dcpromo/remove doesn't go smoothly.
Ran dcpromo to remove and then again to reinstall the DC. This went without a hitch.
Thanks for your help.
Glad to hear it!