How to clean up lingering objects in the Global Catalog partition in Windows 2008 R2

GusGallows, Support Escalation Engineer

Windows 2008 R2 Active Directory replication issues


Replication issues can cause all sorts of headaches, largely because they are not always obvious replication issues. In this article we will talk about one in particular where the replication issue caused mail delivery problems in our Exchange 2007 environment.

The symptoms


Our issue started out looking like a mail delivery problem. When someone sent email to our recently created mailboxes (created within the last 3 months), they would intermittently get non-delivery reports (NDRs) from those mailboxes. Some messages would deliver and others wouldn't. There was no obvious rhyme or reason to it; the only common denominators were that the mailboxes were all on the same Exchange server (Exchange 2007 SP2 RU5) and they were all relatively new email accounts. The NDRs were saying things like MAPIExceptionUnknownUser or MAPIExceptionNotFound. The same mailboxes were also producing the kind of NDR you get when your auto-complete cache file gets corrupted in Outlook, where the address looks something like this:
<IMCEAEX-_O=Org_OU=EXCHANGE+20ADMINISTRATIVE+20GROUP+20+28FYDIBOHF23SPDLT+29_CN=RECIPIENTS_CN=user@forestrootdomain.com>

To make it even more confounding, the addresses that were being shown in the user@forestrootdomain.com part were not SMTP addresses that had ever been stamped on the mailboxes. They were the User Principal Names (UPNs) of the mailboxes.

At this point, we were thinking it was an Exchange issue. It had not dawned on us to check AD replication. We attempted to move the mailboxes to other mailbox databases, but found that even moving the mailboxes was met with intermittent failures. The first attempt would fail opening the source mailbox. The second attempt failed opening the destination mailbox. The third attempt completed successfully, but the NDR issue persisted. At this point, I called Microsoft.

Troubleshooting the wrong piece


Here is where the frustration really came into play. After talking to Microsoft, we realized we were on a no-longer supported version of Exchange. For Microsoft to help with this issue, we needed to be at Exchange 2007 SP3 and at least at Rollup 3. Rollup 3 addressed some of the errors they were seeing but they were not sure it would address our overall problem. We decided to go to the latest rollup at the time, Rollup 7. So we applied the service pack and update to our CAS/HUB servers and then to all of our Mailbox servers (In 2007, always apply updates to your CAS servers first). The problem did not go away.

I called Microsoft back. This time, they sent me to the connectors team. Now we were getting somewhere. The engineer immediately explained that the Exchange server will use any Global Catalog (GC) server in the site. This particular site had GCs from three different domains in it, the forest root, and two child domains, one of which was the domain our Exchange Server and users lived in. We will call it Child1. Still at this time, and I know we should have, we did not relate it to a replication issue.

A workaround for Exchange


The engineer pointed out that when a message was resolved through the domain controllers in the domain our mailboxes lived in (child1.forestrootdomain.com), it resolved fine and the message was delivered. However, when it was resolved through the forest root domain controllers (forestrootdomain.com) or the other child domain (child2.forestrootdomain.com), the NDRs would happen. So as a workaround, he had me set StaticDomainControllers and StaticGlobalCatalogs to only the domain controllers in the site that were in the same domain as the mailboxes. The command in the Exchange Management Shell for that looks like this:

Set-ExchangeServer SERVERNAME -StaticDomainControllers DC1,DC2,DC3 -StaticGlobalCatalogs DC1,DC2,DC3

To see whether the setting took, you would do the following:

Get-ExchangeServer SERVERNAME -Status | fl *

If you omit the -Status from the cmdlet, the StaticDomainControllers and StaticGlobalCatalogs will appear empty.

The REAL problem


This did fix our mail delivery issue, but at this point I was really feeling like this was a band-aid fix. I needed to know why this happened in the first place and to fix the root cause; after all, email should be able to resolve on any Global Catalog server in the site regardless of whether it is in the same Active Directory domain or not. That is kind of the point of having Global Catalog servers. So I started digging in my logs. At this point I had pretty much figured out that we were having some kind of replication issue, as it appeared that the users' email addresses did not exist in the GCs of the other domains in the site. I started going through each of those domain controllers and running the Active Directory Domain Services Best Practices Analyzer (BPA). Since all of my domain controllers run Windows 2008 R2 SP1, the BPA can be found in Server Manager. Click on Roles, then on Active Directory Domain Services. You'll find the BPA tool in the third frame on the right-hand side. Simply click Scan This Role.

The first thing I noted was that only one of my domain controllers had Strict Replication Consistency turned on. This made me cringe, as that is the setting that prevents lingering objects from propagating between domain controllers. Lingering objects are AD objects that were deleted while a domain controller was down, or otherwise not replicating, for longer than the tombstone lifetime. When that domain controller comes back, it still holds the objects, while the rest of AD has already deleted and garbage collected them. In our case the objects showed up in the LostAndFound container. The problem is that when a Global Catalog server holds lingering objects, a replication partner with Strict Replication Consistency enabled blocks replication of that partition from it.

So I pulled up the Directory Service logs on the Domain Controller that had the Strict Replication Consistency turned on and sure enough, it was flooded with Event 1988 alerts. The alert looks like this:

Log Name:      Directory Service
                      Source:        Microsoft-Windows-ActiveDirectory_DomainService
                      Date:          8/14/2012 1:17:12 PM
                      Event ID:      1988
                      Task Category: Replication
                      Level:         Error
                      Keywords:      Classic
                      User:          ANONYMOUS LOGON
                      Computer:      dc1.forestrootdomain.com
                      Description:
                      Active Directory Domain Services Replication encountered the existence of objects in the following partition that have been deleted from the local domain controllers (DCs) Active Directory Domain Services database.  Not all direct or transitive replication partners replicated in the deletion before the tombstone lifetime number of days passed.  Objects that have been deleted and garbage collected from an Active Directory Domain Services partition but still exist in the writable partitions of other DCs in the same domain, or read-only partitions of global catalog servers in other domains in the forest are known as "lingering objects". 
                       
                       
                      Source domain controller: 
                      b6ededee-6b63-49d4-b7ad-a880562f57b8._msdcs.forestrootdomain.com 
                      Object: 
                      CN=ComputerName\0ACNF:5ef4cf79-5efb-4d6c-9b88-6b05924bbd84,CN=LostAndFound,DC=child1,DC=forestrootdomain,DC=com 
                      Object GUID: 
                      5ef4cf79-5efb-4d6c-9b88-6b05924bbd84  This event is being logged because the source DC contains a lingering object which does not exist on the local DCs Active Directory Domain Services database.  This replication attempt has been blocked.
                       
                       The best solution to this problem is to identify and remove all lingering objects in the forest.
                       
                      User Action:
                       
                      Remove Lingering Objects:
                       
                       The action plan to recover from this error can be found at http://support.microsoft.com/?id=314282.
                       
                       If both the source and destination DCs are Windows Server 2003 DCs, then install the support tools included on the installation CD.  To see which objects would be deleted without actually performing the deletion run "repadmin /removelingeringobjects <Source DC> <Destination DC DSA GUID> <NC> /ADVISORY_MODE". The eventlogs on the source DC will enumerate all lingering objects.  To remove lingering objects from a source domain controller run "repadmin /removelingeringobjects <Source DC> <Destination DC DSA GUID> <NC>".
                       
                       If either source or destination DC is a Windows 2000 Server DC, then more information on how to remove lingering objects on the source DC can be found at http://support.microsoft.com/?id=314282 or from your Microsoft support personnel.
                       
                       If you need Active Directory Domain Services replication to function immediately at all costs and don't have time to remove lingering objects, enable loose replication consistency by unsetting the following registry key:
                       
                      Registry Key:
                      HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Strict Replication Consistency
                       
                       Replication errors between DCs sharing a common partition can prevent user and compter acounts, trust relationships, their passwords, security groups, security group memberships and other Active Directory Domain Services configuration data to vary between DCs, affecting the ability to log on, find objects of interest and perform other critical operations. These inconsistencies are resolved once replication errors are resolved.  DCs that fail to inbound replicate deleted objects within tombstone lifetime number of days will remain inconsistent until lingering objects are manually removed by an administrator from each local DC.
                       
                       Lingering objects may be prevented by ensuring that all domain controllers in the forest are running Active Directory Domain Services, are connected by a spanning tree connection topology and perform inbound replication before Tombstone Live number of days pass.
                      
                      Event Xml:
                      <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
                        <System>
                          <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService" Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS General" />
                          <EventID Qualifiers="49152">1988</EventID>
                          <Version>0</Version>
                          <Level>2</Level>
                          <Task>5</Task>
                          <Opcode>0</Opcode>
                          <Keywords>0x8080000000000000</Keywords>
                          <TimeCreated SystemTime="2012-08-14T17:17:12.616796700Z" />
                          <EventRecordID>1009999</EventRecordID>
                          <Correlation />
                          <Execution ProcessID="472" ThreadID="1308" />
                          <Channel>Directory Service</Channel>
                          <Computer>DC1.ForestRootDomain.com</Computer>
                          <Security UserID="S-1-5-7" />
                        </System>
                        <EventData>
                          <Data>b6ededee-6b63-49d4-b7ad-a880562f57b8._msdcs.forestrootdomain.com</Data>
                          <Data>CN=ComputerName\0ACNF:5ef4cf79-5efb-4d6c-9b88-6b05924bbd84,CN=LostAndFound,DC=child1,DC=forestrootdomain,DC=com</Data>
                          <Data>5ef4cf79-5efb-4d6c-9b88-6b05924bbd84</Data>
                          <Data>Strict Replication Consistency</Data>
                          <Data>System\CurrentControlSet\Services\NTDS\Parameters</Data>
                        </EventData>
                      </Event>

Now in this alert, it tells you a few things. First, we have lingering objects. Second, they are in the child domain:
CN=ComputerName\0ACNF:5ef4cf79-5efb-4d6c-9b88-6b05924bbd84,CN=LostAndFound,DC=child1,DC=forestrootdomain,DC=com

Since these alerts are being generated on the forestrootdomain.com server, but the lingering objects are in the child domain, we can safely deduce that the partition in question is the read-only Global Catalog copy of the child domain partition. This is because the only objects from outside the server's own domain that would be in its directory are the entries it holds as a Global Catalog. In our case, this ruled out the resolution recommended in the event: we could run repadmin /removelingeringobjects all day and it would continually fail to find any lingering objects. When the objects are in the Global Catalog partition, we found we had to take a different approach to cleaning them up.
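The deduction above can be expressed as a small check. This is only an illustrative Python sketch, not part of the original troubleshooting: the function names are hypothetical, and the DN parsing is a simplification that assumes no escaped commas in front of a DC= component.

```python
import re

def naming_context(dn: str) -> str:
    """Return the domain partition (the DC= components) of a distinguished name."""
    parts = re.findall(r"(?:^|,)\s*(DC=[^,]+)", dn, flags=re.IGNORECASE)
    return ",".join(parts)

def fqdn_to_nc(fqdn: str) -> str:
    """Convert a DNS domain name to its naming context form (DC=...,DC=...)."""
    return ",".join(f"DC={label}" for label in fqdn.split("."))

def is_foreign_partition(object_dn: str, dc_domain: str) -> bool:
    """True when the object lives in a domain other than the reporting DC's own,
    which means the DC can only hold it in a read-only Global Catalog partition."""
    return naming_context(object_dn).lower() != fqdn_to_nc(dc_domain).lower()

# Object DN copied from the Event 1988 text above.
event_object = (r"CN=ComputerName\0ACNF:5ef4cf79-5efb-4d6c-9b88-6b05924bbd84,"
                "CN=LostAndFound,DC=child1,DC=forestrootdomain,DC=com")

print(naming_context(event_object))
print(is_foreign_partition(event_object, "forestrootdomain.com"))
```

The naming context it prints is the same value that is passed to repadmin /rehost later in the fix.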

The Fix


First, you need to find which domain controllers are failing to replicate. The easiest way to do this is to run the following command from the server that is reporting the Event 1988 alerts:

repadmin /showrepl * /csv > C:\showrepl.csv

Open the output file (C:\showrepl.csv) in Excel and filter the Last Failure Status column to hide all of the rows with a 0 in that column. The rows that remain identify the domain controllers that are failing to replicate with the server logging the 1988 alerts.
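If Excel is not handy, the same filter can be sketched in a few lines of Python. This is a minimal sketch under assumptions: it expects the header row that repadmin /showrepl /csv normally writes (including a "Last Failure Status" column), and the sample rows below, including the server names, sites, and status value, are made up purely for illustration.

```python
import csv
import io

def failing_links(csv_text: str) -> list[dict]:
    """Return the rows whose Last Failure Status is anything other than 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    failures = []
    for row in reader:
        status = (row.get("Last Failure Status") or "").strip()
        if status and status != "0":
            failures.append(row)
    return failures

# Made-up sample in the assumed showrepl CSV layout (not real output).
SAMPLE = """showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Site1,DC1,"DC=forestrootdomain,DC=com",Site1,DC2,RPC,0,0,2012-08-14 13:00:00,0
showrepl_INFO,Site1,DC1,"DC=child1,DC=forestrootdomain,DC=com",Site1,CHILD1DC1,RPC,12,2012-08-14 13:17:12,2012-05-01 09:00:00,8606
"""

for row in failing_links(SAMPLE):
    print(row["Destination DSA"], "<-", row["Source DSA"],
          row["Naming Context"], "status", row["Last Failure Status"])
```

To run it against the real file, read C:\showrepl.csv into a string and pass that to failing_links instead of SAMPLE.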

Now that we know which servers are having the problem, we can start the less than fun part of cleaning it up. But first, let's make sure all domain controllers have Strict Replication Consistency turned on so we can prevent the issue from coming up again once it is fixed. To do this, from an elevated command prompt, run the following:

repadmin /regkey * +strict

This will enable the feature on every domain controller, so depending on how many you have, it may take a while to finish. Wait for it to finish before you move on.

Once it does finish, we need to go to a domain controller in the site and child domain where the mailboxes live, and remove the lingering objects. This will produce a clean server to work with. You can do this in Active Directory Users and Computers (ADUC).

Now normally, if you were going to copy a clean partition to the other domain controllers, you would use the one you know is clean, in this case DC1.forestrootdomain.com. However, because the clean domain controller is in a domain that is not the same as the partition where the problem is, you have to do it differently as the forest root domain controller will not have the appropriate rights to child1.forestrootdomain.com partition to be able to perform the fix. Since we are doing this deletion, the current clean server will no longer be in sync with the new clean server and will no longer be considered clean. This means you have to add the previously clean forest root machine to the list of domain controllers to apply the fix to.

So to remove the lingering objects from the child domain controller that we are going to make our new clean server, open ADUC on that server, click View and make sure Advanced Features is checked, then expand the domain and click on the LostAndFound folder. You should see your lingering objects in that folder. Now, highlight and delete those objects. I know, scary, but do it anyway. The deletion should clear them out of all domain controllers in the same domain, but it still won't remove them from the forestrootdomain or the child2 domain controllers in the site, so here is where you do that.

1. Disable Outbound Replication


One at a time, on each of the affected domain controllers, including the one that was originally the clean server, open an elevated command prompt and stop outbound replication. To do that, use the following command:

repadmin /options DC1 +DISABLE_OUTBOUND_REPL

Fully Qualified Domain Name (FQDN) is not needed in this step since you will be running this from the actual affected domain controller. Do this for each server from an elevated command prompt on their own consoles.

2. Rehost the affected partition


Once outbound replication is disabled, it is time to copy in the clean server's (the new one) partition. To do this, you issue the following command:

repadmin /rehost DC1.forestrootdomain.com DC=child1,DC=forestrootdomain,DC=com CleanDCName.child1.forestrootdomain.com

This will take the clean copy of the DC=child1,DC=forestrootdomain,DC=com partition from the clean domain controller in the domain the partition lives in (child1.forestrootdomain.com) and overwrite the one on the affected domain controller (DC1.forestrootdomain.com). Once this finishes, you can restart outbound replication:

3. Enable Outbound Replication


repadmin /options DC1 -DISABLE_OUTBOUND_REPL

Do this for each of the affected domain controllers not in the child1 domain (the child1 domain will take care of itself), one at a time, letting each finish before you move on to the next.
Now, check the Directory Service log again on each affected domain controller. You should see an event 1116 stating that outbound replication has been enabled, and there should be no further event 1988 alerts.
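When several domain controllers need the same treatment, it can help to write out the three steps per server before touching anything. The Python sketch below only builds the command lines as text; it does not run repadmin. The helper name rehost_plan is hypothetical, and the server names are the placeholder names used in this article.

```python
def rehost_plan(affected_dcs, partition, clean_source):
    """Build the disable / rehost / enable command lines for each affected DC, in order."""
    plan = []
    for dc in affected_dcs:
        short_name = dc.split(".")[0]  # steps 1 and 3 above use the short name
        plan.append(f"repadmin /options {short_name} +DISABLE_OUTBOUND_REPL")
        plan.append(f"repadmin /rehost {dc} {partition} {clean_source}")
        plan.append(f"repadmin /options {short_name} -DISABLE_OUTBOUND_REPL")
    return plan

plan = rehost_plan(
    ["DC1.forestrootdomain.com"],
    "DC=child1,DC=forestrootdomain,DC=com",
    "CleanDCName.child1.forestrootdomain.com",
)
for line in plan:
    print(line)
```

Keeping the commands grouped per server mirrors the advice above to finish one domain controller completely before moving on to the next.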

In conclusion


After these steps are all completed, the lingering objects should be gone, you should have better protection in place against them (with the Strict Replication Consistency turned on) and you should no longer get Unknown User or Not Found NDRs on relatively new mailboxes.

I know this is a long article but this is a very complex issue and I just wanted to relay the full detail as accurately as I could while it was all still fresh in my mind. Global Catalog replication issues can manifest in many ways and there isn't much documentation out there on what to do when it occurs, so hopefully this will be helpful if you ever run into this particular issue.
