atyar

asked on

Active Directory 'KCC' errors related to configuring ip site links on Windows Server 2K3 and 2000

I have a simple Active Directory domain with 4 sites and a total of 5 domain controllers.  It works, but my 2003 Servers, of which there are 2 (the other 3 are 2000 Servers), are ridiculously finicky about having the site links set up just so in Active Directory Sites and Services.  If anything is out of whack (even having too many site links configured), the knowledge consistency checker gets all bent out of shape and spams my event viewer with error messages about not being able to form a complete spanning tree topology, and as a result, 1 of the other offices is unavailable, yada yada yada.

My problem is, I create the site links on each of the domain controllers to reflect the wan design as follows:
1)A to B, with cost 100 and freq 60.
2)A to C, with cost 100 and freq 60.
3)A to D, with cost 125 (D has a slower internet connection) and freq 60.
4)B to C, with cost 100 and freq 60.
5)B to D, with cost 125 and freq 60.
6)C to D, with cost 125 and freq 60.
(I tried naming the site links exactly the same on all domain controllers, too.)

At first this is ok.  Within a few minutes, however, the site links get replicated to the other domain controllers, who wind up with a total of like 10 or 12 different site links, the 'extra' ones that were replicated from the other d.c.'s having funky names like A-B(box)CNF:as234kdbf8k234kfb.....
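
Note: names of this form are what Active Directory produces when an object with the same name is created on more than one domain controller before the first copy has replicated. Site links live in the forest-wide configuration partition, so creating "A-B" on every d.c. yields one real object plus renamed conflict copies. A minimal Python sketch for spotting such duplicates in a list of site link names follows; the function names, the sample names, and the assumption that the box character is a non-printing separator such as a line feed are illustrative only.

```python
CONFLICT_MARKER = "CNF:"


def split_conflict_name(name):
    """Return (original_name, conflict_suffix); suffix is None if not mangled."""
    idx = name.find(CONFLICT_MARKER)
    if idx == -1:
        return name, None
    # Assumption: the character shown as a box before "CNF:" is whitespace
    # (for example a line feed), so stripping whitespace recovers the name.
    original = name[:idx].rstrip()
    return original, name[idx + len(CONFLICT_MARKER):]


def find_duplicates(site_link_names):
    """Group names by their original name and keep groups with extra copies."""
    groups = {}
    for name in site_link_names:
        original, _suffix = split_conflict_name(name)
        groups.setdefault(original, []).append(name)
    return {orig: names for orig, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical names; the suffix after "CNF:" is made up for illustration.
    sample = ["A-B", "A-B\nCNF:hypothetical-guid", "A-C"]
    print(find_duplicates(sample))
```

Each group it reports would be a candidate for cleanup, keeping the one copy without the conflict suffix.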

The 2000 servers don't seem to mind this too much, but the 2003 servers get spammed with kcc errors in event viewer and say they can't see the other site with a 2003 server, and vice versa.  Does anyone know how you're supposed to set up these site links so they don't replicate all over the domain and cause these kcc errors? I thought if I made them exactly the same on each domain controller, they wouldn't replicate and add to each other.

Come to think of it, the thought occurs to me - are you rather supposed to set up only the site links that apply to that d.c. on each domain controller?  Like, on A, I would configure site links for AB, AC, and AD, and then on B, I would configure BA, BC, and BD, and so on?  Still, I'd guess when they replicate that A would wind up with site links AB and BA, and then kcc would get all bent out of shape.....Just a thought...

Any tips would be greatly appreciated.  Just when I get the event viewer clean of active directory errors, I start to feel all warm and fuzzy.....then, the kcc errors come in force and I feel stupid again...*sigh*

Incidentally, all 4 sites are connected directly to each other via VPN tunnels.
atyar

ASKER

ok, maybe the kcc errors aren't a result of the site links, or even just on the 2K3 servers....now, I deleted the extraneous site links, and the kcc errors appear on 1 of the 2k3 servers, and 1 of the 2000 servers......

The kcc errors always involve the site with the slower connection (the 3 other sites are behind full T-1 connections, while the slower site has a cable modem and a 2K3 server and 2000 Server).  Maybe I should just figure the cable modem site is going to always give me a hassle with kcc? *sigh*
<Any tips would be greatly appreciated.>
*Check your site link design & setup against this.

Managing intersite replication and bandwidth is done with a ‘site link’.
http://www.windowsnetworking.com/articles_tutorials/Deploy-Windows-Server-2003-Planning-Network-Bandwidth.html

For Internet Protocol (IP) transport, a typical site link connects just two sites and corresponds to an actual wide area network (WAN) link. You also want each site to be associated with the IP subnets it actually contains. For example, if the hub site is on 10.1.1.0, then that is the subnet you want assigned to the hub site. The remote sites (the UK and Asia) are connected to it via site links and carry their respective subnets, 10.1.2.0 and 10.1.3.0.
One domain controller per site acts as the intersite topology generator (ISTG). This is nothing more than a role of the KCC running on that DC: it reviews the connections between sites, notices new or dead ones, and creates or removes the intersite connection objects accordingly. That gives you a way to manage the links between physical sites with logical ones that you build within the Active Directory Sites and Services MMC.

A way to find out if you are having issues is to search the Directory Service log in the Event Viewer for KCC errors, many of which point to replication and synchronization errors. The Knowledge Consistency Checker (KCC) process also updates the intersite replication topology according to its findings.

*Other benefits are the elimination of redundant replication paths between sites, as changes can occur rapidly in your environment;
Windows Server 2003 is working for you to get the best path.
<Does anyone know how you're supposed to set up these site links so they don't replicate all over the domain and cause these kcc errors?>
*The KCC runs by default every 15 minutes. The KCC logs an event that indicates that the global catalog has been removed from a domain controller.

What event IDs/info are noted in the KCC errors?
Have you run the following troubleshooting command?
C:\>repadmin /showreps

KCC errors - Troubleshooting Active Directory Replication Problems
http://www.microsoft.com/technet/prodtechnol/windows2000serv/technologies/activedirectory/maintain/opsguide/part1/adogd12.mspx

*Verify Successful Replication to a Domain Controller.
http://www.microsoft.com/technet/prodtechnol/windows2000serv/technologies/activedirectory/maintain/opsguide/part2/adogdapb.mspx#EJAA

Procedures Reference.
http://www.microsoft.com/technet/prodtechnol/windows2000serv/technologies/activedirectory/maintain/opsguide/part2/adogdapb.mspx#E6AA
<not being able to form a complete spanning tree topology and as a result,>
<1 of the other offices is unavailable, yada yada yada.>

*It is very important to have the complete error message info: Event ID, source, etc.
Also note anything involving bridgeheads, site link bridging, manual changes such as removing sites, and the like.

The intersite topology is a layering of spanning trees -
(one intersite connection between any two sites for each directory partition)
and generally does not contain redundant connections.
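
To make the spanning tree idea concrete, here is a minimal Python sketch that builds a least-cost spanning tree over the six site links and costs listed in the question. It uses plain Kruskal's algorithm as a stand-in; the real KCC also weighs directory partitions, bridgehead availability, and schedules, so treat this as an illustration under simplified assumptions rather than what the KCC actually runs. The function name is hypothetical.

```python
def least_cost_tree(links):
    """Kruskal's algorithm: pick the cheapest links that do not form a loop."""
    parent = {}

    def find(site):
        # Union-find lookup with path halving.
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    tree = []
    # Sort by cost, then by site names so the result is deterministic.
    for a, b, cost in sorted(links, key=lambda l: (l[2], l[0], l[1])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, cost))
    return tree


# Site links and costs as listed in the original question.
SITE_LINKS = [
    ("A", "B", 100), ("A", "C", 100), ("A", "D", 125),
    ("B", "C", 100), ("B", "D", 125), ("C", "D", 125),
]

if __name__ == "__main__":
    for a, b, cost in least_cost_tree(SITE_LINKS):
        print(f"{a} <-> {b} (cost {cost})")
```

With four sites, three connections are enough to reach everyone; the remaining three links are redundant paths available if one of the chosen ones fails.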



KCC and Topology Generation/How Active Directory Replication Topology Works.
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/TechRef/c238f32b-4400-4a0c-b4fb-7b0febecfc73.mspx
---------------------

Tools - procedures [If needed]

Install the support tools on a DC at each site, then run:
dcdiag /v > dcdiag.txt
netdiag /v > netdiag.txt

Clean up server metadata
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/012793ee-5e8c-4a5c-9f66-4a486a7114fd.mspx

Long list of Site Link operation procedures.
http://www.microsoft.com/technet/prodtechnol/windows2000serv/technologies/activedirectory/maintain/opsguide/part2/adogdapb.mspx
atyar

ASKER

source: NTDS KCC
event id:1865
"The knowledge consistency checker was unable to form a complete spanning tree network topology.  As a result, the following list of sites cannot be reached from the local site.
Sites: (the site behind the lower speed connection)"

also

source: NTDS KCC
event id:1311
"the knowledge consistency checker(kcc) has detected problems with the following directory partition:
directory partition:
CN=configuration,DC=umapinc,DC=com

there is insufficient site connectivity information in active directory sites and services for the KCC to create a spanning tree replication topology. Or, one or more domain controllers with this directory partition are unable to replicate the directory partition information.  This is probably due to inaccessible domain controllers."

Sorry I didn't reply before now, but for some reason, I wasn't getting the usual email notifications of your posts....
The Knowledge Consistency Checker (KCC) could not find domain controllers in any other site.
This event is logged after an NTDS KCC 1311 event and should be used to help troubleshoot that event.
   
Use the Active Directory Sites and Services snap–in to resolve this problem.
Try one or all of the following:

/ Verify that the sites are connected by site links.
/ Verify that each site within the domain has a path through the site links to other sites within the domain.
/ Ensure that at least one bridgehead server for the domain is reachable and replicating in that site.
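
The first two checks above amount to a reachability test over the site links. A minimal Python sketch, assuming site links are transitive (the default "bridge all site links" behaviour) and using hypothetical function and variable names, could look like this:

```python
from collections import deque


def unreachable_sites(all_sites, links, local_site):
    """Return the sites with no path of site links back to local_site."""
    neighbours = {site: set() for site in all_sites}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search outward from the local site.
    seen = {local_site}
    queue = deque([local_site])
    while queue:
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(set(all_sites) - seen)


if __name__ == "__main__":
    sites = ["A", "B", "C", "D"]
    # Hypothetical broken state: every link touching D is missing.
    print(unreachable_sites(sites, [("A", "B"), ("A", "C"), ("B", "C")], "A"))
```

A non-empty result corresponds to the list of sites that "cannot be reached from the local site". If the list is empty and the errors persist, the link definitions are probably fine and the problem is more likely an unreachable domain controller.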

In our case, this error came up after we deleted a server from Active Directory. When you open up Active Directory Sites and Services, look for the server that may have been deleted. If it is still in the site, and you are SURE it was taken out of AD via DCPROMO, go ahead and delete it. The errors will clear up shortly thereafter.

/ The Knowledge Consistency Checker (KCC) detected problems with the specified directory partition. There is insufficient site connectivity information in Active Directory Sites and Services for the KCC to create a spanning tree replication topology, or one or more domain controllers with this directory partition are unable to replicate the directory partition information. The latter situation is usually caused by inaccessible domain controllers.

/ Use Active Directory Sites and Services to perform one of the following actions:
Publish sufficient site connectivity information so that the KCC can determine a route by which this directory partition can reach this site. This is the recommended action.
Add a Connection object to a domain controller that contains the directory partition in this site from a domain controller that contains the same directory partition in another site.
If neither of the Active Directory Sites and Services tasks corrects this condition, review previous events logged by the KCC that identify the inaccessible domain controllers.

http://www.microsoft.com/technet/support/ee/result.aspx?EvtSrc=Active+Directory&EvtID=1311&ProdName=Windows+Operating+System&LCID=1033&ProdVer=5.0

Event ID 1311: Replication configuration does not reflect the physical network.
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/062e8eaa-27e0-4c5e-bc2b-2913ecce24b8.mspx
--------------------------------------------

Kendra (Last update 11/25/2003):
I have also found that if the time on the servers has become out of sync (by 5 minutes either way) this error will appear. I had this issue and found that my domain controllers were out of sync. Changed the times and the errors went away.
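
The 5 minute figure matches the default Kerberos clock skew tolerance. A minimal Python sketch for flagging domain controllers whose clocks drift past that limit follows; how the clock readings are collected is left out, and the server names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Default tolerance mentioned above; adjust if your policy differs.
MAX_SKEW = timedelta(minutes=5)


def skewed_servers(reference_time, server_times, max_skew=MAX_SKEW):
    """Return {server: offset} for servers more than max_skew off the reference."""
    return {
        name: reading - reference_time
        for name, reading in server_times.items()
        if abs(reading - reference_time) > max_skew
    }


if __name__ == "__main__":
    reference = datetime(2005, 1, 1, 12, 0, 0)
    readings = {
        "dc1": datetime(2005, 1, 1, 12, 2, 0),   # 2 minutes fast: within tolerance
        "dc2": datetime(2005, 1, 1, 11, 53, 0),  # 7 minutes slow: flagged
    }
    for name, offset in skewed_servers(reference, readings).items():
        print(name, "is off by", offset.total_seconds() / 60, "minutes")
```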

Imran Hashim (Last update 5/9/2005):
This event sometimes occurs in an environment with a large number of sites and domain controllers when connectivity to one or more sites is lost. The ISTG tries to reach that site through the alternate routes available and creates new connections for this purpose. By design in Windows 2003, these connections should be deleted automatically when the original connectivity is restored, but in Windows 2000 these links are not deleted. You have to go to the NTDS Settings of all the servers in the affected site, delete all connections, and initiate "Check Replication Topology". The ISTG will create all the links from scratch for all of these servers and the problem will disappear.

Ionut Marin (Last update 7/22/2004):
From a newsgroup post: "In certain rare conditions, the error will appear erroneously. This is more typical in environments with large numbers of sites, domain controllers, and domains. The steps from M214745 will very likely resolve the issue. If all steps from the article have been exhausted but the error still appears, you can open a free MS support case to obtain the fix referenced in M819249".

Adrian Grigorof
This behavior can occur if the Knowledge Consistency Checker (KCC) has determined that a site has been orphaned from the replication topology. See M214745, M244368 and M271997 for troubleshooting.  
http://www.eventid.net/events.asp
atyar

ASKER

I have a feeling this is some communication problem between this site and the other sites.  I'm looking at the Cisco IOS release that is running on their router, and I'm not absolutely certain it's the best match, so I'm going to pursue the proper newest IOS and see if that makes a difference.  I'll post the results on that...
atyar

ASKER

Well, Cisco says we're running the most current IOS version available and it is correct for our needs, so that doesn't give me anything to go on on that front.  I really think there is some connectivity problem between the affected site and all other sites, given that this site is behind a relatively unpredictable connection (cable modem).  I wish I had some way to verify this, however, rather than the generic and nagging kcc errors mentioned above.  I believe A.D. replication and kcc use rpc traffic between the servers to accomplish their tasks - I have seen something called 'rpcping' which I gather is supposed to help diagnose rpc traffic problems, but when I downloaded it, I wasn't able to do much of anything with it.  The other idea I had was to try to disable kcc from operating, and just rely on manually creating the replication links.  In some ways, I'd prefer this anyway, to have more control on how much traffic is going into and out of each office, since each office really is unique in its internet usage.

Any ideas on that front?
<Any ideas on that front?>
I will think this through from that viewpoint and get back to you.

Did you find nothing helpful in the comments above?
atyar

ASKER

Well, the point about the time differences is a good point to keep in mind.  I have seen some additional errors recently with regard to time differences between d.c.'s, as I've been implementing kerberos authentication and had to adjust daylight savings time settings.  On a separate e.e. post on the issue, a responder linked to a Microsoft Site with a VB script that disables and enables KCC on demand.  I ran that to disable KCC and am waiting for the dust to settle, as it has to replicate that change to the other d.c.'s.  When I can verify that KCC is indeed disabled domain-wide, I'll go in and manually configure my NTDS replication links and see how it goes.  Sometimes, I get the feeling Microsoft thinks the computers are smarter than we are, and does away with the 'human element' too easily.  Here, I think it's creating unnecessary problems....

atyar

ASKER

Well, disabling the bumbling kcc with that vb script seems to have at least cleared up my kcc event log spam.  I'll do a test, now, to ensure that replication makes its way around the domain.  I manually created replication connections to suit our needs, so it should hopefully work.

I'm not sure what to do with the points on this question, though - you should get some for effort, but your suggestions didn't really solve my problem.  Any ideas how to handle that?
atyar

ASKER

I solved this question 'myself' by disabling kcc.  I used http://support.microsoft.com/kb/245610/ to disable kcc, per a suggestion from an E.E. user on another topic.
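
For anyone curious what such a script changes: as far as I understand it, automatic topology generation is controlled by bit flags in the `options` attribute of the site's NTDS Site Settings object. The Python sketch below only illustrates the bit arithmetic; it does not touch Active Directory, the flag values (0x1 for intrasite, 0x10 for intersite) are my assumption of the documented ones, and the function names are hypothetical.

```python
# Assumed flag values for the NTDS Site Settings "options" attribute.
INTRASITE_AUTO_TOPOLOGY_DISABLED = 0x01
INTERSITE_AUTO_TOPOLOGY_DISABLED = 0x10


def disable_kcc(options, intrasite=True, intersite=True):
    """Return the options value with the requested 'disabled' bits set."""
    if intrasite:
        options |= INTRASITE_AUTO_TOPOLOGY_DISABLED
    if intersite:
        options |= INTERSITE_AUTO_TOPOLOGY_DISABLED
    return options


def enable_kcc(options):
    """Return the options value with both 'disabled' bits cleared."""
    mask = INTRASITE_AUTO_TOPOLOGY_DISABLED | INTERSITE_AUTO_TOPOLOGY_DISABLED
    return options & ~mask


def describe(options):
    """Report which kinds of automatic topology generation are switched off."""
    return {
        "intrasite_disabled": bool(options & INTRASITE_AUTO_TOPOLOGY_DISABLED),
        "intersite_disabled": bool(options & INTERSITE_AUTO_TOPOLOGY_DISABLED),
    }


if __name__ == "__main__":
    value = disable_kcc(0, intrasite=False)  # intersite only
    print(value, describe(value))
```

Setting only the intersite bit leaves the KCC building connections inside each site while the manually created connections carry traffic between sites. Other bits in the value are left untouched.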
ASKER CERTIFIED SOLUTION
modulo
