I have a Windows 2003 domain with two sites and two domain controllers. Recently, at one of the sites, I started getting the following error when trying to add a computer to the domain: "Logon Failure: The target account name is incorrect." I've also noticed that from a number of computers in the same site (but not all of them) I cannot browse file shares; I get the same error message.
The event logs on the local DC (at the site with the problems) show a number of error events.
In the application log I have a high occurrence of Event 1053:
Windows cannot determine the user or computer name. (The target principal name is incorrect. ). Group Policy processing aborted.
In the system log I have a high occurrence of Event 4:
The kerberos client received a KRB_AP_ERR_MODIFIED error from the server host/eagle.detect.local. The target name used was cifs/eagle.detect.local. This indicates that the password used to encrypt the kerberos service ticket is different than that on the target server. Commonly, this is due to identically named machine accounts in the target realm (DETECT.LOCAL), and the client realm. Please contact your system administrator.
(The server and target names change often: host/, cifs/, dns/, and ldap/ all appear, and I get the error for both of my servers, falcon and eagle, both with and without the FQDN.)
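To see which server and target name combinations fail most often, the Event 4 messages could be tallied with a short script. This is only a rough sketch: it assumes the messages have already been exported from the system log as plain strings, and the function name tally_targets and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches the two names in the Event 4 text quoted above.
EVENT_RE = re.compile(
    r"from the server (\S+?)\. The target name used was (\S+?)\. This indicates"
)

def tally_targets(messages):
    """Count (server, target name) pairs across Event 4 message strings."""
    counts = Counter()
    for msg in messages:
        match = EVENT_RE.search(msg)
        if match:
            # Lower-case both names so HOST/ and host/ are counted together.
            counts[(match.group(1).lower(), match.group(2).lower())] += 1
    return counts

# Hypothetical input: one message copied from the event text above, shortened.
sample = [
    "The kerberos client received a KRB_AP_ERR_MODIFIED error from the server "
    "host/eagle.detect.local. The target name used was cifs/eagle.detect.local. "
    "This indicates that the password used to encrypt the kerberos service "
    "ticket is different than that on the target server.",
]

for (server, target), n in tally_targets(sample).most_common():
    print(n, server, target)
```

A tally like this only summarizes what the log already says; it does not by itself show why the tickets fail to decrypt.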
In the Directory Service log I have a high occurrence of Events 1865, 1311, and 1566.
Event 1865:
The Knowledge Consistency Checker (KCC) was unable to form a complete spanning tree network topology. As a result, the following list of sites cannot be reached from the local site.
Event 1311:
The Knowledge Consistency Checker (KCC) has detected problems with the following directory partition.
There is insufficient site connectivity information in Active Directory Sites and Services for the KCC to create a spanning tree replication topology. Or, one or more domain controllers with this directory partition are unable to replicate the directory partition information. This is probably due to inaccessible domain controllers.
Event 1566:
All domain controllers in the following site that can replicate the directory partition over this transport are currently unavailable.
I have been experiencing some WAN slowness over the last couple of days, but I'm not sure whether this is causing part of the problem. I also recently had a power outage at the failing site. I have been through failures like these in the past without seeing these problems.
I can ping all of the involved machines to and from each other. I can browse files by IP but not by name or FQDN. I can also RDP to and from these servers (by name) with no problems.
After some searching here on EE and also on Google, Microsoft, etc., it sounded like I might have multiple SPN entries for the server(s). I ran setspn -L on both DCs, but I'm not sure what to look for in the output to tell whether there is a problem; the results look "normal" to me. (I can post the results on request.)
One suggestion was to disjoin the failing DC from the domain and rejoin it, but I don't really want to do that if I can avoid it.
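As a rough sketch of the duplicate check, the setspn -L output for each account could be saved as text and compared for any SPN that appears under more than one account. This assumes the usual listing layout (a "Registered ServicePrincipalNames for ...:" header followed by one SPN per line). The function names and both sample listings are hypothetical, with one duplicate planted on purpose; they are not my actual results.

```python
from collections import defaultdict

HEADER_PREFIX = "Registered ServicePrincipalNames for "

def parse_setspn_listing(text):
    """Return (account, [spns]) from one saved 'setspn -L' listing."""
    account = None
    spns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            account = line[len(HEADER_PREFIX):].rstrip(":")
        elif account is not None:
            spns.append(line)
    return account, spns

def find_shared_spns(listings):
    """Map each SPN (lower-cased) to its accounts when more than one holds it."""
    owners = defaultdict(set)
    for text in listings:
        account, spns = parse_setspn_listing(text)
        for spn in spns:
            owners[spn.lower()].add(account)
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}

# Hypothetical listings; HOST/eagle.detect.local is duplicated deliberately.
eagle = """Registered ServicePrincipalNames for CN=EAGLE,OU=Domain Controllers,DC=detect,DC=local:
    ldap/eagle.detect.local
    HOST/eagle.detect.local
    HOST/eagle
"""
falcon = """Registered ServicePrincipalNames for CN=FALCON,OU=Domain Controllers,DC=detect,DC=local:
    ldap/falcon.detect.local
    HOST/falcon
    host/eagle.detect.local
"""

for spn, accounts in find_shared_spns([eagle, falcon]).items():
    print(spn, "is registered on:", "; ".join(accounts))
```

The comparison is case-insensitive because HOST/ and host/ refer to the same service class. An empty result from a check like this would only rule out duplicates among the accounts that were listed, not across the whole domain.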
Does anyone out there have some suggestions or know where the root cause might be?
Many thanks in advance,