paddyboz

DNS issue - ping successful but replication failing

Hi Experts. I would be profoundly grateful for any help on this. This is both an Exchange and a Windows AD question, but mostly AD/DNS, I suspect.
Our company has four sites: the head office, where all applications of any importance are, and three others connected via VPN.
We have a single Windows domain covering all four sites. Each of the three small satellite sites has a single DC, which is also a GC, DNS and DHCP server.
There were three DCs at the main site.
The three DCs at the main site were/are a SQL server, an Exchange Server and a file and print server.
All has been working very well (particularly Exchange 2003) until.....

We had problems with our Exchange Server: the System Attendant would not start, and as the problem seemed to be related in some way to communication between AD and Exchange, we rashly took the decision to demote it from a DC to a member server. I know (now!) that MS does not support this; however, at the time it worked and the services all started OK.

Not long afterwards, however, things started to behave strangely. Outlook 2000 clients at remote sites hang when connecting to Exchange. Some Outlook 2003 clients do too, but not all, it seems. Removing the Outlook profile completely and recreating it seems to address the problem temporarily, but it recurs. Outlook users at the head office have no issues connecting to the Exchange Server.

We are having issues with replication of DCs between the satellite sites and the main site - KCC errors 1311 and 1865, and NTDS replication errors 1232 and 1188. These would suggest that names cannot be resolved or that there is no IP connectivity.

Internal DNS name resolution is not working for clients at satellites trying to connect to intranet sites at the head office. When pinging by the same host name it works fine.
I have changed the DNS client of my DCs at the remote sites to point to a DNS server at the head office but this has not had any effect.
I have made any number of changes to sites and services to try and persuade the DCs to see each other. My understanding is that most of this is done automatically but I have nonetheless manually set bridgehead servers between the spokes and the hub.

DNS would seem to be the culprit but there is something darned strange going on that is bewildering me. DNS appears to be working fine! Everything resolves OK when pinging and all the relevant records seem to be present in the DNS servers at each site.

This is not an IP issue as far as I can see - we have never had issues with our IP connectivity and at the IP level everything seems to work as before. I have checked with portqry to see if the ports are all available and they certainly seem to be. It just seems to be an issue of name resolution that is not resolving (except when you ping!).

Any help much appreciated.
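
The portqry check described above can be approximated with a short script. This is only a minimal Python sketch: the port list is an assumption (ports commonly needed between domain controllers, not taken from this thread), and the names `DC_PORTS`, `tcp_port_open` and `check_ports` are illustrative.

```python
import socket

# Assumed list of TCP ports commonly used between domain controllers.
DC_PORTS = {53: "DNS", 88: "Kerberos", 135: "RPC endpoint mapper",
            389: "LDAP", 445: "SMB", 3268: "Global Catalog"}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def check_ports(host, ports=DC_PORTS):
    """Return {port: True/False} for each port in `ports`."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling `check_ports("central-tech01.rmicentral.com")` from a remote DC would give a quick reachability map. A successful connect only shows that the port answers; it does not prove that a full RPC exchange (which also uses dynamically assigned ports) gets through the VPN.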
joedoe58

Did you try running netdiag and dcdiag on all DCs?
So nslookups do not work correctly, but pings do? For example:

nslookup dc1.domain.com  fails
ping dc1.domain.com      succeeds
Is that what you are experiencing? If so, the only thing I can think of is that you have a static entry for dc1.domain.com in the hosts file on that box: nslookup queries the DNS server directly and ignores the hosts file, whereas ping goes through the normal resolver, which reads it.
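
One way to test the hosts-file theory is to list what the hosts file actually maps. The following is a minimal Python sketch, assuming the usual hosts format (an IP address followed by one or more names, with `#` starting a comment); the function name `hosts_entries` and the sample addresses are illustrative only.

```python
def hosts_entries(text):
    """Map each lowercase host name in hosts-file text to its IP address."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address seen for a given name.
            entries.setdefault(name.lower(), ip)
    return entries


sample = """
# sample hosts file (illustrative addresses)
127.0.0.1      localhost
192.168.0.26   dc1.domain.com dc1   # static entry that would mask DNS
"""

entries = hosts_entries(sample)
print(entries.get("dc1.domain.com"))  # 192.168.0.26
```

On a Windows server the file normally lives at %SystemRoot%\System32\drivers\etc\hosts. If a name that fails in nslookup shows up there, ping is answering from the file rather than from DNS.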
paddyboz

ASKER

Hi to both. DCDIAG on the bridgehead server at the main site produces:

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests
   
   Testing server: Marbella\CENTRAL-TECH01
      Starting test: Connectivity
         ......................... CENTRAL-TECH01 passed test Connectivity

Doing primary tests
   
   Testing server: Marbella\CENTRAL-TECH01
      Starting test: Replications
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From OAS01 to CENTRAL-TECH01
            Naming Context: CN=Schema,CN=Configuration,DC=rmicentral,DC=com
            The replication generated an error (1727):
            Win32 Error 1727
            The failure occurred at 2005-06-07 17:06:11.
            The last success occurred at 2005-06-03 20:38:26.
            129 failures have occurred since the last success.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From VERA01 to CENTRAL-TECH01
            Naming Context: CN=Schema,CN=Configuration,DC=rmicentral,DC=com
            The replication generated an error (1727):
            Win32 Error 1727
            The failure occurred at 2005-06-07 17:18:49.
            The last success occurred at 2005-05-21 10:38:15.
            663 failures have occurred since the last success.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From HEVER-MAIN to CENTRAL-TECH01
            Naming Context: CN=Schema,CN=Configuration,DC=rmicentral,DC=com
            The replication generated an error (1727):
            Win32 Error 1727
            The failure occurred at 2005-06-07 17:25:08.
            The last success occurred at 2005-05-21 10:38:15.
            672 failures have occurred since the last success.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From OAS01 to CENTRAL-TECH01
            Naming Context: CN=Configuration,DC=rmicentral,DC=com
            The replication generated an error (1727):
            Win32 Error 1727
            The failure occurred at 2005-06-07 16:53:30.
            The last success occurred at 2005-06-03 20:41:26.
            163 failures have occurred since the last success.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From VERA01 to CENTRAL-TECH01
            Naming Context: CN=Configuration,DC=rmicentral,DC=com
            The replication generated an error (1727):
            Win32 Error 1727
            The failure occurred at 2005-06-07 16:59:49.
            The last success occurred at 2005-05-21 10:38:13.
            880 failures have occurred since the last success.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From HEVER-MAIN to CENTRAL-TECH01
            Naming Context: CN=Configuration,DC=rmicentral,DC=com
            The replication generated an error (1727):
            Win32 Error 1727
            The failure occurred at 2005-06-07 17:12:29.
            The last success occurred at 2005-05-21 10:38:15.
            878 failures have occurred since the last success.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From OAS01 to CENTRAL-TECH01
            Naming Context: DC=rmicentral,DC=com
            The replication generated an error (1722):
            Win32 Error 1722
            The failure occurred at 2005-06-03 18:35:25.
            The last success occurred at 2005-06-03 10:10:51.
            1 failures have occurred since the last success.
            [OAS01] DsBindWithSpnEx() failed with error 1727,
            Win32 Error 1727.
            The source remains down. Please check the machine.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From VERA01 to CENTRAL-TECH01
            Naming Context: DC=rmicentral,DC=com
            The replication generated an error (1818):
            Win32 Error 1818
            The failure occurred at 2005-06-03 20:18:05.
            The last success occurred at 2005-05-21 10:38:16.
            25 failures have occurred since the last success.
         [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
            From HEVER-MAIN to CENTRAL-TECH01
            Naming Context: DC=rmicentral,DC=com
            The replication generated an error (1727):
            Win32 Error 1727
            The failure occurred at 2005-06-03 20:55:58.
            The last success occurred at 2005-05-21 10:38:16.
            25 failures have occurred since the last success.
         REPLICATION LATENCY WARNING
         CENTRAL-TECH01: A long-running replication operation is in progress
            The job has been executing for 3 minutes and 0 seconds.
            Replication of new changes along this path will be delayed.
            Error: Higher priority replications are being blocked
            Enqueued 2005-06-07 16:58:12 at priority 170
            Op: SYNC FROM SOURCE
            NC CN=Configuration,DC=rmicentral,DC=com
            DSADN CN=NTDS Settings,CN=OAS01,CN=Servers,CN=oasis,CN=Sites,CN=Configuration,DC=rmicentral,DC=com
            DSA transport addr ada8d20b-2fe0-487f-a4b0-959e272aa43a._msdcs.rmicentral.com
         REPLICATION-RECEIVED LATENCY WARNING
         CENTRAL-TECH01:  Current time is 2005-06-07 17:28:08.
            CN=Schema,CN=Configuration,DC=rmicentral,DC=com
               Last replication recieved from OAS01 at 2005-06-03 20:38:23.
               Last replication recieved from HEVER-MAIN at 2005-05-21 10:38:15.
            CN=Configuration,DC=rmicentral,DC=com
               Last replication recieved from OAS01 at 2005-06-03 20:41:26.
               Last replication recieved from HEVER-MAIN at 2005-05-21 10:38:14.
            DC=rmicentral,DC=com
               Last replication recieved from OAS01 at 2005-06-03 10:10:51.
               Last replication recieved from HEVER-MAIN at 2005-05-21 10:38:16.
         REPLICATION-RECEIVED LATENCY WARNING

          Source site:

         CN=NTDS Site Settings,CN=oasis,CN=Sites,CN=Configuration,DC=rmicentral,DC=com

          Current time: 2005-06-07 17:34:27

          Last update time: 2005-06-03 20:36:19

          Check if source site has an elected ISTG running.

          Check replication from source site to this server.
         REPLICATION-RECEIVED LATENCY WARNING

          Source site:

         CN=NTDS Site Settings,CN=Vera,CN=Sites,CN=Configuration,DC=rmicentral,DC=com

          Current time: 2005-06-07 17:34:27

          Last update time: 2005-05-30 16:26:25

          Check if source site has an elected ISTG running.

          Check replication from source site to this server.
         REPLICATION-RECEIVED LATENCY WARNING

          Source site:

         CN=NTDS Site Settings,CN=Hever,CN=Sites,CN=Configuration,DC=rmicentral,DC=com

          Current time: 2005-06-07 17:34:27

          Last update time: 2005-05-21 10:27:49

          Check if source site has an elected ISTG running.

          Check replication from source site to this server.
         ......................... CENTRAL-TECH01 passed test Replications
      Starting test: NCSecDesc
         ......................... CENTRAL-TECH01 passed test NCSecDesc
      Starting test: NetLogons
         ......................... CENTRAL-TECH01 passed test NetLogons
      Starting test: Advertising
         ......................... CENTRAL-TECH01 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... CENTRAL-TECH01 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... CENTRAL-TECH01 passed test RidManager
      Starting test: MachineAccount
         ......................... CENTRAL-TECH01 passed test MachineAccount
      Starting test: Services
         ......................... CENTRAL-TECH01 passed test Services
      Starting test: ObjectsReplicated
         ......................... CENTRAL-TECH01 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... CENTRAL-TECH01 passed test frssysvol
      Starting test: frsevent
         ......................... CENTRAL-TECH01 passed test frsevent
      Starting test: kccevent
         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 06/07/2005   17:22:25
            Event String: All domain controllers in the following site that

         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 06/07/2005   17:22:25
            Event String: All domain controllers in the following site that

         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 06/07/2005   17:22:25
            Event String: All domain controllers in the following site that

         An Error Event occured.  EventID: 0xC000051F
            Time Generated: 06/07/2005   17:22:25
            Event String: The Knowledge Consistency Checker (KCC) has

         An Warning Event occured.  EventID: 0x80000749
            Time Generated: 06/07/2005   17:22:25
            Event String: The Knowledge Consistency Checker (KCC) was

         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 06/07/2005   17:22:25
            Event String: All domain controllers in the following site that

         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 06/07/2005   17:22:25
            Event String: All domain controllers in the following site that

         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 06/07/2005   17:22:25
            Event String: All domain controllers in the following site that

         An Error Event occured.  EventID: 0xC000051F
            Time Generated: 06/07/2005   17:22:25
            Event String: The Knowledge Consistency Checker (KCC) has

         An Warning Event occured.  EventID: 0x80000749
            Time Generated: 06/07/2005   17:22:25
            Event String: The Knowledge Consistency Checker (KCC) was

         ......................... CENTRAL-TECH01 failed test kccevent
      Starting test: systemlog
         ......................... CENTRAL-TECH01 passed test systemlog
      Starting test: VerifyReferences
         ......................... CENTRAL-TECH01 passed test VerifyReferences
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : rmicentral
      Starting test: CrossRefValidation
         ......................... rmicentral passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... rmicentral passed test CheckSDRefDom
   
   Running enterprise tests on : rmicentral.com
      Starting test: Intersite
         ......................... rmicentral.com passed test Intersite
      Starting test: FsmoCheck
         ......................... rmicentral.com passed test FsmoCheck
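
The replication section of the output above is long, so a small script can help condense it. This is a rough Python sketch, not a supported tool: it assumes the dcdiag text layout shown above, the names `FAILURE`, `WIN32_ERRORS` and `summarize` are made up for illustration, and the error texts are the standard Win32 meanings of the RPC codes that appear in the output.

```python
import re

# Standard Win32 meanings of the RPC status codes seen in the output above.
WIN32_ERRORS = {
    1722: "The RPC server is unavailable.",
    1726: "The remote procedure call failed.",
    1727: "The remote procedure call failed and did not execute.",
    1818: "The remote procedure call was cancelled.",
}

# One match per "A recent replication attempt failed" entry.
FAILURE = re.compile(
    r"From (?P<src>\S+) to (?P<dst>\S+)\s+"
    r"Naming Context: (?P<nc>\S+)\s+"
    r"The replication generated an error \((?P<code>\d+)\):"
    r".*?(?P<count>\d+) failures have occurred",
    re.DOTALL,
)


def summarize(text):
    """Return (source, naming context, code, meaning, failure count) tuples."""
    rows = []
    for m in FAILURE.finditer(text):
        code = int(m["code"])
        rows.append((m["src"], m["nc"], code,
                     WIN32_ERRORS.get(code, "unknown"), int(m["count"])))
    return rows


sample = """\
   [Replications Check,CENTRAL-TECH01] A recent replication attempt failed:
      From OAS01 to CENTRAL-TECH01
      Naming Context: CN=Schema,CN=Configuration,DC=rmicentral,DC=com
      The replication generated an error (1727):
      Win32 Error 1727
      The failure occurred at 2005-06-07 17:06:11.
      The last success occurred at 2005-06-03 20:38:26.
      129 failures have occurred since the last success.
"""

for row in summarize(sample):
    print(row)
```

Read this way, almost every failure above is an RPC-level error (1722, 1727, 1818) rather than a name-resolution error.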

This is the dcdiag output from one of the remote sites:


Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: oasis\OAS01
      Starting test: Connectivity
         ......................... OAS01 passed test Connectivity

Doing primary tests

   Testing server: oasis\OAS01
      Starting test: Replications
         [Replications Check,OAS01] A recent replication attempt failed:
            From CENTRAL-TECH01 to OAS01
            Naming Context: DC=rmicentral,DC=com
            The replication generated an error (1726):
            The remote procedure call failed.
            The failure occurred at 2005-06-06 15:12.31.
            The last success occurred at 2005-06-03 20:35.50.
            4 failures have occurred since the last success.
            The replication RPC call executed for too long at the server and
            was cancelled.
            Check load and resouce usage on CENTRAL-TECH01.
         [Replications Check,OAS01] A recent replication attempt failed:
            From CENTRAL-MAIN to OAS01
            Naming Context: DC=rmicentral,DC=com
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2005-06-07 13:09.58.
            The last success occurred at 2005-06-01 09:09.54.
            234 failures have occurred since the last success.

netdiag to follow:


Chris Dent
DCDiag is a good idea as Joe suggests. Rather than running it on all DCs individually you might want to try:

dcdiag /e /c /v /f:output.txt

Which runs comprehensive tests against all DCs in the enterprise and outputs everything to the file output.txt (there will be a pretty huge amount of output). That along with NetDiag (again as Joe suggests) should be able to point you towards some kind of failure.
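
Because that output file gets large, it can help to pull out only the failed tests and the event IDs. Below is a minimal Python sketch under the assumption that the file uses the same text layout as the dcdiag output posted earlier in this thread; the helper names `failed_tests` and `event_numbers` are illustrative. The decimal event number is taken as the low 16 bits of the hex EventID.

```python
import re

FAILED_TEST = re.compile(r"\.+ (\S+) failed test (\S+)")
EVENT_ID = re.compile(r"EventID: (0x[0-9A-Fa-f]{8})")


def failed_tests(text):
    """Return (server, test name) pairs for every 'failed test' line."""
    return FAILED_TEST.findall(text)


def event_numbers(text):
    """Return the distinct decimal event numbers (low 16 bits of each EventID)."""
    return sorted({int(h, 16) & 0xFFFF for h in EVENT_ID.findall(text)})


sample = """\
         An Warning Event occured.  EventID: 0x8000061E
         An Error Event occured.  EventID: 0xC000051F
         An Warning Event occured.  EventID: 0x80000749
         ......................... CENTRAL-TECH01 failed test kccevent
         ......................... CENTRAL-TECH01 passed test systemlog
"""

print(failed_tests(sample))   # [('CENTRAL-TECH01', 'kccevent')]
print(event_numbers(sample))  # [1311, 1566, 1865]
```

To use it on the real file, read output.txt into a string first (the file encoding may need to be specified). Events 1311 and 1865 are the same KCC errors mentioned in the original question.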
nslookup works and resolves names fine, but when run from a command prompt on the server at the remote site it starts like this:

default server: unknown
address: 192.168.20.20

At the head office site it is correct. Not sure if this is relevant. It seems to suggest that the DNS server doesn't know its own name....??
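
For background on that banner: nslookup names its default server by doing a reverse (PTR) lookup on the server's IP address, so the name it shows depends on a reverse lookup zone rather than on the forward records. The Python sketch below shows which record and zone that lookup would use for the address above. It assumes a /24 network (the remote site's subnet mask is not given in this thread), and the helper names are illustrative.

```python
import ipaddress


def ptr_name(ip):
    """Return the PTR record name that a reverse lookup of `ip` queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_zone_24(ip):
    """Return the reverse zone name for the /24 network containing `ip` (assumed mask)."""
    octets = ip.split(".")
    return ".".join(reversed(octets[:3])) + ".in-addr.arpa"


print(ptr_name("192.168.20.20"))         # 20.20.168.192.in-addr.arpa
print(reverse_zone_24("192.168.20.20"))  # 20.168.192.in-addr.arpa
```

If that zone or the PTR record in it is missing on the DNS server being queried, nslookup cannot put a name to its default server, even though forward lookups work.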

"Default server: unknown" means that it's missing a Reverse Lookup zone for 192.168.20.x.
ASKER CERTIFIED SOLUTION
Debsyl99

This solution is only available to members.
The DCs are not on SP1.

Time sync is OK - both are near enough the right time (about 2 seconds difference).
The RPC MTU issue sounds like it could be the problem but, as I say, the two servers we are looking at here are not service packed.
They are both Windows 2003 Standard Edition.

Netdiag from our head office bridgehead server produces the following:


....................................

    Computer Name: CENTRAL-TECH01
    DNS Host Name: central-tech01.rmicentral.com
    System info : Windows 2000 Server (Build 3790)
    Processor : x86 Family 15 Model 2 Stepping 9, GenuineIntel
    List of installed hotfixes :
        KB282010
        KB817789
        KB819696
        KB823182
        KB823353
        KB823559
        KB824105
        KB824141
        KB824146
        KB824151
        KB825119
        KB828035
        KB828741
        KB828750
        KB832894
        KB833987
        KB834707
        KB835732
        KB837001
        KB837009
        KB839643
        KB839645
        KB840315
        KB840374
        KB840987
        KB841356
        KB841533
        KB842773
        KB867282
        KB867460
        KB871250
        KB873333
        KB873376
        KB885250
        KB885834
        KB885835
        KB885836
        KB886903
        KB888113
        KB890047
        KB890175
        KB890859
        KB890923
        KB891711
        KB891781
        KB893066
        KB893086
        KB893803
        KB893803v2
        Q147222
        Q828026


Netcard queries test . . . . . . . : Passed



Per interface results:

    Adapter : Lan 1

        Netcard queries test . . . : Passed

        Host Name. . . . . . . . . : central-tech01
        IP Address . . . . . . . . : 192.168.0.26
        Subnet Mask. . . . . . . . : 255.255.255.0
        Default Gateway. . . . . . : 192.168.0.250
        Dns Servers. . . . . . . . : 192.168.0.26


        AutoConfiguration results. . . . . . : Passed

        Default gateway test . . . : Passed

        NetBT name test. . . . . . : Passed
        [WARNING] At least one of the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names is missing.

        WINS service test. . . . . : Skipped
            There are no WINS servers configured for this interface.


Global results:


Domain membership test . . . . . . : Passed


NetBT transports test. . . . . . . : Passed
    List of NetBt transports currently configured:
        NetBT_Tcpip_{CF20C948-BEA2-4F8C-A790-BE6D02FDC2F3}
    1 NetBt transport currently configured.


Autonet address test . . . . . . . : Passed


IP loopback ping test. . . . . . . : Passed


Default gateway test . . . . . . . : Passed


NetBT name test. . . . . . . . . . : Passed
    [WARNING] You don't have a single interface with the <00> 'WorkStation Service', <03> 'Messenger Service', <20> 'WINS' names defined.


Winsock test . . . . . . . . . . . : Passed


DNS test . . . . . . . . . . . . . : Passed
    PASS - All the DNS entries for DC are registered on DNS server '192.168.0.26' and other DCs also have some of the names registered.


Redir and Browser test . . . . . . : Passed
    List of NetBt transports currently bound to the Redir
        NetBT_Tcpip_{CF20C948-BEA2-4F8C-A790-BE6D02FDC2F3}
    The redir is bound to 1 NetBt transport.

    List of NetBt transports currently bound to the browser
        NetBT_Tcpip_{CF20C948-BEA2-4F8C-A790-BE6D02FDC2F3}
    The browser is bound to 1 NetBt transport.


DC discovery test. . . . . . . . . : Passed


DC list test . . . . . . . . . . . : Passed


Trust relationship test. . . . . . : Passed
    Secure channel for domain 'RMICENTRAL' is to '\\CENTRAL-MAIN.rmicentral.com'.


Kerberos test. . . . . . . . . . . : Passed


LDAP test. . . . . . . . . . . . . : Passed
    [WARNING] Failed to query SPN registration on DC 'OAS01.rmicentral.com'.
    [WARNING] Failed to query SPN registration on DC 'vera01.rmicentral.com'.
    [WARNING] Failed to query SPN registration on DC 'HEVER-MAIN.rmicentral.com'.


Bindings test. . . . . . . . . . . : Passed


WAN configuration test . . . . . . : Skipped
    No active remote access connections.


Modem diagnostics test . . . . . . : Passed

IP Security test . . . . . . . . . : Skipped

    Note: run "netsh ipsec dynamic show /?" for more detailed information


The command completed successfully
I had to deal with MS to fix this in the end. I was given MS05-019 to patch, along with hotfixes 899148 and 898060, and a registry fix as follows:

1. Click "Start", click "Run", type "regedit" (without the quotation marks), and then click "OK"
2. Locate and then click the following registry subkey:
   HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\Rpc
3. Click the "Edit" menu, point to "New", and then click "DWORD Value".
4. Type "Server2003NegotiateDisable" (without the quotation marks) as the name of the new DWORD Value.
5. Right-click "Server2003NegotiateDisable", and then click "Modify".
6. In the "Value Data" box, type "1", and then click "OK".
7. Quit Registry Editor. Restart the Windows Server 2003-based computer.
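
For more than one server, the same value could be applied by importing a .reg file instead of clicking through regedit. The Python sketch below only builds the file text for the key and value named in the steps above; the helper name `reg_file` is illustrative, and the result should be checked against the Microsoft article before it is imported anywhere.

```python
def reg_file(key, name, value):
    """Build .reg file text that sets one DWORD value under `key`."""
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{key}]\r\n"
        f'"{name}"=dword:{value:08x}\r\n'
    )


KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\Rpc"
text = reg_file(KEY, "Server2003NegotiateDisable", 1)
print(text)
```

Saved to a file with a .reg extension, the text can be imported with regedit on each Windows Server 2003 machine, followed by the restart from step 7.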


http://support.microsoft.com/?id=898060 should also give an idea of what happened.

I pointed out to MS that this was an issue caused by MS patches and suggested that support fees should be waived. They agreed, but I wonder whether, had I not said anything, they might just have taken my money anyway...