Solved

PDC is down

Posted on 2011-02-16
Last Modified: 2012-05-11
Hello,

We run an environment with about 200 users and one domain controller. We used to have a secondary controller, but it was removed some months ago. We started seeing problems about four days ago, coinciding with an Exchange database corruption event. First, Active Directory clients would fail to connect to the domain controller, and subsequently our external email, POP, and IMAP would go down. I saw SAM event IDs 16645 and 16651 just before this occurred in each instance.

We did some troubleshooting and found that the problem was triggered when we connected to the server via RDP. Avoiding RDP increased the time between occurrences, but the problem did not go away. Restarting the Netlogon service was also a quick fix, and we set up a batch file to do this every two hours. However, the problem still occurs about every eight hours, and we have to restart the server and then restart the mail services to get the system up and running again. Naturally, this causes some consternation, as we are unable to find the root cause.

When I look at the operations masters, the PDC tab gives me an error. Below is my latest dcdiag dump. Any assistance would be highly appreciated.


Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests
   
   Testing server: Default-First-Site-Name\DC
      Starting test: Connectivity
         ......................... DC passed test Connectivity

Doing primary tests
   
   Testing server: Default-First-Site-Name\DC
      Starting test: Replications
         [Replications Check,DC] A recent replication attempt failed:
            From BAANDEV to DC
            Naming Context: DC=ForestDnsZones,DC=etc,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2011-02-16 13:48:36.
            The last success occurred at 2009-11-07 13:55:52.
            11157 failures have occurred since the last success.
         [BAANDEV] DsBindWithSpnEx() failed with error 1753,
         There are no more endpoints available from the endpoint mapper..
         [Replications Check,DC] A recent replication attempt failed:
            From ADC to DC
            Naming Context: DC=ForestDnsZones,DC=etc,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2011-02-16 13:48:43.
            The last success occurred at 2008-10-22 22:53:27.
            20143 failures have occurred since the last success.
         [ADC] DsBindWithSpnEx() failed with error 1722,
         The RPC server is unavailable..
         [Replications Check,DC] A recent replication attempt failed:
            From BAANDEV to DC
            Naming Context: DC=DomainDnsZones,DC=etc,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2011-02-16 13:48:36.
            The last success occurred at 2009-11-07 13:55:52.
            11157 failures have occurred since the last success.
         [Replications Check,DC] A recent replication attempt failed:
            From ADC to DC
            Naming Context: DC=DomainDnsZones,DC=etc,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2011-02-16 13:48:43.
            The last success occurred at 2008-10-22 22:53:27.
            20143 failures have occurred since the last success.
         [Replications Check,DC] A recent replication attempt failed:
            From ADC to DC
            Naming Context: CN=Schema,CN=Configuration,DC=etc,DC=local
            The replication generated an error (8524):
            The DSA operation is unable to proceed because of a DNS lookup failure.
            The failure occurred at 2011-02-16 13:48:54.
            The last success occurred at 2008-10-22 22:53:27.
            20143 failures have occurred since the last success.
            The guid-based DNS name bd34eb2e-09b6-46fb-b64a-0aa9a56a6c48._msdcs.etc.local
            is not registered on one or more DNS servers.
         [Replications Check,DC] A recent replication attempt failed:
            From BAANDEV to DC
            Naming Context: CN=Schema,CN=Configuration,DC=etc,DC=local
            The replication generated an error (1753):
            There are no more endpoints available from the endpoint mapper.
            The failure occurred at 2011-02-16 13:48:54.
            The last success occurred at 2009-11-07 13:55:52.
            11157 failures have occurred since the last success.
            The directory on BAANDEV is in the process.
            of starting up or shutting down, and is not available.
            Verify machine is not hung during boot.
         [Replications Check,DC] A recent replication attempt failed:
            From ADC to DC
            Naming Context: CN=Configuration,DC=etc,DC=local
            The replication generated an error (8524):
            The DSA operation is unable to proceed because of a DNS lookup failure.
            The failure occurred at 2011-02-16 13:48:43.
            The last success occurred at 2008-10-22 23:43:54.
            20143 failures have occurred since the last success.
            The guid-based DNS name bd34eb2e-09b6-46fb-b64a-0aa9a56a6c48._msdcs.etc.local
            is not registered on one or more DNS servers.
         [Replications Check,DC] A recent replication attempt failed:
            From BAANDEV to DC
            Naming Context: CN=Configuration,DC=etc,DC=local
            The replication generated an error (1753):
            There are no more endpoints available from the endpoint mapper.
            The failure occurred at 2011-02-16 13:48:43.
            The last success occurred at 2009-11-07 13:55:41.
            11157 failures have occurred since the last success.
            The directory on BAANDEV is in the process.
            of starting up or shutting down, and is not available.
            Verify machine is not hung during boot.
         [Replications Check,DC] A recent replication attempt failed:
            From BAANDEV to DC
            Naming Context: DC=etc,DC=local
            The replication generated an error (1753):
            There are no more endpoints available from the endpoint mapper.
            The failure occurred at 2011-02-16 13:48:36.
            The last success occurred at 2009-11-07 13:55:52.
            11156 failures have occurred since the last success.
            The directory on BAANDEV is in the process.
            of starting up or shutting down, and is not available.
            Verify machine is not hung during boot.
         [Replications Check,DC] A recent replication attempt failed:
            From ADC to DC
            Naming Context: DC=etc,DC=local
            The replication generated an error (8524):
            The DSA operation is unable to proceed because of a DNS lookup failure.
            The failure occurred at 2011-02-16 13:49:01.
            The last success occurred at 2008-10-22 23:51:57.
            20143 failures have occurred since the last success.
            The guid-based DNS name bd34eb2e-09b6-46fb-b64a-0aa9a56a6c48._msdcs.etc.local
            is not registered on one or more DNS servers.
         REPLICATION-RECEIVED LATENCY WARNING
         DC:  Current time is 2011-02-16 14:22:40.
            DC=ForestDnsZones,DC=etc,DC=local
               Last replication recieved from BAANDEV at 2009-11-07 14:05:56.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
               Last replication recieved from ADC at 2008-10-22 22:53:27.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
            DC=DomainDnsZones,DC=etc,DC=local
               Last replication recieved from BAANDEV at 2009-11-07 14:05:56.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
               Last replication recieved from ADC at 2008-10-22 22:53:27.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
            CN=Schema,CN=Configuration,DC=etc,DC=local
               Last replication recieved from BAANDEV at 2009-11-07 14:05:56.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
               Last replication recieved from ADC at 2008-10-22 22:53:27.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
            CN=Configuration,DC=etc,DC=local
               Last replication recieved from BAANDEV at 2009-11-07 14:05:45.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
               Last replication recieved from ADC at 2008-10-22 23:43:54.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
            DC=etc,DC=local
               Last replication recieved from BAANDEV at 2009-11-07 14:05:56.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
               Last replication recieved from ADC at 2008-10-22 23:51:57.
               WARNING:  This latency is over the Tombstone Lifetime of 180 days!
         ......................... DC passed test Replications
      Starting test: NCSecDesc
         ......................... DC passed test NCSecDesc
      Starting test: NetLogons
         ......................... DC passed test NetLogons
      Starting test: Advertising
         Warning: DC is not advertising as a time server.
         ......................... DC failed test Advertising
      Starting test: KnowsOfRoleHolders
         Warning: ADC is the PDC Owner, but is not responding to DS RPC Bind.
         [ADC] LDAP search failed with error 58,
         The specified server cannot perform the requested operation..
         Warning: ADC is the PDC Owner, but is not responding to LDAP Bind.
         ......................... DC failed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... DC passed test RidManager
      Starting test: MachineAccount
         ......................... DC passed test MachineAccount
      Starting test: Services
         ......................... DC passed test Services
      Starting test: ObjectsReplicated
         ......................... DC passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... DC passed test frssysvol
      Starting test: frsevent
         ......................... DC passed test frsevent
      Starting test: kccevent
         ......................... DC passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x40000005
            Time Generated: 02/16/2011   13:29:57
            Event String: The kerberos client received a KRB_AP_ERR_TKT_NYV

         An Error Event occured.  EventID: 0x0000165B
            Time Generated: 02/16/2011   14:18:26
            Event String: The session setup from computer 'VERAMANI' failed

         An Error Event occured.  EventID: 0x000016AD
            Time Generated: 02/16/2011   14:20:27
            Event String: The session setup from the computer VERAMANI

         ......................... DC failed test systemlog
      Starting test: VerifyReferences
         ......................... DC passed test VerifyReferences
   
   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
   
   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : etc
      Starting test: CrossRefValidation
         ......................... etc passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... etc passed test CheckSDRefDom
   
   Running enterprise tests on : etc.local
      Starting test: Intersite
         ......................... etc.local passed test Intersite
      Starting test: FsmoCheck
         Warning: DcGetDcName(PDC_REQUIRED) call failed, error 1355
         A Primary Domain Controller could not be located.
         The server holding the PDC role is down.
         Warning: DcGetDcName(TIME_SERVER) call failed, error 1355
         A Time Server could not be located.
         The server holding the PDC role is down.
         Warning: DcGetDcName(GOOD_TIME_SERVER_PREFERRED) call failed, error 1355
         A Good Time Server could not be located.
         ......................... etc.local failed test FsmoCheck
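
For readers following the latency warnings in the dump above, here is a minimal sketch of the arithmetic behind them. It takes the "Current time" and "Last replication recieved" timestamps from the log and compares the gap with the 180-day tombstone lifetime that dcdiag reports. The function and variable names are illustrative only and are not part of dcdiag.

```python
from datetime import datetime

# Tombstone lifetime as reported in the dcdiag warning lines.
TOMBSTONE_LIFETIME_DAYS = 180


def days_since(last_success, now):
    """Whole days elapsed between two dcdiag-style timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (datetime.strptime(now, fmt) - datetime.strptime(last_success, fmt)).days


# Timestamps copied from the DC=etc,DC=local section of the dump.
now = "2011-02-16 14:22:40"
partners = {
    "BAANDEV": "2009-11-07 14:05:56",
    "ADC": "2008-10-22 23:51:57",
}

for name, last in partners.items():
    age = days_since(last, now)
    status = "over tombstone lifetime" if age > TOMBSTONE_LIFETIME_DAYS else "ok"
    print(f"{name}: {age} days since last replication ({status})")
```

Both partners come out far beyond 180 days, which is consistent with the warnings dcdiag prints for every naming context.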
Question by:amit_krishnan
 
LVL 2

Expert Comment

by:storkyIV
Hello,

The log shows that the DC is still trying to replicate. Since you have said you no longer have a secondary DC, I would turn this off for a start!
 
LVL 11

Expert Comment

by:Snibborg
When you shut down the other DC are you certain it was not the first DC in the domain?  Did you check where the FSMO roles were located to confirm this?

Snibborg
 
LVL 2

Expert Comment

by:danny1875
From what I can see, it seems that the secondary DC was holding the PDC Emulator FSMO role. If the secondary DC is down and will not be reintroduced into the organisation, you could seize that role on your current working DC. Before you do anything, you should check where the roles are currently held and take things from there.
 
LVL 11

Expert Comment

by:Snibborg
Here are some articles to assist you in this:

http://support.microsoft.com/kb/255690
http://support.microsoft.com/kb/324801

Snibborg
 

Author Comment

by:amit_krishnan
Thank you for your quick replies. The secondary domain was removed about 8 months ago, before I joined the organization, and I am unsure what protocol was followed when this change occurred. However, we have experienced no problems until the 9th of February, when all this started happening.

@snibborg, I appreciate the links. However, through the graphical interface everything checks out except the PDC emulator tab, which shows an error and will not allow me to change the master, as it says the current role holder is offline.

I think there are multiple issues in our setup; however, the current problems have occurred only recently, while the replication errors have been occurring for some time now. The main thing worrying me is this line from the final FSMO test in dcdiag:
      Starting test: FsmoCheck
         Warning: DcGetDcName(PDC_REQUIRED) call failed, error 1355
         A Primary Domain Controller could not be located.

I am pretty new at this so all help is highly appreciated. Thanks.
 
LVL 2

Assisted Solution

by:danny1875
danny1875 earned 250 total points
Amit,

Check to see where the DC thinks the PDC Emulator role is held. You can do this via ntdsutil. As I mentioned in my previous post, you may well need to seize that role on your working domain controller.
 
LVL 27

Accepted Solution

by:
KenMcF earned 250 total points
You are going to have to run a metadata cleanup of the old DC that was removed and seize the FSMO roles. It looks like the DC was not removed properly. Here are links to the steps to run both.


http://support.microsoft.com/kb/216498
http://support.microsoft.com/kb/216498
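
As a rough illustration of the role check and seizure discussed in this thread, the sketch below only builds and prints the command lines for review; it does not run anything unless `dry_run` is turned off. It assumes `netdom` (from the Windows Support Tools) is installed, that `ntdsutil` accepts its menu commands as command-line arguments, and that the surviving server is named `DC` as in the dcdiag output. The helper names are hypothetical. Metadata cleanup is menu-driven and depends on what `ntdsutil` lists in your directory, so follow the linked article for that part.

```python
import subprocess


def fsmo_query_command():
    """Command that lists the current holders of all five FSMO roles."""
    return ["netdom", "query", "fsmo"]


def seize_pdc_command(server):
    """ntdsutil invocation that seizes the PDC emulator role onto `server`.

    Each list item corresponds to one line typed at the ntdsutil prompt.
    """
    return [
        "ntdsutil",
        "roles",
        "connections",
        f"connect to server {server}",
        "quit",
        "seize pdc",
        "quit",
        "quit",
    ]


def run(command, dry_run=True):
    """Print the command line; execute it only when dry_run is False."""
    if dry_run:
        print(subprocess.list2cmdline(command))
        return None
    return subprocess.run(command, check=True)


run(fsmo_query_command())
run(seize_pdc_command("DC"))  # "DC" is the server name seen in the dcdiag dump
```

Seizing a role is meant for the case where the old role holder will never come back online, which matches the situation described here. Review the printed commands before running them on a production domain controller.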
 

Author Comment

by:amit_krishnan
Thank you for all the advice. Please find below the latest dcdiag dump. It looks a lot cleaner, but I am getting a time server error, which I am working to fix now. I guess it is a matter of waiting to see if the problem recurs.

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests
   
   Testing server: Default-First-Site-Name\DC
      Starting test: Connectivity
         ......................... DC passed test Connectivity

Doing primary tests
   
   Testing server: Default-First-Site-Name\DC
      Starting test: Replications
         ......................... DC passed test Replications
      Starting test: NCSecDesc
         ......................... DC passed test NCSecDesc
      Starting test: NetLogons
         ......................... DC passed test NetLogons
      Starting test: Advertising
         Warning: DC is not advertising as a time server.
         ......................... DC failed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... DC passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... DC passed test RidManager
      Starting test: MachineAccount
         ......................... DC passed test MachineAccount
      Starting test: Services
         ......................... DC passed test Services
      Starting test: ObjectsReplicated
         ......................... DC passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... DC passed test frssysvol
      Starting test: frsevent
         ......................... DC passed test frsevent
      Starting test: kccevent
         An Warning Event occured.  EventID: 0x8000072D
            Time Generated: 02/16/2011   15:51:45
            (Event String could not be retrieved)
         ......................... DC failed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0xC0001B77
            Time Generated: 02/16/2011   15:51:45
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00004105
            Time Generated: 02/16/2011   16:02:16
            Event String: The maximum account identifier allocated to this

         ......................... DC failed test systemlog
      Starting test: VerifyReferences
         ......................... DC passed test VerifyReferences
   
   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
   
   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : etc
      Starting test: CrossRefValidation
         ......................... etc passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... etc passed test CheckSDRefDom
   
   Running enterprise tests on : etc.local
      Starting test: Intersite
         ......................... etc.local passed test Intersite
      Starting test: FsmoCheck
         Warning: DcGetDcName(TIME_SERVER) call failed, error 1355
         A Time Server could not be located.
         The server holding the PDC role is down.
         Warning: DcGetDcName(GOOD_TIME_SERVER_PREFERRED) call failed, error 1355
         A Good Time Server could not be located.
         ......................... etc.local failed test FsmoCheck
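
To compare dumps like the two in this thread without reading every line, a small parser can pull out just the failed tests. This is a minimal sketch that assumes the result lines keep the "passed test" / "failed test" wording shown above; the regular expression and function name are illustrative only.

```python
import re

# Matches result lines such as
# "......................... DC failed test Advertising"
RESULT_LINE = re.compile(r"\.{5,}\s+(\S+) (passed|failed) test (\S+)")


def failed_tests(dcdiag_text):
    """Return (target, test) pairs for every result line marked as failed."""
    return [
        (match.group(1), match.group(3))
        for match in RESULT_LINE.finditer(dcdiag_text)
        if match.group(2) == "failed"
    ]


# A few lines excerpted from the second dump.
sample = """\
      Starting test: Advertising
         Warning: DC is not advertising as a time server.
         ......................... DC failed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... DC passed test KnowsOfRoleHolders
      Starting test: FsmoCheck
         ......................... etc.local failed test FsmoCheck
"""

print(failed_tests(sample))
```

Run against the full dumps, this would show KnowsOfRoleHolders dropping off the failed list after the cleanup, while Advertising and FsmoCheck remain until the time server issue is fixed.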
 
LVL 2

Expert Comment

by:danny1875
Hi Amit,

You definitely need to get that PDC Emulator role seized onto your working DC. That will be the reason for the time server error.
 

Author Comment

by:amit_krishnan
I have followed the procedure to clean up the metadata and seize the PDC emulator role to the DC, and restarted the time service, after which all tests except systemlog have passed. All seems to be working fine now. I am able to create new users without hanging the system. Thank you all for the assistance.

Will monitor for some time and try to resolve the systemlog issue as well. Any idea if this indicates a critical problem?

     Starting test: systemlog
         An Error Event occured.  EventID: 0xC0001B77
            Time Generated: 02/16/2011   15:51:45
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00004105
            Time Generated: 02/16/2011   16:02:16
            Event String: The maximum account identifier allocated to this

         ......................... DC failed test systemlog
 
LVL 2

Expert Comment

by:danny1875
Amit,

Basically, that is saying that you have errors in your event logs. Try saving them and clearing them. Re-run dcdiag and you should be all set.
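
If you want to look the events up before clearing the logs, note that dcdiag prints event IDs in hexadecimal, while Event Viewer shows a decimal number. A minimal sketch of the conversion follows, assuming the usual Windows layout in which the low 16 bits of the identifier hold the event code. The function name is illustrative only.

```python
def event_code(raw_id):
    """Return the decimal event code (low 16 bits) from a dcdiag hex EventID."""
    return int(raw_id, 16) & 0xFFFF


# EventIDs copied from the systemlog and kccevent sections quoted in this thread.
for raw in ("0xC0001B77", "0x00004105", "0x8000072D"):
    print(raw, "->", event_code(raw))
```

Here `0x00004105` decodes to 16645, which is the same SAM event ID mentioned in the original question, so it is worth searching the System log for that number rather than only clearing it.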
 

Author Comment

by:amit_krishnan
All seems to be clean and working now. Very much appreciate the assistance. Thanks.