How do I verify redundancy in a domain controller?

robw24
I have a native Windows 2003 domain with two domain controllers and a single site. I thought my domain controllers were redundant, meaning that if one failed hard then the other would maintain all domain services. Recently, while one of them was restarting after installing Microsoft Updates, I ran into an issue. I was on another server in the domain, an Exchange 2003 server, in the Active Directory Users & Computers tool, trying to add a user to an existing security group, and I was not able to do it. I forget the exact message, but it had to do with not being able to contact the domain controller that was rebooting. The domain controller that was rebooting is actually the secondary controller.

My question is, how exactly do I make sure that the domain controllers are redundant for the domain? I don't want any active directory services unavailable if I need to restart one of them.

I would also like to know how to make sure that both are up to date with their USNs (Update Sequence Numbers). They seem to have different numbers, although at the same time they seem to pass communication tests with each other.

Thanks

Commented:
If one fails and it holds a FSMO (Flexible Single Master Operations) role, you will need to move the roles that were hosted by the dead server (or seize them if the server is NEVER going to be brought back up):

MS info:
http://support.microsoft.com/kb/324801
How to view and transfer FSMO roles in Windows Server 2003


http://www.petri.co.il/understanding_fsmo_roles_in_ad.htm
Understanding FSMO Roles in Active Directory
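
For reference, a Windows 2003 forest has five FSMO roles: two forest-wide (schema master, domain naming master) and three per domain (RID master, PDC emulator, infrastructure master). Below is a minimal Python sketch of working out which roles would need moving if a given DC went down. The role-to-server assignment is a made-up example, not taken from the asker's domain; on a real DC, `netdom query fsmo` lists the actual holders.

```python
# Hypothetical role holders for illustration only.
# Replace with what `netdom query fsmo` reports in your domain.
ROLE_HOLDERS = {
    "Schema master": "SRV1",
    "Domain naming master": "SRV1",
    "RID master": "SRV1",
    "PDC emulator": "SRV1",
    "Infrastructure master": "SRV2",
}


def roles_to_move(role_holders, failed_dc):
    """Return the FSMO roles held by failed_dc, sorted by name."""
    return sorted(
        role for role, holder in role_holders.items() if holder == failed_dc
    )


if __name__ == "__main__":
    for role in roles_to_move(ROLE_HOLDERS, "SRV2"):
        print(role)
```

An empty result means the failed DC held no FSMO roles, so nothing needs to be transferred or seized.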

Top Expert 2012
Commented:
First make sure both DCs are Global Catalogs. Make sure each DC points to itself as primary DNS and to the other DC as secondary. All clients and servers should point only to these two systems for DNS in their TCP/IP settings.

Run dcdiag to test the health of the systems.
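
The DNS layout described above can be written down as a small check. This is only a sketch in Python: the function name and the IP addresses are hypothetical, and the settings would have to be collected by hand (for example from `ipconfig /all` on each DC).

```python
def check_dc_dns(dns_settings):
    """dns_settings maps each DC's address to the ordered list of DNS
    servers configured on that DC. Returns a list of problems found;
    an empty list means the layout matches the advice above."""
    problems = []
    all_dcs = set(dns_settings)
    for dc, servers in dns_settings.items():
        # Primary entry should be the DC itself.
        if not servers or servers[0] != dc:
            problems.append(f"{dc}: primary DNS should be itself")
        # Secondary entry should be one of the other DCs.
        if len(servers) < 2 or servers[1] not in all_dcs - {dc}:
            problems.append(f"{dc}: secondary DNS should be the other DC")
    return problems


if __name__ == "__main__":
    # Hypothetical addresses: each DC lists only itself, secondary left empty.
    settings = {"10.0.0.1": ["10.0.0.1"], "10.0.0.2": ["10.0.0.2"]}
    for problem in check_dc_dns(settings):
        print(problem)
```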
Top Expert 2013

Commented:
If you want to look at USNs, use
repadmin with the /showutdvec and /showobjmeta switches.
In addition to dcdiag, run repadmin /showreps and check your event logs.
Thanks
Mike
 

Commented:
Your issue was that the RID (Relative ID) master FSMO role was unavailable while the DC that holds the role was rebooting.

The RID master controls the handing out of RIDs as users are created, so that duplicate SIDs are not created in the domain. If it was just a server reboot, then you probably have no worries.
Top Expert 2013

Commented:
Probably not the RID master being down. The RID master hands out RIDs 500 at a time, and a DC asks for a new pool once it has used 50% (250).
If he was trying to bulk-add hundreds or thousands of users while the RID master was down, then it would be an issue, but not for a few users.
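
To put numbers on that, here is a small Python sketch of the arithmetic, taking the figures quoted above (pool of 500, refresh at 50%) at face value. The function names are made up for illustration. Note also that RIDs are only consumed when a new security principal (user, group, computer) is created; adding an existing user to an existing group does not use one.

```python
POOL_SIZE = 500          # RIDs handed to a DC per pool, per the comment above
REFRESH_FRACTION = 0.5   # a DC asks for a new pool once this fraction is used


def rids_left(used_in_pool, pool_size=POOL_SIZE):
    """RIDs still available locally after used_in_pool have been consumed."""
    return max(pool_size - used_in_pool, 0)


def worst_case_left_at_outage(pool_size=POOL_SIZE,
                              refresh_fraction=REFRESH_FRACTION):
    """RIDs left when a DC has just hit the refresh threshold and
    cannot reach the RID master to fetch its next pool."""
    return pool_size - int(pool_size * refresh_fraction)


if __name__ == "__main__":
    print(worst_case_left_at_outage())  # 250
    print(rids_left(100))               # 400
```

So even in the unlucky case, a DC could still create roughly 250 new accounts before the RID master being offline would matter.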
Thanks
Mike

Author

Commented:
Thanks, I will report back once I investigate these responses.

Author

Commented:
I checked DNS and each server pointed to itself for primary and the secondary was empty. I corrected this but I doubt this was the issue.

I verified that each server is listed as a Global Catalog, so that could not have been the issue.

I then ran dcdiag on the server (primary) that did not reboot, and here are the results:

P:\>dcdiag

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\SRV1
      Starting test: Connectivity
         ......................... SRV1 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\SRV1
      Starting test: Replications
         ......................... SRV1 passed test Replications
      Starting test: NCSecDesc
         ......................... SRV1 passed test NCSecDesc
      Starting test: NetLogons
         ......................... SRV1 passed test NetLogons
      Starting test: Advertising
         ......................... SRV1 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... SRV1 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... SRV1 passed test RidManager
      Starting test: MachineAccount
         ......................... SRV1 passed test MachineAccount
      Starting test: Services
         ......................... SRV1 passed test Services
      Starting test: ObjectsReplicated
         ......................... SRV1 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... SRV1 passed test frssysvol
      Starting test: frsevent
         ......................... SRV1 passed test frsevent
      Starting test: kccevent
         ......................... SRV1 passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:48
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:49
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:49
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:50
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:50
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:50
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:51
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:08:51
            (Event String could not be retrieved)
         ......................... SRV1 failed test systemlog
      Starting test: VerifyReferences
         ......................... SRV1 passed test VerifyReferences

   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidatio

      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom

   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidatio

      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : domain
      Starting test: CrossRefValidation
         ......................... domain passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... domain passed test CheckSDRefDom

   Running enterprise tests on : domain.com
      Starting test: Intersite
         ......................... domain.com passed test Intersite
      Starting test: FsmoCheck
         ......................... domain.com passed test FsmoCheck

P:\>
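
The systemlog test above reports event IDs in hexadecimal, while Event Viewer lists them in decimal. A one-line conversion in Python (the helper name is just for illustration) gives the number to search for in the System log:

```python
def event_id_decimal(hex_id):
    """Convert a dcdiag-style hex event ID such as '0x00000457' to decimal."""
    return int(hex_id, 16)


if __name__ == "__main__":
    print(event_id_decimal("0x00000457"))  # 1111
```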

And then I ran dcdiag on the server (secondary) that rebooted:

P:\>dcdiag

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\SRV2
      Starting test: Connectivity
         ......................... SRV2 passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\SRV2
      Starting test: Replications
         ......................... SRV2 passed test Replications
      Starting test: NCSecDesc
         ......................... SRV2 passed test NCSecDesc
      Starting test: NetLogons
         ......................... SRV2 passed test NetLogons
      Starting test: Advertising
         ......................... SRV2 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... SRV2 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... SRV2 passed test RidManager
      Starting test: MachineAccount
         ......................... SRV2 passed test MachineAccount
      Starting test: Services
         ......................... SRV2 passed test Services
      Starting test: ObjectsReplicated
         ......................... SRV2 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... SRV2 passed test frssysvol
      Starting test: frsevent
         ......................... SRV2 passed test frsevent
      Starting test: kccevent
         ......................... SRV2 passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:23:21
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:23:22
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:23:22
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:23:23
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:23:23
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 10/19/2010   08:23:24
            (Event String could not be retrieved)
         ......................... SRV2 failed test systemlog
      Starting test: VerifyReferences
         ......................... SRV2 passed test VerifyReferences

   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation

      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom

   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation

      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : domain

      Starting test: CrossRefValidation
         ......................... domain passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... domain passed test CheckSDRefDom

   Running enterprise tests on : domain.com
      Starting test: Intersite
         ......................... domain.com passed test Intersite
      Starting test: FsmoCheck
         ......................... domain.com passed test FsmoCheck

P:\>
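
With output this long, it is easy to miss a failed test. Below is a minimal Python sketch that pulls the pass/fail lines out of dcdiag output saved to a text file (for example with `dcdiag > dcdiag.txt`). The regular expression is based only on the line format visible in the output above and is an assumption about other dcdiag versions.

```python
import re

# Matches lines like "......................... SRV2 failed test systemlog"
RESULT_RE = re.compile(r"\.{5,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def summarize_dcdiag(text):
    """Return (passed, failed), each a list of (target, test_name) tuples."""
    passed, failed = [], []
    for line in text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            target, outcome, test = match.groups()
            (passed if outcome == "passed" else failed).append((target, test))
    return passed, failed


if __name__ == "__main__":
    sample = (
        "      Starting test: kccevent\n"
        "         ......................... SRV2 passed test kccevent\n"
        "      Starting test: systemlog\n"
        "         ......................... SRV2 failed test systemlog\n"
    )
    ok, bad = summarize_dcdiag(sample)
    print("failed:", bad)
```

Run against the two outputs above, this would flag only the systemlog test on each server.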


I tried running repadmin with the /showutdved switch but it seems that switch is invalid, as well as the /objmeta switch.

Running repadmin /showreps shows the other server as successful rep partner, with no errors in the logs.


When I go into Active Directory Sites & Services, right-click the NTDS Settings of the primary server (SRV1), and open the Object tab, the USNs show as Current: 12586 and Original: 4418.

For the secondary server (SRV2), the one that rebooted, it shows Current: 58760 and Original: 13926.
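
On the USN question: USNs are counters local to each DC, so two healthy DCs will normally show different numbers. What matters is whether each DC has received the other's changes, which is what the up-to-dateness vector printed by `repadmin /showutdvec` tracks. The Python sketch below is a simplified model of that comparison, using made-up USN values (not the ones above) and a hypothetical function name:

```python
def replication_lag(source_highest_usn, partner_vector, source_dc):
    """How many of source_dc's USNs the partner has not yet acknowledged.

    partner_vector maps a DC name to the highest USN from that DC which
    the partner reports having received (its up-to-dateness vector)."""
    seen = partner_vector.get(source_dc, 0)
    return max(source_highest_usn - seen, 0)


if __name__ == "__main__":
    # Made-up example values for illustration only.
    srv1_highest = 12600
    srv2_vector = {"SRV1": 12598, "SRV2": 58770}
    print(replication_lag(srv1_highest, srv2_vector, "SRV1"))  # 2
```

A small lag that shrinks to zero after the next replication cycle is expected; a lag that keeps growing would point to a replication problem.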

Not sure what to do next.
Commented:
Since I stopped receiving replies to my last response, which addressed the questions that were asked of me, I had no choice but to open a case with Microsoft. They told me that when a domain controller reboots, it is normal for some computers and clients to run into problems with authentication. Although I have multiple domain controllers, some servers and clients may still try to communicate directly with the domain controller that is rebooting. The workaround for me, when adding a user to a group while the secondary domain controller was rebooting, was to do it from the primary domain controller instead of from a non-domain controller (I had been using the Users and Computers applet on our Exchange server).
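
The same idea, picking a DC that is actually up rather than whichever one the tool was last bound to, can be sketched in Python. This is only an illustration: it tests whether each candidate accepts a TCP connection (LDAP listens on port 389), and the function name and host names are hypothetical.

```python
import socket


def first_reachable(candidates, timeout=2.0):
    """Return the first (host, port) pair that accepts a TCP connection,
    or None if none of the candidates respond."""
    for host, port in candidates:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, timed out, or name not resolved: try the next one.
            continue
    return None
```

A call such as `first_reachable([("srv2.domain.com", 389), ("srv1.domain.com", 389)])` would skip a DC that is mid-reboot and return the other. In Active Directory Users and Computers the equivalent manual step is the "Connect to Domain Controller" option, which lets you bind the console to a specific DC.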

Author

Commented:
The answer from Microsoft was 3/5ths of my question. The rest was running dcdiag to test the overall health.
