ams_group

Domain Controllers and disaster recovery

Hi all,

I'm testing disaster recovery of our primary Windows Server 2003 domain controller before making the leap to Windows Server 2008.

It hasn't been smooth sailing so far...

We have two domain controllers; both are GCs, and the primary one I'm trying to restore holds all the FSMO roles. The second domain controller is on a different subnet.

The first option was to try restoring from one of the Commvault backups. We currently use Commvault 8 with SP5. To test this in an isolated environment, I created a VLAN and moved the CommServe to the new VLAN along with a newly deployed Windows 2003 VMware virtual machine. Once I got the servers talking to each other and updated the hosts files with each other's IP addresses, the next step was to install the iDataAgent on the VM, which went OK.
To run the restore, I restarted the new VM in Directory Services Restore Mode (DSRM) and ran a full restore (overwriting only where the backup data was newer or the file did not exist), including system state, which completed successfully. On restarting the server, I got an error at the login prompt saying directory services were unavailable and to press Enter to restart in DSRM. For the second attempt I redeployed the VM and restored again with unconditional overwrite. This got to 91 percent and failed with an error restoring SYSVOL. The third attempt was to run DCPROMO on the new VM first and then restore only System State, SYSVOL and the LOG partitions. Same error at 91 percent.

The next option was to restore from a vRanger snapshot to the same isolated network. The restore itself succeeded, but after logging into the domain controller I was unable to join another computer to the domain or add users. I received the error: "the directory service was unable to allocate a relative identifier". After running DCDiag, this appears to be related to the RID master and replication with the second domain controller. Output below.

      Starting test: Replications
         [Replications Check,SV-DC1] A recent replication attempt failed:
            From DMZ-DC1 to SV-DC1
            Naming Context: DC=ForestDnsZones,DC=ams,DC=local
            The replication generated an error (1256):
            Win32 Error 1256
            The failure occurred at 2011-10-05 16:53:21.
            The last success occurred at 2011-10-04 19:54:02.
            1 failures have occurred since the last success.
         [Replications Check,SV-DC1] A recent replication attempt failed:
            From DMZ-DC1 to SV-DC1
            Naming Context: DC=DomainDnsZones,DC=ams,DC=local
            The replication generated an error (1256):
            Win32 Error 1256
            The failure occurred at 2011-10-05 16:53:21.
            The last success occurred at 2011-10-04 19:54:02.
            1 failures have occurred since the last success.
         [Replications Check,SV-DC1] A recent replication attempt failed:
            From DMZ-DC1 to SV-DC1
            Naming Context: CN=Schema,CN=Configuration,DC=ams,DC=local
            The replication generated an error (1722):
            Win32 Error 1722
            The failure occurred at 2011-10-05 16:53:42.
            The last success occurred at 2011-10-04 19:54:02.
            2 failures have occurred since the last success.
            [DMZ-DC1] DsBindWithSpnEx() failed with error 1722,
            Win32 Error 1722.
            The source remains down. Please check the machine.
         [Replications Check,SV-DC1] A recent replication attempt failed:
            From DMZ-DC1 to SV-DC1
            Naming Context: CN=Configuration,DC=ams,DC=local
            The replication generated an error (1722):
            Win32 Error 1722
            The failure occurred at 2011-10-05 16:53:21.
            The last success occurred at 2011-10-04 19:54:02.
            2 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,SV-DC1] A recent replication attempt failed:
            From DMZ-DC1 to SV-DC1
            Naming Context: DC=ams,DC=local
            The replication generated an error (1722):
            Win32 Error 1722
            The failure occurred at 2011-10-05 16:54:03.
            The last success occurred at 2011-10-04 19:54:02.
            2 failures have occurred since the last success.
            The source remains down. Please check the machine.
         REPLICATION-RECEIVED LATENCY WARNING
         SV-DC1:  Current time is 2011-10-05 17:47:50.
            DC=ForestDnsZones,DC=ams,DC=local
               Last replication recieved from DMZ-DC1 at 2011-10-04 19:54:02.
            DC=DomainDnsZones,DC=ams,DC=local
               Last replication recieved from DMZ-DC1 at 2011-10-04 19:54:02.
            CN=Schema,CN=Configuration,DC=ams,DC=local
               Last replication recieved from DMZ-DC1 at 2011-10-04 19:54:02.
            CN=Configuration,DC=ams,DC=local
               Last replication recieved from DMZ-DC1 at 2011-10-04 19:54:02.
            DC=ams,DC=local
               Last replication recieved from DMZ-DC1 at 2011-10-04 19:54:02.
         ......................... SV-DC1 passed test Replications
      Starting test: NCSecDesc
         ......................... SV-DC1 passed test NCSecDesc
      Starting test: NetLogons
         ......................... SV-DC1 passed test NetLogons
      Starting test: Advertising
         ......................... SV-DC1 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... SV-DC1 passed test KnowsOfRoleHolders
      Starting test: RidManager
         The DS has corrupt data: rIDPreviousAllocationPool value is not valid
         No rids allocated -- please check eventlog.
         ......................... SV-DC1 failed test RidManager
      Starting test: MachineAccount
         ......................... SV-DC1 passed test MachineAccount
      Starting test: Services
         ......................... SV-DC1 passed test Services
      Starting test: ObjectsReplicated
         ......................... SV-DC1 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... SV-DC1 passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... SV-DC1 failed test frsevent
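
A long DCDiag report like the one above can be easier to read once it is reduced to the failed tests and the replication error codes. The following is a minimal sketch, not part of any Microsoft tooling: the function name `summarize_dcdiag` and the abbreviated `SAMPLE` excerpt are assumptions made for illustration, and the two error descriptions are the standard Win32 meanings of codes 1256 and 1722.

```python
import re

# Standard Win32 meanings of the two codes that appear in the output above.
WIN32_ERRORS = {
    1256: "ERROR_HOST_DOWN: the remote system is not available",
    1722: "RPC_S_SERVER_UNAVAILABLE: the RPC server is unavailable",
}

FAILED_TEST = re.compile(r"\.+ (\S+) failed test (\S+)")
NAMING_CONTEXT = re.compile(r"Naming Context: (\S+)")
REPL_ERROR = re.compile(r"The replication generated an error \((\d+)\)")


def summarize_dcdiag(text):
    """Return (failed_tests, replication_errors) found in DCDiag text output."""
    failed_tests = []
    replication_errors = []
    current_nc = None  # naming context the next error line belongs to
    for line in text.splitlines():
        match = NAMING_CONTEXT.search(line)
        if match:
            current_nc = match.group(1)
            continue
        match = REPL_ERROR.search(line)
        if match:
            code = int(match.group(1))
            meaning = WIN32_ERRORS.get(code, "unknown")
            replication_errors.append((current_nc, code, meaning))
            continue
        match = FAILED_TEST.search(line)
        if match:
            failed_tests.append((match.group(1), match.group(2)))
    return failed_tests, replication_errors


# Abbreviated excerpt of the report above, used only to exercise the parser.
SAMPLE = """\
            Naming Context: DC=ForestDnsZones,DC=ams,DC=local
            The replication generated an error (1256):
            Naming Context: DC=ams,DC=local
            The replication generated an error (1722):
         ......................... SV-DC1 passed test Replications
         ......................... SV-DC1 failed test RidManager
         ......................... SV-DC1 failed test frsevent
"""

if __name__ == "__main__":
    failed, errors = summarize_dcdiag(SAMPLE)
    for server, test in failed:
        print(f"{server}: failed {test}")
    for nc, code, meaning in errors:
        print(f"{nc}: error {code} ({meaning})")
```

Both codes indicate that DMZ-DC1 could not be reached, which is what one would expect inside an isolated VLAN, so the RidManager and frsevent failures are probably the more useful leads here.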

The thing is, this DC holds all the roles, so I don't see why it needs replication to be working. Also, with the second DC being on a different subnet and at a different location, it is difficult to include in my lab environment. Is there a way to fix this without the second DC, and what are the best practices for backing up and restoring domain controllers? Has anyone had experience restoring a domain controller using Commvault? That said, one option should a DC fail is to introduce a brand new one and have the remaining DC replicate to it. But what happens if there is corruption in AD that affects both DCs?

Sorry about the long message... I wanted to make it as clear as possible. Looking forward to your responses.
Reubenwelsh

Hi,

This is how we have done the last 3-4 upgrades in environments with 2 DCs.

Shut down one of the DCs. Bring up the forest / domain functional level with a new 2008 DC. When you're satisfied it has all gone as planned, start up the old DC server again.

If everything were to go terribly, shut down all the DC servers and boot up the 2003 server that was shut down.

Claim the FSMO roles on that 2003 server and you're up and running again.
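
Claiming the FSMO roles on the surviving 2003 server is usually done by seizing them with ntdsutil. The sketch below only builds the command line and does not run it. The server name `SV-DC1` is a placeholder borrowed from the question, the helper name is hypothetical, and the role names are the ones ntdsutil uses on Windows Server 2003 (later versions call the domain naming role "naming master").

```python
import subprocess

# Role names as accepted by "seize ..." in ntdsutil on Windows Server 2003.
FSMO_ROLES_2003 = [
    "schema master",
    "domain naming master",
    "RID master",
    "PDC",
    "infrastructure master",
]


def build_seize_command(server, roles=FSMO_ROLES_2003):
    """Build an ntdsutil argument list that seizes `roles` on `server`."""
    args = ["ntdsutil", "roles", "connections",
            f"connect to server {server}", "quit"]
    args += [f"seize {role}" for role in roles]
    args += ["quit", "quit"]  # leave the roles menu, then ntdsutil itself
    return args


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(subprocess.list2cmdline(build_seize_command("SV-DC1")))
```

Printing rather than executing is deliberate: seizing is a last resort, and the DCs that previously held the roles should not be brought back online afterwards.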
ams_group

ASKER

Thanks for the response. This covers how we can upgrade to a 2008 DC but not how to restore a DC from backup. As stated, I've tried two methods and both have failed: a Commvault restore, which was non-authoritative (not sure I mentioned that), and a restore from a VM snapshot using vRanger. Which is the best method for restoring a DC, and what are the steps involved?

Thanks
OK, I've managed to get the restored DC working now. I had to delete the secondary DC and the site that DC is a member of. So in a DR scenario where AD becomes corrupt, after shutting down both DCs it would be possible to restore a DC from the vRanger backup and then promote a member server to bring back the second DC. I still haven't worked out how to restore using Commvault, but the vRanger option is quicker and simpler.

 
Hi,

I think the only really supported way from Microsoft is to restore from a system state backup, but to be honest it isn't that easy. We've only had the issue once at our company; thankfully we had a guy who had worked for Microsoft on AD issues, so he knew what to do.

I would not rely on snapshots since they aren't always 100% reliable.

This article covers backup / restore: http://technet.microsoft.com/en-us/library/bb727048.aspx
Thanks for the link. I've read through it and am going to try the system state restore again on Monday. In the meantime I've been reading more on snapshotting DCs and found a very interesting discussion.

http://communities.vmware.com/message/1546097
I have now successfully restored from Commvault after much trial and error. It's a known issue that Microsoft is aware of, which occurs when the RID master is on the PDC in a two-DC environment with replication enabled. I'm getting the same problem restoring from vRanger too.

http://support.microsoft.com/?kbid=822053

Steps to restore a DC using Commvault are:

1.      Boot into Directory Services Restore Mode (DSRM).
2.      Run an authoritative restore of System State and the C: drive, with forced overwrite.
3.      Reboot back into DSRM and log in as Administrator using the DSRM password.
4.      Run NTDSUTIL and at its prompt type "authoritative restore", then "restore database".

http://support.microsoft.com/kb/241594

5.      Reboot back into DSRM and copy the contents of the NTfrs_Preexisting folder to the %SystemRoot%\Sysvol\Sysvol\<domain name> folder.
6.      Locate HKLM\SYSTEM\CurrentControlSet\Services\Ntfrs\Parameters\Backup/Restore\Process at Startup, set the BurFlags registry entry to D4 and reboot.

http://support.microsoft.com/kb/958804
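
Steps 4 and 6 above can also be expressed as command lines, which makes them easier to keep in a DR runbook. This is a hedged sketch rather than a tested procedure: the helper names are hypothetical, the commands are only built and printed (never executed), and they are only meaningful when run on the restored DC while it is in DSRM. The BurFlags value is passed to reg.exe in decimal, where 212 equals hexadecimal D4.

```python
import subprocess

# Registry key from step 6; BurFlags lives under "Process at Startup".
NTFRS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
    r"\Parameters\Backup/Restore\Process at Startup"
)
BURFLAGS_AUTHORITATIVE = 0xD4  # D4 marks this SYSVOL copy as authoritative


def authoritative_restore_command():
    """Step 4: mark the whole restored AD database as authoritative."""
    return ["ntdsutil", "authoritative restore", "restore database",
            "quit", "quit"]


def burflags_command(value=BURFLAGS_AUTHORITATIVE):
    """Step 6: set the BurFlags DWORD with reg.exe (value in decimal)."""
    return ["reg", "add", NTFRS_KEY, "/v", "BurFlags",
            "/t", "REG_DWORD", "/d", str(value), "/f"]


if __name__ == "__main__":
    # Print both commands for review instead of running them.
    for command in (authoritative_restore_command(), burflags_command()):
        print(subprocess.list2cmdline(command))
```

Step 5 (copying the pre-existing SYSVOL contents back) is left out because the source folder depends on what FRS created on the restored server.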
ASKER CERTIFIED SOLUTION
ams_group

Resolved it myself in the end.