Restore virtual secondary domain controller

I have a client with two 2008 R2 domain controllers running on Esx 5.1 on two separate hosts. DC1 and DC2. DC1 is the primary. I verified that DC1 has all 5 FSMO roles. Yesterday the drive containing
bevege (Author) commented:
Here's what we ended up doing.

Ok, so having heard the same arguments before about just rebuilding the server, I decided to go right to the source and called Microsoft. I talked to a tech and he said there is no need to rebuild the server. I asked him about USN rollback and any other issues that may come up. He said that as long as the server is in a "proper" state there will be no issues. He even checked with his technical lead.

By "proper" state he said there are two things you need to check. One is a registry key that gets created when a secondary DC can't talk to the primary for "X" amount of time. I'm not sure how long "X" is but will look it up later. This key locks the AD DB and causes the USN rollback. If that key has not been created then you're OK. Step two is to check the Netlogon service. If a USN rollback has started, the service will be paused or disabled. If the service is not paused or disabled, you are good to go. He's going to send me the registry key to check so I don't have it right now. If these two things are OK, it's fine to just turn the server back on. It's basically just like the server was turned off for several hours.
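For anyone who wants to script the Netlogon half of that check, here is a rough sketch in Python. It assumes the check is done by reading the text output of `sc query netlogon` on the DC; the function names and the sample output are illustrative, not something the MS tech supplied. The registry key check is left out because the key name isn't given in this thread.

```python
import re
import subprocess


def parse_sc_state(sc_output: str) -> str:
    """Return the state name (e.g. 'RUNNING') from `sc query` text output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    if match is None:
        raise ValueError("no STATE line found in sc output")
    return match.group(1)


def netlogon_looks_healthy(sc_output: str) -> bool:
    # Per the checklist above, a paused or stopped Netlogon is the warning sign.
    return parse_sc_state(sc_output) == "RUNNING"


def query_netlogon() -> str:
    """Run `sc query netlogon` (Windows only) and return its text output."""
    result = subprocess.run(
        ["sc", "query", "netlogon"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative sample of `sc query netlogon` output, not captured from this server.
SAMPLE = """
SERVICE_NAME: netlogon
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
"""

if __name__ == "__main__":
    print("Netlogon state:", parse_sc_state(SAMPLE))
    print("Looks healthy:", netlogon_looks_healthy(SAMPLE))
```

On the restored DC itself you would pass `query_netlogon()` instead of `SAMPLE`. This only covers one of the two checks, so a "healthy" result here is not proof that the DC is safe to reconnect.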
In summary here's what we did.
Server died on Wednesday at 3:00pm.
Around midnight I restored the server from Tuesday's 2:00pm backup using StorageCraft ShadowProtect. I left the NIC turned off until I confirmed the correct action to take. Restore took about 20 minutes.
Thursday morning I wasn't in the mood to rebuild the server and wanted to learn the pros and cons of not doing so. Called MS using one of my 10 free support calls that I never use.
MS tech said no need to rebuild.
He researched a few things and confirmed with his tech lead that no rebuild is needed to safely continue. He did say there is a possibility that the 2nd DC would go into a USN rollback. He also said it's easy to fix. All you have to do is demote the 2nd DC and then add it back into the domain.
He took a system state backup from the working DC just in case.
Verified that the Netlogon service was running (not paused or stopped).
Verified that the registry key was not created. I'll update when I find the correct key.
Told me to turn on the NIC.
Turned it on. Waited a few minutes and logged onto the 2nd DC.
Ran a few standard repadmin checks on both DCs to verify everything was replicating as expected. See articles below for commands. Everything looked good.
Rebooted 2nd DC for good measure.
Added a new user on the primary DC. Manually replicated and verified it showed up on the 2nd DC. It did.
Deleted the user from the 2nd DC. Manually replicated and verified it was deleted on the primary. It was.
Did the same two steps above but made sure replication worked on its own, i.e. didn't manually hit the replicate button. It worked in both cases.
Searched event logs for anything odd. Nothing.
Did it all again this morning. Working fine.
Total time if you don't count the system state backup was about 25 minutes or so. Guess it would be 45 minutes if you include the restore. Much faster than rebuilding the server in my opinion. In a large environment I might agree it's just easier to rebuild but in this case not worth the hassle.
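The repadmin step above can also be checked by script rather than by eye. The exact commands used aren't listed in this post, so the sketch below assumes one of them was `repadmin /showrepl * /csv` and that its output has a "Number of Failures" column; both are assumptions to verify against your own output. The sample rows, the domain name `example.local`, and the timestamps are made up for illustration.

```python
import csv
import io
from typing import Dict, List


def failing_links(showrepl_csv: str) -> List[Dict[str, str]]:
    """Return the rows of a showrepl CSV whose failure count is above zero."""
    reader = csv.DictReader(io.StringIO(showrepl_csv.lstrip()))
    failures = []
    for row in reader:
        count = (row.get("Number of Failures") or "0").strip()
        if count.isdigit() and int(count) > 0:
            failures.append(row)
    return failures


# Made-up sample in the assumed CSV layout; DC1/DC2 are the names from this thread.
SAMPLE = """
showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Default-First-Site-Name,DC2,"DC=example,DC=local",Default-First-Site-Name,DC1,RPC,0,0,2014-01-02 09:15:00,0
showrepl_INFO,Default-First-Site-Name,DC1,"DC=example,DC=local",Default-First-Site-Name,DC2,RPC,3,2014-01-02 09:10:00,2014-01-01 14:00:00,8456
"""

if __name__ == "__main__":
    for link in failing_links(SAMPLE):
        print(
            link["Source DSA"], "->", link["Destination DSA"],
            "failures:", link["Number of Failures"],
            "status:", link["Last Failure Status"],
        )
```

An empty result would match the "everything looked good" outcome described above. It is only a convenience filter and doesn't replace reading the event logs.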
He also sent me this list of articles. I haven't had a chance to look through them all but will do so this weekend. I also found this article generally helpful.
Articles for reference:
1. Troubleshooting AD Replication error 8456 or 8457: "The source | destination server is currently rejecting replication requests"
2. How to detect and recover from a USN rollback in Windows Server 2003, Windows Server 2008, and Windows Server 2008 R2 (KB 875495)
3. Windows Server Backup Step-by-Step Guide for Windows Server
4. Active Directory Replication Status Tool
Will Szymkowski, Senior Solution Architect, commented:
You are running Server 2008 R2 for your domain controllers, so restoring the secondary DC is not supported. You can only do this if you have a 2012 DC that is holding the PDC FSMO role.

If your secondary DC has failed, DO NOT restore an earlier version of the VM. You will run into USN issues with DNS and replication will not work properly.

If this DC has failed, you will need to remove it from the domain, perform the metadata cleanup and then install a new member server and promote it to a DC. Then allow replication to proceed from the DC that is online.

Make sure that you also update your DNS on your servers and clients if you are using a different IP for the new DC.

bevege (Author) commented:
I'm trying to understand what would trigger a USN rollback, just for learning. What would happen if the failed server was offline for only 1 hr? Say you turn it off at 2:00pm, fix the issues, and turn it back on at 3:00pm. How is that different from taking a full backup at 2:00pm, then restoring it at 3:00pm and turning it on? In both situations the server is not talking to the primary DC for 1 hr. What does the restored server do that the turned-off server wouldn't do? How does this change when both servers are off for, say, a week?

I'm just trying to understand because rebuilding is a big PITA vs a 20 minute restore.

Thank you

bevege (Author) commented:
Looks like my original question was cut off. I can't figure out how to edit the original question. Anyway here is the entire original question.

I have a client with two 2008 R2 domain controllers running on ESX 5.1 on two separate hosts, DC1 and DC2. DC1 is the primary. I verified that DC1 has all 5 FSMO roles. Yesterday the drive containing DC2 failed. I have StorageCraft ShadowProtect backups every hour up to the point of failure.

Since this is a secondary domain controller, I can just restore the image and turn it on, right? It's been less than 24 hrs, so the 2nd domain controller should just pull all the updated AD data from the primary without causing a bunch of AD errors, correct?

I haven't had this happen in years so I want to do it correctly. This is also a good time to update our documentation with more detailed restore information.

I remember you don't want to do this if it's been a long time.

Will Szymkowski, Senior Solution Architect, commented:
There are several different mechanisms that happen on a domain controller. You only ever restore a DC using an image when your entire domain has failed. You can then restore the DC image, and then you would add any additional DCs, replicating from the imaged DC.

Whenever you are actually restoring something back into AD you need to use either a non-authoritative or an authoritative restore. When you have a DC offline for about an hour, it will get the updates/changes pushed to it from the other, up-to-date DCs when it is back online.

If you restore from an image, the DCs that were online the entire time will not push those changes to the DC that has been imaged. This is where you see orphaned objects that will now never get replicated to the other DCs.
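To picture the difference between "powered off for an hour" and "restored from an image", here is a toy model in Python. It is a deliberately simplified sketch, not how AD actually stores or replicates data: the `ToyDC` class and its fields are invented for illustration, and it ignores invocation IDs, transitive replication and rollback detection. It only shows how a restored DC can reuse update sequence numbers its partner has already seen, so changes made after the restore are skipped.

```python
import copy


class ToyDC:
    """Very small stand-in for a domain controller (not real AD behaviour)."""

    def __init__(self, name):
        self.name = name
        self.usn = 0          # local update sequence number
        self.changes = {}     # USN -> object created locally at that USN
        self.objects = set()  # every object this DC currently knows about
        self.seen = {}        # partner name -> highest partner USN already pulled

    def create(self, obj):
        self.usn += 1
        self.changes[self.usn] = obj
        self.objects.add(obj)

    def pull_from(self, partner):
        # Only ask the partner for changes above the USN we last saw from it.
        mark = self.seen.get(partner.name, 0)
        for usn, obj in sorted(partner.changes.items()):
            if usn > mark:
                self.objects.add(obj)
        self.seen[partner.name] = max(mark, partner.usn)


def powered_off_scenario():
    dc1, dc2 = ToyDC("DC1"), ToyDC("DC2")
    dc2.create("alice")
    dc1.pull_from(dc2)
    dc2.create("bob")
    dc1.pull_from(dc2)
    # DC2 is switched off and back on: its USN counter is unchanged.
    dc2.create("carol")
    dc1.pull_from(dc2)
    return dc1.objects


def image_restore_scenario():
    dc1, dc2 = ToyDC("DC1"), ToyDC("DC2")
    dc2.create("alice")
    dc1.pull_from(dc2)
    image = copy.deepcopy(dc2)  # backup image taken while DC2 is at USN 1
    dc2.create("bob")
    dc1.pull_from(dc2)          # DC1 has now seen DC2 up to USN 2
    dc2 = image                 # restore the image: DC2's USN is back at 1
    dc2.create("carol")         # this change reuses USN 2
    dc1.pull_from(dc2)          # DC1 thinks it already has USN 2, so it skips it
    return dc1.objects


if __name__ == "__main__":
    print("powered off  :", sorted(powered_off_scenario()))
    print("image restore:", sorted(image_restore_scenario()))
```

In the powered-off case DC1 ends up with the new object; in the image-restore case the new object stays stranded on DC2. In this thread the restored DC had its NIC disabled and no changes were made on it before reconnecting, which is a narrower situation than the one modelled here.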

In Server 2012 you can virtualize your DCs, and it has USN rollback protection so that when you bring up a DC from a recovered VM it will get its updates from the other DCs that were online the entire time.

Take a look at the following link which explains in detail about USN rollback and its importance.

Seth Simmons, Sr. Systems Administrator, commented:
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.