dustypenguin

DC failure Windows Server 2008 R2

Both machines mentioned are VMs (XenServer).

I have a DC that has failed (RAID failure and corrupted VM). It was also a DNS server and hosted a DFS namespace.

There is a second server that has all the roles and is a global catalog.

I do have a VM image (3-4 days old) of the failed server.
1 - Would there be any benefit to booting the image and running dcpromo to remove it from the domain? Or would that contaminate AD?
2 - I can live without DFS for a few days.
3 - If I do not run dcpromo on the image, how do I make sure that the server is removed from the domain?
4 - What about DNS?

Thanks
Larry Struckmeyer MVP

Consider what you would do if these machines were physical. ADSI Edit would be required to remove the failed server, so you could go that route. But if the VM images are only 3-4 days old, could you not replace the failed drives and restore the images?

Consider what would happen if the failed DC were simply turned off by accident for the weekend. On the next working day, when restarted, it would replicate with the live DCs in the domain, updating and exchanging information until they were in sync. I don't believe restoring a 3-4 day old image will cause any harm.

In the case of DNS, it seems to me you will want to have a DNS server on your network. You say this one was "a DNS server". If there are no others, you can add the role to any other server in the domain, either permanently or temporarily.

ASKER

Can I indeed just use the Active Directory Users and Computers snap-in and remove the failed server from the Domain Controllers list?
Thanks for the reply fl_flyfishing, yes there is another DNS server on the network.

Having lived through a domain crash, I am hesitant to just restore and go if there is a chance that it would inject bad data into the AD.  Especially in light of wanting the next couple of days off!
No, ADUC will not clean up the other DCs. ADSI Edit is required. You should be able to find the steps in the MS KB. As far as the other DCs are concerned, this DC has simply been turned off. ADUC is for removing the account AFTER the DC has been demoted out of AD with dcpromo.
I am restoring the image ... we'll give that a go and monitor the event viewer.  I would have done that earlier (last night), except for my bad experience about 5 years ago.  Ugly ugly ugly.
The appropriate way to clean up AD is below:
- Use ntdsutil (perform a metadata cleanup) to remove the broken server.
- DO NOT bring up the old imaged server from 3-4 days ago.
  * As soon as you bring this server back up in the environment, it will start replicating and authenticating users. Its USNs will be out of sync because of the failure.
- Once you have cleaned up the server using ntdsutil, go into ADUC and delete the computer account.
- Open Sites and Services and remove the server object from there.
- Open DNS Manager and, under the _msdcs folder (SRV records), remove all of the records associated with the DC you are removing (Kerberos, LDAP, GC, etc.).
- Also change the DNS server settings in your DHCP scopes so that clients no longer point to this DNS server.
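For the DNS step, a small script can help double-check which SRV records still point at the removed DC before deleting them by hand. The following is only a sketch: it assumes the records have already been exported from the _msdcs zone as (name, target) pairs, and the host names `dc1.example.local` and `dc2.example.local` are made up for illustration.

```python
def find_stale_srv_records(records, failed_dc_fqdn):
    """Return the SRV records whose target host is the failed DC.

    `records` is an iterable of (record_name, target_host) tuples. How they
    were exported from DNS is outside this sketch (assumption).
    """
    # Compare case-insensitively and ignore a trailing root dot.
    wanted = failed_dc_fqdn.rstrip(".").lower()
    return [
        (name, target)
        for name, target in records
        if target.rstrip(".").lower() == wanted
    ]


# Hypothetical sample data: dc1 is healthy, dc2 is the failed server.
sample = [
    ("_ldap._tcp.dc._msdcs.example.local", "dc1.example.local."),
    ("_ldap._tcp.dc._msdcs.example.local", "DC2.example.local."),
    ("_kerberos._tcp.dc._msdcs.example.local", "dc2.example.local"),
    ("_ldap._tcp.gc._msdcs.example.local", "dc1.example.local."),
]

stale = find_stale_srv_records(sample, "dc2.example.local")
for name, target in stale:
    print(f"remove: {name} -> {target}")
```

The script only reports candidates; the actual deletion would still be done in DNS Manager as described in the list.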

Once you have completely removed all of the objects from your environment, you can bring up a new server and promote it to a domain controller, and it will replicate its data from a healthy DC.

Below is a link on how to use NTDSUTIL to clean up the metadata and remove the object from AD. http://technet.microsoft.com/en-us/library/cc816907(v=ws.10).aspx
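As a rough illustration of what the metadata cleanup involves, the sketch below builds the distinguished name of the failed DC's server object and lists the lines one would type at the ntdsutil prompts. The server name `DC2`, the site `Default-First-Site-Name`, and the domain `DC=example,DC=local` are hypothetical placeholders, and the exact prompt sequence should be checked against the linked article before use. The script only prints text; it does not run ntdsutil.

```python
def server_object_dn(server, site, forest_root_dn):
    """Build the DN of a DC's server object in the Configuration partition."""
    return (
        f"CN={server},CN=Servers,CN={site},"
        f"CN=Sites,CN=Configuration,{forest_root_dn}"
    )


def metadata_cleanup_commands(server, site, forest_root_dn):
    """Return the lines to type, in order, for an ntdsutil metadata cleanup."""
    dn = server_object_dn(server, site, forest_root_dn)
    return [
        "ntdsutil",
        "metadata cleanup",
        f"remove selected server {dn}",
        "quit",  # leave the metadata cleanup prompt
        "quit",  # leave ntdsutil
    ]


# Hypothetical values; substitute the real server, site, and domain.
commands = metadata_cleanup_commands(
    "DC2", "Default-First-Site-Name", "DC=example,DC=local"
)
for line in commands:
    print(line)
```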

Will.
Thanks for the reply, Will.  I think I agree with you.

Here is the part I do not understand.  I did not have to seize any roles, since the roles already resided on the other DC.

What is my next step directly after that?

Your link says "Metadata cleanup is a required procedure after a forced removal of Active Directory Domain Services". I am not sure I have done that part ("forced removal") yet.
Will

In the text of the link it says "Expand the domain of the domain controller that was forcibly removed ... "

That's the part I am unsure of.  Do I have to do an action to forcibly remove the DC or is the failure itself the removal?

Thanks
If the FSMO role holder is dead, you need to seize the roles and perform metadata cleanup. That is not your issue.

If a secondary ("backup") DC has failed, you still need to perform the metadata cleanup and remove that DC from Active Directory and from Sites and Services. As stated, you also need to update the DNS server settings in your DHCP scopes. Removing the stale SRV records is also required.

Basically, all you need to do in NTDSUTIL is the metadata cleanup; then proceed with removing the computer objects for that server.

Will.
Ok, good.

From what I see from the link provided, I can do both things from Active Directory Users and Computers, and I do not need to use NTDSUTIL, as metadata will be cleaned up as well through the GUI.  

("When you use Remote Server Administration Tools (RSAT) or the Active Directory Users and Computers console (Dsa.msc) that is included with Windows Server 2008 or Windows Server 2008 R2 to delete a domain controller computer account from the Domain Controllers organizational unit (OU), the cleanup of server metadata is performed automatically. Previously, you had to perform a separate metadata cleanup procedure.")

Would you agree with that assessment?
ASKER CERTIFIED SOLUTION
Will Szymkowski
Ok, thanks ...
Restoring a VM image of a failed domain controller is not a recovery option supported by Microsoft or VM. Both companies recommend against it.

You are on the right track about removing the old DC. You don't need to have it running to delete it.

It is a failed DC so you can safely just delete the computer account for the DC from ADUC.
Using the GUI is the preferred method on AD2008.

Once you've deleted the computer account, you should format the drive, reinstall Windows, and then add the Active Directory Domain Services role.
This article ( https://blogs.technet.com/b/askds/archive/2009/06/05/dc-s-and-vm-s-avoiding-the-do-over.aspx?Redirected=true ) was also very helpful in understanding why this was the right answer.

While fl_flyfishing's answer was correct for a physical machine that had died, the issue with a VM is that between the time of the VM backup and the time the machine died, there may have been (and likely were) USN updates that would leave the VM backup out of sync with the rest of the DCs on the network. Because the failed machine had issued new USNs after the backup was taken, the restored copy cannot simply catch up with the other machines: it would reuse USNs that its partners have already accepted, and they would see a conflict.
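To make the USN problem concrete, here is a deliberately simplified toy model, not how AD replication is actually implemented. Each DC numbers its own changes with an increasing USN, and a partner remembers the highest USN it has already pulled. Real AD also tracks invocation IDs and up-to-dateness vectors, which this sketch ignores; the class and method names are invented for illustration.

```python
class DomainController:
    """Toy DC: numbers its own changes and tracks what it pulled from partners."""

    def __init__(self, name):
        self.name = name
        self.usn = 0              # local update sequence number counter
        self.changes = {}         # usn -> description of a local change
        self.high_watermark = {}  # partner name -> highest USN already pulled

    def write(self, description):
        self.usn += 1
        self.changes[self.usn] = description

    def pull_from(self, partner):
        """Fetch only the partner's changes above the remembered watermark."""
        seen = self.high_watermark.get(partner.name, 0)
        new = [d for usn, d in sorted(partner.changes.items()) if usn > seen]
        self.high_watermark[partner.name] = max(seen, partner.usn)
        return new

    def snapshot(self):
        return self.usn, dict(self.changes)

    def restore(self, snap):
        self.usn, self.changes = snap[0], dict(snap[1])


dc1, dc2 = DomainController("DC1"), DomainController("DC2")

dc2.write("A")
dc2.write("B")
before_image = dc1.pull_from(dc2)  # DC1 has now seen DC2 up to USN 2

image = dc2.snapshot()             # the VM image is taken here

dc2.write("C")
dc2.write("D")
after_image = dc1.pull_from(dc2)   # DC1 has now seen DC2 up to USN 4

dc2.restore(image)                 # DC2's USN counter rolls back to 2
dc2.write("E")                     # reuses USN 3
dc2.write("F")                     # reuses USN 4
after_restore = dc1.pull_from(dc2) # nothing is above the watermark of 4

print(before_image, after_image, after_restore)
```

In this model the changes made after the restore reuse USNs 3 and 4, so DC1 never asks for them and the two DCs silently diverge. That is the kind of inconsistency the thread warns about when a DC is brought back from an old image.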

Thanks to each of you for participating.
Lesson Learned.  Thanks everyone.