Solved

DC crash damage control

Posted on 2010-09-24
Last Modified: 2012-05-10
I came in in the middle of a project and inherited all of this.
We have three DC servers. PY, the original primary server, crashed this week after being demoted to a member server. PS and AST are both backup DCs with Global Catalog enabled on both. We are having some minor issues now that the demoted server has actually crashed.
1: One of the DCs, in a separate location and on a separate subnet, is now complaining that it can't find a logon server.
2: On the PDC, which is on the same subnet as the old PDC, I am now getting an error message saying that the DNS record for PY (on PY) cannot be deleted automatically and must be deleted manually.
This probably has a good deal of bearing on the issue: I demoted the original PDC, PY, not knowing that it was an SBS installation. I realize that SBS cannot be demoted and must serve as a DC, but having worked almost entirely with the server remotely, I missed the fact that it was an SBS DC.
Three questions:
How bad is this, long and short term?
What is the immediate impact on logons, etc.?
What are the 3 highest priority steps I need to take to resolve this issue and stabilize the network?
Question by:evault
8 Comments
 

Expert Comment

by:pwindell
It's not all that bad as long as you don't panic and do things that you should not do and make things worse.

1. Do not build a new DC by the same name [yet].

2. Adjust the TCP/IP specs of any machines that are acting up so that the dead DC is not listed as their DNS server, or is at least at the bottom of the list so it is not the primary.  Any "happy" machines that aren't complaining, leave them alone.  If the old DC (its name and IP#) is never coming back as a DC, then it needs to be removed from the TCP/IP specs of every machine everywhere.  This is where you learn to love DHCP.

3. Do the normal expected Metadata Clean up to remove traces of the old DC.  Since it was demoted previously,...there might only be bits and pieces left behind.  This process should include deleting the DNS Records.  If replication is working correctly then removing from one DNS will replicate the change to the others.

How to remove data in Active Directory after an unsuccessful domain controller demotion
http://support.microsoft.com/kb/216498

4. Now,..if you wanted to,...you can now add back in a new/rebuilt DC with the same name and IP# as the old.

5. Multiple sites.........  

There should already be a DC at every location that is separated from the rest by a slow WAN link.  Active Directory Sites and Services should already be used to manage AD replication over the WAN link, and it will also ensure that users use the DC/DNS at their own location rather than jumping the WAN link.

Clients in the remote Sites should use their own DC/DNS in their TCP/IP specs for DNS.  You can then add one other as a secondary no matter where it is located.  It is pretty much a waste of time having more than two listed.
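
A minimal sketch of the DNS ordering described in steps 2 and 5, assuming you keep the server addresses in a simple list. The function name and all IP addresses here are hypothetical and only illustrate the rule: local site's DC first, the dead DC removed, and no more than two entries.

```python
def client_dns_servers(local_dc, other_dcs, retired, limit=2):
    """Build a client's DNS list: local site's DC first, retired servers dropped."""
    ordered = []
    for ip in [local_dc, *other_dcs]:
        # Skip the dead DC's address and any duplicates.
        if ip in retired or ip in ordered:
            continue
        ordered.append(ip)
    # More than two entries is rarely useful, so cap the list.
    return ordered[:limit]


# Hypothetical addresses: 10.0.2.10 is the remote site's own DC,
# 10.0.1.10 is the DC at the main site, 10.0.1.5 is the retired DC.
print(client_dns_servers("10.0.2.10", ["10.0.1.5", "10.0.1.10"], {"10.0.1.5"}))
```

With these made-up addresses the call prints `['10.0.2.10', '10.0.1.10']`. In practice the same ordering would normally be pushed out through the DHCP scope options for each site rather than set by hand.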

Expert Comment

by:Cliff Galiher
> How bad is this long and short term?
Unknown. A crashing DC *can* corrupt AD when it goes. Could be trivial, could be very big. Take the last good backup before the crash, make a copy of it, and keep it handy.
> What is the immediate impact on log ons, etc.
Should be minimal, but again, the health of AD is intimately tied to the answer, and we haven't determined that yet.
> What are the 3 highest priority steps I need to take to resolve this issue and stabilize the network?
Before I give you the steps, I need to make a distinction. Active Directory does not have primary domain controllers. Purge PDC from your vocabulary. This is important for understanding the steps I will provide:
1) SBS requires that it hold all FSMO roles. So by demoting it unceremoniously, which SBS doesn't expect, those roles may now be in limbo. Go to another DC and *seize* (not transfer) the FSMO roles.
2) Perform the Microsoft metadata cleanup procedure to remove any trace of the failed DC.
3) Download the Microsoft IT Health Scanner and run it. It categorizes issues as critical, warning, etc., so it will give you a better idea of what to focus on first. Fix every issue it reports. Then move on to using dcdiag (a command line tool).
While working through that process, you'll probably find out whether AD has become significantly damaged and whether you need that backup. It will show up as a number of critical errors too great to realistically fix, or as inexplicable and unavoidable permissions or read/write errors when you attempt to fix issues. It is rare, but it does happen.
 
-Cliff


Expert Comment

by:pwindell
Awww!  SBS!   I didn't see it was SBS.  Sorry,..I don't go near SBS.  SBS is a disaster even before the disaster happens,...particularly in the area of disaster recovery.
I'll just let cgaliher handle it since he is probably more familiar with SBS than I.

Author Comment

by:evault
pwindell:
Not to worry...I haven't panicked. I know better than to try to fix things while in an emotionally charged state. Thanks for the reminder, though.
1:)   Doubt I will use the same DC name, as it was a hinky name that didn't really make any sense.
2:)  All DHCP is running off the AST and the PS DCs, both of which have had PY removed from their DNS. All other machines that use any sort of static DNS have had the reference to PY changed to either AST or PS, depending on their geographic location and therefore subnet. So, as far as I can tell, since all of this was addressed before the crash, all machines supposedly have had every trace of PY removed from their DHCP and DNS. I will double check, however, to make sure. Oh, I do love DHCP <(:-).
3:)  I believe replication is working properly as I had previously replicated all of the AD info from the PDC (PY) to the other DCs in the domain. (Yes, I know many feel there is no such thing as a PDC, but many other experts, authors and even Microsoft disagree). I have printed the article and will properly review it before making any suggested changes.
4:)  That domain name and IP address are never coming back, although we may build another DC for backup purposes.
5:)  Just set that up last week before I demoted the PDC, PY. Double checked it before I demoted PY to make sure that replication was working and that all AD objects from both sites had been replicated to the other site. At that time it seemed that everything was good at both sites.
>> Clients in the remote Sites should use their own DC/DNS in their TCP/IP specs for DNS. << Most, if not all, PCs at each site use DHCP which is the DC for that particular site. DHCP at each site points to the DC at that specific site for DNS, so I should be good, right?
>> You can then add one other as a secondary no matter where it is located.  It is pretty much a waste of time having more than two listed.<< Do you mean more than two listed at each specific site/subnet?
 
cgaliher:
I seized FSMO roles about a month ago in preparation for this. This wasn't as unceremonious as it looks. A lot of prep went into this. Unfortunately I was not the one to set it up and there was (as usual) no documentation. How, though, can I check to make sure the FSMO roles have been handled properly?
Thanks for your assistance! Both of you! I will get back with another post in about a day or so to let you know how the 'repair' went.


Expert Comment

by:Cliff Galiher
The IT Health Scanner will check the roles and check that the server expected to hold them still responds. So advice-wise, unchanged. You will have that answer if you run the checks.

Accepted Solution

by:ChiefIT
Seizing the roles is probably not necessary. In a mixed environment, the SBS has to be the FSMO role holder (this is true). But you can have an alternate. If it was configured as a mixed environment and an alternate FSMO role holder was chosen, the FSMO roles may already exist on another DC.

So, to take the least invasive approach to fixing this problem, let's test the FSMO role holder to see if the roles exist on a remaining DC.

Go to the command prompt and type:
dcdiag /test:FsmoCheck

If the roles are on one DC, then all you have to do is a DNS metadata cleanup and possibly an FRS metadata cleanup... (Definitely DNS metadata because of that existing DNS record).
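
As a rough illustration of the "are the roles on one DC" check, here is a small sketch that parses the text printed by `netdom query fsmo`. The role labels are an assumption based on typical output from newer Windows Server versions (older releases word some of them differently, so adjust the tuple), and the `example.local` host names are made up.

```python
# Role labels assumed from typical `netdom query fsmo` output; adjust if yours differ.
ROLES = (
    "Schema master",
    "Domain naming master",
    "PDC",
    "RID pool manager",
    "Infrastructure master",
)


def parse_fsmo(output, roles=ROLES):
    """Map each role label to the host name printed after it."""
    holders = {}
    for line in output.splitlines():
        text = line.strip()
        for role in roles:
            rest = text[len(role):]
            # The label must be followed by whitespace, then the host name.
            if text.startswith(role) and rest[:1].isspace():
                holders[role] = rest.strip().lower()
    return holders


def roles_on_one_dc(holders, roles=ROLES):
    """True when every role was found and all of them name the same host."""
    return all(r in holders for r in roles) and len(set(holders.values())) == 1


# Hypothetical output pasted from a command prompt.
sample = """Schema master               ast.example.local
Domain naming master        ast.example.local
PDC                         ast.example.local
RID pool manager            ast.example.local
Infrastructure master       ast.example.local
The command completed successfully."""

print(roles_on_one_dc(parse_fsmo(sample)))
```

If any role is missing from the output, or still names the demoted server, the function returns False and the role holders deserve a closer look before any cleanup.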

FRS appears to be working. If any DC on the FRS replication group goes down, usually FRS stops and you will get errors in the FRS event logs in the 13000's.
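
To make the "13000's" check concrete, here is a small sketch that filters exported event log entries down to FRS events in that range. The dictionary layout is hypothetical (it depends on how you export the log), and the sample entries are invented for illustration; `NtFrs` is assumed to be the event source name for the File Replication Service.

```python
def frs_events_in_13000s(events):
    """Return FRS events whose ID falls in the 13000-13999 range."""
    return [
        e for e in events
        if e["source"] == "NtFrs" and 13000 <= e["event_id"] <= 13999
    ]


# Hypothetical entries exported from the event logs.
log = [
    {"source": "NtFrs", "event_id": 13508},
    {"source": "NtFrs", "event_id": 13568},
    {"source": "DNS", "event_id": 4013},
]

for event in frs_events_in_13000s(log):
    print(event["event_id"])
```

An empty result is consistent with FRS still replicating, although it is not proof on its own.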

I think you gracefully demoted the SBS machine and I think the FSMO's are good as well as FRS. But there resides a DNS SRV record that you might have to delete on the remaining Servers.
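
A hedged sketch of how leftover DNS records for the old server could be spotted, assuming you have exported the zone into (owner, type, target) tuples. That tuple layout, the function name, and the `example.local` names are all assumptions for illustration, not output from any particular tool.

```python
def stale_records(records, retired_host):
    """Return records whose target is the retired host (short name or FQDN)."""
    retired = retired_host.lower().rstrip(".")
    stale = []
    for owner, rtype, target in records:
        name = target.lower().rstrip(".")
        # Match "py" exactly or "py.<domain>", but not e.g. "pyramid".
        if name == retired or name.startswith(retired + "."):
            stale.append((owner, rtype, target))
    return stale


# Hypothetical records from an exported zone.
zone = [
    ("_ldap._tcp.dc._msdcs.example.local", "SRV", "ast.example.local."),
    ("_ldap._tcp.dc._msdcs.example.local", "SRV", "py.example.local."),
    ("_kerberos._tcp.example.local", "SRV", "ps.example.local."),
]

for record in stale_records(zone, "PY"):
    print(record)
```

Anything this turns up is only a candidate for manual deletion; confirm on a DNS server before removing it, and let replication carry the change to the other servers.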

Expert Comment

by:pwindell
> 3:)  I believe replication is working properly as I had previously replicated all of the AD info from the PDC (PY) to the other DCs in the domain. (Yes, I know many feel there is no such thing as a PDC, but many other experts, authors and even Microsoft disagree). I have printed the article and will properly review it before making any suggested changes.
It is a PDC FSMO Role,...not the same thing as a PDC, whose life ended with NT4.  You can find an expert or an author that will say anything you want to hear if you look for them.  MS is not monolithic,...for every MS guy you find that says one thing I can find more MS guys that say the opposite.  For example,  you can find MS guys who might say that the Forefront TMG should never be a Domain Member,...and I can find you a couple MS guys who would slap the first MS guy for saying that.
> Most, if not all, PCs at each site use DHCP which is the DC for that particular site. DHCP at each site points to the DC at that specific site for DNS, so I should be good, right?
It would "get by",...it would "survive",...but it is not complete.  Use Active Directory Sites and Services,...that is what it was designed for, hence the name it is called by.

Author Closing Comment

by:evault
FSMO roles are good. Thanks much. I still have some minor errors: FRS has an error or two and the system log has an error, but other than that, after running dcdiag /fix, everything looks good. Looks like I dodged a bullet.