Solved

Windows 2003 DC failover

Posted on 2012-03-29
Last Modified: 2012-03-31
I have a single-domain Windows 2003 network with redundant domain controllers. If the first DC in the domain were to die irreparably, would one of the remaining DCs fully take over, and which one would it be - the second one installed, assuming I have three, let's say? If the DC that died just did DC - no other services - would anybody notice? Would I have to do something to the other DCs? If it did have services such as DNS or DHCP, would the other servers need to have been running 'secondary' versions of them, and if the 'first' one died would the secondary ones take over so that nobody would notice? Or would I have to do some additional work? The basic question is: how transparent is DC redundancy? Is there ever a need to try to restore one - same name/same function - if I have other DCs, or can I just blow it away?
Question by:lineonecorp
14 Comments
 
LVL 35

Assisted Solution

by:Joseph Daly
Joseph Daly earned 56 total points
The answer to your question is maybe.

If your DC held the FSMO roles for your domain you would need to seize them to another working DC and then perform a metadata cleanup on your AD environment.

You should be replicating DNS to all DCs in your domain so that if one fails you will still be able to resolve DNS.

If the failed DC held the DHCP role you would need to add DHCP server role to another server before clients would be able to get IP addresses. You may have some issues between old and new DHCP leases if you did this.

MS does recommend an 80/20 DHCP scope split between two servers if you want some kind of DHCP redundancy.
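To make the 80/20 idea concrete, here is a minimal Python sketch that carves one address range into consecutive shares. The function name `split_scope` and the 192.168.1.x range are made up for illustration. In a real deployment each DHCP server is usually given the whole scope plus an exclusion range covering the other server's share, so treat the output as the share each server is allowed to lease, not as two separate scopes.

```python
import ipaddress

def split_scope(first, last, weights):
    """Split an inclusive IPv4 range into consecutive shares sized by weights."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    total = end - start + 1
    weight_sum = sum(weights)
    shares = []
    cursor = start
    for i, weight in enumerate(weights):
        if i == len(weights) - 1:
            size = end - cursor + 1          # last server takes the remainder
        else:
            size = total * weight // weight_sum
        shares.append((str(ipaddress.IPv4Address(cursor)),
                       str(ipaddress.IPv4Address(cursor + size - 1))))
        cursor += size
    return shares

# Hypothetical 200-address range split 80/20 between two servers.
for server, share in zip(["dhcp1", "dhcp2"],
                         split_scope("192.168.1.10", "192.168.1.209", [80, 20])):
    print(server, share)
```

The same function handles other ratios, for example `[50, 50]` for two servers or `[1, 1, 1]` for three.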

Any other questions let me know
 
LVL 57

Assisted Solution

by:Mike Kline
Mike Kline earned 112 total points
You would have to make sure that the clients have the IP of the other DCs/DNS servers in their configuration.

Also make sure the other boxes are global catalogs.  

You will want to read up on DC stickiness: http://www.frickelsoft.net/blog/?p=278

If your DC1 holds all the FSMO roles and it crashes hard, you will have to seize the roles and clean up the dead DC (search for "metadata cleanup").

Clients should continue to work ok

Thanks

Mike
 
LVL 29

Assisted Solution

by:pwindell
pwindell earned 111 total points
Bottom line here....

Multiple DCs are not for the purpose of transparent Failover.

They are for the purpose of AD database redundancy, so that you do not lose the database.  But if a DC goes down (any DC anywhere) there is likely going to be at least some amount of disruption somewhere.
 
LVL 57

Assisted Solution

by:Mike Kline
Mike Kline earned 112 total points
I've seen a lot of environments that have overdone things and have many DCs in their hub sites. When one of those goes down, many times there is no disruption.

Thanks

Mike
 
LVL 26

Assisted Solution

by:Leon Fester
Leon Fester earned 110 total points
DC failover is largely taken care of by redundancy, except when you've got hard-coded references to your DCs.
DNS, for example, is tied to the IP address of a DC, so if that DC fails, anything that references that IP address will fail.
Similarly, WINS would also be affected.
DHCP, however, if built on the 80/20 or any other split-scope ratio, will continue to service DHCP requests.
Authentication has a built-in algorithm to find the next available domain controller if the first DC is not responding.
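
As a rough illustration of "find the next available domain controller", the Python sketch below shows two pieces: the DNS SRV names that clients look up to discover DCs, and a simple try-the-next-one loop. This is not the real DC Locator, which also uses SRV priority/weight and an LDAP ping; the function names, the `corp.example` domain and the `dc1`/`dc2`/`dc3` host names are all hypothetical.

```python
import socket

def locator_srv_names(domain, site=None):
    """SRV names used to discover DCs, site-specific name first."""
    names = []
    if site:
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")
    return names

def ldap_port_open(host, timeout=2.0):
    """Crude liveness probe: can a TCP connection be opened to LDAP (port 389)?"""
    try:
        with socket.create_connection((host, 389), timeout=timeout):
            return True
    except OSError:
        return False

def first_responsive(candidates, probe=ldap_port_open):
    """Return the first candidate DC that answers the probe, or None."""
    for host in candidates:
        if probe(host):
            return host
    return None

print(locator_srv_names("corp.example", site="Default-First-Site-Name"))

# Simulate dc1 being dead with a fake probe, so no network access is needed.
print(first_responsive(["dc1", "dc2", "dc3"], probe=lambda host: host != "dc1"))
```

Anything that bypasses this kind of discovery, such as an application pointed at one DC's name or IP address, gets no failover at all, which is the hard-coded-reference problem described above.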

Similarly with the FSMO roles: each one is explicitly assigned to a specific DC.
If that DC falls over, then you've lost that role until you either restore the DC or seize the role.
It's not an automated process, so manual intervention is required.
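
For a feel of what that manual intervention looks like, the sketch below only builds (and prints) an `ntdsutil` command line for seizing roles onto a surviving DC; it does not run anything. The role names are the ones I know from the Windows Server 2003 `ntdsutil` menus, and passing menu commands as arguments is how I understand `ntdsutil` can be scripted, so check both against `ntdsutil`'s built-in help before relying on them. `DC2` and the helper names are hypothetical.

```python
# ntdsutil "roles" menu commands as named on Windows Server 2003.
# Assumption: newer releases shorten "domain naming master" to "naming master".
SEIZE_COMMANDS = {
    "pdc": "seize pdc",
    "rid": "seize rid master",
    "infrastructure": "seize infrastructure master",
    "schema": "seize schema master",
    "naming": "seize domain naming master",
}

def build_seize_command(target_dc, roles):
    """Build the ntdsutil argument list that seizes the given roles onto target_dc."""
    args = ["ntdsutil", "roles", "connections",
            f"connect to server {target_dc}", "quit"]
    for role in roles:
        args.append(SEIZE_COMMANDS[role])
    args += ["quit", "quit"]   # leave the roles menu, then leave ntdsutil
    return args

def as_command_line(args):
    """Quote multi-word arguments so the line can be pasted into cmd.exe."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)

print(as_command_line(build_seize_command("DC2", ["pdc", "rid"])))
```

Seizing is only half the job: the dead DC's objects still have to be removed afterwards with a metadata cleanup, as mentioned earlier in the thread.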

Regarding restoring one with the same name/functions:
It's not too important in your environment since you've already got 3 DC's.
What might be helpful is to restore/rebuild a DC on the same IP address to avoid reconfiguration of DHCP/DNS/WINS or any static mappings in applications.

In a multi-domain-controller environment, it's often faster, cleaner and simpler to completely remove a failed DC, rebuild it, and then run DCPROMO.
All you're doing is adding a new DC to an existing domain structure.
 
LVL 29

Assisted Solution

by:pwindell
pwindell earned 111 total points
To mkline71:

That's true.
It usually means that there wasn't anything actually using the one that went down.  DNS clients always try to use the same DNS server they were able to use the "last time", and Exchange will always try to use the same GC it used last time as well, even if all of the DCs are also GCs.  So it means that nothing had previously been using the one that went down, so nothing "missed" it not being there.

But I have never been that lucky. Things always act up if any of my DCs go down.
 

Author Comment

by:lineonecorp
Thanks for the 'storm' of answers. One point that keeps coming up - 'seize the FSMO role'. What is involved in doing that if the original DC is dead? And what is the time critical aspect of it? What happens when a domain doesn't have FSMO for a few hours/days? I thought the whole point of having multiple DC's was that things chugged along when the 'primary' went down - what doesn't chug when the DC holding the FSMO role disappears? Is there any way to set up the domain so there is a 'secondary' FSMO server?

 
LVL 26

Assisted Solution

by:Leon Fester
Leon Fester earned 110 total points
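There are 5 FSMO roles, and each has exactly one holder at a time: the Schema Master and Domain Naming Master exist once per forest, and the RID Master, PDC Emulator and Infrastructure Master exist once per domain. So no, you cannot set up a 'secondary' FSMO role holder.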

Of the 5 roles, the PDCe role is used most on your domain.

Have a read through these two links for more understanding about FSMO roles.
http://blogs.technet.com/b/askds/archive/2011/02/25/friday-mail-sack-xxxxxxxxxxxx.aspx
Recommended from that blog:
http://technet.microsoft.com/en-us/library/cc780487(WS.10).aspx
 
LVL 95

Assisted Solution

by:Lee W, MVP
Lee W, MVP earned 55 total points
First, a question:

Why haven't you tested this in your environment?  Or AT LEAST a test environment?  It's INSANELY EASY to test - pull the network cable out of a DC - there - it's failed.  Now how does the rest of the network work?

We can tell you what we THINK will happen, but we don't know the intricacies of your environment and we're not looking at them beyond what you're choosing to share and we're remembering to ask.

IN THEORY, you can lose a DC and not have any noticeable problem until and unless you try to add another DC.  It has been my experience that if there's a DC that's unreachable, at least in a site, then promoting a new DC can fail.  Admittedly, the last time I saw that was in Windows 2000's AD, but I suspect it largely holds true today.  Even if it doesn't you WANT to clean things up when a failure happens and not later*.

If you don't understand the FSMO roles then you should not be the person responsible for maintaining Active Directory (sorry to be blunt, but that's how I feel).  If you're trying to learn, great, but the way I'm reading the question, it seems you are the responsible party.  I also want to be clear - I'm not suggesting you don't have an ability to learn and excel, but if these are things you don't understand, you should be building test networks, taking classes, watching videos, and learning on a network that is NOT running a business.

The FSMO roles handle critical functions and coordinate things between all DCs.  For example, every user and computer has a Security ID (SID), which ends in a Relative ID (RID).  The SID is assigned by the DC that creates the user or computer.  These IDs MUST be unique on the network or you'd have serious problems.  The RID master (Relative ID master) allocates blocks of RIDs to each DC.  This way, each DC has its own unique set of IDs to assign to accounts created on that DC that will not conflict with other DCs.  When the IDs get low, the DC asks the RID master for more IDs.  So what happens if the DC designated as the RID master is down for a day?  Unless you're a business with thousands of employees adding dozens of new ones per day with their computers, probably nothing.  But when that supply decreases to 0 (and in a large business that could be days, while in a small business that could be MONTHS or even YEARS), you won't be able to create new accounts anymore UNTIL the RID master is restored.
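
The RID behaviour described above can be modelled in a few lines of Python. This is a toy simulation, not how AD is implemented: the class names and the domain SID `S-1-5-21-1111-2222-3333` are invented, the block size is shrunk to 3 so the effect is visible (500 is the commonly cited default), and a real DC asks for a new block before its pool is completely empty.

```python
from collections import deque

class RidMaster:
    """Hands out non-overlapping blocks of RIDs while it is online."""
    def __init__(self, block_size=500, next_rid=1100):
        self.block_size = block_size
        self.next_rid = next_rid
        self.online = True

    def allocate_block(self):
        if not self.online:
            raise ConnectionError("RID master unreachable")
        block = range(self.next_rid, self.next_rid + self.block_size)
        self.next_rid += self.block_size
        return block

class DomainController:
    """Creates accounts from its local RID pool, refilling from the master."""
    def __init__(self, name, master, domain_sid="S-1-5-21-1111-2222-3333"):
        self.name = name
        self.master = master
        self.domain_sid = domain_sid
        self.pool = deque()

    def create_account(self):
        if not self.pool:
            # Simplification: refill only when the local pool is empty.
            self.pool.extend(self.master.allocate_block())
        return f"{self.domain_sid}-{self.pool.popleft()}"

master = RidMaster(block_size=3)
dc1 = DomainController("dc1", master)
dc2 = DomainController("dc2", master)
print(dc1.create_account(), dc2.create_account())

master.online = False                 # the RID master dies
print(dc1.create_account())           # still works from dc1's local pool
print(dc1.create_account())           # last RID in the pool
try:
    dc1.create_account()              # pool empty and no master to refill it
except ConnectionError as err:
    print("cannot create account:", err)
```

The point of the model is the delay: losing the RID master breaks nothing immediately, and account creation only fails on a given DC once that DC's own pool runs dry.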

If the RID master is LOST - as in the DC that it was on fails and cannot be restored easily - then you must seize the role.  Seizure should ONLY be done once you are CERTAIN the role holder will never come back online.  And if you're not certain, once it's seized, the policy MUST be that the failed DC will NEVER be restored, even if you suddenly figure out it was an easy fix.  When a role is seized, depending on the role, things happen to ensure it doesn't mess up the network with duplicate or bad information.  In the case of the RID master, the seizure dramatically increases the RID count so the odds of actually handing out a duplicate block of RIDs are astronomically low.

I strongly recommend going to your management and demanding (as much as you can demand) a class in Active Directory and a reasonably powerful machine you can use for VMs to set up a test network to play with.  And for more information on the FSMO master roles, see:
http://www.petri.co.il/understanding_fsmo_roles_in_ad.htm
http://en.wikipedia.org/wiki/Flexible_single_master_operation
http://www.petri.co.il/seizing_fsmo_roles.htm
http://www.petri.co.il/transferring_fsmo_roles.htm

Some services (like Exchange) may be less resilient to DC failures.  Exchange relies on a Global Catalog (GC) server and if the server it's using becomes unavailable, it's possible that Exchange will have connectivity issues for a while.  Typically, within 30 minutes Exchange will figure out there's a problem and look for another GC, but there can be a disruption.

Personally, I recommend for any site without a strongly knowledgeable AD admin, a minimum and maximum of 2 DCs, both of which should be Global Catalogs and DNS servers.

For DHCP, I'd probably do a 50/50 split scope or a 33/33/33 split scope amongst three servers, but no more than 3 servers.

Finally, once you're done, especially if you haven't before, I recommend running DCDIAG on the existing DCs (I usually use the /c /e /v switches) to verify the health of AD and then address any errors that need addressing (some may be normal or even expected in certain environments).

BTW, I'm sure once you learn it, you'll be a fine AD admin... but no one should ever learn in a production environment and learning through a few questions on AD here is not sufficient in my opinion - there are BOOKS on this stuff that aren't complete.
 
LVL 23

Accepted Solution

by:
Suliman Abu Kharroub earned 56 total points
For the FSMO roles and the impact on Active Directory if one of them is lost, please read this article (the best one I have ever read about FSMO):

http://www.experts-exchange.com/Software/Server_Software/File_Servers/Active_Directory/A_2796-Demystifying-the-Active-Directory-FSMO-Roles.html
 
LVL 29

Expert Comment

by:pwindell
Well said, Lee!  Particularly on having more than one but fewer than 3 DCs, and on the DHCP scope splitting.
 

Author Comment

by:lineonecorp
dvt_localboy, Sulimanw: Thanks for the links. They were quite useful.

leew:  Thanks for the lengthy explanation. As to why I don't just pull out the plug - well, I think all the answers on here are the reason: seeing that things aren't working is different from knowing why they might not be working. If I pulled out the cable and things didn't work, I would just end up starting the question with 'My DC went down' and then asking why - and then getting pretty much all the insight you and the others so readily provided. As far as reading/watching videos/etc. - you should not assume that I haven't, as it seems you are. I have done all that - but there is a difference between thinking you understand and actually understanding.  I think I know what I read, but the only way I can get confirmation that my understanding is correct is by posing scenarios and asking others what they think will happen in each scenario.  And of course, not everything is covered in manuals/lectures - there are all kinds of undocumented surprises and insider tips, e.g. your 'two DCs, no more and no fewer'.  Experts is good for getting people to tell you from the trenches what the official material leaves out and what field experience dictates. This repartee here has sharpened my knowledge more than any additional reading or pulling cables could have - I can now intelligently pull the cable knowing, if I do have problems, whether they are par for the course or something peculiar to my circumstances, and any stuff I read from this point on will be read with a lot more context and certainty. So all in all, as far as this question goes for my needs and style of learning, Experts worked as perfectly as always.
 

Author Closing Comment

by:lineonecorp
Good lively comprehensive discussion.
 
LVL 26

Expert Comment

by:Leon Fester
OMG! Finally somebody who uses EE for learning!
This makes all the effort of sharing and typing in these long discussions so much more worthwhile.
