Solved

Windows 2003 DC failover

Posted on 2012-03-29
Last Modified: 2012-03-31
I have a single-domain Windows 2003 network with redundant domain controllers. If the first DC in the domain were to die irreparably, would one of the remaining DCs fully take over, and which one would it be: the second one installed, assuming I have three, let's say? If the DC that died just did DC work (no other services), would anybody notice? Would I have to do something to the other DCs? If it did have services such as DNS or DHCP, would the other servers need to have had 'secondary' versions of them, and if the 'first' one died, would the secondary ones take over so that nobody would notice? Or would I have to do some additional work? The basic question is: how transparent is DC redundancy? Is there ever a need to try to restore a DC (same name, same function) if I have other DCs, or can I just blow it away?
Question by:lineonecorp
14 Comments
 
LVL 35

Assisted Solution

by:Joseph Daly
Joseph Daly earned 56 total points
ID: 37784195
The answer to your question is maybe.

If your DC held the FSMO roles for your domain you would need to seize them to another working DC and then perform a metadata cleanup on your AD environment.

You should be replicating DNS to all DCs in your domain so that if one fails you will still be able to resolve DNS.

If the failed DC held the DHCP role you would need to add DHCP server role to another server before clients would be able to get IP addresses. You may have some issues between old and new DHCP leases if you did this.

Microsoft does recommend an 80/20 DHCP scope split between two servers if you want some kind of DHCP redundancy.
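To make the 80/20 idea concrete, here is a minimal Python sketch that only computes where the boundary between the two servers' shares would fall. The address range and the function name `split_scope` are made up for illustration. In a real deployment each server is typically configured with the full scope plus an exclusion range covering the other server's share, so treat this as arithmetic, not a configuration tool.

```python
import ipaddress


def split_scope(first_ip, last_ip, primary_share=0.8):
    """Split an inclusive IPv4 range between two DHCP servers.

    Returns ((primary_start, primary_end), (secondary_start, secondary_end)).
    `primary_share` is the fraction handed to the first server (0.8 = 80/20).
    """
    first = ipaddress.IPv4Address(first_ip)
    last = ipaddress.IPv4Address(last_ip)
    total = int(last) - int(first) + 1
    if total < 2:
        raise ValueError("need at least two addresses to split")
    # Clamp so that each server ends up with at least one address.
    primary_count = min(max(round(total * primary_share), 1), total - 1)
    primary_end = first + (primary_count - 1)
    secondary_start = primary_end + 1
    return (first, primary_end), (secondary_start, last)


if __name__ == "__main__":
    # Hypothetical 200-address range; substitute your own scope.
    primary, secondary = split_scope("192.168.1.10", "192.168.1.209")
    print("primary  :", primary[0], "-", primary[1])
    print("secondary:", secondary[0], "-", secondary[1])
```

Passing `primary_share=0.5` gives an even split instead.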

Any other questions let me know
0
 
LVL 57

Assisted Solution

by:Mike Kline
Mike Kline earned 112 total points
ID: 37784206
You would have to make sure that the clients have the IP of the other DCs/DNS servers in their configuration.

Also make sure the other boxes are global catalogs.  

You will want to read on dc stickiness   http://www.frickelsoft.net/blog/?p=278

If your DC1 holds all the FSMO roles and it crashes hard you will have to seize the roles and cleanup the dead DC (search metadata cleanup)

Clients should continue to work ok

Thanks

Mike
0
 
LVL 29

Assisted Solution

by:pwindell
pwindell earned 111 total points
ID: 37784343
Bottom line here....

Multiple DCs are not for the purpose of transparent Failover.

They are for the purpose of AD database redundancy, so that you do not lose the database.  But if a DC goes down (any DC anywhere) there is likely going to be at least some amount of disruption somewhere.
0
 
LVL 57

Assisted Solution

by:Mike Kline
Mike Kline earned 112 total points
ID: 37784379
I've seen a lot of environments that have overdone things and have many DCs in their hub sites; when one of those goes down, many times there is no disruption.

Thanks

Mike
0
 
LVL 26

Assisted Solution

by:Leon Fester
Leon Fester earned 110 total points
ID: 37784384
DCs are pretty much redundant when it comes to failover, except where you've got hard-coded references to a specific DC.
DNS, for example, is tied to the IP address of a DC, so if that DC fails, anything pointing at that IP address will fail.
Similarly, WINS would also be affected.
DHCP, however, if built on the 80/20 or any other split-scope ratio, will continue to service DHCP requests.
Authentication has a built-in algorithm to find the next available domain controller if the first DC is not responding.
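As a rough illustration of "try the next DC if the first one does not answer", here is a small Python sketch. It is not the real Windows DC locator, which relies on DNS SRV records and is site-aware; it simply walks a list of candidates and returns the first one that accepts a TCP connection on the LDAP port (389). The host names and both function names are hypothetical.

```python
import socket


def tcp_probe(host, port=389, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (389 = LDAP)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def first_responding_dc(candidates, probe=tcp_probe):
    """Return the first DC in `candidates` that answers the probe, else None."""
    for dc in candidates:
        if probe(dc):
            return dc
    return None


if __name__ == "__main__":
    # Hypothetical host names; replace with your own DCs.
    dcs = ["dc1.example.local", "dc2.example.local", "dc3.example.local"]
    print("Would use:", first_responding_dc(dcs))
```

The `probe` argument is injectable so the fallback logic can be exercised without a live network.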

The FSMO roles are similar: each one is explicitly assigned to a specific DC.
If that DC falls over, then you've lost that role until you either restore the DC or seize the role.
It's not an automated process, so manual intervention is required.
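For reference, the manual intervention is usually done with `ntdsutil` on a surviving DC. The Python sketch below only builds and prints the command line; it does not run anything. The role names are the Windows Server 2003 `ntdsutil` wording as I understand it (later versions shorten "domain naming master" to "naming master"), and `DC2` is a hypothetical server name, so verify against your own system before using it.

```python
# ntdsutil's names for the five roles (Windows Server 2003 wording; assumption).
SEIZE_COMMANDS = {
    "schema": "seize schema master",
    "naming": "seize domain naming master",
    "rid": "seize RID master",
    "pdc": "seize PDC",
    "infrastructure": "seize infrastructure master",
}


def build_seize_command(target_dc, roles):
    """Build the ntdsutil argument list that seizes `roles` onto `target_dc`."""
    args = ["ntdsutil", "roles", "connections",
            f"connect to server {target_dc}", "quit"]
    for role in roles:
        args.append(SEIZE_COMMANDS[role])
    # Leave the "fsmo maintenance" menu, then ntdsutil itself.
    args += ["quit", "quit"]
    return args


def as_command_line(args):
    """Quote arguments containing spaces so the line can be pasted into cmd."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


if __name__ == "__main__":
    # DC2 is a hypothetical surviving domain controller.
    print(as_command_line(build_seize_command("DC2", ["pdc", "rid"])))
```

Printing rather than executing is deliberate: a seizure should be a conscious, one-off decision, followed by metadata cleanup of the dead DC.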

Regarding restoring one with the same name/functions:
It's not too important in your environment since you've already got 3 DCs.
What might be helpful is to restore/rebuild a DC on the same IP address to avoid reconfiguration of DHCP/DNS/WINS or any static mappings in applications.

In a multi-domain-controller environment, it's often faster, cleaner and simpler to completely remove a failed DC, rebuild it, and then run DCPROMO.
All you're doing is adding a new DC to an existing domain structure.
0
 
LVL 29

Assisted Solution

by:pwindell
pwindell earned 111 total points
ID: 37784407
To mkline71:

That's true.
It usually means that there wasn't anything actually using the one that went down.  DNS clients always try to use the same DNS server they were able to use the "last time", and Exchange will always try to use the same GC it used last time as well, even if all of the DCs are also GCs.  So it means that nothing had previously been using the one that went down, so nothing "missed" it not being there.

But I have never been that lucky; things always act up if any of my DCs go down.
0
 

Author Comment

by:lineonecorp
ID: 37784531
Thanks for the 'storm' of answers. One point that keeps coming up - 'seize the FSMO role'. What is involved in doing that if the original DC is dead? And what is the time critical aspect of it? What happens when a domain doesn't have FSMO for a few hours/days? I thought the whole point of having multiple DC's was that things chugged along when the 'primary' went down - what doesn't chug when the DC holding the FSMO role disappears? Is there any way to set up the domain so there is a 'secondary' FSMO server?
0
 
LVL 26

Assisted Solution

by:Leon Fester
Leon Fester earned 110 total points
ID: 37784562
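There are 5 FSMO roles: two are forest-wide (Schema Master and Domain Naming Master) and three exist once per domain (RID Master, PDC Emulator and Infrastructure Master). Each role has exactly one holder at a time, so no, you cannot set up a 'secondary' FSMO role holder.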

Of the 5 roles, the PDCe role is used most on your domain.

Have a read through these two links for a better understanding of the FSMO roles.
http://blogs.technet.com/b/askds/archive/2011/02/25/friday-mail-sack-xxxxxxxxxxxx.aspx
Recommended from that blog:
http://technet.microsoft.com/en-us/library/cc780487(WS.10).aspx
0
 
LVL 95

Assisted Solution

by:Lee W, MVP
Lee W, MVP earned 55 total points
ID: 37784632
First, a question:

Why haven't you tested this in your environment?  Or AT LEAST a test environment?  It's INSANELY EASY to test - pull the network cable out of a DC - there - it's failed.  Now how does the rest of the network work?

We can tell you what we THINK will happen, but we don't know the intricacies of your environment and we're not looking at them beyond what you're choosing to share and we're remembering to ask.

IN THEORY, you can lose a DC and not have any noticeable problem until and unless you try to add another DC.  It has been my experience that if there's a DC that's unreachable, at least in a site, then promoting a new DC can fail.  Admittedly, the last time I saw that was in Windows 2000's AD, but I suspect it largely holds true today.  Even if it doesn't you WANT to clean things up when a failure happens and not later*.

If you don't understand the FSMO roles then you should not be the person responsible for maintaining Active Directory (sorry to be blunt, but that's how I feel).  If you're trying to learn, great, but the way I'm reading the question, it seems you are the responsible party.  I also want to be clear - I'm not suggesting you don't have an ability to learn and excel, but if these are things you don't understand, you should be building test networks, taking classes, watching videos, and learning on a network that is NOT running a business.

The FSMO roles handle critical functions and coordinate things between all DCs.  For example, every user and computer has a Security ID (SID) as well as a Globally Unique ID (GUID).  The SID is assigned by the DC that creates the user or computer, and it ends in a Relative ID (RID).  These IDs MUST be unique on the network or you'd have serious problems.  The RID master (Relative ID master) allocates blocks of RIDs to each DC.  This way, each DC has its own unique set of IDs to assign to accounts created on that DC that will not conflict with other DCs.  When the IDs get low, the DC asks the RID master for more IDs.  So what happens if the DC designated as the RID master is down for a day?  Unless you're a business with thousands of employees adding dozens of new ones per day with their computers, probably nothing.  But when that supply decreases to 0 (and in a large business that could be days, while in a small business that could be MONTHS or even YEARS), you won't be able to create new accounts anymore UNTIL the RID master is restored or the role is moved to another DC.
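The behaviour described above can be modelled in a few lines of Python. This is a toy simulation, not how Active Directory is implemented: the class names are invented, the starting RID of 1000 is arbitrary, and the default block size of 500 is my understanding of the usual value, so treat it as an assumption. Real DCs also ask for a fresh block before the current one is completely empty, which this sketch skips.

```python
class RidMaster:
    """Hands out non-overlapping blocks of RIDs while it is online."""

    def __init__(self, block_size=500, first_rid=1000):
        self.block_size = block_size
        self.next_free = first_rid
        self.online = True

    def allocate_block(self):
        if not self.online:
            raise ConnectionError("RID master unreachable")
        start = self.next_free
        self.next_free += self.block_size
        return range(start, start + self.block_size)


class DomainController:
    """Creates accounts from its local RID pool, refilling when it runs dry."""

    def __init__(self, name, rid_master):
        self.name = name
        self.rid_master = rid_master
        self.pool = iter(())  # empty until the first block is requested

    def create_account(self):
        try:
            return next(self.pool)
        except StopIteration:
            # Pool exhausted: this raises ConnectionError if the master is down.
            self.pool = iter(self.rid_master.allocate_block())
            return next(self.pool)


if __name__ == "__main__":
    master = RidMaster(block_size=3)  # tiny block so exhaustion is visible
    dc1 = DomainController("DC1", master)
    print([dc1.create_account() for _ in range(2)])
    master.online = False             # the RID master dies
    print(dc1.create_account())       # still works: one RID left locally
    try:
        dc1.create_account()          # pool empty, master gone
    except ConnectionError as err:
        print("cannot create account:", err)
```

With a realistic block size, a small site could run for a very long time before hitting the final error, which is the point being made above.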

If the RID master is LOST - as in the DC that it was on fails and cannot be restored easily - then you must seize the role.  Seizure should ONLY be done once you are CERTAIN the role holder will never come back online.  And if you're not certain, once it's seized, the policy MUST be that the failed DC will NEVER be restored, even if you suddenly figure out it was an easy fix.  When a role is seized, depending on the role, things happen to ensure it doesn't mess up the network with duplicate or bad information.  In the case of the RID master, it dramatically increases the RID count so the odds of actually handing out a duplicate block of RIDs are astronomically low.

I strongly recommend going to your management and demanding (as much as you can demand) a class in Active Directory and a reasonably powerful machine you can use for VMs to setup a test network to play with.  And for more information on the FSMO master roles, see:
http://www.petri.co.il/understanding_fsmo_roles_in_ad.htm
http://en.wikipedia.org/wiki/Flexible_single_master_operation
http://www.petri.co.il/seizing_fsmo_roles.htm
http://www.petri.co.il/transferring_fsmo_roles.htm

Some services (like Exchange) may be less resilient to DC failures.  Exchange relies on a Global Catalog (GC) server and if the server it's using becomes unavailable, it's possible that Exchange will have connectivity issues for a while.  Typically, within 30 minutes Exchange will figure out there's a problem and look for another GC, but there can be a disruption.

Personally, I recommend for any site without a strongly knowledgeable AD admin, a minimum and maximum of 2 DCs, both of which should be Global Catalogs and DNS servers.

For DHCP, I'd probably do a 50/50 split scope or a 33/33/33 split scope amongst three servers, but no more than 3 servers.

Finally, once you're done, especially if you haven't before, I recommend running DCDIAG on the existing DCs (I usually use the /c /e /v switches) to verify the health of AD and then address any errors that need addressing (some may be normal or even expected in certain environments).
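If you want to skim a long DCDIAG report for problems, a small Python helper along these lines may be useful. It assumes the English-language summary lines look like `......................... DC1 passed test Connectivity`, which is the format I have seen; other versions or locales may differ. The function names are hypothetical, and `run_dcdiag` only works on a Windows machine where `dcdiag` is installed.

```python
import re
import subprocess

# The switches mentioned above: /c comprehensive, /e all servers, /v verbose.
DCDIAG_ARGS = ["dcdiag", "/c", "/e", "/v"]

# Assumed summary-line format: dots, name, "passed|failed test", test name.
RESULT_LINE = re.compile(r"\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(report):
    """Return (name, test) pairs for every 'failed test' line in a report."""
    return [(m.group(1), m.group(3))
            for m in RESULT_LINE.finditer(report)
            if m.group(2) == "failed"]


def run_dcdiag():
    """Run dcdiag and return its text output (Windows admin host only)."""
    done = subprocess.run(DCDIAG_ARGS, capture_output=True, text=True)
    return done.stdout


if __name__ == "__main__":
    # Canned sample so the parser can be tried without a domain.
    sample = (
        "......................... DC1 passed test Connectivity\n"
        "......................... DC2 failed test Replications\n"
    )
    for name, test in failed_tests(sample):
        print(f"{name}: {test} FAILED")
```

A failed line is only a pointer; the verbose output around it still has to be read to decide whether the failure matters in your environment.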

BTW, I'm sure once you learn it, you'll be a fine AD admin... but no one should ever learn in a production environment and learning through a few questions on AD here is not sufficient in my opinion - there are BOOKS on this stuff that aren't complete.
0
 
LVL 23

Accepted Solution

by:Suliman Abu Kharroub
Suliman Abu Kharroub earned 56 total points
ID: 37785134
For the FSMO roles and the impact on Active Directory if one of them is lost, please read this article (the best one I have ever read about FSMO):

http://www.experts-exchange.com/Software/Server_Software/File_Servers/Active_Directory/A_2796-Demystifying-the-Active-Directory-FSMO-Roles.html
0
 
LVL 29

Expert Comment

by:pwindell
ID: 37786985
Well said, Lee!  Particularly on having more than one but fewer than three DCs, and on the DHCP scope splitting.
0
 

Author Comment

by:lineonecorp
ID: 37790207
dvt_localboy, Sulimanw: Thanks for the links. They were quite useful.

leew:  Thanks for the lengthy explanation. As to why I don't just pull out the plug - well, I think all the answers on here are the reason - seeing that things aren't working is different than knowing why they might not be working. If I pulled out the cable and things didn't work  I would just end up starting the question with 'My DC went down' and then asking why - and then pretty well getting all the answers you and the others so readily provided insight into. As far as reading/watching videos/etc. - you should not assume that I haven't as it seems you are. I have done all that - but there is a difference between thinking you understand and actually understanding.  I think I know what I read but the only way I can get confirmation my understanding is correct is posing scenarios and  asking others what they think will happen in that scenario.  And of course, not everything is covered in manuals/lectures - - there are all kinds of undocumented 'unexpected's/ insider tips e.g. your 'two DC's and not more than and not less than'.  Experts is good for getting people to tell you from the trenches what the official material leaves out/what field experience dictates. This repartee here has sharpened my knowledge more than any additional reading or pulling cables could have - I can now intelligently pull the cable knowing if I do have problems whether they are par for the course or something just weird to my circumstances and any stuff I read from this point on will be read with a lot more context and certainty. So all in all, as far as this question goes for my needs and style of learning, Experts worked as perfectly as always.
0
 

Author Closing Comment

by:lineonecorp
ID: 37790210
Good lively comprehensive discussion.
0
 
LVL 26

Expert Comment

by:Leon Fester
ID: 37792123
OMG! Finally somebody who uses EE for learning!
This makes all the effort of sharing and typing in these long discussions so much more worthwhile.
0

Question has a verified solution.