Solved

DC locator / DC Failover

Posted on 2009-05-10
Last Modified: 2012-09-10
Have some questions about the DC locator process and the netlogon cache.

So here's the background. We have three DCs in one of our Active Directory sites. At logon, every client and server locates a DC via the Netlogon service. Say an application server (appserver1.domain.com) chooses DC1.domain.com as the DC it will authenticate with. Once the Netlogon service on appserver1.domain.com has chosen DC1.domain.com, it caches that information, and all subsequent authentication events go through DC1.domain.com. We had some hardware issues on DC1.domain.com that forced us to bring it down (we had to replace some bad memory that kept crashing the server). I figured that since AD/DNS is load balanced and fault tolerant, appserver1.domain.com would try what's in its cache, notice DC1.domain.com was unavailable, do a DNS lookup for the SRV records, and find a new domain controller to authenticate against. Say it would find DC2.domain.com.
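For anyone following along, here is a rough sketch of the DNS side of that lookup. It only builds the SRV record names that domain controllers register in an AD-integrated zone; it does not query DNS. The site name "HQ" is a made-up placeholder, not something from our environment.

```python
def dc_locator_srv_names(domain, site=None):
    """Return the DNS SRV names a client can query to find domain controllers."""
    names = []
    if site:
        # Site-specific record, so a client can prefer DCs in its own site.
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    # Domain-wide record that every DC in the domain registers.
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")
    return names


# "HQ" is a hypothetical site name used only for illustration.
for name in dc_locator_srv_names("domain.com", site="HQ"):
    print(name)
```

Each printed name can be checked by hand with something like nslookup -type=SRV followed by the name, which should list every DC that has registered itself.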



Well, once we took down dc1.domain.com our monitoring tool reported that many services were down. We quickly brought DC1.domain.com up and all services came back up.

So I got to reading, and I found everything under the sun. Some links say that the Netlogon service will NOT go and discover a new DC (http://www.smart-x.com/?CategoryID=171&ArticleID=165). http://support.microsoft.com/kb/314861 says that clients will purge their cache only if the client has cached a DC that is not local to their site. And another link says "yes, clients will rediscover" (http://msdn.microsoft.com/en-us/library/ms675983.aspx).

Here's a good link from an advocate of "yes, clients will fail over" (http://www.improve.dk/blog/2008/03/02/setting-up-and-testing-active-directory-failover); see the section on domain controller stickiness.

I have a long thread posted on Mark Minasi's forum, if anyone is interested: http://web2.minasi.com/forum/topic.asp?TOPIC_ID=30940

Basically I'm just trying to get to the bottom of this. If servers do fail over, then why does our monitoring tool say services are down when we bounce a domain controller? What kinds of checks does a client do to see if a DC is actually servicing authentication requests? Is it a ping only, or some kind of LDAP query? This article says that "yes, clients will fail over" but doesn't go into what kinds of checks the client does to verify: http://msdn.microsoft.com/en-us/library/ms675983.aspx
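To make the "ping versus something more" distinction concrete, here is a minimal sketch of a check that goes one step past ping: it tests whether a DC accepts TCP connections on a few service ports. This is not what Netlogon itself does, it is just a monitoring-side probe. The port list (389 LDAP, 88 Kerberos, 445 SMB) and the function names are my own choices for illustration.

```python
import socket

# Assumed set of ports to probe: LDAP, Kerberos, SMB.
DC_PORTS = {"ldap": 389, "kerberos": 88, "smb": 445}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe_dc(host, ports=DC_PORTS, timeout=2.0):
    """Map each service name to whether its port accepted a connection."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

Calling probe_dc("DC2.domain.com") returns a dictionary of service name to True/False. A DC that still answers ping while it is shutting down may already be refusing some of these connections, which is the kind of gap I'm asking about.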

We use What's Up to monitor our servers. It uses a service account (a domain account) that is an admin on all servers, and it uses this account to get into WMI to test the services. When we bounce a DC, What's Up reports that some services are down, but not all. We have a 5-minute delay between when services go down and when an e-mail is sent, and when we bounce a DC we get e-mails saying services are down.

Any help would be MUCH appreciated,
Question by:esbfern
 
Expert Comment

by:sykojester
ID: 24349696
Is DC1.domain.com holding all the FSMO roles?  If so, that's possibly the reason appserver1 can't find it.  Is DC1.domain.com the ONLY DNS server in your infrastructure?  DNS is required for a machine to find the DC.  No DNS = No SRV records = Nothing working.
 

Author Comment

by:esbfern
ID: 24350066
Ah yes failed to mention that part.  

FSMO: DC1.domain.com does not hold any FSMO roles (we only have three FSMO roles, as we are a child domain). Further, we have no down-level clients, so our clients can authenticate to all DCs. Just out of curiosity, why would holding the FSMO roles on DC1.domain.com and downing DC1.domain.com cause appserver1 to not be able to find it?

DNS: DC1.domain.com does host DNS. Depending on where our member servers and workstations are geographically located, DC1.domain.com is either the machine's primary DNS server or its secondary. When we receive e-mails that services are down after DC1.domain.com is rebooted, the e-mails say that member servers in all locations are down (both where DC1.domain.com is the primary DNS server and where it is the secondary), not just in locations where DC1.domain.com is primary.

I also failed to mention this: while we are receiving e-mails about services being down, I am not 100% positive that the services are actually down (I plan to test this during the next maintenance window). With that said, on the servers where we receive e-mails saying the "Server service" is down, among other services, I can check the security log and see that the account our monitoring tool uses is failing to authenticate. This is what makes me think the member server isn't going out and finding a second domain controller to do its authentication. Rather, it is continuing to try the DC in its cache, which I bet is DC1.domain.com.

To prove this, during the next maintenance window I'm going to enable Netlogon debugging, as this will allow me to see which DC my clients and member servers are caching.
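As a rough sketch of what I plan to run, the helper below assembles the nltest invocations for turning Netlogon debug logging on and off and for checking which DC a member server is using. The helper names are mine; the commands would be run from an elevated prompt on the member server, and I have not yet verified the results in our environment.

```python
import subprocess


def nltest_commands(domain):
    """Build nltest invocations for inspecting the DC a member server uses."""
    return {
        # Turn on verbose Netlogon logging (written to %windir%\debug\netlogon.log).
        "enable_debug": ["nltest", "/dbflag:0x2080ffff"],
        # Show which DC currently holds this machine's secure channel.
        "secure_channel": ["nltest", f"/sc_query:{domain}"],
        # Ask the DC locator for a DC again instead of reusing the cached one.
        "rediscover": ["nltest", f"/dsgetdc:{domain}", "/force"],
        # Turn debug logging back off when finished.
        "disable_debug": ["nltest", "/dbflag:0x0"],
    }


def run(cmd):
    """Run one command on a Windows host and return its text output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
```

For example, run(nltest_commands("domain.com")["secure_channel"]) before and after downing DC1.domain.com should show whether the secure channel moved to another DC.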

Thanks
 
Expert Comment

by:mabthal
ID: 24350758
Try making the other DCs in that site global catalog servers.
 

Author Comment

by:esbfern
ID: 24353489
We have two DCs in the site that are global catalog servers.
 

Author Comment

by:esbfern
ID: 24353573
Having one, two, or zero GCs in the same site shouldn't make a difference, because our DCs hold the most up-to-date, authoritative information for our child domain. GCs just store a limited set of attributes about all records in other domains. Having two DNS servers should be all that is required.
 
Expert Comment

by:mabthal
ID: 24354058
In a native-mode domain, a Global Catalog server is a requirement for logging on to the domain. For this reason, it is advisable to have at least one Global Catalog server in a site. If a Global Catalog is not available in a site and there is another Global Catalog server in a remote site, the server in the remote site can be used for the logon process. If no Global Catalog is available in any site, the logon process proceeds with cached logon information.
 

Accepted Solution

by:esbfern
ID: 25849786
It turned out that when clients try to discover whether a DC is down, the check is a little more than a ping, but not much. So if a DC takes a few minutes to shut down, a client may keep trying to use the DC that is in its cache. We created a shutdown script that turns on the Windows Firewall to stop all traffic from the DC. We then created a startup script that turns traffic back on. Since implementing this, all alerts have stopped.
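Our actual scripts aren't reproduced here, but the idea can be sketched roughly as follows. This version assumes a DC running Windows Server 2008 or later, where netsh advfirewall is available, and assumes the firewall is normally off on the DC. The helper just renders the command lines of two .cmd files that could be assigned as shutdown and startup scripts.

```python
# Assumption: `netsh advfirewall` exists on the DC (Windows Server 2008 or later).
# Shutdown script: enable the firewall and block traffic in both directions.
BLOCK_ALL = [
    ["netsh", "advfirewall", "set", "allprofiles", "state", "on"],
    ["netsh", "advfirewall", "set", "allprofiles", "firewallpolicy",
     "blockinboundalways,blockoutbound"],
]

# Startup script: put the default policy back, then turn the firewall off again
# (assumes the firewall was off before the shutdown script ran).
RESTORE = [
    ["netsh", "advfirewall", "set", "allprofiles", "firewallpolicy",
     "blockinbound,allowoutbound"],
    ["netsh", "advfirewall", "set", "allprofiles", "state", "off"],
]


def batch_script(commands):
    """Render a list of argument lists as the text of a .cmd file."""
    return "\r\n".join(" ".join(cmd) for cmd in commands) + "\r\n"


print(batch_script(BLOCK_ALL))
print(batch_script(RESTORE))
```

The point of blocking traffic first is that clients see the DC as unreachable right away, rather than continuing to send requests to a DC that is partway through shutting down.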