Solved

Simulating DNS, GC failure

Posted on 2014-03-24
Last Modified: 2014-05-29
We have two sites, each with two DCs. Each DC runs DNS and DHCP, and all of them are GCs.

As part of an upgrade I had to shut down one of the DCs, and it appeared that the other DC/DNS server wasn't working: nslookup queries were failing, etc.

Is there a way to test that a DC is working as it should, and is there a way to simulate a GC/DC failure?

Thanks

Question by: CHI-LTD

Accepted Solution by: steedBmaher
Hi,

There is a command-line utility called DCDiag that you can run to test a DC and diagnose failures.

Unfortunately, this utility doesn't simulate a failure, but it can give you insight into errors that already exist.

Here is a link: http://technet.microsoft.com/en-us/library/cc731968.aspx

Regards,

SBM

Author Comment by: CHI-LTD
I ran dcdiag and got lots of errors.

Expert Comment by: steedBmaher
OK, my recommendation, then, would be to run a Google search on the errors you are getting. In most cases, DC errors relate to DNS not being set up correctly.

You could also search Experts Exchange for the errors and see the fixes.
I'm sure most errors relate to one or two misconfigurations, either in the DC settings or in DNS.

Author Comment by: CHI-LTD
Here are the errors:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.
C:\Users\dradministrator>dcdiag
Directory Server Diagnosis
Performing initial setup:
   Trying to find home server...
   Home Server = dr server
   * Identified AD Forest.
   Done gathering initial info.
Doing initial required tests
   Testing server: Hounslow\DR SERVER
      Starting test: Connectivity
         ......................... DR SERVER passed test Connectivity
Doing primary tests
   Testing server: Hounslow\DR SERVER
      Starting test: Advertising
         ......................... DR SERVER passed test Advertising
      Starting test: FrsEvent
         ......................... DR SERVER passed test FrsEvent
      Starting test: DFSREvent
         ......................... DR SERVER passed test DFSREvent
      Starting test: SysVolCheck
         ......................... DR SERVER passed test SysVolCheck
      Starting test: KccEvent
         ......................... DR SERVER passed test KccEvent
      Starting test: KnowsOfRoleHolders
         ......................... DR SERVER passed test KnowsOfRoleHolders
      Starting test: MachineAccount
         ......................... DR SERVER passed test MachineAccount
      Starting test: NCSecDesc
         ......................... DR SERVER passed test NCSecDesc
      Starting test: NetLogons
         ......................... DR SERVER passed test NetLogons
      Starting test: ObjectsReplicated
         ......................... DR SERVER passed test ObjectsReplicated
      Starting test: Replications
         [Replications Check,DR SERVER] A recent replication attempt failed:
            From CHI-AD1-DR to DR SERVER
            Naming Context: DC=DomainDnsZones,DC=domain,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2014-03-20 10:57:49.
            The last success occurred at 2014-03-20 09:57:28.
            1 failures have occurred since the last success.
         [CHI-AD1-DR] DsBindWithSpnEx() failed with error 1722,
         The RPC server is unavailable..
         [Replications Check,DR SERVER] A recent replication attempt failed:
            From CHI-AD1-DR to DR SERVER
            Naming Context: DC=ForestDnsZones,DC=domain,DC=local
            The replication generated an error (1256):
            The remote system is not available. For information about network troubleshooting, see Windows Help.
            The failure occurred at 2014-03-20 10:57:49.
            The last success occurred at 2014-03-20 09:57:28.
            1 failures have occurred since the last success.
         [Replications Check,DR SERVER] A recent replication attempt failed:
            From CHI-AD1-DR to DR SERVER
            Naming Context: CN=Schema,CN=Configuration,DC=domain,DC=local
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2014-03-20 10:58:31.
            The last success occurred at 2014-03-20 09:57:28.
            1 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,DR SERVER] A recent replication attempt failed:
            From CHI-AD1-DR to DR SERVER
            Naming Context: CN=Configuration,DC=domain,DC=local
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2014-03-20 10:58:10.
            The last success occurred at 2014-03-20 10:46:37.
            1 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,DR SERVER] A recent replication attempt failed:
            From CHI-AD1-DR to DR SERVER
            Naming Context: DC=domain,DC=local
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2014-03-20 10:57:49.
            The last success occurred at 2014-03-20 10:51:28.
            1 failures have occurred since the last success.
            The source remains down. Please check the machine.
         ......................... DR SERVER failed test Replications
      Starting test: RidManager
         ......................... DR SERVER passed test RidManager
      Starting test: Services
         ......................... DR SERVER passed test Services
      Starting test: SystemLog
         A warning event occurred.  EventID: 0x000003F6
            Time Generated: 03/20/2014   11:05:30
            Event String:
            Name resolution for the name domain.local timed out after none of the configured DNS servers responded.
         ......................... DR SERVER passed test SystemLog
      Starting test: VerifyReferences
         ......................... DR SERVER passed test VerifyReferences

   Running partition tests on : DomainDnsZones
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
   Running partition tests on : ForestDnsZones
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
   Running partition tests on : Schema
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
   Running partition tests on : Configuration
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
   Running partition tests on : domain
      Starting test: CheckSDRefDom
         ......................... domain passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... domain passed test CrossRefValidation
   Running enterprise tests on : domain.local
      Starting test: LocatorCheck
         ......................... domain.local passed test LocatorCheck
      Starting test: Intersite
         ......................... domain.local passed test Intersite
C:\Users\dradministrator>
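In output this long, the single failed test (Replications) is easy to miss among the passes. As a minimal sketch, assuming the dcdiag output has been saved to a text file (for example with the `>c:\dcdiag.txt` redirection used later in this thread), the Python below pulls out the failed tests and the source, naming context and error code of each replication failure. The regular expressions are keyed to the English output format shown above and are assumptions; other Windows versions or languages may format these lines differently.

```python
import re

# A result line looks like: "......................... DR SERVER failed test Replications".
# The final \s+ also spans the line break when dcdiag wraps a result onto two lines.
RESULT_RE = re.compile(
    r"\.{5,}\s+(?P<target>.+?)\s+(?P<status>passed|failed)\s+test\s+(?P<test>\S+)"
)
# The error-code line inside a replication failure block
ERROR_RE = re.compile(r"The replication generated an error \((?P<code>\d+)\)")


def summarize_dcdiag(text: str) -> dict:
    """Collect failed tests and replication failures from saved dcdiag text output."""
    failed = [
        (m["target"], m["test"])
        for m in RESULT_RE.finditer(text)
        if m["status"] == "failed"
    ]

    replication, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("From ") and " to " in line:
            # Start of a failure block: "From CHI-AD1-DR to DR SERVER"
            source, destination = line[len("From "):].split(" to ", 1)
            current = {"source": source, "destination": destination}
        elif line.startswith("Naming Context:") and current:
            current["nc"] = line.split(":", 1)[1].strip()
        elif current and (m := ERROR_RE.search(line)):  # walrus needs Python 3.8+
            current["error"] = int(m["code"])
            replication.append(current)
            current = {}
    return {"failed_tests": failed, "replication_failures": replication}
```

Usage, with a hypothetical file path: `summarize_dcdiag(open(r"c:\dcdiag.txt").read())`. For the output above it would list Replications as the failed test and show that every replication failure comes from CHI-AD1-DR with error 1256 or 1722, which points at the source DC being offline rather than at five separate problems.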

Expert Comment by: Santosh Gupta
Hi,

It seems there are lots of replication errors on your DC. Please fix them one by one on DR SERVER.

http://technet.microsoft.com/en-us/library/replication-error-1722-the-rpc-server-is-unavailable%28v=ws.10%29.aspx

Also run DCDIAG /test:DNS.

Author Comment by: CHI-LTD
It's OK now; the server was just offline. This was the result when the server was off and I was trying to route out (assuming it would use the other DC).

Expert Comment by: Santosh Gupta
Yes, it should use the other DC.

1. Please check the NIC properties and confirm that you have added both DNS server IPs.
2. Leave the server offline and run NSLOOKUP.
3. Run "SET L" and see which server it is authenticating against.
4. Run DCDIAG /test:DNS.

Author Comment by: CHI-LTD
1. Both DNS server IPs are in there, as well as the two at site B on a different network (over VPN).
2. With the server off, I ran nslookup on a client machine (logged in as domain admin).
3. What do you mean?
4. Lots of failures, but at the end it said: pass, warn, pass x4.

Expert Comment by: Santosh Gupta
1. Make sure that the IP of the other server on the same site is configured at the top.
2. Is nslookup working fine?
3. Run "set l" in cmd.
4. It may be a root hints error. Check the root hints.
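Point 1 is about the order of the DNS server list. The sketch below only illustrates that ordering rule, assuming hypothetical addresses (192.168.1.x for the local site, 10.1.0.x for the remote site); on the machines themselves the order comes from the NIC's DNS settings or from the DHCP scope's DNS Servers option (option 6).

```python
import ipaddress


def order_dns_servers(servers, local_subnets):
    """Return DNS server IPs with same-site servers first.

    servers: the configured DNS server IPs, in their current order.
    local_subnets: CIDR strings for the client's own site (hypothetical values).
    """
    nets = [ipaddress.ip_network(s) for s in local_subnets]

    def is_local(ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in nets)

    # sorted() is stable, so servers keep their configured order within each group
    return sorted(servers, key=lambda ip: 0 if is_local(ip) else 1)


servers = ["10.1.0.10", "192.168.1.11", "10.1.0.11", "192.168.1.10"]
print(order_dns_servers(servers, ["192.168.1.0/24"]))
# ['192.168.1.11', '192.168.1.10', '10.1.0.10', '10.1.0.11']
```

Keeping the remote-site servers at the end of the list still leaves them available as a fallback over the VPN.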

Expert Comment by: Giladn
Hi,

Sorry, but I don't understand: are you using Microsoft Cluster Services?
Read here:
http://technet.microsoft.com/en-us/library/cc730647.aspx

You should be using the cluster's IP address; that way, services won't fail.

Hope this helps,

Gilad

Author Comment by: CHI-LTD
1. So on server A, should I point the preferred DNS server to server B?
2. It looks to be.
3. I don't know what you mean.
4. Yes, it appears to be root hints and IPv6.

@Gilad:

No cluster services.

Expert Comment by: Santosh Gupta
Point 3:

If you run the command "set l", it will show you the logon server name, so that we can confirm which server it is looking to first to resolve its queries.

Author Comment by: CHI-LTD
So 'nslookup set 1'?

Author Comment by: CHI-LTD
That fails to run.

Expert Comment by: Santosh Gupta
It's SET L

(L as in Lima)

Author Comment by: CHI-LTD
Can you type the full command?

Expert Comment by: Santosh Gupta
At the command prompt, type:

SET L

or

echo %logonserver%
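For reference, `SET L` simply lists every environment variable whose name starts with L, and `LOGONSERVER` among them names the DC that authenticated the current logon session. It does not show which DNS server answers queries; the server name and address that nslookup prints before its answer show that. A rough Python equivalent of `SET L`, for scripting the same check (the function name is made up for illustration):

```python
import os


def env_vars_starting_with(prefix):
    """Mimic cmd's `SET <prefix>`: return environment variables whose names
    start with prefix, compared case-insensitively, sorted by name."""
    p = prefix.upper()
    return {
        name: value
        for name, value in sorted(os.environ.items())
        if name.upper().startswith(p)
    }


# On a domain-joined Windows machine the result typically includes LOGONSERVER, e.g. \\DC1
print(env_vars_starting_with("L").get("LOGONSERVER"))
```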

Author Comment by: CHI-LTD
OK, the logon server shown is the server I'm logged on to locally.

Expert Comment by: Santosh Gupta
If you checked it on the DC, then it's good.

Are you still facing the DNS issue?

Author Comment by: CHI-LTD
Yes, I have a single laptop in there with three entries.
However, other clients that were on our main site LAN seem to have only one record, so we are getting there.

Expert Comment by: Santosh Gupta
Hi,

Please check DNS and set the aging and scavenging properties for the DNS server. There may be some stale records.

http://technet.microsoft.com/en-us/library/cc753217.aspx

Author Comment by: CHI-LTD
All reverse lookup zones and the servers are set to scavenge.

Expert Comment by: Santosh Gupta
What is your DHCP lease period set to, and what is the scavenging period?

Author Comment by: CHI-LTD
DHCP lease: 4 days
Scavenging: 7 days

Note: the IPs in DNS (10.255.255.0) are not managed by the LAN DHCP servers. Those clients get their IPs from the firewall.

Expert Comment by: Santosh Gupta
But you have multiple records in DNS, so if the DHCP lease period is 4 days, change the scavenging period to 4 days.

Author Comment by: CHI-LTD
OK, will do.

Author Comment by: CHI-LTD
In the properties of each of the two reverse lookup zones for the IPs that the firewall gives out? Can I set this to 1 hour, as DHCP isn't managing these?

Expert Comment by: Santosh Gupta
Sorry, I'm not following you.

Author Comment by: CHI-LTD
In the two reverse lookup zones, can these be configured to 1 hour?

Expert Comment by: Santosh Gupta
Hi,

It is not recommended to set the scavenging period shorter than the DHCP lease period; also, setting it to 1 hour may cause many issues with service records.

"By default, computers that run Windows Server 2003 and that are statically configured for TCP/IP try to dynamically register host address (A) and pointer (PTR) resource records for IP addresses that are configured and used by their installed network connections. By default, all computer register records are based on the full computer name."

http://support.microsoft.com/kb/816592
http://technet.microsoft.com/en-us/library/cc937921.aspx

Author Comment by: CHI-LTD
I don't know what else to try.

My VPN users on a separate IP range are causing the Netlogon warnings.
The DNS servers are also showing duplicate entries (IP addresses) for these VPN clients.

One of the permanent VPN links manages DNS for its clients, but the remote VPN users do not have any specific servers that manage them, which is why I think they are showing duplicate IPs.

Also, users who connect over Wi-Fi on the DNS/DHCP LAN and then disconnect and reconnect with a Cat5 cable end up with two records too. Nslookup results are correct when run on a client, though.

Any other ideas?
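One way to confirm the duplicate-record symptom described above is to group forward (A) records by hostname and list any host that maps to more than one IP. A minimal sketch, assuming the records have already been exported as (hostname, IP) pairs, for example from the DNS console's export list; the hostnames and addresses below are hypothetical.

```python
from collections import defaultdict


def hosts_with_multiple_ips(records):
    """Return {hostname: [ips]} for every host that has more than one A record.

    records: iterable of (hostname, ip) pairs. Hostnames are compared
    case-insensitively and without a trailing dot, as DNS names are.
    """
    by_host = defaultdict(set)
    for host, ip in records:
        by_host[host.lower().rstrip(".")].add(ip)
    # IPs are sorted as plain strings, which is enough for a readable report
    return {host: sorted(ips) for host, ips in by_host.items() if len(ips) > 1}


records = [
    ("LAPTOP01.domain.local", "192.168.1.50"),    # LAN address (cable or Wi-Fi)
    ("laptop01.domain.local.", "10.255.255.20"),  # VPN address
    ("pc02.domain.local", "192.168.1.60"),
]
print(hosts_with_multiple_ips(records))
# {'laptop01.domain.local': ['10.255.255.20', '192.168.1.50']}
```

A host that legitimately has two interfaces will also be listed, so the output is a list to review rather than a list of records to delete.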

Expert Comment by: Santosh Gupta
Not sure. Please add VPN and networking to the topics so that other experts can see your question.

Author Comment by: CHI-LTD
But it's part of DNS.

Expert Comment by: Santosh Gupta
I know, but other experts may be able to share their experience and feedback.

Author Comment by: CHI-LTD
Can I edit this?

Expert Comment by: Santosh Gupta
Yes.

Expert Comment by: arnold
Your DHCP server's scope options should hand out the DNS servers that are local to it.
Use netdiag to test.

Expert Comment by: giltjr
One way to "simulate" a failure of a single server is to have a local firewall on your desktop and configure it to block/deny all traffic to/from the server's IP address.
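To check that such a block is actually in effect (and that the remaining DC still answers), you can test TCP reachability of the DC's service ports from the desktop before and after adding the rule. A minimal Python sketch; the DC hostname you pass in is up to you (the one mentioned below is hypothetical), and a TCP connect is only an approximation: DNS queries normally travel over UDP, so an open TCP port 53 suggests, but does not prove, that name resolution works.

```python
import socket

# Well-known TCP ports for services a DC/GC typically provides
DC_PORTS = {
    "DNS": 53,
    "Kerberos": 88,
    "RPC endpoint mapper": 135,
    "LDAP": 389,
    "Global Catalog": 3268,
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


def check_dc(host, ports=DC_PORTS, timeout=2.0):
    """Return {service_name: reachable} for each port in ports."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}
```

For example, run `check_dc("dc1.domain.local")` (hypothetical name) against each DC, add the firewall rule, and run it again: the blocked DC's ports should all report False while the other DC's stay True.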

Expert Comment by: compdigit44
Regarding your VPN clients: have you added the subnets they are coming in from to the site they should be associated with in AD Sites and Services?
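For background: AD works out a client's site by matching the client's IP address against the subnet objects in Sites and Services, and the most specific (longest-prefix) matching subnet wins. A client whose address matches no subnet is not associated with any site, which DCs can report as Netlogon warnings. The sketch below mimics that lookup under a hypothetical subnet-to-site table; it is an illustration, not the DC locator's actual code.

```python
import ipaddress


def site_for_ip(ip, subnet_sites):
    """Return the site whose subnet most specifically contains ip, or None.

    subnet_sites: {"CIDR": "SiteName"} (hypothetical values below).
    """
    addr = ipaddress.ip_address(ip)
    matches = [
        (net, site)
        for net, site in ((ipaddress.ip_network(c), s) for c, s in subnet_sites.items())
        if addr in net
    ]
    if not matches:
        return None  # no subnet object covers this client
    # Longest prefix = most specific subnet
    return max(matches, key=lambda m: m[0].prefixlen)[1]


subnets = {
    "192.168.1.0/24": "SiteA",
    "192.168.2.0/24": "SiteB",
    "10.0.0.0/8": "SiteB",
    "10.255.255.0/24": "SiteA",  # VPN pool associated with SiteA
}
print(site_for_ip("10.255.255.20", subnets))  # SiteA
print(site_for_ip("172.16.0.5", subnets))     # None
```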

Also, to help refresh all the information, can you please upload the results of the following:

dcdiag /e /v >c:\dcdiag.txt
repadmin /showrepl >c:\repadmin.txt

Please also upload a screenshot of your current AD Sites and Services layout and of the IP scheme on the servers.

Author Comment by: CHI-LTD
Is there a time limit before the other DNS server kicks in? I think it was offline for about 30 minutes while I was getting the errors above with dcdiag.

DNS and DHCP screenshots attached.
DNS.jpg
DHCP.jpg

Expert Comment by: giltjr
A DNS server should always be active and ready to respond to queries.

On an IP host, you configure one or more IP addresses pointing to DNS servers. These are the IP addresses that your computer/device will use to look up host names and host addresses.

The IP host will send a look-up request to each IP address in order until it gets a response from one of them or gets no response from any of them.

How the look-up process works varies by device.

I believe most devices these days use a 5-second timeout and try each server twice before going to the next address in the list. The device stops sending requests as soon as it gets a response.

So:

1) Lookup sent to DNS address #1
2) Wait 5 seconds.
3) Lookup sent to DNS address #1
4) Wait 5 seconds.
5) Lookup sent to DNS address #2
6) Wait 5 seconds.
7) Lookup sent to DNS address #2
8) Wait 5 seconds.
9) Lookup sent to DNS address #3
10) Wait 5 seconds.
11) Lookup sent to DNS address #3
12) Wait 5 seconds.

So if you have three DNS servers configured, it will take 30 seconds for a look-up to fail if NO servers respond. If you have two DNS servers, it takes only 20 seconds.
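The arithmetic above is servers × tries × timeout. A small Python sketch of that simplified model; the 5-second timeout and two tries per server are the assumptions stated above, and real resolvers, including the Windows DNS client, use their own timeout and retry schedules, so treat the numbers as illustrative.

```python
def query_schedule(servers, tries_per_server=2, timeout_s=5):
    """Model the look-up sequence described above when no server answers.

    Returns (schedule, total_s): schedule is a list of (send_time_s, server)
    pairs, and total_s is when the look-up finally fails.
    """
    schedule, t = [], 0
    for server in servers:
        for _ in range(tries_per_server):
            schedule.append((t, server))
            t += timeout_s  # wait for the timeout before the next attempt
    return schedule, t


schedule, total = query_schedule(["DNS #1", "DNS #2", "DNS #3"])
for send_time, server in schedule:
    print(f"t={send_time:2}s  query -> {server}")
print(f"look-up fails after {total}s")  # 30s; with two servers it is 20s
```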

Windows does something unique here. If DNS address #1 fails to respond and DNS address #2 responds, Windows internally moves DNS address #2 to the top of the list, so the next look-up sends its query there first. The idea is to send requests to the server that responded last time.

Note that the response does not have to be a success; it just has to be a response. That is, "no such host name" is a response, and the look-up process will stop when it receives one.
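The two behaviours just described (promote the server that answered, and stop on any answer, including "no such host name") can be sketched as follows. This is a simplified model with hypothetical server names that ignores timeouts and retries; it is not the Windows DNS client's actual implementation.

```python
def resolve(servers, answers):
    """Try servers in order; stop at the first one that replies.

    servers: ordered list of server names; reordered in place so the
             server that replied is tried first next time.
    answers: {server: reply}, where reply is any string (an address or
             "no such host name") or None if that server does not respond.
    Returns (server, reply), or (None, None) if nobody replied.
    """
    for i, server in enumerate(servers):
        reply = answers.get(server)
        if reply is not None:  # any reply ends the look-up, even a negative one
            servers.insert(0, servers.pop(i))  # move the responder to the top
            return server, reply
    return None, None


order = ["dc1", "dc2"]
print(resolve(order, {"dc1": None, "dc2": "no such host name"}))  # ('dc2', 'no such host name')
print(order)                                                      # ['dc2', 'dc1']
print(resolve(order, {"dc1": "10.0.0.5", "dc2": "10.0.0.5"}))     # ('dc2', '10.0.0.5')
```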

Author Comment by: CHI-LTD
So my brief test should have worked, then?
Shutting down one of the DCs should be a good enough test, and I should be able to use the other DC/DNS/DHCP box?

Expert Comment by: compdigit44
If your network and environment are set up correctly, then yes.

Expert Comment by: giltjr
DHCP is only used when you first boot a PC or halfway through a lease. So unless you rebooted your PC or just happened to be halfway through your lease, you would not have tested DHCP.
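Whether a DHCP outage is even noticed depends on where the client is in its lease. RFC 2131 sets the default renewal time T1 to 50% of the lease and the rebinding time T2 to 87.5%; a client that gets no reply at T1 keeps its lease and retries, and from T2 it broadcasts to any DHCP server. The sketch below checks whether T1 falls inside an outage window, assuming the RFC defaults (a server can hand out different T1/T2 values) and ignoring reboots, which always trigger DHCP. The times used are hypothetical.

```python
def renewal_times(lease_obtained_h, lease_h):
    """Return (T1, T2, expiry) in hours, using the RFC 2131 default fractions."""
    return (
        lease_obtained_h + 0.5 * lease_h,    # T1: first renewal attempt
        lease_obtained_h + 0.875 * lease_h,  # T2: rebind, broadcast to any server
        lease_obtained_h + lease_h,          # lease expiry
    )


def outage_hits_renewal(lease_obtained_h, lease_h, outage_start_h, outage_len_h):
    """True if the client's first renewal attempt (T1) falls inside the outage."""
    t1, _, _ = renewal_times(lease_obtained_h, lease_h)
    return outage_start_h <= t1 < outage_start_h + outage_len_h


# A 4-day (96 h) lease obtained at hour 0, and a 30-minute outage
print(renewal_times(0, 96))                   # (48.0, 84.0, 96)
print(outage_hits_renewal(0, 96, 10, 0.5))    # False: the outage never touched DHCP
print(outage_hits_renewal(0, 96, 47.8, 0.5))  # True
```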

DC functions are used at boot time and at other times, but I'm not 100% sure which functions would force DC communication, so I don't know whether your DC was down long enough to test failover for these functions.

DNS is used any time you look up a host name or host IP address. Assuming you did something to cause a look-up and the answer was not already cached, you should have used the active DNS servers.

IMHO, except for DHCP functions, configuring your firewall to block all traffic to/from one of your DCs will test DC functions and DNS. You will want to flush the DNS cache (ipconfig /flushdns) and close and reopen IE to verify that you are not resolving names via your local cache.

Author Comment by: CHI-LTD
I will take the DC down again and update.

Expert Comment by: giltjr
Just to be clear: by shutting down the DC you are not simulating a failure; you are causing an actual "controlled" failure that will affect everything that uses or tries to use that DC.

That is why I am suggesting a firewall on a single computer, so that you affect only a single computer: yours.

Author Comment by: CHI-LTD
Hmm, OK.

Author Comment by: CHI-LTD
The DNS servers are responding to my remote clients fine and allocating names and IPs within DNS.
But when there is already an entry on the LAN side of DNS and the same laptop moves to another site and IP, DNS does not update.

Ideas?

Expert Comment by: giltjr
The DHCP server should be notifying the DNS server to update DNS.

Also check the NIC's configuration and see if "Register this connection's address in DNS" is checked. This is under Advanced, then the DNS tab.

Author Comment by: CHI-LTD
The DHCP server for remote clients is the firewall, not the Windows server.

Expert Comment by: arnold
You then need to set up scavenging on your DHCP.
One option is to configure the DHCP server locally to register names, while the NICs on the mobile systems are configured not to register into the DOMAIN.

Author Comment by: CHI-LTD
Can the firewall do this?
Can you explain the name-registration side in more detail, and also where the setting not to register in the domain is?

Expert Comment by: arnold
The only reason a remote computer's NIC registers with your DNS server is that, under NIC properties > IP properties > Advanced > DNS tab, there is a check box to register this connection in DNS. Above that check box is a reference to the DOMAIN under which this connection should register. There are a bunch of settings here.

The issue could also be the result of the lease time settings in DHCP.

It is hard at this point to determine the cause of the stale records.

A firewall-based block is not an option, since the only way to block it on the firewall is to deny port 53 traffic within the VPN, which would prevent any inter-LAN name resolution, i.e. site A would not be able to resolve names on site B.

Author Comment by: CHI-LTD
I have reduced scavenging in DNS from 7 & 7 days to 1 & 2 hours.
The DHCP scope lease on the servers is 4 days.

Expert Comment by: arnold
That is too short, and you have to be careful with server IPs that are registered dynamically (by the server itself) rather than created manually with the scavenging/deletion option off, as the dynamic ones will be deleted by the scavenging process.

For example, server1 registers server1.yourdomain.com with IP 192.168.0.5 at boot.
After 7 days, or possibly earlier, the scavenging process deletes the server1.yourdomain.com record.

The lease times on the DHCP server should not be 7 or 14 days. The length of the lease is inversely proportional to the DHCP traffic, i.e. if the lease time is short, there will be more frequent DHCP requests to renew the IP.
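The inverse relationship is easy to put numbers on. A rough sketch, assuming each client renews once per T1 (half the lease, the RFC 2131 default) and counting one renewal exchange per client per T1; the client count of 100 is hypothetical.

```python
def renewals_per_day(clients, lease_hours):
    """Approximate DHCP renewal exchanges per day, assuming renewal at T1 = lease / 2."""
    t1_hours = lease_hours / 2
    return clients * 24 / t1_hours


for lease in (8, 24, 96, 336):  # 8 hours, 1 day, 4 days, 14 days
    print(f"{lease:>3} h lease: {renewals_per_day(100, lease):6.1f} renewals/day for 100 clients")
```

Halving the lease doubles the renewal traffic, which is the trade-off against how quickly addresses (and the DNS records tied to them) turn over.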

Author Comment by: CHI-LTD
Will see if it helps.

Author Comment by: CHI-LTD
OK, to confirm: would a 1-day DHCP lease on our servers be suitable, with DNS scavenging on all zones (.local and the IP ranges) set to 3-4 days?

Thanks

Expert Comment by: arnold
Scavenging should generally use the same interval as the lease time.
With what you suggest (1-day lease, 3-4 day scavenging), there would be a 2-3 day window in which a hostname may have a stale IP.
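The window arnold describes is the scavenging interval minus the lease time. A sketch of that rule of thumb, together with the warning from earlier in the thread against scavenging shorter than the lease. This deliberately simplifies Windows DNS aging, where eligibility depends on the zone's no-refresh and refresh intervals and the server's scavenging period, so treat it as a sanity check only.

```python
def scavenging_check(lease_days, scavenge_days):
    """Return (stale_window_days, note) for a DHCP lease / DNS scavenging pair.

    Uses the simple model from this thread: a hostname can keep a stale IP
    for roughly (scavenging interval - lease time).
    """
    if scavenge_days < lease_days:
        return 0, "scavenging shorter than the lease: live records may be removed"
    window = scavenge_days - lease_days
    if window == 0:
        return 0, "scavenging matches the lease"
    return window, f"up to {window} day(s) in which a hostname may keep a stale IP"


print(scavenging_check(1, 3))       # (2, 'up to 2 day(s) in which a hostname may keep a stale IP')
print(scavenging_check(4, 4))       # (0, 'scavenging matches the lease')
print(scavenging_check(4, 1 / 24))  # (0, 'scavenging shorter than the lease: live records may be removed')
```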

Author Comment by: CHI-LTD
OK, so 1 day for all settings?

Author Comment by: CHI-LTD
Or, to help with the remote machines not clearing out old IPs/names, should I reduce it to a couple of hours?

Author Comment by: CHI-LTD
It looks like I can change the scavenging settings for the two remote VPN subnets separately within the DNS servers. Would this help us with remote machines that have both local (192.168.*.*) records and remote (10.255) records?

Thanks

Expert Comment by: arnold
The issue you face deals with forward records in yourdomain.com (e.g. Host.yourdomain.com), and less so with reverse records.

The cleanup on the reverse zones should be fine.

Author Comment by: CHI-LTD
OK, so leave domain.local at 7 days and change the reverse lookup zones to, say, 1 hour?

Author Closing Comment by: CHI-LTD
The dcdiag command was what was required.

I seem to have asked multiple questions on different issues too :)