DNS issues?

We have 3 domain controllers.

1. The PDC, with the FSMO roles, DHCP, DNS etc.; it's a 2008 R2 server.
2. A DNS server that also holds the second half of the DHCP scope, plus DNS; it's a 2003 R2 machine.
3. Another 2003 R2 machine.

Basically, we had a full 2003 R2 system, then needed to upgrade AD.
We transferred the roles to the new PDC and everything ran fine for a few days.
Then we demoted the old PDC and changed its IP, and gave the new PDC the IP the old PDC had (to make things easier).

This is when the issues started.
We cannot get stable pings to the new PDC. We get about 3 good replies, then a few bad, then 10 good, 20 bad; it's really intermittent.

Pinging out from the PDC, we get the same results.
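To put numbers on the "3 good, a few bad, 10 good, 20 bad" pattern, a small script can ping in a loop and summarize the loss and the longest outage. This is a minimal Python sketch, assuming the OS ping command is on the PATH; the function names and the target address in the usage line are hypothetical.

```python
import platform
import subprocess


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo using the OS ping command; True if it exited 0."""
    if platform.system() == "Windows":
        # Windows ping: -n = count, -w = timeout in milliseconds.
        # Caveat: an "unreachable" reply from a router can still exit 0 on Windows.
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux (iputils) ping: -c = count, -W = timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def summarize(results: list[bool]) -> dict:
    """Loss percentage plus the longest run of consecutive failed pings."""
    lost = results.count(False)
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1  # reset the run on every success
        longest = max(longest, current)
    return {
        "sent": len(results),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(results) if results else 0.0,
        "longest_outage": longest,
    }
```

Usage (hypothetical address): results = [ping_once("10.0.0.5") for _ in range(100)], then print(summarize(results)). Running it at the same time from the PDC, from another DC and from a workstation shows whether the drops line up.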

We have everything installed on a dvSwitch on VMware ESXi, so the domain controllers are on the same dvSwitch and replication is occurring... Funnily enough, I think the hosts pass that traffic over the dvSwitch without hitting the core switch.

From the hosts, I can ping the core switch, so there seem to be no issues there.
We think there may be a DNS issue, but can't put our finger on it. We were thinking DNS was trying to resolve outside the local domain first, instead of checking inside the domain first and then going outside, but I'm not entirely sure how to check that. I checked the DNS setup, and there are forwarders set for the external DNS.
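For the "inside the domain first?" question: a Windows DNS server first tries to answer from the zones it hosts, then from its cache, and only then uses forwarders. So a DC that hosts the AD zone never forwards names inside that zone. The Python model below is a simplified sketch of that order (it ignores delegations, conditional forwarders and root hints); the zone contents, addresses and forwarder callables are hypothetical. On the real server, running nslookup against the DC's own IP shows what it actually answers.

```python
from typing import Callable, Optional


def resolve(
    name: str,
    zones: dict[str, dict[str, str]],
    cache: dict[str, str],
    forwarders: list[Callable[[str], Optional[str]]],
) -> tuple[str, Optional[str]]:
    """Return (where the answer came from, address or None). Zone names are lowercase."""
    name = name.lower().rstrip(".")
    # 1. Hosted (authoritative) zones: a name in or under a hosted zone is answered
    #    from zone data only; a missing record is an authoritative "no such name".
    for zone, records in zones.items():
        if name == zone or name.endswith("." + zone):
            return ("zone:" + zone, records.get(name))
    # 2. Cache of earlier answers
    if name in cache:
        return ("cache", cache[name])
    # 3. Forwarders, tried in order; a successful answer is cached
    for forward in forwarders:
        answer = forward(name)
        if answer is not None:
            cache[name] = answer
            return ("forwarder", answer)
    return ("unresolved", None)
```

In this model, a query for an internal name never reaches the external forwarders, which is the behaviour you would want to confirm on the DCs.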

Any ideas?
I have pasted below the dcdiag results I just ran.


Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = *MYSERVER*
   * Identified AD Forest.
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\*MYSERVER*
      Starting test: Connectivity
         ......................... *MYSERVER* passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\*MYSERVER*
      Starting test: Advertising
         ......................... *MYSERVER* passed test Advertising
      Starting test: FrsEvent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... *MYSERVER* passed test FrsEvent
      Starting test: DFSREvent
         ......................... *MYSERVER* passed test DFSREvent
      Starting test: SysVolCheck
         ......................... *MYSERVER* passed test SysVolCheck
      Starting test: KccEvent
         A warning event occurred.  EventID: 0x800004D0
            Time Generated: 10/19/2011   16:22:07
            Event String:
            Active Directory Domain Services attempted to perform a remote procedure call (RPC) to the following server.  The call timed out and was cancelled.

         A warning event occurred.  EventID: 0x800004D0
            Time Generated: 10/19/2011   16:27:54
            Event String:
            Active Directory Domain Services attempted to perform a remote procedure call (RPC) to the following server.  The call timed out and was cancelled.

         ......................... *MYSERVER* passed test KccEvent
      Starting test: KnowsOfRoleHolders
         ......................... *MYSERVER* passed test KnowsOfRoleHolders
      Starting test: MachineAccount
         ......................... *MYSERVER* passed test MachineAccount
      Starting test: NCSecDesc
         ......................... *MYSERVER* passed test NCSecDesc
      Starting test: NetLogons
         ......................... *MYSERVER* passed test NetLogons
      Starting test: ObjectsReplicated
         ......................... *MYSERVER* passed test ObjectsReplicated
      Starting test: Replications
         [DAVID] DsBindWithSpnEx() failed with error 1722,
         The RPC server is unavailable..
         ......................... *MYSERVER* failed test Replications
      Starting test: RidManager
         ......................... *MYSERVER* passed test RidManager
      Starting test: Services
         ......................... *MYSERVER* passed test Services
      Starting test: SystemLog
         A warning event occurred.  EventID: 0x0000000B
            Time Generated: 10/19/2011   15:36:38
            Event String:
            Custom dynamic link libraries are being loaded for every application. The system administrator should review the list of libraries to ensure they are related to trusted applications.
         A warning event occurred.  EventID: 0x00000420
            Time Generated: 10/19/2011   15:37:27
            Event String:
            The DHCP service has detected that it is running on a DC and has no credentials configured for use with Dynamic DNS registrations initiated by the DHCP service.   This is not a recommended security configuration.  Credentials for Dynamic DNS registrations may be configured using the command line "netsh dhcp server set dnscredentials" or via the DHCP Administrative tool.
         A warning event occurred.  EventID: 0x00002724
            Time Generated: 10/19/2011   15:37:31
            Event String:
            This computer has at least one dynamically assigned IPv6 address.For reliable DHCPv6 server operation, you should use only static IPv6 addresses.
         An error event occurred.  EventID: 0xC2000001
            Time Generated: 10/19/2011   15:37:36
            Event String: Unexpected failure. Error code: 490@01010004
         A warning event occurred.  EventID: 0x0000000C
            Time Generated: 10/19/2011   15:37:37
            Event String:
            Time Provider NtpClient: This machine is configured to use the domain hierarchy to determine its time source, but it is the AD PDC emulator for the domain at the root of the forest, so there is no machine above it in the domain hierarchy to use as a time source. It is recommended that you either configure a reliable time service in the root domain, or manually configure the AD PDC to synchronize with an external time source. Otherwise, this machine will function as the authoritative time source in the domain hierarchy. If an external time source is not configured or used for this computer, you may choose to disable the N
         A warning event occurred.  EventID: 0x000727AA
            Time Generated: 10/19/2011   15:39:33
            Event String:
            The WinRM service failed to create the following SPNs: WSMAN/*MYSERVER*.tareeccs.nsw.edu.au; WSMAN/*MYSERVER*.
         ......................... *MYSERVER* failed test SystemLog
      Starting test: VerifyReferences
         ......................... *MYSERVER* passed test VerifyReferences

   Running partition tests on : DomainDnsZones
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test

   Running partition tests on : ForestDnsZones
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test

   Running partition tests on : Schema
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation

   Running partition tests on : Configuration
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation

   Running partition tests on : tareeccs
      Starting test: CheckSDRefDom
         ......................... tareeccs passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... tareeccs passed test CrossRefValidation

   Running enterprise tests on : mydomain.com
      Starting test: LocatorCheck
         ......................... mydomain.com passed test LocatorCheck
      Starting test: Intersite
         ......................... mydomain.com passed test Intersite
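The one hard failure above is Replications: DsBindWithSpnEx() against [DAVID] failed with error 1722 (RPC server unavailable), alongside the RPC time-outs in KccEvent. That fits a path that drops packets rather than a DNS fault. A quick, hedged way to see this from a given machine is to try repeated TCP connections to the RPC endpoint mapper (port 135) on the partner DC. The Python sketch below does only that; the function names are hypothetical, and a clean result on 135 does not rule out problems on the dynamic RPC ports that replication uses after the endpoint mapper.

```python
import socket


def tcp_port_open(host: str, port: int = 135, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out (socket.timeout is an OSError), unreachable, name lookup failure
        return False


def probe(host: str, attempts: int = 20, port: int = 135) -> str:
    """One character per attempt: '+' connected, '.' failed."""
    return "".join("+" if tcp_port_open(host, port) else "." for _ in range(attempts))
```

A pattern like "+++..++++........" from one machine, against steady "+" from another, points at the network path rather than at the DC services.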



jcmurphy777 (Author) commented:
Sorry for the delay in posting.

The 5412zl was faulty and we had it replaced.

The new one works great. It seems a few other issues have now disappeared as well.

It seems it had been faulty for a while, and because there was no major reason to suspect the 5412zl, we just thought it was other things.

So it's all up and running now, and the new DCs are working fine.

Thanks everyone for your efforts.
I've had an issue like this, and it ended up being related to the ARP cache. This solution might not be for you, but I fixed my issue by shutting EVERYTHING down, killing power to all my network equipment, then bringing everything back up.

I think one of my switches somehow still thought an IP address that had changed pointed to the old device. I rebooted the servers and the problem persisted. It only went away when I killed everything: routers, switches, servers, workstations, etc.

It was a nightmare.
Krzysztof Pytko (Senior Active Directory Engineer) commented:
I would suggest following Ace Fekay's article on his blog for that. It has a good step-by-step explanation of what you have to do. Give it a try and you should succeed.


jcmurphy777 (Author) commented:
Thanks for that.

We tried wiping the core switch, as we thought it was an error in the config.

No difference.

I just changed the ARP timeout to 1 minute;
no difference...

Proxy ARP is turned off on the core as well...

Thanks for the prompt reply though, Purple Tiddler.

jcmurphy777 (Author) commented:
Will have a read, iSiek, and will post how it goes.
Well, shoot. It sounded like an ARP issue to me. I was excited that someone else had experienced the same thing I did. Sorry that wasn't it; that would have been too easy. :)
jcmurphy777 (Author) commented:
I'm pretty sure every step was followed correctly, but I'm getting my colleague who did the work to run through the step-by-step to make sure he didn't miss anything.

Will post ASAP.
Krzysztof Pytko (Senior Active Directory Engineer) commented:
OK, great. I'm looking forward to new info :)

jcmurphy777 (Author) commented:
OK, everything was done accordingly.

So we have intermittent pings to the new PDC,
but complete access to the other 2 DCs, which are the 2003 R2 machines.

All 3 are set as GCs as well.
Have you checked the ARP tables on a machine that has the ping issue? On Windows I believe it's arp -a from the command line. Whenever the pings drop, check the ARP table and see whether the MAC address listed for the IP you're pinging matches the machine that IP belongs to.

It really does still sound like an ARP issue...
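The check suggested above can be scripted so it runs every time a ping drops. This is a minimal Python sketch that parses the Windows arp -a table layout (Internet Address, Physical Address, Type) and compares one entry with the MAC you expect; the function names, IPs and MACs in it are hypothetical, and it does not track which interface section an entry came from.

```python
import re

# One entry row of Windows "arp -a" output: IPv4, MAC with dashes, type
ENTRY = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+([0-9a-fA-F]{2}(?:-[0-9a-fA-F]{2}){5})\s+(\w+)"
)


def normalize_mac(mac: str) -> str:
    """Lowercase, colon-separated form so 00-50-56-AA-BB-CC equals 00:50:56:aa:bb:cc."""
    return mac.lower().replace("-", ":")


def parse_arp_a(text: str) -> dict[str, str]:
    """Map IP -> MAC for every entry row; 'Interface:' and header lines don't match."""
    table = {}
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            ip, mac, _kind = m.groups()
            table[ip] = normalize_mac(mac)
    return table


def check(text: str, ip: str, expected_mac: str) -> str:
    """'ok', 'no entry', or a MISMATCH message naming the MAC actually cached."""
    seen = parse_arp_a(text).get(ip)
    if seen is None:
        return "no entry"
    expected = normalize_mac(expected_mac)
    return "ok" if seen == expected else f"MISMATCH: {ip} -> {seen}, expected {expected}"
```

To feed it live data on Windows, the table can be captured with subprocess.run(["arp", "-a"], capture_output=True, text=True).stdout. A MISMATCH that shows the old PDC's MAC would support the stale-ARP theory.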
jcmurphy777 (Author) commented:
I hear you, purple_tiddler.

I haven't checked the ARP table on the server, but I have removed the NIC (it's a VM) and re-added it, then run netsh int ip reset and netsh winsock reset, which should clear all that..?

One issue is that from the new PDC you can't even ping the core switch. Since it's a VM, I tried moving it to another host and another dvSwitch, but all have the same issue.
I've also moved the VM to a standard vSwitch, changed the port, and turned off the firewall...
The VM was cloned from the same template the rest of the servers run from.
All the servers are on the same dvSwitch as well, so it should be affecting the others if it were a NIC or VLAN tagging issue.

I have also wiped the core switch config, then set it back up using a basic setup, i.e. no VLANs etc., but to no avail. I had the manufacturer check the core switch, and they gave it a clean bill of health.

We have been running extensive checks over the setup, but cannot locate the fault.
We have run dcdiag and netdiag again, and they come up with no DNS or other errors.

jcmurphy777 (Author) commented:
We've just had a tech look over the PDC, DNS etc. He believes it's a layer 2 switching issue and that there are no issues with the new setup.

So we now have to go back to the manufacturer of the core switch for help.

I will post the results as soon as we have anything...

The ARP table on the server wouldn't be the problem anyway. ARP is what other nodes on the network use to convert an IP address to a hardware (MAC) address. It makes sense that he thinks it's a layer 2 issue, since ARP is the translation between layers 3 and 2. I really do still believe it's all related back to ARP, and the best test would be to check the tables on a machine that has intermittent pings to the server in question.
And no, the netsh resets wouldn't clear ARP, since almost every device on the network has its own ARP table: routers, switches, devices, any node. That's why I suggested powering down EVERYTHING at the same time. Shut down any VM guests, power down ESXi, etc. It'll clear the ARP cache on everything and won't give the equipment a chance to pick up a bad ARP entry from another device. I don't really know exactly how it works, but my understanding is that ARP tables are built up from across the whole network; there isn't any one point that is the authoritative ARP table provider.

Man, I really hope this helps. I feel almost like I'm pushing further and further in the wrong direction, but having someone else suggest a layer 2 issue just reaffirms my feeling that it's ARP.
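The point about every node keeping its own ARP cache can be illustrated with a toy model. In this Python sketch, each node caches IP-to-MAC mappings with its own aging timeout; after the old PDC's IP is moved to the new PDC, each node keeps sending to the old MAC until its own entry ages out. All names, MACs and timeouts are hypothetical, and real devices also refresh entries from other ARP traffic (such as gratuitous ARP), which this model only hints at.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A device with its own ARP cache: ip -> (mac, time learned)."""
    name: str
    aging_s: float
    cache: dict = field(default_factory=dict)

    def learn(self, ip: str, mac: str, now: float) -> None:
        self.cache[ip] = (mac, now)

    def lookup(self, ip: str, now: float):
        entry = self.cache.get(ip)
        if entry is None:
            return None
        mac, learned = entry
        if now - learned >= self.aging_s:
            del self.cache[ip]  # aged out: the next send triggers a fresh ARP request
            return None
        return mac


def next_hop_mac(node: Node, ip: str, now: float, current_owner_mac: str) -> str:
    """MAC this node would send to; a cache miss is answered by the IP's current owner."""
    mac = node.lookup(ip, now)
    if mac is None:
        mac = current_owner_mac
        node.learn(ip, mac, now)
    return mac


OLD_PDC, NEW_PDC = "00:50:56:00:00:01", "00:50:56:00:00:02"  # hypothetical MACs
PDC_IP = "10.0.0.5"  # hypothetical address moved from the old PDC to the new one
```

With a 60-second cache, a workstation that learned the old MAC at t=0 still sends to the old PDC at t=30 and only switches to the new one after t=60; a layer 3 core with a much longer timeout keeps the stale entry for as long as its own timer allows. Calling node.learn(PDC_IP, NEW_PDC, now) mimics a node accepting a gratuitous ARP from the new owner.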
jcmurphy777 (Author) commented:
I really appreciate the help.

We have restarted the whole system several times:

once when I reset the core, then again when we first started noticing other issues across the school.

All the servers are VMs, and everything has the ping issue,
including physical PCs and VMs on another switch.

But at this stage we will try nearly anything again.
I will try to get the system down again.

Only other thing I can think of is a bad switch somewhere.  Wish we could get more experts in here to weigh in on the problem.
jcmurphy777 (Author) commented:
Yeah, it's looking more like a core switch issue the more we look at it.

We may need to get VMware and HP to work on either end to find the fix,

unless someone else can come up with something...

I may try to request attention for this question.
jcmurphy777 (Author) commented:
Interestingly, this morning everything is working and we changed nothing, so I'm not entirely sure whether things took a while to propagate for some reason or the switch is definitely faulty... We are getting massive amounts of excessive broadcasts on the switch, as well as ports going up and down like yo-yos.

We are going to roll the 5412zl back to an earlier firmware, and if that still fails, put our previous core, a 5308xl, back in. Will post how that goes.
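Ports "coming up and down like yo-yos" are easier to pin down with a count per port over time. This Python sketch flags ports whose link state changed more than a threshold number of times inside a sliding window. It takes generic (timestamp, port, state) tuples, not any particular switch's log format, so parsing the switch's event log into that shape is left as an assumption; the port names, window and threshold are hypothetical.

```python
from collections import defaultdict


def flapping_ports(events, window_s: float = 300, max_changes: int = 4) -> set:
    """Ports with more than max_changes link-state changes within any window_s span.

    events: iterable of (timestamp_seconds, port, state) tuples, e.g. (120, "A1", "down").
    """
    changes = defaultdict(list)  # port -> timestamps of state changes
    last_state = {}
    for ts, port, state in sorted(events):
        if last_state.get(port) != state:
            if port in last_state:  # the first observation is a baseline, not a change
                changes[port].append(ts)
            last_state[port] = state
    flagged = set()
    for port, times in changes.items():
        start = 0
        for end in range(len(times)):
            # shrink the window from the left until it spans at most window_s
            while times[end] - times[start] > window_s:
                start += 1
            if end - start + 1 > max_changes:
                flagged.add(port)
                break
    return flagged
```

If the flagged ports turn out to be spread across unrelated devices and modules, that leans toward the switch itself rather than the cabling or the hosts attached to it.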
Yikes.  Good luck to you.  Glad you at least have a better idea now of where the problem exists.
jcmurphy777 (Author) commented:
Wow, interesting night.
My colleague was up all night after the 5412zl failed to come up as planned...

In the end we factory reset the switch and reverted to the old firmware, and the same issue is there.
We swapped the 5412zl for the 5308xl and everything works fine with the exact same config.

So it looks like a 5412zl issue...

Will post if this changes...
jcmurphy777 (Author) commented:
Hi everyone,
sorry about the delay in awarding points etc.;
I just had to go over all the info again.

The article from iSiek was a great tutorial and guide.
Thanks, Purple_Tiddler, for your info about ARP and layer 2 etc.