It seems that we have run into a DNS problem that is intermittently affecting Backup Exec and quite possibly more. For future reference, the well-known Backup Exec "communications failure" was solved for us when we bypassed our primary DNS server (by using LMHOSTS, by using another DNS server with the same records, or by specifying the machine's IP address instead of its DNS name).
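Because the failure is intermittent, a single lookup may not show it. Here is a minimal Python sketch that resolves the same name several times and counts failures. The function name `probe_resolution` and its parameters are made up for illustration. It uses the operating system's resolver, so any hosts/LMHOSTS overrides on the test machine would mask the DNS problem.

```python
import socket

def probe_resolution(hostname, attempts=5, resolver=socket.gethostbyname):
    """Resolve hostname repeatedly.

    Returns (successes, failures, sorted list of distinct addresses seen).
    `resolver` is injectable so the logic can be exercised without a network.
    """
    addresses = set()
    failures = 0
    for _ in range(attempts):
        try:
            addresses.add(resolver(hostname))
        except OSError:  # socket.gaierror is a subclass of OSError
            failures += 1
    return attempts - failures, failures, sorted(addresses)
```

Running it against a server name that Backup Exec fails on, once through the primary DNS server and once through the alternative, should show whether the failures follow the primary.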
Now that we have identified a potential problem, it seems that there may be other issues affecting the primary DNS or WINS server. Here's the info:
Primary DNS server is running Windows 2003 Enterprise
Set up for aging and scavenging of stale records
NOT scavenging old records. Multiple duplicate PTRs exist that affect reverse lookup, which in turn affects a small portion of our applications (SUS, to name one). These records are marked as stale, and event ID 2501 is logged every day without any stale entries being removed.
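To get a list of the affected addresses, the duplicate PTRs can be found by grouping records by IP. This is only a sketch: it assumes the reverse zone has already been exported into (IP, host name) pairs by some other means, and the sample data and the name `find_duplicate_ptrs` are hypothetical.

```python
from collections import defaultdict

def find_duplicate_ptrs(records):
    """Group PTR records by IP and return only IPs that map to more than one host name.

    `records` is an iterable of (ip, hostname) pairs. Names are compared
    case-insensitively and without the trailing dot.
    """
    by_ip = defaultdict(set)
    for ip, hostname in records:
        by_ip[ip].add(hostname.rstrip(".").lower())
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}

# Hypothetical sample data, not taken from the real zone
sample = [
    ("10.0.0.15", "oldpc01.example.local."),
    ("10.0.0.15", "newpc07.example.local."),
    ("10.0.0.20", "server01.example.local."),
]
duplicates = find_duplicate_ptrs(sample)
```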
A sniffer set up on the same network, using the DNS server, captures the following interesting traffic (the sniffer has a direct network connection):
Source: Primary DNS: 53
Destination: sniffer: 30XX
Information: DNS query failure
Computer: Some invalid computer name that is no longer on the network
It then repeats this consistently, about twice a second.
In addition to this, practically every computer shows an increasing "Received Address Errors" counter (on this one, for example, it is increasing by about 2000 a day):
Packets Received = 3338434556
Received Header Errors = 0
Received Address Errors = 16122
Datagrams Forwarded = 0
Unknown Protocols Received = 0
Received Packets Discarded = 0
Received Packets Delivered = 3338418419
Output Requests = 781894773
Routing Discards = 0
Discarded Output Packets = 0
Output Packet No Route = 0
Reassembly Required = 56
Reassembly Successful = 28
Reassembly Failures = 0
Datagrams Successfully Fragmented = 28
Datagrams Failing Fragmentation = 0
Fragments Created = 56
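To put the address error count in proportion, here is a small Python sketch that parses "name = value" counter lines like the ones above and computes the share of received packets counted as address errors. The function names are my own, and the sample text reuses only three of the counters from this machine.

```python
def parse_ip_stats(text):
    """Parse 'Name = 123' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats

def address_error_ratio(stats):
    """Fraction of received packets that were counted as address errors."""
    received = stats.get("Packets Received", 0)
    if received == 0:
        return 0.0
    return stats.get("Received Address Errors", 0) / received

sample_text = """
Packets Received = 3338434556
Received Address Errors = 16122
Received Packets Delivered = 3338418419
"""
stats = parse_ip_stats(sample_text)
ratio = address_error_ratio(stats)
```

On these numbers the ratio is roughly 5 in a million, so the counter is small relative to total traffic even though it keeps climbing.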
Going through the layers (OSI model)
Layer 1 is fine from the DNS server to servers potentially having the problem.
Layer 2 connectivity appears to be OK. I have my doubts, because the problems increased after a switch change that affected the primary DNS server, but the simplicity of layer 2 leaves it looking pretty innocent.
Layers 3 and 4 should be just fine, because no changes were made affecting the servers or our access rules
Anything above that could well be a problem.....
Many of the servers are intermittently logging this error:
Attempt to update DNS Host Name of the computer object in Active Directory failed. The updated value was '(Omitted)'. The following error occurred:
Access is denied.
On the other hand, the event viewer on the primary DNS server shows:
The DNS server is configured to forward to a non-recursive DNS server at (Omitted).
DNS servers in forwarders list MUST be configured to process recursive queries.
1) fix the forwarder (Omitted) to allow recursion
- connect to it with DNS Manager
- bring up server properties
- open "Advanced" tab
- uncheck "Disable Recursion"
- click OK
2) remove this forwarder from this server's forwarders list
- DNS Manager
- bring up server properties
- open "Forwarders" tab
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp
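Whether the forwarder really refuses recursion can be checked from the wire: a DNS response sets the RA (recursion available) bit in its header flags when the server will recurse. Below is a minimal Python sketch using only the standard library. The function names and the default query name `example.com` are my own choices, and I have not run it against the forwarder in question.

```python
import socket
import struct

def build_query(name, query_id=0x1234):
    """Build a minimal DNS A-record query with the RD (recursion desired) bit set."""
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

def recursion_available(response):
    """Return True if the RA bit (0x0080 in the flags word) is set in a DNS response."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0080)

def forwarder_allows_recursion(server_ip, name="example.com", timeout=3.0):
    """Send one UDP query to server_ip:53 and report the RA bit of the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        response, _ = sock.recvfrom(512)
    return recursion_available(response)
```

Calling `forwarder_allows_recursion("<forwarder IP>")` from the primary DNS server should return False if the event log message is accurate, and True once recursion is enabled on the forwarder.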
Lastly, the DNS server IP configurations:
Primary DNS Server
First DNS Server: Localhost (using its actual address, i.e. not 127.0.0.1)
Second DNS Server: Secondary DNS Server
Secondary DNS Server
First DNS Server: Localhost (using its actual address, i.e. not 127.0.0.1)
Second DNS Server: Localhost
Third DNS Server: Domain Controller running DNS
The domain controller running DNS is not scavenging any records either, and it sees the same duplicate PTRs as the primary DNS server. Any suggestions, or do you need more information? Thanks!