DNS failover not working as expected
Posted on 2007-07-31
Thanks in advance. I have been working on a really strange issue today and last night. Here's the rundown. We have two HP-UX 11i DNS servers; let's call them 192.168.1.03 and 192.168.1.04. Each of our UNIX servers had its resolv.conf file configured as: -
We then had a major hardware failure on 192.168.1.04 and were not able to bring it back online. After that we saw DNS delays when resolving hostnames, although lookups were still succeeding through the secondary name server, 192.168.1.03. To fix the delay, we edited the resolv.conf file and changed the order of the name servers to: -
This fixed the delay when doing nslookups but we still had issues with some applications including SAP. Our application servers were giving unknown host errors when trying to communicate with each other even though ping and nslookups from the OS were working without any problem.
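To illustrate why reordering the name servers removed the nslookup delay, here is a minimal sketch of a resolver that tries each name server in the order listed. It is a simulation only, not the HP-UX resolver itself: the function name lookup_delay and the 5-second timeout and 2 attempts are assumptions chosen for illustration, and the real values depend on the resolver configuration.

```python
def lookup_delay(nameservers, alive, timeout_s=5.0, attempts=2):
    """Simulate a resolver that queries name servers strictly in listed order.

    nameservers: addresses in the order they appear in resolv.conf
    alive:       set of addresses that currently answer queries
    timeout_s:   assumed wait per unanswered query (illustrative value)
    attempts:    assumed number of passes over the whole list (illustrative value)

    Returns (answering_server, seconds_waited_before_the_answer).
    """
    waited = 0.0
    for _ in range(attempts):
        for ns in nameservers:
            if ns in alive:
                return ns, waited
            # A dead server costs a full timeout before the next one is tried.
            waited += timeout_s
    return None, waited


alive = {"192.168.1.03"}  # 192.168.1.04 is down

# Dead server listed first: every lookup waits one timeout before succeeding.
print(lookup_delay(["192.168.1.04", "192.168.1.03"], alive))

# Working server listed first: the lookup is answered immediately.
print(lookup_delay(["192.168.1.03", "192.168.1.04"], alive))
```

With the dead server first, each lookup pays one full timeout before reaching the working server, which would match the delay we saw; with the working server first, no wait occurs. This does not by itself explain the application errors described above.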
We then commented out the 192.168.1.04 entry on one of the servers with the issue. This still did not help. In the end, the only workaround was to reboot the server with the 192.168.1.04 line commented out in the resolv.conf file, and this fixed the issue on that server. We were later able to get the 192.168.1.04 DNS server back online, and this resolved the issue on the remaining servers that had not been rebooted but had the resolv.conf file configured as follows: -
I guess what I am wondering is why the servers/applications seemed unable to fail over to the secondary DNS server even though failover was working correctly at the OS level. Does anyone have any thoughts/comments on how to configure this so that failover works as we would like and prevents outages if a single DNS server fails?
Please accept my apologies for the length of this post and thanks for taking the time to read it.
PS. All local hosts files contain only the localhost entry, and all the nsswitch.conf files were set to dns (noservercontinue) files.