Secondary DNS servers

    Good morning all,
I am sure that this question has been asked multiple times, but here goes again...

Problem: HPUX 11i server will not use secondary W2K DNS server without commenting out the first
         nameserver entry in the /etc/resolv.conf file.

Current settings,
resolv.conf:

search opr.sf.org agcy.sf.org opr.test.sf.org agcy.test.sf.org
nameserver xx.xxx.xxx.200 # wpsbz376.sf.org
nameserver xx.xxx.xxx.201 # wpsbz377.sf.org
nameserver xx.xx.xxx.200  # wpsbz46b.sf.org

Current settings,
nsswitch.conf

hosts: files [NOTFOUND=continue] dns



The problem we had was that the primary nameserver (a W2K server) lost a RAID drive and stopped resolving DNS queries, although it was still pingable. The HPUX server did not start using the second nameserver until I commented out the first nameserver line in the resolv.conf file.

My question is: do you think that I have something configured incorrectly, or do you think that it was because the W2K server wasn't really down, but just had problems?

It is my belief that if the W2K server wasn't resolving DNS queries, then the Unix box should have resorted to the second nameserver, even though the primary nameserver was pingable.

Let me know your thoughts on this, and also let me know if you will need further information.

P.S. I work on the largest private network in the world (if you are wondering who, it is a very large insurance firm), so anyway ... I am not familiar with the settings of the W2K server, if this could be a factor.

One last thing, I am not able to perform an nslookup on any of the nameservers' hostnames.
i.e. $nslookup wpsbz377
...returns:
Using /etc/hosts on: cm75ctm1

looking up FILES
Trying DNS
*** wpsbz377.sf.org can't find wpsbz377: Non-existent domain

but if I try $nslookup wpsbz377.sf.org
...returns:
Using /etc/hosts on: cm75ctm1

looking up FILES
Trying DNS
Name: wpsbz377.sf.org
Address: xx.xxx.xxx.201

I think that this is because the 'sf.org' domain is not in my search string.
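
For what it's worth, here is a rough Python sketch of how the search list above would expand a short name. The `candidates` function is hypothetical, and the ordering rule (a name containing a dot is tried as given first, a name without a dot is tried as given last) is an assumption borrowed from common BIND-style resolvers, not something I have confirmed for HP-UX 11i.

```python
# Search list copied from the resolv.conf shown above.
SEARCH = ["opr.sf.org", "agcy.sf.org", "opr.test.sf.org", "agcy.test.sf.org"]


def candidates(name, search=SEARCH):
    """Return the names a resolver would plausibly try for `name`.

    Hypothetical helper: the ordering is an assumption (BIND-style,
    ndots=1), not taken from the HP-UX documentation.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name[:-1]]
    expanded = [f"{name}.{domain}" for domain in search]
    if "." in name:
        # Dotted name: tried as given first, then with each search domain.
        return [name] + expanded
    # Short name: each search domain first, the bare name last.
    return expanded + [name]


for tried in candidates("wpsbz377"):
    print(tried)
```

None of the four search domains is plain `sf.org`, so `wpsbz377.sf.org` is never among the names tried for the short form, while the fully qualified form is tried as given.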

Just thought that that might be helpful...

Asked by: j8vy
 
tfewster Commented:
man /etc/resolv.conf:

...The algorithm used is:  Try a name server; if the query times out, try the next...

If the first nameserver responds e.g.  `No address information is available for "yourhost"`, the HP won't try any further nameservers. It would only do so if the name service failed to give a valid response, i.e. the service was down.
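
To make that concrete, here is a minimal Python sketch of the rule as the man page words it. It is only a model of the quoted text, not the HP-UX resolver code: `resolve`, `Reply`, the `ns1`/`ns2` labels and the two fake query functions are all hypothetical, and real resolver libraries may treat some error codes differently from a plain "not found".

```python
from enum import Enum


class Reply(Enum):
    ANSWER = "answer"      # server returned an address
    NXDOMAIN = "nxdomain"  # server replied: no such name
    TIMEOUT = "timeout"    # no reply at all


def resolve(name, nameservers, query, max_passes=2):
    """Model of the quoted rule: only a timeout moves on to the next server."""
    for _ in range(max_passes):
        for server in nameservers:
            reply, address = query(server, name)
            if reply is Reply.TIMEOUT:
                continue  # try the next nameserver in the list
            # Any reply at all, right or wrong, ends the lookup here.
            return server, reply, address
    return None, Reply.TIMEOUT, None


def broken_primary(server, name):
    """Hypothetical case: ns1 is alive but answers 'not found' for everything."""
    if server == "ns1":
        return Reply.NXDOMAIN, None
    return Reply.ANSWER, "192.0.2.10"


def dead_primary(server, name):
    """Hypothetical case: ns1 does not answer at all."""
    if server == "ns1":
        return Reply.TIMEOUT, None
    return Reply.ANSWER, "192.0.2.10"


print(resolve("yourhost", ["ns1", "ns2"], broken_primary))
print(resolve("yourhost", ["ns1", "ns2"], dead_primary))
```

In the first case the lookup stops at `ns1` with a negative answer and `ns2` is never asked; in the second, the timeout lets the lookup fall through to `ns2`.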
 
j8vy (Author) Commented:
I am still doing some research on whether the service was down on the W2K DNS server, but let's just say that it was still up, but responding incorrectly.

I currently have the following line in my /etc/nsswitch.conf file:
hosts: files [NOTFOUND=continue] dns

What would happen if I changed it to the following?
hosts: files [SUCCESS=return NOTFOUND=continue UNAVAIL=continue TRYAGAIN=continue] dns

Would this help in the situation that I had, where the DNS server obviously wasn't resolving the names but was possibly still active?

I ask because it appears at this time that the DNS server had problems with its zone files (not that it was down, as we had originally thought), which may have caused it to be unable to resolve names.
 
tfewster Commented:
I don't think it will make a difference, as those are the actions to take depending on the result of checking 'files'. The system will then move on to using dns, which follows its own rules (`man /etc/resolv.conf`).
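
A small Python sketch of that reasoning, in case it helps. Everything here is a hypothetical model: `switch_lookup`, the two fake sources, and especially `ASSUMED_DEFAULTS` (the actions for statuses not listed in the brackets are an assumption, so check nsswitch.conf(4) on your own system).

```python
# Assumed actions for statuses not listed in the brackets (an assumption).
ASSUMED_DEFAULTS = {
    "SUCCESS": "return",
    "NOTFOUND": "return",
    "UNAVAIL": "continue",
    "TRYAGAIN": "return",
}


def switch_lookup(name, sources):
    """Walk the sources in order; each entry is (label, lookup_fn, bracket_actions)."""
    result = (None, "UNAVAIL", None)
    for label, lookup_fn, bracket_actions in sources:
        status, address = lookup_fn(name)
        result = (label, status, address)
        # The bracketed actions apply to the source they follow.
        actions = {**ASSUMED_DEFAULTS, **bracket_actions}
        if actions[status] == "return":
            break
    return result


def files(name):
    return ("NOTFOUND", None)  # the host is not listed in /etc/hosts


def dns(name):
    # What the resolver hands back after the primary's bad-but-valid reply.
    return ("NOTFOUND", None)


current = [("files", files, {"NOTFOUND": "continue"}), ("dns", dns, {})]
proposed = [
    ("files", files, {"SUCCESS": "return", "NOTFOUND": "continue",
                      "UNAVAIL": "continue", "TRYAGAIN": "continue"}),
    ("dns", dns, {}),
]

print(switch_lookup("yourhost", current))
print(switch_lookup("yourhost", proposed))
```

Both lines give the same result, because the bracketed actions only decide whether to move from `files` to `dns`; which nameserver `dns` ends up asking is decided inside the resolver.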
 
j8vy (Author) Commented:
So.....

Like it says in the man page `man resolv.conf`,

"(The algorithm used is:  Try a name server; if the query times out, try the next and continue until all name servers have been tried, then repeat trying all the name servers until a maximum number of retries have been made)."

Apparently, those requests were NOT timing out. If the requests were timing out, would there be any entries in /var/adm/syslog/syslog.log?
 
tfewster Commented:
Sorry, I don't know enough about nslookup to answer that.

Good luck with getting an answer!
Regards,
Tim
 
j8vy (Author) Commented:
Tim, I think that we will go ahead and 'PAQ' this question. I was finally able to determine that the reason the Unix box did not try the second nameserver was that the primary nameserver (W2K DNS server 'wpsbz376') was not actually completely down. It was still pingable, and the corrupted Net Raid drive was causing the DNS service on the Windows box to send corrupted DNS responses back to the Unix box. So as far as the Unix box knew, the responses that it was receiving were accurate and the DNS server was up.

This kind of problem is often hard to troubleshoot in an environment like this, because the call came in to me saying that my Unix server was not resolving DNS requests being sent by the application, and that in order to fix it they had to comment out the first nameserver entry. So upon my first investigation, it was reported to me that the Windows server was "down", which I found out later was not the case at all. It had simply lost a RAID drive, causing corrupted zone files etc.

So here I am pulling out my hair, poring over man pages etc., trying to figure out what I had configured incorrectly... assuming the worst of course, and having to provide hard evidence to back myself up the whole way to prove that it wasn't my fault...

I am sure that you are aware of these types of situations and as a sys admin, I am sure it will happen a million times more. Anyway thanks for your help.
 
j8vy (Author) Commented:
Thanks Tim,
--Matt
 
tfewster Commented:
Thanks for the points - even if it was just for backing up what you already knew ;-)

We have the same "us and them" issues at my place; Network Ops will never admit there is a problem with a hub etc, so the Unix team have to _prove_ there is no problem with the server; then Networks check the hub port and the problem automagically goes away (though Networks deny doing anything). Don't even get me started on the NT support team!

I guess it's inevitable in big companies where everyone specialises in their own area and no-one sees the "big picture"; Bouncing the problem to the Windows server support team, saying "We haven't changed anything; Could you check the DNS server?" may get the immediate problem fixed, but you're still being asked "Why didn't the Unix box second-guess the valid-but-useless response from the primary DNS server?". AFAIK, there's no simple answer to that one, unless a lookup query returns different exit codes for "Address found", "Address not found" and "Lookup sources failed".

Best wishes,
Tim