psimation (South Africa) asked:

'service named start' starts 5 named processes?

Hi
I've noticed some strange behaviour on my WBL machine with respect to named and sendmail. This morning I started to see lots of sendmail errors rejecting mail because the sender address could not be resolved. I then checked my /etc/resolv.conf and found that the entries "domain xxx.xxx", "nameserver 127.0.0.1", "nameserver my.sec.dns" and "nameserver some.other.dnsserver" had disappeared; only the 127.0.0.1 line remained.
When I fixed that and restarted named, I saw that it spawned 4 or 5 named processes. Sendmail started to work again, but later in the day it started complaining again, this time when users on the server tried to send mail out: it said the sender address does not exist.

I checked my /etc/named.conf and it seems to be OK. Restarting named leaves no errors in /var/log/messages; the log only shows all the notifies being sent...

Any ideas, thoughts?
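The first symptom above, nameserver lines silently vanishing from /etc/resolv.conf, is easy to detect automatically. A minimal Python sketch, assuming you know which entries should be present: the 192.0.2.x addresses and the function names are hypothetical stand-ins, and the script only reports what is missing, it does not explain what rewrote the file.

```python
from typing import Dict, List, Sequence

# Hypothetical expected servers: the 192.0.2.x documentation addresses stand
# in for "my.sec.dns" and "some.other.dnsserver" from the post above.
EXPECTED_NAMESERVERS = ("127.0.0.1", "192.0.2.53", "192.0.2.54")

def parse_resolv_conf(text: str) -> Dict[str, List[str]]:
    """Collect domain, search and nameserver entries from resolv.conf text."""
    entries: Dict[str, List[str]] = {"domain": [], "search": [], "nameserver": []}
    for raw in text.splitlines():
        # '#' and ';' mark comments in resolv.conf; stripping them anywhere
        # on the line (not just column 1) is a simplification.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword in entries:
            entries[keyword].extend(values)
    return entries

def missing_nameservers(text: str,
                        expected: Sequence[str] = EXPECTED_NAMESERVERS) -> List[str]:
    """Return the expected nameserver addresses that are no longer listed."""
    present = set(parse_resolv_conf(text)["nameserver"])
    return [ns for ns in expected if ns not in present]

def check_file(path: str = "/etc/resolv.conf") -> List[str]:
    """Read the live file; an empty result means every expected line is there."""
    with open(path) as fh:
        return missing_nameservers(fh.read())
```

Run from cron, a non-empty result from check_file() would flag the problem before sendmail starts rejecting mail.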
ASKER CERTIFIED SOLUTION: jlevie
psimation (ASKER):

Hi ahoffmann

Well, sendmail basically just said it could not resolve the sender address (which is a local account whose domain has entries in named.conf and the zone files, the works, i.e. a properly configured virtual domain on localhost).

It's as if named "stopped", but there is no sign of named having stopped or anything...
> ..  sendmail basically just said it could not resolve the sender address
You mean that sendmail cannot find an MX record for the sender address?
Hi ahoffmann

The sendmail logs do not show anything as specific as that; they only state "sender address not resolvable". Whether that is because sendmail cannot find a valid MX or not, I'm not sure. However, as stated, those "unresolvable" sender domains all have correctly configured DNS records. The same server that runs sendmail also runs named, all those local domains are configured on that server, and when I run dig @127.0.0.1 xxx.xxx any, it resolves...
Can you do a
  dig -t mx <senderdomain>
Yes, I can do it now, BUT it most probably would NOT have worked at the time sendmail gave the errors. I restarted sendmail and named when I first noticed the problems, and that seemed to fix it. Unfortunately, I still don't know why it happened in the first place, nor why named seems to be spawning more processes than normal. This is what prompted me to believe that the problem might lie with named, yet, as I stated, I cannot see anything wrong with the configuration or any reason for the system to start 5 processes instead of just one. There are no errors logged when restarting named, so I don't know where to look for any problems...

Also, the main problem for me is that, for all I know, this could happen again at any time, causing sendmail to basically reject all e-mails, and my users getting on my mammaries... ;)

 
jlevie

Do the multiple instances of named persist, or do they exit after a few minutes? Do you have one or more (how many?) secondaries?
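One way to answer the persistence question is to record the named PIDs twice, a few minutes apart, and compare the snapshots. A minimal Python sketch, assuming a Linux /proc filesystem; the function names are hypothetical, and it reads the command name from /proc/<pid>/stat.

```python
import os
from typing import Iterable, List

def comm_from_stat(stat_line: str) -> str:
    """Extract the command name from a /proc/<pid>/stat line: 'pid (comm) state ...'."""
    # The name may itself contain spaces or ')', so cut at the LAST ')'.
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]

def pids_named(name: str = "named", proc_root: str = "/proc") -> List[int]:
    """Return the sorted PIDs whose command name equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                if comm_from_stat(fh.read()) == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited between listdir() and open()
    return sorted(pids)

def still_running(before: Iterable[int], after: Iterable[int]) -> List[int]:
    """PIDs present in both snapshots, i.e. instances that persisted."""
    return sorted(set(before) & set(after))
```

Usage: take `before = pids_named()`, wait (e.g. `time.sleep(300)`), then `still_running(before, pids_named())` lists the instances that stayed up.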
Hi Jim
They persist. I only have one secondary; however, I had already started to think along those lines. The secondary still runs on RH 7.0 using BIND 8.??, while the box in question is the one I reinstalled with WBL plus all the latest updates. The zone files are identical to what they were, and so is the syntax used in named.conf, but I'm not sure whether that can cause problems. If the primary cannot send notifies to the secondary, surely that will not cause the primary to "malfunction"?
Today I again saw, at one stage, that the "primary" was unable to look up some domains, but that was while there were known network issues at the ISP with international bandwidth. What happened was this: I did a "test" dig to an international site and the response from 127.0.0.1 was that the query timed out. After I restarted named, the same query worked, this time using the secondary nameserver to do the lookup since 127.0.0.1 could not find it. (The secondary server is located on a different network that did not have a network problem at the time.) It's as if the nameserver goes into limbo or freezes for some reason and does not even try the further servers listed in resolv.conf...
> I did a "test" dig to an intl. site and the response was from 127.0.0.1 that the query timed out.

Okay, that's what should happen if your Internet link is having problems. The local named is responding, it just can't get any data from remote servers.

> After I restarted named, the same query worked, this time using the secondary nameserver to do the lookup since 127.0.0.1 could not find it.

That indicates to me that after the restart the local copy of named was not responding to requests at all, and dig tried the second name server listed in resolv.conf, which worked.

Using 'host', as I described earlier, is a better test in that specifying the name server IP will restrict the query to exactly that server.
Sorry, Jim, I missed you there. By "Using 'host', as described earlier...", did you mean "search your.domain.tld"? If so, I did make that change. In any event, I thought the reason for having more than one entry in resolv.conf is exactly redundancy, i.e. if one cannot resolve, try the next in the list? (I admit I don't know much about DNS...) But you say the response was then actually correct, in that the name stayed unresolved even though an alternate nameserver in the list would have been able to resolve it?
You don't mention anything about the difference in BIND versions; can I deduce that it would not play a role in this?
I'm talking about executing a 'host a-host.a-domain.tld 127.0.0.1' command on WBL to check the operation of the local name server that is supposed to be listening at 127.0.0.1. In a like manner, 'host another-host.a-domain.tld 123.4.5.6' can be used to check whether your secondary (at 123.4.5.6) is resolving. In each case the query will go to exactly, and only, the IP specified for the name server.
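The same single-server test can be approximated without host or dig. A minimal Python sketch using only the standard library: it hand-encodes one RFC 1035 query and sends it over UDP port 53 to exactly one server, so nothing falls through to other resolv.conf entries. The function names and the 5-second default timeout are assumptions for illustration, and it decodes only the reply header (RCODE and answer count), not the records.

```python
import random
import socket
import struct

# Query type codes from RFC 1035 (ANY is the "*" QTYPE, 255).
QTYPES = {"A": 1, "NS": 2, "MX": 15, "ANY": 255}

def build_query(name: str, qtype: str = "A", query_id: int = 0x1234) -> bytes:
    """Encode a standard recursive DNS query (RFC 1035 section 4.1)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)  # class IN

def query_one_server(name: str, server: str, qtype: str = "A",
                     timeout: float = 5.0) -> tuple:
    """Ask exactly one server; return (rcode, answer_count).

    Raises socket.timeout if the server does not answer at all.
    """
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, qid), (server, 53))
        reply, _ = sock.recvfrom(512)
    rid, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    if rid != qid:
        raise ValueError("reply ID does not match query ID")
    return flags & 0x000F, ancount  # RCODE 0 = NOERROR, 3 = NXDOMAIN
```

For example, `query_one_server("a-host.a-domain.tld", "127.0.0.1")` mirrors the first host command, and passing `qtype="MX"` gives the MX check asked about earlier. A socket.timeout here corresponds to the "not responding at all" case.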

Listing two or more name servers in resolv.conf is a redundancy measure. If, and only if, the first name server does not respond at all (as in a timeout while connecting to the name server) will the second be tried, and so forth. With that in mind, take another look at my previous comment.
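The failover rule above can be modelled as a tiny pure function. This is a simplified sketch, not the resolver library's actual code (the real resolver also retries and applies configurable timeouts); the server addresses and answers are hypothetical. It shows why a negative answer from 127.0.0.1 ends the lookup, while a silent 127.0.0.1 lets the next listed server answer.

```python
from typing import Dict, List, Optional, Tuple

# Sentinel meaning "this server did not respond at all" (a timeout).
TIMEOUT = object()

def first_answer(servers: List[str],
                 responses: Dict[str, object]) -> Tuple[Optional[str], object]:
    """Toy model of resolv.conf failover (simplified, hypothetical data).

    Walk the nameserver list in order. Only a server that does not respond
    (missing from `responses`, or mapped to TIMEOUT) makes the resolver move
    on; any reply, even a negative one such as "NXDOMAIN", ends the search.
    """
    for server in servers:
        reply = responses.get(server, TIMEOUT)
        if reply is TIMEOUT:
            continue  # silent server: try the next nameserver line
        return server, reply  # positive or negative, this is the answer
    return None, TIMEOUT  # every listed server timed out
```

With `["127.0.0.1", "192.0.2.53"]` as the list, a "NXDOMAIN" reply from 127.0.0.1 is final, whereas if 127.0.0.1 is absent from `responses` the answer comes from 192.0.2.53.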

As far as the resolver libraries are concerned, it shouldn't matter whether they are issuing the query to a BIND 8 or 9 server. However, one shouldn't try to use the zone files from a BIND 8 server 'as-is' on a BIND 9 server. At the least, the zone files all need a $TTL directive and the localhost definition needs to be changed. And of course the named.conf needs to be updated a bit.
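Finding the BIND 8 zone files that still lack a $TTL directive can be scripted. A rough Python sketch, assuming zone files in which ';' always starts a comment (it would misread a ';' inside a quoted TXT string); the function names are hypothetical, and the one-day default in add_ttl_directive is an arbitrary placeholder, not a value recommended in this thread. It does not cover the localhost or named.conf changes.

```python
def needs_ttl_directive(zone_text: str) -> bool:
    """True if no $TTL directive appears before the first resource record."""
    for raw in zone_text.splitlines():
        # Assumption: ';' always starts a comment (ignores quoted strings).
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        if line.upper().startswith("$TTL"):
            return False
        if line.startswith("$"):
            continue  # other directives such as $ORIGIN or $INCLUDE
        return True  # first record reached without a $TTL
    return True

def add_ttl_directive(zone_text: str, ttl: int = 86400) -> str:
    """Prepend a $TTL line if missing; 86400 is a placeholder default."""
    if needs_ttl_directive(zone_text):
        return f"$TTL {ttl}\n" + zone_text
    return zone_text
```

Remember to bump the SOA serial after editing a zone so the secondary picks up the change.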