Named (BIND) 'spontaneously' dying
Posted on 2004-09-28
Our named instance keeps dying, at seemingly random times.
We had this problem a while ago (about two months back), so after some advice from user groups etc. I made some changes to the zone configs.
That didn't help at first and it kept dying randomly, but then, maybe a day after those changes were made, it stayed up.
Two months later, it's happening again.
THIS time though, I can see something in messages (/var/log/messages) that may be of help. That doesn't mean it wasn't there the first time; I'm very new to a LOT of things in Linux, so it's VERY possible I overlooked it before. (The user groups were the ones that told me of the EXISTENCE of messages, for example!)
We have been getting around the problem by restarting named (service named restart) whenever it goes down, and this time 'round I wrote a Perl script that a cron job runs to restart it every 30 minutes. (It dies anywhere from every 10 minutes to every couple of hours.)
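A fixed 30-minute restart still leaves named down for up to half an hour after a crash, and restarts it even when it is healthy. A minimal sketch of an alternative, assuming Python 3 and the `pidof` command are available (Python 3 postdates Red Hat 7.3, so treat this as logic to port into the existing Perl script): check whether named is running and restart it only when it is not. The function names are hypothetical; only `service named restart` comes from this post.

```python
import subprocess


def named_is_running():
    # pidof exits with status 0 when at least one process named "named" exists.
    result = subprocess.run(["pidof", "named"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart_named():
    # The same command already used by hand in this post.
    subprocess.run(["service", "named", "restart"], check=False)


def watchdog(is_running=named_is_running, restart=restart_named):
    """Restart named only if it is down; return True if a restart was issued."""
    if is_running():
        return False
    restart()
    return True
```

Called every few minutes from cron (for example a `*/5 * * * *` entry running a hypothetical wrapper script that calls `watchdog()`), a crash would leave named down for a few minutes at most. This only shortens the outage; it does not stop named from dying.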
We are running:
- Red Hat 7.3 (sorry, we can't upgrade Red Hat)
- BIND 9.2.0
This is the part of messages that seems to show why it's dying, if only I could decipher it ;)
Sep 27 15:14:53 linux01 named: message.c:809: REQUIRE(*rdataset == ((void *)0)) failed
Sep 27 15:14:53 linux01 named: exiting (due to assertion failure)
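Because each crash leaves a recognisable line in messages, it may help to pull out every occurrence and check whether the crashes always hit the same check (here `message.c` line 809) and at what times. A minimal sketch, assuming the syslog line format shown above; the regex and function name are illustrative only, and REQUIRE/ENSURE/INSIST/INVARIANT are assumed to be the assertion kinds BIND can report.

```python
import re

# Matches assertion-failure lines in the format shown in the excerpt above.
# The optional "[pid]" after "named" is an assumption for syslog setups that add it.
ASSERT_RE = re.compile(
    r"^(?P<when>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) named(?:\[\d+\])?: "
    r"(?P<src>[\w.]+):(?P<line>\d+): "
    r"(?P<kind>REQUIRE|ENSURE|INSIST|INVARIANT)\((?P<cond>.*)\) failed$"
)


def find_assertion_failures(lines):
    """Yield (timestamp, source_file, line_no, condition) for each failure line."""
    for raw in lines:
        m = ASSERT_RE.match(raw.rstrip("\n"))
        if m:
            yield (m.group("when"), m.group("src"), int(m.group("line")), m.group("cond"))
```

Passing an open /var/log/messages file to `find_assertion_failures` lists every crash; if they all point at the same file and line, that is worth quoting exactly when asking for help.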
This is what happens (I think) when we restart it after it dies:
Sep 27 15:36:08 linux01 named: named shutdown failed
Sep 27 15:36:08 linux01 named: starting BIND 9.2.0 -u named
Sep 27 15:36:08 linux01 named: using 1 CPU
Sep 27 15:36:08 linux01 named: loading configuration from '/etc/named.conf'
Sep 27 15:36:08 linux01 named: no IPv6 interfaces found
I would appreciate ideas on how to permanently solve this problem, because it is a huge nuisance to us and can reduce productivity a lot.
It would also be a huge help if someone could explain what named/BIND does in more detail (i.e. how it works), so I can better understand it, its problems, and why it might be dying. The more you can pitch it at my level, the better! May be worth extra points if I find it very useful. :)
Apart from the lines from messages above, there were quite a few saying 'lame server' etc., but I've been assured they are harmless, at least as far as our current problem goes.
Here is another excerpt (I have replaced the domain names with asterisks):
Sep 27 14:59:27 linux01 named: zone *****1.com.au/IN: loading master file *****1.com.au: file not found
Sep 27 14:59:27 linux01 named: *****2.com.au:1: no TTL specified; using SOA MINTTL instead
Sep 27 14:59:27 linux01 named: zone *****2.com.au/IN: loaded serial 2001091501
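This excerpt mixes two different complaints: a zone whose master file named cannot find, and a zone file with no TTL specified, where BIND falls back to the SOA minimum value. A minimal sketch that pulls both kinds of lines out of messages so each zone can be checked by hand; it assumes the exact wording shown above, and the function name is hypothetical.

```python
import re

# Wording taken from the excerpt above; "[pid]" after "named" is optional
# in case syslog is configured to add it (an assumption).
MISSING_FILE_RE = re.compile(
    r"named(?:\[\d+\])?: zone (?P<zone>[^/\s]+)/IN: "
    r"loading master file (?P<file>[^:\s]+): file not found"
)
NO_TTL_RE = re.compile(
    r"named(?:\[\d+\])?: (?P<file>[^:\s]+):\d+: no TTL specified"
)


def zone_load_problems(lines):
    """Return (problem, file) pairs in the order they were logged."""
    problems = []
    for raw in lines:
        m = MISSING_FILE_RE.search(raw)
        if m:
            # named was told to load this zone but could not find its master file.
            problems.append(("missing file", m.group("file")))
            continue
        m = NO_TTL_RE.search(raw)
        if m:
            # No TTL given in the zone file, so the SOA minimum is used instead.
            problems.append(("no $TTL", m.group("file")))
    return problems
```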
Help please. :|