  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 675

Named (BIND) 'spontaneously' dying

Our named instance keeps dying, at seemingly random times.
We had this problem a while ago (maybe 2 months), so after some advice from user groups etc. I made some changes to the zone configs.

That didn't work and it still died randomly, but then about maybe a day after those changes were made, it stayed alive.

Two months later, it's happening again.

THIS time tho, I can see something in messages that may be of help. That doesn't mean it wasn't there the first time; I'm very new to a LOT of things in Linux, so it's VERY possible I overlooked it before. (The user groups were the ones that informed me of the EXISTENCE of messages, for example!)

We have been getting around the problem by restarting named whenever it goes down
(service named restart) - and I wrote a Perl script this time 'round to restart it every 30 mins through a cron job. (It dies anywhere from every 10 mins to every couple of hours.)

So -

We are running
- Red Hat 7.3 (we can't upgrade Red Hat, sorry)
- BIND 9.2.0

This is the part of messages that seems to show 'why' it's dying, but 'sif I can decipher it ;)

Sep 27 15:14:53 linux01 named[24776]: message.c:809: REQUIRE(*rdataset == ((void *)0)) failed
Sep 27 15:14:53 linux01 named[24776]: exiting (due to assertion failure)
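
For anyone wanting to check their own logs for the same crash, here is a minimal sketch. It assumes the syslog file is /var/log/messages and that the crash lines look like the two above; the function name find_named_assertions is made up for illustration.

```shell
#!/bin/sh
# Print every named assertion-failure line recorded in a syslog file.
# Assumption: log lines follow the format shown in the excerpt above.
# find_named_assertions is a hypothetical helper name, not a BIND tool.
find_named_assertions() {
    log="${1:-/var/log/messages}"
    grep -E 'named\[[0-9]+]: .*(REQUIRE\(.*\) failed|exiting \(due to assertion failure\))' "$log"
}
```

Calling find_named_assertions with no argument reads /var/log/messages; pass another path to scan a rotated log. The timestamps on the matching lines show how often named is actually dying.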


This is what happens when we restart it (I think) after it dies

Sep 27 15:36:08 linux01 named: named shutdown failed
Sep 27 15:36:08 linux01 named[25206]: starting BIND 9.2.0 -u named
Sep 27 15:36:08 linux01 named[25206]: using 1 CPU
Sep 27 15:36:08 linux01 named[25209]: loading configuration from '/etc/named.conf'
Sep 27 15:36:08 linux01 named[25209]: no IPv6 interfaces found


I would appreciate ideas on how to permanently solve this problem, because it is a huge nuisance to us and can reduce productivity a lot.
It would also be of huge benefit if someone could explain what named/BIND does in more detail (i.e. how it works) so I can have a better understanding of it and its problems, and why it might be dying etc. - that is, the more you can improve my understanding the better! May be worth extra points if I find it very useful.  :)

...
Apart from those lines from messages above, there were quite a few saying 'lame server' etc. but I've been assured they are harmless, at least in regards to our current problem.

Here is another excerpt. (I have replaced the domains & stuff with ****)

Sep 27 14:59:27 linux01 named[24776]: zone *****1.com.au/IN: loading master file *****1.com.au: file not found
Sep 27 14:59:27 linux01 named[24776]: *****2.com.au:1: no TTL specified; using SOA MINTTL instead
Sep 27 14:59:27 linux01 named[24776]: zone *****2.com.au/IN: loaded serial 2001091501



Help please.  :|

Cheers,
Glauron
1 Solution
 
jlevie Commented:
"zone *****1.com.au/IN: loading master file *****1.com.au: file not found" and "*****2.com.au:1: no TTL specified; using SOA MINTTL instead" are certain clues that your named config files aren't right for a 9.2 copy of BIND. Without seeing all of the files involved I can't say whether "message.c:809: REQUIRE(*rdataset == ((void *)0)) failed" is a result of a configuration error or not.
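
As a rough illustration of the second clue: BIND 9 logs "no TTL specified" when a zone file has no $TTL directive ahead of its first record. The sketch below only checks whether a $TTL line exists anywhere in a file, which is a simplification; zone_has_ttl is a hypothetical helper name, and any zone file path you pass it is your own.

```shell
#!/bin/sh
# Report whether a zone file contains a $TTL directive.
# Simplification: this does not verify that $TTL precedes the first record.
# zone_has_ttl is a hypothetical helper name, not part of BIND.
zone_has_ttl() {
    if grep -Eiq '^\$TTL[[:space:]]+[0-9]' "$1"; then
        echo "$1: \$TTL directive present"
    else
        echo "$1: no \$TTL directive"
        return 1
    fi
}
```

A file that fails the check would normally get a line such as "$TTL 86400" at the very top; the value 86400 (one day) is only an example.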

You can post your named.conf file and all of the zone files here (unmodified) or send them as attachments to jlevie@experts-exchange.com and I'll look them over.
 
Glauron (Author) Commented:
emailed  =)
 
jlevie Commented:
Got it, reply on the way.
 
napoleon41 Commented:
Hey!  I understand the need for security and everything, but some of us are trying to learn here.  LOL

Could you at least post the part that ends up being the problem?  It's nice to sit in on discussions that I'm not very knowledgeable about yet and glean a bit.  ;-)
 
jlevie Commented:
I'll describe what the errors are.
 
Glauron (Author) Commented:
Yep. No problem with that. =)

I'll trust jlevie with whatever he deems appropriate to display here. I know he won't go too far, if possible with this info.

Feel free to post whatever you think might help j.
 
Glauron (Author) Commented:
Yo - Jlevie.

=)

How'd u go with those zone files? - any luck?
It did it again, named dying - and has started working properly again.
 
dsimco Commented:
I am having this same problem. What was your solution?
 
Glauron (Author) Commented:
Hey, glad to know I'm not the only one!  =)
But not glad to know that someone else is having this problem apart from that. =P

No solution yet,

       Jlevie, did you have any ideas?

--------

I got around the problem temporarily by setting a script to restart the named instance every 15 mins or so. Not at all perfect by any means, but when it worked, it saved restarting it manually.
Big pain tho.

Command I use to restart is:
service named restart
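
A gentler variant of the blind restart is to restart only when named is actually down. The following is a minimal sketch under assumptions, not the Perl script mentioned in this thread: it assumes pidof is on the PATH, and CHECK_CMD and RESTART_CMD are hypothetical override variables added so the logic can be tried without touching the real service.

```shell
#!/bin/sh
# Watchdog sketch: restart named only when no named process is found.
# CHECK_CMD and RESTART_CMD are hypothetical overrides for illustration;
# the defaults are pidof and the restart command quoted above.
check_named() {
    check="${CHECK_CMD:-pidof named}"
    restart="${RESTART_CMD:-service named restart}"
    if $check >/dev/null 2>&1; then
        echo "named is running"
    else
        echo "named is down, restarting"
        $restart
    fi
}
```

Saved as a script with a final check_named line, it could be run from cron, for example "*/15 * * * * /usr/local/sbin/named-watchdog.sh" (the path is hypothetical). Because it does nothing while named is healthy, it could also run far more often than every 15 minutes.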

Everyone I have ever spoken to has no idea why this is happening. My guess is it has to be a bad zone that somehow becomes fatal to named at certain times, maybe changes in the other server or something, who knows! I don't, that's for sure.

Well, good luck;  and let me know if you get any ideas as well =)

- Glauron
 
dsimco Commented:
Heya Glauron: I have started a new thread addressing this issue and have gotten some good info. You should check it out.
http://www.experts-exchange.com/Networking/Linux_Networking/Q_21267519.html#12999162 

I believe we are on the right track.
 
Glauron (Author) Commented:
Yeah! lol - after answering that, I found jlevie's profile & followed his answers, and came across that question! I thought, wow! That looks almost exactly like my question!

Then I noticed the date as today (actually yesterday from Oz =P )  & saw it was you!
=D

I tried Wesly's advice & the install went without any hiccups.

So now I play the waiting game =)

Well done, we might solve it after all!
 
Glauron (Author) Commented:
Anyone else following this post, please also view the question dsimco posted above, found at:

http://www.experts-exchange.com/Networking/Linux_Networking/Q_21267519.html

Has some very useful info pertaining to this issue.
Basically, it seems to be a problem with BIND, which needs to be updated, but the OS also needs updating for security purposes.  ...
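
To confirm which BIND version is actually running after an update, the version can be read back from the startup line named writes to the log. This is a minimal sketch, assuming the log is /var/log/messages and that the "starting BIND" line has the format quoted earlier in this thread; last_started_bind_version is a hypothetical helper name.

```shell
#!/bin/sh
# Print the BIND version from the most recent "starting BIND" line in a syslog file.
# Assumption: named logs its version at startup in the format seen in this thread.
# last_started_bind_version is a hypothetical helper name, not a BIND tool.
last_started_bind_version() {
    log="${1:-/var/log/messages}"
    sed -n 's/.*named\[[0-9]*]: starting BIND \([^ ]*\).*/\1/p' "$log" | tail -n 1
}
```

If this still prints the old version after the packages were updated, named was probably not restarted since the update.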

Glau
 
wesly_chen Commented:
Hi,

   Red Hat has discontinued support for Red Hat 7.3, but you can still download the latest patches from:
http://download.fedoralegacy.org/redhat/7.3/updates/i386/

   Besides, you can use apt-get to automate the update process:
As root:
wget http://ftp.freshrpms.net/pub/freshrpms/redhat/7.3/apt/apt-0.5.5cnc5-fr0.rh73.2.i386.rpm
rpm -ivh apt-0.5.5cnc5-fr0.rh73.2.i386.rpm
apt-get dist-upgrade

   By the way, upgrading the kernel doesn't mean upgrading the OS to RH 9 or Fedora. The latest kernel for RH7.3 is:
http://download.fedoralegacy.org/redhat/7.3/updates/i386/kernel-2.4.20-37.7.legacy.i686.rpm
Note that a kernel upgrade needs a reboot to load the new kernel.

Regards,

Wesly