Solved

Named (BIND) 'spontaneously' dying

Posted on 2004-09-28
Last Modified: 2012-06-21
Our named instance keeps dying, at seemingly random times.
We had this problem a while ago (maybe 2 months), so after some advice from user groups etc. I made some changes to the zone configs.

That didn't work and it still died randomly, but then about maybe a day after those changes were made, it stayed alive.

Two months later, it's happening again.

THIS time, though, I can see something in messages that may be of help. That doesn't mean it wasn't there the first time. I'm very new to a LOT of things in Linux, so it's VERY possible I overlooked them before. (The user groups were the ones that informed me of the EXISTENCE of messages, for example!)

We have been getting around the problem by restarting named whenever it goes down
(service named restart), and this time around I wrote a Perl script to restart it every 30 mins through a cron job. (It dies anywhere from every 10 mins to every couple of hours.)
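
A blind restart every 30 minutes also bounces a healthy named. As a rough sketch (not the Perl script mentioned above, which was never posted), a watchdog could restart the service only when the process is actually gone. It assumes `pgrep` is available on the box and that `service named restart` is the right restart command, as used in this thread; the function names are made up for illustration.

```python
import subprocess


def named_is_running(run=subprocess.run):
    """Return True if a process called exactly 'named' exists.

    Assumption: pgrep is installed; it exits 0 when at least one match is found.
    """
    result = run(["pgrep", "-x", "named"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart_if_dead(run=subprocess.run):
    """Restart named only when it is not running.

    Returns True if a restart was issued, False if named was already up.
    The 'run' parameter exists only so the logic can be exercised without
    touching the real service.
    """
    if named_is_running(run):
        return False
    run(["service", "named", "restart"])
    return True
```

Calling `restart_if_dead()` from a cron job every few minutes would shorten outages without restarting a working server. It is still only a workaround and does nothing about the assertion failure itself.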

So -

We are running
- Redhat 7.3 (we can't upgrade redhat sorry)
- BIND 9.2.0

This is the part of messages that seems to show 'why' it's dying, but 'sif I can decipher it ;)

Sep 27 15:14:53 linux01 named[24776]: message.c:809: REQUIRE(*rdataset == ((void *)0)) failed
Sep 27 15:14:53 linux01 named[24776]: exiting (due to assertion failure)
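
To see how often and when these crashes happen, the assertion lines can be pulled out of the messages file. The sketch below is only an illustration: the regular expression is modelled on the two log lines quoted above, and the list of assertion keywords (REQUIRE, ENSURE, INSIST, INVARIANT) is an assumption about what BIND may log, not something confirmed in this thread.

```python
import re

# Pattern modelled on the syslog line quoted in the question.
ASSERTION_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"named\[(?P<pid>\d+)\]:\s+(?P<source>\S+):(?P<line>\d+):\s+"
    r"(?P<kind>REQUIRE|ENSURE|INSIST|INVARIANT)\((?P<cond>.*)\)\s+failed$"
)


def find_assertion_failures(lines):
    """Return (timestamp, source file, source line, condition) for each
    named assertion failure found in the given log lines."""
    hits = []
    for text in lines:
        match = ASSERTION_RE.match(text.strip())
        if match:
            hits.append((
                match.group("stamp"),
                match.group("source"),
                int(match.group("line")),
                match.group("cond"),
            ))
    return hits


# The two lines from the question, used as sample input.
sample = [
    "Sep 27 15:14:53 linux01 named[24776]: message.c:809: "
    "REQUIRE(*rdataset == ((void *)0)) failed",
    "Sep 27 15:14:53 linux01 named[24776]: exiting (due to assertion failure)",
]
failures = find_assertion_failures(sample)
```

If every crash points at the same source file and line, that suggests one specific code path in named rather than something random, which is useful to mention when asking for help.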


This is what happens when we restart it (I think) after it dies:

Sep 27 15:36:08 linux01 named: named shutdown failed
Sep 27 15:36:08 linux01 named[25206]: starting BIND 9.2.0 -u named
Sep 27 15:36:08 linux01 named[25206]: using 1 CPU
Sep 27 15:36:08 linux01 named[25209]: loading configuration from '/etc/named.conf'
Sep 27 15:36:08 linux01 named[25209]: no IPv6 interfaces found


I would appreciate ideas on how to permanently solve this problem, because it is a huge nuisance to us and can reduce productivity a lot.
It would also be of huge benefit if someone could explain what named/BIND does in more detail (i.e. how it works), so I can have a better understanding of it and its problems, and why it might be dying etc. That is, the more you can improve my understanding the better! May be worth extra points if I find it very useful.  :)

...
Apart from those lines from messages above, there were quite a few saying 'lame server' etc. but I've been assured they are harmless, at least in regards to our current problem.

Here is another excerpt. (I have replaced the domains & stuff with ****)

Sep 27 14:59:27 linux01 named[24776]: zone *****1.com.au/IN: loading master file *****1.com.au: file not found
Sep 27 14:59:27 linux01 named[24776]: *****2.com.au:1: no TTL specified; using SOA MINTTL instead
Sep 27 14:59:27 linux01 named[24776]: zone *****2.com.au/IN: loaded serial 2001091501



Help please.  :|

Cheers,
Glauron
Question by:Glauron
16 Comments
 
LVL 40

Expert Comment

by:jlevie
ID: 12175679
"zone *****1.com.au/IN: loading master file *****1.com.au: file not found" and "*****2.com.au:1: no TTL specified; using SOA MINTTL instead" are certain clues that your named config files aren't right for a 9.2 copy of BIND. Without seeing all of the files involved I can't say whether "message.c:809: REQUIRE(*rdataset == ((void *)0)) failed" is a result of a configuration error or not.

You can post your named.conf file and all of the zone files here (unmodified) or send them as attachments to jlevie@experts-exchange.com and I'll look them over.
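
For anyone following along: the "no TTL specified; using SOA MINTTL instead" message at line 1 of a zone file generally means the file has no $TTL directive before its first record. A quick way to spot such files is sketched below. The function name and the example.com sample data are made up, and the comment stripping is deliberately naive (it would mis-handle a ';' inside a quoted TXT string), so treat this as a rough check rather than a zone file parser.

```python
def has_default_ttl(zone_text):
    """Return True if a $TTL directive appears before the first resource record."""
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # naive: drop ';' comments
        if not line:
            continue  # blank or comment-only line
        if line.upper().startswith("$TTL"):
            return True
        if line.startswith("$"):
            continue  # some other directive, e.g. $ORIGIN
        return False  # reached the first record without seeing $TTL
    return False


# Hypothetical zone data, for illustration only.
with_ttl = (
    "$TTL 86400\n"
    "@ IN SOA ns1.example.com. admin.example.com. (1 3600 900 604800 86400)\n"
)
without_ttl = (
    "; old-style zone file\n"
    "@ IN SOA ns1.example.com. admin.example.com. (1 3600 900 604800 86400)\n"
)
```

Here `has_default_ttl(with_ttl)` is true and `has_default_ttl(without_ttl)` is false. The missing-TTL message is a warning, so on its own it would not be expected to kill named, but it does show the zone files predate what BIND 9 expects.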
0
 
LVL 1

Author Comment

by:Glauron
ID: 12175756
emailed  =)
0
 
LVL 40

Expert Comment

by:jlevie
ID: 12175958
Got it, reply on the way.
0

 
LVL 5

Expert Comment

by:napoleon41
ID: 12183812
Hey!  I understand the need for security and everything, but some of us are trying to learn here.  LOL

Could you at least post the part that ends up being the problem?  It's nice to sit in on discussions that I'm not very knowledgeable about yet and glean a bit.  ;-)
0
 
LVL 40

Expert Comment

by:jlevie
ID: 12184040
I'll describe what the errors are.
0
 
LVL 1

Author Comment

by:Glauron
ID: 12185434
Yep. No problem with that. =)

I'll trust jlevie with whatever he deems appropriate to display here. I know he won't go too far with this info.

Feel free to post whatever you think might help j.
0
 
LVL 1

Author Comment

by:Glauron
ID: 12750251
Yo - Jlevie.

=)

How'd u go with those zone files? - any luck?
It did it again: named kept dying for a while, and has now started working properly again.
0
 

Expert Comment

by:dsimco
ID: 12992512
I am having this same problem. What was your solution?
0
 
LVL 1

Author Comment

by:Glauron
ID: 12999059
Hey, glad to know I'm not the only one!  =)
But not glad to know that someone else is having this problem apart from that. =P

No solution yet,

       Jlevie, did you have any ideas?

--------

I got around the problem temporarily by setting a script to restart the named instance every 15 mins or so. Not at all perfect by any means, but when it worked, it saved restarting it manually.
Big pain tho.

Command I use to restart is:
service named restart

No one I have spoken to has any idea why this is happening. My guess is it has to be a bad zone that somehow becomes fatal to named at certain times, maybe because of changes on the other server or something. Who knows! I don't, that's for sure.

Well, good luck;  and let me know if you get any ideas as well =)

- Glauron
0
 

Expert Comment

by:dsimco
ID: 12999180
Heya Glauron: I have started a new thread addressing this issue and have gotten some good info. You should check it out.
http://www.experts-exchange.com/Networking/Linux_Networking/Q_21267519.html#12999162 

I believe we are on the right track.
0
 
LVL 1

Author Comment

by:Glauron
ID: 12999297
Yeah! lol - after answering that just then, I found jlevie's profile & followed his answers, and came across that question! I thought, wow! That looks almost exactly like my question!

Then I noticed the date as today (actually yesterday from Oz =P )  & saw it was you!
=D

I tried Wesly's advice & the install went without any hiccups.

So now I play the waiting game =)

Well done, we might solve it after all!
0
 
LVL 1

Author Comment

by:Glauron
ID: 12999337
Anyone else following this post, please also view the question dsimco posted above, found at:

http://www.experts-exchange.com/Networking/Linux_Networking/Q_21267519.html

Has some very useful info pertaining to this issue.
Basically, it seems to be a problem with BIND itself, which needs to be updated; the OS also needs updating for security purposes.  ...

Glau
0
 
LVL 38

Accepted Solution

by:
wesly_chen earned 500 total points
ID: 12999368
Hi,

   Red Hat has discontinued support for Red Hat 7.3, but you can still download the latest patches from:
http://download.fedoralegacy.org/redhat/7.3/updates/i386/

   Besides, you can use apt-get to automate the update process.
As root:
wget http://ftp.freshrpms.net/pub/freshrpms/redhat/7.3/apt/apt-0.5.5cnc5-fr0.rh73.2.i386.rpm
rpm -ivh apt-0.5.5cnc5-fr0.rh73.2.i386.rpm
apt-get update
apt-get dist-upgrade

   By the way, upgrading the kernel doesn't mean upgrading the OS to RH 9 or Fedora. The latest kernel for RH 7.3 is:
http://download.fedoralegacy.org/redhat/7.3/updates/i386/kernel-2.4.20-37.7.legacy.i686.rpm
Note that after a kernel upgrade you need to reboot to load the new kernel.

Regards,

Wesly
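
After running the updates, it is worth confirming that named really is a newer build than the 9.2.0 from the question. One hedged way is to read the version out of the "starting BIND ..." line that named writes to messages on startup, as in the excerpt in the question. The function names below are hypothetical, and the sketch ignores any suffix after the three numbers (release candidates, patch levels and so on).

```python
import re


def parse_bind_version(text):
    """Extract (major, minor, patch) from text containing 'BIND x.y.z', or None."""
    match = re.search(r"BIND (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_newer_than(text, baseline=(9, 2, 0)):
    """Return True if the version found in text is strictly newer than baseline."""
    version = parse_bind_version(text)
    return version is not None and version > baseline


# Startup line quoted in the question.
startup_line = "Sep 27 15:36:08 linux01 named[25206]: starting BIND 9.2.0 -u named"
old_version = parse_bind_version(startup_line)
```

If `is_newer_than()` still returns false for the startup line logged after the upgrade and a restart of named, the BIND package was not actually replaced.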
0
