FC4 box stops responding every now and then - how to debug?

I have an FC4 box that stops responding every now and then, and I have to reboot to get it back up and running. When it stops responding, everything halts: Apache goes down, I can't SSH in, etc. Consider me a Linux newbie. Is there any specific logging I can set up to find out why it is crashing, or which logs should I look at to see why it crashed? (The most recent crash was last night at 6 pm.)
djs120 Asked:
paullamhkg Commented:
Try checking /var/log/messages or your Apache log files.
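
To narrow the search in /var/log/messages, it can help to pull out only the lines from the hour of the crash. A minimal sketch, assuming the standard syslog timestamp format ("Mon DD HH:MM:SS"); `log_window` is a hypothetical helper name, not a standard tool:

```shell
#!/bin/sh
# Hypothetical helper: print the last COUNT lines whose timestamp starts with STAMP.
# Usage: log_window FILE 'Apr 22 18:' [COUNT]
log_window() {
    file=$1
    stamp=$2
    count=${3:-50}
    # syslog lines begin with the timestamp, so anchor the match at line start
    grep "^$stamp" "$file" | tail -n "$count"
}

# Only touch the real log if it is readable (this usually requires root).
if [ -r /var/log/messages ]; then
    log_window /var/log/messages 'Apr 22 18:' 50
fi
```

Running `last -x` afterwards lists reboot and shutdown records. A reboot with no matching shutdown entry suggests the box hung or lost power rather than shutting down cleanly.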
djs120 (Author) Commented:
I checked /var/log/messages, and all it has are tons of messages like these right before the system halted:

...
Apr 22 18:25:06 webserver crond(pam_unix)[31436]: session closed for user xxxxxx
Apr 22 18:30:02 webserver crond(pam_unix)[31518]: session opened for user xxxxxx by (uid=0)
Apr 22 18:30:07 webserver crond(pam_unix)[31518]: session closed for user xxxxxx
Apr 22 18:35:01 webserver crond(pam_unix)[31599]: session opened for user xxxxxx by (uid=0)
...

The Apache logs don't show anything right before the crash, either.

Any other places I can check the logs?
djs120 (Author) Commented:
Not sure if this makes a difference, but the messages you see in /var/log/messages every 5 minutes come from an installation of CACTI that I have running. Every 5 minutes it polls my router for traffic stats and updates the database.
paullamhkg Commented:
Have you checked that the disk is not full? Try df -h; it will show you something like the output below:

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda11            487M  191M  271M  42% /
/dev/hda1             145M  6.0M  131M   5% /boot
none                  752M     0  752M   0% /dev/shm
/dev/hda2              84G  7.6G   72G  10% /home
/dev/md0              221G   21G  189G  10% /mail
/dev/hda9             487M  8.1M  454M   2% /opt
/dev/hda10            487M   12M  450M   3% /tmp
/dev/hda5             9.7G  7.8G  1.4G  86% /usr
/dev/hda3             9.7G  867M  8.3G  10% /var
/dev/hdd1             111G   52G   53G  50% /bkup

What is the usage of /var? If the system logs fill the disk, that can halt the system.
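
A quick way to spot a nearly full filesystem without reading the whole table is to filter the df output by percentage. This is only a sketch: `full_mounts` is a hypothetical helper, the 90% threshold is an arbitrary assumption, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: list mount points at or above a usage threshold (percent).
# Reads `df -P` style output on stdin so it can also be fed canned data.
full_mounts() {
    limit=${1:-90}
    awk -v limit="$limit" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "86%" -> "86"
            if (use + 0 >= limit) print $6, $5
        }'
}

df -P  | full_mounts 90   # disk blocks
df -Pi | full_mounts 90   # inodes: a filesystem can run out of these with free space left
```

No output means nothing is above the threshold. Checking inodes as well as blocks matters because a directory full of tiny files can exhaust inodes while df -h still shows free space.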
djs120 (Author) Commented:
I'm pretty sure I'm not low on space:

[root@webserver ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  15G  2.7G   12G  20% /
/dev/hda1              99M  9.8M   84M  11% /boot
/dev/shm              189M     0  189M   0% /dev/shm

Also, I used CACTI to look at the graphs it has been generating for my FC4 box (it tracks memory usage, CPU utilization, etc.), and I saw nothing abnormal right before the crash. Memory usage was pretty much constant, and CPU usage was minimal.
Greydude Commented:
I have seen this when there are too many processes running. The system doesn't crash; it just can't spawn new processes, so nothing works. I've seen it when sendmail had too many incoming emails.
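
One way to test the too-many-processes theory is to record the process count and load average every minute, then look at the last entries after the next hang. A rough sketch, assuming a Linux /proc filesystem; the function name and the paths in the cron line are hypothetical:

```shell
#!/bin/sh
# Hypothetical snapshot script: prints one timestamped line per run.
proc_snapshot() {
    # Each running process has a numeric directory under /proc on Linux.
    nproc=$(ls -d /proc/[0-9]* 2>/dev/null | wc -l | tr -d ' ')
    # First three fields of /proc/loadavg are the 1, 5 and 15 minute load averages.
    load=$(cut -d' ' -f1-3 /proc/loadavg 2>/dev/null)
    echo "$(date '+%Y-%m-%d %H:%M:%S') procs=$nproc load=$load"
}

proc_snapshot

# Example crontab entry (paths are assumptions, adjust to taste):
#   * * * * * /usr/local/bin/proc_snapshot.sh >> /var/log/proc_snapshot.log
```

If the process count climbs steadily in the minutes before the log stops, that points toward a runaway service rather than hardware.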
paullamhkg Commented:
"Also, I used CACTI to look at the graphs that it has been generating for my FC4 box (it tracks memory usage, CPU utilization, etc) and I saw nothing abnormal right before the crash... memory usage was pretty much constant, and CPU usage was minimal."

That means your system is otherwise running well. I'm just guessing, but there may be a hardware problem that made your system halt. If possible, try swapping out your RAM sticks and testing again, but it's only a guess.

m1tk4 Commented:
If your system ever does that again, hook up a monitor to it and see what it displays on the screen, including after you press Ctrl-Alt-F1, F2, F3, or F4. If it's a kernel panic or a halt, you won't have anything in the message logs, but you _might_ have some info on the console.

I'd second paullamhkg's opinion that this is most likely a hardware issue.
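
Before the next hang, it may be worth checking two kernel settings that affect what the console can tell you. This is a read-only sketch; `show_setting` is a hypothetical helper, and whether you want either setting changed depends on your setup:

```shell
#!/bin/sh
# Read-only check of kernel settings relevant to diagnosing a hang.
show_setting() {
    # $1 is a path under /proc/sys; print "name=value" or "name=unavailable".
    if [ -r "$1" ]; then
        echo "$(basename "$1")=$(cat "$1")"
    else
        echo "$(basename "$1")=unavailable"
    fi
}

show_setting /proc/sys/kernel/sysrq   # nonzero: Magic SysRq key combinations are enabled
show_setting /proc/sys/kernel/panic   # 0: stay halted after a panic; N > 0: reboot after N seconds

# To change them until the next boot (as root), for example:
#   sysctl -w kernel.sysrq=1
#   sysctl -w kernel.panic=0
```

Leaving kernel.panic at 0 keeps a panic message on the screen so you can read it. With SysRq enabled, a box that still reacts to Alt-SysRq keys but not to anything else has a live kernel, which is a useful data point.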
rindi Commented:
Get a copy of the UBCD (Ultimate Boot CD) and use the memtest86+ on it to test your RAM. When Linux crashes like that, it is usually a hardware problem. There are also many other testing tools on that CD you can use.

http://ultimatebootcd.com
djs120 (Author) Commented:
Thanks everyone, I'll try the memtest86+ and see what it comes up with.