djs120

FC4 box stops responding every now and then - how to debug?

I have an FC4 box that just stops responding every now and then, and I have to reboot to get it back up and running.  When it stops responding, everything halts: Apache stops serving, I can't SSH in, etc.  Consider me a Linux newbie. Is there any specific logging I can set up to find out why it is crashing, or which logs should I look at to see why it crashed (the most recent crash was last night at 6 pm)?
paullamhkg

Try checking /var/log/messages or your Apache log files.
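
One way to narrow the search is to pull out only the lines logged shortly before the hang. This is a minimal sketch, assuming the standard syslog timestamp layout (month, day, HH:MM:SS) at the start of each line; the function name log_window and the time window in the usage comment are illustrative, not something from this thread.

```shell
#!/bin/sh
# Print syslog lines from FILE for one MONTH/DAY between START and END
# (HH:MM:SS, inclusive). Assumes each line starts with "Mon DD HH:MM:SS".
# Usage: log_window FILE MONTH DAY START END
log_window() {
    awk -v mon="$2" -v day="$3" -v start="$4" -v end="$5" \
        '$1 == mon && $2 + 0 == day + 0 && $3 >= start && $3 <= end' "$1"
}

# Hypothetical usage for a crash around 6 pm on Apr 22 (run as root):
#   log_window /var/log/messages Apr 22 17:30:00 18:40:00
```

The times are compared as plain strings, which works because HH:MM:SS is zero-padded. If nothing unusual shows up in the window, that in itself suggests the kernel stopped before it could log anything.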
djs120

ASKER

I checked the /var/log/messages and all it has are tons of messages like these right before the system halted:

...
Apr 22 18:25:06 webserver crond(pam_unix)[31436]: session closed for user xxxxxx
Apr 22 18:30:02 webserver crond(pam_unix)[31518]: session opened for user xxxxxx by (uid=0)
Apr 22 18:30:07 webserver crond(pam_unix)[31518]: session closed for user xxxxxx
Apr 22 18:35:01 webserver crond(pam_unix)[31599]: session opened for user xxxxxx by (uid=0)
...

and the apache logs don't show anything right before crashing.

Any other places I can check the logs?
djs120

ASKER

Not sure if this makes a difference, but the messages you see in /var/log/messages every 5 minutes come from an installation of CACTI that I have running; every 5 minutes it polls my router for traffic stats and updates the database.
Have you checked that the disk is not full? Try df -h; it will show you something like the output below:

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda11            487M  191M  271M  42% /
/dev/hda1             145M  6.0M  131M   5% /boot
none                  752M     0  752M   0% /dev/shm
/dev/hda2              84G  7.6G   72G  10% /home
/dev/md0              221G   21G  189G  10% /mail
/dev/hda9             487M  8.1M  454M   2% /opt
/dev/hda10            487M   12M  450M   3% /tmp
/dev/hda5             9.7G  7.8G  1.4G  86% /usr
/dev/hda3             9.7G  867M  8.3G  10% /var
/dev/hdd1             111G   52G   53G  50% /bkup

What is the usage of /var? If the system logs fill it up, it can halt the system.
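
The same check can be scripted so that nearly full filesystems stand out. This is a minimal sketch, assuming a POSIX df (the -P flag keeps each filesystem on one line even when the device name is long); the function name full_filesystems and the 90% threshold are illustrative choices.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print each mount point whose usage is
# at or above LIMIT percent. Mount points containing spaces are not handled.
# Usage: df -P | full_filesystems LIMIT
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the trailing percent sign
        if (use + 0 >= limit) print $6, use "%"
    }'
}

# Report anything that is 90% full or more (prints nothing if all is well).
df -P | full_filesystems 90
```

Fed the sample listing above with a limit of 80, it would report only /usr at 86%.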
djs120

ASKER

I'm pretty sure I'm not low on space:

[root@webserver ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  15G  2.7G   12G  20% /
/dev/hda1              99M  9.8M   84M  11% /boot
/dev/shm              189M     0  189M   0% /dev/shm

Also, I used CACTI to look at the graphs that it has been generating for my FC4 box (it tracks memory usage, CPU utilization, etc.) and I saw nothing abnormal right before the crash... memory usage was pretty much constant, and CPU usage was minimal.
I have seen this when there are too many processes running.  The system doesn't crash; it just can't spawn new processes, so nothing works.  I've seen it when sendmail had too many incoming emails.
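
To test the too-many-processes theory, the process count can be written to a file at regular intervals so that the trend survives a reboot. This is a minimal sketch, assuming a Linux /proc filesystem; the function name proc_snapshot and the file paths mentioned afterwards are hypothetical.

```shell
#!/bin/sh
# Print one timestamped line with the current number of processes.
# Counts the numeric directories under /proc, so it is Linux-specific.
proc_snapshot() {
    count=$(ls -d /proc/[0-9]* 2>/dev/null | wc -l | tr -d ' ')
    printf '%s procs=%s\n' "$(date '+%b %d %H:%M:%S')" "$count"
}

proc_snapshot
```

Saved as, say, /usr/local/bin/proc_snapshot.sh and run from cron every minute with its output appended to a file such as /var/log/proc_snapshot.log, this leaves a record to read after the next hang. If the box really cannot spawn processes, cron will stop too, so the thing to look for is a climbing count in the last few entries.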
"Also, I used CACTI to look at the graphs that it has been generating for my FC4 box (it tracks memory usage, CPU utilization, etc.) and I saw nothing abnormal right before the crash... memory usage was pretty much constant, and CPU usage was minimal."

That means your system is running fine. I'm just guessing, but there may be a hardware problem that is making your system halt. If possible, try swapping out your RAM sticks and test again, but it's only a guess.

If your system ever does that again, hook up a monitor to it and see what it displays on the screen or when you press Ctrl-Alt-F1, F2, F3, F4. If it's a kernel panic or a halt you won't have anything in message logs but you _might_ have some info in the console.

I'd second paullamhkg's opinion that this most likely is a hw issue.
ASKER CERTIFIED SOLUTION
rindi
djs120

ASKER

Thanks everyone, I'll try the memtest86+ and see what it comes up with.