addady

Why did the system hang?

Hello,

I'm using Red Hat 6.0 on Sun Cobalt RaQ4i hardware.

Yesterday the system hung for no clear reason.
The only thing it still responded to was ping (no telnet, http, ftp, ssh, etc.).
The only way out was turning it off and on!

There was no special activity at the time the hang occurred
(just the daily cron job).
After it booted I looked through some of the log files for any clue
as to why it happened.

/var/log
auth       dmesg    maillog        secure       xferlog
httpd/access httpd/error   cron   kernel   messages

I couldn't find any clue in any of those log files.
They show normal events up to the time of the hang (20:40),
and then the boot and post-boot events (14:00).
Yes, it took my hosting company 18 hours
to reboot it :-(

How can I find out why it happened and prevent it?
westlin

Do you have any idea about the power situation where the box is located? How old is this hardware? Has any of it had any kind of goofy problems before this?

In cases like this, it's almost impossible to figure out what caused the crash exactly, after the fact. The only thing that might have helped is if you had seen the kernel dump on the screen.

Sun systems (even E6500s that I have worked on) running Solaris sometimes core dump and reboot on their own. Sun often blames this on improperly shielded computer cases and possibly memory failure. However, if you set it up right, Solaris can automatically reboot after a crash.

Linux has the same function, but most people don't know about it. All you need to do is:

echo "30" > /proc/sys/kernel/panic

This makes Linux reboot automatically 30 seconds after a kernel panic. The setting only lasts until the next boot, so it has to be reapplied from a startup script.
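For what it's worth, here is a small sketch of wrapping that echo in a function that validates the value and reads it back. The function name set_panic_timeout and the optional file argument are my own inventions for illustration (the second argument just makes it possible to try it on a scratch file without root); only /proc/sys/kernel/panic itself comes from the command above.

```shell
#!/bin/sh
# set_panic_timeout SECONDS [FILE]
# Hypothetical helper: write a panic timeout and print back what was stored.
# FILE defaults to the real kernel setting; writing there needs root.
set_panic_timeout() {
    secs=$1
    file=${2:-/proc/sys/kernel/panic}

    # Accept only a non-empty string of digits.
    case "$secs" in
        ''|*[!0-9]*)
            echo "usage: set_panic_timeout SECONDS [FILE]" >&2
            return 1
            ;;
    esac

    # Write the value, then read it back to confirm it took effect.
    echo "$secs" > "$file" && cat "$file"
}
```

As root, `set_panic_timeout 30` should print 30. On distributions that ship the sysctl tool, I believe the same setting is exposed as the key `kernel.panic`, but whether a stock Cobalt image has it is an assumption on my part.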

However, if the system just hangs without a panic, the best you can hope for is to run gdb on /proc/kcore and see what happens right before the hang, or even set up remote gdb. I think that used to be under kernel debugging, but it seems to have disappeared in 2.4.5 .. *scratch head*

Anyway, I hope some of this helps. Ask any more questions you might have, and good luck!
addady

ASKER

> How old is this hardware?

I'm using a dedicated server in an ISP facility, so I don't know.

>In cases like this, it's almost impossible to figure out what caused the crash exactly, after the fact.
>The only thing that might have helped is if you had seen the kernel dump on the screen.

It was a hang rather than a crash, because the machine kept answering ping the whole time!




As I said, if it happens again, the best thing you can do is run gdb on /proc/kcore and see if you can see what happens right before the hang.

If it doesn't happen again, then it was just a freakish accident: some kind of power surge or electromagnetic pulse. There is nothing you can do right now except wait to see if it happens again, and if so, do the gdb thing.

The following is pasted out of the kernel configuration help file;

#### Start paste

  If you enabled support for /proc file system then the file
  /proc/kcore will contain the kernel core image. This can be used
  in gdb:

  $ cd /usr/src/linux ; gdb vmlinux /proc/kcore

  You have two choices here: ELF and A.OUT. Selecting ELF will make
  /proc/kcore appear in ELF core format as defined by the Executable
  and Linking Format specification. Selecting A.OUT will choose the
  old "a.out" format which may be necessary for some old versions
  of binutils or on some architectures.

  This is especially useful if you have compiled the kernel with the
  "-g" option to preserve debugging information. It is mainly used
  for examining kernel data structures on the live kernel so if you
  don't understand what this means or are not a kernel hacker, just
  leave it at its default value ELF.

### End paste

Good luck!
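If gdb turns out not to be an option, a low-tech way to "see what happens right before the hang" is to have cron append a small snapshot of system state to a log every minute, so the last entries before a freeze survive the reboot. This is only a rough sketch and not something from the kernel docs; the function name log_snapshot and the paths in the note below are placeholders of mine.

```shell
#!/bin/sh
# log_snapshot FILE
# Hypothetical helper: append a timestamped snapshot of system state to FILE.
log_snapshot() {
    out=$1
    {
        echo "=== snapshot $(date '+%Y-%m-%d %H:%M:%S') ==="

        # Load average and memory summary (Linux /proc; skipped if unreadable).
        if [ -r /proc/loadavg ]; then
            cat /proc/loadavg
        fi
        if [ -r /proc/meminfo ]; then
            head -n 5 /proc/meminfo
        fi

        # Five processes with the largest %MEM (4th column of `ps aux`).
        ps aux 2>/dev/null | sort -k4,4nr | head -n 5
    } >> "$out"
}
```

A crontab line along the lines of `* * * * * . /usr/local/lib/snapshot.sh; log_snapshot /var/log/snapshot.log` would run it once a minute (both paths are made up). If the box stops writing to disk during the hang, the log will simply end at the last good minute, which is still a clue.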
addady

ASKER

Can't find gdb on my server.

Is /proc/kcore king of memory dump ?

# dir /proc/kcore -l
-r--------   1 root     root     134221824 Jun 14 00:55 /proc/kcore

Is the information in it from the last crash (panic)?


No, kcore is the kernel core: it's what is currently in memory. In other words, it's the core of your whole system, so yes, it is kinda like the King Core.. hehe.. The size you posted there tells me that you have 128MB of RAM. You cannot modify kcore or set breakpoints on it, because it's the running kernel.
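The 128MB figure comes straight from the file size in the listing. A quick sketch of the arithmetic follows; the function names kcore_mib and kcore_extra are made up for illustration, and my reading of the leftover 4096 bytes as a one-page header in front of the memory image is an assumption about kernels of this era.

```shell
#!/bin/sh
# kcore_mib BYTES: print the whole number of MiB contained in BYTES.
kcore_mib() {
    echo $(( $1 / 1048576 ))
}

# kcore_extra BYTES: print the bytes left over beyond a whole number of MiB.
kcore_extra() {
    echo $(( $1 % 1048576 ))
}

kcore_mib 134221824     # prints 128
kcore_extra 134221824   # prints 4096
```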

You should have gdb on your system. Try `locate gdb`, or if that doesn't work, `find / -iname gdb`. If that still doesn't work, you can get an RPM from Red Hat, or download the source and compile it yourself. You can get the source from:

ftp://ftp.gnu.org/pub/gnu/gdb/gdb-5.0.tar.gz

I haven't compiled gdb for a long time, but it's pretty simple;

cd /usr/src
tar xzvf /path/to/download/gdb-5.0.tar.gz
cd gdb-5.0

Check out the README or INSTALL file for specific info but pretty much all you should have to do is;

./configure
make
make install

And you should have it. Good luck.
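Before building from source, it may be worth a quicker check than locate or find: ask the shell whether gdb is on the PATH, and ask rpm whether a package by that name is installed. A minimal sketch, assuming the package is simply called gdb; the function name have_tool is hypothetical.

```shell
#!/bin/sh
# have_tool NAME
# Succeed if NAME is an executable on PATH or an installed RPM package.
have_tool() {
    # First choice: the shell can already find it.
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi

    # Second choice: the RPM database knows a package by that name.
    # rpm itself may be missing on non-Red Hat systems; treat that as "no".
    if command -v rpm >/dev/null 2>&1 && rpm -q "$1" >/dev/null 2>&1; then
        return 0
    fi

    return 1
}
```

Usage would be something like `have_tool gdb && echo "gdb is already here"`.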
addady

ASKER

>No, kcore is the kernel core, it's what is currently in memory

1) How can that be? RAM is much faster! The file can't be updated in real time.
2) The problem occurred 2 days ago. Why do you think the current
memory contents hold any clue about what happened before the last boot?

You did not read my message, or you did not understand it. I said there is no way that you can know what caused the crash before, but you can watch and see what happens IF it happens again. You can do this with GDB.
addady

ASKER

find / -iname gdb
find: /proc/6/fd: Permission denied

I guess Cobalt removed gdb.

addady

ASKER

>1) How can that be? RAM is much faster! The file can't be updated in real time.

Urr.. not exactly sure what you mean by this. However, if you are talking about the contrast between RAM and disk speed, it's because RAM has no moving parts.

However, I guess this is not what you are asking. Please be more specific.
ASKER CERTIFIED SOLUTION
comotai
addady

ASKER

>Does this make sense?

Yes


Thank you for your effort.

No problem! :) I hope things really work out for you and you don't get any more system hangs!