Solved

Why did the system hang?

Posted on 2001-06-13
Last Modified: 2013-12-06
Hello,

I'm running Red Hat 6.0 on Sun Cobalt RaQ4i hardware.

Yesterday the system hung for no clear reason.
It still answered ping, but nothing else (no telnet, HTTP, FTP, SSH, etc.).
The only way out was turning it off and on!

There was no special activity at the time the hang occurred
(just the daily cron job).
After it booted I looked through some of the log files for any clue
as to why it happened.

/var/log
auth       dmesg    maillog        secure       xferlog
httpd/access httpd/error   cron   kernel   messages

I couldn't find any clue in any of those log files.
They show normal events up to the time of the hang (20:40),
and then the boot and the post-boot events (14:00).
Yes, it took my hosting company 18 hours
to reboot it :-(

How can I find out why it happened and prevent it from happening again?
Question by:addady
LVL 1

Expert Comment

by:westlin
Do you have any idea about the power situation where the box is located? How old is this hardware? Has any of it had any kind of goofy problems before this?

 
LVL 3

Expert Comment

by:comotai
In cases like this, it's almost impossible to figure out what caused the crash exactly, after the fact. The only thing that might have helped is if you had seen the kernel dump on the screen.

Sun systems running Solaris (even E6500s that I have worked on before) sometimes core dump and reboot on their own. Sun often blames this on improperly shielded computer cases and possibly memory failure. However, if you get it set up right, Solaris can automatically reboot after a crash.

Linux has the same function, but most people don't know about it. All you need to do is:

echo "30" > /proc/sys/kernel/panic

This makes Linux reboot automatically 30 seconds after a kernel panic.
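
A value written to /proc/sys/kernel/panic with echo lasts only until the next boot. As a minimal sketch, assuming the system reads a sysctl-style configuration file at boot (commonly /etc/sysctl.conf, which may not exist on an older Cobalt install), the setting could be made persistent like this. The helper name `persist_panic_timeout` is hypothetical and only for illustration.

```shell
# Hypothetical helper: add "kernel.panic = <seconds>" to a sysctl-style
# config file, unless a kernel.panic line is already present in it.
persist_panic_timeout() {
    seconds=$1
    conf=$2
    if grep -q '^kernel\.panic[ =]' "$conf" 2>/dev/null; then
        echo "kernel.panic already set in $conf"
    else
        echo "kernel.panic = $seconds" >> "$conf"
    fi
}

# Example use (as root; the path is an assumption about your system):
#   persist_panic_timeout 30 /etc/sysctl.conf
# To check the value that is active right now:
#   cat /proc/sys/kernel/panic
```

If no such file is read at boot on your system, putting the original echo line in a startup script such as rc.local should have the same effect.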

However, if the system just hangs without a panic, the best you can hope for is to run gdb on /proc/kcore and see what happens right before the hang, or even set up remote gdb. I think that used to be under kernel debugging, but it seems to have disappeared in 2.4.5 .. *scratch head*

Anyway, I hope some of this helps. Ask any more questions you might have, and good luck!
 

Author Comment

by:addady
> How old is this hardware?

I'm using a dedicated server in an ISP facility, so I don't know.

>In cases like this, it's almost impossible to figure out what caused the crash exactly, after the fact.
>The only thing that might have helped is if you had seen the kernel dump on the screen.

It is a kind of hang, not a crash, because the machine kept answering ping the whole time!

 
LVL 3

Expert Comment

by:comotai
As I said, if it happens again, the best thing you can do is run gdb on /proc/kcore and see if you can see what happens right before the hang.

If it doesn't happen again, then it was just a freak accident, some kind of power surge or electromagnetic pulse. There is nothing you can do right now except wait to see if it happens again, and if so, do the gdb thing.

The following is pasted out of the kernel configuration help file:

#### Start paste

  If you enabled support for /proc file system then the file
  /proc/kcore will contain the kernel core image. This can be used
  in gdb:

  $ cd /usr/src/linux ; gdb vmlinux /proc/kcore

  You have two choices here: ELF and A.OUT. Selecting ELF will make
  /proc/kcore appear in ELF core format as defined by the Executable
  and Linking Format specification. Selecting A.OUT will choose the
  old "a.out" format which may be necessary for some old versions
  of binutils or on some architectures.

  This is especially useful if you have compiled the kernel with the
  "-g" option to preserve debugging information. It is mainly used
  for examining kernel data structures on the live kernel so if you
  don't understand what this means or are not a kernel hacker, just
  leave it at its default value ELF.

### End paste

Good luck!
 

Author Comment

by:addady
Can't find gdb on my server.

Is /proc/kcore king of memory dump ?

# dir /proc/kcore -l
-r--------   1 root     root     134221824 Jun 14 00:55 /proc/kcore

Is the information in it from the last crash (panic)?

 
LVL 3

Expert Comment

by:comotai
No, kcore is the kernel core; it's what is currently in memory. In other words it's the core of your whole system, so yes, it is kinda like the King Core.. hehe.. What you posted there tells me that you have 128MB of RAM. You cannot modify kcore or set breakpoints or anything like that on it, because it's the running system.
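
The 128MB figure can be checked from the size in the listing with plain shell arithmetic. This is only a worked calculation on the number posted above; the variable names are made up for the example.

```shell
# Size of /proc/kcore in bytes, taken from the listing earlier in the thread.
kcore_bytes=134221824

# Whole mebibytes (integer division).
kcore_mib=$((kcore_bytes / 1024 / 1024))

# Bytes left over beyond that many whole mebibytes.
extra_bytes=$((kcore_bytes - kcore_mib * 1024 * 1024))

echo "$kcore_mib MiB + $extra_bytes bytes"
```

The file is 128 MiB plus 4096 bytes. The small remainder is probably header space the kernel puts in front of the memory image, but that reading is an assumption, not something stated in the thread.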

You should have gdb on your system. Try `locate gdb`, or if that doesn't work, `find / -iname gdb`. If that still doesn't work, you can get an RPM from Red Hat, or download the source and compile it yourself. You can get the source from:

ftp://ftp.gnu.org/pub/gnu/gdb/gdb-5.0.tar.gz

I haven't compiled gdb for a long time, but it's pretty simple:

cd /usr/src
tar xzvf /path/to/download/gdb-5.0.tar.gz
cd gdb-5.0

Check out the README or INSTALL file for specific info, but pretty much all you should have to do is:

./configure
make
make install

And you should have it. Good luck.
 

Author Comment

by:addady
>No, kcore is the kernel core, it's what is currently in memory

1) How can that be? RAM is much faster! The file can't be updated in real time.
2) The problem occurred 2 days ago. Why do you think the current
memory contents hold any clue about what happened before the last boot?

 
LVL 3

Expert Comment

by:comotai
You did not read my message, or you did not understand it. I said there is no way that you can know what caused the crash before, but you can watch and see what happens IF it happens again. You can do this with GDB.
 

Author Comment

by:addady
find / -iname gdb
find: /proc/6/fd: Permission denied

I guess Cobalt removed gdb
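
The single "Permission denied" line is only find failing to enter one directory under /proc; the lack of any other output is what says no file named gdb exists. As a quicker check, here is a small sketch that asks the shell whether a command is on the PATH. The helper name `have_tool` is hypothetical.

```shell
# Hypothetical helper: succeed if the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool gdb; then
    echo "gdb found at: $(command -v gdb)"
else
    echo "gdb is not on the PATH"
fi

# To repeat the full-disk search without the permission noise:
#   find / -iname gdb 2>/dev/null
```

A PATH lookup only covers installed commands, so the find search is still the more thorough of the two.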

 

Author Comment

by:addady
1) How can that be? RAM is much faster! The file can't be updated in real time.
 
LVL 3

Expert Comment

by:comotai
Urr.. not exactly sure what you mean by this. However, if you are talking about a contract between RAM and DISK speed, it's because RAM has no moving parts.

However, I guess this is not what you are asking. Please be more specific.
 
LVL 3

Accepted Solution

by:
comotai earned 100 total points
contract = contrast

Oh wait.. I think I understand your question now.

/proc is not a REAL file system on the disk. It is a special file system which resides in memory, and its files point to certain things in the kernel. /proc/kcore is only a virtual file that points to your memory.

Does this make sense?

So you are actually looking at what is in memory in real time.
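
One way to see this for yourself is to compare the size a directory listing reports for a file with the number of bytes you actually get when reading it. The sketch below assumes a Linux system with /proc mounted and uses /proc/version as a small, harmless example; the helper name `listed_vs_read` is made up for the illustration.

```shell
# Hypothetical helper: print the size shown by "ls -l" for a file,
# followed by the number of bytes actually returned when reading it.
listed_vs_read() {
    listed=$(ls -l "$1" | awk '{print $5}')
    read_bytes=$(cat "$1" | wc -c | tr -d ' ')
    echo "$1: listed=$listed read=$read_bytes"
}

# /proc/version is a small, world-readable /proc file on Linux.
if [ -r /proc/version ]; then
    listed_vs_read /proc/version
fi
```

For an ordinary file on disk the two numbers match. For most /proc files the listed size is usually 0 even though reading returns data, because the kernel generates the content at the moment you read it. /proc/kcore is a special case that does report a size, as seen earlier in the thread.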
 

Author Comment

by:addady
>Does this make sense?

Yes


Thank you for your effort
 
LVL 3

Expert Comment

by:comotai
No problem! :) I hope things really work out for you and you don't get any more system hangs!
