Need someone to "decode" a kernel crash.

Hi,

Every few days the server crashes. Until now I could not find anything in the log files, and the daily server load is very low. Only today, after the most recent crash, was I able to find some information in /var/log/messages (see the attached code snippet). I would like the Experts Exchange experts to look into it and see if there is any way to determine the culprit, or at least where to start looking.

Thank you in advance.

P.S. Sorry for my English.


Apr 12 15:39:53 ger1 kernel: list_del corruption. next->prev should be c23ebfb8, but was 00200200
Apr 12 15:39:53 ger1 kernel: ------------[ cut here ]------------
Apr 12 15:39:53 ger1 kernel: kernel BUG at lib/list_debug.c:70!
Apr 12 15:39:53 ger1 kernel: invalid opcode: 0000 [#1]
Apr 12 15:39:53 ger1 kernel: SMP
Apr 12 15:39:53 ger1 kernel: last sysfs file: /block/ram0/range
Apr 12 15:39:53 ger1 kernel: Modules linked in: ipt_owner ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables ipv6 xfrm_na$
Apr 12 15:39:53 ger1 kernel: CPU:    0
Apr 12 15:39:53 ger1 kernel: EIP:    0060:[<c04ea6e4>]    Not tainted VLI
Apr 12 15:39:53 ger1 kernel: EFLAGS: 00010046   (2.6.18-128.7.1.el5PAE #1)
Apr 12 15:39:53 ger1 kernel: EIP is at list_del+0x38/0x5c
Apr 12 15:39:53 ger1 kernel: eax: 00000048   ebx: c23ebfb8   ecx: 00000092   edx: 00000000
Apr 12 15:39:53 ger1 kernel: esi: 00000256   edi: c0684780   ebp: c23ebfa0   esp: daa84e44
Apr 12 15:39:53 ger1 kernel: ds: 007b   es: 007b   ss: 0068
Apr 12 15:39:53 ger1 kernel: Process php5 (pid: 31552, ti=daa84000 task=c3531550 task.ti=daa84000)
Apr 12 15:39:53 ger1 kernel: Stack: c063e008 c23ebfb8 00200200 c0684800 c0458d5f fffb4000 00000003 00000000
Apr 12 15:39:53 ger1 kernel:        000280d2 c0685a28 00000000 00000001 00000000 00000001 00000000 c0685a28
Apr 12 15:39:53 ger1 kernel:        000280d2 c0685a28 c3531550 c0458fa7 00000044 00000000 000280d2 00000010
Apr 12 15:39:53 ger1 kernel: Call Trace:
Apr 12 15:39:53 ger1 kernel:  [<c0458d5f>] get_page_from_freelist+0x142/0x333
Apr 12 15:39:53 ger1 kernel:  [<c0458fa7>] __alloc_pages+0x57/0x297
Apr 12 15:39:53 ger1 kernel:  [<c046643c>] anon_vma_prepare+0x11/0xa5
Apr 12 15:39:53 ger1 kernel:  [<c046111d>] __handle_mm_fault+0x4f6/0xb7b
Apr 12 15:39:53 ger1 kernel:  [<c061083b>] do_page_fault+0x2d2/0x600
Apr 12 15:39:53 ger1 kernel:  [<c0610569>] do_page_fault+0x0/0x600
Apr 12 15:39:53 ger1 kernel:  [<c0405a89>] error_code+0x39/0x40
Apr 12 15:39:53 ger1 kernel:  =======================
Apr 12 15:39:53 ger1 kernel: Code: 53 68 ba df 63 c0 e8 c1 a7 f3 ff 0f 0b 41 00 f7 df 63 c0 83 c4 0c 8b 03 8b 40 04 39 d8 74 $
Apr 12 15:39:53 ger1 kernel: EIP: [<c04ea6e4>] list_del+0x38/0x5c SS:ESP 0068:daa84e44
Apr 12 15:39:53 ger1 kernel:  <0>Kernel panic - not syncing: Fatal exception
Apr 12 15:39:53 ger1 kernel:  BUG: warning at arch/i386/kernel/smp.c:550/smp_call_function() (Not tainted)
Apr 12 15:39:53 ger1 kernel:  [<c0415ae0>] stop_this_cpu+0x0/0x33
Apr 12 15:39:53 ger1 kernel:  [<c04158cf>] smp_call_function+0x57/0xc3
Apr 12 15:39:53 ger1 kernel:  [<c0424e9d>] printk+0x18/0x8e
Apr 12 15:39:53 ger1 kernel:  [<c041594e>] smp_send_stop+0x13/0x1c
Apr 12 15:39:53 ger1 kernel:  [<c0424437>] panic+0x4c/0x16d
Apr 12 15:39:53 ger1 kernel:  [<c04064eb>] die+0x25d/0x291
Apr 12 15:39:53 ger1 kernel:  [<c0406b85>] do_invalid_op+0x0/0x9d
Apr 12 15:39:53 ger1 kernel:  [<c0406c16>] do_invalid_op+0x91/0x9d
Apr 12 15:39:53 ger1 kernel:  [<c04ea6e4>] list_del+0x38/0x5c
Apr 12 15:39:53 ger1 kernel:  [<c04248b2>] release_console_sem+0x1b0/0x1b8
Apr 12 15:39:53 ger1 kernel:  [<c045a51b>] blockable_page_cache_readahead+0x46/0x99
Apr 12 15:39:53 ger1 kernel:  [<c0405a89>] error_code+0x39/0x40
Apr 12 15:39:53 ger1 kernel:  [<c04ea6e4>] list_del+0x38/0x5c
Apr 12 15:39:53 ger1 kernel:  [<c0458d5f>] get_page_from_freelist+0x142/0x333
Apr 12 15:39:53 ger1 kernel:  [<c0458fa7>] __alloc_pages+0x57/0x297
Apr 12 15:39:53 ger1 kernel:  [<c046643c>] anon_vma_prepare+0x11/0xa5
Apr 12 15:39:53 ger1 kernel:  [<c046111d>] __handle_mm_fault+0x4f6/0xb7b
Apr 12 15:39:53 ger1 kernel:  [<c061083b>] do_page_fault+0x2d2/0x600
Apr 12 15:39:53 ger1 kernel:  [<c0610569>] do_page_fault+0x0/0x600
Apr 12 15:39:53 ger1 kernel:  [<c0405a89>] error_code+0x39/0x40
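
As a starting point for reading a trace like this one: the value 00200200 in the first line is LIST_POISON2, which the kernel's list code writes into an entry's prev pointer once that entry has been removed from a list. So the page allocator (get_page_from_freelist) found a free-list entry whose neighbour looked already deleted. That pattern can come from a kernel bug or from memory being corrupted some other way, and the trace alone does not say which. The php5 process is simply the one that happened to fault a page in at that moment. Below is a minimal Python sketch for pulling the same key facts out of /var/log/messages; the helper name and the summary fields are assumptions made for illustration.

```python
import re

# Values the kernel writes into list pointers after list_del()
# (LIST_POISON1 / LIST_POISON2 on 32-bit kernels of this era).
LIST_POISON = {0x00100100: "LIST_POISON1", 0x00200200: "LIST_POISON2"}


def summarize_oops(lines):
    """Collect a few key facts from kernel oops lines (hypothetical helper)."""
    summary = {"bug_at": None, "process": None, "poison": None, "trace": []}
    in_trace = False
    for line in lines:
        m = re.search(r"kernel BUG at (\S+?)!", line)
        if m:
            summary["bug_at"] = m.group(1)
        m = re.search(r"Process (\S+) \(pid: (\d+)", line)
        if m:
            summary["process"] = (m.group(1), int(m.group(2)))
        m = re.search(r"but was ([0-9a-f]{8})", line)
        if m:
            # None if the bad pointer is not a known poison value
            summary["poison"] = LIST_POISON.get(int(m.group(1), 16))
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            # Only the first trace (the one headed by "Call Trace:") is kept
            m = re.search(r"\[<[0-9a-f]+>\] (\w+)\+0x", line)
            if m:
                summary["trace"].append(m.group(1))
            else:
                in_trace = False
    return summary
```

Fed the lines above, the first function in the trace is the one that called list_del, which is where a closer look would start.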


zlygis Asked:

bman21 Commented:
Try updating your system.

For Red Hat or Fedora Linux, run "yum update". This will update any installed packages on your system.

If that doesn't work, try upgrading your kernel. Below is a link to a FAQ that gives detailed instructions on how to do that.

http://fedoraproject.org/wiki/YumUpgradeFaq

JordanH155 Commented:
If that does not work, try testing the RAM.  Bad RAM can cause random kernel panics.
magicdigits Commented:
If you have no luck with yum update, try running "yum clean all" first, then yum update.
zlygis (Author) Commented:
Well, "yum update" showed that no packages need to be updated. As for the kernel update, when I run "yum update kernel" I get this strange error: "Package(s) kernel available, but not installed."
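
That message is consistent with the running kernel coming from a differently named package. The release string in the trace, 2.6.18-128.7.1.el5PAE, points to the PAE flavour, which on CentOS 5 is normally packaged as kernel-PAE, so "yum update kernel-PAE" would be the command to try. A minimal sketch of that mapping in Python; the helper name and the flavour list are assumptions, not something taken from yum itself.

```python
# Assumed EL5 kernel flavour suffixes; each is packaged as kernel-<flavour>.
KNOWN_FLAVOURS = ("PAE", "xen", "debug")


def kernel_package_name(release):
    """Guess the RPM package name from a `uname -r` style release string."""
    for flavour in KNOWN_FLAVOURS:
        if release.endswith(flavour):
            return "kernel-" + flavour
    # No flavour suffix: the plain kernel package
    return "kernel"
```

Passing it the output of "uname -r" gives the name to hand to yum or to "rpm -q".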

In /boot/grub/menu.lst I can see the new kernel, but when I try to switch to it, the server won't boot.

The /boot/grub/menu.lst file contents:

timeout 5
default 1

title CentOS (2.6.18-164.15.1.el5PAE)
root (hd0,1)
kernel /vmlinuz-2.6.18-164.15.1.el5PAE ro root=/dev/sda3 vga=0x317
initrd /initrd-2.6.18-164.15.1.el5PAE.img

title CentOS Linux (2.6.18-128.7.1.el5PAE)
root (hd0,1)
kernel /boot/vmlinuz-2.6.18-128.7.1.el5PAE ro root=/dev/sda3 vga=0x317
initrd /boot/initrd-2.6.18-128.7.1.el5PAE.img
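
One thing worth checking in a file like this: GRUB legacy counts entries from 0, so "default 1" selects the second title block, here the old 2.6.18-128.7.1 kernel, and booting the new one by default would need "default 0". A minimal Python sketch that reports which entry a menu.lst would boot; the function name is hypothetical and it assumes a numeric default (not "default saved").

```python
import re


def grub_default_entry(menu_lst_text):
    """Return (index, title) of the entry GRUB legacy boots by default."""
    default = 0  # GRUB legacy uses entry 0 when no `default` line is present
    titles = []
    for raw in menu_lst_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Accept both "default 1" and "default=1"
        parts = re.split(r"[=\s]+", line, maxsplit=1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "default":
            default = int(value)  # assumption: numeric, zero-based index
        elif key == "title":
            titles.append(value)
    return default, titles[default]
```

The two entries above also differ in whether the kernel and initrd paths carry a /boot prefix. Whether both forms resolve depends on the partition layout behind (hd0,1), which is not shown here, so that is a second thing to compare against the entry that is known to boot.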

zlygis (Author) Commented:
OK. I've managed to update the kernel. Now I will wait and see whether the old kernel was the problem.
zlygis (Author) Commented:
OK, this is not the kernel's fault. What's the best way to check the RAM? Please note that I have only remote access to the server.
bman21 Commented:
memtest86+ is probably your best bet. You will have to reboot your system to run it, though. Expect extended downtime too, as you will boot into memtest instead of your OS:

http://kbase.redhat.com/faq/docs/DOC-16424

bman21 Commented:
Sorry, I didn't finish my post. The link I added shows how to install memtest as a boot option if you don't have the Linux rescue disk or if it's not already installed. I'll paste the link again in this post for future reference.

http://kbase.redhat.com/faq/docs/DOC-16424
JordanH155 Commented:
The best way to test the RAM is to boot from a bootable utility such as the memtest86 noted above. You could also run the manufacturer's memory diagnostic for whatever model of server you have. For example, Dell has "mpmemory" and "Dell Diagnostics" for its servers. Anything that runs inside the OS will not be as effective, as it cannot check the portion of RAM being used by the OS.

If you do not have physical access, you can get whoever is hosting the server to do it.  Most data centers are familiar with this, and do it often (I work in a data center).