Solved

ReiserFS causing kernel panic

Posted on 2005-03-31
Last Modified: 2008-01-09
About a week ago, my 40 GB drive "died" on me. It was running ReiserFS. The symptoms: on access to certain files, the system would dump error messages, give I/O errors, or kernel panic. I managed to get a lot of what I wanted off it the painful way: copy until it kernel panics, reboot the box, and continue, skipping the file that set it off.

Now my 120 GB drive (where I backed everything up to) is exhibiting the same symptoms. reiserfsck reports the odd error no matter how many times it's run (is ReiserFS too dumb to mark bad sectors, so that it keeps moving data or rebuilding indexes onto them?). A --rebuild-tree kernel panicked on me, but on the second run it came out fine. Following the Maxtor RMA procedure, I downloaded their diagnostics program, and it says the drive is fine.

Anyone got any idea what is going on? Is it a bad version of ReiserFS? Is Maxtor lying? The kernel panic mentions it was during a swap... how can I fsck swap?
There are 4 drives in this machine, and this is the second one to do this. (Neither of them has been the root or swap disk.)
Linux osaka 2.6.10-gentoo-r6-OSAKA #10 Fri Feb 4 22:04:26 EST 2005 i686 AMD Athlon(tm) Processor AuthenticAMD GNU/Linux


Here's the kernel panic... it's during a reiserfsck:
Mar 30 22:12:47 osaka printing eip:
Mar 30 22:12:47 osaka c0151d0e
Mar 30 22:12:47 osaka *pde = 00000000
Mar 30 22:12:47 osaka Oops: 0000 [#1]
Mar 30 22:12:47 osaka PREEMPT
Mar 30 22:12:47 osaka Modules linked in:
Mar 30 22:12:47 osaka CPU:    0
Mar 30 22:12:47 osaka EIP:    0060:[<c0151d0e>]    Not tainted VLI
Mar 30 22:12:47 osaka EFLAGS: 00010202   (2.6.10-gentoo-r6-OSAKA)
Mar 30 22:12:47 osaka EIP is at sync_buffer+0xe/0x50
Mar 30 22:12:47 osaka eax: 41129601   ebx: d7e55d1c   ecx: c0011354   edx: 00000002
Mar 30 22:12:47 osaka esi: d7e55d24   edi: c16ffdc8   ebp: 00000000   esp: d7e55ccc
Mar 30 22:12:47 osaka ds: 007b   es: 007b   ss: 0068
Mar 30 22:12:47 osaka Process reiserfsck (pid: 23072, threadinfo=d7e54000 task=dd2dc020)
Mar 30 22:12:47 osaka Stack: c16ffdc8 00000000 c04f0892 c0011354 c0151d00 d7e55d30 dd2dc020 d7e55d18
Mar 30 22:12:47 osaka c0151d00 c04f0931 00000002 c16ffdc8 c0011354 00000002 00000000 dd2dc020
Mar 30 22:12:47 osaka c0129c10 d7e55d30 d7e55d30 00000018 c0011354 00000002 00000001 dd2dc020
Mar 30 22:12:47 osaka Call Trace:
Mar 30 22:12:47 osaka [<c04f0892>] __wait_on_bit_lock+0x52/0x60
Mar 30 22:12:47 osaka [<c0151d00>] sync_buffer+0x0/0x50
Mar 30 22:12:47 osaka [<c0151d00>] sync_buffer+0x0/0x50
Mar 30 22:12:47 osaka [<c04f0931>] out_of_line_wait_on_bit_lock+0x91/0xa0
Mar 30 22:12:47 osaka [<c0129c10>] wake_bit_function+0x0/0x60
Mar 30 22:12:47 osaka [<c0129c10>] wake_bit_function+0x0/0x60
Mar 30 22:12:47 osaka [<c0151d83>] __lock_buffer+0x33/0x40
Mar 30 22:12:47 osaka [<c01538a7>] block_invalidatepage+0xb7/0xe0
Mar 30 22:12:47 osaka [<c0132ee7>] find_get_pages+0x37/0x70
Mar 30 22:12:47 osaka [<c013d0a7>] do_invalidatepage+0x27/0x30
Mar 30 22:12:47 osaka [<c013d12b>] truncate_complete_page+0x7b/0x80
Mar 30 22:12:47 osaka [<c013d310>] truncate_inode_pages+0xf0/0x2c0
Mar 30 22:12:47 osaka [<c01581bc>] kill_bdev+0x3c/0x50
Mar 30 22:12:47 osaka [<c01592ab>] blkdev_put+0x13b/0x140
Mar 30 22:12:47 osaka [<c0151ab3>] __fput+0x163/0x180
Mar 30 22:12:47 osaka [<c01500f9>] filp_close+0x59/0x90
Mar 30 22:12:47 osaka [<c011754c>] put_files_struct+0x5c/0xd0
Mar 30 22:12:47 osaka [<c01182f2>] do_exit+0x1a2/0x460
Mar 30 22:12:47 osaka [<c0118625>] do_group_exit+0x35/0xb0
Mar 30 22:12:47 osaka [<c0121521>] get_signal_to_deliver+0x1f1/0x2f0
Mar 30 22:12:47 osaka [<c0102524>] do_signal+0x94/0x120
Mar 30 22:12:47 osaka [<c0119d4f>] do_setitimer+0x1af/0x1e0
Mar 30 22:12:47 osaka [<c011e96f>] sys_alarm+0x3f/0x70
Mar 30 22:12:47 osaka [<c010dc70>] do_page_fault+0x0/0x5c7
Mar 30 22:12:47 osaka [<c01025e5>] do_notify_resume+0x35/0x38
Mar 30 22:12:47 osaka [<c0102766>] work_notifysig+0x13/0x15
Mar 30 22:12:47 osaka Code: cd b8 00 e0 ff ff 21 e0 ff 48 14 8b 40 08 a8 08 75 04 31 c0 eb dc e8 72 e4 39 00 eb f5 83 ec 08 8b 44 24 0c 8b 40 20 85 c0 74 1b <8b> 40 04 8b 80 94 00 00 00 85 c0 74 0e 8b 40 38 85 c0 74 07 8b
Mar 30 22:12:47 osaka su(pam_unix)[22092]: session closed for user root
Mar 30 22:12:48 osaka init: PANIC: segmentation violation at 0xffffe420! sleeping for 30 seconds.
Mar 30 22:12:48 osaka <1>Unable to handle kernel paging request at virtual address ffffe02c
Mar 30 22:12:48 osaka printing eip:
Mar 30 22:12:48 osaka c017d0ad
Mar 30 22:12:48 osaka *pde = 00001067
Mar 30 22:12:48 osaka *pte = 0f78f062
Mar 30 22:12:48 osaka Oops: 0000 [#2]
Mar 30 22:12:48 osaka PREEMPT
Mar 30 22:12:48 osaka Modules linked in:
Mar 30 22:12:48 osaka CPU:    0
Mar 30 22:12:48 osaka EIP:    0060:[<c017d0ad>]    Not tainted VLI
Mar 30 22:12:48 osaka EFLAGS: 00010282   (2.6.10-gentoo-r6-OSAKA)
Mar 30 22:12:48 osaka EIP is at elf_core_dump+0x29d/0xc7a
Mar 30 22:12:48 osaka eax: f14c9ba8   ebx: 00000000   ecx: dadcc420   edx: 0000007b
Mar 30 22:12:48 osaka esi: f2492000   edi: f2493fc4   ebp: f2492000   esp: f2493d6c
Mar 30 22:12:48 osaka ds: 007b   es: 007b   ss: 0068
Mar 30 22:12:48 osaka Process init (pid: 23117, threadinfo=f2492000 task=d99c4500)
Mar 30 22:12:48 osaka Stack: f14c9b60 d99c4500 0000000b 00000048 f76b8354 00000006 00000000 f7dc06b4
Mar 30 22:12:48 osaka f5c8a0d8 00000000 00000002 c02780a4 00000048 f619d400 f26f7860 f26f7d60
Mar 30 22:12:48 osaka f26f7760 f14c9b60 dadcc420 00000000 f619d400 f26f7860 f26f7760 f14c9b60
Mar 30 22:12:48 osaka Call Trace:
Mar 30 22:12:48 osaka [<c02780a4>] inotify_dentry_parent_queue_event+0x64/0xa0
Mar 30 22:12:48 osaka [<c015cc01>] do_coredump+0x1b1/0x1ce
Mar 30 22:12:48 osaka [<c011ef0f>] free_uid+0x1f/0x80
Mar 30 22:12:48 osaka [<c011f7f5>] __dequeue_signal+0xe5/0x1a0
Mar 30 22:12:48 osaka [<c011f8e5>] dequeue_signal+0x35/0x90
Mar 30 22:12:48 osaka [<c012153a>] get_signal_to_deliver+0x20a/0x2f0
Mar 30 22:12:48 osaka [<c0102524>] do_signal+0x94/0x120
Mar 30 22:12:48 osaka [<c0122578>] sys_rt_sigaction+0xa8/0xc0
Mar 30 22:12:48 osaka [<c010dc70>] do_page_fault+0x0/0x5c7
Mar 30 22:12:48 osaka [<c01025e5>] do_notify_resume+0x35/0x38
Mar 30 22:12:48 osaka [<c0102766>] work_notifysig+0x13/0x15
Mar 30 22:12:48 osaka Code: 8c 68 28 8b 57 24 89 50 2c 8b 57 28 89 50 30 8b 57 2c 89 50 34 8b 57 30 89 50 38 8b 57 34 89 50 3c 8b 57 38 89 50 40 8b 4c 24 48 <0f> b7 3d 2c e0 ff ff 8b 06 8b 40 70 8b 50 28 89 0c 24 c7 44 24
Mar 30 22:12:55 osaka <4>ReiserFS: warning: is_tree_node: node level 256 does not match to the expected one 2
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 12898932. Fsck?
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [6 37 0x0 SD]
Mar 30 22:12:55 osaka ReiserFS: warning: is_tree_node: node level 59099 does not match to the expected one 1
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 12714251. Fsck?
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [6 84 0x0 SD]
Mar 30 22:12:55 osaka ReiserFS: warning: is_tree_node: node level 256 does not match to the expected one 1
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 13081635. Fsck?
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [6 39 0x0 SD]
Mar 30 22:12:55 osaka ReiserFS: warning: is_tree_node: node level 256 does not match to the expected one 2
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 12898932. Fsck?
Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [6 28 0x0 SD]
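For anyone wading through a log like this, it can help to see whether the same blocks keep coming up. Here is a rough Python sketch that counts the block numbers in the vs-5150 warnings. The regular expression is an assumption based only on the lines quoted above, and the function name is made up for illustration; other ReiserFS versions may word the message differently.

```python
import re
from collections import Counter

# Pattern derived only from the vs-5150 lines quoted in this thread (an assumption).
VS5150 = re.compile(r"ReiserFS: (\w+): warning: vs-5150: .*? in block (\d+)")

def count_bad_blocks(lines):
    """Count vs-5150 warnings per (device, block number)."""
    hits = Counter()
    for line in lines:
        match = VS5150.search(line)
        if match:
            hits[(match.group(1), int(match.group(2)))] += 1
    return hits

# The four vs-5150 lines from the log above, copied verbatim.
SAMPLE = [
    "Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 12898932. Fsck?",
    "Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 12714251. Fsck?",
    "Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 13081635. Fsck?",
    "Mar 30 22:12:55 osaka ReiserFS: hdg1: warning: vs-5150: search_by_key: invalid format found in block 12898932. Fsck?",
]

if __name__ == "__main__":
    for (device, block), count in count_bad_blocks(SAMPLE).most_common():
        print(device, block, count)
```

A repeated block number only says that the same block was read as garbage more than once. By itself it does not tell you whether the data on disk is bad or whether it was mangled on the way to memory.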
Question by:Chireru
13 Comments
 
LVL 38

Accepted Solution

by:
wesly_chen earned 1500 total points
ID: 13674731
Hi,

   Since different hard disks show similar errors, I would suspect a hardware issue first.

1. Upgrade the motherboard BIOS.

2. Change the hard disk cable and disk power plug, or replace/swap the power supply.
   Sometimes a dying or weak power supply causes this kind of problem.

3. Upgrade the kernel. The latest kernel will have better hardware driver support and bug fixes.

Regards,

Wesly
0
 
LVL 5

Author Comment

by:Chireru
ID: 13677527
Thanks for the reply.

I also thought that it was a hardware problem...  I hate hardware problems :(

1. Possible, but a little extreme to start with.

2. The hard disk cable is a possibility, but the two hard drives were on separate cables. I don't have any spares around right now, so I'll hold off on that one. I've had bad luck with power supplies before; even buying very good ones lands me with sneaky problems like this. I've put the drive in a different machine, and so far so good (it's about 1/4 of the way through a copy, with no errors yet).

3. I am one minor revision out, kernel.org's changelog doesn't indicate any changes that would affect this, so I doubt this is the case.

If the copy is successful, this narrows it down to PSU, bus, controller, bad reiserfs driver/software, or kernel problem.
The PSU is a 420W Enlight, and this box has grown into my fileserver (4 HD's, 2 CDroms, video, 2 NICs, floppy)... now that I step back, that is quite a bit for a 420W PSU. For now I've unplugged the CD-ROMs, the floppy, and one NIC, which should ease the load a bit. We'll see how it acts after my copying is done.
0
 
LVL 38

Expert Comment

by:wesly_chen
ID: 13677572
> (4 HD's, 2 CDroms, video, 2 NICs, floppy)....
Whoa! You probably need a 500W PSU.
The CPU, video card (newer ones with a fan on the card), hard disks, and CD-ROM drives are the big power consumers.
 
LVL 5

Author Comment

by:Chireru
ID: 13678031
The copy completed successfully, and fsck shows the drive clean in the other computer, so all is well with the disk physically, and with its filesystem.

That's a bit of a relief.

However, when I plug it back into the offending system, I'm getting these errors (for starters... I haven't tried to do a lot with it yet):
Mar 31 22:42:33 osaka attempt to access beyond end of device
Mar 31 22:42:33 osaka hdh1: rw=0, want=14140282800, limit=78155279
Mar 31 22:42:33 osaka attempt to access beyond end of device
Mar 31 22:42:33 osaka hdh1: rw=0, want=14140282800, limit=78155279
Mar 31 22:42:33 osaka attempt to access beyond end of device
Mar 31 22:42:33 osaka hdh1: rw=0, want=14140282800, limit=78155279

BIOS showed the power levels decent (3.3v @ 3.5, 5v @ 5.05, 12v @ 12.18 and steady).  I'm gonna try some more things (investigate these new errors, pull ide cables from the other box, upgrade the kernel) tomorrow when I get a chance.
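To put the "attempt to access beyond end of device" numbers in perspective, here is a quick Python calculation. It assumes that `want` and `limit` are counted in 512-byte sectors, which is my understanding of this kernel message but is not stated in the log itself; the function name is just for illustration.

```python
SECTOR_BYTES = 512  # assumption: want/limit are reported in 512-byte sectors

def describe_request(want, limit):
    """Convert the want/limit values from the log into sizes and a ratio."""
    return {
        "limit_gb": limit * SECTOR_BYTES / 1e9,   # partition size in GB
        "want_tb": want * SECTOR_BYTES / 1e12,    # requested offset in TB
        "ratio": want / limit,                    # how far past the end
    }

info = describe_request(want=14140282800, limit=78155279)

if __name__ == "__main__":
    print(f"partition size:   {info['limit_gb']:.1f} GB")
    print(f"requested offset: {info['want_tb']:.2f} TB")
    print(f"request is about {info['ratio']:.0f}x past the end of the device")
```

Under that assumption the partition is about 40 GB and the request points roughly 7 TB into it, so the block number is not a near miss. It looks like garbage, which fits metadata being corrupted somewhere between the platter and memory better than it fits a slightly wrong partition table.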
0
 
LVL 38

Expert Comment

by:wesly_chen
ID: 13678466
> Mar 30 22:12:48 osaka <1>Unable to handle kernel paging request at virtual address ffffe02c
Run the memory diagnostics, too.
http://www.memtest86.com/
Or download the Ultimate Boot CD, which has a lot of diagnostics tools.
0
 
LVL 5

Author Comment

by:Chireru
ID: 13680576
I saw that earlier too... I had thought that it was the physical harddrive doing it, so I unmounted swap, and still got errors.  I will try memtest.. i have it on cd around here... somewhere...
0
 
LVL 5

Author Comment

by:Chireru
ID: 13686265
After a bit of checking, it looks like my onboard ATA100 controller is the culprit.  Makes sense... both of the problematic drives were on it.

Need to do more testing, but otherwise, I'm gonna have to head out tomorrow and buy myself an IDE controller.
0
 
LVL 38

Expert Comment

by:wesly_chen
ID: 13689930
> it looks like my onboard ATA100 controller is the culprit.
Have you upgraded the motherboard BIOS?
In some cases it's a motherboard bug, which can be fixed by upgrading the BIOS.
0
 
LVL 5

Author Comment

by:Chireru
ID: 13690147
After confirming that the hard drive works fine in another computer, and on another controller, I put it back on its original controller with its original cable, and it's no longer giving me any problems.

Any suggestions on how I can test the bus & controller for errors without just moving stuff on & off of my harddrive waiting for it to corrupt the filesystem (and possibly lose more data)?
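One read-only way to approach this is to read the same large file several times and compare checksums: if the data on disk is fine but the path through the controller is flaky, the digests may differ between passes. Below is a minimal Python sketch of that idea. The function names and the command-line usage are illustrative assumptions, not an established tool, and a clean result does not prove the controller is healthy.

```python
import hashlib
import sys

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file by reading it in 1 MiB chunks; the file is never written to."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def repeated_read_check(path, passes=3):
    """Read the file `passes` times and return the set of distinct digests seen."""
    return {sha256_of(path) for _ in range(passes)}

if __name__ == "__main__" and len(sys.argv) > 1:
    digests = repeated_read_check(sys.argv[1])
    if len(digests) == 1:
        print("all passes matched:", digests.pop())
    else:
        print("MISMATCH between passes:", sorted(digests))
```

One caveat: after the first pass, the kernel may serve the file from the page cache instead of going back to the disk, which would hide transfer errors. Using a file larger than RAM, or remounting the filesystem between passes, makes it more likely that each pass really crosses the controller.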
0
 
LVL 38

Expert Comment

by:wesly_chen
ID: 13690756
> I put it back on it's original controller with it's original cable, it's no longer giving me any problems.
Did you plug all the devices back into the power supply?

In my experience, the BIOS and the power supply are the most likely causes.
0
 
LVL 5

Author Comment

by:Chireru
ID: 13690844
OK, after doing some testing, it looks like it's the controller. (Each test was an attempt to copy data off the drive; "failed" means there were I/O errors.)

Test1: Original configuration = failed
Test2: Removed as much power draw as I could = failed
Test3: Moved hard drive to another controller and IDE cable = passed
Test4: Used new IDE cable as in test3 on original controller = failed
Test5: Moved hard drive back to controller on test3, and added as much power drain as I could find, including all hard drives on a single cable = passed

So, at this point it could be the controller, or maybe the driver for the controller... either way, I'm happy to go out and buy a new controller to solve this and not lose any more data.
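The elimination logic in those five tests can be written down as a small Python sketch. The labels ("onboard", "other", "unstated", and so on) are my own encoding of the test descriptions, not anything stated in the thread, and values the tests do not mention are marked "unstated" rather than guessed.

```python
# Each test encoded by the factors that were varied; labels are illustrative.
TESTS = [
    {"name": "Test1", "controller": "onboard", "cable": "original", "load": "full",     "passed": False},
    {"name": "Test2", "controller": "onboard", "cable": "original", "load": "reduced",  "passed": False},
    {"name": "Test3", "controller": "other",   "cable": "new",      "load": "unstated", "passed": True},
    {"name": "Test4", "controller": "onboard", "cable": "new",      "load": "unstated", "passed": False},
    {"name": "Test5", "controller": "other",   "cable": "unstated", "load": "max",      "passed": True},
]

def factors_that_predict_outcome(tests, factors):
    """Return the factors whose every value always led to the same result."""
    predictive = []
    for factor in factors:
        outcomes = {}
        for test in tests:
            outcomes.setdefault(test[factor], set()).add(test["passed"])
        # The factor must take more than one value, and no value may be
        # associated with both a pass and a failure.
        if len(outcomes) > 1 and all(len(seen) == 1 for seen in outcomes.values()):
            predictive.append(factor)
    return predictive

if __name__ == "__main__":
    print(factors_that_predict_outcome(TESTS, ["controller", "cable", "load"]))
```

With this encoding only the controller lines up with the results: every run on the original controller failed and every run on the other one passed, while the new cable shows up in both a pass and a failure. As noted above, this cannot separate the controller hardware from its driver.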

Thanks for your help Wesly, I'll keep you updated as to whether the new controller fixes it or not.

0
 
LVL 38

Expert Comment

by:wesly_chen
ID: 13690872
Good luck, I'll keep my fingers crossed for you.
0
 
LVL 5

Author Comment

by:Chireru
ID: 13738223
Looks like that solved it.
Thanks