RH EL4 RAID BOOT ISSUE

System has two drives (250GB) set as RAID-0 mirror.

Partitions are formed as:
/dev/hda1 Boot Linux
/dev/hda2 Linux swap
/dev/hda3 Linux raid autodetect

/dev/hdb1 Linux swap
/dev/hdb2 Linux raid autodetect

This is the console screen at boot time:
Decompressing Linux...done.
Booting the kernel.
Red Hat nash version 4.2.1.6 starting
EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
mount: error 2 mounting none
EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
WARNING: can't access (null)
exec of init ((null)) failed!!!: 14
EXT3-fs error (device md0): ext3_find_entry: reading directory #2 offset 0
unmount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

At this point the cursor just blinks indefinitely, and the Caps Lock and Scroll Lock lights on the keyboard flash.

I need serious help... Thanks in advance...
-greg
Asked by: Gregory Miller (General Manager)

Arty (system administrator) commented:
Was everything working until recently, or are you in the middle of installing Linux?
It would help if you copy-pasted the entire boot screen here.
That is easy to do over a serial console connection to your server. Just connect a terminal cable to COM1 and add the kernel flags 'console=tty0 console=ttyS0,38400n8' in GRUB (edit the kernel line before booting). A full screen dump may be more informative.
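To make those console flags concrete, here is a small Python sketch that pulls the console= entries out of a kernel command line and decodes the serial options. It assumes the option layout from the kernel's serial-console documentation (speed, parity n/o/e, data bits, optional 'r' for RTS flow control, defaulting to 9600n8); the function and class names and the sample command line are hypothetical. Kernel messages go to every listed console, and the last console= entry becomes /dev/console, which is why ttyS0 is listed second.

```python
import re
from typing import List, NamedTuple, Union


class SerialConsole(NamedTuple):
    device: str
    baud: int
    parity: str
    bits: int
    rts_flow: bool


def parse_console_flags(cmdline: str) -> List[Union[str, SerialConsole]]:
    """Return the console= entries of a kernel command line, in order.

    Serial entries (ttyS*) are decoded into a SerialConsole; anything else
    (e.g. tty0, the video console) is returned as a plain device name.
    """
    consoles: List[Union[str, SerialConsole]] = []
    for token in cmdline.split():
        if not token.startswith("console="):
            continue
        device, _, options = token[len("console="):].partition(",")
        if not device.startswith("ttyS"):
            consoles.append(device)
            continue
        # Assumed layout: <speed><parity><bits><flow>, every part optional.
        m = re.fullmatch(r"(\d*)([noe]?)(\d?)(r?)", options)
        if m is None:
            raise ValueError(f"unrecognised serial options: {options!r}")
        speed, parity, bits, flow = m.groups()
        consoles.append(SerialConsole(
            device=device,
            baud=int(speed or 9600),  # kernel default is 9600n8
            parity=parity or "n",
            bits=int(bits or 8),
            rts_flow=flow == "r",
        ))
    return consoles


if __name__ == "__main__":
    # Hypothetical sample command line for illustration.
    line = "ro root=/dev/md0 console=tty0 console=ttyS0,38400n8"
    for entry in parse_console_flags(line):
        print(entry)
```

The terminal program on the other end of the cable has to use the same settings (38400 baud, no parity, 8 data bits) to read the dump.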
As for your problem: RAID-0 is NOT a mirror. It is a striped device that can't be restored once it is corrupted. Since you see 'md0', I guess it's a software RAID, because 'md' is the Linux software RAID driver.
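The difference between RAID-0 and a mirror can be illustrated with a toy model in Python. This is a simplified sketch, not how the md driver lays out data on disk: it deals fixed-size chunks round-robin across two in-memory "disks" (striping, RAID-0) and, separately, copies the whole buffer to each disk (mirroring, RAID-1). The 64-byte chunk size and the function names are arbitrary choices for illustration.

```python
from typing import List


def stripe(data: bytes, disks: int, chunk: int) -> List[bytes]:
    """RAID-0 style: deal fixed-size chunks round-robin across members."""
    members = [bytearray() for _ in range(disks)]
    for n, start in enumerate(range(0, len(data), chunk)):
        members[n % disks] += data[start:start + chunk]
    return [bytes(m) for m in members]


def unstripe(members: List[bytes], chunk: int) -> bytes:
    """Reassemble a striped buffer; every member must be present."""
    out = bytearray()
    offsets = [0] * len(members)
    n = 0
    while True:
        d = n % len(members)
        piece = members[d][offsets[d]:offsets[d] + chunk]
        if not piece:  # ran out of chunks: reassembly is complete
            break
        out += piece
        offsets[d] += chunk
        n += 1
    return bytes(out)


def mirror(data: bytes, disks: int) -> List[bytes]:
    """RAID-1 style: every member holds a full copy."""
    return [bytes(data) for _ in range(disks)]


if __name__ == "__main__":
    data = bytes(range(256)) * 4  # 1 KiB of sample data
    raid0 = stripe(data, disks=2, chunk=64)
    raid1 = mirror(data, disks=2)
    assert unstripe(raid0, chunk=64) == data
    # Pretend the second disk died: what is left on the first one?
    print(len(raid0[0]), "of", len(data), "bytes survive on the RAID-0 member")
    print("RAID-1 survivor is complete:", raid1[0] == data)
```

With RAID-0 the surviving member holds only every other chunk, so any contiguous file larger than one chunk is missing pieces; with RAID-1 either member alone is enough.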
If the problem 'just happened' and everything was OK before, I can guess at several possible causes: you have dead mount labels on your devices (EL4 uses labels instead of device names to find the right partitions), you have corrupted 'md' superblocks at the end of each RAID partition, you have a corrupted filesystem on a working RAID-0, or you have changed the partition type of /dev/hda3 or /dev/hdb2.
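To see why damage near the end of a partition matters, here is a sketch of where the md driver keeps its superblock. It assumes the version-0.90 superblock format (the mdadm default in the EL4 era), which lives in the last 64 KiB-aligned 64 KiB block of each member. The constants mirror the kernel's md_p.h definitions; the function names are hypothetical, and in practice `mdadm --examine /dev/hda3` is the tool that actually reads and reports the superblock.

```python
import struct

MD_RESERVED_BYTES = 64 * 1024  # 0.90 reserves a 64 KiB block at the end
MD_SB_MAGIC = 0xA92B4EFC       # magic number stored in a 0.90 superblock


def md090_superblock_offset(member_size: int) -> int:
    """Byte offset of a v0.90 md superblock on a member of member_size bytes."""
    if member_size < MD_RESERVED_BYTES:
        raise ValueError("device too small to hold a 0.90 superblock")
    # Round the size down to a 64 KiB boundary, then step back one block.
    return (member_size & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES


def has_md090_magic(path: str, member_size: int) -> bool:
    """Check for the 0.90 magic number at the expected offset.

    Assumes a little-endian (x86) host, since 0.90 superblocks are
    written in native byte order.
    """
    with open(path, "rb") as dev:
        dev.seek(md090_superblock_offset(member_size))
        raw = dev.read(4)
    return len(raw) == 4 and struct.unpack("<I", raw)[0] == MD_SB_MAGIC


if __name__ == "__main__":
    size = 250 * 10**9  # a nominal 250 GB partition
    print(md090_superblock_offset(size))
```

The member size can be read with `blockdev --getsize64 /dev/hda3`. Because RAID-0 cannot run with a member missing, a damaged superblock on either partition is enough to keep md0 from assembling.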

You can read about more kernel flags in the 'md' manual page here: http://www.squarebox.co.uk/cgi-squarebox/manServer/md.4

Then you can try using kernel flags to define the RAID manually: 'ro raid=noautodetect md=0,/dev/hda3,/dev/hdb2'
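Those flags can be read as follows: 'raid=noautodetect' turns off in-kernel autodetection, and 'md=0,/dev/hda3,/dev/hdb2' asks the kernel to build md0 from the two listed partitions using their persistent superblocks. The Python sketch below (hypothetical function name) parses a command line the same way, assuming the two forms documented for the md driver: 'md=<n>,<dev>,<dev>,...' for superblock arrays, and the legacy 'md=<n>,<level>,<chunk>,<fault>,<dev>,...' form, which it deliberately rejects.

```python
from typing import Dict, List, Tuple


def md_boot_config(cmdline: str) -> Tuple[bool, Dict[int, List[str]]]:
    """Return (autodetect_enabled, {md minor: [member devices]})."""
    autodetect = True
    arrays: Dict[int, List[str]] = {}
    for token in cmdline.split():
        if token == "raid=noautodetect":
            autodetect = False
        elif token.startswith("md="):
            fields = token[len("md="):].split(",")
            if len(fields) < 2 or not fields[0].isdigit():
                raise ValueError(f"malformed md= flag: {token!r}")
            if fields[1].lstrip("-").isdigit():
                # A numeric second field means the legacy
                # <level>,<chunk>,<fault>,<dev>... form; not handled here.
                raise ValueError(f"legacy md= form not supported: {token!r}")
            arrays[int(fields[0])] = fields[1:]
    return autodetect, arrays


if __name__ == "__main__":
    flags = "ro raid=noautodetect md=0,/dev/hda3,/dev/hdb2"
    print(md_boot_config(flags))
```

Disabling autodetection first matters: otherwise the kernel may already have assembled (or refused to assemble) md0 from the partition-type flags before the md= line is considered.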

Your problem is really serious, and it would be nice to have a backup copy of all your data...

Gregory Miller (General Manager, author) commented:
I have gotten it solved. As you mentioned, the problem was VERY serious, and I ended up hiring a fellow out of Michigan (not found through EE) to help resolve it over the phone. Unfortunately, the entire file structure of the RAID was sent to lost+found due to some corruption in the directory, and I now get the job of picking through the scraps to find the specific data that is important.

The resolution was to boot into rescue mode, manually assemble the RAID, and then fsck the RAID.

boot from RH install CD
at prompt: "linux rescue"

This command reassembled the two drives into the RAID. Luckily, the partition definitions were still intact...
mdadm -Ac partitions -m 0 /dev/md0
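In that command, -A assembles an array, '-c partitions' tells mdadm to scan every partition listed in /proc/partitions, and '-m 0' keeps only members whose superblock records minor number 0. Afterwards /proc/mdstat should list md0 as active with both hda3 and hdb2. Here is a Python sketch that parses that file to check; the sample text is illustrative (not copied from this machine), the names are hypothetical, and `mdadm --detail /dev/md0` remains the authoritative check.

```python
import re
from typing import Dict

ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(\S+)\s*(.*)$")
MEMBER = re.compile(r"^([^\s\[]+)\[(\d+)\](\([A-Z]\))?$")

# Illustrative /proc/mdstat contents for a two-disk RAID-0.
SAMPLE = """\
Personalities : [raid0]
md0 : active raid0 hdb2[1] hda3[0]
      488279296 blocks 64k chunks

unused devices: <none>
"""


def parse_mdstat(text: str) -> Dict[str, dict]:
    """Map each mdN array to its state, RAID level and member slots."""
    arrays = {}
    for line in text.splitlines():
        m = ARRAY_LINE.match(line)
        if m is None:
            continue  # skip Personalities, block-count and other lines
        name, state, rest = m.groups()
        level, members = None, {}
        for word in rest.split():
            mm = MEMBER.match(word)
            if mm:
                dev, slot, flag = mm.groups()
                members[dev] = (int(slot), flag)  # flag: (F) failed, (S) spare
            elif level is None and not word.startswith("("):
                level = word  # e.g. raid0; skips markers like (auto-read-only)
        arrays[name] = {"state": state, "level": level, "members": members}
    return arrays


if __name__ == "__main__":
    print(parse_mdstat(SAMPLE))
```

On the real system, pass the contents of /proc/mdstat instead of SAMPLE. An "inactive" state, or only one member listed, means the RAID-0 is not usable yet.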

This command fixed the issue:
fsck /dev/md0        or it may have been    fsck /dev/hda3   (sorry, it was really late)
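On a striped array the check has to run against the assembled /dev/md0, because each member such as /dev/hda3 holds only every other chunk of the filesystem. fsck reports what it did through a bit-mask exit status (documented in fsck(8)), and the Python sketch below decodes it. The `e2fsck -n` dry run in the main block (open read-only, answer "no" to every repair) is an optional precaution, not what was run in this thread; the helper names are hypothetical.

```python
import subprocess
from typing import List

# Exit-status bits documented in fsck(8); several can be set at once.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(code: int) -> List[str]:
    """Translate an fsck exit status into human-readable flags."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if code & bit]


if __name__ == "__main__":
    # Read-only dry run first: -n makes e2fsck open the device read-only
    # and answer "no" to every question. Must be run as root.
    result = subprocess.run(["e2fsck", "-n", "/dev/md0"])
    print(decode_fsck_status(result.returncode))
```

A status of 4 after the dry run is the cue that a real repair (without -n) is needed, ideally after imaging the drives.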

About an hour later, voila... All done!
Gregory Miller (General Manager, author) commented:
Noplus,

I saw in your post some details that the fellow I paid was covering as well. Even though the question was answered yesterday, I will give you the points because of the similarities, and I think that between us, we would have figured it out.

thanks,
-greg
Arty (system administrator) commented:
Technodweeb, thank you.
Live assistance over the phone really is much more helpful (an onsite visit is even better).
I am glad the problem is partially resolved (you still need to find the useful data in lost+found). Now you can use the 'file' utility to quickly check the data type of each unknown file.
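The 'file' utility identifies data mainly by its leading "magic" bytes. As a rough illustration of that idea, the Python sketch below recognises a handful of common signatures and groups everything under a lost+found directory by type, so the interesting files can be pulled out first. The signature table is a small hand-picked subset (an assumption, far smaller than the database 'file' uses), and running 'file' itself on each entry remains the thorough option.

```python
import sys
from pathlib import Path
from typing import Dict, List

# A small hand-picked subset of well-known file signatures.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive (also .docx/.xlsx/.jar)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 document (.doc/.xls)"),
    (b"\x1f\x8b", "gzip data"),
    (b"\x7fELF", "ELF binary"),
]


def sniff(header: bytes) -> str:
    """Return a type label for the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def survey(root: str) -> Dict[str, List[str]]:
    """Group every regular file below root by its sniffed type."""
    groups: Dict[str, List[str]] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with path.open("rb") as f:
                label = sniff(f.read(16))
            groups.setdefault(label, []).append(str(path))
    return groups


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: survey.py <path-to-lost+found>")
    for label, paths in sorted(survey(sys.argv[1]).items()):
        print(f"{label}: {len(paths)} file(s)")
```

Files that fsck reattaches are named after their inode numbers (for example #12345), so grouping by content type is often the quickest way to find which ones are worth opening.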