Red Hat Enterprise Linux server file system is corrupt, how to repair?

We have a server that is running Red Hat Enterprise Linux.  Recently the server became unresponsive and had to be rebooted manually, by holding down the power button until it shut off.  

We were able to get the server back up and running, but after 24 hours it became unresponsive again and had to be manually rebooted.  This time it took several attempts to get the server up and running, and now the file system is in read-only (RO) mode and we cannot do anything with it.

We do not have much experience with Linux and inherited this system from the previous IT members.  This system runs an Oracle instance with several schemas on it.  The Oracle databases live on a LUN on our SAN unit, an EMC CLARiiON CX300.

We do have an rsync that was running, and it appears to have a copy of most of the file system that was on the local disks for the Linux install.  We also have the original install disks for Red Hat and a Linux Live CD, Knoppix I believe.

Is there any way to repair or restore the file system from the rsync copy?  How can we determine whether a hardware failure is causing the issues?  I am leaning toward a failing hard drive, but do not know enough about Linux to troubleshoot.

We do have a second Linux server that I believe is running the same version of Red Hat.  We could install Oracle on that server, attach the LUN to it, and bring the databases online that way, but we would prefer to get the original server online if possible.
aksealifeIS ManagerAsked:

gireeshbabuCommented:
If that is a local file system, you can use the fsck command to fix it.
0
aksealifeIS ManagerAuthor Commented:
Can I run fsck from the install disk? If so, how would I go about this?

The server is currently up and running, but I am not able to get the file browser or the terminal to come up.  Is there a way to do a clean shutdown via a remote command line?

I apologize in advance for my lack of knowledge when it comes to Linux.
0
penguins_ruleCommented:
A possible way to get a clean shutdown from a remote command line:
 log in as root
  shutdown -h now
0
aksealifeIS ManagerAuthor Commented:
If I try to SSH in using PuTTY I get "access denied" when I enter the password for root.
0
gireeshbabuCommented:
If you get the GRUB prompt at the machine by pressing F1 while booting, you can press the letter e and add the kernel parameter "single" to boot the machine in run level 0.  There you get a prompt where you can type the fsck command directly.  You will not even be asked for a root password. Mount the local disk and run fsck with the proper device path.
0
aksealifeIS ManagerAuthor Commented:
I was able to get to the GRUB prompt by pressing F1 during boot. I get a list of different OS options to boot, and if I press e on the highlighted one I get what looks like a list of boot parameters. How do I add the parameter to boot in run level 0?

Please be as detailed as possible. I have very limited Linux experience.

Thank you.
0
aksealifeIS ManagerAuthor Commented:
[Screenshot: GRUB menu]
This is what I see when I press F1 during the boot process.
0
gheistCommented:
It is RHEL4, EOL long ago.

Press e on any of the options.
Then, on the line that starts with kernel, add "single" at the end (it is not saved; it applies to this boot only).

Now boot up.
Now run fsck -f -p
Any errors?
0

Nicola MackinIndependent ConsultantCommented:
Don't use run level 0: that is halt! Use run level 1 instead.

Also, DO NOT run fsck on a mounted file system. Unmount it first! If fsck itself is on the filesystem you need to fix, then you will have to boot using a rescue boot disk.
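
A minimal sketch of that check, assuming a shell where /proc/mounts is readable. The function names is_mounted and safe_fsck are made up for illustration. Note that an LVM volume is usually listed as /dev/mapper/VG-LV0 rather than /dev/VG/LV0, so pass the name exactly as the mounts table shows it, or pass the mount point instead.

```shell
# is_mounted TARGET [MOUNTS_FILE]
# Succeeds (exit 0) if TARGET appears as a device or as a mount point in
# the mounts table. MOUNTS_FILE defaults to /proc/mounts; the optional
# second argument only exists so the function can be tried on a sample file.
is_mounted() {
    target=$1
    mounts=${2:-/proc/mounts}
    # Field 1 is the device, field 2 is the mount point.
    awk -v t="$target" '$1 == t || $2 == t { found = 1 } END { exit !found }' "$mounts"
}

# safe_fsck DEVICE
# Refuses to check a filesystem that is still listed as mounted.
# Assumption: DEVICE is spelled the same way as in the mounts table.
safe_fsck() {
    if is_mounted "$1"; then
        echo "refusing: $1 is mounted, umount it first" >&2
        return 1
    fi
    fsck -f "$1"
}
```

For example, "is_mounted /home && echo still mounted" reports whether /home is still in use before you point fsck at its device.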
0
penguins_ruleCommented:
If you log on as a regular user with PuTTY, can you run
$ su
and then key in root's password?

How about
$ su -
and then root's password?

For RedHat Enterprise Linux 6.4 and newer:
The file /etc/init/control-alt-delete.conf allows ctrl-alt-delete to shutdown the system

# cat /etc/init/control-alt-delete.conf
# control-alt-delete - emergency keypress handling
#
# This task is run whenever the Control-Alt-Delete key combination is
# pressed.  Usually used to shut down the machine.
# REMOVE the comment marker at the beginning of both lines below to allow it to work
# start on control-alt-delete
# exec /sbin/shutdown -r now "Control-Alt-Delete pressed"
0
aksealifeIS ManagerAuthor Commented:
So here is where we are. I can't get Linux to boot normally at all. I can get the GRUB screen to show with the OS choices by pressing F1 during boot. I then see the screen I posted earlier. If I press the "e" key on the highlighted choice I get the following lines:

root (hd0,0)
kernel /vmlinuz-2.6.9-22.ELsmp ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.9-22.ELsmp.img
 
What do I need to add to the kernel line to attempt to boot so that I can run fsck?

If using a rescue disk to boot is my best option how do I go about that?

Again sorry for the lack of Linux know how.
0
gheistCommented:
kernel /vmlinuz-2.6.9-22.ELsmp ro root=LABEL=/ rhgb quiet single
0
aksealifeIS ManagerAuthor Commented:
After adding "single" as suggested above and booting, I get the following.

[Screenshot]
0
gheistCommented:
Now run fsck /dev/VG/LV0
and then the same for each of the other logical volumens, one after another.
Tell us if there are any problems.
0
aksealifeIS ManagerAuthor Commented:
Is it OK to run fsck with the filesystem mounted?
0
aksealifeIS ManagerAuthor Commented:
Do you want me to run fsck on LV0
then
LV1
LV2
LV3
LV4
LV5
LV6

and report any errors?
0
gheistCommented:
I don't see LV4 or LV5 among the filesystems... but yes.
fsck says that the filesystem is mounted and asks before each repair.
0
aksealifeIS ManagerAuthor Commented:
Ok when I attempt to run fsck I get the following

WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.

I picked n to not continue.

Same result for LV0 and LV1
0
gheistCommented:
Tip: you can umount the filesystems and then check them.
0
aksealifeIS ManagerAuthor Commented:
Will typing df show me all mounted drives?
0
aksealifeIS ManagerAuthor Commented:
How would I unmount the filesystems?
0
aksealifeIS ManagerAuthor Commented:
Would I run the following?

unmount -A
0
gheistCommented:
You can umount using either the device or the mount point.
0
aksealifeIS ManagerAuthor Commented:
When I type

unmount /dev/VG/LV0

I get

unmount: command not found

Also tried
unmount /dev/mapper/VG-LV0
unmount /home
0
aksealifeIS ManagerAuthor Commented:
Ah, just noticed my error. It is umount and not unmount.
0
aksealifeIS ManagerAuthor Commented:
OK, I was able to umount and run fsck, and it doesn't appear to give any errors.  I did not umount the two empower mounts.  These are the Oracle DBs we are trying to get to.  Should I umount them and run fsck on them as well?

Do I need to remount these filesystems?  They should mount on the next boot, is that correct?

Here are the results of the fsck:
[Screenshot: fsck results]
Here is what df showed me:
[Screenshot: df output]
0
gheistCommented:
A plain fsck normally just replays the journal, and that is usually fine.
To force a full check you must umount the filesystem and run
fsck -f
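
A rough sketch of that umount-then-check sequence, for one filesystem at a time. The helper names run and check_fs are hypothetical, and the pairing of /home with /dev/VG/LV0 is only an assumption based on the names mentioned earlier in this thread; take the real pairs from your df output. With DRY_RUN=1 the commands are printed instead of executed, so you can review them first.

```shell
# run CMD [ARGS...]
# Executes the command, or only prints it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_fs MOUNT_POINT DEVICE
# Unmounts the filesystem, then forces a full check of its device.
check_fs() {
    # Stop here if the unmount fails (for example "device is busy"),
    # so fsck never runs against a filesystem that is still mounted.
    run umount "$1" || return 1
    run fsck -f "$2"
}

# Preview only (assumed pairing, adjust to your system):
#   DRY_RUN=1 check_fs /home /dev/VG/LV0
```

The root filesystem cannot be unmounted this way while the system is running from it; that one has to be checked from a rescue or live CD.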
0
aksealifeIS ManagerAuthor Commented:
Ran fsck -f on /dev/VG/LV0

It returned this error:

Error reading block 68053 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 32010.  Ignore error<y>?

Should I ignore it?
0
aksealifeIS ManagerAuthor Commented:
Should I add -y and just let fsck fix the errors?
0
aksealifeIS ManagerAuthor Commented:
I am finishing up the fsck on LV0, looks like lots of errors.

We do have a recent rsync backup of what looks like the entire local filesystem. Is it possible to restore this? If we put new drives in the server and install RH can we then just copy over the backed up files?
0
gheistCommented:
You only need to restore /home.
That's cheap and easy.

Install the "smartmontools" package and run smartctl -a /dev/sda
It will show whether your disk needs replacement.
0
aksealifeIS ManagerAuthor Commented:
Do we need to reinstall red hat first or can I install this package via single user mode?
0
gheistCommented:
As you wish.
It will tell you whether the disk has gone bad, so you know whether you need to install a new server or whether this one is still alive.
CentOS 4 packages are here: http://vault.centos.org/4.9/
0
aksealifeIS ManagerAuthor Commented:
I have a Knoppix Live CD that has smartmontools on it.  However, I am not able to determine whether the disk(s) are bad.

If I view the file system on sda1 and sda2 I see that /home, /var, /usr and other directories are all empty.  I am currently using the live CD to copy the Oracle databases to an external drive.
0
gheistCommented:
On Knoppix:
$ sudo -s
# smartctl -a /dev/sda
# smartctl -a /dev/sdb
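
If the long -a report is hard to read, smartctl also signals problems through its exit status, which is a bit mask. Below is a small sketch that decodes it, following the bit meanings given in the smartctl man page. The function name explain_smartctl_status is made up for this example, and very old smartmontools releases may not set every bit.

```shell
# explain_smartctl_status STATUS
# Prints one line for each bit that is set in smartctl's exit status.
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ];   then echo "command line did not parse"; fi
    if [ $((status & 2)) -ne 0 ];   then echo "device open failed"; fi
    if [ $((status & 4)) -ne 0 ];   then echo "a SMART command to the disk failed"; fi
    if [ $((status & 8)) -ne 0 ];   then echo "SMART status: DISK FAILING"; fi
    if [ $((status & 16)) -ne 0 ];  then echo "prefail attributes at or below threshold"; fi
    if [ $((status & 32)) -ne 0 ];  then echo "attributes were at or below threshold in the past"; fi
    if [ $((status & 64)) -ne 0 ];  then echo "device error log contains errors"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "self-test log contains errors"; fi
    return 0
}

# Typical use, as root:
#   smartctl -H /dev/sda
#   explain_smartctl_status $?
```

A clean result here does not prove the disk is healthy, so read errors reported by fsck or the kernel should still be taken seriously.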
0
gheistCommented:
Ran fsck -f on /dev/VG/LV0

It returned this error:

Error reading block 68053 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 32010.  Ignore error<y>?

Should I ignore it?

The disk is BAD, just replace it ASAP.
0
aksealifeIS ManagerAuthor Commented:
If I install a new disk, how do I go about recovering the server?

What would need to be copied over to bring the server back to its working state before the drive failure?

I have an rsync backup of the entire drive from before these issues started. The main thing I need is to get Oracle up and running, and Oracle support has been zero help.
0
gheistCommented:
Oracle will need an RMAN backup or an export. For all normal files rsync is sufficient.
Make sure you install smartmontools and run smartd to get disk failure prediction on all your disks and avoid future data loss.
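
For the rsync part, a minimal sketch of copying one directory back from the backup. The helper name restore_cmd and the /mnt/backup and /mnt/newdisk paths are placeholders, not anything from this system. The function only prints the rsync command so it can be reviewed before you run it; with DRY_RUN=1 it adds -n, which makes rsync list what it would copy without writing anything.

```shell
# restore_cmd BACKUP_DIR TARGET_DIR
# Prints an rsync command that copies the contents of BACKUP_DIR into
# TARGET_DIR. -a keeps permissions, ownership and timestamps, -H keeps
# hard links, and --numeric-ids keeps uid/gid numbers as stored in the
# backup instead of mapping them by name.
restore_cmd() {
    src=${1%/}/    # trailing slash: copy the directory's contents
    dst=${2%/}/
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rsync -aHn --numeric-ids $src $dst"
    else
        echo "rsync -aH --numeric-ids $src $dst"
    fi
}

# Placeholder paths, adjust to where the backup and new disk are mounted:
#   DRY_RUN=1 restore_cmd /mnt/backup/home /mnt/newdisk/home
```

Paths containing spaces would need quoting before the printed command is pasted into a shell.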
0
aksealifeIS ManagerAuthor Commented:
Sounds good.  We do have RMAN backups of the Oracle DBs.

Thank you for all of your help.
0