  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1639

Red Hat Enterprise Linux server file system is corrupt, how to repair?

We have a server that is running Red Hat Enterprise Linux.  Recently the server became unresponsive and had to be rebooted manually, by holding down the power button until it shut off.  

We were able to get the server back up and running, but after 24 hours it became unresponsive again and had to be manually rebooted.  This time it took several attempts to get the server up and running, and now the file system is in read-only (RO) mode and we cannot do anything with it.

We do not have much experience with Linux and inherited this system from the previous IT staff.  This system runs an Oracle instance with several schemas on it.  The Oracle databases live on a LUN on our SAN unit, an EMC CLARiiON CX300.

We do have an rsync job that was running, and it appears to have a copy of most of the file system that was on the local disks for the Linux install.  We also have the original install disks for Red Hat and a Linux Live CD (Knoppix, I believe).

Is there any way to repair or restore the file system from the rsync copy?  How can we determine if it is a hardware failure that is causing the issues?  I am leaning toward a failing hard drive, but do not know enough about Linux to troubleshoot.

We do have a second Linux server that I believe is running the same version of Red Hat.  We could install Oracle on that server and then attempt to attach the LUN to the new server and bring the databases online that way, but would prefer to get the original server online if possible.
0
Asked by: aksealife
4 Solutions
 
gireeshbabuCommented:
If that is a local file system, you could use the fsck command to fix it.
0
 
aksealifeAuthor Commented:
Can I run fsck from the install disk? If so, how would I go about this?

The server is currently up and running, but I am not able to get the file browser or the terminal to come up.  Is there a way to do a clean shutdown via a remote command line?

I apologize in advance for my lack of knowledge when it comes to Linux.
0
 
penguins_ruleCommented:
One possible way to get a clean shutdown from a remote command line:
 log in as root
  shutdown -h now
0
 
aksealifeAuthor Commented:
If I try to SSH in using PuTTY, I get "access denied" when I enter the password for root.
0
 
gireeshbabuCommented:
If you get the GRUB prompt at the machine by pressing F1 while booting, you can press the letter e and add the kernel parameter linux single to boot the machine in run level 0.  There you get a prompt where you can type the fsck command directly.  You will not even be asked for a root password. Mount the local disk and run fsck with the proper device path.
0
 
aksealifeAuthor Commented:
I was able to get to the GRUB prompt by pressing F1 during boot. I get a list of different OS options to boot, and if I press e on the highlighted one I get what looks like a list of boot parameters. How do I add the param to boot in run level 0?

Please be as detailed as possible. I have very limited Linux experience.

Thank you.
0
 
aksealifeAuthor Commented:
[Screenshot of the GRUB menu]
This is what I see when I press F1 during the boot process.
0
 
gheistCommented:
It is RHEL4, which reached end of life long ago.

Press e on any of the options,
then on the line that starts with kernel add "single" at the end (it is not saved; it applies just for this boot).

Now boot up.
Now run fsck -f -p
Any errors?
0
 
Nicola MackinCommented:
Don't use run level 0: that is halt! Use run level 1 instead.

Also, DO NOT run fsck on a mounted file system. Unmount it first! If fsck itself is on the filesystem you need to fix, then you will have to boot using a rescue boot disk.
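
One way to confirm that a filesystem is not mounted before checking it is to look for the device in /proc/mounts. The sketch below is only an illustration: the helper name is_mounted is made up for this example, and the optional second argument (an alternative mounts file) exists only so the function can be tried without touching a real system. LVM volumes may be listed under their /dev/mapper name rather than /dev/VG/LVn, so it is worth checking both.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds (exit 0) if DEVICE appears in the first column of the mounts table.
# MOUNTS_FILE defaults to /proc/mounts; the override is an assumption made
# here purely for illustration and testing.
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Example: only check the volume when it is not mounted.
# if is_mounted /dev/mapper/VG-LV0; then
#     echo "still mounted, umount it first"
# else
#     fsck -f /dev/mapper/VG-LV0
# fi
```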
0
 
penguins_ruleCommented:
If you log on as a regular user with PuTTY, can you do
# su
and then key in root's password?

How about
# su -
and then root's password?

For Red Hat Enterprise Linux 6.4 and newer:
The file /etc/init/control-alt-delete.conf allows Ctrl-Alt-Delete to shut down the system

# cat /etc/init/control-alt-delete.conf
# control-alt-delete - emergency keypress handling
#
# This task is run whenever the Control-Alt-Delete key combination is
# pressed.  Usually used to shut down the machine.
# REMOVE the leading # from both lines below to allow it to work
# start on control-alt-delete
# exec /sbin/shutdown -r now "Control-Alt-Delete pressed"
0
 
aksealifeAuthor Commented:
So here is where we are. I can't get Linux to boot normally at all. I can get the GRUB screen to show with the OS choices by pressing F1 during boot. I then see the screen I posted earlier. If I press the "e" key on the highlighted choice, I get the following lines:

root (hd0,0)
kernel /vmlinuz-2.6.9-22.ELsmp ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.9-22.ELsmp.img
 
What do I need to add to the kernel line to attempt to boot so that I can run fsck?

If using a rescue disk to boot is my best option, how do I go about that?

Again, sorry for the lack of Linux know-how.
0
 
gheistCommented:
kernel /vmlinuz-2.6.9-22.ELsmp ro root=LABEL=/ rhgb quiet single
0
 
aksealifeAuthor Commented:
After adding "single" as suggested above and booting, I get the following.

[Screenshot]
0
 
gheistCommented:
Now run fsck /dev/VG/LV0
..
..
..
one after another, and tell me if there are any problems.
0
 
aksealifeAuthor Commented:
Is it OK to run fsck with the filesystem mounted?
0
 
aksealifeAuthor Commented:
Do you want me to run fsck on LV0
then
LV1
LV2
LV3
LV4
LV5
LV6

and report any errors?
0
 
gheistCommented:
I don't see LV4 or LV5 among the filesystems... but yes.
fsck says that the filesystem is mounted and asks before each repair.
0
 
aksealifeAuthor Commented:
OK, when I attempt to run fsck I get the following:

WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.

I picked n to not continue.

Same result for LV0 and LV1
0
 
gheistCommented:
Tip: you can umount the filesystems and then check them.
0
 
aksealifeAuthor Commented:
Will typing df show me all mounted drives?
0
 
aksealifeAuthor Commented:
How would I unmount the filesystems?
0
 
aksealifeAuthor Commented:
Would I run:

unmount -A
0
 
gheistCommented:
You can umount using either the device or the mount point.
0
 
aksealifeAuthor Commented:
When I type

unmount /dev/VG/LV0

I get

unmount: command not found

Also tried
unmount /dev/mapper/VG-LV0
unmount /home
0
 
aksealifeAuthor Commented:
Ah, just noticed my error. It is umount and not unmount.
0
 
aksealifeAuthor Commented:
OK, I was able to umount and run fsck, and it doesn't appear to give any errors.  I did not umount the two empower mounts.  These are the Oracle DBs we are trying to get to.  Should I umount them and run fsck on them as well?

Do I need to remount these filesystems?  They should mount on the next boot, is that correct?

Here are the results of the fsck:
[Screenshot: FSCK results]
Here is what df showed me:
[Screenshot: DF]
0
 
gheistCommented:
Normally a journal replay is enough.
To force a full check you must umount the filesystem and run
fsck -f
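
The sequence described here (umount each volume, then force a check) could be scripted roughly as below. This is a sketch, not a tested recovery procedure: check_volumes, run and the DRY_RUN variable are names invented for this example, and with DRY_RUN left at its default of 1 the function only prints the commands it would run.

```shell
#!/bin/sh
# check_volumes DEVICE...
# For each device: umount it, then force a full check with fsck -f.
# DRY_RUN=1 (the default in this sketch) prints the commands instead of
# running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_volumes() {
    for dev in "$@"; do
        # umount fails if the volume is busy (or already unmounted); either
        # way this sketch skips the volume rather than guess.
        if run umount "$dev"; then
            run fsck -f "$dev"
        else
            echo "skipping $dev: umount failed" >&2
        fi
    done
}

# Example with the volume names used in this thread:
# check_volumes /dev/VG/LV0 /dev/VG/LV1
```

Leaving out -y keeps fsck interactive, so each proposed repair can still be reviewed before it is applied.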
0
 
aksealifeAuthor Commented:
Ran fsck -f on /dev/VG/LV0

Returned this error:

Error reading block 68053 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 32010.  Ignore error<y>?

Should I ignore?
0
 
aksealifeAuthor Commented:
Should I add -y and just let fsck fix the errors?
0
 
aksealifeAuthor Commented:
I am finishing up the fsck on LV0, looks like lots of errors.

We do have a recent rsync backup of what looks like the entire local filesystem. Is it possible to restore this? If we put new drives in the server and install RH can we then just copy over the backed up files?
0
 
gheistCommented:
You only need to restore /home.
That's cheap and easy.

Install the "smartmontools" package and run smartctl -a /dev/sda
It will show whether your disk needs replacement.
0
 
aksealifeAuthor Commented:
Do we need to reinstall Red Hat first, or can I install this package in single-user mode?
0
 
gheistCommented:
As you wish.
It will tell you whether the disk has gone bad, so you know whether you need to install a new server or this one is still alive.
CentOS 4 packages are here: http://vault.centos.org/4.9/
0
 
aksealifeAuthor Commented:
I have a Knoppix Live CD that has smartmontools on it.  However, I am not able to determine if the disk(s) are bad.

If I view the file system on sda1 and sda2, I see that /home, /var and /usr, as well as other directories, are all empty.  I am currently using the Live CD to copy the Oracle databases to an external drive.
0
 
gheistCommented:
On Knoppix:
$ sudo -s
# smartctl -a /dev/sda
# smartctl -a /dev/sdb
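
When reading the smartctl results, the overall health verdict (smartctl -H) and the reallocated and pending sector counts are common first things to look at. The sketch below pulls those raw counts out of smartctl -A output with awk. It assumes an ATA/SATA disk that reports the usual attribute table with the raw value in the tenth column; SCSI disks and disks behind a RAID controller print a different report, so treat this as an illustration. smart_raw is a made-up helper name.

```shell
#!/bin/sh
# smart_raw ATTRIBUTE_NAME
# Reads `smartctl -A` output on stdin and prints the raw value (column 10)
# of the named attribute, or nothing if the attribute is not listed.
# Assumes the standard ATA attribute table layout.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example (as root):
# smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
# smartctl -A /dev/sda | smart_raw Current_Pending_Sector
# A non-zero and growing count is a common sign of a failing disk.
```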
0
 
gheistCommented:
Ran fsck -f on /dev/VG/LV0

Returned this error:

Error reading block 68053 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 32010.  Ignore error<y>?

Should I ignore?


Disk is BAD, just dump it ASAP
0
 
aksealifeAuthor Commented:
If I install a new disk, how do I go about recovering the server?

What would need to be copied over to bring the server back to its working state before the drive failure?

I have an rsync backup of the entire drive from before the start of these issues. The main thing I need is to get Oracle up and running, and Oracle support has been zero help.
0
 
gheistCommented:
Oracle will need an RMAN backup or an export. For all normal files rsync is sufficient.
Make sure you install smartmontools and run smartd to get disk failure prediction on all your disks and avoid future data loss.
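
For the ordinary files, copying a directory back from the rsync copy might look roughly like the sketch below. The mount points in the example are hypothetical, restore_tree and DRY_RUN are names invented here, and with DRY_RUN left at its default of 1 the function only prints the rsync command so it can be reviewed first.

```shell
#!/bin/sh
# restore_tree SRC DST
# Copies the contents of SRC into DST, preserving permissions, ownership,
# times and symlinks (-a), hard links (-H) and numeric uid/gid values
# (--numeric-ids). DRY_RUN=1 (the default in this sketch) only prints the
# rsync command instead of running it.
DRY_RUN=${DRY_RUN:-1}

restore_tree() {
    src=${1%/}   # strip one trailing slash so exactly one is added back
    dst=${2%/}
    set -- rsync -aH --numeric-ids "$src/" "$dst/"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example, with hypothetical mount points for the backup and the new disk:
# restore_tree /mnt/backup/home /mnt/newroot/home
```

The trailing slash on the source makes rsync copy the directory's contents rather than creating a nested directory inside the destination.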
0
 
aksealifeAuthor Commented:
Sounds good.  We do have RMAN backups of the Oracle DBs.

Thank you for all of your help.
0
