Solved

Red Hat Enterprise Linux server file system is corrupt, how to repair?

Posted on 2014-07-30
Last Modified: 2014-08-19
We have a server that is running Red Hat Enterprise Linux.  Recently the server became unresponsive and had to be rebooted manually, by holding down the power button until it shut off.  

We were able to get the server back up and running, but after 24 hours it became unresponsive again and had to be manually rebooted.  This time it took several attempts to get the server up and running and now the file system is in RO mode and we can not do anything with it.

We do not have much experience with Linux and inherited this system from the previous IT members.  This system runs an Oracle instance with several schemas on it.  The Oracle databases live on a LUN on our SAN unit, an EMC Clarion CX300.

We do have an rsync job that was running, and it appears to have a copy of most of the file system that was on the local disks for the Linux install. We also have the original install disks for Red Hat and a Linux Live CD (Knoppix, I believe).

Is there any way to repair or restore the file system from the rsync copy? How can we determine whether a hardware failure is causing the issues? I am leaning toward a failing hard drive, but do not know enough about Linux to troubleshoot.

We do have a second Linux server that I believe is running the same version of Red Hat. We could install Oracle on that server, attach the LUN to it, and bring the databases online that way, but we would prefer to get the original server online if possible.
Question by:aksealife
39 Comments
 
LVL 1

Expert Comment

by:gireeshbabu
ID: 40230006
If that is a local file system, we could use fsck command to fix it.
0
 

Author Comment

by:aksealife
ID: 40230020
Can I run fsck from the install disk? If so, how would I go about this?

The server is currently up and running, but I am not able to get the file browser or the terminal to come up. Is there a way to do a clean shutdown via a remote command line?

I apologize in advance for my lack of knowledge when it comes to Linux
0
 
LVL 1

Expert Comment

by:penguins_rule
ID: 40230258
A possible way to get a clean shutdown from a remote command line:
 log in as root
  shutdown -h now
0
 

Author Comment

by:aksealife
ID: 40230315
If I try to SSH in using PuTTY, I get "access denied" when I enter the password for root.
0
 
LVL 1

Expert Comment

by:gireeshbabu
ID: 40230384
If you get the grub prompt at the machine by pressing F1 while booting, you may press the letter e to add a kernel param linux single, to boot the machine in run level 0.  There you get a prompt directly to type the command fsck.  You will not even be asked for a root password. Mount the local disk and run fsck with the proper device path.
0
 

Author Comment

by:aksealife
ID: 40230536
I was able to get to the grub prompt by pressing F1 during boot. I get a list of different OS options to boot, and if I press e on the highlighted one I get what looks like a list of boot parameters. How do I add the param to boot in run level 0?

Please be as detailed as possible? I have very limited Linux experience.

Thank you.
0
 

Author Comment

by:aksealife
ID: 40230566
screen shot of grub
This is what i see when i press F1 during the boot process.
0
 
LVL 61

Accepted Solution

by:
gheist earned 500 total points
ID: 40231034
It is RHEL4, EOL long ago.

press e on any of options
then in line that starts with kernel add "single" in the end (it is not saved, applies just for this boot)

now boot up
now run fsck -f -p
any errors?
0
 
LVL 1

Expert Comment

by:Nicola Mackin
ID: 40231429
Don't use run level 0: that is halt! Instead use runlevel 1

Also, DO NOT run fsck on a mounted file system. Unmount first! If fsck itself is on the filesystem you need to fix, then you will have to boot using a rescue boot disk.
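
As a rough illustration of the unmount-first rule, here is a minimal shell sketch that checks a mounts table before printing the fsck command. The function names is_mounted and safe_fsck are made up for this example, and the optional second argument (a mounts file, defaulting to /proc/mounts) exists only so the check can be tried against a sample file. LVM volumes may be listed under /dev/mapper/ rather than /dev/VG/, so checking by mount point is probably the safer test.

```shell
# Sketch only: is_mounted and safe_fsck are hypothetical helper names.

# Succeeds if the given device OR mount point appears in the mounts table.
# $1 = device or mount point, $2 = mounts file (default /proc/mounts).
is_mounted() {
    target=$1
    mounts_file=${2:-/proc/mounts}
    # Field 1 of each line is the device, field 2 is the mount point.
    awk -v t="$target" '$1 == t || $2 == t { found = 1 } END { exit !found }' "$mounts_file"
}

# Prints the fsck command only when the target is not mounted.
# It deliberately prints instead of running anything.
safe_fsck() {
    target=$1
    mounts_file=${2:-/proc/mounts}
    if is_mounted "$target" "$mounts_file"; then
        echo "refusing: $target is mounted, umount it first"
        return 1
    fi
    echo "fsck -f $target"
}

# Example (as root, in single-user mode or from a rescue disk):
#   safe_fsck /home
```

If the check reports the root filesystem as mounted, that is the case where a rescue disk is needed, since / cannot be unmounted while the system is running from it.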
0
 
LVL 1

Expert Comment

by:penguins_rule
ID: 40231806
if you logon as a regular user with putty, can you do
# su
and then key in root's password?

How about
# su -
and then root's password?

For RedHat Enterprise Linux 6.4 and newer:
The file /etc/init/control-alt-delete.conf allows ctrl-alt-delete to shutdown the system

]# cat control-alt-delete.conf
# control-alt-delete - emergency keypress handling
#
# This task is run whenever the Control-Alt-Delete key combination is
# pressed.  Usually used to shut down the machine.
# REMOVE the asterisks at beginning of both lines to allow it to work
# start on control-alt-delete
# exec /sbin/shutdown -r now "Control-Alt-Delete pressed"
0
 

Author Comment

by:aksealife
ID: 40232503
So here is where we are. I can't get Linux to boot normally at all. I can get the grub screen to show with the OS choices by pressing F1 during boot. I then see the screen I posted earlier. If I press the "e" key on the highlighted choice I get the following lines

root (hd0,0)
kernel /vmlinuz-2.6.9-22.ELsmp ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.9-22.ELsmp.img
 
What do I need to add to the kernel line to attempt to boot so that I can run fsck?

If using a rescue disk to boot is my best option how do I go about that?

Again sorry for the lack of Linux know how.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40232589
kernel /vmlinuz-2.6.9-22.ELsmp ro root=LABEL=/ rhgb quiet single
0
 

Author Comment

by:aksealife
ID: 40232758
after adding "single" as suggested above and booting i get the following.

Screen shot
0
 
LVL 61

Expert Comment

by:gheist
ID: 40232956
now run fsck /dev/VG/LV0
..
..
..
in a row
tell if there are any problems
0
 

Author Comment

by:aksealife
ID: 40232972
Is it OK to run fsck with the filesystem mounted?
0
 

Author Comment

by:aksealife
ID: 40233012
Do you want me to run fsck on LV0
then
LV1
LV2
LV3
LV4
LV5
LV6

and report any errors?
0
 
LVL 61

Expert Comment

by:gheist
ID: 40233062
i dont see LV4 or LV5 among filesystems... but yes.
fsck says that filesystem is mounted and asks before each repair
0
 

Author Comment

by:aksealife
ID: 40233071
Ok when I attempt to run fsck I get the following

WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.

I picked n to not continue.

Same result for LV0 and LV1
0
 
LVL 61

Expert Comment

by:gheist
ID: 40233075
tip: you can umount filesystems and check
0

 

Author Comment

by:aksealife
ID: 40233076
will typing df show me all mounted drives?
0
 

Author Comment

by:aksealife
ID: 40233077
how would i unmount the filesystems?
0
 

Author Comment

by:aksealife
ID: 40233082
would i run?

unmount -A
0
 
LVL 61

Expert Comment

by:gheist
ID: 40233109
you can umount using device or mount point.
0
 

Author Comment

by:aksealife
ID: 40233116
When I type

unmount /dev/VG/LV0

I get

unmount: command not found

Also tried
unmount /dev/mapper/VG-LV0
unmount /home
0
 

Author Comment

by:aksealife
ID: 40233118
Ah, just noticed my error. it is umount and not unmount.
0
 

Author Comment

by:aksealife
ID: 40233132
OK, I was able to umount and run fsck, and it doesn't appear to give any errors. I did not umount the two empower mounts. These are the Oracle DBs we are trying to get to. Should I umount them and run fsck on them as well?

Do I need to remount these filesystems? They should mount on the next boot, is that correct?

Here are the results of the FSCK
FSCK results
here is what df showed me
DF
0
 
LVL 61

Expert Comment

by:gheist
ID: 40233515
Normally journal replay is fine.
You must use umount and
fsck -f
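
The umount-then-fsck sequence can be sketched as a dry run that only prints the commands for each volume, so they can be reviewed before anything touches the disk. plan_checks is a hypothetical helper name; /dev/VG/LV0 and /dev/VG/LV1 are the volumes mentioned earlier in this thread.

```shell
# Sketch only: plan_checks is a made-up helper that prints commands
# and never runs them.
plan_checks() {
    for dev in "$@"; do
        # Unmount first, then force a full check even if the
        # filesystem is flagged clean (-f).
        echo "umount $dev"
        echo "fsck -f $dev"
    done
}

# Print the plan for the first two logical volumes from this thread.
plan_checks /dev/VG/LV0 /dev/VG/LV1
```

Once the printed lines look right, they can be typed in one at a time as root. Running them by hand keeps each fsck prompt visible, which matters when it starts asking about read errors.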
0
 

Author Comment

by:aksealife
ID: 40234775
Ran fsck -f on /dev/VG/LV0

Returned this error

Error reading block 68053 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 32010.  Ignore error<y>?

Should I ignore?
0
 

Author Comment

by:aksealife
ID: 40234863
should i add -y and just let fsck fix the errors?
0
 

Author Comment

by:aksealife
ID: 40235106
I am finishing up the fsck on LV0, looks like lots of errors.

We do have a recent rsync backup of what looks like the entire local filesystem. Is it possible to restore this? If we put new drives in the server and install RH can we then just copy over the backed up files?
0
 
LVL 61

Assisted Solution

by:gheist
gheist earned 500 total points
ID: 40235196
You need to restore only /home.
That's cheap and easy.

install "smartmontools" package and do smartctl -a /dev/sda
it will show if your disk needs replacement.
0
 

Author Comment

by:aksealife
ID: 40235209
Do we need to reinstall red hat first or can I install this package via single user mode?
0
 
LVL 61

Assisted Solution

by:gheist
gheist earned 500 total points
ID: 40235266
As you wish
It will tell you whether the disk has gone bad, so you know whether you need to install a new server or this one is still alive.
Centos4 packages are here: http://vault.centos.org/4.9/
0
 

Author Comment

by:aksealife
ID: 40235496
I have a Knoppix Live CD that has smartmontools on it. However, I am not able to determine whether the disk(s) are bad.

If I view the file system on sda1 and sda2, I see that /home, /var, /usr, as well as other directories, are all empty. I am currently using the live CD to copy the Oracle databases to an external drive.
0
 
LVL 61

Assisted Solution

by:gheist
gheist earned 500 total points
ID: 40235790
On knoppix:
$ sudo -s
# smartctl -a /dev/sda
# smartctl -a /dev/sdb
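
smartctl -a prints a lot, so a small filter like the following may help pick out the lines that usually matter for a failing disk. This is a sketch that assumes ATA-style output with the standard attribute table (raw value in the tenth column). SCSI disks, or disks behind a RAID controller, report differently and may show nothing here. smart_summary is a made-up name.

```shell
# Sketch only: smart_summary is a hypothetical filter that reads
# `smartctl -a` output on stdin. It assumes the ATA attribute table
# layout, where column 2 is the attribute name and column 10 the raw value.
smart_summary() {
    awk '
        # Overall verdict line, e.g. "...test result: PASSED"
        /overall-health/ { print "health: " $NF }
        # Attributes that commonly indicate failing sectors
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" { print $2 ": " $10 }
    '
}

# Usage (as root):
#   smartctl -a /dev/sda | smart_summary
```

Non-zero raw values for reallocated or pending sectors generally suggest a deteriorating disk, even when the overall health line still says PASSED.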
0
 
LVL 61

Expert Comment

by:gheist
ID: 40264249
Ran fsck -f on /dev/VG/LV0

Returned this error

Error reading block 68053 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 32010.  Ignore error<y>?

Should I ignore?


Disk is BAD, just dump it ASAP
0
 

Author Comment

by:aksealife
ID: 40264274
If I install a new disk, how do I go about recovering the server?

What would need to be copied over to bring the server back to its working state before the drive failure?

I have an rsync backup of the entire drive from before the start of these issues. The main thing I need is to get Oracle up and running, and Oracle support has been zero help.
0
 
LVL 61

Expert Comment

by:gheist
ID: 40264712
Oracle will need RMAN backup or export. For all normal files rsync is sufficient.
Make sure you install smartmontools and run smartd to have disk failure prediction on all your disks to avoid future data loss.
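
For the plain-file part, a restore from the rsync copy might look like the sketch below. It only prints the command, with --dry-run included, so nothing is copied until that flag is removed. The helper name restore_cmd and the path /mnt/backup/home are assumptions for illustration; substitute wherever the backup actually lives.

```shell
# Sketch only: restore_cmd is a hypothetical helper that prints an rsync
# command for review. The backup path below is a placeholder.
restore_cmd() {
    src=$1   # directory holding the backup copy
    dst=$2   # directory to restore into
    # -a keeps permissions, owners and timestamps; -H keeps hard links;
    # --numeric-ids avoids remapping uid/gid by name on a fresh install.
    # The trailing slashes make rsync copy the directory contents.
    echo "rsync -aH --numeric-ids --dry-run ${src%/}/ ${dst%/}/"
}

restore_cmd /mnt/backup/home /home
```

This covers ordinary files only. The Oracle data should come back through RMAN as noted above, not through a file-level copy.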
0
 

Author Comment

by:aksealife
ID: 40271538
Sounds good.  We do have RMAN backups of the Oracle DBs.

Thank you for all of your help.
0
