Solved

What happened on this server: a hard disk failure or just some bad blocks?

Posted on 2013-05-14
Last Modified: 2013-05-31
My EDI server suddenly stopped working. It runs Red Hat Linux. When I rebooted the server, it found some errors on the filesystem and corrected them successfully. Although it now boots into the OS without problems, I would still like to know what precautions I can take to prevent this kind of error from happening again. Could anyone give me some advice on how to check the health of this server? Here is the system log file.

Thanks.



May  5 04:02:15 luna syslogd 1.4.1: restart.
May  5 05:02:20 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 05:20:13 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 05:36:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 06:27:55 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 06:44:47 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 07:18:59 luna ntpd[3353]: time reset +1.373396 s
May  5 07:22:54 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 07:23:57 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 07:31:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 08:04:53 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 08:38:33 luna ntpd[3353]: time reset +0.352645 s
May  5 08:42:28 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 09:04:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 10:04:12 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 10:21:09 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 10:38:12 luna ntpd[3353]: time reset +0.468421 s
May  5 10:42:19 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 10:43:22 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 10:51:54 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 11:45:57 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 12:03:51 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 12:20:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 13:11:26 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 13:45:28 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 14:20:18 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 14:36:41 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 15:28:13 luna ntpd[3353]: time reset +0.262163 s
May  5 15:32:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 15:33:35 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 15:48:16 luna ntpd[3353]: time reset +0.130489 s
May  5 15:51:58 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 15:53:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 16:54:08 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 19:11:32 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 19:28:41 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 19:44:44 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 20:18:55 luna ntpd[3353]: time reset +1.716473 s
May  5 20:22:23 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 20:24:31 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 20:40:46 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 22:38:32 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 23:12:42 luna ntpd[3353]: time reset +1.191759 s
May  5 23:16:26 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 23:17:31 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 23:28:09 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 00:31:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 00:48:54 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 01:22:34 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 01:39:38 luna ntpd[3353]: time reset +1.063791 s
May  6 01:43:33 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 01:44:03 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 01:59:43 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 03:35:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 04:09:44 luna ntpd[3353]: time reset +0.970253 s
May  6 04:13:10 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 04:33:22 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 05:30:08 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 05:47:34 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 06:04:19 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 06:38:30 luna ntpd[3353]: time reset +0.652795 s
May  6 06:42:01 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 06:43:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 07:10:59 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 07:29:37 luna avahi-daemon[3903]: Invalid query packet.
May  6 07:30:17 luna last message repeated 7 times
May  6 07:49:39 luna avahi-daemon[3903]: Invalid query packet.
May  6 08:09:19 luna last message repeated 14 times
May  6 08:32:16 luna last message repeated 9 times
May  6 08:36:47 luna last message repeated 24 times
May  6 08:37:27 luna last message repeated 9 times
May  6 09:21:01 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 09:42:02 luna avahi-daemon[3903]: Invalid query packet.
May  6 09:42:42 luna last message repeated 7 times
May  6 09:55:10 luna ntpd[3353]: time reset +1.077468 s
May  6 09:58:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 10:00:37 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 10:14:37 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 12:42:41 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 12:54:05 luna avahi-daemon[3903]: Invalid query packet.
May  6 12:54:09 luna last message repeated 6 times
May  6 12:59:57 luna ntpd[3353]: time reset +1.406363 s
May  6 13:04:13 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 13:05:17 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 13:12:50 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 13:23:44 luna avahi-daemon[3903]: Invalid query packet.
May  6 13:24:25 luna last message repeated 7 times
May  6 14:13:42 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 14:48:26 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 15:04:56 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 15:39:10 luna ntpd[3353]: time reset +1.119836 s
May  6 15:42:40 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 15:44:47 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 15:54:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 16:38:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 16:55:38 luna ntpd[3353]: time reset +0.429475 s
May  6 16:59:59 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 17:01:05 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 17:12:59 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 17:33:50 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 18:16:32 luna ntpd[3353]: time reset +0.655949 s
May  6 18:20:43 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 18:21:16 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 18:22:52 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 18:43:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 19:42:42 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 20:16:30 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 20:22:56 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452128 in dir #78447464
May  6 20:22:56 luna kernel: Aborting journal on device dm-4.
May  6 20:22:56 luna kernel: ext3_abort called.
May  6 20:22:56 luna kernel: EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
May  6 20:22:56 luna kernel: Remounting filesystem read-only
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452124 in dir #78448586
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452126 in dir #78448586
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452125 in dir #78448586
May  6 20:22:59 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452105 in dir #78447723
May  6 20:23:02 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452127 in dir #78447518
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452097 in dir #78448625
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452099 in dir #78447250
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452123 in dir #78447250
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452100 in dir #78447250
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452098 in dir #78447250
May  6 20:50:39 luna ntpd[3353]: time reset +0.714664 s
May  6 20:54:53 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 21:23:54 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 21:41:08 luna ntpd[3353]: time reset +0.301325 s
May  6 21:44:57 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 22:06:55 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 22:51:30 luna kernel: printk: 9 messages suppressed.
May  6 22:51:30 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452128 in dir #78447464
May  6 22:51:32 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452124 in dir #78448586
May  6 22:51:32 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452126 in dir #78448586
May  6 22:51:32 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452125 in dir #78448586
May  6 22:51:33 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452105 in dir #78447723
May  6 22:51:34 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452127 in dir #78447518
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452097 in dir #78448625
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452099 in dir #78447250
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452123 in dir #78447250
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452100 in dir #78447250
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452098 in dir #78447250
May  6 22:51:41 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452106 in dir #78448565
May  6 23:22:01 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 23:39:09 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 23:55:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 00:29:38 luna ntpd[3353]: time reset +0.419783 s
May  7 00:33:16 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 01:07:07 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 01:59:05 luna ntpd[3353]: time reset -0.226514 s
May  7 02:03:21 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 02:03:58 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 02:05:30 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 03:14:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 03:48:55 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 04:05:43 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 04:39:53 luna ntpd[3353]: time reset +0.513939 s
May  7 04:43:15 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 04:45:22 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 05:14:25 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 06:26:24 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 06:43:31 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 06:59:31 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 07:36:27 luna avahi-daemon[3903]: Invalid query packet.
May  7 07:37:07 luna last message repeated 9 times
May  7 07:49:44 luna avahi-daemon[3903]: Invalid query packet.
May  7 07:50:42 luna last message repeated 15 times
May  7 07:50:45 luna ntpd[3353]: time reset +0.743814 s
May  7 07:54:57 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 08:12:29 luna avahi-daemon[3903]: Invalid query packet.
May  7 08:13:09 luna last message repeated 8 times
May  7 08:14:38 luna avahi-daemon[3903]: Invalid query packet.
May  7 08:15:18 luna last message repeated 7 times
May  7 08:30:06 luna gconfd (root-20075): starting (version 2.14.0), pid 20075 user 'root'
May  7 08:30:06 luna gconfd (root-20075): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
May  7 08:30:06 luna gconfd (root-20075): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
May  7 08:30:06 luna gconfd (root-20075): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
May  7 08:30:07 luna gconfd (root-20075): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
May  7 08:30:08 luna nm-system-settings: Loaded plugin ifcfg-rh: (c) 2007 - 2008 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ...
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ...
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh:     read connection 'System eth1'
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh:     read connection 'System eth0'
May  7 08:38:35 luna avahi-daemon[3903]: Invalid query packet.
May  7 08:39:06 luna last message repeated 15 times
May  7 08:39:15 luna last message repeated 6 times
May  7 08:52:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
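
In the log above, the filesystem errors on dm-4 (starting May 6 20:22:56, followed by the journal abort and the read-only remount) are easy to miss among the ntpd lines. A minimal sketch for pulling them out of a syslog-format file follows. The function names are made up for illustration, and the default path /var/log/messages is an assumption (the usual location on Red Hat systems).

```shell
#!/bin/sh
# Count EXT3-fs errors per device in a syslog-format file.
# Default path is an assumption; pass the real log file as the first argument.
ext3_error_summary() {
    log="${1:-/var/log/messages}"
    # Extract the device name from "EXT3-fs error (device dm-4): ..."
    # and print one "count device" line per device.
    sed -n 's/.*EXT3-fs error (device \([^)]*\)).*/\1/p' "$log" | sort | uniq -c
}

# Show the moments the kernel aborted the journal or remounted read-only.
journal_abort_events() {
    log="${1:-/var/log/messages}"
    grep -E 'Aborting journal|Remounting filesystem read-only' "$log"
}
```

Running `ext3_error_summary /var/log/messages` from cron and mailing any non-empty output is one simple way to notice a read-only remount before the application stops working.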
Question by:Jason Yu
16 Comments
 

Author Comment

by:Jason Yu
Here is a screenshot taken while it was rebooting. It says it found filesystem errors.
Luna-FS-error.JPG
 
LVL 5

Assisted Solution

by:atechnicnate
atechnicnate earned 112 total points
It could be a hard disk failure, something related to a folder being mounted read-only, or possibly something else. It's hard to say without more data, but I'd probably start with an fsck on it.
 
LVL 87

Assisted Solution

by:rindi
rindi earned 112 total points
Most servers have utilities you can install in your OS that report the state of the disks in the array as well as other hardware status. Also, Dell servers have an iDRAC, and HP servers often have iLO (Integrated Lights-Out) installed, and those can give you the hardware status too. Just check the manuals of your hardware for more details.
 
LVL 21

Assisted Solution

by:Mazdajai
Mazdajai earned 166 total points
What type of disk is dm-4?
 

Author Comment

by:Jason Yu
dm-4 is a local partition. It's a logical volume.
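
For readers who want to confirm which logical volume sits behind a kernel name such as dm-4, here is a rough sketch using `dmsetup`. It assumes the device-mapper tools are installed, that the command runs as root, and that the installed `dmsetup` supports column output (`-c`, `--noheadings`, `-o`). The function name `dm_name` is hypothetical.

```shell
#!/bin/sh
# Print the device-mapper name behind a kernel name such as dm-4.
# The number in "dm-4" is the device-mapper minor number.
dm_name() {
    minor="${1#dm-}"
    # Column output is "name:minor", one mapped device per line.
    dmsetup info -c --noheadings -o name,minor \
        | awk -F: -v m="$minor" '$2 == m { print $1 }'
}
```

`dm_name dm-4` prints a name of the form VolumeGroup-LogicalVolume, which can then be matched against `lvs` output or the entries in /etc/fstab. On systems where column output is not available, `ls -l /dev/mapper` shows the same minor numbers.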
 
LVL 5

Assisted Solution

by:atechnicnate
atechnicnate earned 112 total points
If your drive is SMART-capable, you could use some of the smartmontools utilities.
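
A minimal sketch of what a SMART check with `smartctl` might look like, assuming smartmontools is installed and the commands run as root. The disk name /dev/sda and the function name `smart_report` are placeholders. The optional second argument is passed to `-d`; a value such as `megaraid,0` is only an assumption for disks behind an LSI MegaRAID-based controller and needs a smartmontools release that supports it.

```shell
#!/bin/sh
# Print SMART health, attributes and the error log for one physical disk.
# $1: disk device (placeholder default /dev/sda)
# $2: optional device type for -d, e.g. "megaraid,0" (assumption, see above)
smart_report() {
    disk="${1:-/dev/sda}"
    dtype="$2"
    if [ -n "$dtype" ]; then
        set -- -d "$dtype" "$disk"
    else
        set -- "$disk"
    fi
    smartctl -H "$@"         # overall health self-assessment
    smartctl -A "$@"         # vendor attributes (reallocated sectors etc.)
    smartctl -l error "$@"   # errors the drive itself has logged
}
```

Note that dm-4 is a logical volume, so the check has to target the underlying physical disk or disks, not the dm device.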
 
LVL 21

Assisted Solution

by:Mazdajai
Mazdajai earned 166 total points
Have you run a full disk diagnostic test?
 

Author Comment

by:Jason Yu
Not yet. How do I run a disk diagnostic? I remember that last time I pressed F2 while the server was booting up. Is that the correct way?

Thanks.
 
LVL 6

Assisted Solution

by:Vijay Pratap Singh
Vijay Pratap Singh earned 55 total points
Try running e2fsck; it should repair the inode table.
 
LVL 87

Accepted Solution

by:rindi
rindi earned 112 total points
As I said, most servers include tools to diagnose your disks. They are usually part of the RAID controller and its management software. It depends entirely on your server, so you must check its manuals for details.

For normal PCs you would boot using the UBCD and then run the HD manufacturer's diagnostic utility, but the RAID controllers in servers hide the individual disks from those utilities, so you would first have to remove each disk from the RAID controller and attach it to a non-RAID controller, then run the diagnostics.

http://ultimatebootcd.com
http://pharry.org/data/ubcd523.iso
 

Author Comment

by:Jason Yu
Thanks a lot, I will create a case with Dell and run a diagnostic test on it.
 
LVL 21

Expert Comment

by:Mazdajai
It should be F12, though I'm not sure whether that has changed.
 

Author Comment

by:Jason Yu
Can I run e2fsck online, from inside the OS?

I remember that when the issue was happening, I tried to umount this partition and run fsck. However, it wouldn't let me umount the partition. If I run into this again, what should I do? Thanks.
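
When `umount` refuses, the usual cause is that some process still has files open on the filesystem. The sketch below shows one way to approach it, under stated assumptions: it runs as root, the mount point has an /etc/fstab entry (so `mount` can be called with the mount point alone), and the mount point and device names are supplied by the caller. The function name `offline_fsck` is hypothetical. e2fsck should not be used to repair a filesystem that is still mounted read-write.

```shell
#!/bin/sh
# Unmount a data filesystem, check it, and remount it if the check is clean.
# $1: mount point  (e.g. /data, a hypothetical example)
# $2: block device (e.g. /dev/mapper/VolGroup00-data, also hypothetical)
offline_fsck() {
    mnt="$1"
    dev="$2"
    # List the processes that keep the filesystem busy.
    fuser -vm "$mnt"
    # Stop here if the filesystem is still in use.
    umount "$mnt" || {
        echo "umount failed: stop the processes listed above first" >&2
        return 1
    }
    # -f forces a check even if the filesystem is marked clean.
    e2fsck -f "$dev"
    rc=$?
    # Exit status 0 = no errors, 1 = errors corrected; remount only then.
    if [ "$rc" -le 1 ]; then
        mount "$mnt"
    fi
    return "$rc"
}
```

If the busy processes cannot be stopped, or the filesystem in question is the root filesystem, checking from single-user or rescue mode, or forcing a check at the next boot, is the safer route.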
 
LVL 27

Assisted Solution

by:serialband
serialband earned 55 total points
Drop into single-user mode with init 1, and you can run fsck on the boot/kernel partition.

You can also boot into single-user mode from GRUB:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/3/html/System_Administration_Guide/s1-rescuemode-booting-single.html
 
LVL 21

Assisted Solution

by:Mazdajai
Mazdajai earned 166 total points
The quickest way is to force an fsck on the next reboot:

shutdown -rF now

 

Author Closing Comment

by:Jason Yu
Thanks to the experts here. I appreciate your valuable posts and advice.
