Solved

What happened on this server: hard disk failure or just some bad blocks?

Posted on 2013-05-14
Last Modified: 2013-05-31
My EDI server, which runs Red Hat Linux, stopped working all of a sudden. When I rebooted it, it found some errors on the file system and corrected them successfully. Although it now boots into the OS without problems, I would still like to know what precautions I can take to prevent this kind of error from happening again. Could anyone advise me on how to check the health of this server? Here is the system log file.

Thanks.



May  5 04:02:15 luna syslogd 1.4.1: restart.
May  5 05:02:20 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 05:20:13 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 05:36:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 06:27:55 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 06:44:47 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 07:18:59 luna ntpd[3353]: time reset +1.373396 s
May  5 07:22:54 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 07:23:57 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 07:31:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 08:04:53 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 08:38:33 luna ntpd[3353]: time reset +0.352645 s
May  5 08:42:28 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 09:04:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 10:04:12 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 10:21:09 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 10:38:12 luna ntpd[3353]: time reset +0.468421 s
May  5 10:42:19 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 10:43:22 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 10:51:54 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 11:45:57 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 12:03:51 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 12:20:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 13:11:26 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 13:45:28 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 14:20:18 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 14:36:41 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 15:28:13 luna ntpd[3353]: time reset +0.262163 s
May  5 15:32:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 15:33:35 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 15:48:16 luna ntpd[3353]: time reset +0.130489 s
May  5 15:51:58 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 15:53:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 16:54:08 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 19:11:32 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 19:28:41 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 19:44:44 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 20:18:55 luna ntpd[3353]: time reset +1.716473 s
May  5 20:22:23 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 20:24:31 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 20:40:46 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 22:38:32 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 23:12:42 luna ntpd[3353]: time reset +1.191759 s
May  5 23:16:26 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  5 23:17:31 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  5 23:28:09 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 00:31:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 00:48:54 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 01:22:34 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 01:39:38 luna ntpd[3353]: time reset +1.063791 s
May  6 01:43:33 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 01:44:03 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 01:59:43 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 03:35:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 04:09:44 luna ntpd[3353]: time reset +0.970253 s
May  6 04:13:10 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 04:33:22 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 05:30:08 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 05:47:34 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 06:04:19 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 06:38:30 luna ntpd[3353]: time reset +0.652795 s
May  6 06:42:01 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 06:43:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 07:10:59 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 07:29:37 luna avahi-daemon[3903]: Invalid query packet.
May  6 07:30:17 luna last message repeated 7 times
May  6 07:49:39 luna avahi-daemon[3903]: Invalid query packet.
May  6 08:09:19 luna last message repeated 14 times
May  6 08:32:16 luna last message repeated 9 times
May  6 08:36:47 luna last message repeated 24 times
May  6 08:37:27 luna last message repeated 9 times
May  6 09:21:01 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 09:42:02 luna avahi-daemon[3903]: Invalid query packet.
May  6 09:42:42 luna last message repeated 7 times
May  6 09:55:10 luna ntpd[3353]: time reset +1.077468 s
May  6 09:58:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 10:00:37 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 10:14:37 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 12:42:41 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 12:54:05 luna avahi-daemon[3903]: Invalid query packet.
May  6 12:54:09 luna last message repeated 6 times
May  6 12:59:57 luna ntpd[3353]: time reset +1.406363 s
May  6 13:04:13 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 13:05:17 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 13:12:50 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 13:23:44 luna avahi-daemon[3903]: Invalid query packet.
May  6 13:24:25 luna last message repeated 7 times
May  6 14:13:42 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 14:48:26 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 15:04:56 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 15:39:10 luna ntpd[3353]: time reset +1.119836 s
May  6 15:42:40 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 15:44:47 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 15:54:29 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 16:38:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 16:55:38 luna ntpd[3353]: time reset +0.429475 s
May  6 16:59:59 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 17:01:05 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 17:12:59 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 17:33:50 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 18:16:32 luna ntpd[3353]: time reset +0.655949 s
May  6 18:20:43 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 18:21:16 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 18:22:52 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 18:43:23 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 19:42:42 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 20:16:30 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 20:22:56 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452128 in dir #78447464
May  6 20:22:56 luna kernel: Aborting journal on device dm-4.
May  6 20:22:56 luna kernel: ext3_abort called.
May  6 20:22:56 luna kernel: EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
May  6 20:22:56 luna kernel: Remounting filesystem read-only
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452124 in dir #78448586
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452126 in dir #78448586
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452125 in dir #78448586
May  6 20:22:59 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452105 in dir #78447723
May  6 20:23:02 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452127 in dir #78447518
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452097 in dir #78448625
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452099 in dir #78447250
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452123 in dir #78447250
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452100 in dir #78447250
May  6 20:23:03 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452098 in dir #78447250
May  6 20:50:39 luna ntpd[3353]: time reset +0.714664 s
May  6 20:54:53 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 21:23:54 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 21:41:08 luna ntpd[3353]: time reset +0.301325 s
May  6 21:44:57 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 22:06:55 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 22:51:30 luna kernel: printk: 9 messages suppressed.
May  6 22:51:30 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452128 in dir #78447464
May  6 22:51:32 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452124 in dir #78448586
May  6 22:51:32 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452126 in dir #78448586
May  6 22:51:32 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452125 in dir #78448586
May  6 22:51:33 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452105 in dir #78447723
May  6 22:51:34 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452127 in dir #78447518
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452097 in dir #78448625
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452099 in dir #78447250
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452123 in dir #78447250
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452100 in dir #78447250
May  6 22:51:40 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452098 in dir #78447250
May  6 22:51:41 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452106 in dir #78448565
May  6 23:22:01 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  6 23:39:09 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  6 23:55:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 00:29:38 luna ntpd[3353]: time reset +0.419783 s
May  7 00:33:16 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 01:07:07 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 01:59:05 luna ntpd[3353]: time reset -0.226514 s
May  7 02:03:21 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 02:03:58 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 02:05:30 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 03:14:29 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 03:48:55 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 04:05:43 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 04:39:53 luna ntpd[3353]: time reset +0.513939 s
May  7 04:43:15 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 04:45:22 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 05:14:25 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 06:26:24 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 06:43:31 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 06:59:31 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
May  7 07:36:27 luna avahi-daemon[3903]: Invalid query packet.
May  7 07:37:07 luna last message repeated 9 times
May  7 07:49:44 luna avahi-daemon[3903]: Invalid query packet.
May  7 07:50:42 luna last message repeated 15 times
May  7 07:50:45 luna ntpd[3353]: time reset +0.743814 s
May  7 07:54:57 luna ntpd[3353]: synchronized to LOCAL(0), stratum 10
May  7 08:12:29 luna avahi-daemon[3903]: Invalid query packet.
May  7 08:13:09 luna last message repeated 8 times
May  7 08:14:38 luna avahi-daemon[3903]: Invalid query packet.
May  7 08:15:18 luna last message repeated 7 times
May  7 08:30:06 luna gconfd (root-20075): starting (version 2.14.0), pid 20075 user 'root'
May  7 08:30:06 luna gconfd (root-20075): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
May  7 08:30:06 luna gconfd (root-20075): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
May  7 08:30:06 luna gconfd (root-20075): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
May  7 08:30:07 luna gconfd (root-20075): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
May  7 08:30:08 luna nm-system-settings: Loaded plugin ifcfg-rh: (c) 2007 - 2008 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ...
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ...
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh:     read connection 'System eth1'
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
May  7 08:30:08 luna nm-system-settings:    ifcfg-rh:     read connection 'System eth0'
May  7 08:38:35 luna avahi-daemon[3903]: Invalid query packet.
May  7 08:39:06 luna last message repeated 15 times
May  7 08:39:15 luna last message repeated 6 times
May  7 08:52:04 luna ntpd[3353]: synchronized to 10.10.4.10, stratum 3
Question by:Jason Yu
16 Comments
 

Author Comment

by:Jason Yu
ID: 39166232
Here is a screenshot taken while the server was rebooting. It says that it found FS errors.
Luna-FS-error.JPG
 
LVL 5

Assisted Solution

by:atechnicnate
atechnicnate earned 448 total points
ID: 39166246
It could be a hard disk failure, a folder mounted read-only, or possibly something else. It's hard to say without more data, but I'd probably start by running fsck on it.
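As a first pass over the data already posted, the kernel lines in the log can be summarized per device. This is only a minimal sketch in Python: the function name `summarize_ext3_errors` and the result keys are made up for illustration, and the regular expressions assume the exact message format shown in the log above.

```python
import re
from collections import Counter

# Kernel lines of the form seen in the posted log, e.g.
#   ... kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452128 in dir #78447464
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\): (?P<func>\w+): (?P<msg>.*)"
)
UNLINKED = re.compile(r"unlinked inode (?P<inode>\d+) in dir #(?P<dir>\d+)")


def summarize_ext3_errors(lines):
    """Count EXT3 errors per device and collect the inodes/directories involved."""
    errors_per_device = Counter()
    inodes, dirs = set(), set()
    journal_aborted = False
    remounted_read_only = False
    for line in lines:
        if "Aborting journal on device" in line:
            journal_aborted = True
        if "Remounting filesystem read-only" in line:
            remounted_read_only = True
        match = EXT3_ERROR.search(line)
        if match is None:
            continue
        errors_per_device[match.group("dev")] += 1
        unlinked = UNLINKED.search(match.group("msg"))
        if unlinked:
            inodes.add(int(unlinked.group("inode")))
            dirs.add(int(unlinked.group("dir")))
    return {
        "errors_per_device": dict(errors_per_device),
        "unique_inodes": sorted(inodes),
        "unique_dirs": sorted(dirs),
        "journal_aborted": journal_aborted,
        "remounted_read_only": remounted_read_only,
    }


# A few lines copied from the log in the question.
SAMPLE_LOG = """\
May  6 20:22:56 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452128 in dir #78447464
May  6 20:22:56 luna kernel: Aborting journal on device dm-4.
May  6 20:22:56 luna kernel: EXT3-fs error (device dm-4): ext3_journal_start_sb: Detected aborted journal
May  6 20:22:56 luna kernel: Remounting filesystem read-only
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452124 in dir #78448586
May  6 20:22:58 luna kernel: EXT3-fs error (device dm-4): ext3_lookup: unlinked inode 78452126 in dir #78448586
""".splitlines()

summary = summarize_ext3_errors(SAMPLE_LOG)
print(summary)
```

To run it over the whole log, pass an open file instead of the sample, for example `summarize_ext3_errors(open("/var/log/messages"))`, assuming that is where syslog writes on this server. In the posted log, the 22:51 burst repeats the inode numbers from the 20:22 burst and adds one new one, which the unique counts make easy to see.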
 
LVL 88

Assisted Solution

by:rindi
rindi earned 448 total points
ID: 39166275
Most servers have utilities you can install in your OS that report the state of the disks in the array as well as other hardware status. In addition, Dell servers have an iDRAC, and HP servers often have a Lights-Out (iLO) board installed, and those can also give you the hardware status. Check your hardware's manuals for more details.
 
LVL 21

Assisted Solution

by:Mazdajai
Mazdajai earned 664 total points
ID: 39166858
What type of disk is dm-4?
 

Author Comment

by:Jason Yu
ID: 39166866
dm-4 is a local partition. It's a logical volume.
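Since `dm-4` is the kernel's name for a device-mapper device (the number is the device's minor number), it can be matched to a logical volume name from the output of `dmsetup ls`. The sketch below parses that output in Python. The helper names and the sample volume names are hypothetical; the parser accepts both the `(253, 4)` and `(253:4)` styles that different dmsetup versions print.

```python
import re

# "dmsetup ls" prints one mapping per line: the mapper name followed by
# "(major, minor)" or "(major:minor)", depending on the dmsetup version.
DM_LINE = re.compile(r"^(?P<name>\S+)\s+\((?P<major>\d+)[,:]\s*(?P<minor>\d+)\)$")


def dm_names_by_minor(dmsetup_ls_output):
    """Map device-mapper minor numbers to mapper names."""
    mapping = {}
    for line in dmsetup_ls_output.splitlines():
        match = DM_LINE.match(line.strip())
        if match:
            mapping[int(match.group("minor"))] = match.group("name")
    return mapping


def name_for_kernel_device(kernel_name, dmsetup_ls_output):
    """Translate a kernel name such as 'dm-4' to its mapper name, or None."""
    prefix, _, number = kernel_name.partition("-")
    if prefix != "dm" or not number.isdigit():
        return None
    return dm_names_by_minor(dmsetup_ls_output).get(int(number))


# Hypothetical output; the real volume names on this server are not known.
SAMPLE_DMSETUP_LS = (
    "VolGroup00-LogVol00\t(253, 0)\n"
    "VolGroup00-LogVol01\t(253, 1)\n"
    "VolGroup01-edi\t(253, 4)\n"
)

print(name_for_kernel_device("dm-4", SAMPLE_DMSETUP_LS))
```

The resulting name is the entry to look for under `/dev/mapper/` when deciding which volume to check.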
 
LVL 5

Assisted Solution

by:atechnicnate
atechnicnate earned 448 total points
ID: 39166871
If your drive is SMART-capable, you could use some of the smartmontools utilities.
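For a scripted check, `smartctl -H <device>` from smartmontools prints an overall health verdict that is easy to parse. The following is a rough Python sketch, not a monitoring solution: the function names are invented for illustration, and it assumes `smartctl` is installed and can see the physical disk. Behind a hardware RAID controller, smartctl typically needs a controller-specific `-d` device type, or the vendor's own tools.

```python
import re
import shutil
import subprocess

# Verdict lines printed by "smartctl -H" for ATA and SCSI/SAS disks respectively.
ATA_RESULT = re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)")
SCSI_RESULT = re.compile(r"SMART Health Status:\s*(\S+)")


def parse_smart_health(output):
    """Return True if healthy, False if failing, None if no verdict was found."""
    match = ATA_RESULT.search(output)
    if match:
        return match.group(1) == "PASSED"
    match = SCSI_RESULT.search(output)
    if match:
        return match.group(1) == "OK"
    return None


def check_disk(device):
    """Run "smartctl -H" on a device; returns None if smartctl is not installed."""
    if shutil.which("smartctl") is None:
        return None
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_smart_health(result.stdout)
```

Calling `check_disk("/dev/sda")` as root would then return `True`, `False`, or `None`; the device path here is only an example.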
 
LVL 21

Assisted Solution

by:Mazdajai
Mazdajai earned 664 total points
ID: 39166878
Have you run a full disk diagnostic test?
 

Author Comment

by:Jason Yu
ID: 39167056
Not yet. How do I run a disk diagnostic? I remember that last time I pressed F2 when the server booted up. Is this the correct way?

Thanks.
 
LVL 6

Assisted Solution

by:Vijay Pratap Singh
Vijay Pratap Singh earned 220 total points
ID: 39167083
Try running e2fsck; this will fix the inode table.
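When e2fsck is run from a script, its exit status is a bit mask, so one run can report several conditions at once. Here is a small Python sketch that decodes it using the values documented in the e2fsck man page; the function name `decode_e2fsck_status` is just for illustration.

```python
# Exit status bits as documented in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(code):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items()) if code & bit]


for status in (0, 1, 4, 12):
    print(status, decode_e2fsck_status(status))
```

A read-only pass with `e2fsck -n` on the unmounted volume answers "no" to every question, so it can be used to see what would be reported before letting e2fsck change anything.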
 
LVL 88

Accepted Solution

by:rindi
rindi earned 448 total points
ID: 39167187
As I said, most servers include tools to diagnose your disks. They are usually part of the RAID controller and its management software. It depends entirely on your server, so you must check its manuals for details.

For normal PCs you would boot from the UBCD and then run the HD manufacturer's diagnostic utility, but the RAID controllers of servers mask those disks from the utilities, so you would first have to remove each disk from the RAID controller and attach it to a non-RAID controller, then run the diagnostics.

http://ultimatebootcd.com
http://pharry.org/data/ubcd523.iso
 

Author Comment

by:Jason Yu
ID: 39169483
Thanks a lot, I will create a case with Dell and run a diagnostic test on it.
 
LVL 21

Expert Comment

by:Mazdajai
ID: 39169498
It should be F12, though I'm not sure whether that has changed.
 

Author Comment

by:Jason Yu
ID: 39169564
Can I run e2fsck online, from inside the OS?

I remember that when the issue was happening, I tried to umount this partition and run fsck. However, it wouldn't let me umount the partition. If I run into this again, what should I do? Thanks.
 
LVL 30

Assisted Solution

by:serialband
serialband earned 220 total points
ID: 39169611
Boot into single-user mode, either with init 1 or from GRUB, and you can run fsck on the boot/kernel partition:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/3/html/System_Administration_Guide/s1-rescuemode-booting-single.html
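Before going to single-user mode, it can help to confirm that the kernel really has remounted the volume read-only, as the "Remounting filesystem read-only" line in the log suggests. Here is a minimal Python sketch that scans the contents of `/proc/mounts`; the function name and the sample mount table are hypothetical.

```python
def read_only_mounts(proc_mounts_text, fstypes=("ext2", "ext3", "ext4")):
    """Return (device, mount point) pairs of ext file systems mounted read-only."""
    hits = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        device, mount_point, fstype, options = fields[:4]
        if fstype in fstypes and "ro" in options.split(","):
            hits.append((device, mount_point))
    return hits


# Hypothetical mount table; device and mount point names are made up.
SAMPLE_PROC_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/mapper/VolGroup00-LogVol00 / ext3 rw,data=ordered 0 0
/dev/mapper/VolGroup01-edi /edi ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""

print(read_only_mounts(SAMPLE_PROC_MOUNTS))
```

On the live system the input would be `open("/proc/mounts").read()`. Run from cron, a check like this could raise an alert as soon as a volume flips to read-only, rather than waiting for the application to stop working.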
 
LVL 21

Assisted Solution

by:Mazdajai
Mazdajai earned 664 total points
ID: 39169903
The quickest way is to force an fsck at the next reboot:

shutdown -rF now

 

Author Closing Comment

by:Jason Yu
ID: 39212077
Thanks experts here, I appreciate your valuable posts and advice.