Solved

Proactive monitoring of ext3 file system

Posted on 2007-11-24
Last Modified: 2008-02-01
I am administering an RHEL4-based Linux cluster with 100 nodes and about a dozen NFS file servers (also RHEL4 boxes) with a total capacity of a few tens of TB. As usual, a few blocks go bad every now and then. What do I need to do to proactively monitor the storage system, so that I can attempt recovery before it is too late and notify the right user by identifying the files that might have been affected?
Question by:vinod
 
LVL 43

Expert Comment

by:ravenpl
> so that I could try recovery before it is too
What do you mean: predicting the failure? Some "predictions" can be gathered from S.M.A.R.T. data (http://en.wikipedia.org/wiki/S.M.A.R.T.):
smartctl -A /dev/sda
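To make the smartctl suggestion concrete, here is a minimal sketch that pulls the sector-related counters out of the `smartctl -A` attribute table. It assumes the usual ten-column layout with RAW_VALUE last. The function names, the watched attribute list, and the sample text are made up for illustration and are not taken from any real drive.

```python
# Attribute names that commonly point at failing sectors. This list is an
# assumption; drive vendors do not all report the same attributes.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# Made-up sample in the shape of `smartctl -A` output (illustration only).
SAMPLE_OUTPUT = """
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for each row of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs

def suspicious(attrs):
    """Keep only watched attributes whose raw counter is above zero."""
    return {name: raw for name, raw in attrs.items() if name in WATCHED and raw > 0}
```

In practice the text would come from running `smartctl -A` on each disk. A non-empty result from `suspicious()` only says the drive is remapping or failing sectors, not which files are affected.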
 

Author Comment

by:vinod
Our experience with SMART monitoring has not been very good. I noticed a significant increase in file system corruption while the SMART service was enabled, and we had to turn it off about a year ago. I never understood how a mere "monitoring" service could cause corruption. Even when it works, I believe SMART only gives you error-counter statistics, which do not help in identifying the affected file(s).

My concern is slightly different. Disk blocks may go bad at random. How will I find out unless a user reads the file or I run fsck, neither of which may happen for a long time? Even then, the block may or may not be in a correctable state. Is there something I can run frequently, say every night, that checks file system consistency without unmounting it and identifies the affected files, not just the bad blocks?
 
LVL 43

Accepted Solution

by:
ravenpl earned 125 total points
You can run badblocks (in read-only mode) to get the list of bad blocks, as you already know.
No, there is really no way to map a block directly to a file; ext3 stores no such data.
What you can do is get the block list for a given file.
So basically you can do that for each file/dir and check whether any of its blocks is in the bad set.
Unfortunately, as far as I know, there is no dedicated tool that simply prints the block list for a given file.
filefrag -v # gives some information
http://lists.netisland.net/archives/phlpm/phlpm-2002/msg00354.html
is really what you need; just modify how the result is printed:
< $result .= $_ ? "*" : ".";
> $result .= "$_ ";
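The "check each file's blocks against the bad set" step could be sketched roughly as follows. It assumes the bad-block list is in badblocks output format (one block number per line) and that the per-file block ranges were collected separately, for example from the modified script above. Both must use the same block size and be relative to the same device. badblocks counts in 1024-byte blocks unless told otherwise with -b, so that option would need to match the filesystem block size. The function names and the extent layout are hypothetical.

```python
import bisect

def load_bad_blocks(lines):
    """Parse badblocks-style output: one decimal block number per line."""
    stripped = (line.strip() for line in lines)
    return sorted({int(s) for s in stripped if s})

def files_hit(bad_blocks, file_extents):
    """Return {path: [bad block numbers]} for files that contain a bad block.

    bad_blocks   -- sorted list of block numbers (see load_bad_blocks)
    file_extents -- {path: [(first_block, last_block), ...]}, inclusive ranges
    """
    hits = {}
    for path, extents in file_extents.items():
        found = []
        for first, last in extents:
            # Binary search for the slice of bad blocks inside this extent.
            lo = bisect.bisect_left(bad_blocks, first)
            hi = bisect.bisect_right(bad_blocks, last)
            found.extend(bad_blocks[lo:hi])
        if found:
            hits[path] = found
    return hits
```

For only a handful of bad blocks, it may be simpler to go the other way with debugfs from e2fsprogs: its icheck command looks up which inode owns a block and ncheck turns an inode number into path names. Both work by scanning filesystem metadata rather than through a stored reverse index.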
 
LVL 43

Expert Comment

by:ravenpl

 

Author Comment

by:vinod
fileblocks does not compile on my RHEL4 machine:

# gcc -o fileblocks fileblocks.c
/tmp/ccXzML8K.o(.text+0x121): In function `printOneFile':
: undefined reference to `_IO'
collect2: ld returned 1 exit status
 
LVL 43

Expert Comment

by:ravenpl
My bad; download it again.
 

Author Comment

by:vinod
I think I will give my points to ravenpl for sending me a useful utility, although his solution won't be very practical for me. I was hoping that if a file gets corrupted because a sector goes bad at random, the OS would catch it and I would get an alert, rather than my having to scan tens of terabytes myself. It seems the only way to do that is to enable SMART, which had its own problems when I tried it more than a year ago.
 
LVL 43

Expert Comment

by:ravenpl
> sector going bad randomly then OS should be able to catch it
But when does the OS have a chance to spot it? Only when the file is read. Reading the block alone is not enough, because there is no map from block to file; there is only the reverse map (file to blocks). At least for ext3.
Maybe one could investigate the internals of other filesystems (jfs, xfs, reiser*, zfs, etc.); maybe they're smarter than that.
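Since the OS only notices a bad sector when the file is actually read, one possible nightly job is to read every file and record the ones that fail. The sketch below is an illustration under assumptions, not a tested production tool: the function name and the `opener` parameter are made up, it would have to run on the file server itself rather than over NFS, and files still in the page cache may be served from memory without touching the disk at all.

```python
import os

def scan_tree(root, opener=open, chunk_size=1 << 20):
    """Read every regular file under root.

    Returns (files_read, failures), where failures is a list of
    (path, errno) pairs for files whose open or read raised OSError.
    """
    files_read = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with opener(path, "rb") as fh:
                    # Read to end of file in chunks; the data is discarded.
                    while fh.read(chunk_size):
                        pass
                files_read += 1
            except OSError as exc:
                # A media error normally surfaces here as EIO.
                failures.append((path, exc.errno))
    return files_read, failures
```

Unlike badblocks, this reports file names directly, so the owner of each failing path can be notified. The cost is a full read of the data every run, and it says nothing about bad blocks in free space or filesystem metadata.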
 
LVL 1

Expert Comment

by:Computer101
Forced accept.

Computer101
EE Admin