Solved

Proactive monitoring of ext3 file system

Posted on 2007-11-24
Last Modified: 2008-02-01
I am administering an RHEL4-based Linux cluster with 100 nodes and about a dozen NFS file servers (also RHEL4 boxes) with a total capacity of a few tens of TB. As usual, a few blocks go bad every now and then. What do I need to do to proactively monitor the storage system, so that I can attempt recovery before it is too late and advise the right user by identifying the files that might have been affected?
Question by:vinod
 
LVL 43

Expert Comment

by:ravenpl
ID: 20343738
> so that I could try recovery before it is too
Do you mean predicting the failure? Some "predictions" can be gathered from S.M.A.R.T. data (http://en.wikipedia.org/wiki/S.M.A.R.T.):
smartctl -A /dev/sda
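
A minimal sketch of reading those counters from a script, assuming Python 3.7+ and smartmontools are installed. The helper names are hypothetical, and the attributes in WATCHED are only a choice made for this sketch (attribute names vary by drive vendor).

```python
import subprocess

# Attributes often watched as early signs of failing sectors.
# This selection is an assumption of the sketch, not a standard.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from the table printed by `smartctl -A`."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                values[fields[1]] = int(raw)
    return values


def nonzero_watched(values, watched=WATCHED):
    """Return the watched attributes whose raw value is above zero."""
    return {name: values[name] for name in watched if values.get(name, 0) > 0}


def check_device(device):
    """Run smartctl on one device (needs root) and report nonzero watched counters."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    return nonzero_watched(parse_smart_attributes(out))
```

Calling check_device("/dev/sda") from a nightly cron job and mailing any non-empty result would give a per-disk warning, though not a list of affected files.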
 

Author Comment

by:vinod
ID: 20345022
Our experience with SMART monitoring has not been very good. I noticed a significant increase in file system corruption while the SMART service was enabled, and we had to turn it off about a year ago. I could never understand how a service that only "monitors" could cause corruption. Even if it works fine, I believe SMART can give you only error-counter statistics, which do not help in identifying the affected file(s).

My concern is slightly different. Disk blocks may go bad at random. How will I find out unless a user reads the file or I run fsck, neither of which may happen for a long time? Even then, the block may or may not be in a correctable state. Is there something I can run frequently, say every night, to check file system consistency without unmounting it, and to identify the affected files, not just the bad blocks?
 
LVL 43

Accepted Solution

by:
ravenpl earned 125 total points
ID: 20345089
You can run badblocks in read-only mode to get the list of bad blocks, as you already know.
There is no direct way to map a block back to a file, because ext3 keeps no block-to-file index.
What you can do is get the block list for a given file.
So you can do that for each file/directory and check whether any of its blocks is in the bad set.
Unfortunately, I know of no ready-made tool that prints just the block list for a given file.
filefrag -v # gives some information
http://lists.netisland.net/archives/phlpm/phlpm-2002/msg00354.html
is what you need; just modify how the result is printed:
< $result .= $_ ? "*" : ".";
> $result .= "$_ ";
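
The matching step itself can be sketched as follows. This is only an illustration in Python with hypothetical helper names. It assumes the per-file block ranges have already been collected (for example from the tool above) as (first_block, block_count) pairs, and that the bad-block numbers use the same block size and refer to the same device. badblocks defaults to 1024-byte blocks, so running it with -b set to the filesystem block size keeps the units consistent.

```python
import bisect


def read_badblocks_list(lines):
    """Parse a badblocks output list: one block number per line."""
    return [int(line) for line in lines if line.strip()]


def blocks_hit(extents, bad_blocks):
    """Return the sorted bad block numbers that fall inside any extent.

    extents: iterable of (first_block, block_count) pairs for one file.
    bad_blocks: iterable of block numbers in the same units as the extents.
    """
    bad = sorted(set(bad_blocks))
    hits = set()
    for first, count in extents:
        # Binary search for the slice of bad blocks in [first, first + count).
        lo = bisect.bisect_left(bad, first)
        hi = bisect.bisect_left(bad, first + count)
        hits.update(bad[lo:hi])
    return sorted(hits)


def affected_files(file_extents, bad_blocks):
    """Map each file that owns at least one bad block to the bad blocks it owns."""
    result = {}
    for path, extents in file_extents.items():
        hits = blocks_hit(extents, bad_blocks)
        if hits:
            result[path] = hits
    return result
```

Collecting the extents for every file on a multi-terabyte volume is the expensive part; the comparison itself is cheap once both lists exist.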

 
LVL 43

Expert Comment

by:ravenpl
ID: 20345671
 

Author Comment

by:vinod
ID: 20363287
fileblocks does not compile on my RHEL4 machine:

# gcc -o fileblocks fileblocks.c
/tmp/ccXzML8K.o(.text+0x121): In function `printOneFile':
: undefined reference to `_IO'
collect2: ld returned 1 exit status
 
LVL 43

Expert Comment

by:ravenpl
ID: 20366184
My bad; please download it again.
 

Author Comment

by:vinod
ID: 20496578
I think I will give my points to ravenpl for sending me a useful utility, although his solution won't be very practical for me. I was hoping that if a file gets corrupted because a sector goes bad at random, the OS would catch it and I would get an alert, rather than having to scan tens of terabytes myself. It seems the only way to do that is to enable SMART, which had its own problems when I tried it more than a year ago.
 
LVL 43

Expert Comment

by:ravenpl
ID: 20498148
> sector going bad randomly then OS should be able to catch it
But when does the OS get a chance to spot it? Only when the file is read. Reading the raw block is not enough, because there is no map from block to file, only the reverse map. At least on ext3.
Maybe one could investigate the internals of other filesystems (jfs, xfs, reiser*, zfs, etc.); maybe they're smarter than that.
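
Since the OS only notices a bad sector when the file is read, one way to force that is to read every file on a schedule and record the ones that fail. A minimal sketch in Python follows; the function names are hypothetical, and it assumes the job may read the whole tree and that the extra I/O load is acceptable. Data already in the page cache may be served without touching the disk, so a clean pass is not a guarantee.

```python
import os


def read_fully(path, chunk_size=1 << 20):
    """Read a file to the end and discard the data; raises OSError if a read fails."""
    with open(path, "rb") as handle:
        while handle.read(chunk_size):
            pass


def scan_tree(root, reader=read_fully):
    """Read every regular file under root; return [(path, error_text)] for failures."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                reader(path)
            except OSError as exc:
                # An unreadable sector typically surfaces here as an I/O error.
                failures.append((path, str(exc)))
    return failures
```

The returned paths name the affected files directly, which sidesteps the missing block-to-file map at the cost of reading everything.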
 
LVL 1

Expert Comment

by:Computer101
ID: 20545852
Forced accept.

Computer101
EE Admin