Proactive monitoring of ext3 file system

Posted on 2007-11-24
Last Modified: 2008-02-01
I am admnistering an RHEL4 based linux cluster having 100 nodes and about a dozen NFS filservers (also RHEL4 boxes) with a total capacity of a few tens of TB. As usual, a few blocks keep going bad every now and then. What do I need to proactively monitor the storage system so that I could try recovery before it is too late and advise the right user, identifying files that might have been affected?
Question by:vinod
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
LVL 43

Expert Comment

ID: 20343738
> so that I could try recovery before it is too
What You mean - predicting the failure? Some "predictions" can be gathered from
smartctl -A /dev/sda

Author Comment

ID: 20345022
Our experience with SMART monitoring has not been very good. I noticed significant increase in file system corruptions while SMART service was enabled and we had to turn it off about a year ago. I could never understand how only a "monitoring" service could cause corruption. Even if it works fine, I believe SMART can give you only statistics of error counters which does not help in identifying the affected file(s).

My concern is slightly different. Disk blocks may go bad randomly. How will I find it unless user uses the file or I run fsck, both of which may not happen for a long time? Even at that time, the block may or may not be in correctable state. Is there something that I can do frequently, say every night, to check the file system consistency without umounting it and identifying files, not just bad blocks?
LVL 43

Accepted Solution

ravenpl earned 125 total points
ID: 20345089
You can run badblocks(RO mode) to get the list of bads - You already know.
No, there is no way(really) to match block to file (no such data in filesystem).
So what You can, is to get blocklist for given file.
So basically You can do it for each file/dir and check if one block is within bads set.
Unfortunatelly there is no tool for printing block list for given file.
filefrag -v # gives some information
is really for You, just modify the printed result
< $result .= $_ ? "*" : ".";
> $result .= "$_ ";
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

LVL 43

Expert Comment

ID: 20345671

Author Comment

ID: 20363287
fileblocks does not compile on my RHEL4 machine:

# gcc -o fileblocks fileblocks.c
/tmp/ccXzML8K.o(.text+0x121): In function `printOneFile':
: undefined reference to `_IO'
collect2: ld returned 1 exit status
LVL 43

Expert Comment

ID: 20366184
my bad, download again.

Author Comment

ID: 20496578
I think I will give my points to ravenpl for sending me useful utility although his solution won't be very practical for me. I was hoping that if a file gets corrupted due to a sector going bad randomly then OS should be able to catch it and I should get an alter rather than me skimming tens of tera byte myself. It seems the only way to do it will be to enable SMART which had its own problem when I tried it more than a year ago.
LVL 43

Expert Comment

ID: 20498148
> sector going bad randomly then OS should be able to catch it
But when OS have the chance spotting it? Only if the file is read. Reading block is not enough, cause there's no map from block to file. There's only the reverse map. At least for ext3.
Maybe one could investigate other filesystems internals (jfs, xfs, reiser*, zfs, etc.) - maybe they'r smarter than that.

Expert Comment

ID: 20545852
Forced accept.

EE Admin

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Storage devices are generally used to save the data or sometime transfer the data from one computer system to another system. However, sometimes user accidentally erased their important data from the Storage devices. Users have to know how data reco…
The article will include the best Data Recovery Tools along with their Features, Capabilities, and their Download Links. Hope you’ll enjoy it and will choose the one as required by you.
This video Micro Tutorial explains how to clone a hard drive using a commercial software product for Windows systems called Casper from Future Systems Solutions (FSS). Cloning makes an exact, complete copy of one hard disk drive (HDD) onto another d…
This video teaches viewers how to encrypt an external drive that requires a password to read and edit the drive. All tasks are done in Disk Utility. Plug in the external drive you wish to encrypt: Make sure all previous data on the drive has been …
Suggested Courses

627 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question