Proactive monitoring of ext3 file system

Posted on 2007-11-24
Last Modified: 2008-02-01
I am administering an RHEL4-based Linux cluster with 100 nodes and about a dozen NFS fileservers (also RHEL4 boxes) with a total capacity of a few tens of TB. As usual, a few blocks go bad every now and then. What do I need to do to proactively monitor the storage system, so that I can attempt recovery before it is too late and notify the right user by identifying the files that might have been affected?
Question by: vinod

Expert Comment

ID: 20343738
> so that I could try recovery before it is too
Do you mean predicting the failure? Some "predictions" can be gathered from:
smartctl -A /dev/sda
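A minimal sketch (Python, not part of the original thread) of turning that output into something a nightly job could alert on. It assumes the usual ATA attribute table printed by `smartctl -A` (ten columns, RAW_VALUE last) and watches attributes 5, 197 and 198, which count reallocated, pending and offline-uncorrectable sectors; the function name and the choice of attributes are illustrative.

```python
# Attribute IDs that most directly track sectors going bad (illustrative choice).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def bad_sector_counters(smartctl_output: str) -> dict[int, int]:
    """Return {attribute_id: raw_value} for watched attributes whose raw value is nonzero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header lines, banners, blank lines
        attr_id = int(fields[0])
        if attr_id not in WATCHED:
            continue
        raw = fields[9]  # first token of RAW_VALUE; vendors may append "(...)" after it
        if raw.isdigit() and int(raw) > 0:
            found[attr_id] = int(raw)
    return found
```

You would feed it the text from `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout` for each disk and alert when the result is non-empty or grows between runs. These are still only counters: they say a disk is degrading, not which files are affected.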

Author Comment

ID: 20345022
Our experience with SMART monitoring has not been very good. I noticed a significant increase in file system corruption while the SMART service was enabled, and we had to turn it off about a year ago. I could never understand how a mere "monitoring" service could cause corruption. Even if it works fine, I believe SMART can only give you statistics from error counters, which do not help in identifying the affected file(s).

My concern is slightly different. Disk blocks may go bad randomly. How will I find out unless a user accesses the file or I run fsck, both of which may not happen for a long time? By then, the block may or may not be in a correctable state. Is there something I can do frequently, say every night, to check file system consistency without unmounting it, and that identifies files, not just bad blocks?

Accepted Solution

ravenpl earned 125 total points
ID: 20345089
You can run badblocks (in its default read-only mode) to get the list of bad blocks, as you already know.
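A small sketch of reading that list back in. `badblocks` numbers blocks in units of its `-b` size (1024 bytes by default), counted from the start of the device it scanned, so the numbers only line up with ext3 block numbers when that device is the one holding the filesystem and the sizes match. The helper below (a hypothetical name) rescales them; running e.g. `badblocks -b 4096 -o bad.txt /dev/sdb1` with the filesystem's block size (see `dumpe2fs -h`) avoids the conversion entirely.

```python
def read_bad_blocks(text: str, bb_block_size: int = 1024, fs_block_size: int = 4096) -> set[int]:
    """Convert badblocks output (one block number per line) into filesystem block numbers.

    Assumes badblocks scanned the same device (partition/LV) that holds the filesystem,
    so both numberings start at byte offset 0.
    """
    bad = set()
    for line in text.splitlines():
        line = line.strip()
        if not line.isdigit():
            continue  # skip blank lines and anything that is not a block number
        byte_offset = int(line) * bb_block_size
        bad.add(byte_offset // fs_block_size)  # several small blocks may share one fs block
    return bad
```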
No, there is really no way to map a block to a file (the filesystem stores no such data).
What you can do is get the block list for a given file.
So basically you can do that for each file/directory and check whether any of its blocks is in the set of bad blocks.
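The check described above, sketched in Python. The function that returns a path's block list is passed in as a parameter (`block_list`, a hypothetical name), because obtaining that list is the hard part; the walk stays on the starting filesystem, since block numbers are only meaningful per device, and checks only directories and regular files.

```python
import os
from typing import Callable, Dict, Iterable, Iterator, List, Set


def _candidate_paths(root: str) -> Iterator[str]:
    """Yield root and every directory/regular file below it on the same filesystem."""
    root_dev = os.lstat(root).st_dev
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories on other devices: block numbers are per-filesystem.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only directories and regular files are checked.
            if os.path.islink(path):
                continue
            if os.path.isdir(path) or os.path.isfile(path):
                yield path


def files_on_bad_blocks(
    root: str,
    bad_blocks: Set[int],
    block_list: Callable[[str], Iterable[int]],
) -> Dict[str, List[int]]:
    """Return {path: sorted bad blocks it uses} for every affected path under root."""
    hits: Dict[str, List[int]] = {}
    for path in _candidate_paths(root):
        overlap = bad_blocks.intersection(block_list(path))
        if overlap:
            hits[path] = sorted(overlap)
    return hits
```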
Unfortunately, there is no tool for printing the block list of a given file.
filefrag -v # gives some information
is really for you; just modify the printed result:
< $result .= $_ ? "*" : ".";
> $result .= "$_ ";
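For reference, the `FIBMAP` ioctl is one way to get such a block list directly; it is what `filefrag` relies on with kernels of this era. A hedged Python sketch: the request numbers 1 and 2 are `FIBMAP` and `FIGETBSZ` from `<linux/fs.h>`, `FIBMAP` needs root (CAP_SYS_RAWIO), and it reports data blocks only, so ext3 indirect blocks are not included. The function names are illustrative.

```python
import fcntl
import os
import struct

# Request numbers from <linux/fs.h>: FIBMAP is _IO(0x00, 1), FIGETBSZ is _IO(0x00, 2).
FIBMAP = 1
FIGETBSZ = 2


def logical_block_count(size: int, block_size: int) -> int:
    """Number of filesystem blocks needed to hold `size` bytes (rounded up)."""
    return (size + block_size - 1) // block_size


def fibmap_blocks(path: str) -> list[int]:
    """Physical block numbers of a file's data via the FIBMAP ioctl (Linux, root only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # FIGETBSZ writes the filesystem block size into an int.
        block_size = struct.unpack("I", fcntl.ioctl(fd, FIGETBSZ, struct.pack("I", 0)))[0]
        blocks = []
        for logical in range(logical_block_count(os.fstat(fd).st_size, block_size)):
            # FIBMAP takes a logical block number in and returns the physical one.
            physical = struct.unpack("I", fcntl.ioctl(fd, FIBMAP, struct.pack("I", logical)))[0]
            if physical:  # 0 means a hole (no block allocated)
                blocks.append(physical)
        return blocks
    finally:
        os.close(fd)
```

It issues one ioctl per block, so mapping tens of terabytes this way will be slow.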

Expert Comment

ID: 20345671

Author Comment

ID: 20363287
fileblocks does not compile on my RHEL4 machine:

# gcc -o fileblocks fileblocks.c
/tmp/ccXzML8K.o(.text+0x121): In function `printOneFile':
: undefined reference to `_IO'
collect2: ld returned 1 exit status

Expert Comment

ID: 20366184
My bad; download it again.

Author Comment

ID: 20496578
I think I will give my points to ravenpl for sending me a useful utility, although his solution won't be very practical for me. I was hoping that if a file gets corrupted because a sector goes bad randomly, the OS would be able to catch it and I would get an alert, rather than having to skim through tens of terabytes myself. It seems the only way to do that will be to enable SMART, which had its own problems when I tried it more than a year ago.

Expert Comment

ID: 20498148
> sector going bad randomly then OS should be able to catch it
But when does the OS get a chance to spot it? Only when the file is read. Reading a block is not enough, because there is no map from block to file, only the reverse map (file to blocks). At least for ext3.
Maybe one could investigate the internals of other filesystems (jfs, xfs, reiser*, zfs, etc.); maybe they're smarter than that.
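Following that point through: the only way to make the OS notice is to read the files. A hedged sketch of a nightly "scrub" that reads every regular file under a directory and records the ones whose read fails (an unreadable sector normally surfaces as EIO). Caveats: data served from the page cache never touches the disk, it cannot detect data that is read back without error but is wrong, it checks file contents only (not ext3 metadata), and reading tens of TB every night may not be feasible. `scrub_tree` is a hypothetical name.

```python
import os
from typing import List, Tuple


def scrub_tree(root: str, chunk_size: int = 1 << 20) -> Tuple[int, List[Tuple[str, str]]]:
    """Read every regular file under root; return (files_read_ok, [(path, error), ...])."""
    files_read = 0
    failures: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            try:
                with open(path, "rb") as f:
                    # Read and discard; an unreadable sector raises OSError (typically EIO).
                    while f.read(chunk_size):
                        pass
                files_read += 1
            except OSError as exc:
                failures.append((path, exc.strerror or str(exc)))
    return files_read, failures
```

The failure list already names the files, which is exactly what you need to advise the right user.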

Expert Comment

ID: 20545852
Forced accept.

EE Admin