Avatar of loshdog
loshdog

asked on

Critical Impact Alert <Bad Block on HD>

Hello and thank you for your time...

One of my clients is running MS Server 2003 SBS SP2. Our monitoring agent has reported:
"bad block on the Hard Drive".
My question is: what is the best way to run a disk repair utility?
Should I run it from MS-DOS? (If so, what command should I use: /c /r for check and repair?)
Or should I run it from within Windows, or some other way?
Avatar of buckobilly
buckobilly

I've always run them through Windows Check Disk and have never had a problem.

I check the "fix errors" box and let it go. I would also perform a full backup first, using a product like Ghost or something similar.
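
For reference, a minimal sketch of the command-line equivalent (assuming the volume to check is C:). The /r switch scans for bad sectors and recovers readable data, and implies /f, which fixes file-system errors:

rem Run from an elevated command prompt; on the system volume,
rem the scan is scheduled for the next reboot since C: is in use.
chkdsk C: /f /r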
Avatar of Dr. Klahn
Do not try to repair a drive that is throwing bad blocks if it is being used in a server.  Replace it.  When a drive starts showing bad blocks, it means that it is unable to revector the block in question, and deterioration is well underway.
Avatar of loshdog

ASKER

Hello and thank you...

They have a RAID 5 configuration using 4 drives. There are two partitions: C: for the OS and D: for storage.
How would I determine which drive is going bad?
The reporting tool should be able to tell you which channel the bad drive is on, and I agree with the poster above. If the drive is starting to show bad blocks, replace it; it will fail soon even if you bypass those bad blocks now.
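
As a quick first look from within Windows, something like the following sketch can list per-disk SMART status. One caveat: behind a hardware RAID controller, Windows usually sees only the logical volume, so the controller's own utility is the authoritative place to check individual members:

rem Lists each physical disk visible to the OS with its SMART-based
rem status ("OK" or "Pred Fail"); RAID member drives are often hidden.
wmic diskdrive get model,serialnumber,status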
Avatar of loshdog

ASKER

Where would I find the reporting tool? I checked Event Viewer, but it did not mention anything about HD bad sectors.
What exactly was used to get this message from your original post?

Our monitoring agent has reported:
"bad block on the Hard Drive".

Avatar of loshdog

ASKER

We have a SAAZ agent installed on the server, which monitors many different aspects. It reports to our Network Operations Center, which then forwards the message or ticket to a tech or the owner.

Hope this answers your question...
Can you attach the full message?

It should say something like "Hard Drive 0, channel 1"... or, at worst, can we get a screen cap from the app?
Avatar of loshdog

ASKER

Thank you... Screenshot below...


Untitled-1.jpg
ASKER CERTIFIED SOLUTION
Avatar of athomsfere
athomsfere

SOLUTION

SOLUTION
Avatar of loshdog

ASKER

Wow... Thank you all for your input.

Last time I booted into the controller card I remember seeing a SMART tab. I will check that first and see if it provides me with any useful info. If that does not work,
I will go to the Dell website, find out what type of controller card it has, and see whether it's hot-swappable. After gaining that info, I will take the manly-man way: remove the drive, connect it to a workstation, and scan it there... This will be time consuming...

Once again thank you all...
For future reference, I mentioned Speedfan because it lets you check SMART without real downtime, and sometimes that matters more than avoiding the install of a little freeware app. It also lets you check each drive individually.
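
If another small tool is acceptable, smartmontools can do the same from the command line. A sketch, assuming the drives sit behind a MegaRAID-family controller (which many Dell PERCs are), where the -d megaraid,N switch addresses individual member disks; the device name and numbers here are illustrative:

rem Query full SMART data for RAID member disk 0 through the controller.
rem Repeat with megaraid,1 ... megaraid,3 for the other three members.
smartctl -a -d megaraid,0 /dev/sda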
I'd be MOST interested in what the RAID controller status says about the drives. There's also very likely a health-monitoring cleanup process you can invoke.

S.M.A.R.T. is OK, but it's not a perfect way to diagnose. For example, in a recent Q&A, HDD Sentinel said a drive was poor while a S.M.A.R.T. reader showed "OK": https://www.experts-exchange.com/questions/26407063/differences-between-HDD-Sentinel-and-S-M-A-R-T-readings.html?anchorAnswerId=33457490#a33457490
http://www.passmark.com/forum/showthread.php?t=1723
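
Since this is a Dell box, one concrete way to read that controller status and kick off the cleanup pass is the OpenManage Server Administrator CLI, assuming OMSA is installed; a sketch, with controller and virtual-disk IDs you would substitute from the omreport output:

rem Show the physical drives and their state as the PERC sees them.
omreport storage pdisk controller=0
rem Start a consistency check (the housekeeping pass) on virtual disk 0.
omconfig storage vdisk action=checkconsistency controller=0 vdisk=0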

Basically, different RAID cards handle bad sectors differently, to say nothing of the myriad RAID interpretations and implementations. Some enterprise-class ECC systems will automatically badtrack, that is, automatically set aside bad sectors and remap them to "spare" sectors on the fly, whereas some RAID cards just leave the bad sector alone and read the data from the redundant sister/twin/brother, whether that's the mirror or parity or dual parity. The problem with the latter is that those spots are half at-risk: if the same sector goes bad on a sister, then unless it's dual-parity you're in trouble (that's how dual parity came to be "invented"). It's not great that such a card may NOT do anything about the bad sector on a drive UNTIL, that is, you run the occasional housekeeping that you're supposed to do with the RAID controller utility (which, we'll assume in this case, has not been done for some indeterminate length of time). This might be your situation, and the RAID is warning that there are quite a few unhandled bad sectors that need dealing with.
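
To make the parity idea concrete, here's a toy worked example in batch arithmetic (the byte values are made up): RAID-5 parity is just the XOR of the data blocks in a stripe, so any one missing block can be rebuilt by XORing the survivors:

@echo off
rem Two data bytes and their parity byte, as a RAID-5 stripe stores them.
set /a "D1=0xA5, D2=0x3C, P=D1^D2"
rem If the sector holding D1 goes bad, XOR of the survivors rebuilds it.
set /a "R=P^D2"
echo parity=%P%  rebuilt D1=%R%  original D1=%D1%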

Do NOT, do NOT handle bad sectors on the drives "directly individually" by connecting them elsewhere and doing low-level bad-sectoring/badtracking. Only use the RAID controller utilities for that. Otherwise, you can completely corrupt your stripe/mirror/parity, and unless you want to rebuild your RAID arrays from scratch and recover from the last known good backup, don't.

See, many RAID cards do NOT handle bad sectors the same as in single-drive operation (what you'd see measured on each drive's S.M.A.R.T.) in another surprising way. Briefly put, what many RAID cards do when "preparing" the drives is set aside more space than the normal "spare sectors" for badsectoring. When there's a bad sector in one spot on one of the drives in a set, then EITHER automatically or else as part of the RAID "cleanup" maintenance process discussed above, many RAIDs set aside that sector on ALL the drives in the set if so much as one of them has it bad. The reason is that the controller can then keep ALL the drives' sectoring/blocking mirror/stripe/parity TABLES "in sync", so to speak, marching in lock-step, rather than dealing with odd exceptions for each drive individually. But the result is that, whether it's two, three, four, five, or six drives in a set, a bad sector on one is set aside on them all, so the spare area has to be much bigger because it's going to get used up much more quickly. Thus, a RAID controller might be raising errors because the "volume" is almost out of spare sectors, and yet, connected directly and individually, the drives seem fine.

Would it be better if bad sectors were relocated to spares on an individual-disk basis? Yep. Some do, but many DON'T, preferring to simplify constant operation and performance. The resulting "different behaviour" seems like a complication, but it's not; yes, it is slightly less efficient, but these are, after all, supposed to be a "redundant array of inexpensive disks" (or "independent"; both expansions are used).

Also, just hot-swapping drives does NOT "reclaim" what had been bad sectors on every disk in the set and somehow make them good sectors again, not once they've been set aside by the RAID. So you could put in a perfectly good disk with no bad sectors, and as soon as the RAID has reconstructed the set, for all intents and purposes the new drive already has a whackload of bad sectors set aside. On the other hand, a hot-swap will help IF and only IF the bad sectors haven't been set aside yet (automatically, it'll be too late; manually, okay, you may not have done it yet), BUT ONLY IF you happen to be swapping out the drive with the bad sectors in question. BUT, BUT: let's say sector 1234567 is good on the drive you exchange out and bad on one of the sisters. Then the RAID may not be able to reconstruct the data for sector 1234567, because on its twin that sector is unreadable and hasn't been remapped yet, and you just pulled the drive that had the other copy of the data for 1234567. Mind you, if there is parity or dual parity, then the reconstruction should nevertheless succeed, thanks to reverse-engineering 1234567 from the extra redundancy.
So you see, ultimately, the maintenance should be done (unless it's automatic). It's definitely best to read the status conditions before taking any action, and it's typically preferable to do the maintenance before a swap-out rather than after.
And ultimately, at some point, the array may have to be rebuilt, because no amount of hot-swapping is going to make the spares area bigger if it's almost full (unless your RAID will do that when you "extend" the volume by replacing with larger drives).