Is it advisable/safe to run CHKDSK against a RAID5?

We've got a Windows Server 2003 box that had a drive go bad in the RAID.  We've replaced the drive, and the RAID appears to be 100% back to normal.  Our manufacturer-supplied RAID tools confirm the health of the array at this point.  

The problem is that while the drive was bad, Windows started wanting to run CHKDSK against the volume at boot, and it still does.

I've heard conflicting information as to whether CHKDSK /F is something safe or advisable to run in this situation.  We're not seeing any indications of problems with the file system other than the OS wanting to run CHKDSK.

Is this something advisable?  Is there potential for unintended data loss?  And will it even work?

Thanks.
jeffrey615Asked:

DavidPresidentCommented:
Absolutely run Chkdsk!

Chkdsk fixes filesystem corruption.  After the XOR parity is checked/rebuilt, chkdsk is the next thing you are supposed to run.  If the RAID5 were offline, it would be a different story.
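To make the XOR point concrete, here is a minimal sketch of how one stripe is rebuilt from parity. It is a toy model, not any controller's firmware; the block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 3-disk RAID 5: two data blocks plus parity.
data_a = b"FILE"
data_b = b"SYST"
parity = xor_blocks([data_a, data_b])

# The disk holding data_b dies; the controller rebuilds it from the survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)  # True
```

Note that the rebuild restores whatever bytes were on the disk. If those bytes held damaged NTFS metadata, the rebuild restores the damage too, which is why the filesystem check comes next.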
0

DavidPresidentCommented:
Think of it this way: once the RAID is online and healthy, it appears to the O/S as a single logical disk drive.  Of course it is OK to run chkdsk on a logical disk drive.  The confusion comes from running chkdsk when you have a degraded array, unreadable blocks, or a RAID that was reassembled out of order.  In other words, you can do damage if the RAID is under stress.
0
JeffBethCommented:
CHKDSK (or FSCK for the UNIX peeps) will certainly not hurt the system.   CHKDSK/FSCK are typically only forced to run when either:

1) the file system has not been checked in a certain number of days

or more likely...

2) the file system was not unmounted cleanly (a bit is set in the FS while it is mounted and cleared on a clean unmount, so the OS can tell whether the FS was unmounted correctly or not).  This is typical after a crash.
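Point 2 can be sketched as a toy model. This is only an illustration of the flag mechanism described above, not how NTFS or any real file system implements it; the `Volume` class and its methods are hypothetical.

```python
class Volume:
    """Toy model of a file system's dirty flag (not real NTFS code)."""

    def __init__(self):
        self.dirty = False  # a real file system persists this on disk

    def mount(self):
        needs_check = self.dirty  # still set: the last session never unmounted
        self.dirty = True         # mark the volume as in use for this session
        return needs_check

    def unmount(self):
        self.dirty = False        # a clean shutdown clears the flag


vol = Volume()
first = vol.mount()   # fresh volume, no check needed
vol.unmount()         # clean shutdown
second = vol.mount()  # still no check needed
# ... crash here: unmount() never runs ...
third = vol.mount()   # next boot finds the flag still set
print(first, second, third)  # False False True
```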

In most cases, the FS is not aware that a disk was swapped out and rebuilt from the RAID 5 parity.

To answer your question though.. yes, you should be more than fine to run the chkdsk.

 
0
gmsolutionsCommented:
chkdsk is unable to recover RAID data. In most cases you will permanently destroy your data using chkdsk on any of the volumes. It is not RAID-aware, so don't run it.
0
snafumasterDirector of Information TechnologyCommented:
My experience, and what I have been told and read, say that CHKDSK will likely not help and may very well cause damage to the data or the array.
What kind of RAID is it?  If it is a mirror, maybe break the raid and run the CHKDSK on the single drive?
0
jeffrey615Author Commented:
RAID 5 across 8 drives
0
snafumasterDirector of Information TechnologyCommented:
I'd personally look into cancelling the CHKDSK.  Here's a link that shows how it can be done...
http://www.raymond.cc/blog/archives/2008/02/23/disable-or-stop-auto-chkdsk-during-windows-startup/
0
DavidPresidentCommented:
I've been a storage architect for 20+ years.  I assure you, it is appropriate and correct to clean a file system on a healthy array.  That goes for CHKDSK, fsck (if UNIX/LINUX), volchk, and every other file system correction utility.

0
DavidPresidentCommented:
CHKDSK has no knowledge of the physical layout of the device it is using.   In addition the RAID subsystem has no knowledge of the filesystem or operating system used. (There are some exceptions).

"Moving any of the pieces around on any one of the drives will have disastrous consequences for the data set stored collectively on an array of multiple individual disk drives."

If the above were true, would this not be the case for running ANY program?  Certainly the RAID has no knowledge of whether the user is running CHKDSK, copying files, defragmenting, or just optimizing a database.

As stated by the author, the RAID is online, healthy, and consistent. The file system needs to be repaired, and chkdsk does just that.   As for the myth of the dangers of defragmentation, one merely has to model a small program to prove that it is beneficial.  A fragmented file or filesystem requires more I/O requests of shorter lengths to read files. This is true regardless of whether you have RAID1, a non-RAID disk, or RAID6 at the back-end.  
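A small model along those lines might look like the following. It is only a sketch under stated assumptions: the extent layout, the 128-block request limit, and the `count_read_requests` helper are all made up for illustration, and it counts requests rather than measuring real timings.

```python
def count_read_requests(extents, max_request_blocks=128):
    """Count the read requests needed to fetch a file.

    extents: list of (start_block, length_in_blocks) runs, in file order.
    Each contiguous run needs ceil(length / max_request_blocks) requests.
    """
    requests = 0
    for _start, length in extents:
        requests += -(-length // max_request_blocks)  # ceiling division
    return requests


# A 1024-block file stored as one contiguous run...
contiguous = [(5000, 1024)]
# ...versus the same 1024 blocks scattered into 64 runs of 16 blocks each.
fragmented = [(5000 + i * 400, 16) for i in range(64)]

print(count_read_requests(contiguous))  # 8
print(count_read_requests(fragmented))  # 64
```

The block device underneath never appears in the model, which is the point: the request count depends on the file's layout in logical blocks, not on what kind of array serves them.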

"do NOT allow the RAID controller to overwrite data on any striped RAID array"..
If the above were true, it would mean you can't ever write to a disk drive.  The minute you write anything, even to a freshly initialized HDD, you are overwriting zeros with ones.  




0
DavidPresidentCommented:
Finally, "constraints for safety of data lying in striped drive arrays also precludes use of defragmentation tools",

This is also just as absurd. The RAID can not differentiate between somebody fragmenting a file or filesystem, or defragmenting it.  Defragmentation is effectively reading from offset X, and writing to offset Y, then doing some housekeeping.  So if this statement was true, then it could only mean that the controller is OK when an application fragments files by reading from X and writing to Y ... but God forbid, you read from Y and write to X ?!!
0
snafumasterDirector of Information TechnologyCommented:
great points dlethe...I think I learned something from your posts.  I know I sure as heck haven't been doing this since the early nineties...lol.
I must say though, even with your great explanations, this seems to be a debate that goes on.  Although, I think you've converted me.  I guess I personally will have to wait till my next RAID failure on a machine where the data do not matter (of course, why would it be RAID if that were the case) to give it a try.
0
DavidPresidentCommented:
Great. That is what EE is all about .. learning.  One thing, there are certainly times when chkdsk & defrag should never be run, and that is when the RAID has just been degraded, prior to any recovery techniques.   Nothing inherently wrong specific to chkdsk/defrag in this situation, as they are both just mechanisms to read/write data.  Any I/O should be avoided for the same reason.  

That is probably how these urban myths get started, somebody hears you should not run chkdsk, and the part where the statement is qualified to a freshly degraded RAID gets dropped.

I'll probably write a paper on it. I see this come up often enough that it is worth debunking once and for all, at least on EE.
0
snafumasterDirector of Information TechnologyCommented:
EE MythBusters!  You can be Jamie!  Thanks again for the info.
0
jeffrey615Author Commented:
So I have a little update/wrinkle that changes the game slightly.  Our Event Log is still showing the same NTFS error it showed before we swapped out the drive that our RAID was reporting as faulty.  This is the error we're receiving:

Event Type: Error
Event Source: Ntfs
Event Category: Disk
Event ID: 55
Date: 04/21/2010 Time: 02:36:39
User: N/A
Computer: <ComputerName>
Description:
The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume E:.

Is it likely that the RAID is still degraded and the error stems from that, or is it just that we need to run that CHKDSK?  We obviously don't want to run it against a degraded RAID, but at this point it's getting to be a bit hard to tell where the problem really lies.
0
DavidPresidentCommented:
You posted that the RAID was healthy. To verify, just run the RAID consistency check to confirm.   You have 2 possible types of corruption.  RAID corruption, I'll call it, that deals with XOR parity, unreadable blocks, dead disk(s), etc ... and filesystem corruption.

Take the RAID out of the equation once you confirm it is healthy, and treat the problem exactly as if you had just a single disk drive.  A RAID-5 array will not protect you against filesystem corruption.  Windows will gladly lock up, bluescreen, have programs crash, application bugs, regardless of the physical disk layout.

Traditionally, the main way you can get massive data loss with a RAID that passes the parity / consistency check is if the RAID subsystem was reassembled out of order.  To make it easy, say you had a 3-disk RAID5 with disks A, B, C.  If it is reconstructed so that the disks now appear in CBA order, it will pass every RAID diagnostic there is, because the parity is correct.  But the order of the disks makes a difference to the logical file system.  (There are other things, like getting the stripe size set to a multiple of what it was supposed to be, but I won't go down there, as I doubt your vendor would screw that up.)
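Here is a rough sketch of the A,B,C versus C,B,A case. It is a toy RAID 5 with a made-up parity rotation and made-up 4-byte chunks, not any vendor's on-disk layout; all function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def build_raid5(chunks, n_disks):
    """Lay out equal-sized chunks across n_disks with rotating parity."""
    disks = [[] for _ in range(n_disks)]
    per_stripe = n_disks - 1
    for stripe, i in enumerate(range(0, len(chunks), per_stripe)):
        data = chunks[i:i + per_stripe]
        parity_disk = (n_disks - 1 - stripe) % n_disks
        data_iter = iter(data)
        for d in range(n_disks):
            block = xor_blocks(data) if d == parity_disk else next(data_iter)
            disks[d].append(block)
    return disks

def parity_ok(disks):
    """Consistency check: every stripe row must XOR to all zeros."""
    return all(not any(xor_blocks(row)) for row in zip(*disks))

def read_volume(disks):
    """Reassemble the logical volume, assuming disks are in the listed order."""
    n = len(disks)
    out = []
    for stripe, row in enumerate(zip(*disks)):
        parity_disk = (n - 1 - stripe) % n
        out.extend(block for d, block in enumerate(row) if d != parity_disk)
    return b"".join(out)

chunks = [b"NTFS", b"MFT0", b"DATA", b"LOGS", b"BOOT", b"INDX"]
a, b, c = build_raid5(chunks, 3)

print(parity_ok([a, b, c]), read_volume([a, b, c]) == b"".join(chunks))  # True True
print(parity_ok([c, b, a]), read_volume([c, b, a]) == b"".join(chunks))  # True False
```

XOR does not care about operand order, so the consistency check passes either way, while the logical volume read back in C,B,A order is scrambled (parity blocks even show up where data should be).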

Now unless  your RAID lost metadata and they had to humpty dumpty it, then you have filesystem corruption.

The safest thing to do is run chkdsk without the /F just to see where things stand.  If it shows a large number of errors, use a commercial product that is better than chkdsk, like runtime.org NTFS reconstructor.  This will run for possibly days, make no assumptions about the data, and recover deleted files as well.
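If you want to script that report-only pass, something like the sketch below would do. The helper names are hypothetical, it assumes a Python 3.7+ interpreter is available on the box, and the only chkdsk switch it relies on is /F; typing `chkdsk E:` at a command prompt does the same thing.

```python
import subprocess
import sys

def build_chkdsk_command(volume, fix=False):
    """Return the chkdsk argument list; without /F chkdsk only reports."""
    command = ["chkdsk", volume]
    if fix:
        command.append("/F")
    return command

def run_readonly_check(volume):
    """Run a report-only chkdsk and return (exit_code, output). Windows only."""
    if sys.platform != "win32":
        raise RuntimeError("chkdsk is only available on Windows")
    result = subprocess.run(
        build_chkdsk_command(volume), capture_output=True, text=True
    )
    return result.returncode, result.stdout

print(build_chkdsk_command("E:"))  # ['chkdsk', 'E:']
```

Keeping `fix=False` as the default means the repair pass has to be asked for explicitly, once you have seen the report.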

0
brycenCommented:
The accepted answer is bogus (i.e. wrong).
0
DavidPresidentCommented:
To clarify, my objection was that the selected answer was wrong for all of the reasons I explained in this thread, and one can easily verify that vendors such as EMC, NetApp, HP, and so on clearly state in their documentation that chkdsk is an appropriate action to run (as well as defrag!)
0
DavidPresidentCommented:
Well, you know what they say .. if it is on the internet, it must be true :)
sheesh - eliminate that answer because it was flat out wrong.  I say award  points to #31324557 & #31345527
0
DavidPresidentCommented:
Thanks ... now maybe this myth that chkdsk/defrag will mess up RAID-protected logical drives will get one more nail in the coffin.
0
Windows Server 2003