Windows replaced bad clusters in file xx on an HP SCSI RAID 5 array: how to identify the defective drive?

Checking file system on C:
The type of the file system is NTFS.

A disk check has been scheduled.
Windows will now check the disk.
Cleaning up minor inconsistencies on the drive.
Cleaning up 57 unused index entries from index $SII of file 0x9.
Cleaning up 57 unused index entries from index $SDH of file 0x9.
Cleaning up 57 unused security descriptors.
CHKDSK is verifying Usn Journal...
Usn Journal verification completed.
CHKDSK is verifying file data (stage 4 of 5)...
Windows replaced bad clusters in file 87
of name \mssql\MSSQL$~1\Data\DISTRI~1.MDF.
Windows replaced bad clusters in file 7220
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\201004~1\TB5CD1~1.BCP.
Windows replaced bad clusters in file 26077
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\201004~1\TBLPDF~1.BCP.
Windows replaced bad clusters in file 32542
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\201003~1\TB5CD1~1.BCP.
Windows replaced bad clusters in file 34123
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\200802~1\TB50D9~1.BCP.
Windows replaced bad clusters in file 59114
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\200904~1\TB4CD1~1.BCP.
Windows replaced bad clusters in file 66747
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\200904~1\TBLPDF~1.BCP.
Windows replaced bad clusters in file 306249
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\200608~1\TB50D9~1.BCP.
Windows replaced bad clusters in file 313926
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\200608~2\TB50D9~1.BCP.
File data verification completed.
CHKDSK is verifying free space (stage 5 of 5)...
Free space verification is complete.
The size specified for the log file is too small.

213371743 KB total disk space.
137811912 KB in 82347 files.
42892 KB in 6088 indexes.
0 KB in bad sectors.
962587 KB in use by the system.
23040 KB occupied by the log file.
74554352 KB available on disk.

4096 bytes in each allocation unit.
53342935 total allocation units on disk.
18638588 allocation units available on disk.



Windows has finished checking your disk.
Please wait while your computer restarts.


For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

~~~~

This is on my domain controller, which uses an HP RAID 5 array of four 72 GB SCSI disks. How can you get bad clusters on a RAID volume? How can I tell which physical drive is failing?

Did I actually lose any data or suffer any data corruption?

I have backups, of course. My concern is what happens if this is a hardware failure: I am going to migrate from Windows 2003 to Windows 2008 R2, but it will still take some time to get that started. Buying a single replacement SCSI drive might be viable, but if I can't identify the failing drive and have to buy four SCSI drives and rebuild the array one disk at a time, that would be problematic, not to mention prone to disaster.

Just to add, this is an HP ProLiant ML350 G3, around 4+ years old.
I would also like to know what kind of remedial/repair action I can take:
a) Get 4 x 72 GB SCSI drives and rebuild the array disk by disk (which I suspect is prone to failure).
b) Mirror/ghost the drive (any recommendations? I'm thinking Macrium Reflect), then plug in the new 4 x 72 GB SCSI drives or 2 x 300 GB in RAID 0 and do a bare-metal restore?
c) Do nothing, as the error appears to be logical rather than physical?

I am planning an infrastructure refresh from Win2003 + SQL 2000 to Win2008 R2 + SQL 2008 as well, so I think keeping the cost of the repair down is probably best.

Asked by: chrisloup
 
David (President) commented:
First, this is filesystem corruption, not RAID corruption. Treat this problem as if you had a single HDD, and ignore the RAID entirely.  

The way you test for RAID "corruption" is by checking the XOR parity. The controller can run a consistency check, which goes through each block on each drive and makes sure that the XOR is correct. In the process it repairs any unreadable blocks on any physical disk by recalculating what is supposed to be there from the redundant data and rewriting it.
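To make the parity idea concrete, here is a minimal sketch of how one stripe is checked and how an unreadable block is recalculated. It assumes a simplified 4-disk layout (three data blocks plus one parity block per stripe); the block contents and the helper names `xor_blocks`, `stripe_is_consistent`, and `rebuild_block` are made up for illustration and are not the controller's actual firmware logic.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """Consistency check: the parity must equal the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity_block


def rebuild_block(surviving_blocks):
    """Recalculate one missing block from the surviving blocks (data + parity)."""
    return xor_blocks(surviving_blocks)


# One stripe on a hypothetical 4-disk RAID 5 set: three data blocks plus parity.
data = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Pretend the block on the second disk is unreadable and rebuild it.
recovered = rebuild_block([data[0], data[2], parity])
```

Because XOR is its own inverse, any single missing block in a stripe can be rebuilt this way, but two missing blocks in the same stripe cannot.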

a) One does not rebuild the array disk by disk. Bad, awful idea. Lose a disk during the rebuild, and you have 100% data loss. Get an unreadable block during the process, and you have partial loss. The correct way to replace the disks is to back up, replace all disks, initialize the new array, then restore.

b) Do not use a 2x300 RAID 0 in the interim, unless you can predict the future and know that neither 300 GB disk will fail or pick up a bad block. Use RAID 1.

c) It is logical, but that does not mean there is no physical cause. Do you have a UPS with battery backup? Do you perform proper shutdowns? Try disabling the RAID write cache if it is enabled. This will cause a performance hit, but it ensures data is written to the disk drives on every I/O. Read the event logs. Check for memory problems. Use ECC memory if you are not already.

But bottom line, ignore the RAID controller for purposes of correcting the situation.
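If you want to compare the affected files against your backups, the names can be pulled out of the CHKDSK report quoted in the question. This is only a rough sketch: it assumes the log text has been copied into a string, and `affected_files` is a hypothetical helper, not part of any Windows tool.

```python
import re

# Matches the two-line entries CHKDSK prints for each repaired file.
PATTERN = re.compile(
    r"Windows replaced bad clusters in file (\d+)\s*\n\s*of name (\S.*)"
)


def affected_files(chkdsk_text):
    """Return (file number, short path) pairs for every repaired file."""
    return [
        (int(number), name.strip().rstrip("."))  # drop the sentence-ending period
        for number, name in PATTERN.findall(chkdsk_text)
    ]


# Two entries copied from the log in the question.
sample = r"""Windows replaced bad clusters in file 87
of name \mssql\MSSQL$~1\Data\DISTRI~1.MDF.
Windows replaced bad clusters in file 7220
of name \mssql\MSSQL$~1\REPLDATA\unc\INSIGH~1\201004~1\TB5CD1~1.BCP.
"""

for number, name in affected_files(sample):
    print(number, name)
```

The paths are the 8.3 short names exactly as CHKDSK reports them, so they may need to be mapped back to the long file names before comparing with a backup catalog.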
 
chrisloup (Author) commented:
Yes, I have confirmed it is an NTFS error caused by improper shutdowns (the whole computer hung because of some issues with a USB drive).

Question has a verified solution.
