Solved

System crashes when running chkdsk on RAID 0

Posted on 2006-06-23
Last Modified: 2008-01-09
OK, I'm at my wit's end with this one.

I have a customer who has brought in a computer that we built for him (spiffy gaming machine).  He is running an ASUS A8N SLI Deluxe (nForce4 SLI chipset) with a RAID 0 (striped) array on twin WD 250GB SATA hard disks.

The problem is this:

Due to an unrelated issue (crappy video drivers that have since been replaced), Windows was shut down improperly. Naturally, chkdsk came up when the computer booted and attempted to scan, as it typically does.  During the first phase (verifying files), the computer locks up at 7%, then restarts a few seconds later.  It is ALWAYS 7%.  I can clear the dirty bit with no problem, and we have done so each time it was previously brought in for this same problem, but as soon as he needs to run a chkdsk or Windows craps out (as it typically does), we'll be right back where we are now (as we have been four times now).

So far we have:

Destroyed and rebuilt the RAID array, with a format and reload
Low-level formatted the drives using WD Data Lifeguard
COMPLETELY tested the drives using WD Data Lifeguard and Microscope
Replaced the drives
Replaced the SATA cables
Replaced the mainboard
Replaced the memory
Replaced the video card
Removed EVERYTHING non-vital to operation, with new memory and new SATA cables
Set the speed cap jumpers on the hard disks to cap speed at 150 MB/s
Enabled spread-spectrum clocking on both drives
Spent 5 hours on the phone with ASUS, who sent us two replacement motherboards, both of which have exhibited the same problem
Attempted to call Microsoft, only to have them demand money for support
Called Western Digital, who, so far, has not said anything useful
Had several heart-to-hearts with Google to find others who have had a similar problem
Ran out of ideas
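
For reference on the speed cap jumper in the list above: the 150 MB/s figure is the usable rate of a first-generation SATA link, which signals at 1.5 Gbit/s and uses 8b/10b encoding (10 bits on the wire for every 8 bits of data). A minimal sketch of that arithmetic follows; the function name is made up for illustration.

```python
def sata_payload_mb_per_s(line_rate_gbit: float) -> float:
    """Usable SATA throughput in MB/s for a given line rate in Gbit/s.

    Assumes 8b/10b encoding, so only 8 of every 10 transmitted bits
    carry data. Protocol overhead above the link layer is ignored.
    """
    payload_bits_per_s = line_rate_gbit * 1e9 * 8 / 10
    return payload_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


if __name__ == "__main__":
    for rate in (1.5, 3.0):
        print(f"{rate} Gbit/s link -> {sata_payload_mb_per_s(rate):.0f} MB/s")
```

The jumper forces a drive that could negotiate the faster link down to the 1.5 Gbit/s rate, which is a common compatibility workaround on older chipsets.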


This customer is driving us absolutely insane.  We have had his computer for about three months now, and he calls us multiple times a day (I probably would, too).  We can't just disable the RAID controller, because he is adamant that he wants to keep his RAID setup.  We can't put any more money into this computer, because once lost labor time is factored in, we have already lost hundreds on this deal.  We stand behind our machines, but this one has us dumbfounded.  ASUS claimed they tested the RAID capability and its ability to run a chkdsk; however, it still doesn't work.

Any help with this would be immensely appreciated.

--[adam]

Question by:dapsychous
7 Comments
 
LVL 44

Expert Comment

by:scrathcyboy
The answer to this is -- do NOT run RAID 0 on these drives.  RAID 0 is not fault tolerant, a simple CHKDSK can corrupt the entire array, and it is not worth losing your data for an outdated RAID concept that was never reliable.  Use RAID 1 in future, where you have some redundancy, and also do not run CHKDSK when you have any problems.  Check disk can corrupt the system FAT faster than you can say it, and it is not a reliable tool to repair any disk problems.

I recommend you use RAID 1 with 2 disks, and that you do not run check disk on the drives; let them run as is.
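
To illustrate the "not fault tolerant" point, here is a rough sketch of how a striped array lays data out. The 64 KB stripe size, the function names, and the assumption that disks fail independently are all illustrative; none of them are read from the nForce4 controller.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    disk: int    # index of the member disk holding the byte
    offset: int  # byte offset on that member disk


def locate(logical_byte: int, num_disks: int = 2,
           stripe_size: int = 64 * 1024) -> StripeLocation:
    """Map a logical byte of a RAID 0 volume to (disk, offset).

    Stripes are dealt out round-robin, so consecutive stripes
    alternate between the member disks.
    """
    stripe_index, within = divmod(logical_byte, stripe_size)
    disk = stripe_index % num_disks
    row = stripe_index // num_disks
    return StripeLocation(disk, row * stripe_size + within)


def array_survival(per_disk_survival: float, num_disks: int = 2) -> float:
    """Chance the whole array survives, assuming independent disks.

    RAID 0 keeps no second copy, so every member must survive.
    """
    return per_disk_survival ** num_disks


if __name__ == "__main__":
    for byte in (0, 64 * 1024, 128 * 1024):
        print(byte, locate(byte))
    print(array_survival(0.95))
```

Because every other stripe lives on the other disk, any file larger than one stripe spans both drives, and losing either drive loses the volume.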
 
LVL 1

Author Comment

by:dapsychous
Well, that's all well and good, but my customer REQUIRES that this be a RAID 0.  We have tried to take him off of it, but he always pitches a fit.  Additionally, I'm not running chkdsk; it's autorunning at boot because of a flagged dirty bit.  Simply clearing the dirty bit won't help, because it will just happen again the next time an autocheck is scheduled due to a Windows glitch or something.
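
For anyone who wants to script the dirty-bit side of this, here is a minimal sketch built around two stock Windows commands: `fsutil dirty query` reports whether a volume is flagged dirty, and `chkntfs /x` excludes a volume from the boot-time check. The helper names are made up, and the output wording the parser looks for ("is Dirty" / "is NOT Dirty") is an assumption about what fsutil prints. Note that `chkntfs /x` only skips the automatic check; it does not clear the dirty bit or fix whatever is setting it.

```python
import subprocess
from typing import List


def dirty_query_cmd(volume: str) -> List[str]:
    """Command that asks Windows whether the volume is flagged dirty."""
    return ["fsutil", "dirty", "query", volume]


def exclude_from_autochk_cmd(volume: str) -> List[str]:
    """Command that excludes the volume from the boot-time check."""
    return ["chkntfs", "/x", volume]


def is_dirty(fsutil_output: str) -> bool:
    """Interpret fsutil output (assumed wording, see note above)."""
    text = fsutil_output.lower()
    return "is dirty" in text and "not dirty" not in text


def run(cmd: List[str]) -> str:
    """Run a command from an elevated prompt and return its stdout."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Only meaningful on Windows with administrator rights.
    if is_dirty(run(dirty_query_cmd("C:"))):
        print(run(exclude_from_autochk_cmd("C:")))
```

Running `chkntfs /d` restores the default boot-time behaviour if the exclusion needs to be undone later.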
 
LVL 1

Author Comment

by:dapsychous
Plus, he doesn't have important data on it anyway; it's his gaming machine.  He keeps his data on his other computer, so this isn't a data issue, it's a speed and whether-or-not-it-works issue.
 
LVL 30

Expert Comment

by:pgm554
Have you tried a different machine (non-ASUS) or different (non-WD) drives?
Sounds like a bug in the firmware on the ASUS board.

If a dirty bit is being set, it sounds as if the file system is not being shut down properly (which could be a driver and/or firmware issue).
 
LVL 30

Accepted Solution

by:
pgm554 earned 500 total points
Some disks can be flaky depending upon controllers and chipsets.
I had some old WD 4 GB drives back in the 90's that didn't get recognized by certain controller chipsets.
WD does not test their drives with every MOBO and chipset, so try another brand of drive or disk controller.

Also, maybe try turning off DMA or write caching on the controller and see what happens.
 
LVL 44

Expert Comment

by:scrathcyboy
Educate him on the problems with RAID 0 or discontinue support; that is my only other suggestion.  Just because someone is ignorant (i.e. unaware) of the problems with RAID 0 doesn't mean you can solve it.  To run RAID 0 for a gaming machine is just about as stupid as you can get -- a nerd with much reading, but little understanding, perhaps?
 
LVL 8

Expert Comment

by:Disorganise
Clear the dirty flag and rename the chkdsk executable so it can't run anymore :)

I saw this once years ago: NT4 on a Compaq 1600 running RAID 5.  Same sort of thing: we ran a chkdsk over a weekend and came back to find it hadn't moved beyond 32%.  It was our PDC too (which we then moved, of course).
I did a parallel install to it, and then backed up the entire volume (kinda like ghosting the OS without Ghost).  I restored it to an identical bit of kit in the lab and ran chkdsk (our plan was to swap hardware out to fix the issue).  Well, I'd be darned if the clone didn't exhibit the exact same issue.
Unfortunately we never did resolve the problem - even running from the parallel install would hang. However, since identical hardware showed the same hang with the restored volume, we could assume the issue was related to software rather than hardware.

Are you backing up the customer's machine and then restoring after re-jigging the hardware etc.?  If you are, maybe try a fresh install and NO restore.
