Solved

System crashes when running chkdsk on RAID 0

Posted on 2006-06-23
Last Modified: 2008-01-09
OK, I'm at my wit's end with this one.

I have a customer who has brought in a computer that we built for him (a spiffy gaming machine).  He was running an ASUS A8N-SLI Deluxe (nForce4 SLI chipset) with a RAID 0 (striped) array of twin WD 250GB SATA hard disks.

The problem is this:

Due to an unrelated issue (crappy video drivers that have since been replaced), Windows was shut down improperly.  Naturally, chkdsk came up when the computer booted and attempted to scan, as it typically does.  During the first phase (verifying files) the computer locks up at 7%, then restarts a few seconds later.  It is ALWAYS 7%.  I can clear the dirty bit with no problem, and we have done so each time it was previously brought in for this same problem, but as soon as he needs to run a chkdsk or Windows craps out (as it typically does), we'll be right back where we are now (as we have been four times now).

So far we have:

Destroyed and rebuilt the RAID array, with a format and reload
Low-level formatted the drives using WD Data Lifeguard
COMPLETELY tested the drives using WD Data Lifeguard and Microscope
Replaced the drives
Replaced the SATA cables
Replaced the mainboard
Replaced the memory
Replaced the video card
Removed EVERYTHING non-vital to operation, with new memory and new SATA cables
Set the speed cap jumpers on the hard disks to cap speed at 150 MB/s
Enabled spread-spectrum clocking on both drives
Spent 5 hours on the phone with ASUS, who sent us two replacement motherboards, both of which have exhibited the same problem
Attempted to call Microsoft, only to have them demand money for support
Called Western Digital, who, so far, has not said anything useful
Had several heart-to-hearts with Google to find others who have had a similar problem
Ran out of ideas


This customer is driving us absolutely insane.  We have had his computer for about three months now, and he calls us multiple times a day (I probably would, too).  We can't just disable the RAID controller, because he is adamant that he wants to keep his RAID setup.  We can't put any more money into this computer because, factoring in lost labor time, we have already lost hundreds on this deal.  We stand behind our machines, but this one has us dumbfounded.  ASUS claimed they tested the RAID capability and its ability to run a chkdsk; however, it still doesn't work.

Any help with this would be immensely appreciated.

--[adam]

Question by:dapsychous
7 Comments
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 16974287
The answer to this is -- do NOT run RAID 0 on these drives.  RAID 0 is not fault tolerant, a simple CHKDSK can corrupt the entire array, and it is not worth losing your data for an outdated RAID concept that was never reliable.  Use RAID 1 in future, where you have some redundancy, and also do not run CHKDSK when you have any problems.  Check disk can corrupt the system FAT faster than you can say it, and it is not a reliable tool to repair any disk problems.

I recommend you use RAID 1 with 2 disks, and that you do not run check disk on the drives; let them run as is.
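
To illustrate the "not fault tolerant" point, here is a minimal Python sketch of how striping spreads data over a two-disk RAID 0 array. The 64 KB stripe size and the helper names `locate` and `disks_touched` are assumptions made for illustration only; the actual stripe size of the nForce4 array in this question is not stated anywhere in the thread.

```python
# Illustrative model of RAID 0 striping. Not the nForce4 controller's real layout.
STRIPE_SIZE = 64 * 1024   # bytes per stripe unit (assumed value)
NUM_DISKS = 2             # two-disk array, as described in the question


def locate(logical_offset):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    stripe_index = logical_offset // STRIPE_SIZE
    disk = stripe_index % NUM_DISKS
    offset_on_disk = (stripe_index // NUM_DISKS) * STRIPE_SIZE + logical_offset % STRIPE_SIZE
    return disk, offset_on_disk


def disks_touched(start, length):
    """Return the set of disks holding any part of the byte range [start, start+length)."""
    first = start // STRIPE_SIZE
    last = (start + length - 1) // STRIPE_SIZE
    return {stripe_index % NUM_DISKS for stripe_index in range(first, last + 1)}
```

Under these assumed values, any file larger than one stripe unit has pieces on both disks and there is no second copy of anything, so losing either disk means losing the whole volume.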
 
LVL 1

Author Comment

by:dapsychous
ID: 16975287
Well, that's all well and good, but my customer REQUIRES that this be a RAID 0.  We have tried to take him off of it, but he always pitches a fit.  Additionally, I'm not running chkdsk; it's autorunning at boot because of a flagged dirty bit.  Simply clearing the dirty bit won't help, because it just does it again the next time an autocheck is scheduled due to a Windows glitch or something.
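
For anyone wanting to confirm whether the dirty bit is actually set before a reboot, a rough Python sketch follows. It wraps the Windows `fsutil dirty query` command; the helper names are hypothetical, and the parsing assumes the English-language output wording ("is Dirty" / "is NOT Dirty"), which may differ on localized systems.

```python
import subprocess


def parse_dirty_output(text):
    """Interpret the text printed by `fsutil dirty query`.

    Assumes English wording. Returns True (dirty), False (clean),
    or None if the output is not recognised.
    """
    lowered = text.lower()
    if "is not dirty" in lowered:
        return False
    if "is dirty" in lowered:
        return True
    return None


def is_volume_dirty(volume="C:"):
    """Run `fsutil dirty query <volume>` (Windows only, needs an elevated prompt)."""
    result = subprocess.run(
        ["fsutil", "dirty", "query", volume],
        capture_output=True, text=True, check=False,
    )
    return parse_dirty_output(result.stdout)
```

This only reports the flag; it does not explain why the boot-time check hangs at 7%.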
 
LVL 1

Author Comment

by:dapsychous
ID: 16975292
Plus, he doesn't have important data anyway; it's his gaming machine.  He keeps his data on his other computer, so this isn't a data issue, it's a speed and whether-or-not-it-works issue.
 
LVL 30

Expert Comment

by:pgm554
ID: 16976608
Have you tried a different (non-ASUS) machine or different (non-WD) drives?
Sounds like a bug in the firmware on the ASUS board.

If a dirty bit is being set, it sounds as if the file system is not being shut down properly (which could be a driver and/or firmware issue).
 
LVL 30

Accepted Solution

by:
pgm554 earned 1000 total points
ID: 16976622
Some disks can be flaky depending upon controllers and chipsets.
I had some old WD 4 GB drives back in the '90s that didn't get recognized by certain controller chipsets.
WD does not test every motherboard and chipset with their drives, so try another brand of drive or disk controller.

Also, maybe try turning off DMA or write caching on the controller and see what happens.
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 16977244
Educate him on the problems with RAID 0 or discontinue support; that is my only other suggestion.  Just because someone is ignorant (unaware of the problems with RAID 0) doesn't mean you can solve it.  To run RAID 0 for a gaming machine is just about as stupid as you can get -- a nerd with much reading, but little understanding, perhaps?
 
LVL 8

Expert Comment

by:Disorganise
ID: 16977868
Clear the dirty flag and rename the chkdsk executable so it can't run anymore :)

I saw this once years ago: NT4 on a Compaq 1600 running RAID 5.  Same sort of thing: we ran a chkdsk over a weekend and came back to find it hadn't moved beyond 32%.  It was our PDC too (which we then moved, of course).
I did a parallel install to it, and then backed up the entire volume (kinda like ghosting the OS without Ghost).  I restored it to an identical bit of kit in the lab and ran chkdsk (our plan was to swap hardware out to fix the issue).  Well, I'd be darned if the clone didn't exhibit the exact same issue.
Unfortunately we never did resolve the problem; even running from the parallel install would hang.  However, we could assume the issue was related to software rather than hardware.

Are you backing up the customer's machine and then restoring after re-jigging the hardware etc.?  If you are, maybe try a fresh install and NO restore.
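
On the first suggestion above (stopping the boot-time check from running at all), a possibly less invasive route than renaming the executable is the built-in `chkntfs` command: `chkntfs /x` excludes a volume from the boot-time check and `chkntfs /d` restores the defaults. A minimal Python sketch, with hypothetical helper names, assuming it is run from an elevated prompt on Windows:

```python
import subprocess


def exclude_from_boot_check_cmd(volumes):
    """Build the `chkntfs /x` command line for the given volumes, e.g. ["C:"]."""
    if not volumes:
        raise ValueError("at least one volume is required")
    return ["chkntfs", "/x"] + list(volumes)


def restore_default_boot_check_cmd():
    """Build the `chkntfs /d` command, which restores default boot-time checking."""
    return ["chkntfs", "/d"]


def run(cmd):
    """Execute a command built above (Windows only, elevated prompt)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

For example, `run(exclude_from_boot_check_cmd(["C:"]))` would skip the automatic check on C:. This only sidesteps the hang; the volume stays flagged dirty and whatever makes chkdsk lock up is still there.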
Question has a verified solution.
