Computer intermittently losing hard drives

Hello, Experts.

My computer has been intermittently losing 1 or 2 hard drives ever since I built it.  It is a home-built computer running Windows XP. The relevant hardware is an Antec 450W power supply, an Intel D945Pvs system board (82801 GR I/O controller hub with ICH7-R) and 3 Maxtor 250GB SATA hard drives. Over time the drives have been various models, including 6L250S0, 6V250F0, 7V250F0 and 6L25020, with most of the space in a RAID-5 configuration.  About every 2-6 weeks 1 or 2 drives in the RAID fail.

I have an 8GB RAID-0 partition divided equally between the 3 drives for my swap file, and the remaining space on the 3 drives is used in a RAID-5 configuration.  When just 1 drive goes down, the computer reboots because the swap file is no longer available to read or write.  Sometimes it runs just long enough to tell me that a drive from the RAID set is missing and to give me some errors about the swap file.  After the reboot, during POST, the hardware RAID shows 1 drive missing; the computer then comes back up in Windows with the swap file disabled and runs in a degraded state.  Restarting the computer does not solve the problem, but shutting the computer all the way down and powering it back up allows the drive to be seen, and the array gets rebuilt and works again for a few weeks.
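For anyone following the layout, here is a rough sketch of the arithmetic behind it. It assumes nominal 250 GB drives and ignores controller metadata and the decimal/binary GB difference; the function name `raid_layout` is made up for illustration and is not part of any RAID tool.

```python
def raid_layout(drive_gb, n_drives, raid0_total_gb):
    """Split each drive into a RAID-0 slice and a RAID-5 slice.

    Returns (raid0_slice_per_drive, raid5_slice_per_drive, raid5_usable), in GB.
    Assumes equal-size drives and no metadata overhead.
    """
    # RAID-0 stripes across all drives, so each drive holds an equal share.
    raid0_slice = raid0_total_gb / n_drives
    # Whatever is left on each drive goes to the RAID-5 volume.
    raid5_slice = drive_gb - raid0_slice
    # RAID-5 gives up one drive's worth of space to parity.
    raid5_usable = (n_drives - 1) * raid5_slice
    return raid0_slice, raid5_slice, raid5_usable


raid0_slice, raid5_slice, raid5_usable = raid_layout(250, 3, 8)
print(f"RAID-0 slice per drive: {raid0_slice:.2f} GB")
print(f"RAID-5 slice per drive: {raid5_slice:.2f} GB")
print(f"RAID-5 usable capacity: {raid5_usable:.2f} GB")
```

Under those assumptions the RAID-5 volume comes to roughly 494.7 GB. Since every drive carries a slice of both volumes, one drive dropping out takes the RAID-0 swap volume offline and degrades the RAID-5 volume at the same time.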

When 2 drives go down they appear to go down at the exact same time.  The computer reboots and during POST the RAID shows 2 drives missing.  Neither pressing the RESET button nor warm booting solves the problem.  A power cycle does let both drives be seen, and I have to go into the RAID configuration and tell it to recover the volumes.  It then boots to Windows and rebuilds the RAID set.  After this happened about the 3rd time, I started taking notes.

Sometimes it happens when I am using the computer and sometimes it happens when I am not at my computer.
The problem does not follow any particular drive or drives.
The problem does not follow any particular SATA port or ports.
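One way to check observations like these against notes is to tally the dropouts per port, drive and power connector. The sketch below is only an illustration: the rows in `notes` are made-up placeholders (not the actual notes from this machine), and the field names are hypothetical.

```python
from collections import Counter

# Hypothetical note entries, one row per drive that dropped out.
# These values are placeholders for illustration only.
notes = [
    {"event": 1, "port": 0, "drive": "A", "power": "P1"},
    {"event": 2, "port": 2, "drive": "C", "power": "P2"},
    {"event": 2, "port": 1, "drive": "B", "power": "P2"},
    {"event": 3, "port": 1, "drive": "B", "power": "P1"},
]


def tally(rows, field):
    """Count how many dropouts were recorded for each value of `field`."""
    return Counter(row[field] for row in rows)


for field in ("port", "drive", "power"):
    print(field, dict(tally(notes, field)))
```

If one port, drive or power connector dominates its tally, that component is the first suspect; a roughly even spread points toward something shared by all the drives, such as the controller or the power supply.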

The first thing I did was upgrade the BIOS to the latest version. Did not fix the problem.
Then I replaced the SATA cables. Did not fix the problem.
Then I replaced the drive that was failing the most with a brand new drive (still Maxtor, but different model). Did not fix the problem.
Then I replaced the system board with a brand new Intel D945Pvs, and replaced the SATA cables again at the same time. Did not fix the problem.
Then I replaced all 3 drives at once. Did not fix the problem.
Then it happened to the same 2 drives twice in a row, and those drives happened to be on the same SATA power cable coming from the power supply.  So I swapped power connectors on the drives, but it happened again on one of the same drives on the new power connector.
The hottest spot on the exterior of the hottest drive in the cage is 33 degrees Celsius (I can't read the SMART info because of the hardware RAID), so I don't think it is a thermal issue.
I also replaced a couple more hard drives in between these steps with various models.

I am about out of ideas.  Sorry for the long post, but I wanted to include all information I thought relevant.  Please let me know if you have any more ideas or things to try.
rindi Commented:
I'm also one of those who is never again going to buy a new Maxtor if I can help it. All the Maxtors I've seen have been very reliable in one thing: they never last more than 3 months! Having said that, I have no idea whether they have improved their quality in the last 2 years, and I recently heard a rumour that Maxtor was taken over by Seagate, and Seagates are very reliable!

Also, don't use different RAID levels on the same hard disks; rather, get extra disks if you need other RAID levels. There is little if anything to gain by using RAID 0 and RAID 5 on the same three disks. RAID 0 is usually good for speed and RAID 5 for redundancy, but because both arrays use the same hardware at the same time, you are very unlikely to get a higher speed from the RAID 0 array while an active RAID 5 array is competing for the same disks. Get extra disks for the RAID 0 array.

The actual problem could be caused by bad cables. I've had a similar problem with a RAID 5 array on the same controller you are using. It was in a Shuttle XPC, and sometimes a disk would go offline, and then the array would have to be rebuilt. Once, 2 disks went offline and I had to reinstall and restore from backups. I got in touch with Shuttle and they sent me new SATA data cables, and since then I haven't had any more problems. Maybe you need to get high-quality SATA cables. If that doesn't help, I'd suspect a bad power supply...

First, I'd just like to say: in the future, stay away from Maxtor. They used to make good drives, but in the last few years they royally suck and are failing left and right. I think the main issue is how the RAID is set up. In my experience it's a bad idea to create multiple arrays on the same set of drives, and worse if they are different RAID types.  Have you tried creating only 1 array (1 big RAID 5)?
GuruGary (Author) Commented:
Thanks for the tips.  I agree that Maxtor has had some serious reliability issues the past couple of years.  And yes, Seagate did acquire Maxtor a few years ago.  In our shop we keep stacks of the failed drives we have replaced.  About 2-3 years ago, Maxtor was the smallest stack of failed drives.  Over the past 2 years or so, Maxtor has grown to be the largest stack.  250 GB Maxtors just happened to be the only large drives we had a good stock of when I built the computer.

I haven't had problems with mixing RAID-0 and RAID-5 before, but I'll try making it all just one RAID level.  I doubt it is the cables since I have replaced them a few times already ... but if it is still failing after the RAID consolidation and new power supply I guess a 4th set of SATA cables can't hurt.

If there are any more ideas, please let me know!

Mixing RAID levels probably isn't the main cause of the problem, but from my point of view you don't get any advantage from mixing. The main reason for using RAID 0 is speed, which comes from all the disks being accessed at the same time. But if you are using RAID 5 on the same disks, you lose that speed advantage again, because the RAID 5 array will be accessing the disks at the same time. This will reduce the RAID 0 speed to less than the speed of your RAID 5 array.
jamietoner Commented:
If consolidating the drives doesn't fix it, I would suggest replacing the RAID controller or, if it's an integrated controller, just adding a PCI RAID controller.
f-king (IT support technician) Commented:
So you're really only left with a power issue?
Have you tried a different power supply? And do you get any electrical surges or power failures?

GuruGary (Author) Commented:
I have not yet tried a different power supply.  It crossed my mind when I swapped power connectors on the drives, but since the issue didn't follow the power connector and the management software didn't detect any fluctuations in voltages, I didn't replace the power supply at that time.  The computer is plugged into an APC Smart-UPS which has power conditioning.  The power at this location is usually very reliable and clean, and I think any surges, brownouts, etc. would be corrected by the UPS.

For the next step, I will try replacing the power supply.  If that doesn't work then I will replace the system board which has the RAID support built-in.  I'll report back on the progress.
GuruGary (Author) Commented:
The power supply has been replaced, and so far there have been no problems, but it often takes longer than this to fail ... so if the problem does not occur in the next few weeks, I'll assume the problem is fixed.  For now I'll just wait to see if it fails again.
GuruGary (Author) Commented:
I think I jinxed myself.  About 2 hours after I posted that everything had been running fine since the new power supply was installed, it failed.  I took some notes, power cycled, let the RAID rebuild itself and now it is back online again.  Since the RAID-0 has been taken out and everything is running as RAID-5, and the power supply has been replaced, the only other suggestions I think I have left are replacing the SATA cables (which I have done once already) and replacing the RAID controller (which is built into the system board and has been replaced already).  

I'll replace all the SATA cables again with brand new quality cables, and see what happens.  If there are any other suggestions, please let me know.
Instead of replacing the motherboard for its integrated RAID, you may want to try adding a PCI SATA RAID controller. They are usually a lot more stable than integrated, driver-based RAID controllers (most integrated SATA RAID is driver-based). A controller card like this one should work and is about the same price as a D945Pvs.

GuruGary (Author) Commented:
Well, I ended up replacing all the drives (again) with WD RE2 drives, and also replaced all the cables (again) with brand new cables.  I haven't seen the error since, but it has only been 3 weeks so it may still happen.  Either way, I guess I have the information needed to fix the issue.  Thanks for the help and ideas, and hopefully the problem is fixed for good!
Question has a verified solution.
