Computer intermittently losing hard drives
Posted on 2006-11-08
My computer has been intermittently losing 1 or 2 hard drives ever since I built it. It is a home-built computer running Windows XP. The relevant hardware is an Antec 450W power supply, an Intel D945Pvs system board (82801 GR I/O controller hub with ICH7-R), and 3 Maxtor 250GB SATA hard drives. Over time these have included models 6L250S0, 6V250F0, 7V250F0 and 6L25020. Most of the space is in a RAID-5 configuration. Every 2 to 6 weeks, 1 or 2 drives in the RAID fail.
The drives are split into two arrays. An 8GB RAID-0 volume, striped equally across the 3 drives, holds my swap file. The remaining space on the 3 drives is a RAID-5 volume. When just 1 drive goes down, the computer reboots because the swap file can no longer be read or written. Sometimes it runs just long enough to tell me that a drive from the RAID set is missing and to show some errors about the swap file. After the reboot, the hardware RAID shows 1 drive missing during POST. The computer then comes back up in Windows with the swap file disabled and runs in a degraded state. Restarting does not solve the problem. Shutting the computer all the way down and powering it back up lets the drive be seen again, and the array is rebuilt and works for another few weeks.
When 2 drives go down, they appear to go down at exactly the same time. The computer reboots, and during POST the RAID shows 2 drives missing. Pressing the RESET button or warm booting does not solve the problem. A power cycle does let both drives be seen, but I then have to go into the RAID configuration and tell it to recover the volumes. It then boots into Windows and rebuilds the RAID set. After this happened about the 3rd time, I started taking notes.
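Here is a simplified Python sketch of the two cases above. It shows why losing 1 drive kills the RAID-0 swap volume while the RAID-5 volume keeps running degraded. It also shows why losing 2 drives leaves nothing to rebuild from. This is a toy model of one stripe across 3 drives using XOR parity. The function names are hypothetical, and it is not how the Intel RAID firmware actually lays out data on disk.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID-5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_stripe(d0: bytes, d1: bytes) -> List[bytes]:
    """Hypothetical toy stripe on 3 drives: two data blocks plus one parity block."""
    return [d0, d1, xor_blocks([d0, d1])]


def raid5_recover(stripe: List[Optional[bytes]]) -> Optional[List[bytes]]:
    """Rebuild one missing block from the others; give up if 2 or more are missing."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        return None  # 2 drives gone: parity alone cannot reconstruct the data
    rebuilt = list(stripe)
    if missing:
        present = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_blocks(present)  # XOR of survivors = lost block
    return rebuilt


def raid0_read(stripe: List[Optional[bytes]]) -> Optional[bytes]:
    """RAID-0 has no redundancy: any missing drive makes the whole volume unreadable."""
    if any(block is None for block in stripe):
        return None
    return b"".join(stripe)
```

In the 1-drive case the RAID-5 data can still be computed from the two survivors, but the striped swap volume cannot be read at all. In the 2-drive case the array has nothing to compute from. The data only comes back because the drives themselves reappear after the power cycle.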
Sometimes it happens while I am using the computer, and sometimes it happens when I am away from it.
The problem does not follow any particular drive or drives.
The problem does not follow any particular SATA port or ports.
The first thing I did was upgrade the BIOS to the latest version. Did not fix the problem.
Then I replaced the SATA cables. Did not fix the problem.
Then I replaced the drive that was failing the most with a brand new drive (still Maxtor, but different model). Did not fix the problem.
Then I replaced the system board with a brand new Intel D945Pvs, and replaced the SATA cables again at the same time. Did not fix the problem.
Then I replaced all 3 drives at once. Did not fix the problem.
Then it happened to the same 2 drives twice in a row. Both of those drives were on the same SATA power cable from the power supply, so I swapped the power connectors on the drives. It happened again to one of the same drives on its new power connector.
The hottest spot on the outside of the hottest drive in the cage measures 33 degrees Celsius. I can't read the SMART data because of the hardware RAID. Based on that reading, I don't think it is a thermal issue.
In between these steps, I also replaced a couple more hard drives with various other models.
I am about out of ideas. Sorry for the long post, but I wanted to include all the information I thought was relevant. Please let me know if you have any more ideas or things to try.