Solved

Computer intermittently losing hard drives

Posted on 2006-11-08
Last Modified: 2012-08-14
Hello, Experts.

My computer has been intermittently losing 1 or 2 hard drives ever since I built it.  It is a home-built computer running Windows XP. The relevant hardware is an Antec 450W power supply, an Intel D945Pvs system board (82801GR I/O controller hub, ICH7-R) and 3 Maxtor 250GB SATA hard drives of various models, including 6L250S0, 6V250F0, 7V250F0 and 6L25020, with most of the space in a RAID-5 configuration.  About every 2-6 weeks, 1 or 2 drives in the RAID fail.

I have an 8GB RAID-0 volume divided equally between the 3 drives for my swap file, and the remaining space on the 3 drives is used in a RAID-5 configuration.  When just 1 drive goes down, the computer reboots because the swap file is no longer available to read or write.  Sometimes it runs just long enough to tell me that a drive from the RAID set is missing and to give me some errors about the swap file.  After the reboot, during POST, the hardware RAID shows 1 drive missing, and then the computer comes back up in Windows with the swap file disabled and runs in a degraded state.  Restarting the computer does not solve the problem, but shutting the computer all the way down and powering it back up allows the drive to be seen; the array then gets rebuilt and works again for a few weeks.
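As a rough illustration of the layout described above, the arithmetic can be sketched in a few lines of Python. This is only a sketch under stated assumptions: each drive is treated as exactly 250 GB, controller metadata overhead is ignored, and the function name `layout` is made up for this example. It shows why losing a single drive takes out the RAID-0 swap volume (no redundancy) while the RAID-5 volume only degrades.

```python
# Rough capacity arithmetic for a RAID-0 + RAID-5 split on the same drives.
# Assumptions (not taken from the controller's documentation): drives are
# exactly 250 GB each and metadata overhead is ignored.

DRIVES = 3
DRIVE_GB = 250.0
RAID0_TOTAL_GB = 8.0  # swap volume, striped equally across all drives


def layout(drives: int, drive_gb: float, raid0_total_gb: float) -> dict:
    """Return per-drive slice sizes and usable capacity for the split."""
    raid0_slice = raid0_total_gb / drives        # RAID-0 share on each drive
    raid5_slice = drive_gb - raid0_slice         # space left on each drive
    raid5_usable = raid5_slice * (drives - 1)    # one slice's worth holds parity
    return {
        "raid0_slice_gb": raid0_slice,
        "raid5_slice_gb": raid5_slice,
        "raid5_usable_gb": raid5_usable,
        "raid0_drive_failures_tolerated": 0,     # striping only, no redundancy
        "raid5_drive_failures_tolerated": 1,     # single parity
    }


if __name__ == "__main__":
    for name, value in layout(DRIVES, DRIVE_GB, RAID0_TOTAL_GB).items():
        print(f"{name}: {value}")
```

Under these assumptions each drive contributes roughly 2.7 GB to the swap stripe, and any single drive dropping out removes a third of that stripe, which matches the swap-file errors and reboot described.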

When 2 drives go down, they appear to go down at exactly the same time.  The computer reboots and during POST the RAID shows 2 drives missing.  Pressing the RESET button or warm booting does not solve the problem.  A power cycle does let both drives be seen, and I have to go into the RAID configuration and tell it to recover the volumes.  It then boots to Windows and rebuilds the RAID set.  After this happened about the 3rd time, I started taking notes.

Sometimes it happens when I am using the computer and sometimes it happens when I am not at my computer.
The problem does not follow any particular drive or drives.
The problem does not follow any particular SATA port or ports.

The first thing I did was upgrade the BIOS to the latest version. Did not fix the problem.
Then I replaced the SATA cables. Did not fix the problem.
Then I replaced the drive that was failing the most with a brand new drive (still Maxtor, but different model). Did not fix the problem.
Then I replaced the system board with a brand new Intel D945Pvs, and replaced the SATA cables again at the same time. Did not fix the problem.
Then I replaced all 3 drives at once. Did not fix the problem.
Then it happened to the same 2 drives twice in a row, and those drives happened to be on the same SATA power cable coming from the power supply.  So I swapped power connectors on the drives, but it happened again on one of the same drives on the new power connector.
The hottest spot on the exterior of the hottest drive in the cage is 33 degrees Celsius (I can't read the SMART info because of the hardware RAID), so I don't think it is a thermal issue.
I also replaced a couple more hard drives in between these steps with various models.
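One way to check from failure notes whether the problem "follows" a drive, a SATA port, or a power lead is to tally how often each one is involved. The sketch below assumes a simple note format; the field names (`drives`, `ports`, `power_leads`), the function name `involvement`, and the sample events are all hypothetical placeholders, not data from this thread.

```python
from collections import Counter


def involvement(events, key):
    """Fraction of failure events in which each value of `key` appears."""
    counts = Counter()
    for event in events:
        # set() so a value is counted at most once per event
        counts.update(set(event[key]))
    total = len(events)
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


# Hypothetical notes: one entry per failure event, listing what dropped out.
events = [
    {"drives": ["A", "B"], "ports": [0, 1], "power_leads": ["P1"]},
    {"drives": ["C"], "ports": [2], "power_leads": ["P2"]},
    {"drives": ["A"], "ports": [2], "power_leads": ["P2"]},
]

if __name__ == "__main__":
    for key in ("drives", "ports", "power_leads"):
        shares = involvement(events, key)
        # List the most frequently involved component first
        ranked = sorted(shares.items(), key=lambda item: -item[1])
        print(key, ranked)
```

If one value shows up in nearly every event, the fault is likely tied to it; if the shares are spread evenly, as reported here for drives and ports, something common to all of them (power, controller, cabling type) is a more plausible suspect.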

I am about out of ideas.  Sorry for the long post, but I wanted to include all information I thought relevant.  Please let me know if you have any more ideas or things to try.
Question by:GuruGary
12 Comments
 
LVL 34

Expert Comment

by:jamietoner
First, I'd just like to say: in the future, stay away from Maxtor. They used to make good drives, but in the last few years they royally suck and are failing left and right. I think the main issue is how the RAID is set up. In my experience it's a bad idea to create multiple arrays on the same set of drives, and worse if they are different RAID types.  Have you tried creating only 1 array (1 big RAID 5)?
 
LVL 87

Accepted Solution

by:
rindi earned 350 total points
I'm also one of those who is never going to buy a new Maxtor again if I can help it. All the Maxtors I've seen have been very reliable in one thing: they never last more than 3 months! Having said that, I have no idea whether they have improved their quality in the last 2 years, and I recently heard a rumour that Maxtor was taken over by Seagate, and Seagates are very reliable!

Also, don't use different RAID levels on the same hard drives; rather, get extra disks if you need other RAID levels. There isn't much to gain (if anything at all) by using RAID 0 and RAID 5 on the same three disks. RAID 0 is usually good for speed and RAID 5 for redundancy, but because both arrays share the same hardware, you are very unlikely to get higher speed from the RAID 0 array while an active RAID 5 array is using the same disks. Get extra disks for the RAID 0 array.

The actual problem could be caused by bad cables. I've had a similar problem with a RAID 5 array on the same controller you are using. It was in a Shuttle XPC, and sometimes a disk would go offline, and then the array would have to be rebuilt. Once, 2 disks went offline and I had to reinstall and restore from backups. I got in touch with Shuttle and they sent me new SATA data cables, and since then I haven't had any more problems. Maybe you need to get high-quality SATA cables. If that doesn't help, I'd suspect a bad power supply...

 
LVL 10

Author Comment

by:GuruGary
Thanks for the tips.  I agree that Maxtor has had some serious reliability issues the past couple of years.  And yes, Seagate did acquire Maxtor a few years ago.  In our shop we keep stacks of the failed drives we have replaced.  About 2-3 years ago, Maxtor was the smallest stack of failed drives.  Over the past 2 years or so, Maxtor has grown to be the largest stack.  250 GB Maxtors just happened to be the only large drives we had a good stock of when I built the computer.

I haven't had problems with mixing RAID-0 and RAID-5 before, but I'll try making it all just one RAID level.  I doubt it is the cables since I have replaced them a few times already ... but if it is still failing after the RAID consolidation and new power supply I guess a 4th set of SATA cables can't hurt.

If there are any more ideas, please let me know!
 
LVL 87

Expert Comment

by:rindi
Mixing RAID levels probably isn't the main cause of the problem, but in my view you don't get any advantage from mixing. The main reason for using RAID 0 is speed, which comes from all the disks being accessed at the same time. But if you are using RAID 5 on the same disks at the same time, you lose that speed advantage again because the RAID 5 array will be competing for access to the same disks. This will reduce the RAID 0 speed to less than the speed of your RAID 5 array.
 
LVL 34

Assisted Solution

by:jamietoner
jamietoner earned 150 total points
If consolidating the drives doesn't fix it, I would suggest replacing the RAID controller or, if it's an integrated controller, just adding a PCI RAID controller.

 
LVL 15

Expert Comment

by:f-king
Hi
So you're only really left with a power issue?
Have you tried a different power supply? And do you get any electrical surges or power failures?

 
LVL 10

Author Comment

by:GuruGary
I have not yet tried a different power supply.  It crossed my mind when I swapped power connectors on the drives, but since the issue didn't follow the power connector and the management software didn't detect any fluctuations in voltages, I didn't replace the power supply at that time.  The computer is plugged into an APC Smart-UPS which has power conditioning.  The power at this location is usually very reliable and clean, and I think any surges, brownouts, etc. would be corrected by the UPS.

For the next step, I will try replacing the power supply.  If that doesn't work then I will replace the system board which has the RAID support built-in.  I'll report back on the progress.
 
LVL 10

Author Comment

by:GuruGary
The power supply has been replaced, and so far there have been no problems, but it often takes longer than this to fail ... so if the problem does not occur in the next few weeks, I'll assume the problem is fixed.  For now I'll just wait to see if it fails again.
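How long a quiet period is convincing can be estimated with a back-of-the-envelope model. The sketch below assumes failures arrive at random with a constant average rate (an exponential waiting-time model), which is an assumption and not something established in this thread; the 4-week mean is simply the midpoint of the reported 2-6 week range, and both function names are made up for this example.

```python
import math


def prob_quiet_by_chance(weeks_quiet: float, mean_weeks_between_failures: float) -> float:
    """Chance of seeing no failure for `weeks_quiet` weeks even if nothing was fixed.

    Assumes failures occur at random with a constant average rate
    (exponential waiting times). This is a modelling assumption.
    """
    return math.exp(-weeks_quiet / mean_weeks_between_failures)


def weeks_needed(mean_weeks_between_failures: float, max_chance: float) -> float:
    """Quiet weeks required before 'just luck' is less likely than `max_chance`."""
    return -mean_weeks_between_failures * math.log(max_chance)


if __name__ == "__main__":
    mean = 4.0  # assumed midpoint of the reported 2-6 week interval
    for weeks in (1, 3, 6, 12):
        print(f"{weeks:>2} quiet weeks: {prob_quiet_by_chance(weeks, mean):.2f} chance of luck")
    print(f"weeks needed for under 5% chance of luck: {weeks_needed(mean, 0.05):.1f}")
```

Under these assumptions, about three quiet weeks would still happen by chance close to half the time, and it takes roughly twelve quiet weeks before luck becomes an unlikely explanation.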
 
LVL 10

Author Comment

by:GuruGary
I think I jinxed myself.  About 2 hours after I posted that everything had been running fine since the new power supply was installed, it failed.  I took some notes, power cycled, let the RAID rebuild itself and now it is back online again.  Since the RAID-0 has been taken out and everything is running as RAID-5, and the power supply has been replaced, the only other suggestions I think I have left are replacing the SATA cables (which I have done once already) and replacing the RAID controller (which is built into the system board and has been replaced already).  

I'll replace all the SATA cables again with brand new quality cables, and see what happens.  If there are any other suggestions, please let me know.
 
LVL 34

Expert Comment

by:jamietoner
Instead of replacing the motherboard for the integrated RAID, you may want to try adding a PCI SATA RAID controller. They are usually a lot more stable than integrated, driver-based RAID controllers (most integrated SATA RAID is driver-based). A controller card like this one should work and is about the same price as a D945Pvs. http://www.newegg.com/Product/Product.asp?Item=N82E16816115029

 
LVL 10

Author Comment

by:GuruGary
Well, I ended up replacing all the drives (again) with WD RE2 drives, and also replaced all the cables (again) with brand new cables.  I haven't seen the error since, but it has only been 3 weeks so it may still happen.  Either way, I guess I have the information needed to fix the issue.  Thanks for the help and ideas, and hopefully the problem is fixed for good!