Are drives in a Failed state a lost cause or is there hope??

Posted on 2006-06-07
Last Modified: 2010-04-03
Hello all:

   I just had a server go down.  It has a RAID 10 with a hot spare.  There was a bad storm and the UPS backup didn't appear to do its job - or something.  Anyway, the controller is an LSI MegaRAID 300 SATA 8xLP.  I had 5 x 500 GB Seagate SATA drives, 4 in a RAID 10, with one hot spare.  Currently, I show:

Port 0: A1: online 426837 MB
Port 1: A1: online: not responding
Port 3: A2: failed: not responding
Port 4: A2: failed: 476837 MB

I haven't done anything tonight - I guess I want tech support to hold my hand...  Anyway, would a rescan or something possibly reactivate these drives enough for me to boot the server?  It's interesting that these drives show as failed.  Out of all the 15 or so computers, the only drives to have problems during the storm are the ones in the server.  (I assume port 2 was there as well and the drive in port 4 jumped in to take its place...???)

Also, could this be caused by a bad controller??  When adding batteries to the UPS, the technician shorted out the UPS and the server went down hard.  Ever since then, a cold reboot would pop up a question from the LSI RAID controller asking for configuration information.  Selecting the configuration from the disks allowed the server to reboot just fine.  (A replacement controller has been ordered.)  But now, I have beeping and show failed drives.  Any chance these drives can be recovered (or at least one of the drives in A2:)?  (We have made some backups, but haven't been that diligent, so I hope we haven't lost all...)  Comments and suggestions welcome.

Question by:jhuntii

    Accepted Solution

    The only way to tell is to test the drives. Test them individually, one at a time, in another PC without RAID, and after each test put the drive back in its original port on the server. If possible, leave the server off while you do this. With RAID 10 you have a good chance of getting back up with little effort anyway. Use the drive manufacturer's test utility and run only non-destructive tests. If a drive passes, it is probably still OK. If not, replace the defective drive(s) with new ones. If the array still doesn't rebuild after that, replace the SATA cables in the server, and use only high-quality cables. If that doesn't help either, check the controller. A firmware upgrade might help, but replace the controller if necessary.
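    To illustrate why RAID 10 gives a good chance of recovery, and why the listing in the question still won't boot, here is a minimal Python sketch. It assumes the usual RAID 10 rule that the volume is readable only while every mirror span keeps at least one online member. The dictionary layout and function names are hypothetical, made up for illustration, and are not part of any LSI tool. The states are copied from the question, so port 1 is counted as "online" even though it is also listed as not responding.

```python
from typing import Dict, List

# Hypothetical snapshot of the controller listing from the question:
# mirror span name -> reported state of each member drive.
ARRAY_STATE: Dict[str, List[str]] = {
    "A1": ["online", "online"],  # ports 0 and 1 (port 1 also "not responding")
    "A2": ["failed", "failed"],  # ports 3 and 4
}


def span_is_alive(members: List[str]) -> bool:
    """A mirror span is readable if at least one member is online."""
    return any(state == "online" for state in members)


def raid10_is_usable(spans: Dict[str, List[str]]) -> bool:
    """The striped volume needs every mirror span to be readable."""
    return all(span_is_alive(members) for members in spans.values())


def dead_spans(spans: Dict[str, List[str]]) -> List[str]:
    """Names of spans with no online member left."""
    return [name for name, members in spans.items() if not span_is_alive(members)]


if __name__ == "__main__":
    print("usable:", raid10_is_usable(ARRAY_STATE))
    print("spans with no online member:", dead_spans(ARRAY_STATE))
```

    Under that assumption, getting a single good drive back in A2 would be enough to make the volume readable again, which is why testing each drive individually is worth the effort.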

    Author Comment

    Well, I got tech support on the line and we checked the drives for errors using the LSI MegaRAID BIOS utility.  They showed no errors - so we forced them online and they stayed.  He said that generally you don't want to do that - you want to bring one in and rebuild the other.  But since both had failed, there wasn't any way to tell which to use as the master, I guess.  Anyway, the story still doesn't end that happily.  Windows started to boot, but then died.  It turns out that the data still is not there in the second mirrored set.  My guess is that one drive failed, the hot spare took over and began to rebuild, then either that drive or the original failed, then the final one fell out of the array as well.  It may have something to do with the drives - Seagate ST3500641AS - which I think are not the server grade drives...?? and may not report errors or status back to the controller fast enough, so the controller drops them out of the array, causing a rebuild on the hot spare.  On second thought, the main problem was probably the power.  Bigger UPS is coming...  Thanks for your help and quick response.

