Mark Savastano

asked on

RAID 1 both drives listed as global hot spares

I have a Dell T410 server with a PERC S300 RAID controller running RAID 1.  One drive failed and the other is degraded.  I replaced only the failed drive because I had just one spare available.  In the boot utility (Ctrl-R) I configured the new drive as a hot spare.  When I rebooted and hit Ctrl-R again, the Virtual Disk was listed as degraded but both drives were listed as hot spares.

I can still boot to Windows Server 2008 R2. In OpenManage Server Administrator the Virtual Disk name says NONE and the status says Failed.  The only task available is Delete.

Under Physical Disks 0:0:0 has a state of Degraded with failure predicted YES and 0:0:1 has a state of Online with failure predicted NO.  For both the only tasks available are Blink, Unblink and Unassign Global Hot Spare.

The system is running but there is a bad sector in the middle of my SQL database.  I cannot perform a successful backup using the installed ShadowProtect, nor can I get a successful database backup using SQL Server.

Yikes - I've ordered 2 additional drives.  How can I get Server Administrator to recognize the Virtual Disk so I can initiate a rebuild?  It's unclear whether or not the 2nd physical disk is actually functioning as a hot spare.  I did try booting with only the second drive and that failed.  I'm concerned that if drive 0 fails I'm SOL.

I'm looking for very specific advice applicable to this situation and the details described.  Please no woulda, coulda, shoulda comments or general advice that doesn't address the issues that are described.
Thanks
yo_bee

If you have an enterprise-level RAID controller card you should not have to play around with any of your RAID configurations.    I am not 100% sure, but in my experience you pull the failed drive out and replace it with the new drive (hot swappable).  Once the RAID controller sees the new healthy drive, the rebuild starts.  I am not sure whether your change in the boot utility caused the issue you are seeing, but from what I know about RAID I suspect it did.

The question is: how did you find out you had a failed drive as well as a degrading one?   It is not common to see two drives fail or degrade at the same time.
Mark Savastano

ASKER

It sounds like you're not familiar with this type of controller.  Yes, they did both fail at the same time for unknown reasons.  I am seeking suggestions from someone who has experience with the DELL software and hardware that I mentioned.  Thanks.
I hope you have a good recent backup if you can't make one now.
The fact that the other drive shows a predicted failure will prevent you from rebuilding.
You really don't have any options here besides replacing the hardware, creating a new array, and restoring (or rebuilding) the server.
The loss of one drive leaves a single point of failure, and the remaining drive is near failure, so you're in a bad position that you can't easily recover from.
Normally I would recommend RAID 6 or 10 for performance and better fault tolerance when running things like SQL or Exchange, but that controller doesn't support either of them.
In your case I would make a server backup immediately. You can use the free version of Veeam Agent for Windows. Also create recovery media so that once you have recreated the RAID on new drives you can boot the server from the recovery media and restore the whole server from backup.
https://www.veeam.com/windows-cloud-server-backup-agent.html
I've seen several similar problems with the fakeRAID S100 and S300 reporting completely impossible sizes for virtual disks and incorrect status. The best course for the future is to replace it with a proper PERC controller such as the H700.

As it's only RAID1 you could take one disk off the S300 and connect it direct to a SAS/SATA HBA and recover the data as it doesn't need to be de-striped. There's a multitude of programs that could then be run to try to "repair" it or recover files.
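As a rough illustration of what such recovery tools do when they image a failing disk, here is a minimal Python sketch. It is not any particular tool's method: the function name `image_with_skips`, the 4096-byte block size, and the zero-fill policy are all assumptions made for the example. It copies a source block by block and, where a read fails, writes zeros and records the offset so the damaged areas can be examined later.

```python
import io


def image_with_skips(src, dst, total_size, block_size=4096):
    """Copy total_size bytes from src to dst, block by block.

    Blocks that raise OSError on read are written as zeros, and their
    byte offsets are returned. (Hypothetical helper for illustration.)
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads so offsets in the image stay aligned.
        dst.write(data.ljust(want, b"\x00"))
        offset += want
    return bad_offsets


if __name__ == "__main__":
    # Demo on in-memory data; a real run would open the raw disk read-only
    # on a recovery PC booted from a different source.
    source = io.BytesIO(b"x" * 10000)
    target = io.BytesIO()
    print(image_with_skips(source, target, total_size=10000))
```

Working from an image like this, rather than from the failing disk itself, keeps repeated reads off a drive that is already predicting failure.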

It's unfortunate about the bad sector in the SQL database. You can run chkdsk /R on the volume, but that will only "fix" it by replacing the bad sector with zeros as far as the file contents are concerned. You can then run DBCC CHECKDB on the database, but even that can't recreate the missing block. DBCC may tell you which records are affected, though, so at least you could then manually examine those records.
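To see how small the damage from one bad sector can be, note that SQL Server stores data files in 8 KB pages and that DBCC error messages identify pages by number. The sketch below maps a damaged byte range in the data file to the page numbers it overlaps. It assumes you already know the byte offset of the bad sector within the database file (translating a disk sector to a file offset depends on the NTFS layout and is not shown), and the function name `pages_touched` is hypothetical.

```python
PAGE_SIZE = 8192  # SQL Server data file pages are 8 KB, numbered from 0


def pages_touched(file_offset, length=512):
    """Return the page numbers overlapped by a damaged byte range.

    file_offset is the byte offset of the damage within the data file
    (assumed known); length defaults to one 512-byte sector.
    """
    first = file_offset // PAGE_SIZE
    last = (file_offset + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))
```

For example, a single 512-byte sector at file offset 81920 falls entirely inside page 10, so only the rows stored on that one page would need manual examination.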
Thanks for that suggestion, but I have a feeling that won't work since Windows is looking for the S300 controller.  I'll need to do a search on booting the T410 from the SATA controller; I know it's not a simple process.  I saw a suggestion on Spiceworks to unassign the hot spare designation in the controller BIOS.  I was going to give that a go also.  Waiting for my replacement drives to arrive tomorrow morning.
Replacing it with a decent controller was a suggestion for a future rebuild, not a way to fix this. You can't migrate directly from the S300 to the H700 anyway. https://www.dell.com/community/PowerEdge-HDD-SCSI-RAID/Upgrading-T310-from-S300-to-H700i/m-p/4262967 describes the procedure, but note the start of the second paragraph: they've seen even more absurd behaviour with the S300.

Similarly, connecting the disk to a SAS/SATA HBA to pick files off with a recovery tool would be done on a recovery PC booted from a different source.
ASKER CERTIFIED SOLUTION
Mark Savastano

So you're going to do what I suggested and get rid of the S300 in the future?
No other suggestions contributed to this conclusion or were helpful in any way.
Glad I could help.
Sorry but your comments had no bearing on my decision.  I had already replaced the server and restored from a backup.  My only concern with respect to this post was regaining access to the hardware for the purposes of recovering the database.