
Solved

Raid 5 array, 2 drive failure imminent!

Posted on 2011-10-23
Last Modified: 2013-03-18
I have a 6-disk RAID 5 array in an HP ProLiant server. A single drive failed last week. I hot-swapped my last new replacement for it and the rebuild began. Errors were then found on another physical drive and the rebuild stopped; the array is now in a state of 'pending recovery'. So now I have one drive that has not been written to and another about to fail: the drive with errors is now in a 'pending failure' state. My new replacement drives will arrive tomorrow.

Is there any way to take the array offline and force a rebuild to the now failing 'error' disk?
My data is safe but the downtime for system rebuild will be inconvenient, to say the least.
Question by:kmorrison65
5 Comments
 
LVL 47

Expert Comment

by:David
ID: 37014996
Not without a high degree of risk of making things worse. Your system is under stress, and the best thing you can do is leave it alone. Rebuilding a disk drive is as stressful as it gets (other than a power cycle), and your priority is to get the RAID optimal again. Eliminate any unnecessary I/O and wait it out. You have one known-bad drive, and the errors reported on the other drive may even be a false positive.

If it were me, however, and I had vital files whose backups were stale, I would risk creating a backup of the files that would cost me the most to recover.

Think of it this way: if the drive does fail, it might cost you $5000 to get everything recovered by a professional recovery service. If the data that has not been backed up recently is worth more than $5K to you, then kick off a backup now. Otherwise, cross your fingers and disable any automated tasks that might generate significant I/O (like a defrag) until tomorrow.

 

Author Comment

by:kmorrison65
ID: 37015363
The data was backed up 2 days ago. At this point, I'm wondering what my options are to rescue the array, if any. Out of 6 disks, it's running on 5, with 1 of those known bad. What would happen if I brought the system down, swapped the failing drive with a new one and tried to reboot?
 
LVL 47

Expert Comment

by:David
ID: 37015842
You would lose all of your data.
 
LVL 56

Expert Comment

by:andyalder
ID: 37018887
You do not "bring the system down" to replace disks on a Smart Array controller; it is designed to have bad disks replaced live.

With read errors on one drive and another one failed, it won't be able to rebuild. Maybe they aren't unrecoverable read errors, but those are what normally cause rebuild failures.
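
As a rough illustration of why read errors so often stop a rebuild, here is a minimal Python sketch. The drive size (1 TB) and the error rate (one unrecoverable read error per 1e14 bits) are assumed figures, not taken from this thread, and the model treats bit errors as independent, which real drives do not strictly obey. The function name is hypothetical.

```python
import math

def p_read_error_during_rebuild(surviving_drives, drive_bytes, ure_per_bit):
    """Probability of at least one unrecoverable read error while reading
    every bit of every surviving drive, assuming independent bit errors."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Assumed figures, not from the thread: 1 TB drives, 1 error per 1e14 bits.
# A 6-disk RAID 5 rebuild has to read all 5 surviving drives in full.
p = p_read_error_during_rebuild(surviving_drives=5, drive_bytes=1e12, ure_per_bit=1e-14)
print(f"Chance of hitting a read error during the rebuild: {p:.0%}")
```

Under these assumptions the chance comes out to roughly one in three, which is why a marginal drive among the survivors is such a common cause of a stalled rebuild.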
 
LVL 32

Accepted Solution

by:aleghart
ID: 37019888
> if I brought the system down, swapped the failing drive  with a new one and tried to reboot

Then you would have a RAID 5 array with two disks lost. That means a failed array, and at that point it's a trip to the data recovery folks.
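
To see why one lost disk is survivable and two are not, here is a small Python sketch of the XOR parity idea behind RAID 5. It models a single stripe with made-up block contents; a real controller rotates parity across disks and works on much larger blocks, so treat the names and layout here as illustrative assumptions only.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 6-disk set: 5 data blocks plus 1 parity block.
data = [bytes([value] * 4) for value in (0x11, 0x22, 0x33, 0x44, 0x55)]
parity = xor_blocks(data)
stripe = data + [parity]

def xor_of_survivors(stripe, missing):
    """XOR every block whose disk index is not in `missing`."""
    return xor_blocks([b for i, b in enumerate(stripe) if i not in missing])

# One disk lost: the XOR of the survivors is exactly the missing block.
rebuilt = xor_of_survivors(stripe, {2})
print("one disk lost, block recovered:", rebuilt == stripe[2])

# Two disks lost: the survivors only give the XOR of the two missing blocks,
# which is not enough to tell what either block contained on its own.
combined = xor_of_survivors(stripe, {1, 3})
print("two disks lost, block recovered:", combined == stripe[1])
```

With two blocks missing, the parity equation has two unknowns and only one constraint, so the controller cannot reconstruct either block.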

With the rebuild in progress, you may have to leave it alone. Even running at 100%, it could take 24-36 hours if you are using large (>1TB) drives in RAID 5.
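
A back-of-the-envelope check of that time range, as a Python sketch. The sustained rebuild rates used here (8 to 12 MB/s) are assumptions chosen for illustration, not figures from any controller specification; the real rate depends on the controller, the drives, the rebuild priority, and competing I/O.

```python
def rebuild_hours(drive_bytes, rebuild_bytes_per_sec):
    """Hours needed to write one full replacement drive at a steady rate."""
    return drive_bytes / rebuild_bytes_per_sec / 3600

# Assumed: a 1 TB replacement drive and a few hypothetical sustained rates.
for rate_mb in (8, 10, 12):
    hours = rebuild_hours(1e12, rate_mb * 1e6)
    print(f"{rate_mb} MB/s -> about {hours:.0f} hours")
```

At these assumed rates a 1 TB drive lands in roughly the one-to-one-and-a-half-day range quoted above, and anything that steals I/O from the rebuild stretches it further.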

Check your rebuild priority. I'd set it to the maximum, so the rebuild will not get delayed by I/O from users or services (as long as changing it doesn't force a reboot). If possible, remove all access, so the controller is doing nothing but servicing the drives, not user requests.
