Solved

HP array RAID 5 nightmare

Posted on 2010-08-19
Last Modified: 2012-05-10
On my MSA30, in a RAID 5 configuration, I have two drive failures. Data is still accessible, but how can I resolve this issue? I have no redundancy now. My online spare did not initialize for some reason. See my correspondence from HP below.
Here we have the Level 2 action plan:


Unfortunately, this is a RAID 5 LUN with at least two drives with problems:
disk 8:3 is in the status "replacement"
disk 8:1 has some errors as well

It is important to clarify that the LUN may need to be recreated and the data restored from backup, as there is a good chance that the following options will not succeed:

The options are:

1. Try to reseat drive 8:1: unseat it, wait 2 minutes, and put it back in. If the system recognizes it without the errors, it may start to rebuild.
2. Reseat drive 8:3: unseat the drive, wait at least 2 minutes, and put it back.
3. Unseat drive 8:3 and wait until it is reported as missing or failed, in order to try to force a spare drive to take over.
4. Replace drive 8:3 with a new one (keep the original drive handy if needed).

I would not recommend risking the rest of the volumes on this enclosure.
If none of this solves the issue, the best recommendation is to recreate that LUN.
Two drives are going to be sent onsite anyway to replace drives 8:1 and 8:3; drive 8:8 is not to be replaced for now. If any of these steps makes the rebuild process start, both drives still need replacement.
Question by:onebytesystem
LVL 63

Expert Comment

by:SysExpert
ID: 33478607
I would first see if you can back up any critical data, since any change could cause a total loss of data.

You should probably have the drives within 24 hours, so I would work on making sure you have up-to-date backups, and keep all backups made in the last week or so.

Double-check all backup logs to make sure you have complete backups.

I hope this helps !
 
LVL 5

Expert Comment

by:shadowmantx
ID: 33478842
In the past I have been able to Ghost a drive to a spare drive just to save the data. Ghost has a "force" clone switch that will copy even though there are errors. You may have to attach a USB IDE/SATA adapter to the server, since you need the RAID array to read the original drive. I have found that the USB IDE cables work best. There is also an adapter that converts SATA to IDE.

Good luck.
 
LVL 12

Expert Comment

by:Rant32
ID: 33479211
The MSA will not recognize a cloned drive (or any drive that has been offline) as holding the original data. It will re-initialize it anyway (under normal circumstances).

My guess is that the only reason your array is still working is that 8:1 is not completely dead, but the MSA does not trust the drive enough to rebuild the data off it (if there are unrecoverable read errors). Removing the drive (tip #1) will fail the array, because then two drives are missing.

In any case, the corresponding stripes are already lost. Backup and restore is the best way to go.
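To make the point about lost stripes concrete, here is a minimal Python sketch of RAID 5 single-parity math. It assumes a simplified model where one stripe holds a few small data blocks plus one XOR parity block; a real MSA30 controller rotates parity across drives and uses much larger blocks, and the helper names below are hypothetical. It shows why a stripe survives one missing member but not two.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block (one RAID 5-style stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe):
    """Rebuild a stripe whose missing members are None.

    With exactly one member missing, the XOR of the survivors equals the
    missing block. With two or more missing, the information is gone.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return stripe
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members missing: stripe unrecoverable")
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

one_lost = list(stripe)
one_lost[1] = None  # one drive gone: still recoverable
print(reconstruct(one_lost)[1])  # b'BBBB'

two_lost = list(stripe)
two_lost[0] = None  # failed drive
two_lost[2] = None  # unreadable block on a second drive
try:
    reconstruct(two_lost)
except ValueError as err:
    print(err)  # 2 members missing: stripe unrecoverable
```

In this model, a stripe where one drive has failed and a second drive returns a read error has two missing members, which matches the situation described above.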

Author Comment

by:onebytesystem
ID: 33479596
Do you think that the minute one drive fails, the spare will come online?
 
LVL 17

Expert Comment

by:sgsm81
ID: 33479712
Download, install, and run Symantec BESR (Backup Exec System Recovery), and use it to take a full image of the server to a NAS or removable drive.

Test the image.

Reformat and recreate the array, then restore the data.
 
LVL 12

Expert Comment

by:Rant32
ID: 33479949
onebytesystem: I do not know what you mean by that comment; please explain. You already have a failed drive and another one that is showing errors (what kind of errors?).

Your online spare does not initialize because of errors on drive 8:1.
Drive 8:3 is recognized as a new drive, it has no data.

Ergo, you cannot rebuild the array or increase fault tolerance. Any more failure will lose the array.

And I also think the HP engineer gave you some VERY BAD advice about removing drive 8:1 while the array was still operational, because the array would be lost completely. Do the experts agree?

Also: is the backup running yet?

I must add that I've seen randomly failing drives in MSA30 enclosures because of the following (as it was explained to me by HP): the firmware on a disk drive can be too new for the MSA drive enclosure (e.g. you put a brand-new disk drive in an enclosure with old firmware).
The drive will work for some time, but at some point it does not respond to commands the way the MSA expects, and the drive gets marked as failed. This can happen at seemingly random times, and it will just happen again if you replace the drive. The solution for that was to upgrade the MSA firmware on both controllers, but I left that job to the real specialists. I can't remember any version numbers, sorry. It's worth mentioning this to the HP rep as well; maybe he can help you.
 
LVL 8

Accepted Solution

by:PaperTiger
ID: 33484837
I wouldn't sweat it too much at this point.

It's likely that drive 8:3 failed but your 8:1, which I assume is your hot spare, didn't come online for whatever reason.

Technically replacing both drives one at a time should be fine but I like to go with a safer route of:

1. Run a cold image backup immediately so that you can preserve the data.
2. After that, replace both drives, but one at a time.
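Why the image comes first and why the drives go in one at a time can be illustrated with another minimal sketch. It assumes a toy array in which each drive is a list of per-stripe blocks, parity sits on the last drive for simplicity (real RAID 5 rotates it, but the XOR math is the same), and None marks a block a drive cannot read. The function names are hypothetical, not an HP tool. Rebuilding a replaced member needs every other member to be readable on each stripe, so any read error on another drive loses that stripe.

```python
from functools import reduce
from typing import List, Optional

Block = Optional[bytes]  # None = this drive could not read the block


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_member(drives: List[List[Block]], target: int):
    """Recompute every block of drives[target] from the other members.

    drives[d][s] is the block drive d holds for stripe s (data or parity).
    Returns (rebuilt_blocks, lost_stripes). A stripe is lost when any other
    member cannot return its block for that stripe.
    """
    rebuilt: List[Block] = []
    lost: List[int] = []
    for s in range(len(drives[0])):
        others = [drives[d][s] for d in range(len(drives)) if d != target]
        if any(block is None for block in others):
            rebuilt.append(None)
            lost.append(s)
        else:
            rebuilt.append(xor_blocks(others))
    return rebuilt, lost


# Three data drives plus a parity drive, two stripes each.
drives = [[b"AA", b"DD"], [b"BB", b"EE"], [b"CC", b"FF"], [b"@@", b"GG"]]

# Replace drive 3 while all other members are healthy: full rebuild.
print(rebuild_member(drives, 3))  # ([b'@@', b'GG'], [])

# Same rebuild, but drive 1 cannot read stripe 1: that stripe is lost.
drives[1][1] = None
print(rebuild_member(drives, 3))  # ([b'@@', None], [1])
```

Replacing a second drive before the first rebuild finishes would leave two missing members on every stripe, which is the same failure the single-stripe case shows.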
 

Author Closing Comment

by:onebytesystem
ID: 33827168
Thank you all. Issue now resolved.