HP array Raid 5 nightmare

Posted on 2010-08-19
Last Modified: 2012-05-10
On my MSA30, in a RAID 5 configuration, I have 2 drive failures. Data is still accessible, but how can I resolve this issue? I have no redundancy now. My online spare did not initialize for some reason. See my correspondence from HP below.
Here we have the Level 2 action plan:

Unfortunately this is a RAID 5 LUN with at least 2 drives with problems:
disk 8:3 is in the status "replacement"
disk 8:1 has some errors as well

It is important to clarify that the LUN may need to be recreated and the data restored from backup, as chances are that the following options will not succeed.

The options are:

1. Try to reseat drive 8:1: unseat it, wait 2 minutes, and put it back in. If the system recognizes it without the errors, it may start to rebuild.
2. Reseat drive 8:3: unseat the drive, wait at least 2 minutes, and put it back.
3. Unseat drive 8:3 and wait until it is reported as missing or failed, in order to try to force a spare drive to take over.
4. Replace drive 8:3 with a new one (keep the original drive handy in case it is needed).

I would not recommend risking the rest of the volumes on this enclosure.
If none of these steps solves the issue, the best recommendation is to recreate that LUN.
2 drives are going to be sent onsite anyway to replace drives 8:1 and 8:3; drive 8:8 is not to be replaced for now. If any of these steps makes the rebuild process start, both drives still need replacement.
Question by: onebytesystem
LVL 63

Expert Comment

ID: 33478607
I would first see if you can back up any critical data, since any change could cause a total loss of data.

You should probably have the drives within 24 hours, so I would work on making sure you have up-to-date backups, and keep all backups done in the last week or so.

Double-check all backup logs to make sure you have complete backups.

I hope this helps!

Expert Comment

ID: 33478842
In the past I have been able to Ghost a drive to a spare drive just to save the data. Ghost has a "force" clone switch that will copy even though there are errors. You may have to attach a USB IDE/SATA adapter to the server, since you need the RAID array to read the original drive. I have found that the USB IDE cables work best. There is an adapter that will convert SATA to IDE.

Good luck.
LVL 12

Expert Comment

ID: 33479211
The MSA will not recognize a cloned drive (or any drive that has been offline) as the original data. It'll re-initialize it anyway (under normal circumstances...)

My guess is that the only reason your array is still working is because 8:1 is not completely dead, but the MSA does not trust the drive enough to rebuild the data off it (if there are unrecoverable read errors). Removing the drive (tip #1) will fail the array because now 2 drives are missing.

In any case, the corresponding stripes are already lost. Backup and restore is the best way to go.
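
To illustrate why a second bad drive is fatal for RAID 5, here is a minimal Python sketch of single-parity striping. It is a simplified model, not how the MSA firmware actually works, and the function names (`xor_blocks`, `make_stripe`, `rebuild`) are made up for the example. Each stripe holds one XOR parity block, so any one missing block can be recomputed from the survivors, but two missing blocks cannot.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (single parity, RAID 5 style)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Fill in one missing block (marked None); refuse if more than one is missing."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("two or more blocks missing: stripe is unrecoverable")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Hypothetical 4-drive stripe: 3 data blocks + 1 parity block.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

# One drive gone: the block is recomputed from the other three.
one_lost = [stripe[0], None, stripe[2], stripe[3]]
print(rebuild(one_lost) == stripe)

# Two drives gone (or one gone plus an unreadable sector on another): no recovery.
two_lost = [None, stripe[1], None, stripe[3]]
try:
    rebuild(two_lost)
except ValueError as err:
    print(err)
```

The same arithmetic applies per stripe, which is why an unrecoverable read error on 8:1 only loses the stripes it touches while 8:3 is absent, and why pulling 8:1 entirely would lose everything.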
Author Comment

ID: 33479596
Do you think that the minute 1 drive fails, the spare will come online?
LVL 17

Expert Comment

ID: 33479712
Download, install, and run Symantec BESR, and use it to take a full image of the server to a NAS or removable drive.

Then reformat and recreate the array, and restore the data.
LVL 12

Expert Comment

ID: 33479949
onebytesystem: I do not know what you mean by that comment, please explain. You already have a failed drive and another one that is showing errors (what kind of errors?).

Your online spare does not initialize because of errors on drive 8:1.
Drive 8:3 is recognized as a new drive, it has no data.

Ergo, you cannot rebuild the array or increase fault tolerance. Any further failure will lose the array.

And I also think the HP engineer gave you some VERY BAD advice about removing drive 8:1 while the array was still operational, because the array would be lost completely. Do the experts agree?

Also: is the backup running yet?

I must add that I've seen drives failing at random in MSA30 enclosures for the following reason (as HP explained it to me): the firmware on a disk drive can be too new for the MSA drive enclosure (e.g. you put a brand new disk drive in an enclosure with old firmware).
The drive will work for some time, but at some point the disk drive does not respond to commands the way the MSA expects it to, and the drive gets failed. This can happen at seemingly random times, and it will just happen again if you replace the drive. The solution for that was to upgrade the MSA firmware on both controllers, but I left that job to the real specialists. I can't remember any version numbers, sorry. Better mention this to the HP rep as well, maybe he can help you.

Accepted Solution

PaperTiger earned 500 total points
ID: 33484837
I wouldn't sweat it too much at this point.

It's likely that drive 8:3 failed but your 8:1, which I assume is your hot spare, didn't come online for whatever reason.

Technically, replacing both drives one at a time should be fine, but I like to go with the safer route of:

1. Run a cold image backup immediately so that you can preserve the data.
2. After that, replace both drives, but one at a time.

Author Closing Comment

ID: 33827168
Thank you all. Issue now resolved.


Question has a verified solution.
