Solved

HP array RAID 5 nightmare

Posted on 2010-08-19
Last Modified: 2012-05-10
On my MSA30, in a RAID 5 configuration, I have 2 drive failures. Data is still accessible, but how can I resolve this issue? I have no redundancy now. My online spare did not initialize for some reason. See my correspondence from HP below.
Here we have the Level 2 action plan:


Unfortunately this is a RAID 5 LUN with at least 2 drives with problems:
disk 8:3 has the status "replacement"
disk 8:1 has some errors as well

It is important to clarify that the LUN may need to be recreated and the data restored from backup, as there is a good chance that the following options will not succeed:

The options are:

1. Reseat drive 8:1: unseat it, wait 2 minutes, and put it back in. If the system recognizes it without the errors, it may start to rebuild.
2. Reseat drive 8:3: unseat the drive, wait at least 2 minutes, and put it back.
3. Unseat drive 8:3 and wait until it is reported as missing or failed, to try to force a spare drive to take over.
4. Replace drive 8:3 with a new one (keep the original drive handy in case it is needed).

I would not recommend risking the rest of the volumes on this enclosure.
If none of these steps solves the issue, the best recommendation is to recreate that LUN.
Two drives are going to be sent onsite anyway to replace drives 8:1 and 8:3; drive 8:8 is not to be replaced for now. If any of these steps makes the rebuild process start, both drives still need replacement.
Question by:onebytesystem
8 Comments
 
LVL 63

Expert Comment

by:SysExpert
I would first see if you can back up any critical data, since any change could cause a total loss of data.

You should probably have the drives within 24 hours, so I would work on making sure you have up-to-date backups, and keep all backups made in the last week or so.

Double-check all backup logs to make sure you have complete backups.
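
Besides reading the logs, one way to check a backup is to restore a sample folder somewhere else and compare it with the live copy. Here is a minimal sketch in Python, assuming both trees are reachable as ordinary directories. The function names and paths are made up for illustration and are not part of any backup product. Note that it reads every source file, so on a degraded array it is probably wise to point it at a small sample only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, restored_root):
    """Return (relative path, problem) pairs for files that are missing
    or different under restored_root."""
    source_root, restored_root = Path(source_root), Path(restored_root)
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = restored_root / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(src) != sha256_of(dst):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

An empty list from `compare_trees("D:/data/sample", "E:/restore-test/sample")` (hypothetical paths) would mean every sampled file came back byte for byte.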

I hope this helps!
 
LVL 5

Expert Comment

by:shadowmantx
In the past I have been able to Ghost a drive to a spare drive just to save the data. Ghost has a "force" clone switch that will copy even when there are errors. You may have to attach a USB IDE/SATA adapter to the server, since you need the RAID array to read the original drive. I have found that the USB IDE cables work best. There is an adapter that will convert SATA to IDE.

Good luck.
 
LVL 12

Expert Comment

by:Rant32
The MSA will not recognize a cloned drive (or any drive that has been offline) as holding the original data. It will re-initialize it anyway (under normal circumstances...)

My guess is that the only reason your array is still working is that 8:1 is not completely dead, but the MSA does not trust the drive enough to rebuild the data from it (if there are unrecoverable read errors). Removing the drive (tip #1) will fail the array, because then 2 drives are missing.

In any case, the corresponding stripes are already lost. Backup and restore is the best way to go.
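
To see why a second bad member is fatal: RAID 5 parity is the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the rest, but two cannot. Here is a minimal sketch in Python, assuming a made-up stripe of three data blocks plus parity. The block contents and names are illustrative only, not what the MSA actually stores.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive RAID 5 set: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One block missing: XOR of the surviving blocks and the parity gives it back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two blocks missing: the survivors only yield the XOR of the two lost blocks,
# which is not enough to recover either one on its own.
leftover = xor_blocks([data[2], parity])
assert leftover == xor_blocks([data[0], data[1]])
```

The same arithmetic applies per stripe, which is why an unreadable sector on 8:1 costs only the stripes it touches while 8:3 holds no data.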
 

Author Comment

by:onebytesystem
Do you think the spare will come online the minute one drive fails?

 
LVL 17

Expert Comment

by:sgsm81
Download, install and run Symantec BESR, and use it to take a full image of the server to a NAS or removable drive.

Test the image.

Reformat and recreate the array, then restore the data.
 
LVL 12

Expert Comment

by:Rant32
onebytesystem: I do not know what you mean by that comment, please explain. You already have a failed drive and another one that is showing errors (what kind of errors?).

Your online spare does not initialize because of errors on drive 8:1.
Drive 8:3 is recognized as a new drive, it has no data.

Ergo, you cannot rebuild the array or increase fault tolerance. Any further failure will lose the array.

And I also think the HP engineer gave you some VERY BAD advice about removing drive 8:1 while the array was still operational, because the array would be lost completely. Do the experts agree?

Also: is the backup running yet?

I must add that I've seen drives fail at random in MSA30 enclosures for the following reason (as HP explained it to me): the firmware on a disk drive can be too new for the MSA drive enclosure (e.g. you put a brand-new disk drive in an enclosure with old firmware).
The drive will work for some time, but at some point it does not respond to commands the way the MSA expects, and the drive gets failed. This can happen at seemingly random times, and it will just happen again if you replace the drive. The solution for that was to upgrade the MSA firmware on both controllers, but I left that job to the real specialists. I can't remember any version numbers, sorry. Better mention this to the HP rep as well; maybe he can help you.
 
LVL 8

Accepted Solution

by:PaperTiger (earned 500 total points)
I wouldn't sweat it too much at this point.

It's likely that drive 8:3 failed but your 8:1, which I assume is your hot spare, didn't come online for whatever reason.

Technically, replacing both drives one at a time should be fine, but I like to go the safer route:

1. Run a cold image backup immediately so that you preserve the data.
2. After that, replace both drives, but one at a time.
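
The "one at a time" part matters because a RAID 5 set can run with only one member out of action, and a replacement counts as out of action until its rebuild finishes. A toy model in Python makes the ordering explicit. The class is hypothetical and only counts members; it ignores read errors on the surviving drives, which in this thread are the real risk.

```python
class Raid5Model:
    """Toy RAID 5 set: stays online while at most one member lacks valid data."""

    def __init__(self):
        self.not_valid = set()  # slots that are pulled or still rebuilding

    def pull(self, slot):
        # The drive is removed; any replacement holds no valid data yet.
        self.not_valid.add(slot)

    def rebuild_done(self, slot):
        # The replacement in this slot has been fully rebuilt.
        self.not_valid.discard(slot)

    def is_online(self):
        return len(self.not_valid) <= 1


# Waiting for the first rebuild before touching the second drive.
safe = Raid5Model()
safe.pull("8:3")
safe.rebuild_done("8:3")
safe.pull("8:1")
assert safe.is_online()

# Pulling the second drive while the first is still rebuilding.
rushed = Raid5Model()
rushed.pull("8:3")
rushed.pull("8:1")
assert not rushed.is_online()
```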
 

Author Closing Comment

by:onebytesystem
Thank you all. The issue is now resolved.