Solved

HP array RAID 5 nightmare

Posted on 2010-08-19
Last Modified: 2012-05-10
On my MSA30, in a RAID 5 configuration, I have two drive failures. The data is still accessible, but how can I resolve this issue? I have no redundancy now. My online spare did not initialize for some reason. See my correspondence from HP below.
Here we have the Level 2 action plan:


Unfortunately, this is a RAID 5 LUN with at least 2 drives with problems:
disk 8:3 is in the status "replacement"
disk 8:1 has some errors as well

It is important to clarify that the LUN may need to be recreated and the data restored from backup, as the following options may well not succeed.

The options are:

1. Try to reseat drive 8:1: unseat it, wait 2 minutes, and put it back in. If the system recognizes it without the errors, it may start to rebuild.
2. Reseat drive 8:3: unseat the drive, wait at least 2 minutes, and put it back.
3. Unseat drive 8:3 and wait until it is reported as missing or failed, to try to force a spare drive to take over.
4. Replace drive 8:3 with a new one (keep the original drive handy in case it is needed).

I would not recommend risking the rest of the volumes on this enclosure.
If none of these options solves the issue, the best recommendation is to recreate that LUN.
Two drives are going to be sent onsite anyway to replace drives 8:1 and 8:3; drive 8:8 is not to be replaced for now. Even if one of these steps makes the rebuild process start, both drives need replacement.
Question by:onebytesystem
8 Comments
 
LVL 63

Expert Comment

by:SysExpert
ID: 33478607
I would first see if you can back up any critical data, since any change could cause a total loss of data.

You should probably have the drives within 24 hours, so I would work on making sure you have up-to-date backups, and keep all backups made in the last week or so.

Double-check all backup logs to make sure you have complete backups.
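
Beyond reading the logs, one way to spot-check a backup is to compare file checksums. The following is only a minimal sketch in Python, assuming a file-level backup that is reachable as an ordinary directory tree; it does not apply to image backups, and the function names and paths are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(source_root: Path, backup_root: Path) -> List[str]:
    """List source files that are missing from the backup or differ from it."""
    problems = []
    for source_file in sorted(p for p in source_root.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_root)
        backup_file = backup_root / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems


if __name__ == "__main__":
    # Hypothetical paths; substitute the real data volume and backup location.
    for problem in find_backup_problems(Path("/data"), Path("/mnt/backup/data")):
        print(problem)
```

An empty result only means that every source file has a byte-identical copy in the backup tree. Note that the check reads every source file once, which puts extra load on the degraded array.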

I hope this helps!
 
LVL 5

Expert Comment

by:shadowmantx
ID: 33478842
I have been able in the past to Ghost a drive to a spare drive just to save the data. Ghost has a "force" clone switch that will copy even though there are errors. You may have to attach a USB IDE/SATA adapter to the server, since you need the RAID array to read the original drive. I have found that the USB IDE cables work best. There is an adapter that will convert SATA to IDE.

Good luck.
 
LVL 12

Expert Comment

by:Rant32
ID: 33479211
The MSA will not recognize a cloned drive (or any drive that has been offline) as the original data. It'll re-initialize it anyway (under normal circumstances...)

My guess is that the only reason your array is still working is because 8:1 is not completely dead, but the MSA does not trust the drive enough to rebuild the data off it (if there are unrecoverable read errors). Removing the drive (tip #1) will fail the array because now 2 drives are missing.

In any case, the corresponding stripes are already lost. Backup and restore is the best way to go.
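
As a rough illustration of why a second missing or unreadable drive fails a RAID 5 array, here is a minimal Python sketch of single-parity reconstruction. It is a simplification and not how the MSA firmware works: it handles one stripe, keeps the parity block in a fixed position (real RAID 5 rotates it across the drives), and the function names are made up for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks followed by one parity block (XOR of the data)."""
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a stripe in which unreadable blocks are marked as None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(
            f"{len(missing)} blocks unreadable; single parity can rebuild only one"
        )
    rebuilt = list(stripe)
    if missing:
        # The XOR of all surviving blocks equals the one missing block.
        survivors = [block for block in stripe if block is not None]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

# One drive gone: the block is recovered from the other three.
assert reconstruct([stripe[0], None, stripe[2], stripe[3]]) == stripe

# Two drives gone (or one gone plus a read error on another): nothing to XOR against.
try:
    reconstruct([None, None, stripe[2], stripe[3]])
except ValueError as error:
    print(error)
```

The same limit applies stripe by stripe: with one drive already empty, any stripe where a second drive returns an unrecoverable read error cannot be rebuilt.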
 

Author Comment

by:onebytesystem
ID: 33479596
Do you think that the minute one drive fails, the spare will come online?

 
LVL 17

Expert Comment

by:sgsm81
ID: 33479712
Download, install, and run Symantec BESR, and use it to take a full image of the server to a NAS or removable drive.

Test the image.

Reformat and recreate the array, then restore the data.
 
LVL 12

Expert Comment

by:Rant32
ID: 33479949
onebytesystem: I do not know what you mean by that comment; please explain. You already have a failed drive and another one that is showing errors (what kind of errors?).

Your online spare does not initialize because of errors on drive 8:1.
Drive 8:3 is recognized as a new drive; it has no data.

Ergo, you cannot rebuild the array or increase fault tolerance. Any further failure will lose the array.

And I also think the HP engineer gave you some VERY BAD advice about removing drive 8:1 while the array was still operational, because the array would be lost completely. Do the experts agree?

Also: is the backup running yet?

I must add that I've seen randomly failing drives in MSA30 enclosures because of the following (as HP explained it to me): the firmware on a disk drive can be too new for the MSA drive enclosure (e.g. you put a brand-new disk drive in an enclosure with old firmware).
The drive will work for some time, but at some point the disk drive does not respond to commands the way the MSA expects it to, and the drive gets failed. This can happen at seemingly random times and will just happen again if you replace the drive. The solution for that was to upgrade the MSA firmware on both controllers, but I left that job to the real specialists. I can't remember any version numbers, sorry. Better mention this to the HP rep as well; maybe he can help you.
 
LVL 8

Accepted Solution

by:PaperTiger (earned 500 total points)
ID: 33484837
I wouldn't sweat it too much at this point.

It's likely that drive 8:3 failed but your 8:1, which I assume is your hot spare, didn't come online for whatever reason.

Technically, replacing both drives one at a time should be fine, but I like to go with the safer route of:

1. Run a cold image backup immediately so that you can preserve the data.
2. After that, replace both drives, but one at a time.
 

Author Closing Comment

by:onebytesystem
ID: 33827168
Thank you all. Issue now resolved.