

HP array RAID 5 nightmare

Posted on 2010-08-19
Medium Priority
Last Modified: 2012-05-10
On my MSA30 (RAID 5 configuration) I have two drive failures. The data is still accessible, but how can I resolve this? I have no redundancy now: my online spare did not initialize for some reason. See my correspondence from HP below.
Here is the Level 2 action plan:

Unfortunately this is a RAID 5 LUN with at least two drives with problems:
disk 8:3 has the status "replacement"
disk 8:1 has some errors as well

It is important to clarify that the LUN may need to be recreated and the data restored from backup, as there is a good chance the following options will not succeed.

The options are:

1. Try to reseat drive 8:1: unseat it, wait 2 minutes, and put it back in. If the system recognizes it without the errors, it may start to rebuild.
2. Reseat drive 8:3: unseat the drive, wait at least 2 minutes, and put it back.
3. Unseat drive 8:3 and wait until it is reported as missing or failed, to try to force a spare drive to take over.
4. Replace drive 8:3 with a new one (keep the original drive handy in case it is needed).

I would not recommend risking the rest of the volumes in this enclosure.
If none of this solves the issue, the best recommendation is to recreate that LUN.
Two drives are being sent onsite anyway to replace drives 8:1 and 8:3; drive 8:8 does not need to be replaced for now. If any of these steps gets the rebuild process to start, both drives still need replacement.
Question by: onebytesystem
LVL 63

Expert Comment

ID: 33478607
I would first see if you can back up any critical data, since any change could cause a total loss of data.

You should probably have the drives within 24 hours, so I would work on making sure you have up-to-date backups, and keep all backups made in the last week or so.

Double-check all backup logs to make sure your backups are complete.

I hope this helps !

Expert Comment

ID: 33478842
In the past I have been able to Ghost a drive to a spare drive just to save the data. Ghost has a "force" clone switch that will copy even when there are errors. You may have to attach a USB IDE/SATA adapter to the server, since you need the RAID array to read the original drive. I have found that USB IDE cables work best; there is an adapter that converts SATA to IDE.

Good luck.
LVL 12

Expert Comment

ID: 33479211
The MSA will not recognize a cloned drive (or any drive that has been offline) as the original data. It'll re-initialize it anyway (under normal circumstances...)

My guess is that the only reason your array is still working is that 8:1 is not completely dead, but the MSA does not trust the drive enough to rebuild the data from it (if there are unrecoverable read errors). Removing the drive (tip #1) will fail the array, because then 2 drives are missing.

In any case, the corresponding stripes are already lost. Backup and restore is the best way to go.
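Why the affected stripes are lost follows from how RAID 5 single parity works. A minimal Python sketch, assuming a simplified model (not the MSA controller's actual implementation): each stripe holds data blocks plus one XOR parity block, any single unreadable member can be recomputed from the others, and two unreadable members in the same stripe cannot. The function names and the 3+1 stripe layout are hypothetical, for illustration only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_stripe(stripe):
    """Rebuild a stripe whose unreadable members are marked None.

    All members of a RAID 5 stripe XOR to zero, so one missing member
    equals the XOR of the rest. Two or more gaps cannot be solved.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        return None  # stripe is lost: single parity covers only one gap
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


if __name__ == "__main__":
    stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
    # One unreadable member (e.g. the failed drive): recoverable.
    print(rebuild_stripe([None] + stripe[1:]) == stripe)
    # Failed drive plus a read error on another drive in the same stripe: lost.
    print(rebuild_stripe([None, None] + stripe[2:]))
```

In the thread's terms, one gap stands for failed drive 8:3 and the second for a sector that cannot be read from 8:1; which stripe position each drive occupies here is arbitrary.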


Author Comment

ID: 33479596
Do you think that the minute one drive fails, the spare will come online?
LVL 17

Expert Comment

ID: 33479712
Download, install and run Symantec BESR, and use it to take a full image of the server to a NAS or removable drive.

Then reformat, recreate the array, and restore the data.
LVL 12

Expert Comment

ID: 33479949
onebytesystem: I do not know what you mean by that comment; please explain. You already have a failed drive and another one that is showing errors (what kind of errors?)

Your online spare does not initialize because of errors on drive 8:1.
Drive 8:3 is recognized as a new drive, it has no data.

Ergo, you cannot rebuild the array or increase fault tolerance. Any more failure will lose the array.
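The reasoning above (one member failed, another with read errors, so no rebuild and no remaining fault tolerance) can be captured in a small classification function. This is a hypothetical, simplified Python model: the `Disk` states, the status strings and the four-member array are illustrative assumptions, not HP/MSA status codes, and real controllers track errors per sector rather than per drive.

```python
from enum import Enum


class Disk(Enum):
    """Simplified member states (hypothetical, not HP/MSA status codes)."""
    OK = "ok"
    READ_ERRORS = "read errors"  # online, but some sectors cannot be read
    FAILED = "failed"            # offline, or a blank replacement with no data


def raid5_status(members):
    """Classify a single-parity (RAID 5) set from its member states.

    Assumption: a rebuild has to read every surviving member in full,
    so any survivor with read errors blocks a clean rebuild.
    """
    failed = members.count(Disk.FAILED)
    erroring = members.count(Disk.READ_ERRORS)
    if failed >= 2:
        return "failed"  # single parity cannot cover two missing members
    if failed == 1 and erroring:
        return "degraded, not rebuildable"  # no redundancy left
    if failed == 1:
        return "degraded, rebuildable"
    if erroring:
        return "at risk"  # still redundant only where errors do not overlap
    return "optimal"


if __name__ == "__main__":
    # Roughly the thread's situation: 8:3 blank/failed, 8:1 with errors.
    # Member order and array size are arbitrary, for illustration only.
    array = [Disk.READ_ERRORS, Disk.FAILED, Disk.OK, Disk.OK]
    print(raid5_status(array))
    # Pulling the erroring drive (option 1) turns it into a second failure.
    array[0] = Disk.FAILED
    print(raid5_status(array))
```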

I also think the HP engineer gave you some VERY BAD advice about removing drive 8:1 while the array is still operational, because the array would be lost completely. Do the other experts agree?

Also: is the backup running yet?

I must add that I've seen drives fail at random in MSA30 enclosures for the following reason (as HP explained it to me): the firmware on a disk drive can be too new for the MSA drive enclosure (e.g. you put a brand-new disk drive in an enclosure with old firmware).
The drive will work for some time, but at some point it does not respond to commands the way the MSA expects, and the drive gets failed. This can happen at seemingly random times, and it will simply happen again if you replace the drive. The solution was to upgrade the MSA firmware on both controllers, but I left that job to the real specialists. I can't remember any version numbers, sorry. Better mention this to the HP rep as well; maybe he can help you.

Accepted Solution

PaperTiger earned 1500 total points
ID: 33484837
I wouldn't sweat it too much at this point.

It's likely that drive 8:3 failed but your 8:1, which I assume is your hot spare, didn't come online for whatever reason.

Technically, replacing both drives one at a time should be fine, but I prefer a safer route:

1. Run a cold image backup immediately so that you preserve the data.
2. After that, replace both drives, one at a time.

Author Closing Comment

ID: 33827168
Thank you all. Issue now resolved.
