Drive Array Recovery Needed

POST Error 1786: Drive Array Recovery Needed
I have the above error on an HP ProLiant server. One of the drives died and was replaced about three months ago. The server has not been rebooted until now.

Does this mean the RAID is not working and the drive has never rebuilt?
How do I check, without rebooting the server, whether the RAID is okay? The HP Drive Array Utility does not open, but the log file does. Will the log display something when the drive has been rebuilt?

Sorry, this is an inherited server and some of the management apps do not appear to be working.

Any way I can check this safely would be great.  

Thanks
afflik1923 asked:

David (President) commented:
The controller *should* be configured to automate this. Get the correct make/model of disk and, while the system is still turned on, just pull out the bad one and put in the replacement. The rebuild should start automatically. If it does not, we can take it from there. The controller drivers and management software are not an issue, as this is handled in the controller firmware.

You NEVER want to mess with drivers, firmware, or software on a degraded array. The job is to get that array optimal and online first, before rocking the boat. Otherwise you risk 100% data loss.

David (President) commented:
P.S. The array is ONLINE but degraded. There is no longer any redundancy. If you get even one unrecoverable read error, you have partial data loss, so you need to get a replacement disk ASAP. Use this system as little as possible. Every I/O could be the last. Take a full backup NOW.
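To put a rough number on the unrecoverable-read-error (URE) risk described above, here is a minimal sketch. It assumes a datasheet URE rate of 1 error per 10^14 bits read (a common figure for desktop-class drives; enterprise drives are often quoted at 10^15 or better) and treats errors as independent, which real drives do not strictly obey. The array in the example (three 72 GB members, two of which must be read in full to rebuild the third) is hypothetical.

```python
import math


def p_at_least_one_ure(bytes_read: float, ure_rate_bits: float = 1e14) -> float:
    """Probability of at least one URE when reading `bytes_read` bytes.

    Assumes independent errors at a rate of 1 per `ure_rate_bits` bits and
    computes 1 - (1 - p)**n stably with log1p/expm1.
    """
    bits = bytes_read * 8
    p_per_bit = 1.0 / ure_rate_bits
    return -math.expm1(bits * math.log1p(-p_per_bit))


if __name__ == "__main__":
    # Hypothetical degraded 3-disk RAID 5 of 72 GB drives: a rebuild has to
    # read both surviving members end to end.
    surviving_bytes = 2 * 72e9
    for rate in (1e14, 1e15):
        p = p_at_least_one_ure(surviving_bytes, rate)
        print(f"URE rate 1 per {rate:.0e} bits: {p:.2%} chance of at least one URE")
```

The percentages are small but not negligible, and they apply to every full read of the degraded set, which is why a full backup before the rebuild matters.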
andyalder (Saggar maker's framemaker) commented:
Can you post the ADU report (**as an attachment, not in the body of the thread; it's huge**) and we'll look through it. It would also help to know which server/controller you have.

ACU not working is often a problem with the browser; try from a remote PC: https://<ipaddress>:2381/ACU-XE/ACU.htm
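Before blaming the browser, it can help to confirm from the remote PC that anything is listening on the management port at all. A minimal sketch using Python's standard `socket` module; port 2381 is taken from the URL above, and the host you pass in is a placeholder for the server's real address. A successful connection only shows the web service is up, not that ACU itself will load.

```python
import socket


def port_open(host: str, port: int = 2381, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_open("<ipaddress>")` with the placeholder replaced by the server's IP. If it returns False from a PC on the same network, the problem is more likely the service or a firewall than the browser.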

As far as the green tick in your jpg goes, that means someone has marked it as repaired; it doesn't mean it has been repaired.

afflik1923 (Author) commented:
dlethe
My knees started knocking and my hair stood on end when I saw this error. Full backups are done; absolutely not worth taking chances on that, agreed.

As far as I know, that is what was done three months ago: the drive was replaced with a new one, although a larger one (size is not a problem, the extra capacity will just be wasted). Hopefully all other specs are the same; I have not been able to confirm that 100% yet.
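Why the extra space on a larger replacement is simply wasted can be shown with the standard RAID 5 capacity rule: each member contributes only as much space as the smallest member, and one member's worth of that space goes to parity. A minimal sketch; the drive sizes are hypothetical (two 72 GB members plus one 144 GB replacement).

```python
def raid5_capacity(member_sizes_gb: list[float]) -> tuple[float, float]:
    """Return (usable_gb, unused_gb) for a RAID 5 set.

    Every member is truncated to the size of the smallest member; one
    member's worth of that space is consumed by distributed parity.
    """
    if len(member_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least 3 members")
    smallest = min(member_sizes_gb)
    usable = (len(member_sizes_gb) - 1) * smallest
    unused = sum(size - smallest for size in member_sizes_gb)
    return usable, unused


if __name__ == "__main__":
    # Hypothetical: two 72 GB members and one 144 GB replacement drive.
    usable, unused = raid5_capacity([72, 72, 144])
    print(f"usable: {usable} GB, unused on larger drive(s): {unused} GB")
```

The same rule explains why a replacement must be at least as large as the drive it replaces: a smaller disk cannot hold its share of every stripe.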

My worry exactly, no redundancy!


andyalder
Will try to get that report posted soon.
Thanks for the green tick info.
afflik1923 (Author) commented:
Okay, upon further investigation the situation is as follows:
OS:             Windows Server 2003
Server:       ML370 G3
Controller:  Smart Array 641
Array 1:     2 drives mirrored for OS
Array 2:     4 drives in RAID 5 with one as spare

Array 2, status - "Ready for rebuild"
   Disk Port 1 ID 4, status - "Predictive Failure"

Questions:
Simply replacing the faulty disk should automatically start a rebuild, as I understand it?
Is there any way to rebuild this without rebooting the server? (Other than the above, and as below.)
If I rebuild onto the spare, what would be the best way to do this?

Any suggestions or pointers are always appreciated.
Thanks




andyalder (Saggar maker's framemaker) commented:
If you replace a faulty disk, it will rebuild automatically as long as you don't power it off first. That answers your second question too: don't turn it off, just remove and replace.

If there were a hot spare, it would have already rebuilt onto it.

If one disk is down and another is in predictive failure, removing the predictive-failure one will kill the array, because it will then have two disks down. So be absolutely sure there's only one disk with a problem first. We can tell that from the ADU report, although you can also see it in the physical view in ACU.
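The rule above (RAID 5 tolerates exactly one missing member, so pulling a second one takes the logical drive offline) can be written as a small pre-flight check. This is a minimal sketch: the status strings are hypothetical labels loosely modelled on what ACU displays, and a disk that is still being rebuilt onto is treated as not yet holding valid data.

```python
from dataclasses import dataclass

# Hypothetical status labels; a real controller reports its own strings.
# A predictive-failure disk is still online and serving valid data.
HOLDS_VALID_DATA = {"OK", "Predictive Failure"}


@dataclass
class Disk:
    bay: int
    status: str  # e.g. "OK", "Predictive Failure", "Failed", "Rebuilding"


def missing_after_pull(members: list[Disk], pull_bay: int) -> int:
    """Count members without valid data if the disk in `pull_bay` were removed."""
    return sum(
        1 for d in members
        if d.bay == pull_bay or d.status not in HOLDS_VALID_DATA
    )


def safe_to_pull_raid5(members: list[Disk], pull_bay: int) -> bool:
    """A RAID 5 logical drive stays online with at most one member missing."""
    return missing_after_pull(members, pull_bay) <= 1


if __name__ == "__main__":
    # Hypothetical state: bay 2 predictive failure, bay 4 still rebuilding.
    members = [Disk(2, "Predictive Failure"), Disk(3, "OK"), Disk(4, "Rebuilding")]
    print(safe_to_pull_raid5(members, pull_bay=2))
```

In this hypothetical state, pulling bay 2 would leave two members without valid data, so the check refuses. That is the situation the thread eventually uncovered: the earlier replacement had never finished rebuilding.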
afflik1923 (Author) commented:
Thanks again andyalder.

I'm a little confused and just want to make sure: why can't I power off to change the drives? Will it not pick up that it now has a good but empty drive and then resync it?

The controller picks up the spare and it is assigned to the array, but its status is "spare" and "OK"; perhaps it is not a "hot" spare.

All other disks are fine; only the one is faulty. I have plenty of backups just in case...

David (President) commented:
The fast answer is: because that is how it is designed. From a developer's perspective, on power-up you start from an unknown state and the controller then has to find a quorum. If your battery is dead, or the NVRAM doesn't match the metadata, and the LUN has to figure out that it is degraded, then you risk losing the configuration if there is a hiccup. When the system is online and you do a hot replacement, the controller KNOWS which disks are active and which are inactive. In a perfect world it would not matter, but the HP controllers are well engineered to deal with this. Batteries and clocks change and die too, so from a cold state the configuration is less trustworthy than from a hot state.

So online is the SAFEST way to deal with Murphy's law :)
afflik1923 (Author) commented:
Right, I've arrived onsite, but I'm concerned! A drive I replaced earlier does not actually have its drive light on, that is, the green disk light that is on for the other drives.

Looking at the picture I've attached: the first drive (from the left) is the spare, the second one is apparently faulty according to the software, and the fourth one is the one that previously failed and that I had to replace (HOWEVER, it does not have its green light on, even though the software reports it is fine).

So now I'm really scared to pull out one of the only two drives of the D: drive that show the green drive light.

afflik1923 (Author) commented:
[Image: Drives as they are now]
I forgot to add the picture before, but here it is now.
afflik1923 (Author) commented:
Actually, I should add that on the fourth drive from the left (the one I replaced) the green arrow does flash at times.
andyalder (Saggar maker's framemaker) commented:
Drive 3 in your picture, with the arrow pointing to the disk, suggests it is currently being rebuilt onto, but I wouldn't trust the LEDs to be correct. I would quiesce the OS (like a Windows shutdown, but without powering off) and then take a stab at the right one.

You also have the option to reboot without powering off and interrupt the boot by putting a CD in the drive, SmartStart for example. The controller won't know much about that, except that it will get a command to flush the cache.
afflik1923 (Author) commented:
Hi Andy, thanks for your reply. Let me get my head around this and also add a bit more info.

I'm numbering the drives counting from the left, so drive 4 is the one I previously replaced. (It's actually a 15k 144GB drive, but the caddy says 72GB 10k; it was like this when I bought it.)

Firstly, Drive 4 does not have the cylinder (hard drive) light on, but its green arrow flashes more or less in sync with Drive 3, the one to its left. It is acting as if it is writing at the same time as Drive 3.

Drive 2, the one that according to the software needs replacing, has a green arrow that is not flashing in sync with anything else.

I was thinking that if I did shut down, then removed Drive 2 and powered up, then even if I got the wrong drive, at least I would not have pulled a live, non-redundant drive.

But let me absorb what you have written.
afflik1923 (Author) commented:
I'm not too sure I fully understand:
"I would quiesce the OS (like Windows shutdown but not power off) and then take a stab at the right one."

Do you mind expanding on that a bit more?
andyalder (Saggar maker's framemaker) commented:
Time to shut down most services and make a backup?
afflik1923 (Author) commented:
Yes, we installed System Restore, and we have taken an image of the server.
I think for tonight I'm going to leave things. We are understaffed tomorrow, so we can look at doing it on Wednesday.
Obviously I'm tempted to unplug the drive that the software says has failed, but I don't want to spend the night rebuilding a server when we are understaffed tomorrow.

afflik1923 (Author) commented:
Thanks for all the input. The root issue we had not addressed was that an earlier hard drive replacement had never actually completed its rebuild, so we were already running without redundancy when we got the new warning.

All sorted now, however. Thanks.