Solved

HP Proliant server with P410i controller recovery

Posted on 2012-09-18
Last Modified: 2012-10-21
Hi Experts,

I have the following question. I have one HP ProLiant server with 60 GB of RAM, used as an ESX hypervisor. It has a P410i controller with 512 MB of battery-backed cache.

I have 4 RAIDs created on the controller: 3 of them are RAID 1 and 1 is RAID 0, which is used only for tests, so its data can be lost in case of failure.

Now 2 of the RAID 1 arrays have reported failures. Please see the picture.

The problematic RAIDs consist of cheap 2 TB disks, with fault tolerance provided by mirroring.

My question is: how do I solve the problem correctly?

I have new disks that I can use as replacements.

The second picture shows the options that are available when the system is starting.

Please advise

Vladimir
Photo-18.09.12-14-05-25.jpg
Photo-18.09.12-10-07-17.jpg
Question by:vladobb
21 Comments
 
LVL 5

Expert Comment

by:sfmny
ID: 38410167
I am pretty confident that the server supports hot swappable drives (I'd check that first). So you can take the failed drive out and insert the new drive without having to do anything.

If you have to, you can boot into the controller and check recovery options.
 

Author Comment

by:vladobb
ID: 38410201
The disks that failed are cheap, ordinary 7200 RPM disks, i.e. they are not hot swappable, but that is not a problem: I can stop the machine, replace the drive and start it again. I have hot swappable drives only for the system (2 Cheetah 15000 RPM drives in a mirrored RAID).

But what confuses me are the options:
If I press F2, it says that the logical drives will be enabled but SOME DATA will be lost. How is that possible for a mirrored RAID? My understanding is that when 1 of 2 disks fails, a full image is on the other, healthy drive.

The second option is F1, where the logical drives on the RAID will be disabled.

I mean, I have no big issue with those RAIDs, I can completely erase them and recover from backup, but then why the hell am I RAIDing them as a mirror?
 
LVL 47

Accepted Solution

by:
David earned 2000 total points
ID: 38410212
Desktop drives are unacceptable for use with that RAID controller. There are several reasons why, but the biggest one is that the firmware in those disks is designed for non-RAID use and deep recovery. An unreadable block could lock up the drive for 30+ seconds.

Enterprise drives will typically give up after 2-3 seconds, the premise being that the data is available via parity or mirroring on other drives. So what happens is that the HP controller thinks that because the disk didn't come back in a few seconds, it died.

The solution is to get the right kind of disk drives if you want to use the Smart Array controller. Period.
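
The timing mismatch described in this answer can be sketched in a few lines of Python. This is only an illustration: the 8-second controller timeout is an assumed placeholder (the thread says only "a few seconds"), and the drive recovery times are the rough figures quoted above, not measured values.

```python
# Assumed value: the thread does not state the real Smart Array timeout.
CONTROLLER_TIMEOUT_S = 8.0


def controller_marks_failed(drive_recovery_s: float,
                            timeout_s: float = CONTROLLER_TIMEOUT_S) -> bool:
    """Return True if the drive stays silent longer than the controller waits."""
    return drive_recovery_s > timeout_s


# Rough recovery times taken from the answer above (30+ s vs. 2-3 s).
drives = {
    "desktop drive (deep recovery)": 30.0,
    "enterprise drive (time-limited recovery)": 3.0,
}

for name, recovery_s in drives.items():
    if controller_marks_failed(recovery_s):
        verdict = "dropped from the array"
    else:
        verdict = "stays online"
    print(f"{name}: {recovery_s:.0f} s -> {verdict}")
```

In this model a healthy desktop drive is dropped purely because of how long it retries a bad block, which would also explain why such a drive can later pass standalone diagnostics.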
 
LVL 47

Expert Comment

by:David
ID: 38410232
All SATA disks are hot swappable, by the way. This is a requirement and design point of the ANSI SATA spec. There is no such thing as a non-hot swappable SATA disk drive. There isn't even a programmatic way for a developer to query a disk to see if it is in a hot-swappable canister or backplane.
 
LVL 56

Expert Comment

by:Handy Holder
ID: 38410426
Dlethe's right: the controller threw them out. You can tell that from the "previously failed drives" part of the message.

Also, F2 doesn't imply data loss; it means the data on the replacement (or previously failed) drive will be overwritten by the re-mirroring process.
 
LVL 5

Expert Comment

by:sfmny
ID: 38410451
Hot swappable is more of a controller and system bus feature now. Since you can stop the machine, I'm not concerned about that now.

How many physical disks do you have? How many of those disks failed?

With RAID 1+0 on 4 drives, you can survive a single hard drive failure for sure. If 2 drives fail, recovery depends on which ones: the array is lost only if both disks of the same mirrored pair fail. Let me know.
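
The two-drive failure case for a 4-drive RAID 1+0 can be checked by enumerating every possible pair of failed drives. The sketch below assumes a layout of two mirrored pairs with a stripe across them; the drive names `d0` to `d3` are hypothetical labels, and every pair of failures is treated as equally likely.

```python
from itertools import combinations

# Assumed layout: 4 drives, striped across two mirrored pairs.
MIRROR_PAIRS = [("d0", "d1"), ("d2", "d3")]
DRIVES = [drive for pair in MIRROR_PAIRS for drive in pair]


def array_survives(failed):
    """The array survives while every mirrored pair keeps one good drive."""
    return all(any(d not in failed for d in pair) for pair in MIRROR_PAIRS)


two_drive_failures = [set(c) for c in combinations(DRIVES, 2)]
survivable = [f for f in two_drive_failures if array_survives(f)]

print(f"{len(survivable)} of {len(two_drive_failures)} "
      "two-drive failures are survivable")
```

Under these assumptions only the two cases where both halves of one mirror fail are fatal, so 4 of the 6 combinations leave the data intact.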
 

Author Comment

by:vladobb
ID: 38410456
Super, I will give it a try.

I checked the attached manual and there is no problem with the lights, nothing is amber. So when everything is back online and I have created an up-to-date backup, how can I recognize which of the 2 disks in the mirrored array is bad?
Manual-of-RAID-Controllers-for-H.pdf
 

Author Comment

by:vladobb
ID: 38410465
I have 4 RAIDs, each consisting of 2 disks.

There are 2 failed RAIDs, as you can see in the picture: one is recovering and one is failed.

Each of those 2 RAIDs consists of two 2 TB disks, i.e. each of the 2 resulting RAIDs is 2 TB in size.
 
LVL 5

Expert Comment

by:sfmny
ID: 38410469
Also, in the image, I see RAID 1+0 not RAID 1 as mentioned above. I'm assuming RAID 1+0.
 

Author Comment

by:vladobb
ID: 38410477
What exactly is the difference between RAID 1 and RAID 1+0? My understanding is that RAID 1+0 just indicates that there is no spare drive available that the controller can use immediately in case of a problem. Am I right?
 
LVL 5

Expert Comment

by:sfmny
ID: 38410552
I'm trying to follow along because the solution might lie here. Are the logical drives listed in the image the same as the physical drives you can take out?

So you've got 4 RAID volumes:

3 x RAID 1+0 (not RAID 1 which is different from RAID 1+0)
1 x RAID 0

So is it that:
Each RAID volume is on the same 2 physical drives (2 physical drives in total?)

For data recovery in RAID 1+0 (the image you attached says 1+0), the RAID volume needs to be spread across a minimum of 4 physical drives. For a RAID 1, you need 2 physical drives and you can recover from a single drive failure. It doesn't appear to me that you have RAID 1.

Have I got the right picture of your setup so far? Am I making sense?
 
LVL 5

Expert Comment

by:sfmny
ID: 38410565
With RAID 1+0 you have two levels of RAID. This is a quick tutorial I found via Google:
 
LVL 56

Expert Comment

by:Handy Holder
ID: 38410702
>Since you can stop the machine, I'm not concerned about that now.

I am concerned about it. You should never replace disks on an HP Smart Array controller with the server switched off; it should always be done with the server powered on, except in a few exceptional circumstances.
 

Author Comment

by:vladobb
ID: 38410730
In fact, the real disks are as follows (numbered as in the picture):
1. 2 physical drives, Cheetah 15000 RPM 300 GB
2. 2 physical drives, desktop SATA 2 TB
3. 2 physical drives, desktop SATA 2 TB
4. 2 old physical drives, desktop SATA 250 GB

In total, 8 running physical drives.
 
LVL 56

Expert Comment

by:Handy Holder
ID: 38410800
Ignore the fact that ORCA describes them as RAID 1+0, it doesn't differentiate between RAID 1 and 1+0. In your case they are simple RAID 1s.

Whether logical drive 2 is actually failed or just marked as failed depends on whether you pressed F1 or F2 during POST. With F2 the controller has to take an educated guess as to which disk to overwrite; F1 temporarily leaves the logical disk disabled so you can try one disk or the other on its own in case they both have valid data on them.

More information is available by booting the SmartStart CD (or the operating system) and running the Array Configuration Utility or Array Diagnostic Utility.
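
To illustrate why a "RAID 1+0" built from only 2 disks behaves as a plain RAID 1, here is a simplified block-placement model. It is a sketch of the general technique (striping across mirrored pairs), not the Smart Array on-disk format, and the function names are hypothetical.

```python
def raid10_layout(blocks, n_pairs):
    """Stripe blocks across n_pairs mirrored pairs; return each disk's blocks."""
    disks = [[] for _ in range(2 * n_pairs)]
    for i, block in enumerate(blocks):
        pair = i % n_pairs                  # the stripe picks the mirror pair
        disks[2 * pair].append(block)       # first half of the mirror
        disks[2 * pair + 1].append(block)   # identical copy on its partner
    return disks


def raid1_layout(blocks):
    """Plain mirror: both disks hold every block."""
    return [list(blocks), list(blocks)]


blocks = ["A", "B", "C", "D"]

# With a single mirrored pair there is nothing to stripe across,
# so the layout is identical to a plain RAID 1 mirror.
print(raid10_layout(blocks, 1) == raid1_layout(blocks))

# With two pairs (4 disks) the blocks alternate between the mirrors.
print(raid10_layout(blocks, 2))
```

In this model the 2-disk case keeps a full copy of the data on each disk, so it tolerates a single disk failure exactly as a RAID 1 does.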
 
LVL 5

Expert Comment

by:sfmny
ID: 38410853
@andyalder - Not sure about that. I have never seen that as a requirement for the controller. Is it something HP recommends?

@vladobb - from what you have here:

1 x RAID 1+0 sits on 2 x Cheetah ... 300GB (call this volume RAID A)
1 x RAID 1+0 sits on 2 x desktop SATA 2 TB (call this volume RAID B)
1 x RAID 1+0 sits on 2 x desktop SATA 2 TB (call this volume RAID C)
1 x RAID 0 sits on 2 x old physical drives - data loss is acceptable here, right? (call this volume RAID D)

With this setup, you can't recover from a RAID 1+0 failure since you need 4 drives. Also, the largest RAID volume you can build is limited by the size of the smallest hard drive.

So depending on the size requirement, you can put the 3 RAID 1+0 volumes (RAID A, B, C) on 4 different drives. If you use 2 x 300 GB and 2 x 2 TB, then 300 GB is the limit on the other drives too. If you use all 4 x 2 TB, then you can have 2 TB as the limit on each drive; however, they are all desktop drives and not 15000 RPM. It's a trade-off unless you can buy 2 x 15000 RPM.

To answer your question: since the setup is RAID 1+0 on 2 drives, you can't recover from data loss right now. This is good to know. Back up and move the good RAID 1+0 to the right config ASAP.

You either need to switch to RAID 1 on 2 drives or do RAID 1+0 on 4 drives. You can have the RAID A, B, C volumes on the SAME 4 physical drives and recover from a single HDD failure. Let me know if it makes sense.
 

Author Comment

by:vladobb
ID: 38410943
Hi, it all makes sense. RAID D is shit, the data is "expendable" :-)

I have to learn what exactly RAID 1+0 is, as I obviously do not understand the concept.

This was the only option offered by the P410i controller, which cost hundreds of $, when I selected 2 disks.

My idea was that the controller would simply create the simplest of all RAIDs, i.e. a mirrored RAID.

Give me a few hours to study what exactly RAID 1+0 is.

For more than 2 drives I was offered the option of RAID 5 (with a total capacity of CapacityOfSingleDrive*(NoOfDrives-1)), which tolerates the failure of only one drive at any time.

Or RAID 6 (however, HP wants some more money for the license, unbelievable), where the total capacity will be CapacityOfSingleDrive*(NoOfDrives-2), which tolerates the failure of any 2 drives at any time.

Many, many thanks for your support

V.

PS: It looks like the optimal solution is to take out B, C and D, replace them with 6 server-grade 3 TB drives and create a RAID 5 with a total capacity of 15 TB, correct? Then I will be able to survive the loss of any one of those 6 disks.

PS2: But what should I do with the speedy Cheetah system drives? Do I understand correctly that the current RAID 1+0 scenario does not FULLY protect me against the failure of either of those 2 disks?
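
The capacity formulas quoted in this comment can be checked with a small helper. This is a minimal sketch assuming identical drives and ignoring formatting overhead; the function name and the minimum drive counts it enforces are illustrative choices, not taken from the controller documentation.

```python
def usable_capacity_tb(level: str, drive_tb: float, n_drives: int) -> float:
    """Usable capacity in TB for identical drives, ignoring overhead."""
    if level == "RAID 0":
        return drive_tb * n_drives
    if level == "RAID 1":
        if n_drives != 2:
            raise ValueError("this sketch models RAID 1 as a 2-drive mirror")
        return drive_tb
    if level == "RAID 5":
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return drive_tb * (n_drives - 1)   # one drive's worth of parity
    if level == "RAID 6":
        if n_drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return drive_tb * (n_drives - 2)   # two drives' worth of parity
    raise ValueError(f"unsupported level: {level}")


# The 6 x 3 TB proposal from the PS above.
for level in ("RAID 5", "RAID 6"):
    print(f"{level}: {usable_capacity_tb(level, 3, 6)} TB usable")
```

For 6 x 3 TB drives this gives 15 TB for RAID 5, matching the figure in the PS, and 12 TB for RAID 6.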
 
LVL 5

Expert Comment

by:sfmny
ID: 38411167
Hi Vlad,

You can use RAID 5 on 3 drives and upward. It can guard against a single disk failure, and only one at a time. The message you received is correct. The problem with RAID 5 is performance while rebuilding an array, especially on large disks. A rebuild could take more than a day depending on disk size.
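
As a rough feel for the rebuild-time remark, the estimate below divides the size of the replaced drive by a sustained rebuild rate. The rates are assumed placeholders, not P410i measurements; real rebuild speed depends on the controller's rebuild priority and on host load.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one whole drive at a constant rebuild rate."""
    drive_bytes = drive_tb * 1e12                  # decimal TB, as marketed
    seconds = drive_bytes / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


# Assumed sustained rebuild rates in MB/s, for illustration only.
for rate in (20, 50, 100):
    print(f"3 TB drive at {rate} MB/s: {rebuild_hours(3, rate):.1f} h")
```

At the slowest assumed rate a single 3 TB drive takes well over a day to rebuild, and the array has no remaining redundancy for that whole period.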

If you can buy more drives, I'd suggest you move RAID A as 1+0 onto 4 x Cheetahs and move RAID B, C as 1+0 onto 4 x 2 TB.

If you can't buy drives, can you change RAID types in place? (I don't think the P410i supports that, but I haven't checked for any firmware updates for a couple of years now, so I can't say for sure.)

With the current scenario - RAID 1+0 on volume A is as vulnerable as the other volumes. If RAID 1+0 is implemented as RAID 1 on 2 drives, then you can recover from one failure. There is no standard rule if you have fewer disks, not sure what the controller would set it to in that case.

You can check this via the SmartUtility CD HP provides for the server. If you don't have it (should've come with install), go to hp.com/support and enter your server name and look for it. You might have to burn the ISO to a CD.
 
LVL 56

Expert Comment

by:Handy Holder
ID: 38411372
>I have to learn what exactly is RAID 1+0 as I obviously do not understand concept.

No, you have to learn that HP's GUI displays RAID 1 as RAID 1+0.

Do you need any data to be recovered from the array marked as failed? If so, I would be very wary of following any suggestions until they are verified by more than one person. Also, we'd need to know approximately how much it's worth to you, since that affects whether to engage a DR expert or give it a go on your own with relatively cheap software.
 

Author Comment

by:vladobb
ID: 38506908
I've requested that this question be deleted for the following reason:

None of the answers solved my problem. I have backed up all the data, removed all the disks, tested them thoroughly and have not found any errors.
 
LVL 47

Expert Comment

by:David
ID: 38506909
Answer http:#a38410212 absolutely explains what is going on.  The drives are not compatible with the firmware due to error recovery timing (TLER is a common acronym for it).

The disks will pass diagnostics because there is nothing wrong with them.  There is nothing wrong with the controller.  It is an INTEROPERABILITY issue.

This will continue to happen.
Question has a verified solution.