Amr Sayed

RAID 10 "Logical Drive has failed and cannot be used. All data on this logical drive has been lost" with one failed disk only

Hi,
I have an HP ML350 G9 server with a failed logical drive. This logical drive was set up as RAID 10, and the log (see below) shows only one failed disk. Why am I getting "Logical Drive 3 has failed and cannot be used. All data on this logical drive has been lost"? Shouldn't RAID 10 survive up to 2 failed disk drives?

Can you please help?

Details:

Smart Array P440ar

Critical Status Message(s)

274 0 GB SAS HDD at Port 2I : Box 6 : Bay 6 is bad or missing. To correct this problem, check the data and power connections to the physical drive. For more information, generate a diagnostics report.
298 Array C - 1 Logical Drive(s) contains a failed physical drive. To correct this problem, check the data and power connections to the physical drives or replace the failed drive. For more information, generate a diagnostics report.
271 Logical Drive 3 has failed and cannot be used. All data on this logical drive has been lost. Configuration changes to this logical drive are not allowed until this problem is corrected. Also, if your controller supports Expansion, Extension, or Migration, these operations will not be available for any logical drives in the array until the problem is corrected. Replace any failed physical drives and re-enable the failed logical drive. For more information, generate a diagnostics report.

Warning(s)

822 The cache for Smart Array P440ar in Embedded Slot has been disabled because there is no battery/capacitor attached to the cache module.
341 300 GB SAS HDD at Port 2I : Box 6 : Bay 5 is predicted to fail soon.
341 300 GB SAS HDD at Port 2I : Box 6 : Bay 8 is predicted to fail soon.


ADUReport.txt
Travis Martinez

You can lose two disks in a RAID 10 with 4 drives; however, you have one failed drive and two others that are predicted to fail soon. I would replace the 2I:6:6 disk and let the RAID rebuild. It's possible that you can reseat the failed drives and the predictive failure will reset, although you should be looking at replacing all three disks. One at a time, though.

I don't think your data is lost; it's just inaccessible.
"You can lose two disks in a RAID 10 with 4 drives"

Actually, a RAID 1+0 with 4 physical disks has 2 groups of 2 disks each (in your case the groups are bay 5 + 6, and bay 7 + 8).

Raid 1+0

4 Disks:                      
Disk1 Disk2 Disk3 Disk4 
----- ----- ----- -----        
| a | | a | | b | | b |
| c | | c | | d | | d | 
----- ----- ----- -----        
G1 = {D1, D2}            
G2 = {D3, D4}  

All data is lost if both disks of a group fail.
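
To make the rule concrete, here is a minimal Python sketch that enumerates every two-disk failure for the 4-disk layout in the diagram. The names `MIRROR_GROUPS` and `survives` are made up for illustration, and the model only captures the "one working disk per mirror group" rule, not how any particular controller behaves.

```python
from itertools import combinations

# Assumed layout, taken from the diagram: two mirror groups of two disks each.
MIRROR_GROUPS = [{"D1", "D2"}, {"D3", "D4"}]

def survives(failed, groups=MIRROR_GROUPS):
    """Return True if every mirror group still has at least one working disk."""
    failed = set(failed)
    # A group is lost only when all of its members are in the failed set.
    return all(not group <= failed for group in groups)

if __name__ == "__main__":
    disks = sorted(set().union(*MIRROR_GROUPS))
    for pair in combinations(disks, 2):
        print(pair, "survives" if survives(pair) else "array lost")
```

Of the six possible two-disk failures, the four that hit different groups are survivable and the two that hit both members of the same group are not. So "RAID 10 can lose two disks" is only true when the two disks are in different mirror groups.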

The whole array was disabled because the risk of data loss is too high. You should be able to activate it again after the disk in bay 6 is replaced. However, you may want to clone the weak disks in bay 5 and bay 8 first, so that you have a backup in case one of them fails.

So I would do the following:

  • Clone the weak disks from bay 5 and bay 8, for example by using a desktop system and a clone tool.
  • Put the weak disks back into the RAID and replace the bay 6 disk with an empty new disk.
  • Activate the array and let the system rebuild onto the empty disk.
  • If this fails because of one of the weak disks, try exchanging it with one of the clones.
  • After you have recovered, exchange the weak disks with new disks or with the clones.
  • (Alternatively, you may consider removing all disks from the array and using the two cloned disks plus two new disks to repair the array.)

Sara
Member_2_231077

If you upload an ADU report I'll look through it for you. Turn XML off if possible; it's nearly impossible to read with that option turned on.
Amr Sayed

ASKER

Thank you all for your help here ...

"If you upload an ADU report I'll look through it for you"
Attached.

"The whole array was disabled because the risk of lost data is too high"
If it's only disabled, what does this log message mean? "271 Logical Drive 3 has failed and cannot be used. All data on this logical drive has been lost."
ADUReport.txt
2I:6:6 has recently failed but 2I:6:8 had previously failed or at least wasn't used although it was meant to be.

4   Physical Drive (300 GB SAS HDD) 2I:6:5   Physical Drive (300 GB SAS HDD) 2I:6:7   Informational
5   Physical Drive (0 GB SAS HDD) 2I:6:6     Physical Drive (300 GB SAS HDD) 2I:6:8   Informational   <------ failed
6   Physical Drive (300 GB SAS HDD) 2I:6:7   Physical Drive (300 GB SAS HDD) 2I:6:5   Informational
7   Physical Drive (300 GB SAS HDD) 2I:6:8   Physical Drive (0 GB SAS HDD) 2I:6:6     Informational   <------- 0GB used on this disk !

Perhaps 2I:6:8 had been previously replaced and failed to rebuild.

So both disks in one mirror pair are dead. Do you have a backup?
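
As a rough illustration of that reasoning, the sketch below applies the "both members of a mirror pair" check to the pairs shown in the report rows. The pairing is read from those rows; treating 2I:6:8 as holding no valid copy is this reply's interpretation, not something the report states outright. The function names are hypothetical.

```python
# Mirror pairs as read from the report rows above (drive, mirror partner).
MIRROR_PAIRS = [("2I:6:5", "2I:6:7"), ("2I:6:6", "2I:6:8")]

# Assumption: 2I:6:6 has failed and 2I:6:8 holds no valid copy of the data
# (for example, it was replaced earlier and never rebuilt).
UNUSABLE = {"2I:6:6", "2I:6:8"}

def dead_pairs(pairs, unusable):
    """Return the mirror pairs in which every member is unusable."""
    return [pair for pair in pairs if all(drive in unusable for drive in pair)]

def logical_drive_ok(pairs, unusable):
    """A RAID 1+0 logical drive needs at least one good drive in every pair."""
    return not dead_pairs(pairs, unusable)

if __name__ == "__main__":
    print("Dead pairs:", dead_pairs(MIRROR_PAIRS, UNUSABLE))
    print("Logical drive usable:", logical_drive_ok(MIRROR_PAIRS, UNUSABLE))
```

With only 2I:6:6 unusable the check passes, which is why a single failed disk on its own would not explain the "Logical Drive 3 has failed" message.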
SOLUTION
Member_2_231077

The big RED/Port Wine "do not remove" light is often misinterpreted as the fault LED on these caddies
What if that was the cause of this issue? How can I rebuild the array? I don't have a backup because I was in the middle of a migration process.
ASKER CERTIFIED SOLUTION
Note: this must be done cold, with the server off, as opposed to hot-plugging a replacement. The reason is the same in both cases: with hot-swap the controller treats it as a new drive, whereas with power off it reads the metadata on all the disks to make sense of the configuration.
Did it come up OK when you put the good disk back in and rebooted?
Smart Array controllers do not disable arrays "because the risk of data loss is too great"; the array is either online or dead. In this case the wrong disk was swapped out, causing a double fault.