SeeDk asked:
PowerEdge T110 - Replaced failed drive in RAID10. How to get RAID 10 configuration back to normal operation?

It is using a PERC S100 controller.

There is one Virtual Disk. Configured as RAID 10. State is degraded.

Before replacing the drive:

There were 3 hard drives visible, split between Span 0 and Span 1.

In Span 0, the drives visible were 0:0 and 0:1.

In Span 1, the drive visible was 0:3.

This meant drive 0:2 had failed, so I replaced it. However, after replacing it, the new drive was placed into its own RAID 0 by the controller.

I removed that RAID 0 and tried adding the new drive to the RAID 10, but the only option available was 'Assign Dedicated Hot Spare'. I selected this and the Virtual Disk started 'rebuilding'.

But now when I go into the Virtual Disk details I see:

In Span 0, the drives visible are 0:0, 0:1 and 0:2.

0:0 is in a 'Ready' State. 0:1 and 0:2 are 'Online'.

In Span 1, the drive visible is still only 0:3.

The way I interpret this is that it is now rebuilding Span 0 with drives 0:1 and 0:2 but not including 0:0... is that correct?

How would I add a drive back into Span 1?
arnold (United States):

A hot spare will be used automatically.

Which controller is in use?
It sounds as though the drives are not hot-swap. Do you have Dell OpenManage Server Administrator installed, through which you can manage the hardware?

The rebuild will take time to complete, at which point the volume should be back in optimal mode.

Based on your description you have a mirror of stripes; this takes longer to rebuild compared to a stripe of mirrors, where a one-to-one rebuild takes place.

Using the Dell GUI tool, you should see the status of the rebuild.
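If OpenManage Server Administrator is installed, the rebuild can also be checked from the command line instead of the GUI. A minimal sketch in Python, assuming OMSA's omreport is on the PATH and the controller is ID 0 (both assumptions, adjust for this server):

# A minimal sketch, not a finished tool: poll OpenManage Server Administrator's
# omreport CLI for virtual disk and physical disk state. Assumes OMSA is
# installed, the controller ID is 0, and "omreport" is on the PATH.
import subprocess

def omreport(*args):
    # Run omreport with the given arguments and return its text output.
    result = subprocess.run(["omreport", *args], capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    # Virtual disk summary: State should move Degraded -> Rebuilding -> Ready.
    print(omreport("storage", "vdisk", "controller=0"))
    # Per-physical-disk detail: look for Online / Rebuilding / Ready states.
    print(omreport("storage", "pdisk", "controller=0"))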
Why oh why did Dell have to change the names of everything! Why did a stripe become a 'span'? Spanning means something different from striping!

If you really do have ONE mirror of multiple stripes, that is NOT RAID 10 (well, maybe to Dell). The accepted definition (by the rest of the storage industry) is that RAID 10 is a single stripe of multiple mirrors! The alternative (i.e. a mirror of stripes) is normally seen as being prone to data loss!
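To make the layout difference concrete, here is a rough illustrative sketch (plain Python, four hypothetical drives A-D, nothing Dell-specific) of where each logical block lands under the two layouts:

# Illustration only: where logical blocks land in a 4-drive "stripe of mirrors"
# (RAID 10 in the usual industry sense) versus a "mirror of stripes" (RAID 0+1).
drives = ["A", "B", "C", "D"]

def stripe_of_mirrors(block):
    # Mirrored pairs (A,B) and (C,D); data is striped across the pairs.
    # Only losing BOTH members of the same pair loses data.
    pair = block % 2
    return [drives[2 * pair], drives[2 * pair + 1]]

def mirror_of_stripes(block):
    # Stripe (A,B) mirrored by stripe (C,D). Losing one drive from each
    # stripe (e.g. A and D) breaks both copies, hence the data-loss concern.
    member = block % 2
    return [drives[member], drives[2 + member]]

for blk in range(4):
    print(blk, "stripe-of-mirrors ->", stripe_of_mirrors(blk),
          "  mirror-of-stripes ->", mirror_of_stripes(blk))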
The S100 is software/chipset-based RAID? Use OpenManage to manage the RAID setup.

Using the service tag on Dell's support site, please confirm that the drive cage is hot-swap.

I suggest rebooting the server into the RAID "BIOS" to manage the drives if all else fails.
Gerald,
What is a stripe in that context?
Not sure of the origin; Solaris used 'striping' to describe software metadevice creation.

Dell's 'span' conveys the idea that a single volume spans multiple drives.
SOLUTION by Gerald Connolly (Australia): content available to members only.
The issue might be less Dell and more the asker's statement that he has a RAID 10.
If I'm not mistaken, since the PERC 4, RAID 10 has been a stripe of mirrors.
Their technical writers might have worded the controller setup ....
SeeDk (Asker):

The rebuild is done and this is the result....

[screenshot]

So, as I thought, it took 0:0 out of Span 0 and put 0:2 in its place, instead of placing 0:2 into Span 1 where it is needed... not sure why.
And now 0:0 is shown as an available disk...

[screenshot]
What is my next option here? Try assigning 0:0 as a global hot spare?
ASKER CERTIFIED SOLUTION: content available to members only.
You in effect have three elements:
Span 0
Span 1
and the resulting virtual disk mirroring Span 0 and Span 1.

I have not worked with the controller you have, but with other RAID controllers that support hot-swap, when a failed drive is removed and its replacement is inserted, the controller detects the insertion of the new device, at which point a rebuild is auto-triggered.
SeeDk (Asker):

Yes, I know other RAID controllers should do it automatically, but this one doesn't support hot swap.
Per Dell, if the chassis has to be opened to access the drives, it is not hot-swappable... and this chassis needs to be opened.
I am surprised it even rebuilt Span 0 at all, since that one was healthy.

I don't understand why selecting 'Assign Dedicated Hot Spare' on the Virtual Disk caused it to rebuild the healthy Span 0 while ignoring the degraded Span 1.

Selecting 'Assign Global Hot Spare' from the Physical Disk menu did correctly assign the rebuild of the disk to Span 1.

Also, from what I understand, the way Dell has designed this is that each Span is only a RAID 1 holding half of the data... so if disk 0:3 fails during the rebuild, all the data is gone. Is this the industry standard or a Dell standard?
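For reference, the same 'Assign Global Hot Spare' step can be scripted through OMSA's omconfig, if that tool is installed. A minimal sketch follows; the controller number and the pdisk ID format are assumptions, so take the exact IDs from omreport output first:

# A minimal sketch, assuming OpenManage Server Administrator is installed.
# The controller ID (0) and the pdisk ID ("0:0:0") are placeholders; use the
# IDs reported by "omreport storage pdisk controller=0", and check the omconfig
# documentation for your OMSA version in case the syntax differs.
import subprocess

def assign_global_hot_spare(controller, pdisk_id):
    cmd = [
        "omconfig", "storage", "pdisk",
        "action=assignglobalhotspare",
        f"controller={controller}",
        f"pdisk={pdisk_id}",
        "assign=yes",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)

assign_global_hot_spare(0, "0:0:0")  # hypothetical IDs, adjust to this server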
I do not believe it is rebuilding Span 0; rather, it indicates that Span 0 and Span 1 are in a rebuilding state.
It cannot rebuild Span 0.
The display is confusing.
The disk display should reflect the drive status.

Are you able to look at the virtual volumes, e.g.
Volume 1, status: rebuilding 5%?
Or does this controller not do that?

It has to mirror with Span 0 as the reference to create Span 1.
SeeDk (Asker):

The first screenshots I posted were after the initial rebuild, which rebuilt Span 0... not the one I am doing now. This is the current status:

[screenshot]
[screenshot]
SeeDk (Asker):

And yes, it is a bit confusing because it says 'Rebuilding' for the Virtual Disk as a whole rather than showing exactly which hard disk is being rebuilt.
But this last screenshot was taken after I set up 0:0 as a Global Hot Spare. It is even shown in the screenshot as a Global Hot Spare. So it must be rebuilding Span 1 onto disk 0:0.
RAID 10 is a stripe across two mirrored pairs. We don't use it anymore because we once had one drive fail, and then its mirror partner died not long after the bad one was replaced and the rebuild had begun.

So yes, that is what it is for RAID 10.
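To put a rough number on that risk, here is a small sketch (plain Python, four drives with mirrored pairs laid out like the spans in this array) that enumerates which second failures actually lose data:

# Small sketch: in a 4-drive RAID 10 with mirrored pairs (0,1) and (2,3),
# list every two-drive failure combination and flag the fatal ones.
from itertools import combinations

pairs = [(0, 1), (2, 3)]  # mirrored pairs, analogous to Span 0 and Span 1 here

def loses_data(failed):
    # Data is lost only when both members of the same mirrored pair fail.
    return any(set(pair) <= set(failed) for pair in pairs)

for combo in combinations(range(4), 2):
    print(combo, "DATA LOSS" if loses_data(combo) else "survives")

# 2 of the 6 combinations are fatal. During a rebuild the risk is exactly that
# the lone surviving member of the degraded pair (0:3 in this thread) fails.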
SOLUTION: content available to members only.
SeeDk (Asker):

At the time of the screenshot, maybe 30 minutes. This controller is very slow to rebuild. It took a day and a half for the Span 0 rebuild, which I did not want to do.
I imagine the same length of time for this, assuming 0:3 doesn't fail. Bracing myself for a failure!
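Just as a back-of-the-envelope check: the drive size was never stated, so the 500 GB figure below is purely an assumption, but it shows how low the effective rebuild rate is for a day-and-a-half rebuild:

# Back-of-the-envelope rebuild rate. 500 GB is an assumed capacity (not stated
# in the thread); 36 hours matches the "day and a half" Span 0 rebuild above.
capacity_bytes = 500 * 10**9
rebuild_seconds = 36 * 3600
rate_mb_per_s = capacity_bytes / rebuild_seconds / 10**6
print(f"~{rate_mb_per_s:.1f} MB/s effective rebuild rate")  # about 3.9 MB/s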
We stopped using software/chipset RAID a long time ago. It was too buggy and tended to stop completely if a drive failed, while a hardware RAID controller would keep going and also has a buzzer to indicate a drive failure.

Make sure the backups are good. The extra stress on the mirror partner during the rebuild could indeed trigger a failure.
SeeDk (Asker):

Well, I must be running on some good luck. The rebuild completed with no problems. It took a long time, just north of 48 hours, but it finished. Now to replace that degraded disk and it will be in great shape.
What do you mean? With the rebuild complete, the RAID should be in an optimal state.
If you have a disk with a predictive failure, you should back up the data. The replacement of the failing drive will require the same rebuild process...
SeeDk (Asker):

Yes, the RAID is listed as optimal, and yes, I mentioned in my last post that I will also replace the other hard disk, which is degraded/predictive failure.
The age of the remaining drives is similar, so make sure that you have set up a good backup process, while at the same time replacing the failing drive.
SeeDk (Asker):

Found the correct menu option on this controller.