How to swap out a failing SATA drive on an HP server with B110i SATA RAID controller

jkirman
Greetings,

I have recently taken over an account that has a branch office with 3 older HP ProLiant servers and c. 12 PCs.  One of the servers is an older ProLiant ML110 G6 running Windows Server 2012 R2 Standard with 12 GB RAM.  It is a Domain Controller with no principal applications on it.  It uses an HP Smart Array B110i SATA RAID controller with 2 arrays of SATA drives in a RAID-1 configuration.  The 1st SATA array has 2 x 250 GB drives in Box 1 / Bays (Ports) 1 and 2, and the 2nd SATA array has 2 x 1 TB drives in Box 1 / Bays (Ports) 3 and 4.

We are seeing a warning message when the server reboots, warning of the imminent failure of one of the hard drives.  I have attached a screenshot of the warnings (attachment: "Boot time warning from B110i SATA RAID controller - imminent drive failure").  The failing drive is in Box 1 Bay 2, so it's 1 of the 250 GB drives.  Specifically, it is an ATA VB0250EAVER, F/W version HPG0.

My question is whether this drive and controller combination is truly hot-swappable.  The message in the attached screenshot says to replace the failing drive only when all drives are on.  I'm hoping someone who has worked with this controller knows the proper way to swap out the failing drive.  I've worked with Dell servers for c. 25 years and the process there is very simple.  With Dell servers, a hot-swap chassis, and a PERC RAID controller, all I have needed to do is a) offline the failing drive, b) pull the drive, and c) put in the new / replacement drive.  Once the drive is replaced, it starts to rebuild.  With this HP B110i controller, I've read in some articles that the host server should never be shut off when the drives are replaced.  I'm also wondering if there is a similar process to offline the failing drive before you pull it out.  Beyond that I have zero experience with these controllers, and definitely do not want to pull out the hard drive while the system is running if it is not hot-swappable.

Separately, I've had the client get in touch with HP for out-of-warranty support, but they are looking to charge c. $850 for a single out-of-warranty support call (yow!!!).  So I'm hoping someone can provide advice and a drive-swap procedure from their own experience before the client has to fork over a serious chunk of change to HP.

Thanks in advance for any assistance on this.

jkirman
Dr. Klahn, Principal Software Engineer

Commented:
Back up the failing array to an external device.  Then back it up again to a different device using different backup software.

Buy three replacement drives, replace both drives in the array, and then restore from the backup.  Put the third drive on the shelf and label it "RAID HOT REPLACEMENT FOR SYSTEM blah blah **ONLY**" so that (a) you have it immediately available in case of a future problem and (b) nobody steals it for some other system.

Why both drives?  If one drive is failing, the other one is probably also reaching end of life.  Drives are cheap; human time is not.  In any case, you will want two as-close-to-identical drives in the array and it may be difficult to obtain a drive to match the failing one.
jkirman, Principal

Author

Commented:
Dr. Klahn - thank you for your recommendations on backing up to 2 media and then replacing the current RAID-1 drives.  However, your response did not answer my question.  Perhaps I wasn't clear, but when you say replace the drives - that is specifically what I'm asking about.  So for example -

1) I am requesting confirmation that the drives and controller are designed for hot-swap, or whether I need to power off the server.  I'm 99% sure this is hot-swap, but would prefer confirmation from someone who has already worked with it.

2) Do I just pull out the failing drive, or do I need to do something with the Array Configuration Utility or the Smart Storage Administrator CLI to prep the system?

3) As I'm looking at the ACU GUI right now using the Physical View, it shows all 4 drives in the system, with the 250 GB SATA in Box 1 Bay 2 showing a yellow exclamation mark.  When I right-click the drive, I only get options for More Information and View Status Alerts.  There are no options or controls anywhere to, e.g., offline or deactivate the drive.  Again, I'm coming from a Dell world where the PERC controller and OpenManage software provide a lot of functionality and control here, and you can:

a) Offline a failing drive
b) pull out the drive after it shows Offline
c) put in a replacement drive
d) watch the drive rebuild

and that's basically it.  Here, with the B110i and the ACU GUI, I am not seeing any controls to prep the system for physically removing the drive, as I would with a Dell system per a) through d) above.

So again, since I definitely do not want to screw up the hardware, I'm requesting a true step-by-step description of the physical removal process, possibly working with or using the ACU GUI if that is helpful to the process.

Thank you.

jkirman
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant (Fellow 2018, Expert of the Year 2017)
Commented:
Pull out the failing drive and replace with a good working drive, it will start the rebuild automatically.

Nothing further for you to do.

I assume you already have good working backups, just in case you need to restore.

Top Expert 2014
Commented:
Unfortunately the ML110 G6 is not hot-plug and there are no drive LEDs to identify which is which.

You will have to go into the ACU and get the serial number of the predictive-fail disk, then power off, replace it with a new one, power on again, and check again that you removed the right one. If you only have a second-hand disk, you may have to zero out its first few blocks with a PC, since you can't easily hot-plug it.
Wakeup, Specialist 1

Commented:
Some good information/instructions here:
https://support.hpe.com/hpsc/doc/public/display?docId=c02279604

But Andy's right, not hot-pluggable.
jkirman, Principal

Author

Commented:
Thanks for your collective thoughts.  From everything you've written, it looks like the approach should be:

- make a note of the serial number of the failing drive per the ACU interface  (already done)
- shut down the server
- pull out the failing drive from Box 1, Bay 2, confirming it by its S/N
- make a note of the replacement drive's model and serial number
- install the replacement drive
- power on the server

Now here's the part I'm assuming, since I've never dealt with this specific controller:

- Let the system boot normally
- After booting and login, go into the ACU and I'll see the RAID-1 container rebuilding
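The serial-number check described in this thread (record the failing drive's S/N from the ACU before shutting down, then confirm afterwards that the right drive was pulled) can be sketched as a small Python function. This is a hypothetical helper, not an HP tool: it assumes you copy the bay-to-serial listing out of the ACU by hand before and after the swap, and the bay labels and serials (SN-A, SN-B, ...) are placeholders. Both listings should come from the ACU, since the serial on a drive's paper label may not match the electronic one.

```python
from typing import Dict, List


def verify_swap(before: Dict[str, str], after: Dict[str, str],
                failing_serial: str) -> List[str]:
    """Compare bay -> serial listings copied from the ACU before and after a swap.

    Hypothetical helper: the dicts are typed in by hand from the ACU's
    physical-drive details; nothing here talks to the controller.
    Returns a list of problems; an empty list means the swap looks correct.
    """
    problems: List[str] = []

    # Locate the bay that held the predictive-fail drive before shutdown.
    failing_bays = [bay for bay, sn in before.items() if sn == failing_serial]
    if len(failing_bays) != 1:
        problems.append(f"serial {failing_serial} not found exactly once before the swap")
        return problems
    failing_bay = failing_bays[0]

    # The failing drive must not be installed anywhere after the swap.
    if failing_serial in after.values():
        problems.append(f"failing serial {failing_serial} is still installed")

    # Its bay should now hold a drive that was not in the server before.
    new_sn = after.get(failing_bay)
    if new_sn is None:
        problems.append(f"{failing_bay} is empty after the swap")
    elif new_sn in before.values():
        problems.append(f"{failing_bay} holds {new_sn}, which was already installed")

    # Every other bay must be unchanged, otherwise the wrong disk was pulled.
    for bay, sn in before.items():
        if bay != failing_bay and after.get(bay) != sn:
            problems.append(f"{bay} changed from {sn} to {after.get(bay)}")

    return problems


if __name__ == "__main__":
    # Placeholder serials; substitute the values the ACU shows.
    before = {"Box 1 Bay 1": "SN-A", "Box 1 Bay 2": "SN-B",
              "Box 1 Bay 3": "SN-C", "Box 1 Bay 4": "SN-D"}
    after = {**before, "Box 1 Bay 2": "SN-NEW"}
    print(verify_swap(before, after, "SN-B"))  # → []
```

If Bay 1 were replaced by mistake instead of Bay 2, the same call would return three problems: the failing serial is still installed, Bay 2 still holds an old drive, and Bay 1 changed.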

I've read through a few online postings by admins with similar issues with this controller, and get the impression that I should NOT go into the B110i BIOS at boot time or interrupt the boot process, but rather should simply let everything start up on its own, once I've replaced the failing drive.

Appreciate any confirmations on the above / latest post in advance, and thanks again for the info and thoughts to date.

jkirman
Top Expert 2014
Commented:
The S/N on the paper label may not be the same as the electronic one the ACU displays, so check it in the ACU afterwards to make sure you replaced the right one.

Yes, use the ACU rather than the BIOS utility to check it is rebuilding as it is a fakeRAID controller and does not rebuild until the OS is running. There is no option to take it offline like there is with Dell/LSI controllers, nor is there an option to erase any foreign config on the disk.
jkirman, Principal

Author

Commented:
Andyalder, thanks for the additional details regarding the B110i controller.  It seems these embedded / software controllers come with many caveats; it's somewhat mind-blowing for me to hear that you need to be in the O/S for the RAID-1 container to rebuild.  Then again, for the last couple of decades I've been a creature of habit with Dell servers and PERC hardware controllers, and have never used, or even worked with, a software RAID controller.  I have also religiously avoided the S110 / S300 controllers and the like, as I have read mostly complaints about the performance of software RAID controllers compared to, e.g., the traditional PERC hardware controller series.  I'm assuming they (s/w controllers) are recommended only for very budget-constrained situations.  I will hopefully be able to address the drive replacement in the next week or so and will report back on my experience with the ACU and the RAID-1 container rebuild.
jkirman, Principal

Author

Commented:
Thanks for your assistance and detailed information.  The key points here were that 1) the controller is not hot-plug in its current form, and 2) the RAID-1 rebuild will not take place until you boot into the O/S.  FWIW, in my reading I found that you could upgrade the B110i to a hot-plug controller by adding a License Key, available from Amazon for c. $75, but since a straight replacement will work fine, I won't be experimenting with hot-swap options.

Cheers and many thanks again.

jkirman
