Solved

Replacement drive for failed drive in RAID-5 - shutdown and replace or hot-swap replace? - HP DL 380 G3

Posted on 2010-11-26
Last Modified: 2012-05-10
This morning I discovered that our Windows 2003 Server (which is also our Active Directory controller) had a failed drive. There are five 36.8 GB drives configured in a RAID-5. I have a replacement drive arriving within the next four hours.

Windows boots as far as the login screen, then sits at a gray screen, and no users can log in on their workstations or access the Exchange server. A single failed drive should not affect the server except to slow it down, but it seems to, and I'm concerned.

The HP tech recommended that I replace the drive while the machine is on, wait for it to rebuild (~9 hours), and then see if it starts working.

This doesn't make sense to me. It makes more sense to hard shut down the server, replace the drive, and then boot and let the rebuild run without the Windows services trying to function off the failed drive.

It also makes me think that I will get my AD controller functioning sooner this way.

This morning I restarted the server remotely through the iLO controller with a hard reset (not knowing a drive had failed). In the first few minutes of the boot process the AD and file sharing services were working, then they stopped, and after that I couldn't log in at the login window.

So, my guess is that the drive is reporting partial failure to the array controller and continuing trying to function, but in reality it is totally not working and the system would function better if the drive was out of the equation.

Following my logic or am I nuts and ignorant? :)

Thanks.

P.S. In another worst case scenario, another drive has failed and it isn't being shown on the drive lights. I can't access the Insight Diagnostics remotely.
Question by:sweetseater
9 Comments
 
LVL 76

Accepted Solution

by:
Alan Hardisty earned 125 total points
ID: 34219470
The HP DL380 G3 has hot-swap drives, so you can happily replace a drive with it still running.

As for the grey screen - sounds a little peculiar, but might be disk issues that get resolved once the replacement drive has been installed.
 
LVL 47

Assisted Solution

by:dlethe
dlethe earned 125 total points
ID: 34219473
Your premise is a bit incomplete. The XOR parity in RAID5 protects against either a full drive failure or a block failure, but not both at once. While the array is degraded, if any surviving disk has a bad block, you lose that entire stripe, so you can still have data loss.
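To make the parity point concrete, here is a minimal Python sketch of one stripe on a five-drive RAID 5. It is only an illustration of the XOR arithmetic, not how the Smart Array firmware is implemented; the block contents and the `rebuild_block` helper are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 5-drive RAID 5: four data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild_block(stripe, failed):
    """Recompute the block of the failed drive from the surviving blocks.

    A surviving block of None models an unreadable (bad) block.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    if any(blk is None for blk in survivors):
        raise ValueError("second unreadable block: stripe cannot be rebuilt")
    return xor_blocks(survivors)

# Drive 2 fails outright: its block is recovered from the other four.
recovered = rebuild_block(stripe, failed=2)

# Same failure, but drive 0 also has a latent bad block in this stripe.
damaged = list(stripe)
damaged[0] = None
try:
    rebuild_block(damaged, failed=2)
    stripe_lost = False
except ValueError:
    stripe_lost = True
```

With one drive gone, every surviving block in the stripe is needed to recompute the missing one, which is why a single bad block elsewhere costs you the stripe.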

That is why you need to run regular data consistency checks/repairs: to clean up lurking bad blocks that result from media failures occurring after the last time that stripe of data was read.
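As a rough sketch of what a consistency check/repair pass does to one stripe while the array still has all its drives (an assumption-laden model in Python, with `None` standing in for an unreadable block and `scrub_stripe` a hypothetical name, not a controller API):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def scrub_stripe(stripe):
    """Check one stripe (data blocks + parity block); None = unreadable block.

    Returns (stripe_after_scrub, status).
    """
    bad = [i for i, blk in enumerate(stripe) if blk is None]
    if len(bad) > 1:
        # More unreadable blocks than single parity can cover.
        return stripe, "unrecoverable"
    if len(bad) == 1:
        # Exactly one unreadable block: recompute it from the others.
        others = [blk for blk in stripe if blk is not None]
        fixed = list(stripe)
        fixed[bad[0]] = xor_blocks(others)
        return fixed, "repaired"
    # Everything readable: data XOR parity must come out as all zero bytes.
    if any(xor_blocks(stripe)):
        return stripe, "parity mismatch"
    return stripe, "ok"

data = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
healthy = data + [xor_blocks(data)]

# A latent bad block found while all drives are still present is repairable.
latent = list(healthy)
latent[1] = None
fixed, status = scrub_stripe(latent)
```

The point of running this regularly is that the bad block is found and rewritten while the redundancy to repair it still exists, instead of being discovered in the middle of a rebuild.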

Best practice is to always do replacements hot. The most stressful thing you can do to a disk is power cycle it. Why rock the boat?

The HP person is correct. There is also no reason to guess at what is going wrong. Look at the event log in the controller. It will tell you.

 

Author Comment

by:sweetseater
ID: 34219574
Dlethe,

I can't access the event log in the controller until the server is fully functioning.

What is the method you recommend to do regular data consistency check/repairs?

Thanks for the help. I will settle down and trust the hot-swapping process.

 
LVL 47

Expert Comment

by:dlethe
ID: 34219737
HP has several freebies. Certainly get the latest SmartStart / ACU / ADU utilities. I don't know off the top of my head which controller comes bundled with your system, but some of them have a feature to automate the consistency repairs.

Go to the support.hp.com site, download and install the latest firmware, updates, and drivers, and get the add-on monitoring tools from the same site as well.
 
LVL 76

Expert Comment

by:Alan Hardisty
ID: 34219747
@dlethe - controller is as follows:

Integrated Smart Array 5i Plus Controller with optional Battery-Backed Write Cache (BBWC) Enabler option kit

http://h18000.www1.hp.com/products/quickspecs/11473_div/11473_div.HTML
 
LVL 55

Expert Comment

by:andyalder
ID: 34219816
You should always replace drives on Smart Array controllers hot (you can remove cold but should add hot), but that doesn't explain your grey screen. What the HP tech advised makes no sense to me either. As you say, the server should be running, albeit slower, and it ought to boot happily even with the bad drive removed.

Could you boot the SmartStart CD, run the Array Diagnostic Utility, and post the log file **as an attachment**? We might spot something wrong with one of the other disks.

You don't have an option to start a manual parity check; Smart Array controllers do this in the background after 15 seconds of inactivity.

At a guess, I'd say it's more than a flaky disk; chkdsk might show file system corruption.
 

Author Comment

by:sweetseater
ID: 34220146
After the drive rebuilt, which took only about 2 hours (36 GB), I was able to restart the server and it booted fine. I am now running chkdsk. I did a hot swap.
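For a sense of scale, the rebuild times mentioned in this thread can be turned into an average write rate to the replacement drive. This is back-of-the-envelope arithmetic only, assuming the whole 36.8 GB drive is rewritten and that GB means decimal gigabytes; the function name is just for illustration.

```python
def rebuild_rate_mb_per_s(capacity_gb, hours):
    """Average write rate to the replacement drive in MB/s (decimal units)."""
    return capacity_gb * 1000 / (hours * 3600)

# About 2 hours observed in this thread vs. the ~9 hours quoted beforehand.
observed = rebuild_rate_mb_per_s(36.8, 2)
estimated = rebuild_rate_mb_per_s(36.8, 9)

print(f"observed:  {observed:.1f} MB/s")
print(f"estimated: {estimated:.1f} MB/s")
```

Both figures are modest rates, which is consistent with a rebuild that runs in the background rather than at full drive speed.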

I looked through some of the details the HP Insight Online edition gave, but it showed no errors for this failure and no record of it where I was looking. Attached is the ADU from the machine after the drive has been replaced.

It did show that one of the drives still in there had 6 "unavailable" errors, whereas one other drive had one and the rest had zero. I don't know if that is a pre-emptive notice of an issue, but none of the drives showed read errors, etc.

 Prolaw---Array-Diagnostic-Report.zip
 
LVL 47

Expert Comment

by:dlethe
ID: 34220323
Don't confuse XOR parity errors with file system errors.   They are independent of each other.  

Question has a verified solution.
