LSI MegaRaid Controller MR9240-4i - How to rebuild a Failed Hard Drive

Hi, Experts

We have a Terminal Server running 64-bit Windows Server 2003 with an LSI MegaRAID MR9240-4i RAID controller. The drive in Slot 1 has failed, and I have a replacement HDD on the way.

What I need to know, as I've never had to do anything with this controller and it was installed and configured by our supplier (who can't tell me for sure), is whether the rebuild will happen automatically when I power up after replacing the failed (non-hot-swap) HDD, or whether I will need to take specific steps. If so, what are the steps?

I didn't find much specific documentation on the LSI website, though I haven't exhausted my search. I decided to ask for help here as it will probably be faster.

Thank you
noxcho (Global Support Coordinator) commented:
Which RAID level do you have there? RAID 1? If so, the rebuild usually starts by itself after you replace the bad drive. With some controllers you have to start it manually.
In any case, you can take a backup of your drive and simply rebuild the RAID with the new drive, then restore from the backup to the new configuration.
CATHY-IT (Author) commented:
Three identical hard drives. I believe that's RAID 5?
noxcho (Global Support Coordinator) commented:
Yes, that's RAID 5. Then you have no other choice than to take a backup and rebuild the RAID with the new drive.
RAID 5 needs at least 3 healthy drives.
Do you already have backups? If not, take a full system backup urgently. You can do it with backup software such as Hard Disk Manager 14.
Take a full HDD backup and then rebuild the RAID. Boot the server from the HDM14 Recovery CD and restore the backup.
If you already use backup software, follow the same steps with it.
CATHY-IT (Author) commented:
I'm not overly knowledgeable about RAID, so sorry for the questions. But the server is running fine on two drives right now and I don't see any data loss. I thought the point of RAID was that you could lose at least one drive without losing data, so wouldn't the other two healthy drives have all the data and just need to rebuild the failed one? I just wasn't sure whether it would rebuild automatically, as the RAID 5 in our HP domain server does with hot-swap drives, or whether I would have to use software to tell the controller to rebuild the array.
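
The idea behind that question can be sketched in a few lines of Python. This is only a minimal illustration of the XOR parity that RAID 5 is built on, not the controller's actual implementation: the block contents and the fixed block-to-drive layout are made up for the example, and a real RAID 5 array rotates parity across the drives and works on much larger stripes.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three drives: two data blocks plus one parity block.
# The contents are hypothetical and chosen only for illustration.
data_a = b"TERM"                      # block on the first drive
data_b = b"SERV"                      # block on the second drive (the one that fails)
parity = xor_blocks(data_a, data_b)   # block on the third drive

# The second drive fails: its block is recomputed from the two survivors.
rebuilt_b = xor_blocks(data_a, parity)
print(rebuilt_b == data_b)
```

The same calculation works whichever single block is lost, which is why an array like this can keep running in a degraded state on two of its three drives and later rebuild the third.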
noxcho (Global Support Coordinator) commented:
How do you know that one drive is bad? I assume you got this information from the RAID controller, right?
It could be that drive 3 is not dead yet but is on its way to going bad, so the RAID still works. But in RAID the data is striped over three disks, and removing one of them breaks this. Have a look at the diagrams that explain this.
noxcho (Global Support Coordinator) commented:
Btw, has the server been restarted since you identified drive 1 as failed?
CATHY-IT (Author) commented:
No, it has not been restarted. I discovered the problem by first getting an email notification that the array was degraded, and a couple of hours later I noticed the drive had failed. In the console, under Physical View, I see the drive in slot 1 marked as failed. I then ordered a new drive, which came in today.
noxcho (Global Support Coordinator) commented:
OK, then the chances that the server will not restart properly are really high. Take a full backup ASAP.
Then go to Windows Disk Management (right-click My Computer > Manage > Disk Management), take a screenshot of the window, and upload it here. Is there a single drive or more HDDs?
CATHY-IT (Author) commented:
The system is backed up nightly, and since it's a Terminal Server there shouldn't be any critical data on it; that lives on the other server that users access through this Terminal Server. The worst case would be the time it takes to rebuild the OS and do the restore, should I not be able to boot anymore.

I checked Disk Management. It shows only drive C: (as one drive) with a status of Healthy. The other drives listed are just the external drive ports for card readers and the DVD drive.
noxcho (Global Support Coordinator) commented:
OK, then it is definitely RAID 5. As I said already, the only way forward is backup, rebuild, restore. Plan a maintenance window for the server and do the work then.
What kind of backups were taken? With which software?
CATHY-IT (Author) commented:
After some more checking, I found the User Guide for the LSI MegaRAID controller management software, which has a section on rebuilding failed drives. I guess I should have done this first; I was originally searching Google for a how-to. Attached are the instructions I believe I need. If I understand them correctly: power down, install the new drive, and make sure it is in the "Unconfigured Good" state, not JBOD. The software should then start the rebuild automatically, or I can right-click the new drive and click Rebuild.
noxcho (Global Support Coordinator) commented:
This part is valid if you have more than 3 drives in RAID 5. Anyway, you can try it if you already have a fresh backup.
andyalder (Saggar maker's framemaker) commented:
It will normally start the rebuild automatically. The only problem that may occur is with a second-hand disk: then you may have to clear the foreign configuration and set the disk as a hot spare.

I can't find much on LSI's website either, but there's loads on IBM's. Google for "msm hot spare foreign" (without the quotes) to find the MSM user's guide. The Sun, Bull and Cisco versions are on the first page of Google for that search too; they all use generic LSI cards, so their versions of the MSM user's guide are as good as LSI's.

Whilst it is always advisable to back up for safety's sake, a backup should not be needed unless another drive fails during the quite stressful rebuild process.
CATHY-IT (Author) commented:
Thank you, Andyalder. I too am hoping I can just power down, swap the failed drive for the brand-new, identical-model HDD that I have, and bring the server back online, then go to the LSI management software, check that the drive is in the Unconfigured Good state, and click Rebuild. And yes, I'm really hoping I don't lose a second drive before or during this rebuild.

I plan to do this install today at office closing time, and I will post my results back here.
CATHY-IT (Author) commented:
I figured I'd update this question now, as I just finished replacing the hard drive. Here's what needs to be done when you lose one drive with this controller and RAID 5:

- In MegaRAID Storage Manager, right-clicked on the failed drive.
- Noticed the option "Mark Drive as Missing". Hovering over it shows: "Mark an Offline/Failed drive of a degraded array as Missing in order to prepare for drive replacement". (I was not aware of this option, and the User Guide doesn't mention it as a step in replacement, but it made sense.)
- Clicked on it and confirmed Yes. The drive went Offline.
- Powered down the server properly and physically swapped the failed drive for the new replacement.
- Powered up the server.
- Ignored all screen warnings and prompts and allowed the server to boot as normal.
- After the OS loaded, logged in.
- Opened MegaRAID Storage Manager.
- Slot 1 (the one I just replaced) showed the drive as Foreign (Unconfigured Good).
- Right-clicked on it: no option to rebuild.
- Noticed the option "Replace Missing Drive". Hovering over it shows: "Replaces a drive of a degraded array that is marked as missing with an unconfigured good drive".
- Clicked on Replace Missing Drive.
- Drive status changed to Offline.
- Right-clicked on the drive again.
- Clicked on "Start Rebuild".
- The rebuild started.
- After 5 minutes, went to the Manage menu and clicked Show Progress.
- Progress showed an estimated 11 hours or so left to rebuild.

So as long as the other two drives are in good shape, I hope this RAID array will be back to optimal status by tomorrow.
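
For anyone who wants the order of operations at a glance, the steps above can be written down as a small state table. This is purely an illustrative Python model of the sequence described in this post, not an LSI API or tool. The action names come from the menu options mentioned above, "Swap in new drive" stands for the power-down and physical replacement, and the final "Online" state after the rebuild completes is an assumption, since the post ends while the rebuild is still running.

```python
from typing import List

# (current slot state, action) -> next slot state, as described in this thread.
# "Missing" is how the slot is labelled here after "Mark Drive as Missing";
# the post reports the drive showing as Offline at that point.
# "Online" as the end state is an assumption, not something reported above.
TRANSITIONS = {
    ("Failed", "Mark Drive as Missing"): "Missing",
    ("Missing", "Swap in new drive"): "Unconfigured Good",
    ("Unconfigured Good", "Replace Missing Drive"): "Offline",
    ("Offline", "Start Rebuild"): "Rebuild",
    ("Rebuild", "Rebuild completes"): "Online",
}

STEPS = [
    "Mark Drive as Missing",
    "Swap in new drive",
    "Replace Missing Drive",
    "Start Rebuild",
    "Rebuild completes",
]


def walk(state: str, actions: List[str]) -> List[str]:
    """Return every state visited; raise if an action is taken out of order."""
    visited = [state]
    for action in actions:
        key = (state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"'{action}' is not expected while the slot is '{state}'")
        state = TRANSITIONS[key]
        visited.append(state)
    return visited


print(" -> ".join(walk("Failed", STEPS)))
```

The table makes one detail of the procedure explicit: in this account "Start Rebuild" only became available after "Replace Missing Drive" had moved the new drive from Unconfigured Good to Offline.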

Thank you

CATHY-IT (Author) commented:
My answer is the best solution only because it was written after the fact.

Andy Alder - your answer was in the right direction and confirmed that this can be done with RAID 5, but I was hoping for specific steps from someone who had already used this controller and Storage Manager. I was already pretty sure I could rebuild without data loss with only one drive failed, but I wasn't sure how, since it wasn't a hot-swap drive either.

NoxCho - my best guess is that you mixed up RAID 1 (first comment) and RAID 5 (second comment), as you stated that RAID 1 would do what I just did and that RAID 5 would not. You had me confused (by the last comment for sure) and a bit nervous at first, until I found the LSI User Guide, which clarified that a rebuild is doable with only one drive failure. I think the User Guide wasn't as step-by-step friendly as it certainly could have been, and when it comes to live servers I really don't like learning on the fly. But I understood that RAID 5 would rebuild one lost drive, and now I know how it's done with a MegaRAID controller. I'm giving you points, though, because you did take the time to help me.

Thank you both
Ed B commented:
I know this thread is two years old, but I've replaced many drives and wanted to update this with my simpler (and safer?) solution...

To Replace a failed drive:
 - Silence Alarm (Alarm will start again whenever anything happens: removing/inserting drive, rebuild complete, etc.)
 - Right Click Failed Drive, select "Mark Drive As Missing", check "Confirm", click "Yes"
 - With server still powered ON, Remove drive, Insert new identical drive
 - Megaraid will automatically detect the new drive and start the rebuild
 - Silence any alarms and wait for the rebuild to complete before ever powering down

I believe this to be safer than @CATHY-IT 's solution because it doesn't stop the other drives while the array is in a degraded state.  The problem with powering down the array is that there's always a chance one of the other drives might not spin up.  That's CATASTROPHIC in Raid 5.  But it's also REALLY BAD in raid 6 since now you have to rebuild two drives without a third failing.  That can take days for TB drives.

So IMHO, it's much better to hot swap than shut down your degraded raid array.
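
To put rough numbers on the "days for TB drives" remark, here is a back-of-the-envelope Python sketch. The drive sizes and sustained rebuild rates are hypothetical values picked for illustration, not measurements from any MegaRAID controller; real rebuild speed depends on the controller, the configured rebuild rate, and how busy the array is.

```python
def rebuild_hours(drive_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to rewrite a whole drive at a constant rebuild rate."""
    return drive_bytes / rate_bytes_per_s / 3600


TB = 10**12  # decimal terabyte, as used on drive labels
MB = 10**6

# Hypothetical drive sizes (TB) and sustained rebuild rates (MB/s).
for size_tb in (1, 4, 8):
    for rate_mb in (25, 100):
        hours = rebuild_hours(size_tb * TB, rate_mb * MB)
        print(f"{size_tb} TB at {rate_mb} MB/s: about {hours:.1f} h ({hours / 24:.1f} days)")
```

Every hour in that estimate is an hour in which the array has less redundancy than it was built with, which is the argument for keeping the degraded window as short as possible.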

Also, you should be doing backups regularly so you don't have to start thinking about backing up data while in a degraded state.  Spending hours backing up data, and taxing your drives by reading every bit of data from them, adds a lot more risk to your already risky situation.  The primary concerns while in a degraded state should always be 'what happens if another drive fails', and 'what's the least I can ask of these drives until my array is rebuilt'.

Now, if you haven't backed up recently, then my recommendations are as follows:
  RAID6 - just go ahead and replace the failed drive.  Trying to back up first adds to the overall time in a degraded state and the overall risk of another drive failing.  Yes, you're trusting that your RAID controller won't fry the moment you insert that new drive, but hey, you've been trusting that RAID controller not to crap out for a while now :)  On the off chance that another drive fails during the rebuild, stop the rebuild (or prioritize it very low in MegaRAID) and follow the instructions for RAID5 below...
  RAID5 - Keep calm and make sure you know where your towel is.  You'll need it for wiping away your panic sweat, dusting off drives, and rendering your fellow IT employee unconscious before blaming this all on him.  Now, do NOT just replace the drive.  Your primary concern must be the data.  Immediately start writing the data off to 1) drives that are not in the same RAID, 2) external drives, 3) other networks, 4) USB drives.  If you really have none of these, think about the internet (BOX, etc.), but you'll have to judge whether the slow upload to the internet is worth the wait.  Once everything is backed up, proceed with swapping the drive.  Then rat-tail yourself with the towel so you remember to do backups!

One more important thought for everyone...  *** NEVER JUST MARK A FAILED DRIVE AS 'ONLINE'!!!  IT COULD CORRUPT YOUR WHOLE ARRAY!!! ***

Good Luck!