Solved

LSI MegaRaid Controller MR9240-4i - How to rebuild a Failed Hard Drive

Posted on 2014-10-01
Last Modified: 2016-08-27
Hi, Experts

We have a Terminal Server running Windows Server 2003 64-bit with an LSI MegaRAID MR9240-4i RAID controller. The drive in Slot 1 has failed, and I have a replacement HDD on the way.

What I need to know, since I've never had to touch this controller and it was installed and configured by our supplier (who can't tell me for sure), is whether the rebuild will happen automatically when I power up after replacing the failed (non-hot-swap) HDD, or whether I will need to take specific steps. If so, what are the steps?

I didn't find much specific documentation on the LSI website, though I haven't exhausted my search. I decided to ask for help here as it will probably be faster.

Thank you
Question by:CATHY-IT
17 Comments
 
LVL 46

Assisted Solution

by:noxcho
noxcho earned 100 total points
ID: 40354677
Which RAID level do you have there? RAID 1? If so, the rebuild usually starts by itself after you replace the bad drive. On some controllers you have to start it manually.
In any case you can take a backup of your drive and simply rebuild the RAID with the new drive. After that, restore from the backup to the new configuration.
0
 

Author Comment

by:CATHY-IT
ID: 40354723
Three identical hard drives. I believe that's RAID 5?
0
 
LVL 46

Expert Comment

by:noxcho
ID: 40354772
Yes, that's RAID5. Then you have no choice other than taking a backup and rebuilding the RAID with the new drive.
RAID5 needs at least 3 healthy drives.
Do you already have backups? If not, take a full system backup urgently. You can do it with backup software such as Hard Disk Manager 14: http://www.paragon-software.com/small-business/hdm-business/license.html
Take a full HDD backup and then rebuild the RAID. Boot the server from the HDM14 Recovery CD and restore the backup.
If you already use backup software, follow the same steps with it.
0
 

Author Comment

by:CATHY-IT
ID: 40354798
I'm not overly knowledgeable about RAID, so sorry for the questions. But the server is running fine on two drives right now and I don't see any data loss. I thought the point of RAID was that you could lose at least one drive without losing data, so wouldn't the other two healthy drives have all the data and just need to rebuild the failed one? I just wasn't sure whether it would rebuild automatically, as the RAID 5 in our HP domain server does with hot-swap drives, or whether I would have to use software to tell the controller to rebuild the array.
0
 
LVL 46

Expert Comment

by:noxcho
ID: 40354872
How do you know that one drive is bad? I assume you got this information from the RAID controller, right?
It could be that drive 3 is not dead but is on its way to going bad, so the RAID still works. But in this RAID the data is striped over three disks, and removing one of them breaks this. Have a look at the diagrams in this explanation: http://www.thegeekstuff.com/2010/08/raid-levels-tutorial/
0
 
LVL 46

Expert Comment

by:noxcho
ID: 40354910
Btw, has the server been restarted since you identified drive 1 as failed?
0
 

Author Comment

by:CATHY-IT
ID: 40354934
No, it has not been restarted. I discovered it by first getting an email notification of the degrade, and then a couple of hours later I noticed the drive had failed. In the console, under Physical View, I see the drive in Slot 1 as failed. I then ordered a new drive, which came in today.
0
 
LVL 46

Expert Comment

by:noxcho
ID: 40354977
Ok, then the chances that the server will not restart properly are really high. Take a full backup ASAP.
Then go to Windows Disk Management (right-click My Computer - Manage - Disk Management) and take a screenshot of the window. Upload it here. Is there a single drive or more HDDs?
0

 

Author Comment

by:CATHY-IT
ID: 40355008
The system is backed up nightly, and since it's a Terminal Server there shouldn't be any critical data on it; that lives on the other server that users access through this Terminal Server. The worst case would be the time it takes to rebuild the OS and do the restore, should I not be able to boot anymore.

I checked Disk Management. It shows only Drive C: (as one drive), and its status is Healthy. The other drives listed are just the external drive ports for card readers and the DVD drive.
0
 
LVL 46

Expert Comment

by:noxcho
ID: 40355033
Ok, then it is definitely RAID5. As I said already, the only way to go is backup - rebuild - restore. Plan a maintenance window for the server and do the work then.
What kind of backups were taken? With which software?
0
 

Author Comment

by:CATHY-IT
ID: 40355087
After checking some more, I found the User Guide for the LSI MegaRAID controller software and found the following about rebuilding failed drives. I guess I should have done this first; I was originally searching Google with "How To". Attached are the instructions I believe I need. If I understand correctly: power down, install the new drive, make sure it is in the "Unconfigured Good" drive state (not JBOD), and the software should start the rebuild automatically, or else right-click the new drive and click Rebuild.
LSI-Rebuild-instructions.JPG
0
 
LVL 46

Expert Comment

by:noxcho
ID: 40355182
This part is valid if you have more than 3 drives in RAID5. Anyway, you can try it if you already have a fresh backup.
0
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 400 total points
ID: 40359940
It will normally start the rebuild automatically. The only problem that may occur is if it is a second-hand disk; then you may have to clear the foreign configuration and then set it as a hot spare.

I can't find much on LSI's website either, but there's loads on IBM's, google for "msm hot spare foreign" without the quotes for the MSM user's guide. The Sun, Bull and Cisco versions are on the first page of Google for that search too, they all use generic LSI cards so their version of the MSM user's guide is as good as LSI's.

Whilst it is always advisable to back up for safety's sake, it should not be needed unless another drive fails during the quite stressful rebuild process.
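
To illustrate why a single failed drive in RAID 5 can be rebuilt from the survivors, here is a minimal Python sketch of XOR parity on one stripe. It only shows the general principle; the block contents and the helper name `xor_blocks` are made up for illustration, and this is not a description of how the MegaRAID firmware lays out or rebuilds data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical three-drive array: two data blocks plus parity.
data_a = b"TERMINAL"
data_b = b"SERVER01"
parity = xor_blocks([data_a, data_b])

# Suppose the drive holding data_a fails.
# XOR of the two surviving blocks reproduces the lost block.
rebuilt_a = xor_blocks([data_b, parity])
print(rebuilt_a == data_a)
```

The same holds whichever of the three blocks is lost, which is why the array keeps running in a degraded state on two drives, and also why a second failure before the rebuild finishes is not recoverable.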
0
 

Author Comment

by:CATHY-IT
ID: 40359952
Thank you Andyalder. I too am hoping I can just power down, swap the failed drive for the brand-new, identical-model HDD that I have, bring the server back online, go to the LSI software manager, check that the drive is in Unconfigured Good, and then click Rebuild. And yes, I'm really hoping I don't lose a second drive before or during this rebuild.

I plan to do this install today at office closing time, and I will post back here what my results were.
0
 

Accepted Solution

by:CATHY-IT
CATHY-IT earned 0 total points
ID: 40360432
I figured I'd update this question now, as I just finished replacing the hard drive. Here's what needs to be done when you lose one drive with this controller and RAID 5:

 - In the MegaRAID Storage Manager
 - Right-clicked on the failed drive
 - Noticed "Mark Drive as Missing"; hovering over that option shows:
   "Mark an Offline/Failed drive of a degraded array as Missing in order to prepare for drive replacement"
   (I was not aware of this option and the User Guide doesn't mention it as a step in replacement, but it made sense.)
 - Clicked on it and confirmed Yes. Drive went Offline
 - Powered down the server properly and physically swapped the failed drive for the new replacement
 - Powered up the server
 - Ignored all screen warnings and prompts and allowed the server to boot as normal
 - After the OS loaded, logged in etc.
 - Opened the MegaRAID Storage Manager
 - Slot 1 (that I just replaced) showed the drive as Foreign (Unconfigured Good)
 - Right-clicked on it - no option to Rebuild?
 - Noted an option labelled "Replace Missing Drive"; hovering over it shows:
   "Replaces a drive of a degraded array that is marked as missing with an unconfigured good drive"
 - Clicked on Replace Missing Drive
 - Drive status changed to Offline
 - Right-clicked on the drive again
 - Clicked on "Start Rebuild"
 - Rebuild started
 - After 5 minutes, went to the Manage menu and clicked on Show Progress
 - Progress showed an estimated 11 hours or so left to rebuild

So as long as the other two drives are in good shape, I hope this RAID array will be back to optimal status by tomorrow.
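
The sequence above can be summarised as a small state table. This is only a sketch of what was observed in this thread, written in Python for clarity: the names `TRANSITIONS` and `apply_actions` and the state labels are hypothetical, not part of MegaRAID Storage Manager, and other firmware or MSM versions may behave differently.

```python
# Drive states and the action that moved the drive to the next state,
# as reported in the steps above. Labels are illustrative only.
TRANSITIONS = {
    ("Failed", "Mark Drive as Missing"): "Marked Missing",
    ("Marked Missing", "Swap physical drive"): "Unconfigured Good (Foreign)",
    ("Unconfigured Good (Foreign)", "Replace Missing Drive"): "Offline",
    ("Offline", "Start Rebuild"): "Rebuilding",
    ("Rebuilding", "Rebuild completes"): "Rebuilt",
}

def apply_actions(state, actions):
    """Walk the table, refusing any action that was not offered in that state."""
    for action in actions:
        key = (state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"'{action}' is not available while the drive is '{state}'")
        state = TRANSITIONS[key]
    return state

STEPS = [
    "Mark Drive as Missing",
    "Swap physical drive",
    "Replace Missing Drive",
    "Start Rebuild",
    "Rebuild completes",
]
print(apply_actions("Failed", STEPS))
```

Calling `apply_actions("Unconfigured Good (Foreign)", ["Start Rebuild"])` raises `ValueError`, which mirrors the missing Rebuild option seen before "Replace Missing Drive" was clicked.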

Thank you
0
 

Author Closing Comment

by:CATHY-IT
ID: 40367919
My answer is the best solution only because it was written after the fact.

Andy Alder - your answer was in the right direction and gave me the clarification that this can be done with RAID 5, but I was hoping for specific steps from someone who had already used this controller and Storage Manager. I was already pretty sure I could rebuild without data loss with only one drive failed, but wasn't sure how, since it wasn't a hot swap either.

NoxCho - my best guess is that you mixed up RAID 1 (first comment) and RAID 5 (second comment), as you stated RAID 1 would do what I just did and that RAID 5 would not. You had me confused (by the last comment for sure) and a bit nervous at first, until I found the LSI User Guide to clarify that a rebuild was doable with only one drive failure. I think the User Guide wasn't as step-by-step friendly as it certainly could have been, and when it comes to live servers I really don't like learning on the fly. But I understood RAID 5 would rebuild one lost drive, and now I know how it's done with a MegaRAID controller. I'm giving you points though, because you did take the time to help me.

Thank you Both
0
 

Expert Comment

by:Ed B
ID: 41773028
I know this thread is two years old, but I've replaced many drives and wanted to update this with my simpler (and safer?) solution...

To Replace a failed drive:
 - Silence Alarm (Alarm will start again whenever anything happens: removing/inserting drive, rebuild complete, etc.)
 - Right Click Failed Drive, select "Mark Drive As Missing", check "Confirm", click "Yes"
 - With server still powered ON, Remove drive, Insert new identical drive
 - Megaraid will automatically detect the new drive and start the rebuild
 - Silence any alarms and wait for the rebuild to complete before ever powering down

I believe this to be safer than @CATHY-IT's solution because it doesn't stop the other drives while the array is in a degraded state. The problem with powering down the array is that there's always a chance one of the other drives might not spin up. That's CATASTROPHIC in RAID 5. But it's also REALLY BAD in RAID 6, since now you have to rebuild two drives without a third failing. That can take days for TB drives.

So IMHO, it's much better to hot swap than shut down your degraded raid array.
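
As a rough back-of-the-envelope check on how long an array stays degraded, rebuild time is about the drive capacity divided by the sustained rebuild rate. The Python sketch below uses made-up capacities and rates purely for illustration; the function name `rebuild_hours` is hypothetical, and real rebuild rates depend on the controller, the rebuild priority setting and the load on the server.

```python
def rebuild_hours(capacity_gb, rate_mb_per_s):
    """Hours needed to rewrite one whole drive at a steady rebuild rate."""
    seconds = capacity_gb * 1000 / rate_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600

# Hypothetical figures, not taken from this thread.
for capacity_gb, rate in [(500, 50), (2000, 50), (4000, 25)]:
    hours = rebuild_hours(capacity_gb, rate)
    print(f"{capacity_gb} GB at {rate} MB/s: {hours:.1f} h")
```

Halving the rate or doubling the capacity doubles the window in which another drive failure can hurt you.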

Also, you should be doing backups regularly so you don't have to start thinking about backing up data while in a degraded state. Spending hours backing up data, and taxing your drives by reading every bit of data from them, adds a lot more risk to your already risky situation. The primary concerns while in a degraded state should always be 'what happens if another drive fails' and 'what's the least I can ask of these drives until my array is rebuilt'.

Now, if you haven't backed up recently, then my recommendations are as follows:
  RAID6 - Just go ahead and replace the failed drive. Trying to back up first adds to the overall time in a degraded state and the overall risk of another drive failing. Yes, you're trusting that your RAID controller won't fry the moment you insert that new drive, but hey, you've been trusting that RAID controller not to crap out for a while now :) On the off chance that another drive fails during the rebuild, stop the rebuild (or prioritize it very low in MegaRAID) and follow the instructions for RAID5 below...
  RAID5 - Keep calm and make sure you know where your towel is. You'll need it for wiping away your panic sweat, dusting off drives, and rendering your fellow IT employee unconscious before blaming this all on him. Now, do NOT just replace the drive. Your primary concern must be the data. Immediately start writing the data off to 1) drives that are not in the same RAID, 2) external drives, 3) other networks, 4) USB drives. If you really have none of these, think about the internet (BOX, etc.), but you'll have to judge whether the slow upload to the internet is worth the wait. Once everything is backed up, proceed with swapping the drive. Then rat-tail yourself with the towel so you remember to do backups!

One more important thought for everyone...  *** NEVER JUST MARK A FAILED DRIVE AS 'ONLINE'!!!  IT COULD CORRUPT YOUR WHOLE ARRAY!!! ***

Good Luck!
Ed
0
