Denten McDougall


LSI MegaRAID - Unrecoverable medium error during recovery

We recently started getting a bunch of fatal errors listed on our main server's RAID.  Checking the status, it shows no errors and the system as optimal, but these errors keep showing up, and once I started a consistency check it started listing a bunch more.

not sure what to do next as these errors don't tell me much....

and am I at any risk of losing data at this point?

Thanks

Denten
RAID-ERRORS.jpg
Profile_Summary_9_26_2017.txt
arnold

It's in a consistency check; how old are the drives? Sense errors often point to a possible issue with the drives. If I am not misreading, the events are not isolated to a single drive....
Almost all of your "medium" errors are in a very tight group which is indicative of a physical shock; but, it looks to me like both drives 1 and 2 are beginning to fail and you had better replace at least one of them ASAP!  The storage though will go to a degraded state until the rebuild has completed and I don't like that the alarms have been turned off.
Denten McDougall (ASKER)

so, if I was to replace both disks, how would i do that?  replace one, let it rebuild and then replace the other?

drives are about 3 years old...
how do I turn alarms on?  I never turned them off, this is how the system was setup when i inherited the system.

also, what would you recommend for replacement drives?

Thanks

Denten
Before anything, please, please make sure you have taken a backup, and that you test a restore to make sure it works. Currently, your system is going through a consistency check. The issue with RAID 5 is that a rebuild takes some time, and a second drive failure during that time would take the whole volume down, ...

You have to wait for the consistency check to complete before proceeding.  Depending on how far back your logs go, try to identify the drive that first started triggering this issue.  If you have available drive slots, add a hot spare.

Is the system still under warranty, so that you can request HP to provide a replacement?

Often, during a rebuild, you can force the drive that was kicked out back online, in hopes that the rebuild process completes before the drive dies outright.
Bays 4-7 are unused, so I could put some drives in there for hot spares?  Should I just replace with the same drives, or is there something better nowadays?

server is off warranty.
You could add one as a hot spare, kick one of the two indicated as having issues, and replace it with the other new one.
Did the consistency check complete, or is it still in progress?
The other consideration is the load on the system and whether to attempt the rebuild at an off-peak time. RAID 5 performance degrades significantly while the array is degraded or rebuilding.


You have the OS and the data installed on a RAID 5 volume?
I guess I likely do?  looks like all 4 drives are in the RAID and then partitioned in to C: for OS and D: for data.  is that a bad thing?

Like i said, i inherited this system and up till now it seemed to be working fine so I really never looked into it much...

what would be the best thing to do in this case for best performance/security?  short of replacing the entire server.... any other suggestions would be appreciated...

consistency did complete....
Unfortunately, the drives are almost undoubtedly proprietary, so you must purchase compatible drives using the manufacturer's part numbers.  The LSI Megaraid I support, for example, is inside an IBM Xserver and, if a replacement drive doesn't have IBM's signature embedded in its firmware, the drive is not even recognized.
So, find the actual support page for your server, get the installation manual from it, then look for the compatible drive part numbers and search for them.
really?  never even knew that was a thing... :(
When you get to medium to high end storage controllers, it's been true for well over a decade now, and the drives are always way more expensive, too; but eBay has most of them for a small fraction of what the manufacturer wants!
ok....found compatible 2TB drives, which would be good as we are getting a little tight on space... thinking of buying two..

    Hard drive
    2 TB
    hot-swap
    3.5"
    SATA 6Gb/s
    7200 rpm
    buffer: 32 MB
    for ThinkServer RD330; RD340; RD440; RD530; RD540; RD630; RD640; TD330; TD340; TS430; TS440

can someone advise on best practice to get this fixed up and operating at best possible performance, etc..  

I am no tech expert, but know enough to be dangerous.... :)

Thanks
You need to buy the same 557.861 GB drives that you need to replace and the report you posted says they are SAS not SATA.  You won't be able to create a new virtual drive with only two 2TB drives and may, in fact, clobber the rebuild if you stick them in.
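To put rough numbers on the capacity side of this, here is a minimal sketch. It assumes the existing volume is a RAID 5 across all four drives (as described earlier in the thread) and uses the 557.861 GB per-drive figure quoted from the report. The function name is made up for illustration and is not part of any MegaRAID tool.

```python
def raid5_usable_gb(drive_count, drive_size_gb):
    """Usable capacity of a RAID 5 set: one drive's worth goes to parity."""
    if drive_count < 3:
        # RAID 5 needs at least three member drives.
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_gb


# Assumed layout from the thread: four drives of 557.861 GB each.
print(raid5_usable_gb(4, 557.861))

# Two 2 TB drives alone cannot form a RAID 5 set.
try:
    raid5_usable_gb(2, 2000)
except ValueError as err:
    print(err)
```

The general point is that RAID 5 gives up one drive's capacity to parity and needs at least three members, so two larger drives on their own cannot stand in for the existing set.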
The X Server I mentioned has several possible backplanes; 2.5 inch hot swap, 2.5 inch non hot-swap, 3.5 inch of both, and SATA.
The megaraid software picture you first posted has a Physical tab.  Did you look there to see what the drives were?
The drives are Seagate 600 GB 3.5 inch SAS, model ST3600057SS:
ST = Seagate
3 = form factor, 3.5 inch (if this is a 9 it will be a 2.5 inch drive)
600057 = 600 GB
SS = SAS
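The breakdown above can be sketched as a small decoder. This only encodes the pieces given in this thread (ST, the 3/9 form factor digit, the SS suffix); it is not Seagate's official naming scheme, and the function and table names are hypothetical.

```python
import re

# Only the values mentioned in the thread; anything else is reported as unknown.
FORM_FACTORS = {"3": "3.5 inch", "9": "2.5 inch"}
INTERFACES = {"SS": "SAS"}


def decode_model(model):
    """Split a model string such as 'ST3600057SS' into its parts."""
    match = re.fullmatch(r"ST(\d)(\d+)([A-Z]+)", model)
    if match is None:
        raise ValueError(f"unrecognised model string: {model!r}")
    form_digit, capacity_code, suffix = match.groups()
    return {
        "vendor": "Seagate",
        "form_factor": FORM_FACTORS.get(form_digit, "unknown"),
        "capacity_code": capacity_code,  # 600057 is the 600 GB drive here
        "interface": INTERFACES.get(suffix, "unknown"),
    }


print(decode_model("ST3600057SS"))
```

When comparing listings, the fields to check against the existing drives are the form factor and the interface, since a SATA drive will not match the SAS drives in the array.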

Some vendors, while they have their own branded drives, at times accept non-vendor-specific drives, though the controller might show the drive in an "unknown" state when the firmware, as was suggested, does not carry the vendor's signature......
Ok Thanks.  I have ordered 2 of the exact same drives...hopefully here this week.  

so i can proceed quickly when they arrive, i have a couple follow up/clarification questions that I would appreciate some help with....

in an earlier message Davis indicated that the issues were with drives 1 & 2.  

What in the logs indicated that (just so i know what i am looking at)?

and is that drives 1 & 2 in terms of their slots? and that the drive in slot 0 is ok?

and also, I believe these are hot swap-able but is that a good practice or should i shut down, replace one of the drives and then after the raid is rebuilt, then do the same thing again?

Thanks

Denten
See the items in the RAID file, 1 Location, 2 Location.
I'm not sufficiently familiar with megaraid...
If you have a spare SAS drive, not related to the issue at hand, insert it into position 5, then look at the event log on the RAID controller, which should reflect the insertion of a disk, X Location.

Hot-swaps are designed to be swapped in and out while hot, system continues to run.
When hot-swap is available, never power the system off to replace drives.

The issue that often occurs is that the drive being added may have a foreign label, which could have an adverse impact if the RAID controller picks the wrong drive as the reference for the RAID.
In HP, I think you have to kick out the drive you want to replace.
Then insert the replacement; I think in this case, in the absence of a hot spare, the rebuild will be initiated.
"The megaraid software picture you first posted has a Physical tab.  Did you look there to see what the drives were? "
That should tell you which drive is which and I might suggest adding both of the new drives as hotspares followed by removing the failing ones, one at a time.
The "hot swap" feature is mostly in the back plane and the drives themselves with an explicit procedure outlined in the users manual.
sorry for all the newbie questions, but...

I bought drives, but they do not have the enclosure; they are just the same drives themselves, and I have no other enclosures for the drives, but there are 4 empty slots on the server.

so I assume i can just buy the drive enclosures separate and then just install the drives and pop them in the slots and configure them as hot spares on reboot?
There is no need to bring the server down, nor is there a need to reboot the server.
Think of it as a clothing change during the day. You need not go to sleep to change your attire, nor do you have to go through your morning routine to change your shirt.

In a hot-swap, using the HP Insight tool, you insert the new drive, look at the physical drives on each channel, select the drive indicated as ready and set it as a hot spare. After you confirm which drive has more of these events, you navigate to the logical volume list, expand it to see the member drives making up that volume, and kick out the drive you want replaced.
Wait until the rebuild completes before repeating the process for the second drive.
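
The order of operations described above can be written down as a short checklist generator. This is only a sketch of the sequence for planning purposes: it does not talk to the controller, the function name is hypothetical, and the slot numbers in the example (failing drives in slots 1 and 2, empty bays 4 and 5) are taken from earlier in the thread.

```python
def replacement_plan(failing_slots, new_drive_slots):
    """Return ordered steps that replace failing drives one at a time."""
    if len(new_drive_slots) < len(failing_slots):
        raise ValueError("need one new drive per failing drive")
    steps = []
    for failing, new in zip(failing_slots, new_drive_slots):
        steps.append(f"insert new drive in slot {new} and set it as a hot spare")
        steps.append(f"remove failing drive in slot {failing} from the volume")
        # The next drive must not be touched until this rebuild has finished.
        steps.append("wait for the rebuild to complete")
    return steps


for number, step in enumerate(replacement_plan([1, 2], [4, 5]), start=1):
    print(number, step)
```

The key design point is that the wait step closes every iteration, so the array is never missing more than one member at a time.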

As before, please make sure you have a current, good backup plan, and possibly consider updating the environment: potentially a newer server with higher capacity and a better disk plan, i.e. a RAID 1 volume for the OS and RAID 10 for the other.........
I don't believe I have that software?  the Mega RAID manager software i have been looking in doesn't appear to have any configuration abilities...unless I'm missing something? only way i have ever seen to configure the RAID was on a reboot...

I will have a look and see if I can find it somewhere to download...

Thanks
You already have the MegaRaid software; but, it does give you the option to login as administrator when you run it AND it will let you do RAID configurations from within it.  Look on the Physical tab at the top.
Hello,

been awhile on this one....  

I had purchased two new drives (exactly the same as the ones in the server).  I put one in as a replacement and it worked just fine, and repaired the RAID.  Now I have a second drive that has failed and put in my second purchased drive, but it will not even recognize that i put this drive in.  any thoughts on why that may be?

Thanks

Denten
If it does not appear in the listing of the installed drives, that drive is bad.
Did you get this resolved?  I am being asked to close the question.
Please close this question and award the points you think are fair, OK?