RyanHenry

Dell MD3000i SAN had 2 drives fail and now MDSM is offline.

I had 1 drive fail on my SAN, and then the hot spare also went into a failing state. All 12 other drives were still lit green and the storage array was accessible. The SAN is configured as a RAID 5 array.
I replaced both drives one right after the other (which I think was a mistake), and then the SAN went offline. Eventually it came back on, but only the 2 drives I replaced are lit green now, and the other 12 are not. Every few seconds I see a flash of green where it looks like all 12 are trying to light up but don't. The solid amber light on the front of the SAN is on as well. Dell no longer offers support for this model, so any help would be great.

Also, the drives on the server that are mapped to the SAN are still operational, so hopefully you can help me get it working again. I am not familiar with the CLI, so if anyone knows some commands, that would be great.
Storage, Server Hardware, Disaster Recovery, Dell


8/22/2022 - Mon
randomsense

I'm not really familiar with that model but it looks like it might have a configuration/management setup outside of just CLI. I only skimmed through it but here is the manual for that device.

http://downloads.dell.com/Manuals/all-products/esuprt_ser_stor_net/esuprt_powervault/powervault-md3000i_User%27s%20Guide20_en-us.pdf

and the Dell support site for it: http://www.dell.com/support/home/us/en/19/product-support/product/powervault-md3000i/manuals

Your first step would probably be to check what the RAID controller reports for the array's status. You should be able to check it via its web configuration page, or possibly by rebooting the server and hitting the key combination that enters the controller's configuration when prompted during POST.

Hopefully that gets you pointed in the right direction.

I almost forgot... on the Dell Manual page linked above there is a 'Dell PowerVault Modular Disk Storage Arrays CLI Guide'.
RyanHenry

ASKER
It seems to be in a loop and won't actually allow me into SMcli. I plugged a serial cable into it and it seems to be stuck in a loop; I tried CTRL+BREAK but no luck. I tried unseating one of the 2 drives I replaced, and now at least all of the drives are back online and green, but I still have the amber light as well. I also still can't use the web interface, and none of my drives mapped to the SAN are back. It seems to be completely down now, even though the green lights make it look like it should be OK?
randomsense

Here is the hardware users guide:
http://downloads.dell.com/Manuals/all-products/esuprt_ser_stor_net/esuprt_powervault/powervault-md3000i_Owner%27s%20Manual_en-us.pdf

On page 15 it starts giving descriptions for the status lights.
It says steady amber means power is on and the enclosure is in a reset state.
Blinking amber means the enclosure is in a fault state.

Double check the docs though to be sure I'm looking at the correct lights :)

Page 17 starts the troubleshooting section.
RyanHenry

ASKER
Thx, it looks like the controller is in a reset state, which makes sense since it's solid amber. I am going to look through the troubleshooting section, thx!
Rojosho

Hello RyanHenry,

Question: What happens when the two replaced HDDs are removed?

If the two failed HDDs were part of the same Logical Drive, then the rebuild will have major problems completing. If the SAN and its volumes were accessible, then I would see if you can get back to that point.

If you can, then I would use something like 'KillDisk' to wipe the new replacements to remove any metadata from the first rebuild, then insert them one at a time, adding the second only after the first rebuild has completed.

It may be useful to call Dell and see what they say...

Rojosho
RyanHenry

ASKER
Unfortunately it's end of life at Dell, so I couldn't renew the support on it. I have ProSupport on both my R610 and R620 that connect to it, so maybe they can help? I hope!

If I put both drives in, all the other drives go offline and lose their color. The second I take 1 of them out, particularly the one in drive bay 7, they all come back online, lit green. But the array is still offline, and I can't connect to it through any means. I haven't tried booting it with both drives out, because I was worried. I will try that in the morning when I get back in.
RyanHenry

ASKER
OK,

So I ended up talking to Dell and they wanted $599, so that was out. When I removed the replacement drives, the SAN came back fully online. It looks like only 1 drive is lost in the RAID 5; the second drive is the hot spare. I also had to re-purchase the exact same model drive via Dell. Those will be in tomorrow morning, and hopefully I will be OK.

Question: it looks like they are recommending that I put the hot spare replacement in first, then, when that's complete, assign it to the bad drive before inserting the second new drive. Does that make sense to you guys?
ASKER CERTIFIED SOLUTION
randomsense

randomsense

Once all is said and done, depending on the number of drives, the size of the drives, and the setup of the array(s), you might want to take a look at either RAID 6 or RAID 10 if your usage allows for it. I'm guessing RAID 6 might be a better fit, though, as you wouldn't lose as much space as with RAID 10. The larger the drives, the longer the rebuild and the greater the possibility that a second one dies during it. If the drives are smaller and the rebuild time isn't very long, then the worry isn't as great.
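
To put rough numbers on that trade-off, here is a minimal Python sketch. It is only an illustration: the 12-drive count, the 3% annual failure rate, and the rebuild times are made-up inputs, not figures from this SAN, and the function names are hypothetical. It also assumes equal-size drives in a single disk group and independent drive failures at a constant rate, which tends to be optimistic for drives of the same age and batch.

```python
import math

HOURS_PER_YEAR = 8760


def usable_drives(level: str, n: int) -> int:
    """Usable capacity, in units of one drive, for n equal-size drives."""
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n - 1  # one drive's worth of parity
    if level == "raid6":
        if n < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return n - 2  # two drives' worth of parity
    if level == "raid10":
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return n // 2  # every drive is mirrored
    raise ValueError(f"unknown RAID level: {level}")


def p_failure_during_rebuild(surviving: int, rebuild_hours: float, afr: float) -> float:
    """Chance that at least one surviving drive fails within the rebuild window.

    Assumes independent failures at a constant rate derived from the
    annual failure rate (afr, a fraction such as 0.03).
    """
    if not 0 <= afr < 1:
        raise ValueError("afr must be in [0, 1)")
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * surviving * rebuild_hours)


if __name__ == "__main__":
    n = 12  # hypothetical drive count
    for level in ("raid5", "raid6", "raid10"):
        print(f"{level}: {usable_drives(level, n)} of {n} drives usable")
    for hours in (6, 24, 72):  # hypothetical rebuild times
        p = p_failure_during_rebuild(n - 1, hours, afr=0.03)
        print(f"{hours} h rebuild: {p:.4%} chance of another failure")
```

The point is the shape of the result rather than the exact percentages: the risk grows with rebuild time, and in RAID 5 any second failure during that window loses the array, while RAID 6 can absorb it at the cost of one more drive of capacity.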
andyalder

Now that you have removed the two disks that were confusing it for some reason, can you get into the MDSM GUI? If so, just follow the Recovery Guru procedure.

Surprised Dell won't still give free support over the phone on these; it was meant to have lifetime technical support when you bought it, AFAIK. Dell support still post on their MD3000 forum. I uploaded a support bundle a couple of months ago and they told me what to do to fix it within a day or so.