Dell MD3000i SAN had 2 drives fail and now MDSM is offline.
I had 1 drive fail on my SAN and then the hot spare also went into a failing state. All 12 other drives were still lit up green and the storage array was accessible. The SAN is configured as RAID 5.
I replaced both drives one right after the other (which I think was a mistake), and then the SAN went offline. Eventually it came back on, but only the 2 drives I replaced are lit up green now, and the other 12 are not. Every few seconds I see a flash of green lights where it looks like all 12 are trying to light up but don't. The solid amber light on the front of the SAN is on as well. Dell no longer offers support for it, so any help would be great.
Also, the drives on the server that are mapped to the SAN are still operational, so hopefully you can help me get it working again. I am not familiar with the CLI, so if anyone knows some commands, that would be great.
Storage, Server Hardware, Disaster Recovery, Dell
Last Comment
andyalder
8/22/2022 - Mon
randomsense
I'm not really familiar with that model but it looks like it might have a configuration/management setup outside of just CLI. I only skimmed through it but here is the manual for that device.
Your first step would probably be to check what the RAID controller says about the array's status. You should be able to check it via its web configuration page, or possibly by rebooting the server and hitting the key combination that enters the controller's configuration when prompted during POST.
Hopefully that gets you pointed in the right direction.
I almost forgot... on the Dell Manual page linked above there is a 'Dell PowerVault Modular Disk Storage Arrays CLI Guide'.
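For anyone who would rather script that status check than type it by hand, here is a rough Python sketch. It assumes the MD Storage Manager client is installed on a management host so the SMcli binary is on the PATH, and that the controller's management port is reachable. The address 192.0.2.10 is only a placeholder, and the function names are made up for this example. The script command itself, show storageArray healthStatus, is a read-only query described in the CLI guide; check the guide for the exact syntax on your firmware.

```python
import shutil
import subprocess


def build_health_command(controller_ip, smcli="SMcli"):
    """Return the argument list for a read-only health query.

    controller_ip is the management address of a RAID controller module
    (placeholder value in the usage note; substitute your own).
    """
    return [smcli, controller_ip, "-c", "show storageArray healthStatus;"]


def query_health(controller_ip, smcli="SMcli", timeout=60):
    """Run the health query and return SMcli's text output.

    Returns None if the SMcli binary cannot be found on this host.
    """
    cmd = build_health_command(controller_ip, smcli)
    if shutil.which(cmd[0]) is None:
        return None  # SMcli is not installed or not on the PATH
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

Calling query_health("192.0.2.10") with your own controller address in place of the placeholder returns whatever SMcli prints, or None when SMcli is missing. If the controller is stuck in a reset loop, the call will most likely just time out or report a connection error, so this is only useful once the management port answers again.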
RyanHenry
ASKER
It seems to be in a loop and won't actually allow me into the SMcli. I plugged a serial cable into it and it appears to be stuck in a loop; I tried CTRL+BREAK but no luck. I tried unseating one of the 2 drives I replaced, and now at least all of the drives are back online and green, but I still have the amber light as well. I also still can't use the web interface, and none of my drives mapped to the SAN are back. It seems to be completely down now, even though the green lights make it look like it should be OK.
On page 15 it starts giving descriptions for the status lights.
It says steady amber means the power is on and it is in a reset state.
Blinking amber means the enclosure is in a fault state.
Double check the docs though to be sure I'm looking at the correct lights :)
Thx, it looks like the controller is in a reset state, which makes sense since the light is solid amber. I am going to look through the troubleshooting section, thx!
Rojosho
Hello RyanHenry,
Question: What happens when the two replaced HDDs are removed?
If the two failed HDDs were part of the same Logical Drive, then the rebuild will have major problems completing. If the SAN and its volumes were accessible, then I would see if you can get back to that point.
If you can, then I would use something like 'KillDisk' to wipe the new replacements to remove any metadata from the first rebuild, then insert them one at a time, adding the second only after the rebuild on the first has completed.
It may be useful to call Dell and see what they say...
Rojosho
RyanHenry
ASKER
Unfortunately it's end of life at Dell so I couldn't renew the support on it. I have ProSupport on both my R610 and R620 that connect to it, so maybe they can help? I hope!
If I put both drives in, all other drives go offline and lose their green lights. The second I take 1 of them out, particularly the one in drive bay 7, they all come back online, lit up green. But the array is still offline, and I can't connect to it through any means. I haven't tried booting it with both drives out, because I was worried. I will try that in the morning when I get back in.
So I ended up talking to Dell and they wanted $599, so that was out. When I removed the replacement drives, the SAN came back fully online. It looks like only 1 drive is lost in the RAID 5; the second drive is the hot spare. I also had to re-purchase the exact same model drive via Dell. Those will be in tomorrow morning, and hopefully I will be OK.
Question: it looks like they are recommending that I put the hot spare replacement in first, then when that's complete, assign it to the bad drive, before inserting the second new drive? Does that make sense to you guys?
Once all is said and done, depending on the number of drives, the size of the drives, and the setup of the array(s), you might want to take a look at either RAID 6 or RAID 10 if your usage allows for it. I'm guessing RAID 6 might be a better fit, as you wouldn't lose as much space as with RAID 10. The larger the drives, the longer the rebuild takes and the greater the possibility that a second one dies during the rebuild. If the drives are smaller and the rebuild time isn't very long, then the worry isn't as great.
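To put rough numbers on the space trade-off, here is a small Python sketch of the usual capacity arithmetic: RAID 5 gives up one drive's worth of space to parity, RAID 6 gives up two, and RAID 10 gives up half to mirroring. The drive count and size in the usage note are hypothetical, since the thread doesn't say what size the drives are, and the function names are made up for this example. It ignores hot spares and any controller overhead.

```python
def usable_drives(level, n):
    """Number of drives' worth of usable space for n equal-sized drives."""
    if level == 5:
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n - 1  # one drive's worth of parity
    if level == 6:
        if n < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return n - 2  # two drives' worth of parity
    if level == 10:
        if n < 4 or n % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return n // 2  # every drive is mirrored
    raise ValueError("only RAID 5, 6 and 10 are handled in this sketch")


def usable_capacity_gb(level, n, drive_gb):
    """Usable capacity in GB, before filesystem or controller overhead."""
    return usable_drives(level, n) * drive_gb
```

With a hypothetical 12 drives of 1000 GB each, usable_capacity_gb(5, 12, 1000) gives 11000, usable_capacity_gb(6, 12, 1000) gives 10000, and usable_capacity_gb(10, 12, 1000) gives 6000. So RAID 6 costs one more drive than RAID 5 in exchange for surviving a second failure during a rebuild, while RAID 10 costs half the raw space.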
andyalder
Now that you have removed the two disks that were confusing it for some reason can you get into the MDSM GUI? If so just follow the recovery guru procedure.
Surprised Dell won't still give free support over the phone on these, it was meant to have lifetime technical support when you bought it AFAIK. Dell support still post on their MD3000 forum, I uploaded a support bundle a couple of months ago and they told me what to do to fix it within a day or so.
http://downloads.dell.com/Manuals/all-products/esuprt_ser_stor_net/esuprt_powervault/powervault-md3000i_User%27s%20Guide20_en-us.pdf
and the Dell support site for it: http://www.dell.com/support/home/us/en/19/product-support/product/powervault-md3000i/manuals