Dell MD issue

Hello Experts,

We are having an issue with our storage enclosures. Attached are the errors we are getting.
Capture.PNG
Capture2.PNG
GalaxyTechServiceAsked:
andyalderCommented:
I'd need to see the support bundle. You may have a bad disk or a bad slot, or you may just have to reset the path to good.
GalaxyTechServiceAuthor Commented:
A little backstory: the room went past 100 degrees because the AC went out. We have had this same module throw the same error more than once. The battery has been replaced a number of times. I think it's time for us to replace the RAID module itself.
GalaxyTechServiceAuthor Commented:
Andyalder, if the error was caused by the overheating, how do we reset the path to good?
andyalderCommented:
Were the path stats cleared last time you got the error? If not, the fault would keep coming back. The battery cannot cause a degraded path.

smcli -n "arrayname" -c "clear allphysicalDiskChannel stats;" clears down the stats

smcli -n "arrayname" -c "set physicalDiskChannel [1] status=optimal;" sets channel 1 to optimal.  [1] may be [0], depending on which controller it is.
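
The two commands above could be wrapped in a small script like the sketch below. It is only an illustration: `run_smcli`, `reset_channel` and `DRY_RUN` are hypothetical names, "arrayname" and the channel number are placeholders, and the script commands are copied as quoted in this thread rather than checked against your firmware. With `DRY_RUN=1` (the default here) it only prints the command lines, so nothing touches the array.

```shell
#!/bin/sh
# Sketch only: wraps the two smcli script commands quoted in this thread.
# DRY_RUN=1 prints each command instead of running it; set DRY_RUN=0 only
# after checking the printed lines against your own array and channel.
DRY_RUN="${DRY_RUN:-1}"

run_smcli() {
    # $1 = array name, $2 = script command to pass to smcli -c
    if [ "$DRY_RUN" = "1" ]; then
        printf 'smcli -n "%s" -c "%s"\n' "$1" "$2"
    else
        smcli -n "$1" -c "$2"
    fi
}

reset_channel() {
    # $1 = array name, $2 = channel number (placeholder, see the note above)
    # Clear the accumulated error counters first, then mark the channel optimal.
    run_smcli "$1" "clear allphysicalDiskChannel stats;" || return 1
    run_smcli "$1" "set physicalDiskChannel [$2] status=optimal;"
}

reset_channel "arrayname" 1
```

On a Windows management station the printed lines can simply be typed at a command prompt instead; the script form is written for a POSIX shell.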

"degraded" actually means the controller increases the timeout period for errors so it doesn't actually mean there is still a path fault. A bad disk can cause the problem and replacing it can still leave the path stats with the original high error count.
GalaxyTechServiceAuthor Commented:
I'm not sure. The last time we had this error come up was over six months ago. A reboot seems to clear it.
andyalderCommented:
Rebooting may clear it temporarily. I would still reset the path stats as described above, after checking the majoreventlog from the support bundle. You can also get a second opinion from Dell Support by logging it on their PowerVault forum.
GalaxyTechServiceAuthor Commented:
How are the path stats reset?
andyalderCommented:
smcli -n "arrayname" -c "clear allphysicalDiskChannel stats;" clears down the stats

smcli -n "arrayname" -c "set physicalDiskChannel [1] status=optimal;" sets channel 1 to optimal.  [1] may be [0], depends on which controller it is.
GalaxyTechServiceAuthor Commented:
When changing a RAID module on the MD3260, is there anything that needs to be done beforehand, such as backing up the configuration? We plan on shutting down the array and the server that controls it first, then changing the module while the entire system is powered down. You seem to know your way around this equipment pretty well, andyalder.
andyalderCommented:
Do you only have one controller? Normally you just hot-plug it with dual controllers and it reads all the config from the other one, no shutdown needed. I'm not keen on shutting them down, especially if it is a second-hand spare, although the spare would not wake up thinking it was primary since the disk config provides a quorum. There's a special process for single-controller systems if I remember correctly, but it is not needed for dual-controller ones.

LSI sold these to Dell, IBM and a few others, so they're quite common.
andyalderCommented:
Just remembered I have already seen your log from the beginning of January. I would agree with replacing the controller. I haven't got time to re-read that log now to see which one, but a controller reset itself twice. I jumped to the conclusion that one of your colleagues had removed it, as there was a "battery has been replaced" message logged on that day. When the controller came back up it reported battery low, but that's not the cause, just a symptom of the controller reset. It could be deeper than a bad controller and be a chassis connector fault. There's no easy way to tell without shuffling controllers.

When did the AC go down?

GalaxyTechServiceAuthor Commented:
The day before yesterday. Do you see any issue with powering down the unit to replace the module? This unit does have two modules installed at this time.
andyalderCommented:
It's not recommended to shut down to replace a controller. Just put it offline via MDSM and then swap them. You may have to redistribute the LUNs between controllers, but it normally does that itself. If you are not sure everything is redundant, then shutting down the hosts is OK.

https://www.dell.com/community/PowerVault/MD3000i-controller-has-failed-replacement-instructions-using/td-p/4559376
(All MD3xxx are basically the same, doesn't matter that your one is a later generation.)
GalaxyTechServiceAuthor Commented:
andyalder,

Where do we run the above commands from to clear the paths? From what I have read, it can be done from the C:\ of the management server.
andyalderCommented:
I think it is in dell/mdstoragemanager/client. Just change to the directory smcli is in and type the commands in. You can run it on any machine that has MDSM on it.
GalaxyTechServiceAuthor Commented:
Thank you for all your help, andyalder. We replaced RAID module 0 and the degraded path error is resolved.
andyalderCommented:
It's still worth running "smcli show alldrivechannels stats" to make sure, since as you say rebooting clears it temporarily anyway.
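
One hedged way to make that check repeatable is to save the stats output to a timestamped file each time, so readings taken days apart can be compared with diff. The sketch below assumes nothing about the output format. `snapshot_stats`, `STATS_CMD` and `OUT_DIR` are hypothetical names, and `STATS_CMD` defaults to a harmless echo placeholder that you would replace with the stats command once you have confirmed it works on your array.

```shell
#!/bin/sh
# Sketch only: capture whatever the stats command prints into a dated file.
# STATS_CMD is a placeholder; substitute the smcli stats command you have
# verified on your own management station.
STATS_CMD="${STATS_CMD:-echo placeholder stats output}"
OUT_DIR="${OUT_DIR:-.}"

snapshot_stats() {
    # Writes the command output to OUT_DIR/channel-stats-YYYYmmdd-HHMMSS.txt
    # and prints the name of the file that was written.
    file="$OUT_DIR/channel-stats-$(date +%Y%m%d-%H%M%S).txt"
    sh -c "$STATS_CMD" > "$file" || return 1
    printf '%s\n' "$file"
}

# Usage: call snapshot_stats now and again a few days later, then diff the
# two files it names.
```

If the error counts in a later snapshot have climbed again, that would suggest the path fault is still present rather than cleared.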