GalaxyTechService

Dell MD issue

Hello Experts,

We are having an issue with our storage enclosures. Attached are the errors we are getting.
Capture.PNG
Capture2.PNG
Member_2_231077

Need to see the support bundle. You may have a bad disk or a bad slot, or you may just have to reset the path to good.

ASKER

A little backstory: the room went past 100 degrees because the AC went out. We have had this same module throw the same error more than once. The battery has been replaced a number of times. I think it's time for us to replace the RAID module itself.
Andyalder, if the error was caused by the overheating, how do we reset the path to good?
Were the path stats cleared the last time you got the error? If not, the fault would keep coming back. The battery cannot cause a degraded path.

smcli -n "arrayname" -c "clear allphysicalDiskChannel stats;" clears down the stats

smcli -n "arrayname" -c "set physicalDiskChannel [1] status=optimal;" sets channel 1 to optimal.  [1] may be 0, depends on which controller it is.

"Degraded" means the controller has increased its timeout period for errors, so it does not necessarily mean there is still a path fault. A bad disk can cause the problem, and replacing it can still leave the path stats showing the original high error count.
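For anyone who wants to script the two commands above rather than type them, here is a minimal sketch. It only builds the command lines and prints them; it does not contact an array. The helper names (`smcli_command`, `reset_path_commands`) are made up for illustration, "arrayname" is the placeholder from the post, and the command text is copied as posted, so check it against your MDSM version before running anything.

```python
import subprocess


def smcli_command(array_name, script):
    """Return the argument list for one smcli call in the -n/-c form used above."""
    if not script.endswith(";"):
        script += ";"  # the posted commands end each script statement with ';'
    return ["smcli", "-n", array_name, "-c", script]


def reset_path_commands(array_name, channel):
    """Build the two calls in the order posted: clear the stats, then set the channel optimal."""
    return [
        smcli_command(array_name, "clear allphysicalDiskChannel stats"),
        smcli_command(array_name, f"set physicalDiskChannel [{channel}] status=optimal"),
    ]


if __name__ == "__main__":
    # The channel number is an assumption; per the post it depends on the controller.
    for argv in reset_path_commands("arrayname", 1):
        # list2cmdline applies Windows-style quoting, which suits a Windows management station.
        print(subprocess.list2cmdline(argv))
```

Printing first and running by hand keeps a mistyped channel number from being applied to the array unreviewed.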
I'm not sure. The last time we had this error come up was over six months ago. A reboot seems to clear it.
Rebooting may clear it temporarily. I would still reset the path stats as per above after checking the majoreventlog from the support bundle. You can also get a second opinion from Dell Support by logging it on their PowerVault forum.
How are the path stats reset?
smcli -n "arrayname" -c "clear allphysicalDiskChannel stats;" clears down the stats

smcli -n "arrayname" -c "set physicalDiskChannel [1] status=optimal;" sets channel 1 to optimal.  [1] may be [0], depends on which controller it is.
When changing a RAID module on the MD3260, is there anything that needs to be done beforehand, such as backing up the configuration? We plan on shutting down the array and the server that controls it first, then changing the module while the entire system is powered down. You seem to know your way around this equipment pretty well, andyalder.
Do you only have one controller? Normally you just hot-plug it with dual controllers and it reads all the config from the other one, no shutdown needed. Not keen on shutting them down especially if it is a second hand spare although the spare would not wake up thinking it was primary since the disk config provides a quorum. There's a special process for single-controller systems if I remember correctly but not needed for dual-controller ones.

LSI sold these to Dell, IBM and a few others, so they're quite common.
ASKER CERTIFIED SOLUTION
Member_2_231077
Day before yesterday. Do you see any issue with powering down the unit to replace the module? This unit does have two modules installed at this time.
It's not recommended to shut down to replace a controller, just put it offline via MDSM and then swap them. You may have to redistribute the LUNs between controllers but it normally does that itself. If you are not sure everything is redundant then shutting down the hosts is OK.

https://www.dell.com/community/PowerVault/MD3000i-controller-has-failed-replacement-instructions-using/td-p/4559376
(All MD3xxx are basically the same, doesn't matter that your one is a later generation.)
andyalder,

Where do we run the above syntax from to clear the paths? From what I have read, it can be done from the C:\ of the management server.
I think it is in dell/mdstoragemanager/client. Just change to the directory smcli is in and type the commands in. You can run it on any machine that has MDSM on it.
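If you would rather not change directory by hand each time, something like the following may work as a rough sketch. It assumes Python is available on the machine that has MDSM installed, and that you pass in the directory where smcli actually lives on your system (the location above is from memory, so confirm it first). `find_smcli` and `run_smcli` are hypothetical helper names, not part of MDSM.

```python
import shutil
import subprocess


def find_smcli(client_dir):
    """Search client_dir for an smcli executable; return its path, or None if absent."""
    return shutil.which("smcli", path=str(client_dir))


def run_smcli(client_dir, array_name, script):
    """Run one script command against the named array from the smcli directory."""
    exe = find_smcli(client_dir)
    if exe is None:
        raise FileNotFoundError(f"smcli not found in {client_dir}")
    # -n and -c are the options used in the commands quoted earlier in this thread.
    return subprocess.run(
        [exe, "-n", array_name, "-c", script],
        cwd=str(client_dir),   # same effect as changing to the directory first
        capture_output=True,   # keep stdout/stderr so the result can be inspected
        text=True,
    )
```

The returned object carries `returncode`, `stdout` and `stderr`, so the output can be saved alongside the support bundle if Dell asks for it.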
Thank you for all your help, andyalder. We replaced RAID module 0 and the degraded path error is resolved.
Still worth running "smcli show alldrivechannels stats" to make sure, since as you say rebooting clears it temporarily anyway.