Cisco Nexus 7k - two modules keep over-heating

I have a Nexus 7k and two modules are overheating. Module 1 got all the way to 119C and
powered itself down. When module 1 went down, module 4 started heating up and got to
104C, as you can see below. The data center people supposedly moved around some insulation
to get better airflow to it, but I'm not seeing an impact. I should probably open a TAC case, but I've
been up many hours and just hope to work on this again in the morning. Perhaps one of the
experts here might have another thought? Is there any way this could be the fault of the
equipment, given that the PS looks good, the fans look good, etc.?

4        MAC0Sn0(s2)     115             105         47         Ok
4        MAC0Sn1(s3)     115             105         48         Ok
4        MAC0-Buf0(s4)   115             105         55         Ok
4        MAC0-Buf1(s5)   115             105         56         Ok
4        MAC0-Buf2(s6)   115             105         68         Ok
4        MAC0-Buf3(s7)   115             105         70         Ok
4        MAC1Sn0(s8)     115             105         42         Ok
4        MAC1Sn1(s9)     115             105         44         Ok
4        MAC1-Buf0(s10)  115             105         47         Ok
4        MAC1-Buf1(s11)  115             105         67         Ok
4        MAC1-Buf2(s12)  115             105         50         Ok
4        MAC1-Buf3(s13)  115             105         44         Ok
4        Fwd0Sn0(s14)    115             105         87         Ok
4        Fwd0Sn1(s15)    115             105         87         Ok
4        Fwd1Sn0(s16)    115             105         85         Ok
4        Fwd1Sn1(s17)    115             105         85         Ok
4        Fwd2Sn0(s18)    115             105         82         Ok
4        Fwd2Sn1(s19)    115             105         82         Ok
4        Fwd3Sn0(s20)    115             105         61         Ok
4        Fwd3Sn1(s21)    115             105         61         Ok
4        QEng0Sn0(s22)   115             105         88         Ok
4        QEng0Sn1(s23)   115             105         88         Ok
4        QEng1Sn0(s24)   115             105         104        Ok
4        QEng1Sn1(s25)   115             105         104        Ok
4        QEng2Sn0(s26)   115             105         75         Ok
4        QEng2Sn1(s27)   115             105         75         Ok
4        QEng3Sn0(s28)   115             105         67         Ok
4        QEng3Sn1(s29)   115             105         67         Ok
4        Crossbar(s30)   115             105         61         Ok
4        LkU0Sn0(s31)    115             105         99         Ok

Fan1(sys_fan1)  N7K-C7018-FAN        1.0        Ok
Fan2(sys_fan2)  N7K-C7018-FAN        1.0        Ok
Fan_in_PS1      --                   --         Ok
Fan_in_PS2      --                   --         Ok
Fan_in_PS3      --                   --         Ok
Fan_in_PS4      --                   --         Ok
Fan Zone Speed: Zone 1: 0x5f Zone 2: 0x30 Zone 3: 0x9f

Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    6      10/40 Gbps Ethernet Module          N7K-M206FQ-23L     powered-dn

show env power
Power Supply:
Voltage: 50 Volts
Power                              Actual        Total
Supply    Model                    Output     Capacity    Status
                                 (Watts )     (Watts )
-------  -------------------  -----------  -----------  --------------
1        N7K-AC-6.0KW              1143 W       6000 W     Ok
2        N7K-AC-6.0KW              1156 W       6000 W     Ok
3        N7K-AC-6.0KW               653 W       3000 W     Ok
4        N7K-AC-6.0KW              1155 W       6000 W     Ok


                                  Actual        Power
Module    Model                     Draw    Allocated    Status
                                 (Watts )     (Watts )
-------  -------------------  -----------  -----------  --------------
1        N7K-M206FQ-23L             N/A            0 W    Powered-Dn
2        N7K-M108X2-12L             477 W        650 W    Powered-Up
3        N7K-M108X2-12L             500 W        650 W    Powered-Up
4        N7K-M224XP-23L             654 W        795 W    Powered-Up
5        N7K-M224XP-23L             640 W        795 W    Powered-Up
6        N7K-F248XP-25E             318 W        450 W    Powered-Up
7        N7K-M206FQ-23L             625 W        795 W    Powered-Up
8        N7K-M206FQ-23L             628 W        795 W    Powered-Up
9        N7K-SUP2E                  145 W        265 W    Powered-Up
10       supervisor                 N/A            0 W    Absent
Xb1      N7K-C7018-FAB-2             52 W        150 W    Powered-Up
Xb2      N7K-C7018-FAB-2             57 W        150 W    Powered-Up
Xb3      N7K-C7018-FAB-2             57 W        150 W    Powered-Up
Xb4      N7K-C7018-FAB-2             51 W        150 W    Powered-Up
Xb5      xbar                       N/A          150 W    Absent
fan1     N7K-C7018-FAN              280 W        578 W    Powered-Up
fan2     N7K-C7018-FAN              148 W        422 W    Powered-Up

N/A - Per module power not available


Power Usage Summary:
--------------------
Power Supply redundancy mode (configured)                PS-Redundant
Power Supply redundancy mode (operational)               Non-Redundant

Total Power Capacity (based on configured mode)              18000 W
Total Power of all Inputs (cumulative)                       21000 W
Total Power Output (actual draw)                              4107 W
Total Power Allocated (budget)                                7210 W
Total Power Available for additional modules                 10790 W
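
As a rough sanity check on the "PS looks good" observation, the figures in the output above can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers as pasted; the variable names are made up for illustration. The per-supply outputs add up to the reported actual draw, and the draw is a small fraction of the configured capacity, which is at least consistent with power not being the problem.

```python
# Figures copied from the power output above (watts).
ps_output = {1: 1143, 2: 1156, 3: 653, 4: 1155}
total_capacity = 18000   # "Total Power Capacity (based on configured mode)"
total_allocated = 7210   # "Total Power Allocated (budget)"

actual_draw = sum(ps_output.values())
available = total_capacity - total_allocated
percent_used = round(100 * actual_draw / total_capacity, 1)

print(actual_draw)    # 4107, matches "Total Power Output (actual draw)"
print(available)      # 10790, matches "Total Power Available for additional modules"
print(percent_used)   # 22.8
```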
amigan_99 (Network Engineer) Asked:

atlas_shuddered (Sr. Network Engineer) Commented:
Three questions I think:

First, do you actively monitor the equipment over history and if so, do you have any trend data?

Second, have you checked the air temperature at the intake side (front and right sides facing blades/slots)?  

Third, have you reviewed the cabinet for restrictions to airflow?  Cabling, aisle placement/division?  We generally run sealed cold aisles, 80% population on cabling with a side draw/drop w/micro/slimline cabling where possible.
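
For the trend-data question, one low-effort option is to capture the temperature output on a schedule and reduce it to "headroom to the minor threshold" per sensor. The following is a minimal sketch, not a Cisco tool: it assumes the pasted rows are ordered as module, sensor, major threshold, minor threshold, current temperature, status (the paste has no header row), and the function names and the 10-degree margin are made up for illustration.

```python
import re

# Assumed column order (the paste has no header row):
# module, sensor, major threshold, minor threshold, current temp, status.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$")

def parse_temperature_rows(text):
    """Return one dict per sensor row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            mod, sensor, major, minor, cur, status = m.groups()
            rows.append({
                "module": int(mod),
                "sensor": sensor,
                "major": int(major),
                "minor": int(minor),
                "current": int(cur),
                "status": status,
            })
    return rows

def near_threshold(rows, margin=10):
    """Sensors within `margin` degrees of the minor threshold, hottest first."""
    hot = [r for r in rows if r["minor"] - r["current"] <= margin]
    return sorted(hot, key=lambda r: r["minor"] - r["current"])

# A few rows copied from the module 4 output in the question.
sample = """\
4        QEng0Sn0(s22)   115             105         88         Ok
4        QEng1Sn0(s24)   115             105         104        Ok
4        QEng1Sn1(s25)   115             105         104        Ok
4        Crossbar(s30)   115             105         61         Ok
4        LkU0Sn0(s31)    115             105         99         Ok
"""

for r in near_threshold(parse_temperature_rows(sample)):
    print(r["module"], r["sensor"], r["current"], "headroom", r["minor"] - r["current"])
```

Logging the headroom values with a timestamp on each run would give a crude history to compare against changes made in the cabinet.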
Craig Beck Commented:
I'd say it's something restricting airflow, especially if other components are ok within the same chassis.
amigan_99 (Network Engineer, Author) Commented:
Thanks. For module 4 it looks like two objects out of 29 are hot?

4        QEng1Sn0(s24)   115             105         104        Ok
4        QEng1Sn1(s25)   115             105         104        Ok

Module 1 is powered down. I'll spin up the data center guys. Looks like they didn't
do anything overnight.
amigan_99 (Network Engineer, Author) Commented:
Consultants put in insulation to make things better - and made things sharply worse.
atlas_shuddered (Sr. Network Engineer) Commented:
Insulation in the rack?
amigan_99 (Network Engineer, Author) Commented:
That's what they tell me! They're in another state. I envisioned their putting a blanket around the 7k.
Craig Beck Commented:
Wow! Insulation will make things worse. The kit emits heat. Are these consultants proper IT consultants?
amigan_99 (Network Engineer, Author) Commented:
It sure did make things worse. You could see in Observium maps the exact moment they installed it. A real production risk.
atlas_shuddered (Sr. Network Engineer) Commented:
Not to put too sharp a point on it, but...

What kind of idiot puts a blanket over an electrified, high-heat, omni-directional airflow piece of equipment with the expectation that this will result in an environmental improvement?  Were they concerned your switch was going to get too cold and the electrons would begin to slow down?  Did they think the switch was being cooled with liquid nitrogen?

I'd do three things -

1.  I would tell them they are buying me two new blades and they are on the hook for any other equipment failures in that rack for the next 2-3 years.
2.  I'd tell them that I wasn't paying them a dime and in fact they may want to consider paying me hush money so that I don't drag them into court for negligence.
3.  I'd call the fire marshal and have their business license pulled on the grounds of attempted arson.

Just my thoughts on the matter.