gurutc asked:

IBM 3650 Component temperature thresholds

What are the recommended temperature thresholds for IBM x series servers, specifically for memory and CPU, but also for all other components?
ASKER CERTIFIED SOLUTION
☠ MASQ ☠

This solution is only available to members.
gurutc

ASKER

I found the numbers you quote; however, these are 'ambient' system values. DIMM and CPU temps on servers that are running fine fluctuate up to 42 C.

My bad server has some DIMM temps ranging up to 60C and CPU temps of 46C. As you would guess, it's having issues. IBM would have to know the best-practice values for these components, for the Intel CPUs 'as installed'. I'm sure my bad server is out of range. But the 'operating environment' temp range is certainly not valid for the CPUs and memory. Even with these ridiculously high CPU and memory temps, my bad server is not throwing a temp warning. Sooo, somewhere IBM must have the 'known-good' upper temp range values.

- gurutc
☠ MASQ ☠

So is what you are actually asking: why does my server run at this temperature?

For the M4 version both the RAM (standard DDR3 or Max5 v2) and CPU (E5-2600 v2) are rated at over 80°C, but the server threshold is meant to be set at 50°C. Of course once the CPU gets hotter you're going to see a drop in performance, but you're a long way from the tolerance limits of the Xeon.

There are some old firmware versions that had thermal control problems but if you're running IMM2 v.3.35 or better you shouldn't see that.

I'm assuming the system fans have already kicked in at the temps you're describing.
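
To make those two figures concrete, here is a minimal Python sketch that sorts a reading into bands. The 50°C and 80°C constants are simply the numbers quoted in this thread, not values confirmed from IBM documentation, and the function names are made up for illustration.

```python
# Thresholds taken from the figures quoted in this thread. They are
# assumptions for illustration, not values confirmed from IBM documentation.
SERVER_THRESHOLD_C = 50.0   # "the server threshold is meant to be set at 50C"
COMPONENT_RATING_C = 80.0   # "rated at over 80C"


def classify(temp_c: float) -> str:
    """Return a coarse band for a CPU or DIMM temperature reading."""
    if temp_c >= COMPONENT_RATING_C:
        return "over component rating"
    if temp_c >= SERVER_THRESHOLD_C:
        return "over server threshold"
    return "under server threshold"


def headroom(temp_c: float) -> float:
    """Degrees C remaining before the assumed component rating is reached."""
    return COMPONENT_RATING_C - temp_c


if __name__ == "__main__":
    # Readings reported earlier in the thread for the problem server.
    for name, temp in [("DIMM", 60.0), ("CPU", 46.0)]:
        print(f"{name}: {temp:.0f} C, {classify(temp)}, "
              f"{headroom(temp):.0f} C below the component rating")
```

With the readings from the problem server, the 60C DIMM lands over the server threshold while the 46C CPU is still under it, and both are well short of the component rating.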
gurutc

ASKER

Thanks for your response. On properly functioning servers I'm seeing CPU and memory temps within the IBM operational range. But actual observation shows that IBM also sets threshold values for CPU and DIMM temps that trigger a shutdown. This is proved by the fact that there's a firmware update to fix a DIMM temp shutdown threshold that was set too low. That threshold is below the CPUs' and DIMMs' max temp limits. Someone at IBM must know what these values are. On my problem system the 'system' temp is below threshold and the TEMP LED is not lit, even though my CPUs and DIMMs are hot enough to cause problems and hurt performance. My guess is 50C for the CPUs, but that's only a guess, and I have no idea for the DIMMs. What are the numbers?

And another issue: the Broadcom network chipsets, as well as the chipsets on the installed QLogic HBAs, have a max operational temp of about 35C. It's all fine and good for the CPUs and DIMMs to take a sauna bath, but the other components located right next to them in the server can't handle the heat they give off.

I know, just open up the box and fix the airflow baffles. That's coming during our Sunday maintenance window. But I plan to check all of our thousands of IBM servers for CPU and DIMM temps, even though none show a TEMP indicator issue. IBM sets the operational temp values to ensure performance and reliability. I suspect that undetected CPU and DIMM heat issues are causing a huge performance hit in my environment.
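
For the fleet-wide check, something like the following Python sketch is what I have in mind. It assumes the sensor dumps are already collected as text, one per host, in lines shaped like the pipe-delimited output of `ipmitool sdr type Temperature`. The host names, sensor names, and the 42 C cutoff (the highest reading I see on healthy servers, not an IBM-published limit) are all assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: 42 C is the highest CPU/DIMM reading observed on healthy
# servers in this thread. It is a baseline, not an IBM-published limit.
BASELINE_C = 42.0


def parse_readings(text: str) -> Dict[str, float]:
    """Parse 'name | id | status | entity | NN degrees C' lines into a dict."""
    readings: Dict[str, float] = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5 or not fields[4].endswith("degrees C"):
            continue  # skip blank lines and sensors with no reading
        try:
            readings[fields[0]] = float(fields[4].split()[0])
        except ValueError:
            continue  # reading was not numeric
    return readings


def hot_sensors(fleet: Dict[str, str],
                limit_c: float = BASELINE_C) -> List[Tuple[str, str, float]]:
    """Return (host, sensor, temp) for readings above limit_c, hottest first."""
    hits = [
        (host, sensor, temp)
        for host, text in fleet.items()
        for sensor, temp in parse_readings(text).items()
        if temp > limit_c
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical hosts and sensor names; the values echo this thread.
    sample = {
        "good-host": (
            "CPU 1 Temp | 30h | ok | 3.1 | 40 degrees C\n"
            "DIMM 1 Temp | 31h | ok | 32.1 | 42 degrees C"
        ),
        "bad-host": (
            "CPU 1 Temp | 30h | ok | 3.1 | 46 degrees C\n"
            "DIMM 1 Temp | 31h | ok | 32.1 | 60 degrees C\n"
            "DIMM 2 Temp | 32h | ns | 32.2 | No Reading"
        ),
    }
    for host, sensor, temp in hot_sensors(sample):
        print(f"{host}: {sensor} at {temp:.0f} C")
```

Comparing against a baseline from known-good servers, rather than waiting for the TEMP indicator, is the whole point: it surfaces hosts that run hot without ever tripping a warning.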

So simply, what are the Over-Temp threshold values that cause a shutdown in IBM X Series Servers dadgummit?

- gurutc
SOLUTION
It is in the service guide...