gurutc
asked on
IBM 3650 Component temperature thresholds
What are the recommended temperature thresholds for IBM System x servers, specifically for memory and CPU, but also for all other components?
ASKER CERTIFIED SOLUTION
So is what you are actually asking: why does my server run at this temperature?
For the M4 version, both the RAM (standard DDR3 or Max5 v2) and the CPU (E5-2600 v2) are rated at over 80°C, but the server threshold is meant to be set at 50°C. Of course, once the CPU gets hotter you're going to see a drop in performance, but you're a long way from the tolerance limits of the Xeon.
There are some old firmware versions that had thermal control problems, but if you're running IMM2 v.3.35 or better you shouldn't see that.
I'm assuming the system fans have already kicked in at the temps you're describing.
ASKER
Thanks for your response. I'm observing that CPU and memory temps on properly functioning servers are in the IBM operational range. But actual observation reveals that IBM sets some threshold values for CPU and DIMM temps that cause a shutdown. This is proved by the fact that there's a firmware update to fix a DIMM temperature shutdown threshold that was set too low. That threshold is below the CPUs' and DIMMs' max temp limits. Someone at IBM must know what these values are. On my problem system the 'system' temp is below threshold and the TEMP LED is not currently lit, even though my CPUs and DIMMs are hot enough to cause problems and affect performance. My guess is 50C for the CPUs, but that's a guess, and I have no idea for the DIMMs. What are the numbers?
And another issue: the Broadcom network chipsets, as well as the chipsets on the installed QLogic HBAs, have a max operating temp of about 35C. It's all fine and good for CPUs and DIMMs to take a sauna bath, but the other components in the server located right next to them can't handle the heat coming off the CPUs and DIMMs.
I know, just open up the box and fix the airflow baffles. That's coming during our Sunday maintenance window. But I plan to check all of our thousands of IBM servers for CPU and DIMM temps even though none show a TEMP indicator issue. IBM sets the operational temp values to ensure performance and reliability. I suspect that undetected CPU and DIMM heat issues are causing a huge performance hit in my environment.
So simply, what are the Over-Temp threshold values that cause a shutdown in IBM X Series Servers dadgummit?
- gurutc
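For a fleet-wide check like the one described above, one possible approach is to read the thresholds the management controller itself reports, rather than guessing at them. The sketch below assumes the pipe-separated table printed by `ipmitool sensor` (name, reading, unit, status, then the six lower/upper threshold columns). The sensor names and numbers in the sample are invented for illustration and are not IBM's real values, and it is an assumption that a given IMM2 exposes per-CPU or per-DIMM temperatures over IPMI at all. The `margin` parameter and the function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Sensor(NamedTuple):
    name: str
    reading: Optional[float]
    unit: str
    upper_noncritical: Optional[float]
    upper_critical: Optional[float]
    upper_nonrecoverable: Optional[float]


def _num(field: str) -> Optional[float]:
    """Convert a column to float; 'na' or other non-numeric text becomes None."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_sensor_table(text: str) -> List[Sensor]:
    """Parse rows laid out like `ipmitool sensor` output (10 pipe-separated columns)."""
    sensors = []
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) < 10:
            continue  # skip blank or malformed lines
        # Columns 7, 8, 9 are the upper non-critical, critical and
        # non-recoverable thresholds as reported by the controller.
        sensors.append(Sensor(cols[0], _num(cols[1]), cols[2],
                              _num(cols[7]), _num(cols[8]), _num(cols[9])))
    return sensors


def hot_sensors(sensors: List[Sensor], margin: float = 10.0) -> List[Sensor]:
    """Temperature sensors reading within `margin` degrees of their lowest upper threshold."""
    flagged = []
    for s in sensors:
        if s.unit != "degrees C" or s.reading is None:
            continue
        limits = [t for t in (s.upper_noncritical, s.upper_critical,
                              s.upper_nonrecoverable) if t is not None]
        # Sensors that report no upper threshold cannot be judged this way.
        if limits and s.reading >= min(limits) - margin:
            flagged.append(s)
    return flagged


# Invented sample data; NOT real IBM threshold values.
SAMPLE = """\
Ambient Temp     | 24.000     | degrees C  | ok    | na        | na        | na        | 38.000    | 41.000    | 45.000
CPU 1 Temp       | 46.000     | degrees C  | ok    | na        | na        | na        | na        | na        | na
DIMM 4 Temp      | 60.000     | degrees C  | ok    | na        | na        | na        | 65.000    | 70.000    | 75.000
Fan 1A Tach      | 5400.000   | RPM        | ok    | na        | 500.000   | na        | na        | na        | na
"""

if __name__ == "__main__":
    for s in hot_sensors(parse_sensor_table(SAMPLE)):
        print(s.name, s.reading, "upper thresholds:",
              s.upper_noncritical, s.upper_critical, s.upper_nonrecoverable)
```

A sensor that shows `na` in every threshold column (like the made-up CPU row here) is never flagged, which would match the symptom of a hot component that raises no TEMP warning. In that case the limits would have to come from the vendor's documentation.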
SOLUTION
It is in the service guide...
ASKER
My bad server has DIMM temps ranging up to 60C and CPU temps of 46C. As you would guess, it's having issues. IBM would have to know the best-practice values for these components, including for the Intel CPUs 'as installed'. My bad server is out of range, I'm sure. But the 'operating environment' temp range is certainly not valid for the CPUs and memory. Even with these ridiculously high CPU and memory temps, my bad server is not throwing a temp warning. Sooo, somewhere IBM must have the 'known-good' upper temp range values.
- gurutc