Solved

# Optimum server room temperature?

Posted on 2010-11-22
Hi there, I am looking for as much documentation as possible to support an argument for lowering the temperature in our server room, which is currently somewhere between 24.4 and 27.5°C. Most research so far suggests it should be between 20 and 23°C (max). The only thing I have found from HP is that the room should be 20-30°C, but I am trying to find something that specifies what the optimum should be. Does anyone have any documentation/links that I could use? Many thanks in advance.
Question by:corecc

LVL 1

Accepted Solution

sirois500 earned 25 total points
ID: 34188512
Interesting article here :

Google raised the temperature of its server rooms to save energy, and engineers design processors to work at up to 95 degrees Celsius.

LVL 31

Expert Comment

ID: 34188513
The more important question is: what is the temperature inside your servers themselves? If those temperatures are too high, it could indicate a ventilation or cooling problem that could be rectified by reducing the overall temperature of the server room.

If your servers' internal temperatures are nominal, then there isn't really a need to change the room temperature anyway.

LVL 21

Expert Comment

ID: 34188607

First of all, the only temperature that matters is the inlet temperature. This means it should be measured at the intake level of the servers, usually at the front.

Most recent equipment will tolerate higher temperatures of about 25C quite well, but in our experience this still results in a higher failure rate than cooling to about 20C.

The components that certainly fail more often at higher ambient temperatures are the internal fans, because they need to work harder.


LVL 9

Assisted Solution

losip earned 25 total points
ID: 34190478
As robocat says, the room temperature is irrelevant; it is the server temperature that matters, assuming you are not going to have people occupying the room.  Data centre designers and aircon engineers still have the mindset that a server room is a cooled room into which you put servers.  This is not so; you need to cool the servers, not the room.

To that end, if you have multiple rows of server racks, you should arrange it so that the aisles are either hot aisles or cold aisles, i.e. alternate the direction in which the servers face.  You supply cool air to the cold aisle (usually formed by the fronts of the servers) and let the hot aisle temperature look after itself - it doesn't matter; you just extract whatever temperature comes out of the hot aisle.

Now, what temperature should the cold aisle be supplied with?  In the days of mainframes and open-reel tapes, close control of temperature (and humidity) was essential, but servers don't care much.  If you look at the spec of current HP ProLiant servers, you will find that they are designed to accept inlet air temperatures up to 35C (95F), with a de-rating at altitude.  The processors will slow down above 30C if there is a fan fault, but the system will continue to operate.

While I wouldn't recommend running at the full 35C, I do think 25-28C is perfectly feasible and will not noticeably affect reliability (failure rate is proportional to the square of the absolute temperature so raising the temperature from 20 to 25C will result in a drop of mean time between failures of about 3%).  However, the energy saving will be huge.
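The "about 3%" figure above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the poster's rule of thumb (failure rate proportional to the square of absolute temperature) at face value; it is not a vendor reliability formula.

```python
# Sketch of the reliability estimate quoted above, assuming (as the
# poster does) that failure rate scales with the square of absolute
# temperature, and that MTBF is the reciprocal of the failure rate.

def mtbf_change(t_old_c: float, t_new_c: float) -> float:
    """Fractional change in MTBF when raising the room temperature."""
    t_old_k = t_old_c + 273.15
    t_new_k = t_new_c + 273.15
    rate_ratio = (t_new_k / t_old_k) ** 2   # new failure rate / old failure rate
    mtbf_ratio = 1.0 / rate_ratio           # new MTBF / old MTBF
    return mtbf_ratio - 1.0                 # negative means MTBF drops

print(f"{mtbf_change(20.0, 25.0):+.1%}")   # roughly -3%, matching the comment
```

Going from 293 K to 298 K raises the modelled failure rate by about 3.4%, i.e. an MTBF drop of about 3.3% - consistent with the claim in the comment.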

To answer your question, the Uptime Institute has various papers on cooling techniques, but not all refer to rooms housing only servers and network equipment, so you have to read them with that in mind.  Go to www.uptimeinstitute.org and look for their publications.  Another good source is www.ashrae.org, and finally you can try some of the APC White Papers, but bear in mind they are in the business of selling solutions. http://www.apc.com/prod_docs/results.cfm

LVL 47

Expert Comment

ID: 34192943
I can't imagine any modern research showing that the ambient temperature of non-mainframe server farms should be lowered to what you desire.

Lower temperatures such as that needlessly waste power and even contribute to lower disk life.

All you need to do is query the INTERNAL temperature of your servers.  They are architected to run normally at much higher temperatures than systems made as little as 5 years ago.


Expert Comment

ID: 34213327
20 degrees Celsius is typical for most data centers. As others have already said, the inlet and internal temperatures are what you set the room to accommodate. I run NIMS class devices, so I can run a little warmer and save on cooling costs. Most servers today have sensors that can be read, and reading these sensors can be done remotely using IPMI.  Another important environmental consideration is humidity: too high and you will have condensation problems; too low and you will have static buildup. A value of 50% +/- 10% is regarded as acceptable.
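Reading the sensors over IPMI as described above typically means running something like `ipmitool -H <host> -U <user> sdr type temperature` against the server's BMC. As a hedged sketch, here is one way to parse that command's typical pipe-delimited output; the sample readings and the alert thresholds are illustrative, not from the original post.

```python
# Parse a captured sample of `ipmitool sdr type temperature` output.
# Sensor names, values, and thresholds below are illustrative only.

SAMPLE_SDR = """\
Inlet Temp       | 04h | ok  |  7.1 | 24 degrees C
CPU1 Temp        | 0Eh | ok  |  3.1 | 58 degrees C
CPU2 Temp        | 0Fh | ok  |  3.2 | 61 degrees C
Exhaust Temp     | 01h | ok  |  7.1 | 38 degrees C
"""

def parse_sdr(text: str) -> dict:
    """Map sensor name -> temperature in degrees C."""
    readings = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 5 and fields[4].endswith("degrees C"):
            readings[fields[0]] = int(fields[4].split()[0])
    return readings

readings = parse_sdr(SAMPLE_SDR)
# Flag anything over an (illustrative) 27C inlet / 70C CPU limit.
alerts = [name for name, t in readings.items()
          if ("Inlet" in name and t > 27) or ("CPU" in name and t > 70)]
print(readings)
print("alerts:", alerts or "none")
```

In production you would capture the command output via SSH or `subprocess` rather than a hard-coded string, but the parsing step is the same.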

LVL 47

Expert Comment

ID: 34213455
I must respectfully strongly disagree with mikemcdonough.  Well, actually, I don't disagree; Intel, IBM, HP, Liebert, and Lawrence Berkeley National Lab disagree.

This study speaks for itself.  You are wasting power and energy needlessly.  25C to 27C (77F - 80F) is just fine.

"Intel wants you to know that data centers are wasting energy - and money - by over-cooling their servers, burdened by warranties that may prevent them from aggressively raising their temperature.

During a wide-ranging discussion on enterprise-level cloud computing last week, Intel's director of platform technology initiatives in the company's server platform group, Dylan Larson, pointed reporters to a recent energy-efficiency study (PDF) conducted by representatives from Intel, IBM, HP, Liebert Precision Cooling (a division of Emerson Network Power), and the Lawrence Berkeley National Lab."

http://www.theregister.co.uk/2009/08/31/data_centers_run_too_cool/

Expert Comment

ID: 34213647
An interesting article. I'll have to take the time to follow the links and read these studies. I suppose I am still working off the ASHRAE 2004 recommendations:

http://www.ancis.us/images/AN-04-9-1.pdf

I was referring to the setting of the air handlers. Depending on the BTU load of the room, the actual temperatures read at the server intake can be between 5 and 10 degrees higher. The issues one has to take into account are: the layout of the data center, whether there are hot and cold aisles, where the vents are placed in relation to the racks, etc.

I mostly deal with mission-critical servers. Heat is a contributor to failure, and the cost of downtime is greater than the cost of cooling. I would be comfortable moving from 68F (20C) to 70F or 71F.

Now, if the equipment is low cost and the cost of downtime is low, 80F may be just fine. If these factors are not the case, I would not recommend 75F and certainly not 80F.


LVL 9

Expert Comment

ID: 34213804
Mike, you say "Heat is a contributor to failure".  Do you have access to any quantitative research into the effect of temperature on modern servers' reliability?  I would be greatly interested to see it.

The research we have done with two data centers of 2,000 servers each showed no detectable difference in failure rate when the temperature was raised from an on-coil set-point of 18C to about 25C.  In the two years since this was done, some servers have got older and therefore later in their "bathtub" failure rate curve, while there has also been a continued technology refresh of new servers being installed and old ones retired.

The overall failure rate has been so low that it's clear that even 4,000 servers is not a large enough sample size for accurate prediction so I would be very interested to hear of other people's experience in raising the temperature.  Nearly all failures have been HDD problems but we haven't got post-mortems on these to see if they were mechanical or electronic.
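The sample-size point above can be made concrete with a rough confidence-interval calculation. This sketch uses a normal approximation to the binomial; the failure counts are illustrative, not the poster's actual data.

```python
# Why 4,000 servers is a small sample for failure-rate comparisons:
# with a low annual failure rate (AFR), the 95% confidence interval
# around the observed rate is wide relative to the rate itself.
# The counts below are illustrative, not real data.
import math

def afr_interval(failures: int, servers: int) -> tuple:
    """Approximate 95% CI for an AFR (binomial, normal approximation)."""
    p = failures / servers
    half_width = 1.96 * math.sqrt(p * (1 - p) / servers)
    return (max(0.0, p - half_width), p + half_width)

# e.g. 40 failures among 4,000 servers: point estimate 1.0% AFR
lo, hi = afr_interval(40, 4000)
print(f"AFR 95% CI: {lo:.2%} .. {hi:.2%}")
```

With a 1% observed AFR the interval spans roughly 0.7% to 1.3%, so a 20-30% relative difference in failure rate between two temperature set-points could easily be lost in the noise - which is the poster's point.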

Note that the air handling units are now managed by multiple sensor server inlet temperatures and not by general room thermostats which is why I said that the air handler temperature is now "about" 25C.

LVL 47

Expert Comment

ID: 34213817
Actually, HDDs have longer life and fewer errors at warmer temperatures than colder ones (up to the point where they approach thermal shutdown).

This is supported by a great deal of research; there is some mention at http://storagesecrets.org somewhere, with some graphs as well. I can't remember exactly where, but it should be easy to find.

Expert Comment

ID: 34215210
This study shows that the number of head-disk collisions increases as temperature rises, due to lower head clearance.

Author: Strom, BD; Lee, S; Tyndall, GW; Khurshudov, A
IEEE TRANSACTIONS ON MAGNETICS Vol: 43 Issue: 9 ISSN: 0018-9464 Date: 09/2007 Pages: 3676 - 3684

This thesis, which cites the paper above, goes into more detail on the effects of temperature on the failure rates of hard drives.

Author: Bagul, Yogesh G
ISBN: 9780549950219 Date: 09/2009

Expert Comment

ID: 34215219
Sorry for the typos and grammatical errors; I inadvertently submitted this before I had proofread it. Also, these documents can be accessed through your local public or college library, since they are in databases that are not publicly accessible.

LVL 47

Expert Comment

ID: 34215328
Here is the link which supports lower drive temperature = greater failures.  It comes from the famous Google study, which looked at 100,000 disk drives over several years at google.com:

http://storagesecrets.org/2009/01/diskcoolerscam/#more-109

Here is the probability density distribution which shows the probability of drive failure against average temperature.  It is counter-intuitive, but quite specific.  In the 23-25 degree C range you have a higher probability of failure than if you run the disks in the 35-45 degree C range.  In fact, you have several times higher probability of failure in that range than if the disk is at 40C - 50C.

[Attached image: diskfailures-fig4.jpg - probability density of drive failure vs. average drive temperature]

LVL 9

Expert Comment

ID: 34215760
Now this is getting interesting: all the discussion is now on HDD failure rates, and no one in their right mind would engineer any enterprise system without built-in disk resilience such as RAID, log shipping, synchronous database replication, split-site SANs, or all of them.  So, the overall availability of data center systems should not be impacted by reasonable HDD failures, whether or not they are exacerbated by high or low temperatures.

As for the other causes of server failure, we have found them to be in this order:
1. Software failure (Application and OS)
2. Human error (Admin, operation and network config)
3. Other network failures
4. Server hardware failures

Of these, only the last could be affected by computer room temperature, although one could argue about human failures!  Availability is managed by the use of strong Change Management procedures, duplicated or quadruplicated hardware and comms, and a huge investment in transaction application software engineering to make it resilient to any single failure.  Nearly all applications can now survive the loss of an entire data center - an event which has a significant chance of happening due to natural disaster, loss of comms, terrorist activity, or human error again.

In summary, we believe that using higher computer room temperatures has saved enough energy costs to pay for a good deal of added resilience and our systems therefore have HIGHER availability with higher room temperatures.

There is still much more energy saving that can be achieved - such as free cooling heat exchangers - and higher room temperatures assist here.  But it is still a struggle to persuade HVAC designers that all you need is reliable cold aisles and the rest of the room doesn't matter.  As well as supplying cool air to the cold aisles, we remove heat directly from the hot aisles and don't let it waft across the room.  As you walk into the Data Centers, the perceived ambient is quite warm - until you walk into the cold aisles.  Incidentally, this approach does require buying NEBS compliant Cisco network equipment to achieve the front to back air flow because the regular 7000 routers have a very confused cooling requirement.
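The "spend the cooling savings on resilience" argument above comes down to simple availability arithmetic: assuming independent failures, N redundant replicas of a component with availability `a` give combined availability 1 - (1 - a)^N. A minimal sketch, with an illustrative (hypothetical) 99%-available server:

```python
# Redundancy, not cold air, is what drives availability (assuming
# independent failures). The 0.99 availability figure is illustrative.

def combined_availability(a: float, n: int) -> float:
    """Availability of n independent, redundant replicas."""
    return 1.0 - (1.0 - a) ** n

single = 0.99   # a hypothetical server that is up 99% of the time
for n in (1, 2, 4):
    print(f"{n} replica(s): {combined_availability(single, n):.8f}")
```

Duplicating a 99%-available component already yields 99.99%, and quadruplication reaches roughly eight nines - gains far larger than any plausible temperature-driven change in per-server failure rate, which is the poster's argument in a nutshell.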

LVL 47

Expert Comment

ID: 34215788
True, but the discussion is strictly on temperature.  The same human errors, network failures, server failures, etc. are constants independent of temperature, so those variables can be removed.  As my graphs showed, temperature can easily quadruple the AFR of HDDs.

This goes against the "mainframe mentality" where colder was better, which is likely the original premise of the author, who thought colder actually was better.

LVL 1

Author Comment

ID: 34266105
Thanks again everyone, it is clearly a hot topic. We have stabilised now at 26/27C, and I think this is how it will stay, as there seem to be good arguments either way, making it extremely difficult for me to present a definitive case to our board for the cost involved.
