Testing for heat-related damage on servers
Posted on 2008-10-27
Greetings, Experts. Our server room overheated badly twice in one weekend. The AC failed, and since no one was in the office on Saturday, we didn't know about the problem until three of our eight servers had failed or become unresponsive. When we arrived on site, the server room had reached 105°F. We powered down the remaining servers. On Sunday, the building owner told us the AC had been fixed, so we turned the servers back on. This morning the server room was above 100°F again, so we shut everything down a second time.
I know there are a number of things we need to address (mainly the AC unit), but the question I've been asked is whether you know of any tools that can determine if any hardware was damaged (a stress test, for example). I did several Google searches but wanted to check with the experts before installing third-party apps on my servers.
Thanks in advance for the help.