Explanation for outage
Posted on 2011-09-14
I would like to pick your brains about something that recently happened at a customer site and for which I am trying to find a reasonable explanation.
Within four days, three H700 RAID controllers failed in three different Dell R510 servers, each about one year old. Replacing each failed controller with a new one sent by Dell solved the problem.
One could call it coincidence, but I think that is statistically hard to defend.
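As a rough sanity check on the coincidence argument, here is a minimal sketch of the arithmetic. Everything numeric in it is an assumption for illustration only: the 2% annual failure rate is a guess and not a Dell figure, failures are treated as independent with a constant rate, and the fleet sizes are hypothetical because the total number of servers is not stated above.

```python
import math

def window_failure_prob(afr: float, window_days: float) -> float:
    """Chance that one controller fails within window_days,
    assuming a constant failure rate derived from the annual failure rate."""
    rate_per_day = -math.log(1.0 - afr) / 365.0
    return 1.0 - math.exp(-rate_per_day * window_days)

def prob_at_least_k_more(n_others: int, k: int, p: float) -> float:
    """Binomial chance that at least k of n_others independent
    controllers fail, each with probability p."""
    return sum(
        math.comb(n_others, i) * p**i * (1.0 - p) ** (n_others - i)
        for i in range(k, n_others + 1)
    )

ASSUMED_AFR = 0.02   # assumption: 2% of controllers fail per year
WINDOW_DAYS = 4      # from the post: three failures within four days

p = window_failure_prob(ASSUMED_AFR, WINDOW_DAYS)
print(f"one controller failing in {WINDOW_DAYS} days: {p:.2e}")

# Given one failure, how likely are at least two more in the same window?
# The fleet sizes below are hypothetical.
for n_others in (2, 10, 50):
    chance = prob_at_least_k_more(n_others, 2, p)
    print(f"{n_others} other servers: {chance:.2e}")
```

With these assumed inputs, the case of exactly two other servers comes out on the order of 1 in 20 million. A larger fleet or a higher failure rate raises that figure, but under the independence assumption it stays very small, which is why a shared cause looks more plausible than chance.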
I suggested an environmental cause, which the customer doubts.
The servers are located in a server room close to a factory where paper is printed and handled. Paper dust might be an issue.
The customer is also located next to a high-voltage installation. Maybe something in the configuration has changed and altered the characteristics of the power supplied. Mind you, we do have a UPS battery connected to the servers, which is supposed to "clean" the power.
Of course it could be the UPS itself ...
"A virus" has been suggested as well, but I don't see how that could affect a RAID controller and not the Linux operating system in use.
What are your thoughts?