HP MSA 1000 controller failure - beyond useful life?
Posted on 2012-03-17
My question, specifically: what is the expected useful life of an HP MSA 1000 controller board? I had one "go nuts" recently, and it was suggested that it was past its expected life. I think the MSA 1000 has been in service for about six years. Yes, hard drives fail, but a controller board is only electronics. It sits in a fairly dust-free, static-controlled and temperature-controlled server room, on a dedicated power circuit, connected via a UPS (standby, with over- and under-voltage protection). It shouldn't "wear out" in about six years, should it?
A single drive in the array went bad, which is not an unusual event. A hot spare was present and configured. The controller did not detect the failed drive, and it did not bring the hot spare online and rebuild the array automatically, as it should have.
Instead, the O/S (Windows Server 2003) ended up seeing the volume as heavily corrupted, and end users got many errors when they tried to access shares and the files on the volume. The volume was still visible, but the longer it stayed in use, the more corruption seemed to occur. In the end, the volume had to be rebuilt and the controller board in the MSA 1000 replaced.
For more info: the array was one LUN, RAID 6 (ADG), with ten physical disks. Rebuild priority was not set to zero. Everything I know indicates the array should have easily survived a single drive failure.
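To spell out why a single failure should have been survivable, here is a minimal sketch of the arithmetic. It assumes ADG behaves like standard dual-parity RAID 6 (two disks' worth of capacity goes to parity, and up to two concurrent member failures are tolerated). The function names are hypothetical, and it assumes all ten disks are array members, since the post does not say whether the hot spare is counted among them.

```python
def raid6_usable_disks(members: int) -> int:
    """Disks' worth of usable capacity in a dual-parity (RAID 6) set."""
    # Assumption: 4 is the commonly cited minimum member count for RAID 6.
    if members < 4:
        raise ValueError("RAID 6 needs at least 4 member disks")
    return members - 2  # two disks' worth of space holds parity


def raid6_survives(failed_members: int) -> bool:
    """True if the set stays readable with this many failed members."""
    if failed_members < 0:
        raise ValueError("failed_members cannot be negative")
    return failed_members <= 2  # dual parity tolerates up to two failures


members = 10  # assumption: all ten disks are array members
print(raid6_usable_disks(members))  # capacity of 8 disks
print(raid6_survives(1))            # one failed drive: still readable
print(raid6_survives(3))            # three failed drives: data loss
```

On paper, then, one failed drive leaves the array with a full drive of redundancy to spare even before the hot spare is used, which is why the corruption points at the controller rather than at the RAID level.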