We have a HP DL385 G2 rackmount server running VMWare ESX 4.1.
It has 2 RAID cards, the internal (P400) seems to be going fine but in the last few days we've had the P800 (512MB BBWC) connected to a MSA60 with what seems to be it halting. A while back after a reboot the server hung not long after the VM guests came back online however a hard-reset fixed it. I know of a case with another HP RAID card having issues with heavy I/O halting it (which seemed similar) however even after a full Firmware update today it happened 4 hours later.
Now we also have a Seagate ES drive (1TB) in the MSA that has twice today been marked as faulty. I'm at a loss to it being an error as only today (the 5th crash tonight since Monday night) it showed as failed twice (remove, reseat, re-sync & ok).
Can anyone give me a heads up possibly with any advice to what it could be? Is it as simple as an actual HDD failing (the one in question is a 1TB RAID1 set) or is there issues with the RAID controller?
I've already cut back the caching to 75% read, 25% write & disabled the Array acceleration on the RAID1 as mentioned above as preventative measures as well as lowering the load on the server but have to now think about moving images to another VM Server or making other arrangements!
Screenshot of system post failure:
Diagnostics report of HP RAID ACU: report-4e82d645-000065bc-0000000.zip