This morning around 2:15am, three of our servers crashed within a minute of each other and hung on this blue screen:
The event logs on all of the servers that blue screened are pretty clean. The only events leading up to the blue screens around 2:15am were from the automatic update service, which only started and stopped on all servers. No updates were actually installed.
These servers are VMs. There are 13 VMs, all running Server 2003, on this one ProLiant DL585 G2 ESX server. Only 3 of them crashed.
If you look in VIC, click the Performance tab -> Change Chart Options, and select “last day”, you can see that between 2:15 and 2:20am all of the servers that blue screened had a sudden spike in CPU usage. The CPU usage of the other servers remained stable.
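For anyone who wants to check the same thing from numbers rather than by eyeballing the chart, here is a rough sketch of that comparison in Python. It is not tied to any VMware tool or export format: the function name `find_spikes`, the VM names, the sample values, the placeholder date, and the "2x the average outside the window" threshold are all made up for illustration, and it assumes the per-VM CPU samples have already been collected as (timestamp, CPU %) pairs.

```python
from datetime import datetime
from statistics import mean


def find_spikes(samples, start, end, factor=2.0):
    """Return the sorted names of VMs whose peak CPU inside [start, end]
    exceeds `factor` times their average CPU outside that window.

    `samples` maps a VM name to a list of (timestamp, cpu_percent) pairs.
    The threshold rule is an assumption, not a VMware definition.
    """
    spiked = []
    for vm, points in samples.items():
        inside = [cpu for ts, cpu in points if start <= ts <= end]
        outside = [cpu for ts, cpu in points if not (start <= ts <= end)]
        if not inside or not outside:
            continue  # not enough data to compare
        if max(inside) > factor * mean(outside):
            spiked.append(vm)
    return sorted(spiked)


def at(hour, minute):
    # Placeholder date; only the time of day matters in this example.
    return datetime(2000, 1, 1, hour, minute)


# Hypothetical sample data, not taken from the real servers.
samples = {
    "vm-a": [(at(2, 0), 5.0), (at(2, 10), 6.0), (at(2, 16), 95.0), (at(2, 30), 4.0)],
    "vm-b": [(at(2, 0), 5.0), (at(2, 10), 6.0), (at(2, 16), 7.0), (at(2, 30), 5.0)],
}
window_start = at(2, 15)
window_end = at(2, 20)

print(find_spikes(samples, window_start, window_end))
```

If the VMs flagged this way match the three that blue screened and none of the others, that would back up what the chart shows.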
It’s hard to tell what caused the blue screens. They are often caused by Windows Updates or hardware failures. However, I believe that if it were a hardware issue, all of the other VMs on the host would have crashed too. On the other hand, there’s nothing software related in the event logs leading up to the crash either.
Any ideas as to what could have caused this, how to dig deeper, and what to look for?