This is a bit of a long shot, but if anyone has any insight into this it would be lovely.
We have a Windows Server 2003 cluster acting as an active/passive file server. The nodes in this cluster have developed a rather nasty habit of shutting down. It can be either node, or both (occasionally at exactly the same time).
I can't pin down what a node is doing when it shuts down; there's no consistency:
- No Information, Warning, or Error events related to the shutdown in Event Viewer
- The cluster log shows no errors (just the heartbeat failure when a node shuts down)
- No excessive system activity
- No memory dump files
- No BSoD
I even checked the WBEM logs on the off chance they contained something meaningful...
The cluster's storage is an EMC CLARiiON SAN, with PowerPath managing the HBA. There is one PowerPath patch we could still apply, just in case.
We ran Dell's hardware diagnostic tools, and everything came back clear.
Can anyone think of anything else to check, or anywhere else to look?