I'm baffled by this one - not sure if it is a hardware, driver, or OS issue.
The server is an ML350 G6 with SBS2011 installed. This is connected via NIC1 (no NIC teaming) to an HP ProCurve 2510G-24 switch.
This system has been running just fine for nearly 3 years without any major issues. In the last couple of months, however, an intermittent issue has come up: the server loses all network connectivity for no apparent reason and requires a reboot to recover.
The server is not locked up or crashed (no BSOD), mind you. You can log in just fine from the console and do a graceful reboot. During the outage the iLO is still responsive, and the NIC shows as online and connected within Windows.
Things I've tried/looked at:
1 - Windows Event logs - nothing is ever reported in any of the Windows logs around the time of the connectivity loss, other than an occasional DNS resolution error (presumably because the network has dropped). My RMM tool does log the loss of connectivity and the status update failures, so I can pin down when the issue happens to within about 30 seconds.
2 - HP IML - the Integrated Management Log on the iLO shows nothing. It logs the power event for the reboot and that's it.
3 - Switch syslog - there are some excessive broadcasts on the network from a few clients that have chatty software installed, but no issues on the port that the server is plugged into (or on the other port that I moved it to for testing).
4 - Windows is fully patched.
5 - HP SUM (Smart Update Manager) has been run every month to apply all critical and recommended firmware/BIOS/driver updates, so that is also current.
6 - I have scanned for rootkits/malware/viruses numerous times using multiple tools (Sysinternals, GMER, MBAM, Sophos, ESET) and it always comes up clean.
I want to call HP or Microsoft, but I don't have anything to give them to start debugging. I cannot reproduce the issue, but it has happened on 11/3, 11/8, 11/15, and 12/3.
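
One thing that might help gather evidence is a small watcher running on the server itself, writing to local disk so the log survives the outage. Below is a minimal sketch in Python, assuming Python 3 is available on the server. The function names, the 5-second interval, and the log file name are my own placeholders, not part of any HP or Microsoft tooling. It only records when TCP reachability to a chosen host changes, not why.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host, port, interval=5.0, max_checks=None, probe=tcp_probe, logfile=None):
    """Probe host:port every `interval` seconds and record each state change.

    Runs forever when max_checks is None. Returns the list of
    (timestamp, "UP" | "DOWN") transitions seen so far.
    """
    transitions = []
    previous = None  # unknown state, so the first probe is always recorded
    checks = 0
    while max_checks is None or checks < max_checks:
        is_up = probe(host, port)
        if is_up != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            state = "UP" if is_up else "DOWN"
            line = f"{stamp} {host}:{port} {state}"
            print(line)
            if logfile:
                # Append to a local file so the record survives a network drop.
                with open(logfile, "a") as fh:
                    fh.write(line + "\n")
            transitions.append((stamp, state))
            previous = is_up
        checks += 1
        time.sleep(interval)
    return transitions
```

As a hypothetical usage, calling `monitor("192.168.1.1", 80, logfile="netwatch.log")` would watch a device at that address (an assumed address, substitute something on the LAN that accepts TCP connections, such as the switch's management interface) and append a timestamped line each time it becomes reachable or unreachable. The timestamps could then be lined up against the switch syslog and the RMM alerts.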