Solved

Intermittent loss of network on HP Proliant ML350 G6

Posted on 2013-12-03
Last Modified: 2014-01-03
I'm baffled by this one - not sure if it is a hardware, driver, or OS issue.

The server is an ML350 G6 with SBS2011 installed.  This is connected via NIC1 (no NIC teaming) to an HP ProCurve 2510G-24 switch.

This system has been running just fine for nearly 3 years without any major issues.  Suddenly in the last couple of months an intermittent issue has come up where the server will lose all network connectivity for seemingly no reason and require a reboot.  

The server is not locked up or crashed (BSOD), mind you.  You can log in just fine from the console and do a graceful reboot.  During the outage the iLO is still responsive and the NIC within Windows shows as online and connected.

Things I've tried/looked at:

1 - Windows Event logs - nothing is ever reported in any of the Windows logs around the time of connectivity loss, other than an occasional DNS resolution error (presumably because the network has dropped).  My RMM tool does log the loss of connectivity and status update failures, so I can get pretty close to when the issue is happening (within about 30 seconds).

2 - HP IML - the Integrated Management Log on the iLO shows nothing; it logs the power event for the reboot and that's it.

3 - Switch syslog - there are some excessive broadcasts on the network from a few of the clients that have chatty software installed, but no issues for the port that the server is plugged into (or the other port that I moved it to for testing).

4 - Windows is fully patched.

5 - HP SUM (Smart Update Manager) has been run every month to apply all critical and recommended system firmware/BIOS/driver updates as needed, so that is also current.

6 - I have scanned for rootkits/malware/viruses etc. numerous times using multiple tools from Sysinternals, GMER, MBAM, Sophos, and ESET, and it always comes up clean.

I want to call HP or Microsoft, but I don't have anything to give them to start debugging.  I cannot reproduce the issue, but it has happened on 11/3, 11/8, 11/15, and 12/3.
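
One quick check that might be worth handing to support is whether the outage dates follow any schedule. The sketch below is only illustrative: it assumes the dates are month/day and fall in 2013 (the year of this post), and the helper name gaps_in_days is made up for the example.

```python
from datetime import date

# Outage dates from the post, read as month/day.
# Assumption: year 2013, taken from the posting date.
outages = [
    date(2013, 11, 3),
    date(2013, 11, 8),
    date(2013, 11, 15),
    date(2013, 12, 3),
]

# Fixed English names so the output does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def gaps_in_days(dates):
    """Return the number of days between each pair of consecutive dates."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


gaps = gaps_in_days(outages)
weekdays = [DAY_NAMES[d.weekday()] for d in outages]

print(gaps)      # [5, 7, 18]
print(weekdays)  # ['Sunday', 'Friday', 'Friday', 'Tuesday']
```

Under those assumptions the gaps are 5, 7, and 18 days and the outages land on different weekdays, so nothing here points at a fixed schedule such as a weekly job. Four data points are too few to rule one out, though.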
0
Comment
Question by:DigiSec
6 Comments
 
LVL 56

Accepted Solution

by:
Cliff Galiher earned 250 total points
ID: 39694137
If a power cycle is the only thing that fixes it, my first guess is failing hardware. I'd probably start by disabling the NIC and adding a new NIC. You'll need to run the Fix My Network Wizard to get all the services bound to the new NIC...so as always...have a backup. But this is a relatively trivial thing to do. If you find that solves the problem, time to call HP.
0
 

Author Comment

by:DigiSec
ID: 39694149
That's a possibility.  For that matter I could switch to the unused NIC2 and rebind everything over there.  I believe they are physically separate controllers rather than one shared controller with 2 ports.

To be fair, a reboot is the fastest, easiest way that I have been able to get my client back online - talking a non-technical person through logging into and restarting the server cleanly (really wish they would pop for the Advanced iLO license).

I can't reproduce it, so I haven't been onsite to try things like unplugging / disabling the NIC, or using Sysinternals to trace or even perfmon to look at current network utilization.  I suppose it is possible that something is literally locking the NIC - but I don't think it's a bottleneck issue because it is not momentary - it still requires a reboot.
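
Since nobody is onsite when it happens, one low-effort option is to leave a small probe running on the server itself so there is a timestamped record from the inside. The following is a minimal sketch, not a tested monitoring tool: it only uses the Python standard library, and the function names, the 10-second interval, and the target address in the usage comment are all hypothetical. It probes with a TCP connect rather than ICMP, so the target needs a listening TCP port (a switch management page, for example).

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def outage_windows(samples):
    """Collapse (timestamp, reachable) samples into (start, end) outage windows.

    end is None when the target was still unreachable at the last sample.
    """
    windows = []
    start = None
    for timestamp, reachable in samples:
        if not reachable and start is None:
            start = timestamp                   # first failed probe of an outage
        elif reachable and start is not None:
            windows.append((start, timestamp))  # first good probe ends it
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(host, port, interval=10.0, rounds=6):
    """Probe host:port `rounds` times, `interval` seconds apart; return samples."""
    samples = []
    for _ in range(rounds):
        samples.append((datetime.now(), probe(host, port)))
        time.sleep(interval)
    return samples


# Example use (hypothetical address of the switch's web interface):
#   samples = monitor("192.168.1.2", 80, interval=10.0, rounds=8640)
#   for start, end in outage_windows(samples):
#       print("down from", start, "until", end)
```

Writing the results to a local file rather than the console would let the record survive the reboot. The outage windows could then be lined up against the RMM timestamps and the switch syslog.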
0
 
LVL 6

Assisted Solution

by:donnk
donnk earned 250 total points
ID: 39694757
Call HP and have them diagnose the hardware.
0

 

Author Comment

by:DigiSec
ID: 39695483
Yeah, I think that's the plan for an emergency outage tonight.  Going to have HP diagnose the HW and switch over to the second NIC at the same time.  This will be next to impossible to test though since it is intermittent and not reproducible by me.

I will award partial points - both are good answers.
0
 

Expert Comment

by:BitHammer
ID: 39753742
Has HP solved the problem? I have two servers with the same issue, running two different versions of Windows Server (2003, 2008). Both are Proliant DL-308s; one is a G3, the other a G4. Because of the timing of the events, I was thinking it was due to a Microsoft update. It started after an update occurred on both machines at pretty close to the same time. I suppose it could be coincidental hardware failures, but that seems unlikely. It's a huge issue, as one of them is our internal DNS server and it's going down daily.
0
 

Author Comment

by:DigiSec
ID: 39755607
Interestingly enough - no.  We swapped out the motherboard to replace the NICs per HP - but I got the call yesterday that the "server was down".  I could see via the iLO that the system was up, and I was able to gracefully reboot via iLO - but it was inaccessible on the network.

I'm re-opening the case with HP now.
0


Question has a verified solution.
