Solved

Intermittent loss of network on HP Proliant ML350 G6

Posted on 2013-12-03
Last Modified: 2014-01-03
I'm baffled by this one - not sure if it is a hardware, driver, or OS issue.

The server is an ML350 G6 with SBS2011 installed.  This is connected via NIC1 (no NIC teaming) to an HP ProCurve 2510G-24 switch.

This system has been running just fine for nearly 3 years without any major issues.  Suddenly in the last couple of months an intermittent issue has come up where the server will lose all network connectivity for seemingly no reason and require a reboot.  

The server is not locked up or crashed (BSOD), mind you.  You can log in just fine from a console and do a graceful reboot.  During the outage the iLO is still responsive, and the NIC within Windows shows as online and connected.

Things I've tried/looked at:

1 - Windows Event logs - nothing is ever reported in any of the Windows logs around the time of connectivity loss, other than an occasional DNS resolution error (presumably because the network has dropped).  My RMM tool does log the loss of connectivity and status update failures, so I can pin down when the issue happens to within about 30 seconds.

2 - HP IML, the Integrated Management log on the iLO shows nothing - it logs the power event for the reboot and that's it

3 - Switch syslog, there are some excessive broadcasts on the network from a few of the clients that have chatty software installed, but no issues for the port that the server is plugged into (or the other port that I moved it to for testing)

4 - Windows is fully patched

5 - HP SUM (Smart Update Manager) has been run every month to apply all critical and recommended firmware, BIOS, and driver updates, so that is also current.

6 - I have scanned for rootkits/malware/viruses etc. numerous times using multiple tools from Sysinternals, GMER, MBAM, Sophos, and ESET, and it always comes up clean.

I want to call HP or Microsoft, but I don't even have anything to give them to start debugging.  I cannot reproduce the issue, but it has happened on 11/3, 11/8, 11/15, and 12/3.
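
As a quick sanity check on those dates, here is a minimal Python sketch that computes the gap between outages and the weekday of each one. It assumes the dates are month/day and fall in 2013 (the year of the post); neither is stated explicitly in the question.

```python
from datetime import date

# Outage dates from the post. The year 2013 and the month/day reading
# are assumptions based on the posting date, not stated facts.
outages = [date(2013, 11, 3), date(2013, 11, 8),
           date(2013, 11, 15), date(2013, 12, 3)]

def gaps_in_days(days):
    """Return the number of days between each consecutive pair of dates."""
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]

for d in outages:
    # %A prints the weekday name in the current locale
    print(d.isoformat(), d.strftime("%A"))

print("gaps (days):", gaps_in_days(outages))
```

Under those assumptions the gaps are 5, 7, and 18 days, so four data points do not show an obvious fixed period. With more samples, the same list could be compared against patch, backup, or scan schedules.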
Question by:DigiSec
6 Comments
 
LVL 56

Accepted Solution

by:
Cliff Galiher earned 250 total points
ID: 39694137
If a power cycle is the only thing that fixes it, my first guess is failing hardware. I'd probably start by disabling the NIC and adding a new NIC. You'll need to run the Fix My Network Wizard to get all the services bound to the new NIC...so as always...have a backup. But this is a relatively trivial thing to do. If you find that solves the problem, time to call HP.

Author Comment

by:DigiSec
ID: 39694149
That's a possibility.  For that matter I could switch to the unused NIC2 and rebind everything over there.  I believe they are physically separate controllers, not a shared controller with two ports.

To be fair, a reboot is the fastest easiest way that I have been able to get my client back online - talking a non technical person through logging into and restarting the server cleanly (really wish they would pop for the Advanced iLO license).

I can't reproduce it, so I haven't been onsite to try things like unplugging / disabling the NIC, or using Sysinternals to trace or even perfmon to look at current network utilization.  I suppose it is possible that something is literally locking the NIC - but I don't think it's a bottleneck issue because it is not momentary - it still requires a reboot.
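
Since the problem cannot be reproduced on demand, one option is to leave a small probe running on the server itself that writes timestamped results to a local file, so there is something concrete to hand to HP or Microsoft after the next outage. The following is a minimal sketch in Python, not a tested diagnostic tool; the target address, port, log file name, and interval are all hypothetical placeholders.

```python
import socket
import time
from datetime import datetime

def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def format_record(when, target, ok):
    """Build one log line: ISO timestamp, target, UP or DOWN."""
    state = "UP" if ok else "DOWN"
    return f"{when.isoformat(timespec='seconds')} {target} {state}"

def monitor(host, port, logfile, interval=10.0, rounds=None):
    """Probe repeatedly and append each result to a local log file.

    rounds=None runs until interrupted; a number limits the probe count.
    """
    done = 0
    while rounds is None or done < rounds:
        ok = probe(host, port)
        with open(logfile, "a", encoding="utf-8") as fh:
            fh.write(format_record(datetime.now(), f"{host}:{port}", ok) + "\n")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)

# Example call (hypothetical values: a device on the LAN with an open TCP port):
# monitor("192.168.1.1", 80, "netprobe.log", interval=10.0)
```

Because the log is written locally, it survives the loss of connectivity, and the first DOWN line gives a timestamp to line up against the switch syslog and the RMM alerts. It only shows when TCP reachability was lost, not why.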
LVL 6

Assisted Solution

by:donnk
donnk earned 250 total points
ID: 39694757
Call HP and have them diagnose the hardware.

Author Comment

by:DigiSec
ID: 39695483
Yeah, I think that's the plan for an emergency outage tonight.  Going to have HP diagnose the HW and switch over to the second NIC at the same time.  This will be next to impossible to test though since it is intermittent and not reproducible by me.

I will award partial points - both are good answers.

Expert Comment

by:BitHammer
ID: 39753742
Has HP solved the problem? I have two servers with the same issue, running two different versions of Windows Server (2003, 2008). Both Proliant DL-308, one is G3, the other G4. Because of the timing of the events, I was thinking it was due to a Microsoft update. It started after an update occurred on both machines pretty close to the same time. I suppose it could be coincidental hardware failures, but it seems unlikely. It's a huge issue as one of them is our internal DNS server and it's going down daily.

Author Comment

by:DigiSec
ID: 39755607
Interestingly enough - no.  We swapped out the motherboard to replace the NICs per HP, but got the call yesterday that the "Server was down".  I could see via the iLO that the system was up and was able to gracefully reboot it through iLO, but it was inaccessible on the network.

I'm re-opening the case with HP now.