When your Windows server goes silent

A coworker and I recently ran into a problem where one of our Windows Server 2003 machines just dropped off the network for no apparent reason. The server was up and running happily, with no visible problem, except that it was completely unresponsive on the network -- exactly as if the network cable were unplugged. The server has a pair of on-board Broadcom NICs, each connected to a separate switch in a two-switch stack and teamed for load balancing and failover.

The server had dropped off sometime the day before. We poked around in the event logs from around that time but found nothing related to networking. Of course, the patch cables were fine, and it was highly unlikely that both switches, Cisco 3750s, were faulty. Our network engineering team checked the port configuration on each switch: not only were the ports configured correctly, but the team's laptop worked fine when configured with the same IP address and connected to the same ports.

It was evidently not a network problem, so our focus turned back to the server itself. We uninstalled the NIC teaming software and configured one of the NICs with a proper IP address. No cigar. Perhaps the TCP/IP stack had somehow become corrupt? So we reset TCP/IP by executing the following command:
netsh interface ip reset C:\ipreset.log

This required a reboot, and as we stared impatiently at the POST progress, we began to grow happy that our six-hour ordeal might be over.

Wrong. After all that, we still had made absolutely no progress toward identifying the cause. Hmm... these two NICs are on-board, so they likely share a single controller. If the controller went bad, both NICs might be affected, right?

Our next move was to install a fresh new dual-port Intel NIC into an unused PCI Express slot. We disabled the on-board NICs in the BIOS setup menu and fired up the server to reconfigure teaming. Once everything was configured, we cracked open a command prompt to ping the gateway, but were met again with the same, familiar disappointment. Now what? How could this be?

At this point, we'd verified the switch stack was not misconfigured. We'd updated drivers and teaming software for the on-board Broadcom NICs. We'd removed the teaming and reverted to a single-NIC configuration. We'd reset the TCP/IP stack. We'd even installed a whole new NIC from another manufacturer! And after all that, we still had the exact same problem! We'd also started checking for simple things, like making sure the firewall wasn't enabled (but even if it were, I wouldn't expect it to cause the server to "unplug" itself from the network).

By now, there were four engineers scratching their heads. I started grasping at straws and decided to throw Service Pack 2 in there -- a desperate move, but the server needed it anyway. Meanwhile, my coworker turned to every IT professional's best friend: Google. What he searched for, I don't know, but he landed on a Microsoft Knowledge Base article, KB870910.

This article describes a problem with the IPSec MMC, so at first glance it looked unrelated to our issue. However, it was there that we would stumble upon our solution.

In the resolution section, the first step identified a registry key to be deleted:
HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\IPSec\Local

However, the IPSec\Local key didn't even exist. Okay, so we're definitely on the wrong track, right? Not so fast. We continued on to step two, which offered a command to rebuild the local policy store:
regsvr32 polstore.dll

We threw this into a command prompt, slapped the [Enter] key, and rebooted... AND EUREKA, WE HAD EMERGED VICTORIOUS!
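
For anyone who wants to script the two KB steps, here is a minimal Python sketch. It is not taken from the KB article. The names `ipsec_repair_commands`, `key_present`, and `run_repair` are hypothetical, the registry path is copied as written above, and the sketch assumes Python is available on the server, that `reg` and `regsvr32` are on the PATH, and that it runs with administrative rights.

```python
import subprocess

# Registry key from the KB article's resolution section, copied as written above.
IPSEC_LOCAL_KEY = r"HKLM\SOFTWARE\Policies\Microsoft\Windows\IPSec\Local"


def ipsec_repair_commands(key_exists):
    """Return the commands for the two KB steps, in order.

    Step 1 (deleting the key) is skipped when the key is absent,
    which was the situation on our server.
    """
    commands = []
    if key_exists:
        # /f deletes without prompting for confirmation.
        commands.append(["reg", "delete", IPSEC_LOCAL_KEY, "/f"])
    # Step 2: re-register polstore.dll (/s suppresses the dialog box).
    commands.append(["regsvr32", "/s", "polstore.dll"])
    return commands


def key_present():
    """Check for the key with `reg query`; exit code 0 means it exists."""
    result = subprocess.run(
        ["reg", "query", IPSEC_LOCAL_KEY],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_repair():
    """Run the steps on a Windows server. A reboot follows, as in our case."""
    for cmd in ipsec_repair_commands(key_present()):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_repair()` from an elevated prompt would perform both steps. The command list is built separately from the code that runs it, so the steps can be reviewed before anything touches the registry.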

So what caused the corruption? We cannot be certain. But we did learn a valuable lesson: if everything seems right, but your server acts as if the network cable is unplugged, a missing or corrupt IPSec policy might be the cause. While I'd never run into this before in my nine years of server administration, this is definitely one of those situations that I will remember.
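
That lesson suggests a quick early check of the IPSec service itself. The following is only a sketch under assumptions that do not come from our troubleshooting session: that the IPSec service on Windows Server 2003 is registered under the name `PolicyAgent`, and that `sc query` prints a `STATE` line in its usual layout. The helper names `service_state` and `query_ipsec_service` are hypothetical.

```python
import subprocess


def service_state(sc_output):
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output.

    Returns None if no usable STATE line is found.
    """
    for line in sc_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("STATE") and ":" in stripped:
            # The line looks like: "STATE              : 4  RUNNING"
            fields = stripped.split(":", 1)[1].split()
            return fields[1] if len(fields) > 1 else None
    return None


def query_ipsec_service(name="PolicyAgent"):
    """Run `sc query` for the service (Windows only) and return its state."""
    result = subprocess.run(
        ["sc", "query", name],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return service_state(result.stdout)
```

A state other than RUNNING would not prove that the policy store is corrupt, but it might be a reason to look at IPSec sooner than we did.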
1 Comment
by:eugene20022002
I didn't use this myself, but nice article. I know that "eureka" feeling when you finally get it right :-)