Solved

Windows Server 2008 R2 Network Disconnect Intermittently

Posted on 2011-07-06
Last Modified: 2012-05-11
I am running ESXi 4.1 on an HP ProLiant ML350 G6 hosting three virtual Windows Server guests. The first runs Windows Server 2008 R2 64-bit as a domain controller and DNS server. The second runs Windows Server 2003 as a print server, file server and application server. The third, problematic server runs Windows Server 2008 R2 hosting Exchange Server 2010, with a base build exactly the same as the Windows 2008 domain controller. Intermittently, once or more a day, the network connection drops and only a server restart restores it. The Exchange server has sufficient hardware resources and, other than the network disconnections, runs normally with acceptable performance. The only difference between the two Windows 2008 servers is that one is a domain controller and the other is not and hosts Exchange.

After a restart the server behaves normally for anywhere from 3 to 16 hours, during which all mail sending and receiving through POP3, IMAP4 and OWA works without issue. Then the network randomly disconnects, with the following test results:

•      Network and Sharing Center shows a complete network disconnect: no LAN or WAN access
•      ipconfig returns the expected network configuration settings
•      From the Exchange server I cannot ping the DNS server, the gateway or any other LAN IP or hostname
•      No response when I ping the Exchange server from another computer on the LAN
•      Pinging the loopback address or the Exchange server's own IP does return a response (the network card is active and responding)
•      Network troubleshooting/repair does not resolve the problem
•      Disabling and enabling the NIC does not repair the problem
•      Restarting Exchange and the network services does not resolve the problem
•      Event ID 1014 is logged at the time of the disconnect:
        Name resolution for the name dns.msftncsi.com timed out after none of the configured DNS servers responded.
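The tests above work bottom-up: each probe only means something if the ones below it pass, so the first failing probe points at the layer to investigate. A minimal Python sketch of that reasoning; the probe names and layer descriptions are illustrative assumptions, not output from any tool:

```python
from typing import Optional

# Probes ordered bottom-up; each layer depends on the ones before it.
# Names and descriptions are illustrative assumptions.
CHECKS = [
    ("loopback", "the TCP/IP stack on the server itself"),
    ("own_ip", "the adapter's IP binding"),
    ("lan_peer", "the local link (vNIC, vSwitch, uplink NIC, cable, switch port)"),
    ("gateway", "the default gateway / router"),
    ("dns_lookup", "DNS name resolution"),
]


def first_failure(results: dict) -> Optional[str]:
    """Return the name of the lowest failing probe, or None if all passed.

    A probe missing from `results` is treated as failed.
    """
    for name, _layer in CHECKS:
        if not results.get(name, False):
            return name
    return None


def describe(results: dict) -> str:
    failed = first_failure(results)
    if failed is None:
        return "all probes passed"
    return f"first failure at '{failed}': suspect {dict(CHECKS)[failed]}"


# Results as reported in the question: local pings work, nothing beyond the NIC answers.
observed = {"loopback": True, "own_ip": True, "lan_peer": False,
            "gateway": False, "dns_lookup": False}
print(describe(observed))
```

With the symptoms in the question the first failure lands on the local link, which is consistent with Event ID 1014 being a consequence of the disconnect (the DNS server is unreachable) rather than its cause.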

The Exchange 2010 server is configured as a single organization hosting the Mailbox, Client Access and Hub Transport roles. It has only one mailbox database. Client Access includes OWA, POP3, IMAP4 and the Offline Address Book, plus two receive connectors for client and OWA.

All three servers described above have identically configured virtual hardware, including E1000 NICs with one assigned IP each. Their network configurations are identical (one IP, subnet mask, gateway and DNS server), as are the NIC driver settings themselves.

The network connections on the other two Windows servers are fully reliable, have been stable since they went into production, and are hosted off the same physical NIC.

Some things I have tried in attempt to resolve the problem:

•      Uninstalled and reinstalled the NIC driver through Windows
•      Confirmed the same driver version as the other, known-good Windows 2008 server (Microsoft 8.4.1.0)
•      Uninstalled and reinstalled the E1000 VMware virtual NICs
•      Removed the E1000 NIC and tried VMXNET 2 (Enhanced) and VMXNET3 instead
•      Confirmed VMware Tools are up to date and the service is running
•      Reinstalled VMware Tools
•      Set Windows Power Options to Performance
•      Set the Windows Power Options PCI Express setting to Off
•      In the NIC driver properties, disabled “Allow the computer to turn off this device to save power”
•      Fully disabled IPv6 using the Microsoft tool
•      Checked Group Policies to confirm no network or power settings are being forced
•      Fully disabled Windows Firewall, including the service
•      Updated Kaspersky for Windows Server 6.0.4.1424 to Kaspersky for Windows Server Enterprise 8.0.0.599
•      Configured Kaspersky with the recommended exclusion rules for Microsoft and Kaspersky
•      The Kaspersky Trusted Processes list is empty (is that expected?)
•      Checked the Windows logs around the disconnect times (Event Viewer > System), which show Event ID 1014:
Name resolution for the name dns.msftncsi.com timed out after none of the configured DNS servers responded.
•      No Kaspersky logs
•      Virus scan report indicates 100% clean

All symptoms indicate this is a local issue and not related to actual network connectivity, since the other guest VMs with the same configuration do not have this problem. Hardware resources are sufficient. There is no discernible pattern other than that it happens daily, normally between 5:00 and 11:00 am, so I suspect some background process, such as power management or possibly Kaspersky, triggers the disconnect.
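One way to firm up the "daily, usually in the morning" observation is to collect the time of every Event ID 1014 (for example by filtering the System log in Event Viewer) and find the narrowest hour range that contains most of them. A small Python sketch; the timestamps below are hypothetical, not taken from this server:

```python
from collections import Counter
from datetime import datetime


def disconnect_hours(timestamps: list) -> Counter:
    """Count disconnects per hour of day from ISO-8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def narrowest_window(counts: Counter, share: float = 0.8) -> tuple:
    """Return the narrowest inclusive (start_hour, end_hour) range holding
    at least `share` of all events. Does not wrap past midnight."""
    total = sum(counts.values())
    best = (0, 23)
    for start in range(24):
        running = 0
        for end in range(start, 24):
            running += counts[end]  # Counter returns 0 for missing hours
            if running >= share * total:
                if end - start < best[1] - best[0]:
                    best = (start, end)
                break
    return best


# Hypothetical Event ID 1014 times, for illustration only.
events = [
    "2011-07-01T08:15:00",
    "2011-07-02T09:40:00",
    "2011-07-03T05:30:00",
    "2011-07-04T10:05:00",
    "2011-07-05T08:55:00",
]
counts = disconnect_hours(events)
print(narrowest_window(counts))       # (8, 10): 4 of 5 events
print(narrowest_window(counts, 1.0))  # (5, 10): all events
```

A tight window that lines up with something else on the network (staff arriving, devices waking up, scheduled tasks) is a stronger lead than "daily".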

Let me know if you have any suggestions or need more information.

Thanks,
Mike
Question by:AutomationOne

    Expert Comment

    by:cbielich
    Let's try some basic network troubleshooting.

    I am assuming you are using a dedicated IP for the server. Try disabling the NIC or changing the IP address and see whether you can still ping the address at that time. Maybe someone else has the same IP address assigned and you are getting conflicts when that device boots up or comes online.

    Are you running full or half duplex? Try changing those settings and see what happens.
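Duplex matters because a mismatch only appears when the two ends of a link disagree. The commonly cited rule: if one end is forced (for example 100/full) and the other auto-negotiates, the auto end can detect the speed (parallel detection) but receives no duplex information, so it falls back to half duplex. The simplified model below encodes that rule; it ignores details such as gigabit requiring auto-negotiation, and `common_max` plus the example port settings are assumptions:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Result = Tuple[int, str]  # (speed in Mb/s, duplex)


@dataclass
class PortConfig:
    autoneg: bool
    speed: Optional[int] = None   # only used when autoneg is False
    duplex: Optional[str] = None  # "full" or "half", only when autoneg is False


def link_result(a: PortConfig, b: PortConfig,
                common_max: int = 100) -> Tuple[Result, Result]:
    """Predict what each end ends up running (simplified model)."""
    if a.autoneg and b.autoneg:
        # Both advertise; assume both support `common_max` at full duplex.
        return (common_max, "full"), (common_max, "full")
    if not a.autoneg and not b.autoneg:
        # Nothing is negotiated; each end runs what it was told.
        return (a.speed, a.duplex), (b.speed, b.duplex)
    forced = a if not a.autoneg else b
    # Parallel detection: the auto end senses the forced end's speed but
    # gets no duplex information, so it falls back to half duplex.
    forced_side = (forced.speed, forced.duplex)
    auto_side = (forced.speed, "half")
    return (forced_side, auto_side) if forced is a else (auto_side, forced_side)


def is_mismatch(a: PortConfig, b: PortConfig) -> bool:
    end_a, end_b = link_result(a, b)
    return end_a != end_b


forced_full = PortConfig(autoneg=False, speed=100, duplex="full")
auto = PortConfig(autoneg=True)
print(link_result(forced_full, auto))          # ((100, 'full'), (100, 'half'))
print(is_mismatch(auto, auto))                 # False
print(is_mismatch(forced_full, forced_full))   # False
```

In practice this means both ends of each link should match: either both auto-negotiate, or both are forced to the same speed and duplex.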

    Author Comment

    by:AutomationOne
    Thanks for your response.

    Sorry, I forgot to mention that in my original post. After a disconnect I pinged the server's configured IP and got no response. It is not an IP conflict. I will double-check the next time it disconnects.

    I have duplex set to auto-negotiate, the same as on the other servers, which are fine. Could it still make a difference?

    Expert Comment

    by:vvzar
    Please check whether there are any other services on the problematic server. Maybe one of them is configured incorrectly?

    This may also be a routing issue.

    Please post the output of the route print command, both when everything is OK and when the connection problem occurs.
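Comparing two `route print` captures by eye is error-prone; a small diff of the IPv4 "Active Routes" table makes a missing or changed route obvious. A Python sketch, assuming the standard Windows column layout (Network Destination, Netmask, Gateway, Interface, Metric); the sample tables are made up to show a missing default route and are not from this server:

```python
def active_ipv4_routes(route_print: str) -> set:
    """Extract (destination, netmask, gateway, interface, metric) rows
    from the IPv4 "Active Routes" table of Windows `route print` output."""
    routes = set()
    in_table = False
    for line in route_print.splitlines():
        stripped = line.strip()
        if stripped.startswith("Network Destination"):
            in_table = True  # column header row of the IPv4 table
            continue
        if in_table:
            if stripped.startswith("="):
                break  # separator line ends the Active Routes table
            fields = stripped.split()
            if len(fields) == 5:
                routes.add(tuple(fields))
    return routes


def diff_routes(connected: str, disconnected: str) -> tuple:
    """Return (routes missing when disconnected, routes that appeared)."""
    before = active_ipv4_routes(connected)
    after = active_ipv4_routes(disconnected)
    return before - after, after - before


# Made-up captures for illustration.
CONNECTED = """\
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      192.168.1.1    192.168.1.10    266
      192.168.1.0    255.255.255.0         On-link     192.168.1.10    266
===========================================================================
"""
DISCONNECTED = "\n".join(
    line for line in CONNECTED.splitlines()
    if not line.strip().startswith("0.0.0.0")
)

missing, new = diff_routes(CONNECTED, DISCONNECTED)
print(missing)  # {('0.0.0.0', '0.0.0.0', '192.168.1.1', '192.168.1.10', '266')}
print(new)      # set()
```

A route whose metric changed between captures shows up in both sets, which is itself a useful hint.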

    Author Comment

    by:AutomationOne
    The only services that are running are for Exchange as previously mentioned.

    Since Kaspersky can be quite aggressive with perceived threats, I thought it might be disconnecting the network as an intrusion-detection measure, but after installing Kaspersky AV and Agent the network still dropped.

    Attached are the route print outputs for the connected and disconnected states: AOEX01-Route-Command-Connected.txt, AOEX01-Route-Command-Disconnecte.txt

    Author Comment

    by:AutomationOne
    Also, when the network disconnects I found that disabling and re-enabling the Local Area Connection re-establishes the connection. So, contrary to what I said earlier, a restart is not necessary to resolve the problem.

    Expert Comment

    by:vvzar
    The routes seem to be all OK.
    So disabling the LAN connection fixes it, hmm...
    That sounds like a software network loop, an ARP table override, or an IP/MAC address conflict.

    When the issue happens, try entering the following in a command shell: netsh interface ip delete arpcache
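Clearing the ARP cache at the moment of failure is also a diagnostic: if connectivity comes back, a stale or overwritten ARP entry is likely; if not, the fault is probably lower down. A minimal Python sketch of a check-then-repair step, assuming Windows `ping` and `netsh` on the PATH and an elevated prompt; the target address is hypothetical, and the probe and repair steps are injectable so the logic can be exercised without touching the network:

```python
import subprocess
from datetime import datetime
from typing import Callable


def ping_once(host: str) -> bool:
    """Send one Windows ping (-n 1) with a 1000 ms timeout (-w 1000).

    Windows ping can exit with 0 on "Destination host unreachable",
    so also require an echo reply line containing "TTL=".
    """
    result = subprocess.run(["ping", "-n", "1", "-w", "1000", host],
                            capture_output=True, text=True)
    return result.returncode == 0 and "TTL=" in result.stdout


def flush_arp_cache() -> None:
    """Clear the ARP cache (needs an elevated prompt)."""
    subprocess.run(["netsh", "interface", "ip", "delete", "arpcache"], check=True)


def check_and_repair(target: str,
                     probe: Callable[[str], bool] = ping_once,
                     repair: Callable[[], None] = flush_arp_cache,
                     log: Callable[[str], None] = print) -> str:
    """Probe `target`; if unreachable, run `repair` once and probe again.

    Returns "ok", "recovered" or "down".
    """
    if probe(target):
        return "ok"
    stamp = datetime.now().isoformat(timespec="seconds")
    log(f"{stamp} {target} unreachable, running repair")
    repair()
    if probe(target):
        log(f"{stamp} {target} reachable again after repair")
        return "recovered"
    log(f"{stamp} {target} still unreachable after repair")
    return "down"


if __name__ == "__main__":
    check_and_repair("192.168.1.1")  # hypothetical gateway address
```

Run periodically (for example from Task Scheduler), the log of "recovered" versus "down" shows whether clearing ARP alone is enough. To test the disable/enable workaround instead, `repair` could run `netsh interface set interface name="Local Area Connection" admin=disabled` followed by `admin=enabled`.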

    Author Comment

    by:AutomationOne
    Since it is a VM, I am going to try changing the MAC to rule that out.

    I am fairly confident it is not an IP conflict, because when it occurs I disconnect the server from the network and ping the IP with no response. It is possible, though, that another networked device has ping responses disabled, so there would be no reply.

    Just to clarify: the next time this occurs, you want me to clear the ARP cache on the server by running "netsh interface ip delete arpcache"?

    If that doesn't resolve the problem I will try changing the IP, but obviously that means a bit of work on DNS and the firewall.

    Thanks again for your help.

    Author Comment

    by:AutomationOne
    Changing MAC did not resolve the problem.

    There does seem to be a pattern emerging: it disconnects in the morning between 8:00 am and 10:00 am. So a device connecting to or waking up on the network might be causing the problem. I would have thought the live IP would not be affected and that the device connecting with the same IP would be the one impacted. In fact, I tried to recreate the problem by configuring a workstation with the same IP and restarting it, and found that the workstation did not connect while the server's connection remained live.
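A device that ignores ping generally still has to answer ARP, so ARP is a better way than ICMP to look for a duplicate IP. A sketch, assuming `arp -a` output is captured periodically on another LAN machine during the 8:00 to 10:00 am window (for example right after pinging the server's IP, to populate the cache) and fed to the functions below; an IP that maps to different MACs across snapshots suggests two devices claiming it. All addresses are made up:

```python
import re
from collections import defaultdict

# One data row of Windows `arp -a` output, e.g.
#   192.168.1.1           00-11-22-33-44-55     dynamic
ARP_ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})\s+\w+\s*$"
)


def parse_arp(output: str) -> dict:
    """Map IP -> MAC (lower-case) for every data row in one `arp -a` dump."""
    table = {}
    for line in output.splitlines():
        match = ARP_ROW.match(line)
        if match:
            table[match.group(1)] = match.group(2).lower()
    return table


def conflicting_ips(snapshots: list) -> dict:
    """Return {ip: {macs}} for every IP seen with more than one MAC."""
    seen = defaultdict(set)
    for snapshot in snapshots:
        for ip, mac in parse_arp(snapshot).items():
            seen[ip].add(mac)
    return {ip: macs for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical snapshots taken half an hour apart.
SNAP_0800 = """
Interface: 192.168.1.20 --- 0xb
  Internet Address      Physical Address      Type
  192.168.1.1           00-11-22-33-44-55     dynamic
  192.168.1.10          00-50-56-aa-bb-cc     dynamic
"""
SNAP_0830 = SNAP_0800.replace("00-50-56-aa-bb-cc", "00-1b-21-12-34-56")

print(conflicting_ips([SNAP_0800, SNAP_0830]))
```

Snapshots from a single moment cannot reveal a conflict, since an ARP table holds one MAC per IP; the signal is the entry changing over time.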

    Expert Comment

    by:cbielich
    What kind of switch are you connected to?

    Model?

    Author Comment

    by:AutomationOne
    D-Link DES-1024R

    The thing is, the other VMs on the same virtual and physical switch are okay.

    To rule out the physical NIC and switch port I am connecting the second physical NIC on the server. Then I'm going to create a second virtual switch for the card and route the problematic server through there.

    Expert Comment

    by:cbielich
    Yes, but your VM has its own MAC address; something could be going wrong in the ARP table somewhere.

    Expert Comment

    by:cbielich
    Did you clone your VM from a physical server that is now being used on the same network?

    Author Comment

    by:AutomationOne
    Yes, I may have used the VMware OVF template to create the VM.

    The VM MACs are definitely unique. In fact the problem still occurs after I have added a new virtual NIC.

    Thanks,

    Mike

    Expert Comment

    by:ArneLovius
    As you have done extensive troubleshooting that has not brought anything obvious to light, I would be very tempted to spin up a new VM, install 2k8 R2 and Exchange 2010 on it, and see if you have the same problem.

    When installing 2k8, I would suggest doing the install "manually" from a mounted ISO, not using a template.

    If you do not have the same problem, I would suggest moving connectors, mailboxes, etc. onto the "new" Exchange server. Once you have moved all of the "working" parts of Exchange, you can decide whether to explore further or just uninstall Exchange and take the VM off the domain.

    Cheers

    Author Comment

    by:AutomationOne
    I made two changes on July 8 and returned to work today, July 13, to find it has been live ever since.

    I enabled the second physical NIC and configured load balancing, connecting both NICs directly to the SonicWALL's LAN ports.

    I configured both physical NICs to 100 Mb/s full duplex in ESXi. In Windows Server the NICs are set to auto-negotiate.

    Since I am only running 3 servers in ESXi with moderate network utilization, I have a hard time believing load balancing was the solution. I suspect the original NIC that has been providing network connectivity has an issue handling the three VMs: possibly a hardware defect, or it may need a firmware upgrade.

    At this time the problem is resolved although I am not 100% sure what fixed it. I will continue to investigate and report the results.

    Thanks,

    Mike

    Accepted Solution

    by:
    Configured the network adapters' speed/duplex to 100 Mb/s full in ESXi:
    Inventory > Host > Configuration > Networking > Properties > Network Adapters

    Problem resolved ever since.
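For reference, ESX/ESXi 4.x exposes the same setting on the command line through `esxcfg-nics` (from Tech Support Mode, or `vicfg-nics` in the vSphere CLI): `-s` sets the speed, `-d` the duplex, and `-a` returns the NIC to auto-negotiation. The Python helper below only builds and validates the argument list; the vmnic names are hypothetical, requiring speed and duplex together is an assumption that mirrors the usual `-s ... -d ...` usage, and the NIC names should be checked against the host's own `esxcfg-nics -l` output before use:

```python
from typing import Optional

VALID_SPEEDS = {10, 100, 1000, 10000}
VALID_DUPLEX = {"half", "full"}


def esxcfg_nics_args(vmnic: str, speed: Optional[int] = None,
                     duplex: Optional[str] = None) -> list:
    """Build an esxcfg-nics argument list for one physical NIC.

    With no speed/duplex, return the auto-negotiation form (-a);
    otherwise this sketch requires both values.
    """
    if speed is None and duplex is None:
        return ["esxcfg-nics", "-a", vmnic]
    if speed not in VALID_SPEEDS or duplex not in VALID_DUPLEX:
        raise ValueError(f"unsupported speed/duplex: {speed}/{duplex}")
    if speed >= 1000 and duplex == "half":
        # Gigabit and faster links are run full duplex in practice.
        raise ValueError("half duplex is not supported at 1000 Mb/s and above")
    return ["esxcfg-nics", "-s", str(speed), "-d", duplex, vmnic]


print(" ".join(esxcfg_nics_args("vmnic0", 100, "full")))
print(" ".join(esxcfg_nics_args("vmnic1")))
```

Whatever is chosen on the host side, the switch or firewall port at the other end needs a matching setting.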

    Author Closing Comment

    by:AutomationOne
    The solution was identified in the first response, which suggested checking the link speed and duplex settings. The settings were confirmed in Windows Server 2008, but until I investigated further I was not aware of the ESXi host settings for speed and duplex.