I am running ESXi 4.1 on an HP ProLiant ML350 G6 hosting three Windows Server guests. The first runs Windows Server 2008 R2 64-bit as a domain controller and DNS server. The second runs Windows 2003 and is a print, file and application server. The third, problematic server runs Windows Server 2008 R2 and hosts Exchange Server 2010, with a base build exactly the same as the Windows 2008 domain controller. Intermittently, once a day or more, its network connection drops and only a server restart restores it. The Exchange server has sufficient hardware resources and, other than the network disconnections, runs normally with acceptable performance. The only difference between the two Windows 2008 servers is that one is a domain controller and the other hosts Exchange.
After a restart the server behaves normally for anywhere from 3 to 16 hours, during which all mail sending and receiving through POP3, IMAP4 or OWA works without issue. Then the network randomly disconnects, with the following test results:
• Network and Sharing shows complete network disconnect – no LAN or WAN access
• ipconfig returns the expected network configuration settings
• From Exchange server cannot ping DNS, Gateway or any other LAN IP or hostname
• No response when I ping Exchange server from another computer on the LAN
• Loopback or ping of Exchange IP does return a response (network card is active and responding)
• Network troubleshoot/repair does not resolve the problem
• Disable and enable NIC does not repair the problem.
• Restarting Exchange and Network service does not resolve problem
• Timing for disconnect Event ID 1014:
Name resolution for the name dns.msftncsi.com timed out after none of the configured DNS servers responded.
The Exchange 2010 server is configured as one Organization hosting Mailbox, Client Access and Hub Transport. It has only one mailbox database. Client Access includes OWA, POP3, IMAP4 and Offline Address Book and two receive connectors for client and OWA.
All three servers described above have identically configured virtual hardware, including an E1000 NIC with one assigned IP. Network configurations are identical, including one IP, subnet, gateway and DNS, as are all the NIC driver settings.
The network connections of the other two Windows servers are fully reliable and have been stable since they went into production, and they are hosted off the same physical NIC.
Some things I have tried in an attempt to resolve the problem:
• Uninstalled and re-installed NIC driver through Windows
• Confirmed same driver version as other known good Windows 2008 (Microsoft 126.96.36.199)
• Uninstalled and re-installed E1000 VMware hardware NICs
• Uninstalled E1000 NIC and tried using VMXNET 2 (Enhanced) and VMXNET3
• Confirmed VMware Tools are up to date and the service is running
• Reinstalled VMware Tools
• Set Windows Power Options to Performance
• Set the Windows Power Options PCI Express option to Off
• Disabled “Allow the computer to turn off this device to save power” in the NIC driver
• Fully disabled IPV6 using Microsoft tool
• Checked Group Policies to confirm no network or power settings are being forced
• Fully disabled Windows Firewall including service
• Updated Kaspersky from Windows Server 188.8.131.524 to Windows Server Enterprise 184.108.40.2069
• Configured Kaspersky with recommended Exclusion Rules for Microsoft and Kaspersky.
• Kaspersky Trusted Processes list is empty?
• Windows logs around disconnect times Event Viewer > System
Event ID 1014
Name resolution for the name dns.msftncsi.com timed out after none of the configured DNS servers responded.
• No Kaspersky logs
• Virus scan report indicates 100% clean
All symptoms indicate this is an issue local to the Exchange guest and not related to actual network connectivity, since the other guest VMs with the same configuration do not have this problem. Hardware resources are sufficient. The only pattern I can see is that it happens daily, normally between 5:00 and 11:00 am, so I suspect some background process, such as power options or possibly Kaspersky, triggers the disconnect.
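To pin down the exact time of each drop and compare it against scheduled tasks or Kaspersky activity, a small watcher could be left running on the Exchange guest. The following is only a rough sketch and assumes Python is available on the server. The function names, the log file name and the 30-second interval are made up for illustration. One known caveat is that Windows ping can exit with code 0 on a "Destination host unreachable" reply, so the result should be treated as a hint rather than proof.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Return True if a single ping to `host` exits successfully."""
    # Windows ping takes -n (count) and -w (timeout in milliseconds);
    # Linux ping takes -c (count) and -W (timeout in seconds).
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def transitions(samples):
    """Reduce (timestamp, reachable) pairs to the points where the state changes."""
    changes = []
    previous = None
    for stamp, reachable in samples:
        if reachable != previous:
            changes.append((stamp, reachable))
            previous = reachable
    return changes


def monitor(host, log_path, interval_s=30):
    """Loop forever, appending a line to log_path whenever reachability changes."""
    previous = None
    while True:
        reachable = ping_once(host)
        if reachable != previous:
            state = "UP" if reachable else "DOWN"
            with open(log_path, "a") as log:
                log.write(f"{datetime.now().isoformat()} {host} {state}\n")
            previous = reachable
        time.sleep(interval_s)
```

Calling `monitor("<gateway IP>", "netwatch.log")` with the real gateway address would write one line per state change, so the first DOWN entry after each restart gives the moment the connection was lost.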
Let me know if you have any suggestions or need more information.