Solved

network problem? server drops pings, then comes back. Every 2 minutes...

Posted on 2010-08-23
Last Modified: 2012-05-10
We have two DHCP servers (both domain controllers) in our Active Directory domain. ADC1 runs Windows Server 2003 and ADC2 runs Windows Server 2008.

One of our servers (SQL) keeps dropping connections. For example: for 2 minutes it will ping properly, and shares show up when accessed via the network (\\sql). Afterwards, it drops the connection, and becomes un-pingable. After a brief period, it comes back online, and responds to ping requests.

Some workstations on the network don't have the problem. Some constantly switch between working and non-working. Rebooting the ADCs and the SQL server itself does not help.

Computers connected to either ADC1 or ADC2 have the same problem. (For example, one PC with a DHCP lease from ADC1 will reach SQL without drops in connection, while another, on the same DHCP server, has the connection dropped.)

SQL is on a static IP, and its Event Viewer does not show any network changes or any service or driver failures. No automatic updates, no expired licenses, no limit on the number of simultaneous users, no new IP address leases. Workstations also don't change IPs, even though they're on DHCP.

Any ideas on what could be causing SQL to drop and re-establish connections?

The only suspicious log entry I found on ADC1 was "The DNS server has encountered numerous run-time events. To determine the initial cause of these run-time events, examine the DNS server event log entries that precede this event...." Event ID: 3000.

Unfortunately, the log does not show anything prior to this event.
Question by:94704
 
LVL 3

Expert Comment

by:Dave_LaSalle
On the server, ping the loopback address with the -t switch and let it run for a while.
Press Ctrl+Break to get the ping/loss statistics.
If it's dropping packets, it may be your NIC.
If not, suspect DNS.
 
LVL 21

Expert Comment

by:snusgubben
If you have any anti-virus program on the SQL server, I would try uninstalling it to see if that helps (even if you have excluded the SQL program folder from real-time scanning).
 

Expert Comment

by:castercorey
Are you using a 100 Mbit switch or a gigabit switch on your network?
 

Author Comment

by:94704
The loopback pings on the server (on sql: ping sql -t) work fine. The NIC is OK.

No antivirus on the server; nothing that would refresh it every 2 minutes or so.
 
LVL 29

Accepted Solution

by:
pwindell earned 145 total points
First, a few misconceptions. Computers are not "connected to DCs". They start up, they authenticate, then nothing. You could throw the DC out in the street and the workstation wouldn't know the difference until the next time authentication was needed for something. As for DHCP, by default the client only contacts the DHCP server once at startup and once every 4 days if left running, so DHCP is irrelevant.
DNS could be one of your problems (just a big guess).
DCs should never be multi-homed.
The DNS scheme should be:
1. All machines on the LAN use only the AD/DNS servers and nothing else, ever. That includes the DCs themselves and the firewall device.
2. The external DNS (the ISP's?) should be listed as a forwarder within the configuration of the DNS service. If omitted, DNS will default to using root hints.
3. Examine the DNS zone and delete any incorrect or duplicated entries. Most entries will be added automatically, so I recommend that you do not manually add any. A fast way to make a machine re-register itself in DNS is to right-click the "connection" and tell it to do a repair. You can also do this from a command prompt:
IPCONFIG /registerdns
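For anyone following along, the DNS check in point 3 can be spot-checked from a workstation before opening the DNS console. Below is a minimal Python sketch, not something from the thread itself: it asks the local resolver for every IPv4 address behind a name and compares the answer with the one address you expect. The function names and the example address 192.168.1.10 are made up for illustration; `socket.gethostbyname_ex` is the standard-library call that does the lookup.

```python
import socket


def resolve_all(hostname):
    """Return every IPv4 address the local resolver reports for hostname."""
    try:
        _canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # socket.gaierror and socket.herror are OSError subclasses:
        # the name did not resolve at all.
        return []
    return sorted(set(addresses))


def classify(addresses, expected_ip):
    """Compare resolver output with the single address the server should have."""
    if not addresses:
        return "unresolved"
    if addresses == [expected_ip]:
        return "ok"
    # Extra or different addresses hint at stale or duplicated records.
    return "check DNS zone"


def check_name(hostname, expected_ip):
    """Hypothetical helper: resolve hostname and classify the result."""
    return classify(resolve_all(hostname), expected_ip)
```

Something like `check_name("sql", "192.168.1.10")` run on a workstation that is currently failing would show whether that workstation is being handed an unexpected address. The client-side DNS cache can mask changes, so it is probably worth running `ipconfig /flushdns` first.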
 
LVL 21

Expert Comment

by:snusgubben
On the SQL server: ping 127.0.0.1 -t

At the same time, on a host:
ping <IP to SQL> -t


Do both drop packets after, e.g., 2 minutes?
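To compare the two ping runs side by side, it helps to reduce each one to a list of outage windows. A rough Python sketch follows, assuming each ping log has already been turned into (timestamp in seconds, replied) pairs; that input format and the function names are assumptions for illustration, not something ping produces directly.

```python
def outage_windows(samples):
    """Collapse chronological (timestamp, replied) samples into outage windows.

    Returns a list of (first_failed_timestamp, last_failed_timestamp) tuples.
    """
    windows = []
    start = None
    last_fail = None
    for timestamp, replied in samples:
        if not replied:
            if start is None:
                start = timestamp  # a new outage begins here
            last_fail = timestamp
        elif start is not None:
            windows.append((start, last_fail))  # outage ended with this reply
            start = None
    if start is not None:
        # The log ended while the host was still unreachable.
        windows.append((start, last_fail))
    return windows


def gaps_between_outages(windows):
    """Seconds from the start of one outage to the start of the next."""
    starts = [start for start, _end in windows]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

If the workstation's windows recur at regular gaps while the run on the server shows none, that points away from the server itself and toward the path between the machines or name resolution. Keep in mind that 127.0.0.1 only exercises the local TCP/IP stack, so a clean loopback run says little about the NIC hardware.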
 
LVL 4

Assisted Solution

by:bhartwell
bhartwell earned 30 total points
Try restarting the DNS service on your server and see if that helps.
 
LVL 21

Expert Comment

by:snusgubben
By the way, what hardware is the SQL server running on?
 

Author Comment

by:94704
@ pwindell: I already tried ipconfig /registerdns. No errors in the Event Viewer.
@ snusgubben: yes, I tried pinging the IP, pinging the hostname, and pinging loopback 127.0.0.1. No loss, no downtime. It responds to pings constantly. Basically, I'm convinced that the NIC is fine.
@ bhartwell: both ADCs have been restarted a few times now, so the DNS service on them has been restarted as well.
 

Author Comment

by:94704
Do you think re-joining the domain under another name, rebooting, and then re-joining it again as SQL would fix the problem?
 
LVL 21

Expert Comment

by:snusgubben
You said that it became un-pingable in your initial post?!

If you have 1 Gbit network cards in the SQL server, you could try locking them to 100 Mbit just to see if it is a negotiation problem.

Outdated NIC drivers?
 

Author Comment

by:94704
@ snusgubben: the machine becomes unpingable only by other workstations in the AD. For example: workstation1 can ping SQL. 100% works. 2 minutes later, the same workstation cannot ping SQL. 2 minutes later, workstation1 is able to ping SQL. And so on...
 
LVL 21

Expert Comment

by:snusgubben
If your SQL server had a machine account problem, the DCs would have logged it.

This sounds to me like a NIC / NIC driver / "1000 Mb vs 100 Mb" problem or some sort of anti-virus (firewall) problem.

The clients cache the FQDN and get a Kerberos ticket the first time they connect to the server. I doubt it's a domain problem (but who knows :)
 
LVL 6

Assisted Solution

by:Galtar99
Galtar99 earned 75 total points
Is your SQL server plugged into a managed switch? If it is, can you look at the switch to see if it notices the link dropping? (show log) Make sure logging is set to informational.
If it isn't, do you see the link light go out? Maybe try moving it to another port on the same switch to see if that port is going bad. It may seem like an obvious thing, but update the NIC driver and ensure the server has all the latest Windows updates.
Maybe hard-code the switch and/or NIC to one speed, 1000 Mb/Full or whatever you feel comfortable with.
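Whichever speed is chosen, it matters that both ends are configured the same way. The Python sketch below is a simplified model of 10/100 Ethernet behaviour, not vendor code: when one end is forced and the other is left on auto, the auto end can usually sense the speed but falls back to half duplex, which gives a duplex mismatch. The function names and the tuple format are assumptions for illustration.

```python
def resolve_link(side_a, side_b):
    """Model the duplex each end of a link operates at.

    Each side is either "auto" or a forced (speed_mbps, duplex) tuple such as
    (100, "full"). Returns (duplex_a, duplex_b), or (None, None) when two
    forced speeds differ and no link comes up.
    """
    a_forced = side_a != "auto"
    b_forced = side_b != "auto"
    if not a_forced and not b_forced:
        # Assumes negotiation succeeds and both ends support full duplex.
        return ("full", "full")
    if a_forced and b_forced:
        if side_a[0] != side_b[0]:
            return (None, None)
        return (side_a[1], side_b[1])
    if a_forced:
        # The auto end senses speed only and defaults to half duplex.
        return (side_a[1], "half")
    return ("half", side_b[1])


def is_duplex_mismatch(side_a, side_b):
    """True when a link comes up but the two ends disagree on duplex."""
    duplex_a, duplex_b = resolve_link(side_a, side_b)
    return duplex_a is not None and duplex_a != duplex_b
```

In this model, `is_duplex_mismatch((100, "full"), "auto")` is true, which is why hard-coding only one end tends to make things worse: the link stays up, but errors pile up on the port counters.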
 

Author Comment

by:94704
@ Galtar99: The server and the machine are both set to 100FDx, so that rules out any auto-negotiation mismatches. The ProCurve (switch) log does not show any conflicts on the port or any MAC-address-related conflicts.

@ snusgubben: Windows Firewall is off/disabled. No other antivirus or firewall is present on the machine. I just updated the network card drivers (Intel) to the most recent version.

Still, most workstations on the network lose their connection to SQL every 2 minutes.

Odd: the connection with SQL (and the ability to ping it successfully) is restored sooner if the workstation1 user re-accesses the shared folder (\\sql). Nonetheless, the server drops connections and re-establishes them every 2 minutes.
 

Expert Comment

by:castercorey
Configure your NIC to match your network switch. So if your switch is a gigabit switch, then configure the NIC for full duplex at gigabit speed.
 

Author Closing Comment

by:94704
With the help of a few of you and your suggestions, I figured it out.

The problem was a combination of:

1.) Speed mismatch between the switch and SQL. Now, instead of "auto" on both ends, the server and switch are both set to 100FDx (although both support a gigabit connection).

That fixed the collisions and, partially, the DNS issue.

2.) DNS entries. Over the years, one of the ADCs had accumulated several static entries for "SQL". Deleting every entry in the DNS management console and allowing DNS to repopulate fixed the issue.

What really fixed it was adding a second NIC and re-registering it with the ADCs (DNS servers), while removing any previous references to SQL in the DNS management console.

Big thank you to all of you. I hope this information will be useful to someone with a similar problem.
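For anyone with a similar problem, the stale-entry situation in point 2 can be checked before deleting anything. Here is a minimal Python sketch, assuming the zone's host (A) records have been exported by some means into (name, address) pairs; the export step, the function name and the sample addresses are hypothetical.

```python
from collections import defaultdict


def names_with_multiple_addresses(records):
    """Group (name, ip) records and return names that map to more than one IP.

    Names are compared case-insensitively, as DNS names are.
    """
    by_name = defaultdict(set)
    for name, ip in records:
        by_name[name.lower()].add(ip)
    return {
        name: sorted(ips)
        for name, ips in by_name.items()
        if len(ips) > 1
    }


# Hypothetical export: "sql" has picked up a second, stale address.
example_records = [
    ("SQL", "192.168.1.10"),
    ("sql", "192.168.1.55"),
    ("adc1", "192.168.1.2"),
    ("sql", "192.168.1.10"),
]
conflicts = names_with_multiple_addresses(example_records)
```

A name that shows up with several addresses is a candidate for the kind of cleanup described above. A server that is deliberately multi-homed will also appear here, so the output is a list of things to look at rather than a list of things to delete.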
 
LVL 29

Expert Comment

by:pwindell
OK, well, the speed thing would have fixed transmission errors, not collisions. Switches don't have collisions because of the way they create virtual circuits. Not a big thing, but you should be aware of that.
Adding a second NIC is almost never a solution to anything unless the first NIC is removed or disabled, and even then you can have trouble if the old NIC was not set to use DHCP before it was removed. So if what you did caused the machine to become dual-homed, you may have to get ready for future problems. Unless a machine is being used as a router, firewall, or proxy, or is using NIC teaming, dual-homing is almost always a very bad thing.
