Solved

network problem? server drops pings, then comes back. Every 2 minutes...

Posted on 2010-08-23
Last Modified: 2012-05-10
We have two DHCP servers (Domain controllers) on our active directory. ADC1 is a Windows 2003 machine, ADC2 is Windows 2008.

One of our servers (SQL) keeps dropping connections. For example: for 2 minutes it will ping properly, and shares show up when accessed via the network (\\sql). Afterwards, it drops the connection, and becomes un-pingable. After a brief period, it comes back online, and responds to ping requests.

Some workstations on the network don't have the problem. Some constantly switch between working and non-working. Reboot of ADC's and the SQL itself does not help.

Computers connected to either ADC1 or ADC2 have the same problem. (For example, one PC with a DHCP lease from ADC1 will reach SQL without drops in connection, while another, on the same DHCP server, has the connection dropped.)

SQL is on a static IP; its Event Viewer does not show any network changes or any service or driver failures. No automatic updates, no expired licenses, no limit on the number of simultaneous users, no new IP address leases. Workstations also don't change IPs, even though they're on DHCP.

Any ideas on what could be causing SQL to drop and re-establish connections?

The only suspicious log I found on ADC1 was "The DNS server has encountered numerous run-time events. To determine the initial cause of these run-time events, examine the DNS server event log entries that precede this event...." Event ID: 3000.

Unfortunately, the log does not show anything prior to this event.
Question by:94704
18 Comments
 
LVL 3

Expert Comment

by:Dave_LaSalle
ID: 33505860
On the server ping loopback with the -t switch
let it run for a while
use ctrl + Break keys to get stats of pings/loss
if it's dropping it may be your nic
if not then suspect dns
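
A minimal sketch of how the loss figures from a long ping run could be pulled out for comparison, assuming the output of the English-language Windows ping (its "Packets: Sent = ..., Received = ..., Lost = ..." summary line). The function names are illustrative and not part of any tool mentioned in this thread.

```python
import re

# Summary line printed by the English-language Windows ping, for example:
#   Packets: Sent = 120, Received = 118, Lost = 2 (1% loss),
_SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def parse_ping_summary(output):
    """Return (sent, received, lost) from ping output, or None if no summary line is present."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    sent, received, lost = (int(group) for group in match.groups())
    return sent, received, lost


def loss_percent(sent, lost):
    """Percentage of packets lost; 0.0 when nothing was sent."""
    return 100.0 * lost / sent if sent else 0.0
```

Saving the ping output to a file on the server and on a workstation and running both through parse_ping_summary makes it easy to see whether only the remote side is losing packets.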
 
LVL 21

Expert Comment

by:snusgubben
ID: 33505873
If you have any sort of anti-virus programs on the SQL server I would try to uninstall it and see if that helps (even if you have marked the SQL program folder to not be online scanned).
 

Expert Comment

by:castercorey
ID: 33505887
Are you using a megabit switch or a gigabit switch on your network?
 

Author Comment

by:94704
ID: 33505908
The loopback pings on the server (on sql: ping sql -t) work fine. NIC is ok.

No antivirus on the server; nothing that would refresh it every 2 minutes or so.
 
LVL 29

Accepted Solution

by:
pwindell earned 145 total points
ID: 33505935
First,...a few misconceptions.  Computers are not "connected to DCs".    They start up, they authenticate,...then nothing...you can throw the DC out in the street and the workstation wouldn't know the difference until the next time authentication was needed for something.    DHCP - well by default the Client only connects to DHCP once at start up and once every 4 days if left running,...so DHCP is irrelevant.
DNS could be one of your problems (just a big guess).  
DC's should never be multi-homed.
DNS Scheme should be:
1. All machines on the LAN use only the AD/DNS and nothing else,..ever.  That includes the DC's themselves and the Firewall Device
2. The external DNS (the ISP's?) should be listed as a Forwarder within the config of the DNS Services.  If omitted the DNS will default to using Root hints.
3. Examine the DNS Zone and delete any incorrect, duplicated, or wrong entries.  Most entries will be added automatically so I recommend that you do not manually add any.  A fast way to make a machine re-register itself in DNS is to right-click on the "connection" and tell it to do a repair.  You can also do this from a command prompt:
IPCONFIG /registerdns
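
A minimal sketch of the zone check in step 3, assuming the host records have already been exported from the DNS zone as (hostname, IP) pairs. The function name and the addresses in the usage note are made up for illustration. A name that maps to more than one address can be handed out in rotation, which may make a server look reachable from some clients and not from others.

```python
from collections import defaultdict


def find_conflicting_hosts(records):
    """Return {hostname: sorted IPs} for every name that maps to more than one address.

    records is an iterable of (hostname, ip) pairs taken from a zone export.
    Hostnames are compared case-insensitively, as DNS names are.
    """
    by_name = defaultdict(set)
    for name, ip in records:
        by_name[name.strip().lower()].add(ip.strip())
    return {name: sorted(ips) for name, ips in by_name.items() if len(ips) > 1}
```

For example, find_conflicting_hosts([("SQL", "192.168.1.10"), ("sql", "192.168.1.25"), ("adc1", "192.168.1.2")]) flags only "sql", with both addresses listed, so the stale record can be deleted by hand.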
 
LVL 21

Expert Comment

by:snusgubben
ID: 33505954
On SQL server: ping 127.0.0.1 -t

at the same time:

On a host:
ping <IP to SQL> -t


Do both drop packets after, e.g., 2 minutes?
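
To compare the two runs, it can help to record each ping as a timestamped success or failure and then extract the outage windows from each log. A minimal sketch, assuming the samples have already been collected as (seconds, replied) pairs in time order; the function name and data layout are illustrative only.

```python
def outage_windows(samples):
    """Return a list of (start, end) windows during which pings failed.

    samples is a time-ordered list of (timestamp_seconds, replied) pairs.
    start is the timestamp of the first failed ping in a run; end is the
    timestamp of the first successful ping after it, or the last sample's
    timestamp if the host never comes back.
    """
    windows = []
    start = None
    for timestamp, replied in samples:
        if not replied and start is None:
            start = timestamp  # a new run of failures begins
        elif replied and start is not None:
            windows.append((start, timestamp))  # host answered again
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows
```

If the loopback log yields no windows while the workstation log shows one roughly every two minutes, the fault is more likely on the path or in name resolution than in the server's own network stack.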
 
LVL 4

Assisted Solution

by:bhartwell
bhartwell earned 30 total points
ID: 33505961
Try restarting the DNS service on your server and see if that helps.
 
LVL 21

Expert Comment

by:snusgubben
ID: 33505984
btw which HW is the SQL on?
 

Author Comment

by:94704
ID: 33506037
@ pwindell: I already tried ipconfig /registerdns. No errors in the Event Viewer.
@ snusgubben: yes, I tried pinging IP, pinging the hostname, and pinging loopback 127.0.0.1. No loss, no downtime. Responds to pings constantly. Basically, I'm convinced that NIC is fine.
@ bhartwell: both ADCs have been restarted a few times now, therefore DNS service on them has been restarted as well.
 

Author Comment

by:94704
ID: 33506051
Do you think re-joining the domain under another name, rebooting, after which re-joining it again as SQL would fix the problem?
 
LVL 21

Expert Comment

by:snusgubben
ID: 33506078
You said that it became un-pingable in your initial post?!

If you have 1 Gbit network cards on the SQL, you could try to lock it to 100 Mbit just to see if it is a negotiation problem.

Outdated NIC drivers?
 

Author Comment

by:94704
ID: 33506105
@ snusgubben: the machine becomes unpingable only by other workstations in the AD. For example: workstation1 can ping SQL. 100% works. 2 minutes later, the same workstation cannot ping SQL. 2 minutes later, workstation1 is able to ping SQL. And so on...
 
LVL 21

Expert Comment

by:snusgubben
ID: 33506210
If your SQL had a machine account problem the DC's would have logged it.

This sounds to me like a NIC / NIC driver / "1000 Mb vs 100 Mb problem" or some sort of anti-virus (firewall) problem.

The clients cache the FQDN and get a Kerberos ticket the first time they connect to the server. I doubt it's a domain problem (but who knows :)
 
LVL 6

Assisted Solution

by:Galtar99
Galtar99 earned 75 total points
ID: 33506523
Is your SQL server plugged into a managed switch?  If it is, can you look at the switch to see if it notices the link dropping?  (show log)  Make sure logging is set to informational.
If it isn't, do you see the link light go out?  Maybe try switching it to another port on the same switch to see if that port is going bad.  It may seem like an obvious thing, but update the NIC driver and ensure the server has all the latest windows updates.
Maybe hard code the switch and/or NIC to one speed, 1000 Mb/Full or whatever you feel comfortable with.
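
A minimal sketch of why both ends need the same setting, assuming a simplified model of 10/100 Ethernet negotiation: a port left on "auto" that faces a hard-coded peer can detect the speed but not the duplex, so it falls back to half duplex. The function names are illustrative, and the model ignores gigabit links, which rely on auto-negotiation.

```python
def resolved_duplex(local, peer):
    """Duplex the local port ends up using; each setting is 'auto', 'full' or 'half'."""
    if local != "auto":
        return local  # a hard-coded port uses exactly what it was told
    if peer == "auto":
        return "full"  # both sides negotiate; assumes both support full duplex
    return "half"  # auto side cannot learn a forced peer's duplex, so it falls back


def has_duplex_mismatch(side_a, side_b):
    """True when the two ends of the link settle on different duplex modes."""
    return resolved_duplex(side_a, side_b) != resolved_duplex(side_b, side_a)
```

Under this model, has_duplex_mismatch("auto", "full") is True, while "auto" on both ends or "full" on both ends gives False, which is why hard-coding only one side tends to make things worse.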
 

Author Comment

by:94704
ID: 33506972
@ Galtar99: The server and the machine are both set to 100FDx, so that rules out any auto-config mismatches. Procurve (switch) log does not show any conflicts on the port or mac-address related conflicts.

@ snusgubben: Windows firewall is off/disabled. No other antivirus or firewall is present on the machine. Just updated the network card drivers (intel) to the most recent.

Still, most workstations on the network lose connection to SQL every 2 minutes.

Odd: the connection with SQL (and ability to successfully ping) gets restored sooner, if the workstation1 user re-accesses the shared folder (\\sql). Nonetheless, the server drops connections and re-establishes them every 2 minutes.
 

Expert Comment

by:castercorey
ID: 33507071
Configure your NIC to match your network switch. So if your switch is a gigabit switch, then configure the NIC to full duplex at gigabit rate.
 

Author Closing Comment

by:94704
ID: 33507933
With the help of a few of you and your suggestions, I figured it out.

The problem was a combination of:

1.) Speed mismatch between the switch and SQL. Now, instead of "auto" on both ends, the server and switch are set to 100FDx (although both support a gigabit connection).

That fixed collisions and partially the dns issue.

2.) DNS entries. Over the years, one of the ADC's accumulated several static entries for "SQL". Deleting every one of those entries in the DNS management console and allowing DNS to populate once again fixed the issue.

What really fixed it, was adding a second NIC and re-registering it with ADC's (DNS servers), while removing any previous references to SQL in DNS management console.

Big thank you to all of you. I hope this information will be useful to someone with a similar problem.
 
LVL 29

Expert Comment

by:pwindell
ID: 33510737
Ok, well the speed thing would have fixed transmission errors,...not collisions.  Switches don't have collisions because of the way they create Virtual Circuits.  Not a big thing, but you should be aware of that.
Adding a second NIC is almost never a solution to anything unless the first NIC is removed or disabled,...and even then you can have trouble if the old NIC was not set to use DHCP before it was removed.  So if what you did caused the machine to become dual-homed you may have to get ready for future problems.  Unless a machine is being used as a Router, Firewall, Proxy, or using NIC Teaming,...dual-homing is almost always a very bad thing.
