Solved

network problem? server drops pings, then comes back. Every 2 minutes...

Posted on 2010-08-23
Last Modified: 2012-05-10
We have two DHCP servers (domain controllers) in our Active Directory. ADC1 is a Windows 2003 machine, ADC2 is Windows 2008.

One of our servers (SQL) keeps dropping connections. For example: for 2 minutes it will ping properly, and shares show up when accessed via the network (\\sql). Afterwards, it drops the connection, and becomes un-pingable. After a brief period, it comes back online, and responds to ping requests.

Some workstations on the network don't have the problem. Some constantly switch between working and non-working. Rebooting the ADCs and the SQL server itself does not help.

Computers connected to either ADC1 or ADC2 have the same problem (for example, one PC on an ADC1 DHCP lease will pick up SQL without drops in connection, while another, on the same DHCP server, has the connection dropped).

SQL is on a static IP; its Event Viewer does not show any network changes or any service or driver failures. No automatic updates, no expired licenses, no limit on the number of simultaneous users, no new IP address leases. Workstations also don't change IPs, even though they're on DHCP.

Any ideas on what could be causing SQL to drop and re-establish connections?

The only suspicious log I found on ADC1 was "The DNS server has encountered numerous run-time events. To determine the initial cause of these run-time events, examine the DNS server event log entries that precede this event...." Event ID: 3000.

Unfortunately, the log does not show anything prior to this event.
Question by:94704
18 Comments
 
LVL 3

Expert Comment

by:Dave_LaSalle
ID: 33505860
On the server ping loopback with the -t switch
let it run for a while
use ctrl + Break keys to get stats of pings/loss
if it's dropping it may be your nic
if not then suspect dns
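
The same loss tally can be collected by a script instead of reading the ping window by eye. What follows is only a rough sketch: the function names (ping_once, monitor, summarize) are made up for illustration, and it assumes the system ping command is on the PATH and that its exit code is a usable success signal.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Send one echo request via the system ping; True if it exits with code 0.

    Assumption: exit code 0 means a reply arrived. On Windows this can be
    misleading in some cases (e.g. "Destination host unreachable" replies),
    so treat the result as a rough signal only.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        proc = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return proc.returncode == 0


def monitor(host, count, interval_s=1.0):
    """Ping `host` `count` times, pausing `interval_s` between attempts."""
    results = []
    for _ in range(count):
        results.append(ping_once(host))
        time.sleep(interval_s)
    return results


def summarize(results):
    """Turn a list of True/False ping results into sent/lost/loss-percent."""
    sent = len(results)
    lost = results.count(False)
    loss_pct = (100.0 * lost / sent) if sent else 0.0
    return {"sent": sent, "lost": lost, "loss_pct": loss_pct}
```

Running summarize(monitor("127.0.0.1", 300)) on the server would cover roughly five minutes, which is long enough to span a couple of the 2-minute cycles described in the question.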
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 33505873
If you have any sort of anti-virus programs on the SQL server I would try to uninstall it and see if that helps (even if you have marked the SQL program folder to not be online scanned).
0
 

Expert Comment

by:castercorey
ID: 33505887
Are you using a megabit switch or a gigabit switch on your network?
0
 

Author Comment

by:94704
ID: 33505908
The loopback pings on the server (on sql: ping sql -t) work fine. NIC is ok.

No antivirus on the server; nothing that would refresh it every 2 minutes or so.
0
 
LVL 29

Accepted Solution

by:pwindell
pwindell earned 145 total points
ID: 33505935
First,...a few misconceptions.  Computers are not "connected to DCs".    They start up, they authenticate,...then nothing...you can throw the DC out in the street and the workstation wouldn't know the difference until the next time authentication was needed for something.    DHCP - well by default the client only contacts DHCP once at startup and once every 4 days if left running,...so DHCP is irrelevant.
DNS could be one of your problems (just a big guess).  
DCs should never be multi-homed.
DNS Scheme should be:
1. All machines on the LAN use only the AD/DNS and nothing else,..ever.  That includes the DC's themselves and the Firewall Device
2. The external DNS (the ISP's?) should be listed as a Forwarder within the config of the DNS Services.  If omitted the DNS will default to using Root hints.
3. Examine the DNS Zone and delete any incorrect, duplicated, or wrong entries.  Most entries will be added automatically so I recommend that you do not manually add any.  A fast way to make a machine re-register itself in DNS is to right-click on the "connection" and tell it to do a repair.  You can also do this from a command prompt:
IPCONFIG /registerdns
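
To spot leftover or duplicated records for a host from a workstation, a quick check like the one below may help. It is only a sketch: the function names are invented for illustration, and the host name "sql" and the address 10.0.0.5 in the usage note are placeholders, not values from this thread. Note also that socket.getaddrinfo goes through the operating system's resolver, so the local cache and hosts file can influence the answer; it is not a direct query against the AD/DNS zone.

```python
import socket


def resolve_ipv4(name):
    """Return the sorted, de-duplicated IPv4 addresses the OS resolver gives for `name`.

    Raises socket.gaierror if the name cannot be resolved at all.
    """
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def stale_addresses(resolved, expected_ip):
    """Return every resolved address that is not the server's known static IP."""
    return [ip for ip in resolved if ip != expected_ip]
```

For example, stale_addresses(resolve_ipv4("sql"), "10.0.0.5") returning anything other than an empty list would suggest the zone still holds entries that need to be deleted.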
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 33505954
On SQL server: ping 127.0.0.1 -t

at the same time:

On a host:
ping <IP to SQL> -t


Do both drop packets after, e.g., 2 minutes?
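
One way to read the results of such parallel tests is sketched below. The function name and the category labels are hypothetical, and the mapping from symptom to cause is a rule of thumb drawn from the suggestions in this thread, not a definitive diagnosis. Each argument is a list of True/False ping results collected over the same period.

```python
def classify(loopback_ok, by_ip_ok, by_name_ok):
    """Roughly narrow down where packets are being lost.

    loopback_ok: results of pinging 127.0.0.1 on the server itself
    by_ip_ok:    results of pinging the server's IP from a workstation
    by_name_ok:  results of pinging the server's host name from a workstation
    """
    if not all(loopback_ok):
        # Even the server's own loopback drops: look at the server itself.
        return "server-local"
    if not all(by_ip_ok):
        # Name resolution is not involved when pinging by IP.
        return "network-path"
    if not all(by_name_ok):
        # IP works but the name does not: suspect DNS records or caching.
        return "name-resolution"
    return "no-loss"
```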
0
 
LVL 4

Assisted Solution

by:bhartwell
bhartwell earned 30 total points
ID: 33505961
Try restarting the DNS service on your server and see if that helps.
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 33505984
btw which HW is the SQL on?
0
 

Author Comment

by:94704
ID: 33506037
@ pwindell: I already tried ipconfig /registerdns. No errors in the Event Viewer.
@ snusgubben: yes, I tried pinging IP, pinging the hostname, and pinging loopback 127.0.0.1. No loss, no downtime. Responds to pings constantly. Basically, I'm convinced that NIC is fine.
@ bhartwell: both ADCs have been restarted a few times now, therefore DNS service on them has been restarted as well.
0
 

Author Comment

by:94704
ID: 33506051
Do you think re-joining the domain under another name, rebooting, after which re-joining it again as SQL would fix the problem?
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 33506078
You said that it became un-pingable in your initial post?!

If you have 1 Gbit network cards on the SQL, you could try to lock it to 100 Mbit just to see if it is a negotiation problem.

Outdated NIC drivers?
0
 

Author Comment

by:94704
ID: 33506105
@ snusgubben: the machine becomes unpingable only by other workstations in the AD. For example: workstation1 can ping SQL. 100% works. 2 minutes later, the same workstation cannot ping SQL. 2 minutes later, workstation1 is able to ping SQL. And so on...
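
To confirm that the drops really follow a fixed cycle, timestamped ping results from one workstation can be reduced to outage windows. This is a minimal sketch with made-up function names; it assumes the samples are (seconds, ok) pairs in time order.

```python
def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (first_fail, last_fail) windows."""
    windows = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # first failure of a new outage
            last_fail = ts
        elif start is not None:
            windows.append((start, last_fail))  # outage ended at previous failure
            start = None
    if start is not None:
        windows.append((start, last_fail))  # still down at the end of the capture
    return windows


def cycle_lengths(windows):
    """Seconds from the start of each outage to the start of the next one."""
    starts = [w[0] for w in windows]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

If cycle_lengths keeps returning nearly the same number, the cause is more likely something timed (a cache expiry or a periodic re-registration, for instance) than a randomly failing cable or port.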
0
 
LVL 21

Expert Comment

by:snusgubben
ID: 33506210
If your SQL had a machine account problem the DC's would have logged it.

This sounds to me like a NIC / NIC driver / "1000 Mb vs 100 Mb problem" or some sort of anti-virus (firewall) problem.

The clients cache the FQDN and get a Kerberos ticket the first time they connect to the server. I doubt it's a domain problem (but who knows :)
0
 
LVL 6

Assisted Solution

by:Galtar99
Galtar99 earned 75 total points
ID: 33506523
Is your SQL server plugged into a managed switch?  If it is, can you look at the switch to see if it notices the link dropping?  (show log)  Make sure logging is set to informational.
If it isn't, do you see the link light go out?  Maybe try switching it to another port on the same switch to see if that port is going bad.  It may seem like an obvious thing, but update the NIC driver and ensure the server has all the latest Windows updates.
Maybe hard code the switch and/or NIC to one speed, 1000 Mb/Full or whatever you feel comfortable with.
0
 

Author Comment

by:94704
ID: 33506972
@ Galtar99: The server and the machine are both set to 100FDx, so that rules out any auto-config mismatches. Procurve (switch) log does not show any conflicts on the port or mac-address related conflicts.

@ snusgubben: Windows firewall is off/disabled. No other antivirus or firewall is present on the machine. Just updated the network card drivers (intel) to the most recent.

Still, most workstations on the network lose connection to SQL every 2 minutes.

Odd: the connection with SQL (and ability to successfully ping) gets restored sooner, if the workstation1 user re-accesses the shared folder (\\sql). Nonetheless, the server drops connections and re-establishes them every 2 minutes.
0
 

Expert Comment

by:castercorey
ID: 33507071
Configure your NIC to match your network switch. So if your switch is a gigabit switch, then configure it to full duplex at gigabit rate.
0
 

Author Closing Comment

by:94704
ID: 33507933
With the help of a few of you and your suggestions, I figured it out.

The problem was a combination of:

1.) Speed mismatch between the switch and SQL. Now, instead of "auto" on both ends, the server and switch are set to 100FDx (although both support a gigabit connection).

That fixed the collisions and, partially, the DNS issue.

2.) DNS entries. Over the years, one of the ADCs accumulated several static entries for "SQL". Deleting every entry in the DNS management console and allowing DNS to populate once again fixed the issue.

What really fixed it was adding a second NIC and re-registering it with the ADCs (DNS servers), while removing any previous references to SQL in the DNS management console.

Big thank you to all of you. I hope this information will be useful to someone with a similar problem.
0
 
LVL 29

Expert Comment

by:pwindell
ID: 33510737
Ok, well the speed thing would have fixed transmission errors,...not collisions.  Switches don't have collisions because of the way they create Virtual Circuits.  Not a big thing, but you should be aware of that.
Adding a second NIC is almost never a solution to anything unless the first NIC is removed or disabled,...and even then you can have trouble if the old NIC was not set to use DHCP before it was removed.  So if what you did caused the machine to become dual-homed you may have to get ready for future problems.  Unless a machine is being used as a Router, Firewall, Proxy, or using NIC Teaming,...dual-homing is almost always a very bad thing.
0

Question has a verified solution.
