Diagnosing Odd Packet Loss
Posted on 2009-02-20
A recent problem with a client has me stumped, so any advice would be appreciated.
1. The network topology looks like this:
T1 into router into a PowerConnect 48-port switch in unmanaged mode.
That PowerConnect feeds 2 other PowerConnects in other locations (24-port, unmanaged).
Server1: Secondary DC and DNS
Server2: Primary DC and DNS
Server3: Application and DHCP
(We were brought into this case with no prior involvement, so we have no idea why DHCP is not on either of the DCs.)
2. The problem:
A continuous ping to ANY internal resource has no packet loss. I can run it for hours. This is true of all the servers and LAN side devices.
A ping to our ISP gateway (64.x.x.x) or anything beyond it (the internet) shows packet loss that varies between 50% and 100%. Surfing is occasionally doable. DNS lookups are quick, even for names that are not cached internally.
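To compare the gateway loss figures from one test to the next, the percentage can be pulled out of saved ping output rather than read by eye. This is only a minimal Python sketch: the function name is made up for illustration, the sample text uses the documentation address 192.0.2.1 rather than the real gateway, and the pattern assumes the usual English summary lines ("(65% loss)" on Windows, "65% packet loss" on Linux).

```python
import re


def parse_loss_percent(ping_output: str) -> float:
    """Return the packet-loss percentage reported in a ping summary."""
    # Windows prints "(65% loss)"; Linux prints "65% packet loss".
    match = re.search(r"(\d+(?:\.\d+)?)%\s*(?:packet\s+)?loss", ping_output)
    if match is None:
        raise ValueError("no loss summary found in ping output")
    return float(match.group(1))


# Illustrative sample only, not captured from the network described here.
windows_sample = (
    "Ping statistics for 192.0.2.1:\n"
    "    Packets: Sent = 20, Received = 7, Lost = 13 (65% loss),\n"
)
print(parse_loss_percent(windows_sample))  # prints 65.0
```

Saving each run to a text file and feeding it through a function like this would make it easier to see whether the loss tracks a particular device being plugged in.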
3. The steps I have taken thus far are:
1. Plugged a laptop directly into the T1 line. No packet loss, so the ISP is not the problem.
2. Swapped routers. Symptoms are the same.
3. Swapped the cable between the router and the T1. Symptoms are the same.
4. Began testing one patch cable at a time from the LAN side into a spare switch connected to the router. Certain PCs immediately introduce the packet loss while others don't. For example, Server2 does not introduce this behavior, but Server1 and Server3 do.
5. In the course of troubleshooting, numerous viruses, including a new one containing wmisys.exe and wmisync.exe, were discovered on some machines. TrendMicro Enterprise AV claims to have removed it, and the processes under the wmisys.exe and wmisync.exe names are no longer running. All other viruses were cleaned and quarantined successfully.
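The one-cable-at-a-time test above is easier to reason about if each device's gateway loss readings are written down and summarized. The following is a rough Python sketch of that bookkeeping, not a diagnostic tool: the function name, the 5% threshold, and every number in the sample are placeholders chosen for illustration and are not measurements from this network.

```python
from statistics import mean


def devices_introducing_loss(samples, threshold=5.0):
    """Return the devices whose mean gateway loss (percent) exceeds threshold."""
    return sorted(
        name
        for name, losses in samples.items()
        if losses and mean(losses) > threshold
    )


# Placeholder readings (percent loss to the gateway with only that device
# patched into the spare switch). These are NOT real measurements.
samples = {
    "Server1": [60.0, 85.0, 100.0],
    "Server2": [0.0, 0.0, 0.0],
    "Server3": [50.0, 75.0, 90.0],
}
print(devices_introducing_loss(samples))  # prints ['Server1', 'Server3']
```

Keeping the raw readings per device, rather than a yes/no note, also leaves room to spot machines that only cause loss intermittently.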
What boggles my mind is how a LAN device can create packet loss to our ISP gateway but not to any internal device. I would think that if packet storms, viruses, bad cables, wet fiber repeaters, bad NICs, etc. were causing the problem, there would be internal connectivity loss too. The DNS zones are set correctly (forward and reverse), there are no DHCP error messages about duplicate addresses, the scope options are set correctly, etc. It really is driving me bonkers.
I have not found a common thread that separates the devices that cause this behavior from the ones that don't (they include Win2k, WinXP, and Win2k3 machines).
Thanks very much for any advice or suggestions. I'm preparing a kit including cable testers and Wireshark for Monday but wanted to go in as fully armed as possible.
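One thing I could do with the Wireshark capture is count packets per source address to see whether the machines that trigger the loss are also the heaviest talkers. Below is a minimal Python sketch of that idea. It assumes the capture has been exported from Wireshark as CSV with the default column headers (including "Source"); the function name and the sample rows, with their 192.168.1.x and 192.0.2.1 addresses, are hypothetical placeholders.

```python
import csv
import io
from collections import Counter


def top_talkers(csv_file, n=5):
    """Count packets per source address in a Wireshark CSV export."""
    counts = Counter(row["Source"] for row in csv.DictReader(csv_file))
    return counts.most_common(n)


# Placeholder rows in the assumed default export layout, not real traffic.
sample = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
    '"1","0.000","192.168.1.10","192.0.2.1","TCP","60","placeholder"\n'
    '"2","0.001","192.168.1.10","192.0.2.1","TCP","60","placeholder"\n'
    '"3","0.002","192.168.1.20","192.0.2.1","UDP","74","placeholder"\n'
)
for address, packets in top_talkers(sample):
    print(address, packets)
```

With a real export, the open file object would be passed in place of the in-memory sample, for example `open("capture.csv", newline="")`, where the filename is again just an example.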