Diagnosing Odd Packet Loss

A recent problem with a client has me stumped, so any advice would be appreciated.

1.  The network topology looks like this:
T1 into router into a PowerConnect 48-port switch in unmanaged mode.
That PowerConnect feeds 2 other PowerConnects in other locations (24-port, not managed).
Server1:  Secondary DC and DNS
Server2:  Primary DC and DNS
Server3:  Application and DHCP
(We were brought into this case with no prior experience, so we have no idea why DHCP is not on either of the DCs.)


2.  The problem:
A continuous ping to ANY internal resource has no packet loss.  I can run it for hours.  This is true of all the servers and LAN side devices.
A ping to our ISP gateway (64.x.x.x) or anything beyond it (the Internet) shows packet loss varying between 50% and 100%.  Surfing is occasionally doable.  DNS lookups are quick, even for addresses that are not cached internally.

3.  The steps I have taken thus far are:
1.  Plugged a laptop directly into the T1 line.  No packet loss, so the ISP is not the problem.
2.  Swapped routers.  Symptoms are the same.
3.  Swapped the cable between the router and the T1.  Symptoms are the same.
4.  Began testing one patch cable at a time from the LAN side into a spare switch connected to the router.  Certain PCs immediately introduce the packet loss while others don't.  For example, Server2 does not introduce this behavior, but Server1 and Server3 do.
5.  In the course of troubleshooting, numerous viruses, including a new one containing wmisys.exe and wmisync.exe, were discovered on some machines.  TrendMicro Enterprise AV claims to have removed it, and the processes under the wmisys.exe and wmisync.exe names are no longer running.  All other viruses were cleaned and quarantined successfully.
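
For the one-cable-at-a-time test above, it can help to record the loss figure for each device rather than eyeballing it. Below is a minimal sketch that pulls the loss percentage out of the summary line printed by the Windows ping command and lists the devices above a threshold. The device names match the servers above, but the ping results and the 5% threshold are made-up illustrations, not measurements from this network.

```python
import re

# Matches the summary line printed by the Windows ping command, e.g.
#   Packets: Sent = 100, Received = 37, Lost = 63 (63% loss),
SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def loss_percent(ping_output: str) -> float:
    """Return the packet loss percentage found in saved ping output."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("ping reported zero packets sent")
    return 100.0 * (sent - received) / sent


def suspects(results: dict[str, str], threshold: float = 5.0) -> list[str]:
    """Return the devices whose test run showed loss above the threshold."""
    return sorted(
        name for name, output in results.items()
        if loss_percent(output) > threshold
    )


# Hypothetical results: one saved ping summary per device plugged in.
results = {
    "Server1": "Packets: Sent = 100, Received = 40, Lost = 60 (60% loss),",
    "Server2": "Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),",
    "Server3": "Packets: Sent = 100, Received = 25, Lost = 75 (75% loss),",
}
print(suspects(results))
```

Each entry would be the saved output of a ping to the ISP gateway taken while only that device was connected to the spare switch.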

What boggles my mind is how a LAN device can create packet loss to our ISP gateway but not to any internal device.  I would think that if packet storms, viruses, bad cables, wet fiber repeaters, bad NICs, etc. were causing the problem, there would be internal connectivity loss too.  The DNS zones are set correctly (forward and reverse), there are no DHCP error messages about duplicate addresses, the scope options are set correctly, etc.  It really is driving me bonkers.

I have not discovered a common thread amongst the devices (win2k, winxp, win2k3) that cause this behavior and ones that don't.

Thanks very much for any advice or suggestions.  I'm preparing a kit including cable testers and Wireshark for Monday but wanted to go in as fully armed as possible.
Asked by: cwhite2812
 
Mysidia Commented:
Congestion can cause packet loss.   If a link is at capacity, and you attempt to send more packets across the link than the link can accept, some packets will be lost.

This could be caused by a virus, or by normal usage,  if users are in aggregate requesting more data than the T1 can carry.

You generally have more bandwidth available on your LAN than on your WAN.

If a PC is spewing 2 Mbit/s of data out to the Internet, that could easily create packet loss over your T1 link, as you only have about 1.5 Mbit/s of upstream bandwidth available.

But your 10 or 100 megabit LAN has much more capacity than your WAN link, so it would not necessarily suffer at all in that case.
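
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a steady offered load and a link that simply drops whatever exceeds its capacity, ignoring queueing and TCP backoff, so treat the result as an order-of-magnitude illustration rather than a prediction. The 1.544 Mbit/s figure is the nominal T1 line rate; the function name is made up for this example.

```python
T1_BPS = 1_544_000        # nominal T1 line rate; usable payload is a bit lower
LAN_BPS = 100_000_000     # 100 Mbit/s LAN, as mentioned above


def dropped_fraction(offered_bps: float, capacity_bps: float) -> float:
    """Fraction of traffic that cannot fit through the link (simplified model)."""
    if offered_bps <= capacity_bps:
        return 0.0
    return (offered_bps - capacity_bps) / offered_bps


# A single PC pushing 2 Mbit/s toward the Internet:
print(f"{dropped_fraction(2_000_000, T1_BPS):.1%}")   # prints 22.8%

# The same 2 Mbit/s on the LAN does not come close to filling it:
print(f"{dropped_fraction(2_000_000, LAN_BPS):.1%}")  # prints 0.0%
```

Under this model, the 50-100% loss reported in the question would mean the offered load toward the WAN is at least about twice what the T1 can carry.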


Not all viruses create storms at maximum speed; some just try to send spam e-mail as quickly as the T1 allows.
 
Maeros Commented:
Definitely check for saturation and congestion as Mysidia suggests.

If that doesn't work, try the following:

Isolate the problem area by going to each individual node/device along the line, starting directly from the Internet side and working back into the internal network, and generating/listening to the traffic (e.g. with Wireshark).  This includes even potentially "dumb" devices such as the unmanaged switches, because a switch can have a bad port, or the switch itself can malfunction and start to babble.  You may need to enable management and check the logs and/or SNMP.  When listening to traffic, see where the potentially bad packets originate (check the source IPs and MAC addresses).  From there it becomes a process of elimination until you find the culprit.  It is entirely possible that there is a bad device or NIC somewhere.
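
One way to speed up the "check the source IPs" part is to export the capture from Wireshark as CSV and total the bytes per source address. The sketch below assumes the export uses Wireshark's default column names ("Source" and "Length"); if your column layout differs, adjust the keys. The addresses and packets in the sample are invented for illustration.

```python
import csv
import io
from collections import Counter


def top_talkers(csv_text: str, n: int = 5) -> list[tuple[str, int]]:
    """Total the captured bytes per source address and return the top n."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "Source" and "Length" are assumed column names from the CSV export.
        totals[row["Source"]] += int(row["Length"])
    return totals.most_common(n)


# Hypothetical export: three packets from one host, one from another.
sample = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000","192.168.1.50","203.0.113.9","TCP","62","1025 > 445 [SYN]"
"2","0.001","192.168.1.50","203.0.113.10","TCP","62","1026 > 445 [SYN]"
"3","0.002","192.168.1.10","198.51.100.7","TCP","74","1300 > 80 [SYN]"
"4","0.003","192.168.1.50","203.0.113.11","TCP","62","1027 > 445 [SYN]"
'''

for source, total_bytes in top_talkers(sample):
    print(source, total_bytes)
```

A host that dominates the list while nobody is actively using it is a good candidate for the next round of elimination.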
 
cwhite2812 (Author) Commented:
That is a good point about the WAN leg becoming congested and the LAN leg remaining unharmed more or less.  Do you have any tips or links to a Wireshark guide that would let me know what would look normal and what would not?  I ran it from my laptop very briefly as part of my first troubleshooting step, but didn't see anything that struck me as being unusual.  I have the capture saved so I can look it up and spend more time with it in a non-emergency situation.

I was hoping to avoid the port-by-port autopsy, but it sounds like that is the correct course of action for whatever length of time it takes.

Thanks very much,
Chris


 
cwhite2812 (Author) Commented:
It may be worth adding that the router log (outgoing requests) showed strictly regular websites (google.com, dell.com, au.microsoftupdate.com), with no odd SMTP or anything else that didn't look legit.  If something malicious were overloading the WAN link, wouldn't I be more likely to see those packets heading out?
 
cwhite2812 (Author) Commented:
Mysidia had it correct.  The two PCs that didn't get virus protection installed on Friday, because they were owned by other vendors, had the virus and were flooding the network with port 445 (SMB) traffic.  Dropping all port 445 traffic going from the LAN subnet to the WAN subnet restored connectivity and allowed us to finish cleaning and quarantining.
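
For anyone who lands here with the same problem, the rule we applied boils down to the logic sketched below. This is not the router's configuration syntax, only the decision it makes, written out in Python. The 192.168.1.0/24 subnet is a placeholder rather than the client's real addressing, and "WAN" is approximated as any destination outside the LAN subnet.

```python
import ipaddress

LAN = ipaddress.ip_network("192.168.1.0/24")  # placeholder LAN subnet
BLOCKED_PORTS = {445}                          # SMB


def should_drop(src_ip: str, dst_ip: str, dst_port: int) -> bool:
    """Drop LAN-to-WAN packets aimed at a blocked port; pass everything else."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    leaving_lan = src in LAN and dst not in LAN
    return leaving_lan and dst_port in BLOCKED_PORTS


print(should_drop("192.168.1.50", "203.0.113.9", 445))   # infected PC to Internet
print(should_drop("192.168.1.50", "192.168.1.10", 445))  # file sharing inside the LAN
print(should_drop("192.168.1.50", "203.0.113.9", 80))    # ordinary web traffic
```

Only the first case is dropped, so internal file sharing and normal browsing keep working while the infected machines are being cleaned.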
Question has a verified solution.
