Solved

Diagnosing Odd Packet Loss

Posted on 2009-02-20
Last Modified: 2012-06-22
A recent problem with a client has me stumped so any advice would be appreciated.

1.  The network topology looks like this:
T1 into router into a 48-port PowerConnect switch in unmanaged mode.
That PowerConnect feeds 2 other PowerConnects in other locations (24-port, unmanaged).
Server1:  Secondary DC and DNS
Server2:  Primary DC and DNS
Server3:  Application and DHCP
(We were brought into this case with no prior experience with this network, so we have no idea why DHCP is not on either of the DCs.)


2.  The problem:
A continuous ping to ANY internal resource has no packet loss.  I can run it for hours.  This is true of all the servers and LAN side devices.
A ping to our ISP gateway (64.x.x.x) or anything beyond it (the internet) shows packet loss varying between 50% and 100%.  Surfing is occasionally doable.  DNS lookups are quick, even for addresses that are not cached internally.

3.  The steps I have taken thus far are:
1.  Plugged a laptop directly into the T1 line.  No packet loss, so the ISP is not the problem.
2.  Swapped routers.  Symptoms are the same.
3.  Swapped the cable between the router and the T1.  Symptoms are the same.
4.  Began testing one patch cable at a time from the LAN side into a spare switch connected to the router.  Certain PCs immediately introduce the packet loss while others don't.  For example, Server2 does not introduce this behavior, but Server1 and Server3 do.
5.  In the course of troubleshooting, numerous viruses, including a new one containing wmisys.exe and wmisync.exe, were discovered on some machines.  TrendMicro Enterprise AV claims to have removed it, and the processes under the wmisys.exe and wmisync.exe names are no longer running.  All other viruses were cleaned and quarantined successfully.

What boggles my mind is how a LAN device can create packet loss to our ISP gateway but not to any internal device.  I would think that if packet storms, viruses, bad cables, wet fiber repeaters, bad NICs, etc. were causing the problem, there would be internal connectivity loss too.  The DNS zones are set correctly (forward and reverse), there are no DHCP error messages about duplicate addresses, the scope options are set correctly, etc.  It really is driving me bonkers.

I have not found a common thread that distinguishes the devices (Win2k, WinXP, Win2k3) that cause this behavior from the ones that don't.

Thanks very much for any advice or suggestions.  I'm preparing a kit including cable testers and Wireshark for Monday but wanted to go in as fully armed as possible.
Question by:cwhite2812
Accepted Solution

by:
Mysidia earned 500 total points
ID: 23697943
Congestion can cause packet loss.   If a link is at capacity, and you attempt to send more packets across the link than the link can accept, some packets will be lost.

This could be caused by a virus, or by normal usage,  if users are in aggregate requesting more data than the T1 can carry.

You generally have more bandwidth available on your LAN than on your WAN.

If a PC is spewing forth 2 megabits per second of data out to the internet, that could easily create packet loss over your T1 link, as you only have about 1.5 Mbit/s of upstream bandwidth available.

But your 10 or 100 megabit LAN has much more capacity than your WAN link, so it would not necessarily suffer at all in that case.
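To put rough numbers on that, here is a minimal Python sketch. It assumes a deliberately simplified model: a steady offered load, a link that simply drops whatever exceeds its capacity, and no retransmissions or protocol overhead. The function name and the constant are illustrative, not taken from any tool.

```python
# Rough model only: anything offered above the link capacity is dropped.
T1_CAPACITY_BPS = 1_544_000  # nominal T1 line rate in bits per second


def expected_loss_fraction(offered_bps, capacity_bps=T1_CAPACITY_BPS):
    """Return the fraction of offered traffic that cannot fit on the link."""
    if offered_bps <= capacity_bps:
        return 0.0
    return 1.0 - capacity_bps / offered_bps


if __name__ == "__main__":
    # One PC sending 2 Mbit/s toward the internet over the T1.
    print(f"T1:  {expected_loss_fraction(2_000_000):.1%} of offered traffic dropped")
    # The same 2 Mbit/s on a 100 Mbit/s LAN segment.
    print(f"LAN: {expected_loss_fraction(2_000_000, 100_000_000):.1%} dropped")
```

Under these assumptions the same flow that overruns the T1 does not come close to filling a 100 Mbit/s LAN segment, which matches the symptom of clean internal pings and lossy external ones.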


Not all viruses create storms at maximum speed; some just try to send spam e-mail as quickly as the T1 allows.

Expert Comment

by:Maeros
ID: 23698432
Definitely check for saturation and congestion as Mysidia suggests.

If that doesn't work, try the following:

Isolate the problem area by going to each individual node/device along the line, starting directly from the Internet and working back into the internal network, and generate/listen to the traffic (e.g., with Wireshark).  This includes even potentially "dumb" devices, such as the unmanaged switches, because a switch can have a bad port, or the switch itself can malfunction and start to babble.  You may need to enable management and check the logs and/or SNMP.  When listening to traffic, look at where the potentially bad packets originate (check the source IPs and MAC addresses).  From there it becomes a process of elimination until you find the culprit.  It is entirely possible to have either a bad device or a bad NIC somewhere.
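Once a capture is in hand, ranking sources by how much they send toward the WAN is one way to shorten the process of elimination. The Python sketch below assumes the capture has already been reduced to simple (source MAC, source IP, destination IP, length) records; that record layout, the function name, and the 192.168.1.0/24 subnet are all hypothetical and are not something Wireshark produces in this form by default.

```python
from collections import Counter
from ipaddress import ip_address, ip_network


def top_wan_talkers(packets, lan_cidr, n=5):
    """Rank LAN sources by bytes sent to destinations outside the LAN.

    packets: iterable of (src_mac, src_ip, dst_ip, length_in_bytes) tuples.
    lan_cidr: the LAN subnet in CIDR notation (hypothetical value below).
    """
    lan = ip_network(lan_cidr)
    bytes_by_source = Counter()
    for src_mac, src_ip, dst_ip, length in packets:
        # Only count traffic leaving the LAN; internal traffic is not the suspect.
        if ip_address(src_ip) in lan and ip_address(dst_ip) not in lan:
            bytes_by_source[(src_ip, src_mac)] += length
    return bytes_by_source.most_common(n)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        ("00:11:22:33:44:01", "192.168.1.10", "192.168.1.20", 1500),
        ("00:11:22:33:44:02", "192.168.1.30", "64.0.0.1", 900),
        ("00:11:22:33:44:02", "192.168.1.30", "64.0.0.2", 900),
        ("00:11:22:33:44:03", "192.168.1.40", "64.0.0.1", 200),
    ]
    for (ip, mac), total in top_wan_talkers(sample, "192.168.1.0/24"):
        print(ip, mac, total)
```

Grouping by both IP and MAC helps when a noisy machine changes address or when two devices claim the same IP.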

Author Comment

by:cwhite2812
ID: 23698581
That is a good point about the WAN leg becoming congested and the LAN leg remaining unharmed more or less.  Do you have any tips or links to a Wireshark guide that would let me know what would look normal and what would not?  I ran it from my laptop very briefly as part of my first troubleshooting step, but didn't see anything that struck me as being unusual.  I have the capture saved so I can look it up and spend more time with it in a non-emergency situation.

I was hoping to avoid the port-by-port autopsy, but it sounds like that is the correct course of action for whatever length of time it takes.

Thanks very much,
Chris



Author Comment

by:cwhite2812
ID: 23698657
It may be worth adding that the log from the router (outgoing requests) showed strictly regular websites (google.com, dell.com, au.microsoftupdate.com) and no odd SMTP or anything else that didn't look legit.  If something malicious were overloading the WAN link, wouldn't I be more likely to see those packets heading out?

Author Comment

by:cwhite2812
ID: 23735570
Mysidia had it correct.  The 2 PCs that didn't get virus protection on Friday, because they were owned by other vendors, had the virus and were broadcasting port 445 traffic.  Dropping all port 445 traffic going from the LAN subnet to the WAN subnet restored connectivity and allowed us to finish cleaning and quarantining.
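For anyone wanting to reason about the rule, its logic can be sketched in Python as below. This is not the router's configuration syntax, which depends on the device; the function name and the 192.168.1.0/24 subnet are hypothetical stand-ins for illustration.

```python
from ipaddress import ip_address, ip_network

SMB_PORT = 445  # the port the infected PCs were flooding


def should_drop(src_ip, dst_ip, dst_port, lan_cidr):
    """Return True for port 445 traffic going from the LAN to outside it."""
    lan = ip_network(lan_cidr)
    leaving_lan = ip_address(src_ip) in lan and ip_address(dst_ip) not in lan
    return leaving_lan and dst_port == SMB_PORT


if __name__ == "__main__":
    lan = "192.168.1.0/24"  # hypothetical LAN subnet
    print(should_drop("192.168.1.30", "64.0.0.1", 445, lan))      # outbound 445
    print(should_drop("192.168.1.30", "192.168.1.20", 445, lan))  # internal file sharing
    print(should_drop("192.168.1.30", "64.0.0.1", 80, lan))       # ordinary web traffic
```

Matching only on traffic that leaves the LAN keeps internal file sharing working while the infected machines are still being cleaned.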