• Status: Solved

short-term packet loss

I'm having issues with my network. Machines seem to randomly drop offline for a moment. I have run Intermapper and EtherPeek to view and capture network traffic.

Intermapper tells me machines are having "short-term packet loss". I don't know much about packet capturing, so I don't know what to look for or how to make sense of all this. How can I track down what is causing the packet loss?

Please help in simple terms. Thanks.
Asked by: lenivan
 
giltjr Commented:
Do you have managed switches? If so, do the logs show anything? Assuming you are running Windows, does the Event Viewer show a loss of network connectivity?
 
ormerodrutter Commented:
If you are using any hubs (i.e. those cheap boxes that extend your network connectivity by letting you plug 4-8 cables into one socket), then that's the first thing I would look at. I had this problem before: my Outlook occasionally dropped its connection to Exchange, and it was caused by a faulty port on the hub.
 
lenivan (Author) Commented:
I have replaced all my managed switches. They did not and do not show any connection drops. On the workstations, the only thing reported in the Event Viewer is a disconnect from and reconnect to the Exchange server.

I've disconnected all unmanaged switches, but the problem persists. I ran a packet capture and a bad packet was reported. The details are below. Does this tell you anything? What else should I try?

  Flags:        0x00
  Status:       0x00
  Packet Length:74
  Timestamp:    13:00:40.836920 08/04/2007
Ethernet Header
  Destination:          00:12:3F:E9:65:64
  Source:               00:20:A6:5B:82:63
  Protocol Type:        0x0800  IP
IP Header - Internet Protocol Datagram
  Version:              4
  Header Length:        5  (20  bytes)
  Type of Service:      %00000000
                        000. .... Precedence: Routine,
                        ...0 .... Normal Delay,
                        .... 0... Normal Throughput,
                        .... .0.. Normal Reliability
                        .... ..0. ECT bit - transport protocol will ignore the CE bit
                        .... ...0 CE bit - no congestion
  Total Length:         56
  Identifier:           9503
  Fragmentation Flags:  %000
                        0.. Reserved
                        .0. May Fragment
                        ..0 Last Fragment
  Fragment Offset:      0  (0  bytes)
  Time To Live:         64
  Protocol:             1  ICMP - Internet Control Message Protocol
  Header Checksum:      0xDD39
  Source IP Address:    192.168.123.230
  Dest. IP Address:     192.168.123.53  xxx.xxx.com
  No IP Options
ICMP - Internet Control Messages Protocol
  ICMP Type:            3  Destination Unreachable
  Code:                 3  Port Unreachable
  Checksum:             0xB83A
  Unused (must be zero):0x00000000

Header of packet that caused error follows.
IP Header - Internet Protocol Datagram
  Version:              4
  Header Length:        5  (20  bytes)
  Type of Service:      %00000000
                        000. .... Precedence: Routine,
                        ...0 .... Normal Delay,
                        .... 0... Normal Throughput,
                        .... .0.. Normal Reliability
                        .... ..0. ECT bit - transport protocol will ignore the CE bit
                        .... ...0 CE bit - no congestion
  Total Length:         78
  Identifier:           15233
  Fragmentation Flags:  %000
                        0.. Reserved
                        .0. May Fragment
                        ..0 Last Fragment
  Fragment Offset:      0  (0  bytes)
  Time To Live:         128
  Protocol:             17  UDP - User Datagram Protocol
  Header Checksum:      0x0000
  Source IP Address:    192.168.123.53  xxx.xxx.com
  Dest. IP Address:     192.168.123.230
  No IP Options
UDP - User Datagram Protocol
  Source Port:          51888
  Destination Port:     137  netbios-ns
  Length:               58
  Checksum:             0x0000
FCS - Frame Check Sequence
  FCS (Calculated):     0x8756A4A1
 
giltjr Commented:
What was bad about the packet? The only thing I see offhand is that the checksum is zero. However, that would be perfectly normal if this packet was sent by the computer the trace was run on and that computer had checksum offloading enabled.

In that case the IP stack does not calculate the checksum; it lets the NIC do it, so when you capture the packet the checksum has not been calculated yet.
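
As a cross-check on the dump posted earlier, the IPv4 header checksum can be recomputed by hand. The following is a minimal sketch, not part of any capture tool; the helper names `build_header` and `ipv4_header_checksum` are made up for illustration, and the field values are copied from the two IP headers in the posted packet.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, then inverted.

    The checksum field itself (bytes 10-11) must be zero in the input.
    """
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words)
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(total_len, ident, ttl, proto, src, dst) -> bytes:
    """Pack a 20-byte IPv4 header (no options, TOS 0, no fragmentation)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0x00, total_len, ident, 0x0000, ttl, proto, 0x0000,
        bytes(map(int, src.split("."))),
        bytes(map(int, dst.split("."))),
    )


# Outer header of the ICMP packet, as shown in the dump.
outer = build_header(56, 9503, 64, 1, "192.168.123.230", "192.168.123.53")
# Embedded header of the UDP packet that triggered the ICMP message.
inner = build_header(78, 15233, 128, 17, "192.168.123.53", "192.168.123.230")

outer_checksum = ipv4_header_checksum(outer)
inner_checksum = ipv4_header_checksum(inner)
print(f"outer: 0x{outer_checksum:04X}")
print(f"inner: 0x{inner_checksum:04X}")
```

The outer value comes out as 0xDD39, which matches the Header Checksum shown in the dump, so that header is intact. The embedded header would need 0x86B1 but carries 0x0000, which is the pattern you would expect if the checksum had not been filled in yet when the packet was recorded.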

The only other thing I see that could be considered a possible problem is that the ICMP packet's type is Destination Unreachable, yet the source and destination IP addresses (192.168.123.230 and 192.168.123.53) would normally be within the same subnet. Typically the subnet mask would be 255.255.255.0, so these would be in the same subnet, and thus you would not expect to see a destination unreachable.
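
The same-subnet reasoning can be checked with Python's standard `ipaddress` module. This is a small sketch that assumes a 255.255.255.0 mask, which the thread suggests as typical but does not confirm; `same_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_subnet(a: str, b: str, netmask: str = "255.255.255.0") -> bool:
    """Return True if address b falls inside the network that a belongs to."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network(f"{a}/{netmask}", strict=False)
    return ipaddress.ip_address(b) in network


on_link = same_subnet("192.168.123.230", "192.168.123.53")
print(on_link)
```

With the assumed mask, both addresses from the dump land in 192.168.123.0/24, so no router sits between them.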
 
lenivan (Author) Commented:
The reason I called this a bad packet is that Ethereal reported it as one. I don't know much about deciphering packets, so I'm only reporting what the program told me.

As for your last comment... If the ICMP packet is showing dest unreachable, what could be a possible cause of that? What should I look for and where?
 
jotase74 Commented:
The packet is a Destination Unreachable message with the Port Unreachable code. This could happen if you were behind a firewall or proxy and that port is closed. If you are behind a proxy, try pinging from the proxy and see whether the computers are still unreachable.
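
For reference, ICMP type 3 (Destination Unreachable) carries a code that narrows down the reason, and code 3 (Port Unreachable) is normally sent by the destination host itself when nothing is listening on the UDP port. A minimal sketch of decoding the first bytes of the ICMP header from the dump, using the code names from RFC 792 (`describe_icmp` is a hypothetical helper name):

```python
import struct

# Code values for ICMP type 3, as listed in RFC 792.
DEST_UNREACHABLE_CODES = {
    0: "Network Unreachable",
    1: "Host Unreachable",
    2: "Protocol Unreachable",
    3: "Port Unreachable",
    4: "Fragmentation Needed and DF Set",
    5: "Source Route Failed",
}


def describe_icmp(header: bytes) -> str:
    """Decode the type and code fields at the start of an ICMP header."""
    icmp_type, code, _checksum = struct.unpack("!BBH", header[:4])
    if icmp_type == 3:
        meaning = DEST_UNREACHABLE_CODES.get(code, f"code {code}")
        return f"Destination Unreachable: {meaning}"
    return f"ICMP type {icmp_type}, code {code}"


# Type 3, code 3, checksum 0xB83A, unused field 0x00000000 (values from the dump).
icmp_header = bytes.fromhex("0303b83a00000000")
description = describe_icmp(icmp_header)
print(description)
```

Read this way, the packet says that 192.168.123.230 was reachable but had no listener on UDP port 137 (netbios-ns) when 192.168.123.53 queried it.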
 
jotase74 Commented:
Also, if it is Windows, try running "nbtstat -A 192.168.123.53" from the workstation you are at and check that the MAC address matches, i.e. that it is indeed "00:12:3F:E9:65:64".
 
giltjr Commented:
You may want to upgrade from Ethereal to Wireshark (https://www.wireshark.org). Wireshark is Ethereal under a new name: the lead developer moved to another company, and his original company held the trademark on the name "Ethereal", so he renamed the project.

Either way, it should tell you why the packet is bad.
 
lenivan (Author) Commented:
jotase74,
I did an nbtstat and all is correct. Also, I am not behind a proxy. The only firewall we have up and running is our Sonicwall.

I installed and ran Wireshark, but since I don't know much about packet captures, I don't know what to look for.

I did find one thing that looked odd. There is 1 machine that is trying to reference a server (middleserver) that is no longer on our network. The line from the capture is as follows:

14      3.556912      192.168.123.45      192.168.123.255      NBNS      Name query NB MIDDLESERVER<00>

I still have machines randomly dropping offline for a moment and reporting short-term packet loss of 2-4%. What should I look for in my packet captures? How can I determine why these machines are losing packets?

 
lenivan (Author) Commented:
Would anyone be willing to look at my packet capture to see if there is anything out of the ordinary? I can email it to you.

I am completely lost as to why this is happening to my network.
 
giltjr Commented:
Can you define what you mean by "dropping offline?"  How can you tell that something is going "offline"?

What is reporting packet loss?

Poor performance of an application does not mean something is going offline, nor does it mean that there is packet loss.

You need to find out what 192.168.123.45 is.

There is a site where you can use your EE user ID and password to post a packet capture. I can't remember what it is right now. I will search and post it once I find it.
 
lenivan (Author) Commented:
What I mean by dropping offline is that various workstations, as well as my servers, report dropped packets. When this occurs, my workstations lose connectivity to Exchange as well as to documents on the server. This only happens for a moment, then everything reconnects. I ran an IP monitor and it showed various machines dropping packets, stating "short-term packet loss". The NICs and the managed switches do not show any port or connection drops. So something is preventing packets from going through normally.

192.168.123.45 is a workstation, but it was installed after Middleserver was removed. So why would it reference that server?

Please post the site where I can upload my packet capture in hopes of someone being able to decipher it. Thanks for all your help.
 
giltjr Commented:
Dropped packets do not necessarily mean the machines are going offline. To me, offline would mean that you get notified that your NIC has lost connectivity.

What is reporting dropped packets? I have never had any OS tell me that packets were dropped.

If you ever used an lmhosts file, the reference could come from a leftover procedure that says to put the entry there, or somebody may have copied an old lmhosts file. It could also have been a product that was installed with incorrect options, or any of many other possibilities. The only thing I would suggest is searching the registry for "Middleserver".

The trace won't tell you why the packets are being dropped; it will only confirm that they are being dropped. However, it will only confirm it if it was an inbound packet.

Just to make sure, the dropped packets are packets that are to/from servers on your LAN? Not to/from servers that are connected via a WAN or the Internet?

What type of switches? If they are Cisco, you can do show int "int name" for each interface and see whether any interfaces show flushed packets.
 
giltjr Commented:
Link to upload files:

http://www.ee-stuff.com/
 
lenivan (Author) Commented:
I have run a 30 minute packet capture and have uploaded it to ee-stuff.com. The file can be viewed at this link: https://filedb.experts-exchange.com/incoming/ee-stuff/4263-Packet-Capture.zip

Please take a look and let me know if you see anything out of the ordinary.
 
lenivan (Author) Commented:
To answer your previous post,

When I say offline, I mean the workstation loses its connectivity to Exchange as well as to all shared documents on the server.

I use a program called Intermapper, which monitors my network and reports every time a packet going to a machine is dropped. Yes, the packets are dropped to and from servers on my LAN. The WAN is not affected.
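
A loss figure from a monitoring tool can be cross-checked with an independent probe. The sketch below is only a rough proxy: it times out TCP connection attempts rather than counting ICMP echo replies, and the function names, the host, and the port in the usage note are assumptions for illustration, not anything Intermapper does.

```python
import socket
import time
from typing import Sequence


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def loss_percent(results: Sequence[bool]) -> float:
    """Percentage of failed probes in a sequence of probe results."""
    if not results:
        return 0.0
    failures = sum(1 for ok in results if not ok)
    return 100.0 * failures / len(results)


def monitor(host: str, port: int, count: int = 60, interval: float = 1.0) -> float:
    """Probe repeatedly, log the time of each failure, and return the loss rate."""
    results = []
    for _ in range(count):
        ok = probe(host, port)
        if not ok:
            print(time.strftime("%H:%M:%S"), "probe failed:", host, port)
        results.append(ok)
        time.sleep(interval)
    return loss_percent(results)
```

For example, `monitor("192.168.123.9", 445)` would probe a hypothetical file server on the SMB port once a second for a minute. Running it from two different workstations at the same time helps tell a problem at the server end from a problem at one client.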

My switches are Dell 3448 Managed Switches. What exactly is "Show Int"?

I have not checked the lmhosts file, but will do so when I get into the office tomorrow.
 
lenivan (Author) Commented:
One other question: what info should an lmhosts file contain?

I'm looking at an lmhosts file on one of my workstations and all it says is:
127.0.0.1       localhost

I'm not sure what 127.0.0.1 is, but it has nothing to do with our network or ISP.
 
giltjr Commented:
I will need to look at the trace sometime tomorrow. It looks like you looked at the hosts file, not the lmhosts file. Typically computers don't have an lmhosts file; Microsoft ships only a sample, lmhosts.sam. However, if you do have an lmhosts file, it can cause major problems if you don't keep it up to date.
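
For context, 127.0.0.1 is the loopback address, and a "127.0.0.1 localhost" line is the standard entry in a hosts file. To see whether either file still mentions the old server, a small scan like the following could be run on the workstation. It is a sketch with made-up helper names; the path in the usage note is the usual Windows location and should be treated as an assumption for any particular machine.

```python
from pathlib import Path


def find_name_in_text(text: str, name: str):
    """Return (line_number, entry) for non-comment lines that mention name."""
    hits = []
    for number, line in enumerate(text.splitlines(), 1):
        # Drop everything from '#' on: comments, and lmhosts tags such as #PRE.
        entry = line.split("#", 1)[0].strip()
        if entry and name.lower() in entry.lower():
            hits.append((number, entry))
    return hits


def find_name_in_file(path: str, name: str):
    """Scan a hosts or lmhosts file; a missing file simply yields no hits."""
    file = Path(path)
    if not file.is_file():
        return []
    return find_name_in_text(file.read_text(errors="replace"), name)


# Hypothetical file contents, for illustration only.
sample = "# sample\n192.168.123.9  MIDDLESERVER  #PRE\n127.0.0.1 localhost\n"
matches = find_name_in_text(sample, "middleserver")
print(matches)
```

On the workstation itself this would be called as `find_name_in_file(r"C:\Windows\System32\drivers\etc\lmhosts", "MIDDLESERVER")`, and again with `hosts` in place of `lmhosts`.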

Something on that computer knows about the host.

I would have to check the Dell switch documentation. "Show int" is short for "show interface". I believe that Dell has something like that.
 
giltjr Commented:
I have taken an initial look, and most of the packets that are in error are from 192.168.12.10. All of the errors seem to be checksum errors, so I am going to assume that this is the computer you ran the trace on and that it has checksum offload enabled.

Offhand, I would say that 192.168.12.53 and 192.168.12.9 are two of your major servers, and 192.168.12.14 also seems to be a server. This is because they are constantly sending out ARP requests to resolve the IP addresses of other computers on your network.

Although there are not a lot of them, some packets do seem to have been lost. However, I am talking about 20 out of over 8,000, which is at most 0.25% and really low.

You have at least one device on the network that is configured to use IPX. Not an issue, but if you don't need it, then disable it.

The biggest issue I see is a bit unusual. It seems that 192.168.12.10 keeps making DNS queries to 192.168.12.14. The weird part is that 192.168.12.14 responds with an ICMP port 53 unreachable, but then also responds with an answer to the DNS query. There was a total of 104 DNS queries: 102 to ".14" and 2 to ".251". ".14" responded 135 times that port 53 was not reachable. However, ".14" also responded 97 times to DNS queries, and ".251" responded once. It makes no sense that ".14" says port 53 is not reachable and then responds anyway. It sounds almost like this server is having serious performance problems.

I will have to do more research on this.


Other than that, everything looks fairly normal.
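
Counts like the ones above can be reproduced from a capture that has been exported from Wireshark as CSV. The sketch below assumes the export has "Source", "Protocol" and "Info" columns and that the Info text contains "response" for DNS answers and "Port unreachable" for the ICMP errors; check these against your own export. The three sample rows are invented for illustration and are not taken from the uploaded capture.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the shape of a Wireshark CSV export.
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.12.10","192.168.12.14","DNS","74","Standard query 0x0001 A example.local"
"2","0.000400","192.168.12.14","192.168.12.10","ICMP","102","Destination unreachable (Port unreachable)"
"3","0.000900","192.168.12.14","192.168.12.10","DNS","90","Standard query response 0x0001 A 192.168.12.9"
'''


def tally(csv_text: str) -> Counter:
    """Count DNS queries, DNS responses and port-unreachable errors per source."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        info = row["Info"].lower()
        if row["Protocol"] == "DNS":
            kind = "dns_response" if "response" in info else "dns_query"
        elif row["Protocol"] == "ICMP" and "port unreachable" in info:
            kind = "port_unreachable"
        else:
            continue  # Ignore everything that is not part of this comparison.
        counts[(row["Source"], kind)] += 1
    return counts


counts = tally(SAMPLE)
for (source, kind), n in sorted(counts.items()):
    print(source, kind, n)
```

Comparing the per-source totals from a capture taken on the client with one taken on the server at the same time shows whether both machines agree on who sent what.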
 
lenivan (Author) Commented:
Major thanks for the feedback. To clarify what each machine is:

.10 used to be the DC when all the problems began. I have since moved all the FSMO roles to a new server (.14), then wiped .10 clean and did a clean install. It is now a file server.

.14 is the primary DC with DNS, DHCP and WINS.

.9 is a file server.

.53 is just a workstation I was using to ping test my network and run a packet capture (as well as .10).

I don't understand why .14 is responding with ICMP port 53 unreachable. I can't imagine this server having performance issues, since it is only 4 months old and the problems occurred before I ever put this new server in place. I will do a packet sniff on port 53, see what I come up with, and report back.

Your help is greatly appreciated. Any additional info you find will be of great help.
 
giltjr Commented:
It's not how old the server is; it is how busy it is that causes performance problems. That said, I would find it hard to overwhelm any recently purchased (last 3 years) server that is just DC/DNS/DHCP/WINS in most environments today.

It looks almost like there is something on the network that is configured to spoof MAC addresses and send out ICMP port unreachable messages.

What I would suggest is that you run a packet capture from .14, see if it is in fact sending out the port unreachable messages.  If .14 is sending these out, then you have to start digging there.

If .14 is NOT sending these out, then it looks like you have something on your network that is configured to monitor at least traffic to .14 and send out port unreachable messages for UDP 53.   That will be real fun to track down. :)
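
One way to start that hunt is to list which source MAC addresses appear with each source IP in a capture taken on the same LAN segment. This is a sketch under assumptions: the (IP, MAC) pairs would come from your own capture export, the sample MACs below are made-up placeholders, and the check only catches a device that uses a different MAC from the real server. A device that forges the server's MAC exactly will not show up this way.

```python
from collections import defaultdict


def ips_with_multiple_macs(frames):
    """Map each source IP seen with more than one source MAC to those MACs."""
    seen = defaultdict(set)
    for ip, mac in frames:
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical (source IP, source MAC) pairs, for illustration only.
SAMPLE_FRAMES = [
    ("192.168.12.14", "02:00:00:00:00:01"),
    ("192.168.12.10", "02:00:00:00:00:0A"),
    ("192.168.12.14", "02:00:00:00:00:02"),
    ("192.168.12.14", "02:00:00:00:00:01"),
]

conflicts = ips_with_multiple_macs(SAMPLE_FRAMES)
print(conflicts)
```

If the port unreachable messages carry the same MAC as the server's DNS replies, comparing the IP TTL and Identifier fields of the two kinds of packet is another hint as to whether they come from the same IP stack.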