Improve company productivity with a Business Account.Sign Up

x
?
Solved

MTR packet loss analysis

Posted on 2013-02-03
7
Medium Priority
?
1,314 Views
Last Modified: 2013-03-10
Hi,

After some insight into the following MTR trace (host names obscured to protected the guilty ISP)

HOST: servername                             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. x.x.x.x                                  0.1%  1500    0.7   1.6   0.4 181.6  11.7
  2. some.router.provider.net                10.9%  1500    0.9   1.1   0.6 210.5   7.4
  3. target.server.com                        0.0%  1500    1.1   0.8   0.6   3.0   0.2

Open in new window


We have two servers with an ISP and occasionally have timeouts between them when requesting data from server-backend on server-frontend.

The application logs show all is fine with the backend server but the frontend logs show a timeout over the network connection within the timeframe of the mtr trace.

Anyway, we run mtr in continuous batches of 1500 packets to try and discover where the dropouts are occurring and managed to catch the above output.

To me, this indicates an issue with some.router.provider.net either with a fault or dropping mtr packets due to load.  Either way, its under load.

The ISP is saying this proves nothing because the last hop is showing no packet loss.

The question is, what is mtr actually showing here and is it useful or not in trying to determine why the end to end network timeout is happening?

Thanks
BT
0
Comment
Question by:brothertom
  • 4
  • 3
7 Comments
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 38857034
It depends on network topology. Is some.router.provider.net within the ISP? Can you post a diagram? (Ascii art will do)
0
 

Author Comment

by:brothertom
ID: 38857078
yes, within ISP network, both servers being at same ISP but on different networks

server1 > isp router > server2
0
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 38858436
So are you saying there is zero loss server1 <==> server2 but 10% loss server1 <==> router?
0
Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

 

Author Comment

by:brothertom
ID: 38858860
That is what it seems to be showing.
0
 
LVL 35

Assisted Solution

by:Duncan Roe
Duncan Roe earned 2000 total points
ID: 38861502
I really don't think the tool is helping you a lot. End-to-end response is what counts, and MTR reports it is fine. But at the application level you are experiencing timeouts - that is what you said right?
You need to run tcpdump or your favorite tool to determine whether the timeouts are associated with tcp retries. If not, the problem is at a higher level
0
 

Author Comment

by:brothertom
ID: 38872517
We monitor (via Nginx logs) the time taken for each call to the backend.
Generally we're looking at 4-6ms but during the times when MTR is showing timeouts in the middle of the route, we either get 200-2000ms shown or complete failure.

This would appear to indicate that the slow/failure is due to congestion on the network and according to the MTR trace, this would also appear to be at this middle routing stage.

Although the tcpdump tool is a good idea, these timeouts only occur for a few minutes every 2/3 weeks, but tricky to capture, unless we are able to setup tcpdump to run continuously, but only save stuff that is taking a long time.  Sounds like this will load the machine up quite a bit.
0
 
LVL 35

Accepted Solution

by:
Duncan Roe earned 2000 total points
ID: 38873726
How about running tcpdump -w output_file -C? (The file names are wrong way round for logrotate so you need to clean them up manually).
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Is your computer hacked? learn how to detect and delete malware in your PC
David Varnum recently wrote up his impressions of PRTG, based on a presentation by my colleague Christian at Tech Field Day at VMworld in Barcelona. Thanks David, for your detailed and honest evaluation!
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…
Michael from AdRem Software outlines event notifications and Automatic Corrective Actions in network monitoring. Automatic Corrective Actions are scripts, which can automatically run upon discovery of a certain undesirable condition in your network.…

585 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question