Solved

MTR packet loss analysis

Posted on 2013-02-03
Last Modified: 2013-03-10
Hi,

I'm after some insight into the following MTR trace (host names obscured to protect the guilty ISP):

HOST: servername                             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. x.x.x.x                                  0.1%  1500    0.7   1.6   0.4 181.6  11.7
  2. some.router.provider.net                10.9%  1500    0.9   1.1   0.6 210.5   7.4
  3. target.server.com                        0.0%  1500    1.1   0.8   0.6   3.0   0.2


We have two servers with an ISP and occasionally see timeouts between them when server-frontend requests data from server-backend.

The application logs show all is fine with the backend server but the frontend logs show a timeout over the network connection within the timeframe of the mtr trace.

Anyway, we run mtr in continuous batches of 1500 packets to try and discover where the dropouts are occurring and managed to catch the above output.

To me, this indicates an issue with some.router.provider.net: it is either faulty or dropping mtr packets due to load. Either way, it's under load.

The ISP is saying this proves nothing because the last hop is showing no packet loss.

The question is, what is mtr actually showing here and is it useful or not in trying to determine why the end to end network timeout is happening?

Thanks
BT
Question by:brothertom
7 Comments
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 38857034
It depends on network topology. Is some.router.provider.net within the ISP? Can you post a diagram? (ASCII art will do.)
 

Author Comment

by:brothertom
ID: 38857078
yes, within ISP network, both servers being at same ISP but on different networks

server1 > isp router > server2
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 38858436
So are you saying there is zero loss server1 <==> server2 but 10% loss server1 <==> router?

Author Comment

by:brothertom
ID: 38858860
That is what it seems to be showing.
 
LVL 35

Assisted Solution

by:Duncan Roe
Duncan Roe earned 2000 total points
ID: 38861502
I really don't think the tool is helping you a lot. End-to-end response is what counts, and MTR reports it is fine. But at the application level you are experiencing timeouts. That is what you said, right?
You need to run tcpdump or your favorite tool to determine whether the timeouts are associated with TCP retries. If not, the problem is at a higher level.
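
To make the end-to-end point concrete, here is a minimal Python sketch that reads the report from the question and marks which hops show loss that also appears at the final hop. The 1% threshold and the function names are assumptions for illustration, and comparing each hop only against the last hop is a rule of thumb, not a diagnosis: a router may answer probes addressed to itself at low priority while still forwarding traffic normally, and a 1500-packet average can hide a short burst.

```python
import re

# Report copied from the question (host names were already obscured there).
REPORT = """\
HOST: servername                             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. x.x.x.x                                  0.1%  1500    0.7   1.6   0.4 181.6  11.7
  2. some.router.provider.net                10.9%  1500    0.9   1.1   0.6 210.5   7.4
  3. target.server.com                        0.0%  1500    1.1   0.8   0.6   3.0   0.2
"""

# Matches "  2. host   10.9%  1500 ..." and captures hop number, host, loss, sent.
HOP_LINE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%\s+(\d+)")


def parse_hops(report):
    """Return a list of (hop_number, host, loss_percent) tuples."""
    hops = []
    for line in report.splitlines():
        match = HOP_LINE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def classify(hops, threshold=1.0):
    """Label each hop by whether its loss is also visible at the final hop.

    threshold is an assumed cut-off in percent; tune it to taste.
    """
    final_loss = hops[-1][2]
    notes = {}
    for _number, host, loss in hops:
        if loss < threshold:
            notes[host] = "ok"
        elif final_loss >= threshold:
            notes[host] = "loss reaches destination"
        else:
            notes[host] = "loss not seen at destination"
    return notes


if __name__ == "__main__":
    for host, note in classify(parse_hops(REPORT)).items():
        print(f"{host}: {note}")
```

For the trace above, hop 2 comes out as loss that is not seen at the destination, which is why the trace on its own does not settle the argument with the ISP either way.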
 

Author Comment

by:brothertom
ID: 38872517
We monitor (via Nginx logs) the time taken for each call to the backend.
Generally we're looking at 4-6ms but during the times when MTR is showing timeouts in the middle of the route, we either get 200-2000ms shown or complete failure.

This would appear to indicate that the slowness or failure is due to congestion on the network, and according to the MTR trace the congestion would also appear to be at this middle routing stage.

The tcpdump tool is a good idea, but these timeouts only occur for a few minutes every 2-3 weeks, so they are tricky to capture unless we can set up tcpdump to run continuously and only save traffic that is taking a long time. That sounds like it would load the machine up quite a bit.
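
One way to approach "only save stuff that is taking a long time" is to let the capture run and use the call timings already being logged to decide which time ranges are worth keeping. The Python sketch below groups slow calls into incident windows. It assumes the timings have already been extracted from the Nginx logs as (timestamp, duration in ms) pairs sorted by time; the 200 ms threshold comes from the figures above, while the 60-second grouping gap and the function name are made up for illustration.

```python
from datetime import timedelta


def slow_windows(samples, threshold_ms=200.0, gap=timedelta(seconds=60)):
    """Group slow calls into (start, end) incident windows.

    samples: iterable of (datetime, duration_ms), sorted by time.
    A slow call within `gap` of the previous slow call extends that
    window; otherwise it starts a new one.
    """
    windows = []
    for when, duration_ms in samples:
        if duration_ms < threshold_ms:
            continue  # normal call, ignore
        if windows and when - windows[-1][1] <= gap:
            # Extend the current window to include this slow call.
            windows[-1] = (windows[-1][0], when)
        else:
            windows.append((when, when))
    return windows
```

Capture files whose time range overlaps one of the returned windows are the ones to keep; everything else can be deleted.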
 
LVL 35

Accepted Solution

by:Duncan Roe
Duncan Roe earned 2000 total points
ID: 38873726
How about running tcpdump -w output_file -C file_size? (The file names are the wrong way round for logrotate, so you need to clean them up manually.)
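
As a rough sketch of that suggestion, the Python helper below builds a tcpdump command line for a continuous capture. -C takes a file size in millions of bytes and starts a new savefile each time the limit is reached; adding -W caps the number of files so the oldest is overwritten, which keeps disk use bounded without any manual clean-up. The interface name, host name, output file, sizes, and the 96-byte snap length are all assumptions for illustration, not values from this thread.

```python
import shlex


def tcpdump_ring_command(interface, peer_host, out_file, file_mb=100, file_count=20):
    """Build a tcpdump argument list for a rotating capture.

    All default values here are illustrative assumptions.
    """
    return [
        "tcpdump",
        "-i", interface,        # capture on this interface
        "-n",                   # do not resolve addresses to names
        "-s", "96",             # keep only the first 96 bytes of each packet
        "-C", str(file_mb),     # rotate after this many million bytes
        "-W", str(file_count),  # keep at most this many files, then overwrite
        "-w", out_file,         # base name of the savefiles
        "host", peer_host,      # capture filter: traffic to or from the peer
    ]


if __name__ == "__main__":
    # Hypothetical values; print the command rather than running it.
    cmd = tcpdump_ring_command("eth0", "server-backend", "backend.pcap")
    print(shlex.join(cmd))
```

Limiting the capture to traffic with the other server and truncating packets to their headers should keep the load on the machine modest, though that is worth measuring on the actual host.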