piecealava

Extreme latency in traceroute

We use a hosted application for financials and scheduling, and it is very slow.
When I spoke to the host and the software provider, they sent back a traceroute showing latency of over 4000 ms. I then ran a traceroute to a random website (gatorsports.com), and it came back even worse, with latency as high as 4294963389 ms.
I sent these to my ISP, Mitel, which resells XO. We have a single 1.5 Mbps T1. Their answer was that we are overloading the T1.
I am running PRTG and do see bandwidth usage sometimes hit the 1500 cap, but only for short periods and not that often.

So my question is: is the ISP correct, or are they just trying to sell me another T1?
We are considering spending the cash for a second bonded T1, but I want to make sure that additional bandwidth will really solve this before I tell my boss to spend another $400 a month.
Here are the traceroutes.

Tracing route to gatorsports.com [205.139.40.99]

over a maximum of 30 hops:



  1    15 ms  3940 ms    <1 ms  ip67-154-174-65.z174-154-67.customer.algx.net [67.154.174.65]

  2     8 ms  3951 ms    14 ms  ip67-94-210-13.z210-94-67.customer.algx.net [67.94.210.13]

  3  4294963374 ms    13 ms    13 ms  ge5-2-0d502.mar1.seattle-wa.us.xo.net [71.5.183.81]

  4  4294963370 ms  4294963372 ms  3953 ms  ae1d0.mcr2.seattle-wa.us.xo.net [216.156.1.98]

  5  3966 ms  3967 ms    27 ms  vb1811.rar3.sanjose-ca.us.xo.net [216.156.0.197]

  6  4294963389 ms    35 ms  3966 ms  207.88.12.178.ptr.us.xo.net [207.88.12.178]

  7  3967 ms    22 ms  4294963385 ms  206.111.12.170.ptr.us.xo.net [206.111.12.170]

  8  4294963386 ms    28 ms  3961 ms  pr1-so-6-1-0.paloaltopaix.savvis.net [204.70.199.113]

  9  3966 ms    21 ms    28 ms  cr2-pos-0-0-5-0.sanfrancisco.savvis.net [204.70.200.194]

 10    78 ms    77 ms  4015 ms  cr1-pos-0-0-2-0.atlanta.savvis.net [204.70.192.73]

 11    70 ms   208 ms  4107 ms  hr1-te-1-0-0.atlantaat1.savvis.net [204.70.197.157]

 12  4015 ms  4015 ms    76 ms  das11-v3017.at1.savvis.net [205.139.48.98]

 13  4294963437 ms    76 ms  4016 ms  205.139.40.99

Tracing route to 69.74.112.75 over a maximum of 30 hops



  1    15 ms  3939 ms  3938 ms  ip67-154-174-65.z174-154-67.customer.algx.net [67.154.174.65]

  2  3941 ms  3940 ms  3940 ms  ip67-94-210-13.z210-94-67.customer.algx.net [67.94.210.13]

  3  3941 ms  3941 ms  3940 ms  ge5-2-0d502.mar1.seattle-wa.us.xo.net [71.5.183.81]

  4    69 ms    68 ms  4006 ms  vb1810.rar3.seattle-wa.us.xo.net [216.156.0.193]

  5    68 ms    68 ms  4006 ms  te-4-0-0.rar3.denver-co.us.xo.net [207.88.12.82]

  6  4006 ms    92 ms  4006 ms  te-4-1-0.rar3.chicago-il.us.xo.net [207.88.12.21]

  7    69 ms    69 ms  4008 ms  vb24.rar3.washington-dc.us.xo.net [207.88.12.34]

  8  4031 ms  4005 ms  4005 ms  ae0d0.mcr1.newark-nj.us.xo.net [216.156.0.22]

  9  4005 ms  4005 ms    67 ms  ae1d0.mcr1.nyc-ny.us.xo.net [216.156.1.9]

 10  4006 ms  4006 ms  4006 ms  216.55.2.10

 11  4006 ms  4006 ms  4007 ms  64.15.0.10

 12    77 ms  4007 ms  4007 ms  64.15.2.14

 13  4026 ms    68 ms    74 ms  rtr2-gec-2.cst.bthpny.cv.net [64.15.4.118]

 14    96 ms  4032 ms  4024 ms  valiant-comm.cst.lightpath.net [69.27.228.30]

 15     *        *        *     Request timed out.

 16     *        *        *     Request timed out.

 17     *        *        *     Request timed out.

 18     *        *        *     Request timed out.

 19     *        *        *     Request timed out.

 20     *        *        *     Request timed out.

 21     *        *        *     Request timed out.

 22     *        *        *     Request timed out.

 23     *        *        *     Request timed out.

 24     *        *        *     Request timed out.

 25     *        *        *     Request timed out.

 26     *        *        *     Request timed out.

 27     *        *        *     Request timed out.

 28     *        *        *     Request timed out.

 29     *        *        *     Request timed out.

 30     *        *        *     Request timed out.



Trace complete.


Amick

There's not enough information here to make a diagnosis, but the numbers are high and the high variability in RTTs indicates something is wrong.

Try scheduling some traceroutes for the quietest period of the day and see if they change much from what you see here.  You should not see 4 digit latencies, and most should be in 2 digits. (A traceroute from my location to the two destinations you chose showed a worst case latency of 207 ms. With the two highest and lowest latencies thrown out, my average was 19 ms or about 1% of what yours was after completely ignoring everything over 5000 ms as simply too big to believe.)
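
For anyone who wants to repeat that kind of averaging on their own tracert output, here is a minimal Python sketch. It assumes the trimming works like this: ignore anything over 5000 ms, then drop the two highest and two lowest of what remains. The function names, the default thresholds, and the choice to count `<1 ms` as 1 ms are illustrative assumptions, not something tracert defines.

```python
import re

def parse_rtts(tracert_text):
    """Return every round-trip time (in ms) found in Windows tracert output."""
    # Matches "15 ms" and "<1 ms" (counted as 1); "*" timeouts are skipped.
    return [int(m) for m in re.findall(r"<?(\d+) ms", tracert_text)]

def trimmed_mean(rtts, ceiling_ms=5000, trim=2):
    """Drop values above ceiling_ms, then the `trim` highest and lowest, then average."""
    kept = sorted(r for r in rtts if r <= ceiling_ms)
    if len(kept) <= 2 * trim:
        raise ValueError("not enough samples left after trimming")
    kept = kept[trim:len(kept) - trim]
    return sum(kept) / len(kept)

# First three hops of the gatorsports.com trace posted in the question.
sample = """
  1    15 ms  3940 ms    <1 ms  ip67-154-174-65.z174-154-67.customer.algx.net [67.154.174.65]
  2     8 ms  3951 ms    14 ms  ip67-94-210-13.z210-94-67.customer.algx.net [67.94.210.13]
  3  4294963374 ms    13 ms    13 ms  ge5-2-0d502.mar1.seattle-wa.us.xo.net [71.5.183.81]
"""

rtts = parse_rtts(sample)
print(trimmed_mean(rtts))  # 13.75
```

On just these three hops the trimmed figure is about 14 ms, which shows how far the ordinary samples are from the 3900+ ms ones.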

 If you're regularly exceeding 80% network saturation during peak hours, another T1 is probably a good business decision, but I'm not confident that is the entire problem.
Instead of looking at trace routes, a better indication of latency of your T1 (and not anything else) is to ping an IP, such as the next upstream gateway (at your ISP, not the gateway at your location). Observe that for a while. Maybe try ping tests with a larger payload.
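
To make "observe that for a while" a bit more concrete, here is a rough Python sketch that summarises saved ping output. It assumes the usual Windows `Reply from ... time=NNms` and `Request timed out.` lines; the function name, the `suspect` field, and the sample text (which uses the documentation address 192.0.2.1) are made up for illustration.

```python
import re

# Windows ping prints "time=20ms" or "time<1ms"; allow a minus sign so that
# impossible negative values are captured rather than silently dropped.
REPLY = re.compile(r"time[=<](-?\d+)ms")

def summarize_ping(output):
    """Summarise reply times and timeouts from saved ping output."""
    times, lost = [], 0
    for line in output.splitlines():
        match = REPLY.search(line)
        if match:
            times.append(int(match.group(1)))
        elif "Request timed out" in line:
            lost += 1
    return {
        "sent": len(times) + lost,
        "lost": lost,
        "min": min(times) if times else None,
        "max": max(times) if times else None,
        "avg": sum(times) / len(times) if times else None,
        "suspect": [t for t in times if t < 0],  # negative RTTs cannot be real
    }

sample = """
Reply from 192.0.2.1: bytes=32 time=20ms TTL=254
Reply from 192.0.2.1: bytes=32 time=439ms TTL=254
Request timed out.
Reply from 192.0.2.1: bytes=32 time=46ms TTL=254
"""

print(summarize_ping(sample))
```

Running this on captures from a busy hour and a quiet hour gives two summaries that can be compared side by side.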

Busy routers typically don't give high priority to sending the ICMP packets used for the traceroute, so you really can't take latencies shown by traceroutes overly seriously.

However, the latencies shown in your traceroute are outside of any norm. At the same time, there are some more normal (low) values in there too. If the T1 were saturated all of the time, you would never see any low latencies, so we can conclude that the T1 is not constantly saturated. It is of course possible that bandwidth utilization, while fluctuating, is uncomfortably high and peaks at 100% frequently, which isn't much better. By the way, when looking at bandwidth usage, if you're also using the T1 for voice, don't forget to take that into account (each voice channel in use deducts 64 kbps).

At moments when the T1 is saturated, either upstream or downstream, ping times shoot up. If this is happening more than occasionally, it probably makes sense to upgrade.
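
The voice deduction is simple arithmetic, sketched below in Python. It assumes a standard T1 with 24 channels of 64 kbps each (1536 kbps of payload) and that active calls take whole channels away from data; whether this particular T1 carries voice at all has not been stated in the thread, and the function names are illustrative.

```python
T1_CHANNELS = 24      # DS0 channels on a T1
CHANNEL_KBPS = 64     # payload per channel

def data_capacity_kbps(voice_channels_in_use):
    """Bandwidth left for data when some channels are carrying voice calls."""
    if not 0 <= voice_channels_in_use <= T1_CHANNELS:
        raise ValueError("a T1 has between 0 and 24 channels in use")
    return (T1_CHANNELS - voice_channels_in_use) * CHANNEL_KBPS

def utilization(measured_kbps, voice_channels_in_use=0):
    """Measured data rate as a fraction of what is actually available."""
    return measured_kbps / data_capacity_kbps(voice_channels_in_use)

print(data_capacity_kbps(0))            # 1536
print(data_capacity_kbps(6))            # 1152
print(round(utilization(1000, 0), 2))   # 0.65
print(round(utilization(1000, 6), 2))   # 0.87
```

In this example the same 1000 kbps of data traffic moves from 65% to 87% utilization once six calls are active, so a graph that looks comfortable can already be close to the limit.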

I agree with Amick that we don't have the whole picture, and other things may be going on. I suggest trying to do some more troubleshooting - try just the ping test, and also look for things like retransmits, fragmentation, etc.
piecealava

ASKER

What additional information is needed to make a diagnosis?
Also, we may peak at over 80%, but only in very short usage spikes of less than 5-10 minutes.

 usage-graph.docx
Here is one done at midnight with almost no other usage.

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\jfifield>tracert 69.74.112.75

Tracing route to 69.74.112.75 over a maximum of 30 hops

  1   420 ms     1 ms  4294966905 ms  ip67-154-174-65.z174-154-67.customer.algx.net [67.154.174.65]
  2    15 ms  4294966920 ms  4294966920 ms  ip67-94-210-13.z210-94-67.customer.algx.net [67.94.210.13]
  3  4294966919 ms  4294966920 ms  4294966920 ms  ge5-2-0d502.mar1.seattle-wa.us.xo.net [71.5.183.81]
  4  4294967001 ms   108 ms    89 ms  vb1810.rar3.seattle-wa.us.xo.net [216.156.0.193]
  5   108 ms  4294966992 ms  4294966992 ms  te-4-0-0.rar3.denver-co.us.xo.net [207.88.12.82]
  6   102 ms    90 ms  4294966993 ms  te-4-1-0.rar3.chicago-il.us.xo.net [207.88.12.21]
  7  4294966999 ms  4294966994 ms  4294967000 ms  vb24.rar3.washington-dc.us.xo.net [207.88.12.34]
  8  4294966992 ms   136 ms    90 ms  ae0d0.mcr1.newark-nj.us.xo.net [216.156.0.22]
  9   106 ms    90 ms  4294966992 ms  ae1d0.mcr1.nyc-ny.us.xo.net [216.156.1.9]
 10   508 ms    92 ms  4294966991 ms  216.55.2.10
 11  4294966994 ms  4294967000 ms  4294966999 ms  64.15.2.161
 12   171 ms   201 ms   238 ms  64.15.2.14
 13   528 ms  1124 ms   642 ms  rtr2-gec-2.cst.bthpny.cv.net [64.15.4.118]
 14   108 ms  4294967022 ms   131 ms  valiant-comm.cst.lightpath.net [69.27.228.30]
 15     *        *        *     Request timed out.
 16     *        *        *     Request timed out.
 17     *        *        *     Request timed out.
 18     *        *        *     Request timed out.
 19     *        *        *     Request timed out.
 20     *        *        *     Request timed out.
 21     *        *        *     Request timed out.
 22     *        *        *     Request timed out.
 23     *        *        *     Request timed out.
 24     *        *        *     Request timed out.
 25     *        *        *     Request timed out.
 26     *        *        *     Request timed out.
 27     *        *        *     Request timed out.
 28     *        *        *     Request timed out.
 29     *        *        *     Request timed out.
 30     *        *        *     Request timed out.

Trace complete.

C:\Documents and Settings\jfifield>
What if you ping (not traceroute) for example 67.94.210.13 for a while?
Also, you can try this test: http://www.dslreports.com/pingtest

As I'm sure you know, the usage graph shows solid peaks of outbound traffic between 1:00am and about 2:30am (backup?), and of inbound traffic for a good hour around noon. Outside of those hours things should be OK, though keep in mind the graph shows 5-minute averages: you could have multiple short peaks at 100% while the 5-minute average stays well below it.
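
To illustrate how a 5-minute average can hide short peaks, here is a small Python sketch with made-up numbers. The per-second traffic pattern is entirely hypothetical (three 30-second bursts at the full 1536 kbps, otherwise 300 kbps); it is not taken from the PRTG graph, and the function names are illustrative.

```python
T1_KBPS = 1536  # assumed T1 payload rate

def average_kbps(per_second_kbps):
    """What a 5-minute-average graph would plot for this window."""
    return sum(per_second_kbps) / len(per_second_kbps)

def seconds_saturated(per_second_kbps, cap_kbps=T1_KBPS, threshold=0.95):
    """Seconds in which traffic was at or above `threshold` of the cap."""
    return sum(1 for kbps in per_second_kbps if kbps >= threshold * cap_kbps)

# Hypothetical 300-second window: 30 s at line rate, 70 s at 300 kbps, repeated 3 times.
window = ([T1_KBPS] * 30 + [300] * 70) * 3

avg = average_kbps(window)
print(f"graph shows {avg:.1f} kbps ({avg / T1_KBPS:.0%} of the T1)")
print(f"link was saturated for {seconds_saturated(window)} of {len(window)} seconds")
```

With these numbers the graph would show roughly 44% utilization even though the link was full for 90 of the 300 seconds, and any ping or traceroute taken during those seconds would look bad.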

If you are finding problems during off-peak hours (after 6pm is a very good time to test), something other than bandwidth usage may be going on.
C:\Documents and Settings\jfifield>ping -n 25 -l 1500 67.94.210.1

Pinging 67.94.210.13 with 1500 bytes of data:

Reply from 67.94.210.13: bytes=1500 time=-329ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=-372ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=20ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=439ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=-346ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=19ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=46ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=439ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=-346ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=20ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=46ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=46ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=46ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=46ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=46ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=-346ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=19ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=439ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=-346ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=-372ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=19ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=439ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=-346ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=19ms TTL=254
Reply from 67.94.210.13: bytes=1500 time=46ms TTL=254

Ping statistics for 67.94.210.13:
    Packets: Sent = 25, Received = 25, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 19ms, Maximum = -329ms, Average = 171798667ms
The PRTG usage graph indicates that you can probably justify an additional T1.  Of course this is just one 24 hour period, but note the mesas at 0100-0230 and 1200-1330.  That's 12.5% of the day spent in a bandwidth limited condition.  And if you calculate 40 hour weeks, 5*1.5=7.5 hours or almost a day per week for each of 2 shifts.  That kind of productivity loss should be able to justify  the $10-$20 per day that it would take to acquire another T1.  I'm doing a lot of extrapolation from very little data, but if the information presented is truly representative, then there is another T1 in your future.
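
The arithmetic behind those figures can be checked with a few lines of Python. The two 1.5-hour plateaus and the $400 monthly price come from the thread; the 30-day month and 22 working days per month are assumptions used only to turn the monthly price into a daily one.

```python
plateau_hours = [1.5, 1.5]          # 0100-0230 and 1200-1330
share_of_day = sum(plateau_hours) / 24
hours_per_week_per_shift = 5 * 1.5  # one plateau per shift, 5 days a week

monthly_cost = 400                  # quoted price of the second T1, in dollars
per_calendar_day = monthly_cost / 30   # assumes a 30-day month
per_working_day = monthly_cost / 22    # assumes 22 working days per month

print(f"{share_of_day:.1%} of the day is bandwidth limited")
print(f"{hours_per_week_per_shift} hours per week per shift")
print(f"${per_calendar_day:.2f} per calendar day, ${per_working_day:.2f} per working day")
```

Both daily figures ($13.33 and $18.18) fall inside the $10-$20 range quoted above.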

The midnight traceroute still shows a lot of variation, and while it is better, it still isn't good. I think there is something wrong with your configuration, but I can't tell what it is.
Here are the results from dslreports.com

Test (From Central USA)      Loss      Min Latency      Avg Latency      Max Latency      Pass/Fail
basic ping (10s of 40 byte packets, 2 per second)      0%      68.8ms      79.6ms      85.4ms      pass
low bandwidth stream (10s of 512 byte packets at 56 kbit)      0%      76.1ms      84.7ms      110ms      pass
medium bandwidth stream      was not performed
higher bandwidth stream      was not performed
your first hop ping (stream of 40byte pings to 67.94.210.14)      0% loss      71.7ms      Could not estimate first hop speed      pass
Jitter/loss with small packets tested from Central - USA:
Jitter/loss with small packets tested from West Coast - USA:

1 minute MTR (hop loss analysis) from Central - USA
Hop      Host      LOSS      Rcv      Sent      Best      Avg      Worst
0      ae-2.bb-d.slr.lxa.us.oneandone.net      0%      60      60      0.44      3.99      113.24
1      te-3-1.bb-d.ga.mkc.us.oneandone.net      2%      59      60      0.90      1.69      16.99
2      te-2-1.bb-d.ws.mkc.us.oneandone.net      0%      60      60      0.95      2.85      37.41
3      te-3-2.bb-d.cr.chi.us.oneandone.net      0%      60      60      11.52      11.69      17.77
4      p12-3.ir1.chicago2-il.us.xo.net      0%      60      60      11.48      14.64      76.65
5      vb2000d1.rar3.chicago-il.us.xo.net      0%      60      60      12.12      12.53      13.30
6      te-4-1-0.rar3.denver-co.us.xo.net      0%      60      60      64.10      64.66      67.03
7      te-3-0-0.rar3.seattle-wa.us.xo.net      0%      60      60      64.46      65.24      67.35
8      ae0d0.mcr1.seattle-wa.us.xo.net      0%      60      60      63.86      64.45      89.81
9      fe0-0.clr2.seattle3-wa.us.xo.net      0%      60      60      64.31      71.69      268.03
10      psr2429088.z210-94-67.customer.algx.net      0%      60      60      69.61      72.89      85.57
11      (TARGET IP ADDRESS)      0%      60      60      71.72      75.06      87.51
[pass]

1 minute MTR (hop loss analysis) from West Coast - USA
Hop      Host      LOSS      Rcv      Sent      Best      Avg      Worst
0      unknown.Level3.net      0%      60      60      0.68      15.22      204.82
1      ae-33-80.car3.SanJose1.Level3.net      0%      60      60      1.18      17.59      173.73
2      XO-COMMUNIC.car3.SanJose1.Level3.net      0%      60      60      1.34      5.83      103.19
3      vb2000d1.rar3.sanjose-ca.us.xo.net      0%      60      60      1.74      2.23      10.72
4      ae0d0.mcr2.seattle-wa.us.xo.net      0%      60      60      18.72      23.35      100.07
5      fe5-0.clr2.seattle3-wa.us.xo.net      0%      60      60      19.16      25.67      215.43
6      psr2429088.z210-94-67.customer.algx.net      0%      60      60      24.51      30.45      71.23
7      (TARGET IP ADDRESS)      0%      60      60      26.58      32.42      45.80
[pass]
Amick,
When you say something is wrong with my configuration, do you think it is on my side or the ISP's?
ASKER CERTIFIED SOLUTION
Amick
This solution is only available to members.
I agree with Amick, the outside test from dslreports.com does not show the problems your own tests showed. It's hard to tell with the formatting, but is that a "fail" at the beginning? If so, what exactly failed?

Your own ping test shows some negative numbers, which, like the extremely high numbers, are not logically possible. Assuming the traceroute and ping tests were done from the same system, something seems to be off with that machine.
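
One way to see that the huge numbers and the negative numbers are probably the same symptom: the huge values sit just below 2^32 (4294967296), which is what a small negative number looks like when it is printed as an unsigned 32-bit integer. The Python sketch below reinterprets a few of the posted values and reproduces the odd "Average" line from the ping statistics. That the tools use 32-bit arithmetic here is my inference from the numbers, not something confirmed in the thread, and `as_signed_32` is just an illustrative helper name.

```python
WRAP = 1 << 32  # 4294967296

def as_signed_32(value):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    value %= WRAP
    return value - WRAP if value >= WRAP // 2 else value

# Values copied from the posted traceroutes.
for shown in (4294963389, 4294966920, 3940):
    print(shown, "->", as_signed_32(shown))
# 4294963389 -> -3907
# 4294966920 -> -376
# 3940 -> 3940

# The 25 reply times from the posted ping test, in order.
ping_times = [-329, -372, 20, 439, -346, 19, 46, 439, -346, 20,
              46, 46, 46, 46, 46, -346, 19, 439, -346, -372,
              19, 439, -346, 19, 46]

total = sum(ping_times)                          # -609
wrapped_average = (total % WRAP) // len(ping_times)
print(total, wrapped_average)                    # -609 171798667
```

The reproduced average matches the "Average = 171798667ms" line exactly, so the measuring machine appears to be computing negative elapsed times. That points at timekeeping on that PC rather than at real multi-second network delay, though it does not say what the underlying fault is.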

To be sure, you could run the ping test from another machine, and also do some internal ping tests to make sure there are no problems on your internal network.

As for the case for the T1 upgrade, I can't possibly improve on Amick's analysis of that. :)
I tried running the dslreports.com test against the host site in NY, and it came back with fails from the West Coast twice. The East Coast was fine. I am on the West Coast.
Does that suggest the problem is on their end?

1 minute MTR (hop loss analysis) from West Coast - USA
Hop      Host      LOSS      Rcv      Sent      Best      Avg      Worst
0      unknown.Level3.net      0%      60      60      0.66      8.83      135.64
1      vlan60.csw1.SanJose1.Level3.net      40%      36      60      0.68      4.34      12.13
2      ae-64-64.ebr4.SanJose1.Level3.net      0%      60      60      0.94      5.23      13.85
3      ae-2-2.ebr2.NewYork1.Level3.net      0%      60      60      69.17      69.70      70.70
4      ae-72-72.csw2.NewYork1.Level3.net      0%      60      60      69.58      74.27      82.61
5      ae-2-79.edge2.NewYork1.Level3.net      0%      60      60      69.53      73.25      126.42
6      ???      100%      0      60      0.00      0.00      0.00
7      ???      100%      0      60      0.00      0.00      0.00
8      ???      100%      0      60      0.00      0.00      0.00
9      ???      100%      0      60      0.00      0.00      0.00
10      (TARGET IP ADDRESS)      2%      59      60      87.32      99.75      137.55