I'm after some insight into the following MTR trace (host names obscured to protect the guilty ISP):
HOST: servername               Loss%   Snt  Last   Avg  Best  Wrst StDev
  1. x.x.x.x                    0.1%  1500   0.7   1.6   0.4 181.6  11.7
  2. some.router.provider.net  10.9%  1500   0.9   1.1   0.6 210.5   7.4
  3. target.server.com          0.0%  1500   1.1   0.8   0.6   3.0   0.2
We have two servers hosted with an ISP, and we occasionally see timeouts between them when server-frontend requests data from server-backend.
The application logs on the backend server show nothing wrong, but the frontend logs show a timeout on the network connection within the timeframe of the mtr trace.
Anyway, we run mtr in continuous batches of 1500 packets to try to discover where the dropouts are occurring, and we managed to catch the above output.
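For anyone wanting to reproduce the batching: something like `mtr --report --report-cycles 1500 target.server.com` run in a loop should produce one report per batch. Below is a rough Python sketch of how each saved report could be scanned for hops showing loss. The names `parse_report` and `lossy_hops` and the 1% threshold are made up for illustration, and the parser assumes rows shaped like the trace above (it also tolerates the `|--` prefix that real mtr report output puts after the hop number).

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    index: int
    host: str
    loss_pct: float
    sent: int
    worst_ms: float


def parse_report(text: str) -> List[Hop]:
    """Parse hop rows of the form: N. host Loss% Snt Last Avg Best Wrst StDev."""
    hops = []
    for line in text.splitlines():
        fields = line.replace("|--", " ").split()
        # Skip the header and anything that is not a numbered hop row.
        if len(fields) < 9 or not fields[0].rstrip(".").isdigit():
            continue
        # The last seven fields are the numeric columns; the rest is the host.
        loss, snt, _last, _avg, _best, wrst, _stdev = fields[-7:]
        hops.append(
            Hop(
                index=int(fields[0].rstrip(".")),
                host=" ".join(fields[1:-7]),
                loss_pct=float(loss.rstrip("%")),
                sent=int(snt),
                worst_ms=float(wrst),
            )
        )
    return hops


def lossy_hops(hops: List[Hop], threshold_pct: float = 1.0) -> List[Hop]:
    """Return hops whose reported loss is at or above the (arbitrary) threshold."""
    return [h for h in hops if h.loss_pct >= threshold_pct]


SAMPLE = """\
HOST: servername Loss% Snt Last Avg Best Wrst StDev
1. x.x.x.x 0.1% 1500 0.7 1.6 0.4 181.6 11.7
2. some.router.provider.net 10.9% 1500 0.9 1.1 0.6 210.5 7.4
3. target.server.com 0.0% 1500 1.1 0.8 0.6 3.0 0.2
"""

if __name__ == "__main__":
    hops = parse_report(SAMPLE)
    for hop in lossy_hops(hops):
        print(f"hop {hop.index} ({hop.host}): {hop.loss_pct}% loss of {hop.sent} sent")
    print(f"final hop loss: {hops[-1].loss_pct}%")
```

This only tabulates what the report says. It doesn't by itself settle whether loss reported at a hop affects traffic passing through that hop, which is really what I'm asking about.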
To me, this indicates an issue with some.router.provider.net: either it has a fault, or it is dropping mtr packets due to load. Either way, it's under load.
The ISP is saying this proves nothing because the last hop is showing no packet loss.
The question is: what is mtr actually showing here, and is it useful or not in trying to determine why the end-to-end network timeout is happening?