Honest bandwidth testing

I suspect this question will go sideways, because I often get replies saying that my questions aren't clear (usually from the same people), but I'll give it a try.

I have a customer who is experiencing a ton of weird problems with their broadband connection, which they use heavily for VoIP and development work. They called me in to figure out what is going on, and in the month or so that I've been checking, I see slow, sluggish speeds but good ping results.

When these problems happen, I use a number of tools to try to figure out what's going on, including connecting directly to the router to eliminate the local network as a source of the problem.

I test the bandwidth using (baloney) bandwidth services such as speednet.net, speedof.me and others, and see speeds of around 2 to 3 Mb/s (this is a 50 Mb/s broadband connection), quite often dropping down to KB/s or even Kb/s. I run these tests both from a browser and from the command line of a Linux box connected to the network.

I run mtr simultaneously and find that ping times are great, but there is random packet loss of 50% and up.
I know this isn't ICMP being rate-limited, because the loss doesn't appear when things are working fine.

The ping tests always go to the same server on the internet, so that I have a reference of some sort, and that server is on a solid data center connection.
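
One way to turn these spot checks into evidence is to record the loss figure on a schedule, so it can be lined up against the times the connection was unusable. The following is a minimal sketch, assuming a Linux or macOS `ping` that prints the usual "N packets transmitted, M received, X% packet loss" summary line; the function names are made up for illustration and the host passed in would be the reference server described above.

```python
import re
import subprocess
from typing import Optional

# Matches the summary line printed by common ping implementations, e.g.
# "10 packets transmitted, 5 received, 50% packet loss, time 9013ms"
LOSS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)


def parse_loss(ping_output: str) -> Optional[float]:
    """Return the packet loss percentage from ping's summary, or None if not found."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(3)) if match else None


def sample_loss(host: str, count: int = 20) -> Optional[float]:
    """Run the system ping once against host and return the measured loss in percent."""
    result = subprocess.run(
        ["ping", "-c", str(count), host], capture_output=True, text=True
    )
    return parse_loss(result.stdout)
```

Calling `sample_loss()` every minute or so and appending the result with a timestamp to a file gives a loss history that is harder to dismiss than a single screenshot.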

The provider constantly blames the customer and says that everything looks fine. Obviously, things aren't fine but there's no way of proving otherwise. Pretty frustrating that it's put on the customer but it is and they are the only option available so not much for choices.

The sluggishness is so bad that web pages sometimes take 30 to 60 seconds to fully display. VoIP connections are constantly lost, and remote work is almost impossible because ssh connections keep breaking.

I am looking for suggestions for other tests I could run that would help me identify the problem and, ideally, give the provider proof they cannot deny, or at least help them figure out where and what the problem is and fix it.
projects Asked:

JustInCase Commented:
There is no such thing as honest bandwidth testing. :)
The old adage says: "All testings are at least polluted by testing itself."
But let's try to make some sense of it.
The first thing I would do is configure NetFlow (on the WAN port) and SNMP on at least the edge device. Besides those two, you should check the syslog messages on the edge device. There is no information about your edge router in the post.
That should give you a really good picture of your edge device and bandwidth.
NetFlow can provide you with bandwidth usage (for all devices), top talkers and much more.
SNMP can provide you with details about device CPU, memory usage, etc.

The problem (or at least part of it) with using speednet.net, speedof.me and others is that you are measuring the bandwidth available to your device, not the total bandwidth used by all network devices. If other users are already using 45 Mb/s, you will see at most about 5 Mb/s in your results.
If the router has no free memory, or its CPU is at 100% utilization, packets are dropped, either because there is no room in the buffer for new packets or because there are no free CPU cycles to process them. TCP sessions then go wild, retransmitting the same packets many times, and so on. So you can get the same symptoms as when there is no available bandwidth.
"The provider constantly blames the customer and says that everything looks fine. Obviously, things aren't fine but there's no way of proving otherwise. Pretty frustrating that it's put on the customer but it is and they are the only option available so not much for choices."
Maybe the ISP sees everything as OK: 50 Mb/s of bandwidth delivered to your customer and no packet loss on its side. But the ISP does not care whether you actually need 100 Mb/s to support your needs, or whether you have an edge device that can't keep up with your customer's needs.
Network monitoring can provide you with such data, so you can tell the customer the root cause of the problem (either locally, meaning you need a better router, or point the finger at the ISP and present data to support it).
I use PRTG, but you can use any other network monitoring software for this. In PRTG, 100 sensors are free, and that should be enough for you to find the root cause of your problem.

Dave Baldwin (Fixer of Problems) Commented:
I know that some of the hosts on http://www.speedtest.net/ cannot keep up with my internet connections.  There was one that showed 2 Mbps download and 6 Mbps upload.  When I switched to a different host, I got my 'true' connection speed of 100 Mbps+ for download and 10 Mbps for upload.

The other problem with speed tests is that the rated speed is with no other traffic.  If you are running two continuous tests at once, one is taking bandwidth from the other.

Web browsing speed is very dependent on the web site.  I know of several sites that take forever to load because they load too many files and run too many tracking scripts.  Traffic to major sites like Yahoo can't be used as a measure of the performance of your network because their traffic varies too much.

The first thing you have to determine is the traffic on the connection to the ISP.  If it is at 100% utilization, then they need a faster connection.  If it is not, then you have to determine what is causing the slowdown.

Maidine Fouad (Engineer) Commented:
Your questions are clear, but the source of the problem is not ^^

As for suggestions, from another perspective:

I would first make a visual map of the network, divide this bird's-eye view of the network into areas (devices, internal network, external network ...), and troubleshoot from one area to the next until you are 100% sure the problem is not coming from that area.

Then, some things to check that might cause it:

Check the QoS configuration. Incorrect QoS can cause jitter (packets get played back late or arrive too late).

A flapping link can cause convergence problems for routing protocols in switch-based networks; packets get dropped automatically if no valid alternate path is available.

In rare cases, packet loss could be due to a problem with a Digital Signal Processor (DSP) in a hardware-based MCU. Or maybe the hardware in use can't support a heavy business load, or the ISP connection is not business class, or it is being used at 100% as @Dave Baldwin said, which really might be the case.

Each vendor has its own mechanism for reporting packet loss, so depending on the type of equipment they use on their network, you should check the corresponding troubleshooting software. Also, the tools that @Predrag Jovic suggested are solid and will do the job.
projects (Author) Commented:
The only problem is, you're all talking about this from the network's perspective, while I'm talking about it from the provider's perspective.
I can test everything possible internally by disconnecting the network from the router, and all is fine: nothing is overused, and it is not a problem of multiple users sharing the bandwidth.

One thing I should have mentioned, and wasn't able to edit in since there were already replies, is that this is a 'business' connection from a cable company, over DOCSIS 3. On dedicated T1 connections and the like, it is simple to diagnose and to contact the provider so that we can test together, but when customers have cable connections, this is when these kinds of problems show up.

Ben Cook Commented:
The gateway router between the high-bandwidth LAN and the lower-bandwidth WAN is the device that has the most influence over QoS. Even if your ISP is crap, a good traffic-shaping setup will be able to improve latency during congestion, especially if the clients are willing to sacrifice some bandwidth to assure more predictable latency. Since the clients use VoIP, a QoS configuration really should already have been in place.

You need long-term stats from the router, using NetFlow or anything else that can give us some idea of what is happening.


What router do they have?

projects (Author) Commented:
Again, I am not asking about the local network. I always have plenty of details for the local networks that I monitor.
What I'm trying to explain is that customers use all kinds of different connections. The easiest ones to test are dedicated traditional circuits, which are point to point. So long as I can consistently prove that there is no problem point to point, I know the problem lies from the provider onward, and because these circuits are under SLA, I can get support involved.

However, when it comes to silly DSL and cable companies, the level of professionalism is nonexistent. These companies sell so-called business services using the same infrastructure as the house next door, and everyone usually experiences the same problems.

It is with these providers that I need more proof to make my case, otherwise they almost always say 'everything is ok' when it is not. The only time they don't say that is when they *finally* figure out there is a problem in the area and get to work on it.
A dedicated circuit typically ends at one of the provider's POPs and then goes on to one of their central locations, while cable and similar services are already shared within the area, bouncing through amps and switches before reaching the central location, where they are shared further with everything else the provider has going on.

Cable companies have no business offering so-called business services, because those services run on the same infrastructure as everything else they sell, the only difference being that they respond slightly faster than they do to consumer calls.

All of the above said, when I have to troubleshoot an SLA-based circuit, there is always someone at the provider end who cares to help me. When I have to troubleshoot one of these joke business services, they almost never admit to problems until I can give them proof. I can gather all kinds of information, but the one thing I cannot accurately get is a real bandwidth test when the connection is one of these joke business connections.

So again, please, do not think *inside* the network, that is not the issue.

JustInCase Commented:
SNMP and NetFlow are the way to gather info about the real bandwidth on your WAN port.
[Image: SNMP data from PRTG]
Explanation of graph:
I was not home, but the computers were powered on, so there is some minimal traffic. After reading your post I downloaded something (a torrent), so you can see that real-time data on my network is being gathered, and you can see that my internet speed is 6/1 Mb/s.
:)

Maidine Fouad (Engineer) Commented:
"Thinking outside the network", if you want a decent real-world test, here is what I would suggest:

Testing with ICMP doesn't always give correct results.

Some routers are programmed to give lower priority to ICMP packets, so that no processing power is spent on ICMP instead of real traffic. ICMP can therefore show high loss without that necessarily meaning real traffic is slowed down; their router may be busy, or perhaps they dropped the ICMP packets (which is not good)...

Or they are mitigating denial-of-service attacks and not responding to ICMP...

Some suggestions:

Check what the SLA signed between the ISP and your client covers. (If it doesn't cover connectivity to the rest of the internet as well as to the ISP itself, why bother testing at all?) You could negotiate with the ISP to lower your level of service, keep them as a backup or leave them once the contract is over, and find a better provider for your client.

As for testing, you can send big pieces of data that can't be cached (I would pick 150 MB) between at least 4 different testing points, repeatedly, over a long period of time, then generate some graphs from that data and show them to the ISP.

You could test against their upstream as well.
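
The repeated large-transfer test suggested above can be sketched roughly as follows. This is an illustration only: it assumes there is a large file on a server you control (the URL passed in is whatever you choose), it uses only the Python standard library, and the function names are invented for the example.

```python
import time
import urllib.request
from typing import Tuple


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Convert bytes transferred and elapsed seconds into megabits per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def timed_download(url: str, chunk_size: int = 65536) -> Tuple[int, float]:
    """Download url, discard the data, and return (bytes_received, elapsed_seconds)."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url, timeout=60) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


def csv_row(timestamp: float, url: str, num_bytes: int, seconds: float) -> str:
    """Format one log line: UNIX time, URL, bytes, seconds, Mb/s."""
    rate = throughput_mbps(num_bytes, seconds)
    return f"{timestamp:.0f},{url},{num_bytes},{seconds:.2f},{rate:.2f}"
```

Running `timed_download()` against several test points every few minutes and appending `csv_row()` output to a file yields data that can be graphed over days or weeks.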

projects (Author) Commented:
As mentioned above, I first check over a period of time to get a baseline, so that I know whether ICMP is being limited or not. In the cases I'm talking about, ICMP is not an issue.

Also as mentioned above, SLA circuits are not a problem, I can always get help when there is a problem with those, because they are SLA.

@Predrag Jovic;
Yes, I understand this but again, I do not have any access or details from a network such as a cable modem one. There is no router I can test against as that would never be allowed. I can only pick a known point on the internet and test against that, where I control things.
Also, your test is expecting to be between the LAN/Router or Router/WAN and I don't always have that access.

projects (Author) Commented:
The question is basically, what to do in order to get as real results as possible (hence, honest) when you don't have control of the network, you only have control from inside of it to some external point and need to see in between and pings/traceroute aren't helping to know why there is a *bandwidth* problem.

JustInCase Commented:
"Also, your test is expecting to be between the LAN/Router or Router/WAN and I don't always have that access."
No, you can configure network monitoring on the device of yours that faces the ISP's router (wherever the last link is: the port on a router or switch under your control, where your client sends data to the ISP's equipment) and on one host that will gather the data. No big mystery here. You can get that data whenever you want, since it is stored on that host. You can set up monitoring to collect data from all of your devices; I am just saying that your exit point will give the answer to your problem (either the link is overloaded or the ISP is the problem).
"The question is basically, what to do in order to get as real results as possible (hence, honest) when you don't have control of the network, you only have control from inside of it to some external point and need to see in between and pings/traceroute aren't helping to know why there is a *bandwidth* problem."
Again, the answer is SNMP (and NetFlow). All of this monitoring is done only on my home edge router; it could also be an L3 switch or any device that supports SNMP. I don't do anything with the ISP's modem. This measures traffic on a device under my control and has nothing to do with the ISP's equipment.

To better explain (I hope): you can get a similar result from several measuring points. I would get almost the same result by measuring traffic on interface VLAN10, since all my home devices are in VLAN10 and there are currently no active devices in other VLANs, so the result would be almost the same as the amount of traffic measured on the WAN port. You just measure the amount of traffic through a specific point in the network, a single port (in both directions).

That is the most honest bandwidth testing that you can get.
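
For readers who have not worked with interface counters: monitoring tools derive these bandwidth graphs from octet counters such as `ifInOctets` (32-bit) or `ifHCInOctets` (64-bit) in the standard interfaces MIB, polled at a fixed interval. A minimal sketch of the arithmetic, assuming two counter readings have already been fetched by some SNMP tool (the fetching itself is not shown, and the function names are invented for the example):

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Octets counted between two samples, allowing for a single counter wraparound."""
    modulus = 1 << bits
    return (current - previous) % modulus


def rate_mbps(previous: int, current: int, interval_seconds: float, bits: int = 32) -> float:
    """Average rate in megabits per second between two octet-counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(previous, current, bits) * 8 / interval_seconds / 1_000_000


def utilization_percent(measured_mbps: float, link_mbps: float) -> float:
    """Share of the nominal link speed (for example 50 Mb/s) that is in use."""
    return 100.0 * measured_mbps / link_mbps
```

A 32-bit counter can wrap more than once between polls on a fast link if the polling interval is long, which this sketch cannot detect, so a short interval or the 64-bit counters are the safer choice.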
:)
Maidine Fouad (Engineer) Commented:
@projects Sorry, this page was loaded before you answered, so I didn't get to read your reply.

I still cannot believe how a dedicated t1 on docsis3.1 can have a low bandwidth...

"The question is basically, what to do in order to get as real results as possible (hence, honest) when you don't have control of the network, you only have control from inside of it to some external point and need to see in between and pings/traceroute aren't helping to know why there is a *bandwidth* problem."

This is like a security audit ^^...

"you only have control from inside of it to some external point"

Can you be more precise? How much control do you have, and over what (that external point)?

Also, can you please give us a small sample of the data collected, if you do start?

projects (Author) Commented:
There is no more information to give. I've presented an average situation. For example, sometimes I have access to the customer's firewall/router and even communicate with the provider on their behalf. In those cases, I can usually find the problems and resolve the issues.

The main times when I cannot are when the customer only has, say, a cable modem and I know the problem isn't on the network side.
I'm not sure why anyone would ever say "I still cannot believe how a dedicated t1 on docsis3.1 can have a low bandwidth..." because any kind of provider can have problems, and cable providers are the very worst of all.

There are no samples to give either as they are simply ping times and bandwidth tests.

projects (Author) Commented:
Many good answers, none that really work for me but most certainly could help others. Thanks.