alanterrill

Why should an MPLS be slower when opening databases?

I've had an MPLS link installed between two sites 15 miles apart. I used to use two leased lines joined by a VPN, but I was told that MPLS technology was much faster because it eliminated the WatchGuard units we had and ran over a private line rather than the general internet. In practice it's no faster at all.
Previously users connected to the other site via a terminal server and opened their databases from there. I was hoping that I'd be able to do away with one of the terminal servers and save the users having to log on twice whenever they used a program at the other site. When I tried opening some of the databases directly over the P-WAN they were noticeably slower than when used locally. Both leased lines are 10Mb, so I thought I would try changing the speed of the network card on one PC down to 10Mb to see how slow that would be. It was a little slower than when using the default 1Gb speed, but not much. For example, the slowest database we have is a personnel system that uses Pervasive SQL. On a 1Gb connection it takes 10 secs to open. On a 10Mb PC it takes 15 secs to open. From a 1Gb PC at the other site over the P-WAN it takes 4 minutes to open.
The comms company who supplied the link have done all sorts of tests and assure me it's running at its correct speed and that there is nothing in their firewall restricting any internal traffic. But the database company has hundreds of sites and has never heard of this before. It is also slow with a more modern fundraising system running on SQL Server 2008, although nowhere near as bad as the personnel system.
Can anyone explain why a 10Mb MPLS should be so much slower than a 10Mb PC running locally? No one has been able to give me any explanation so far.
Infamus

Have you verified that you are actually getting 10Mb? I would look at both ports on the switch and the router first to see if there are any errors. You can also run an iPerf test with your service provider to verify the speed, or you can run it between the sites.

ASKER

Thanks - yes, we are getting 10Mb; it's been verified with iperf and Wireshark. I've also checked the switches and there are no errors. I don't have access to the routers but I'm told there are no errors showing there either. For transferring files or opening Word documents it's fine. It's just databases.
Where is the database server located? (Sorry I don't quite understand what P-wan means)

Is it located in one of the MPLS sites or is it hosted over internet?

How are site A and site B accessing the internet?
Craig Beck
Are you testing over the MPLS link when there's no other traffic on it?

What's the MTU over the link?  If you do a ping like this...

ping <IPATREMOTESITE> -f -l 1472

...what do you get?
stop following me around....

:P
Can't help it... you keep getting here first ;-)
We have two sites, let's say A and B. There's a 10Mb leased line to each site and these lines are joined together with an MPLS (also referred to as a P-WAN, private wide area network) by the comms company. There's a breakout to the internet at the comms company's end.
Yes, these tests have been carried out in the evening when no one is using the line.
We have servers at both sites holding databases at the site where the majority of users are. So our fundraising team are at site B so their database is held there, whereas the finance team live at site A so their database is held at A. But two of the finance team spend all day working with fundraising income so they access site B through the MPLS.
Running that ping line gives me 25ms.
Can anyone explain why a 10mb mpls should be so much slower than a 10mb PC running locally?

A 10Mb MPLS is shared and is a single 10Mb pipe, whereas a 10Mb PC goes through your backbone (switch), which has a much bigger pipe than 10Mb, so the full 10Mb of traffic will flow through.

Just a quick explanation.
Now we're getting closer - could you expand on that? '10Mb MPLS is shared'? Shared with who or what? I thought the point was that we had exclusive use of this 10Mb pipe?
Sure, a 10Mb PC is going through a 1Gb switch, but would that make a difference? Surely the switch can't make it any faster than 10Mb?
MPLS circuit is shared by your users.

The switch has a throughput that can handle more than 1Gb (even though the switch has 1Gb ports, it doesn't mean the switch itself can handle only 1Gb). That is why I referred to "pipe" as an example.

I just wanted to explain this to answer your question above.

Why it is so slow is still in question, or maybe it is just normal for the bandwidth you have.

If you have SNMP enabled on the switch, you can monitor the port that is connected to your MPLS router using a real-time bandwidth monitor. You can see how much bandwidth is utilized in real time when a user is accessing the database.
Also, a PC set to 10Mb will fully utilize 10Mb through the switch.
What was the speed of the private line?

Is your vendor applying some type of QOS on the MPLS link?
I would agree with giltjr.  The MPLS probably uses a QoS policy which is classifying your database traffic as bulk data or some other lower-priority class, along with a policing policy.
I don't think this is a bandwidth issue; it takes 10 sec to open the database on a 1Gb network locally.

The author used a terminal server connection before the MPLS upgrade, and obviously the terminal server is located locally, so they didn't have this slowness issue.
How big is the database in bytes?


Infamus, I missed the fact they used to use term serv.

However, after reading the problem again, they were using terminal servers and he was hoping that by getting an MPLS link he could do away with "one of the terminal servers".

-->  "I was hoping that I'd be able to do away with one of the terminal servers and save the users having to logon twice whenever they used a program at the other site."

So I am assuming now they are trying to open the database directly from computers located at the site where the database is NOT. That is SITE1 users, SITE2 DB. Users are trying to open the DB directly from SITE1 and NOT use the terminal server any more.

If that is correct, then bandwidth is definitely the problem.
@giltjr

Correction: the MPLS has no issue and yes, the bandwidth is the problem, because 10Mb doesn't seem to be enough for them to access the database.

What I meant was that they are getting the 10Mb as expected, therefore there is no issue with the service. However, they need more bandwidth to accommodate faster database access from site B to site A.
@Infamus

I misunderstood what you meant.  You are 100% correct.
@giltjr

My first statement was actually wrong and you pointed it out very well.

Thanks!!!
"So I am assuming now they are trying to open the database directly from computers located at the site where the database is NOT. That is SITE1 users, SITE2 DB. Users are trying to open the DB directly from SITE1 and NOT use the terminal server any more.

If that is correct, then bandwidth is definitely the problem. "

Yes, that's exactly what I meant. When we had the two sites linked with a pair of WatchGuard boxes using a VPN we couldn't run these databases, so we installed a terminal server at each site. This worked, but users have to log in to their PC, then log in again to the terminal server, then minimise the remote session when they want to work locally. My aim was to do away with one of the terminal servers to make life easier for my users, keeping one for people who log in from home. The comms company told me that MPLS was much more efficient than a VPN and would be much faster. However, I've not been able to tell any difference in speed with the MPLS, and I'm trying to ascertain whether it's worth spending even more to increase the two lines to 20Mb or 100Mb, or whether there's something about an MPLS that will always mean the databases run slowly and I'll be wasting my money. The comms company can't tell me, and of course signing up for a new line means entering into a 3 year contract I can't get out of.
No network technology can overcome the laws of physics. MPLS is efficient. However, you must understand what a "VPN" is. A VPN is a virtual private network.

VPN technologies include MPLS networks, Frame Relay based networks, X.25 based networks, and maybe a few other networking technologies.  That is, it is a networking technology that makes it look like you have a dedicated/private (that is point-to-point) link between two locations.

However, today when you say VPN, 99.9999% of people think of an encrypted tunnel over the Internet. Yes, an MPLS network is more efficient in most cases than an encrypted tunnel over the Internet. An MPLS network may or may not be faster than an Internet-based VPN; it depends on a lot of factors.

However, an MPLS connection is NOT more efficient than a private/dedicated phone circuit (a.k.a. leased line). It is cheaper in most cases and typically more reliable, but not more efficient. Given the same link speeds, an MPLS-based connection will NOT be faster than a dedicated circuit; in fact it could well be slightly slower due to the overhead of going through an extra layer of networking that a dedicated circuit does not have.
 
The first thing you need to do is see how big the database is, in bytes. Then see what the RAW transfer time would be to open that database. Over a 10 Mbps link, assuming you only get 90% data throughput, you will normally get only about 1 MB per second. If your database is 100MB, then it will take roughly 100 seconds to open. However, some network protocols are not as efficient as others. Example: FTP is way more efficient than CIFS/Samba. So what could take FTP 100 seconds to transfer may take CIFS/Samba 150-200 seconds.
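The raw-transfer estimate above can be sketched in a few lines of Python. This is only a back-of-envelope calculation: the 100 MB size is the hypothetical figure from the post (not the real database size), the 90% efficiency is the post's assumption, and the function name is made up for illustration. Note that the exact result is a little under the "roughly 100 seconds" quoted, because the post rounds 9 Mbps down to 1 MB per second.

```python
def raw_transfer_seconds(size_bytes, link_bps, efficiency=0.9):
    """Best-case time to move size_bytes over a link.

    Ignores latency and protocol chattiness; efficiency is the assumed
    fraction of the link rate that carries useful data.
    """
    usable_bits_per_second = link_bps * efficiency
    return size_bytes * 8 / usable_bits_per_second


if __name__ == "__main__":
    database_bytes = 100 * 10**6   # hypothetical 100 MB database
    link_bps = 10 * 10**6          # 10 Mbps link
    seconds = raw_transfer_seconds(database_bytes, link_bps)
    print(f"Raw transfer time: {seconds:.1f} s")
```

If the real database is far smaller than the amount of data this time would imply, raw bandwidth is unlikely to be the whole story.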
In addition to what giltjr said, you might want to take a look at your application as well.

I'm not a DBA, but taking 10 seconds to access the database on a 1Gb network is pretty strange.

We have SQL and Oracle databases running for multiple applications, and the remote offices with a 3Mb MPLS connection can access the databases just fine.

Instead of increasing the bandwidth, I would recommend taking a look at a WAN accelerator such as Riverbed, as they will let you test it before you buy.
I'm confused now.  The OP said that the database opens in 15s when connected to the same network as the database using a 10Mbps connection, yet over a 10Mbps MPLS (which should be the same in theory) it takes over 4 mins.
On a 10mb PC it takes 15 secs to open. From a  1gb PC at the other site over the P-wan it takes 4 minutes to  open.
That is just screaming queueing or QoS to me.
I agree if the user tested the connection when no traffic is going through the MPLS.

They have a 10Mb connection; with, say, 5 users connecting, that's only roughly 2Mb per connection.

If a PC is set to 10Mb it will use the full 10Mb through a gigabit switch, no?
He already said the test was done when no other users were using the MPLS.

With QoS you can police the traffic, so you could say bulk data can only ever hit a certain limit regardless of the size of the pipe or how busy it is or isn't.
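For anyone unfamiliar with policing, here is a minimal token-bucket sketch in Python of the general idea. It is an illustration only, not the provider's actual configuration: the class name, the 10 Mbps rate and the 15,000-byte burst size are all assumptions chosen for the example.

```python
class TokenBucketPolicer:
    """Toy single-rate policer: packets that exceed the bucket are dropped."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8   # refill rate in bytes per second
        self.burst_bytes = burst_bytes         # bucket depth (largest allowed burst)
        self.tokens = burst_bytes              # start with a full bucket
        self.last_time = 0.0

    def allow(self, now, packet_bytes):
        # Refill tokens for the time elapsed, capped at the bucket depth.
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # a policer drops here; a shaper would queue instead


# Hypothetical burst: 20 full-size packets arriving at the same instant.
policer = TokenBucketPolicer(rate_bps=10_000_000, burst_bytes=15_000)
burst = [policer.allow(now=0.0, packet_bytes=1500) for _ in range(20)]
print(sum(burst), "of", len(burst), "packets passed")
```

The point of the sketch is that a policer can drop part of a burst even when the link is otherwise idle, which is why bursty traffic and steady file copies can behave differently under the same nominal rate.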
I somehow missed that.

Thanks for pointing that out.
He may have been the only user but he also has 25ms round trip latency. Which seems high to me.
The RTT time is just because it's over a MPLS link which may be a long distance away over various routers/switches/etc...

That wouldn't add 4 mins to the time it takes to perform the same function at the local site.
Thanks for all your suggestions. I don't understand some of this, so I've forwarded the whole conversation on to the company who support our network and the company who provide the MPLS. There's much useful information here which I hope will prompt someone to make some more checks on things like QoS, over which I have no control. I should add that I'm a one-man department at a charity and I have to cover everything IT related, so network protocols are a little beyond me, but thanks to you guys I'm learning all the time.
Good luck and hope it gets resolved quickly.
I do realize that the MPLS network path is going to be more than 15 miles (the distance between the two sites), but 25ms is still a bit high to me.

I would not expect that high of a latency unless there was a lot of traffic on the link or as you pointed out QOS is involved.  My vote is some type of  QOS.
What sort of ping figure would you expect? The comms company thought this figure was reasonable.
I've had a reply from them today. Here's what they said:
"I have passed your correspondence to our Network Specialists for further review. I can confirm that, to the best of my knowledge, all the people on Expertsexchange saying that it's a QOS issue are incorrect.

As pointed out by ..., depending on how the database is configured this could be a latency issue that can be resolved by a database/application redesign. If each data request results in the database being re-opened for each access then this can cause the slowness, for example. The MPLS link itself is performing as it should in terms of "agnostic" data access (i.e. data packets are passing over it at full speed, we've proved in testing with you), so the differential performance experienced by database applications versus others is most unlikely to be a result of the nature of the data or the network ports between which it is travelling, and is more likely to be a result of how it's accessed.

I'm really sorry that you are still experiencing these issues, and we will do what we can to help you to discover their cause, but I feel we've reached the stage now where the only way to prove that the MPLS is the direct cause of the problems is to find a way to disprove our assertion that it is not the cause (or vice versa, of course).

To check this you could consider creating a private site-to-site VPN tunnel over the MPLS and accessing the database through it. The VPN tunnel would "disguise" the nature of the data from the MPLS, so eliminating any possibility of QOS or anything similar being the cause. If file transfer by FTP through the tunnel is fast, but database access through the tunnel is slow, then because we know that the MPLS sees both sets of data just as "VPN data" this would confirm that the MPLS is not introducing the delay, and you could confidently go back to your Database provider for a solution.

On the other hand, if database access is fast through this VPN then we would have been proved wrong; but at the same time this would mean that a work-around to slow database access will have been found, so while we were looking for whatever we've been missing to date,".
The latency really depends on the true network path, but I would not expect 25ms for two sites that are 15 miles apart unless their true network path distance is 700 miles.

We have an office in Washington, DC and one in London, England. Over the Internet we get 90ms; when we had a frame relay link it was 70ms. That's about 3,700 miles.

We have a primary and a secondary data center that are about 50 miles apart in a straight line and our ping times are under 5ms.

But back to your problem.  I would suggest:

1) craigbeck suggested doing "ping <IPATREMOTESITE> -f -l 1472" and report back what you get.  If you have not done that, then you should, from both sites.  This will help identify if you are fragmenting packets or not.  If you are fragmenting packets, this can add to the overhead.

2) Run a packet capture on both the server that contains the database and a client while you are opening the database.  Look at where the delays are.
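As a side note on where the 1472 in that ping command comes from: with the Windows ping options, -f sets the Don't Fragment flag and -l sets the ICMP payload size, so the payload has to leave room for the IPv4 and ICMP headers inside the MTU. A small Python sketch of the arithmetic follows; the 1400-byte MTU in the second call is purely a hypothetical example of a smaller path MTU.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_unfragmented_ping_payload(mtu):
    """Largest ping payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(max_unfragmented_ping_payload(1500))  # standard Ethernet MTU
print(max_unfragmented_ping_payload(1400))  # hypothetical smaller path MTU
```

If the 1472-byte ping fails with a "needs to be fragmented" message but a smaller size succeeds, the path MTU is below 1500 somewhere along the link.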
SOLUTION
Craig Beck
This solution is only available to members.
I downloaded a test 100Mb file and tested how long it took to copy it from my pc to the server at the same site, and then to a server at the other site:

With PC at default 1Gb - 8 secs
With PC at 100Mb - 16 secs
With PC at 10Mb - 1 min 59 secs
With PC at 10Mb to server at site 2 - 2 mins 20 secs
From server at site 1 to server at site 2 - 1 min 54 secs
From server at site 2 to server at site 1 - 2 mins 18 secs

So the link is operating at 10Mbps as it should.

For the ping test suggested I get 18ms, TTL=124.
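To put those copy times on a common scale, the sketch below converts each one into an effective throughput in Mbps. It assumes the "100Mb" test file is 100 megabytes (100 x 10^6 bytes), which is a guess; that reading seems more consistent with the roughly two-minute copies than 100 megabits would be. The labels are paraphrased from the list above.

```python
FILE_BYTES = 100 * 10**6  # assumption: the test file is 100 megabytes

# Copy times from the post, converted to seconds.
timings_seconds = {
    "PC at 1Gb, local server": 8,
    "PC at 100Mb, local server": 16,
    "PC at 10Mb, local server": 119,
    "PC at 10Mb, server at site 2": 140,
    "server at site 1 -> server at site 2": 114,
    "server at site 2 -> server at site 1": 138,
}


def effective_mbps(size_bytes, seconds):
    """Average throughput in megabits per second for one file copy."""
    return size_bytes * 8 / seconds / 10**6


for label, seconds in timings_seconds.items():
    print(f"{label}: {effective_mbps(FILE_BYTES, seconds):.1f} Mbps")
```

Under that assumption the copies across the MPLS come out at roughly 5.7 to 7.0 Mbps, in the same range as the local copy with the PC forced to 10Mb, which supports the view that bulk transfers are not where the problem lies.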
Yeah, as I said earlier, it could be the database issue.
So, if the MPLS is good for copying a file from a server at a remote site there's no reason why the database should be displaying different symptoms across it unless there's QoS or some kind of traffic management in play.

Try the VPN - see what it gives you.
"So, if the MPLS is good for copying a file from a server at a remote site there's no reason why the database should be displaying different symptoms across it unless there's QoS or some kind of traffic management in play."

Yes, that's exactly what I thought, but the comms company say there's no QoS in place. I've looked into rigging up the VPN again but even the comms company can't tell me how that would work. Before the MPLS we had two leased lines connected by a VPN. But now those two lines are joined externally to a private loop to the comms company. If I reconnect the VPN I've no idea what might happen, as the traffic will seemingly have the choice of going via the VPN or the MPLS.
SOLUTION
This solution is only available to members.
Oh.. You said that there's no QoS...never mind about disabling...
Here's what the comms company said "I have passed your correspondence to our Network Specialists for further review. I can confirm that, to the best of my knowledge, all the people on Expertsexchange saying that it's a QOS issue are incorrect."

So I don't know if that means there is no QoS or that they don't think it's an issue.

I have run the database with Wireshark running and I've sent the resulting file to the comms company and the database company. Both said it didn't show anything amiss, but I don't know how to interpret it myself. The database company have connected in to my PC and seen the problem for themselves, and they've checked the database settings but were unable to find anything wrong. They've gone off to talk to their SQL contacts to see if they can find anything.
Also, do you have a switch that supports SNMP?

You can download the SolarWinds real-time bandwidth monitor and monitor the port where the MPLS router is connected. See how much bandwidth is utilized when accessing the database.
Thanks - I'll give that a go.
Yes-that's exactly what I thought but the comms company say there's no QOS in place.

So this statement is not true and there is a chance that QoS is still in place?
Interpreting a packet capture can be a bit daunting.   Do you have "full" access to a PC that is in the same building as the database?

If so, what I would suggest is to set that PC's NIC speed to 10 Mbps (as you did on one other computer earlier) and run a packet capture on it.

Then run a packet capture on a PC that uses the MPLS link.

In the filter box, filter on the DB server's IP address and just look at the time difference between the packets. You will want to view the time as the difference between displayed packets (View --> Time Display Format --> Seconds since previously displayed packet).

Going across the MPLS link with a 25ms latency you should see some time difference, but not much.


As to QOS, I would say that based on their response it sounds as if QOS could still be in use. In fact I would almost 00% guarantee that they are using QOS. They may not be applying different QOS levels to YOUR traffic. However, since MPLS networks are shared with other customers, I find it hard to believe that NONE of their customers are using different levels of QOS for their traffic.

Within the MPLS network all customers are still sharing the network, thus sharing bandwidth.
I can pretty much guarantee that if a service provider says they're not running QoS on an MPLS, they're not telling the truth!  Any service provider not running QoS is not a service provider I'd consider using to be honest.
Oops, my last post should have said:

"In fact I would almost 100% guarantee that they are using QOS."
I've tried running Wireshark while the database was opening, but I'm afraid I have no idea what the output means. I tried filtering by the server IP address (i.e. putting ip == 172.16.0.12 in the filter box) but it just gave a blank screen.
Here are the first few lines if it helps:
User generated image
ASKER CERTIFIED SOLUTION
This solution is only available to members.
I've repeated the above test on a PC local to the server, but with the PC's network card switched to 10Mb speed. I'd be interested if you can see the difference from Wireshark's output. Although this should be the same as from my PC over the WAN, this one takes 40 secs to open whereas the remote one takes 4 mins. So why is it so different?
User generated image
It looks like it's all bursty traffic.

I'd be guessing but that would indicate that maybe there is a rate-limit or policing policy on their router which is limiting this across the WAN.
Thank you - it does look like it's a combination of the way our database operates and some sort of policy on the MPLS which slows it further. I've just forwarded portions of our conversation to our comms company and we'll see what they say. Thank you so much for all your help.
I agree that the traffic looks very bursty.

I did make a mistake: the time between packets on the first capture is about 0.01 to 0.02 seconds (1 to 2 hundredths of a second), not 0.10 (a tenth of a second).

However, on the second capture the packets are 0.001 seconds (one millisecond) or less apart. So locally you are getting data at least 10 times faster.

If you still have both captures, open them up and then enter:

ip.addr == 172.16.0.12

In the filter box, this will filter on traffic to/from that IP address only.

Then click on View --> Time Display Format --> Seconds since previously displayed packet.

I am assuming you did the same exact "function" on both clients.  If you did, then both packet traces should look very similar.

This will show you the time between each packet displayed.  You can verify what the difference is.

What you want to look at is the time difference between the client PC sending a packet out and the server responding.  See if there is a noticeable difference, which I would expect to see since going over the MPLS network you have 25 ms round trip latency.
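The "seconds since previously displayed packet" view is just the gap between consecutive packet timestamps. If the timestamps are exported from the two captures, the comparison can be sketched in Python as below. The timestamp lists here are hypothetical values picked to resemble the gaps described above (about 0.02 s remote, about 0.001 s local); they are not taken from the real traces.

```python
from statistics import median


def inter_packet_gaps(timestamps):
    """Return the time gap between each pair of consecutive packets."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical capture timestamps in seconds, not from the real traces.
remote_capture = [0.000, 0.018, 0.037, 0.055, 0.074]
local_capture = [0.000, 0.001, 0.002, 0.003, 0.004]

remote_gap = median(inter_packet_gaps(remote_capture))
local_gap = median(inter_packet_gaps(local_capture))

print(f"remote median gap: {remote_gap * 1000:.1f} ms")
print(f"local median gap:  {local_gap * 1000:.1f} ms")
```

A remote gap that sits close to the measured round-trip time would suggest each request is waiting for the previous reply before the next one is sent.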
I've just received a reply from the Comms company about QOS etc:

1. There is no QoS applied on both ends of the Circuit.
2. There is a rate limiter policy of 10Mb which is what the provide is.

In other words, the service is limited to 10Mbps, but traffic within that 10Mbps is not in any way subject to any limitation or preferential treatment.

I've also tried opening a simple Access database locally with a 10Mb PC and then again opening it at the other site over the MPLS. Again, opening was instantaneous locally, with a very long delay remotely. Wireshark traces look like the same sort of thing you call 'bursty' is happening with Access as well.
So, if this slowness is caused by the time delay over a long site-to-site line, is there any technology that will make this latency shorter? I assume going for a faster line isn't going to make any difference, as the line length will be the same. And going for a different provider isn't necessarily going to make any difference, as they never quote latency times, only the speed, which, as I've found out, is largely irrelevant unless you're copying large chunks of data, which I'm not.
How are they limiting the link to 10M?

Are they doing it by setting the port to 10Mbps, or are they using an actual rate-limit or srr-queue policy to limit?
I don't know, but it's not by the port, as they asked me to set my switch to 100Mb full duplex to match theirs.
SOLUTION
This solution is only available to members.
That's exactly what I'm thinking... they probably have a burst-rate limit set.
Thanks -the ping when there is no traffic is 18ms. I'll ask them about the dropped packets possibility.
O.K., 18ms is much better.

Do you have anything that is monitoring link utilization? If not, I would get something and start monitoring the utilization.

What you need to remember is that utilization is an average over time, not an "exact" amount.

Link utilization at any specific point in time is either 100% or 0%. If you have a 10 Mbps link and data is being sent, it is being sent at 10 Mbps, period.

So if you have an average link utilization of 50% over a 5 minute period, what that really means is that for a total of 2.5 minutes the link was 100% utilized and for 2.5 minutes it was at 0%. Not 2.5 consecutive minutes, but a total: 10 seconds here, 20 seconds there, 5 seconds here and so on, so the total time is 2.5 minutes.
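The same point as a small Python sketch: average utilization is simply the fraction of the measurement window during which the link was busy. The burst intervals below are hypothetical and chosen only so that they add up to 2.5 minutes out of 5.

```python
def average_utilization(busy_intervals, window_seconds):
    """Fraction of the window during which the link was sending at line rate."""
    busy_seconds = sum(end - start for start, end in busy_intervals)
    return busy_seconds / window_seconds


# Hypothetical (start, end) times in seconds within a 300-second window.
bursts = [(0, 10), (40, 60), (100, 105), (120, 235)]

print(f"{average_utilization(bursts, 300):.0%}")
```

A monitoring graph showing 50% therefore says nothing about whether individual bursts were being delayed or dropped while the link was momentarily full.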
I've had another comment on some of the above points from my comms company:

"As far as how the 10Mbps is enforced, naturally it's not by limiting the port to 10Mbps.
It's implemented by a Cisco "police" policy. Full details (and the difference between policing and shaping) are at <http://www.cisco.com/c/en/us/support/docs/quality-of-service-qos/qos-policing/19645-policevsshape.html>, and third-party overviews at <http://blog.habets.pp.se/2010/01/Shaping-and-policing-on-Cisco> & <http://en.wikipedia.org/wiki/Traffic_policing>.

We use policing as it allows us to create separate policing policies for separate traffic, allowing an absolute bandwidth reservation for VOIP traffic, for example, to ensure it is not affected by attempts to transit high volumes of data. In the case of your PWAN the police is applied to all traffic.

Both policing and traffic-shaping can add to latency, but policing less so, and is certainly better than shaping when traffic is below the 10Mbps threshold and when UDP traffic is involved (as it is for VOIP). If you wanted to bypass the effects of the police, setting your own interface to 10Mbps Full Duplex /might/ work, but I'd be prepared to change it back if you see no improvement, as it's definitely not recommended.

The statement "dropping packets is bad" is a bit naive. Dropping packets is bad, but so is queuing them, so there is no ideal solution to ensuring the 10Mbps service is performing at that rate unless you can take steps so that the traffic being sent and received over the link is rate-limited at source.

As far as the database traffic is concerned, if your database application uses "cursors" to deal with one record at a time, rather than requesting the result of queries as a bulk transfer, latency will always be a problem, which is why it would be worth a chat with the database solution vendors to see if they have a solution they can offer."
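A rough way to see why record-at-a-time access is so sensitive to latency is to model the open time as (number of round trips x round-trip time) plus the raw transfer time. The Python sketch below uses figures from this thread (about 15 seconds to open locally at 10Mb, about 4 minutes remotely, 18 to 25 ms ping); the function names and the 5 MB payload are assumptions made for illustration, and the real application may behave differently.

```python
def implied_round_trips(extra_seconds, rtt_seconds):
    """How many sequential request/reply exchanges would explain the extra delay."""
    return extra_seconds / rtt_seconds


def estimated_open_seconds(round_trips, rtt_seconds, payload_bytes, link_bps):
    """Latency cost of sequential round trips plus raw transfer time."""
    latency_cost = round_trips * rtt_seconds
    transfer_cost = payload_bytes * 8 / link_bps
    return latency_cost + transfer_cost


extra = 240 - 15  # remote open time minus local 10Mb open time, in seconds
print(round(implied_round_trips(extra, 0.025)))  # with a 25 ms round trip
print(round(implied_round_trips(extra, 0.018)))  # with an 18 ms round trip

# Hypothetical 5 MB of data fetched in 9000 sequential round trips at 25 ms.
for link_bps in (10 * 10**6, 100 * 10**6):
    seconds = estimated_open_seconds(9000, 0.025, 5 * 10**6, link_bps)
    print(f"{link_bps // 10**6} Mbps link: about {seconds:.0f} s")
```

Under this model, several thousand sequential round trips would account for the extra minutes, and a tenfold bandwidth increase would change the total by only a few seconds because the latency term dominates.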

They have turned off the policing for the last three days and I've retried the database and it makes no difference at all. I think there's no more I can ask of them.

I've also had a reply from the database company:
"SDMS Support has raised the issue with both the Database Provider and the Development Language Provider and that they cannot identify any thing which may be causing the slowness problem. The recommendation is should you wish to be able to access the application from the Telford office you should look at either using Remote Desktop Connections or other Thin Client Technologies."

So my conclusions are:
1) The database is pretty badly written and old fashioned in its use of technology and the people that write it know less than you guys :)
2) The comms company are not doing anything wrong - the line is performing as well as any other 10Mbps line
3) A 10Mb MPLS is not the same as a 10Mb network - latency makes all the difference.
4) Latency will vary depending on where you are in the country relative to the wiring infrastructure
5) When buying an MPLS ask more questions before signing.

I think that draws a line under this conversation but many thanks to all those who've contributed.