Strange SQL traffic over WAN (csv file attached)

I'd like some opinions on a Wireshark capture between a web server and a SQL database server that takes place over our WAN (east coast to west coast).  We are troubleshooting slow performance on this web application, which was developed in house.  We already know we have intermittent issues with our WAN connection to this location and are looking into that as well.  I haven't found any issues on the LAN yet.  There are other dependencies, but this seems to be the main bottleneck since it's the only one that crosses the country.

I'm not a packet guru, but I do know the fundamentals of TCP.  I'm seeing a lot of ACKs with no SYNs, and multiple ACKs in a row.  Is this a bad capture?  Also, it's saturated with reassembled PDUs.  The average round-trip time for this connection is 90-100 ms, which I know might be outside the range of tolerance for SQL and could be causing the problem.  Can anyone take a look at this capture and shed some light on what they're seeing?
websrv-dbsrv-cap.csv
Vontech615Asked:
robocatCommented:
Look no further, you can almost certainly blame latency for your performance issues.

The web server and SQL server should really be on the same LAN. Any chance of moving the web server?

Bill BachPresident and Btrieve GuruCommented:
It is almost always easiest to post the entire trace file, if the data is not critical.  This allows you to easily filter traffic down and look at one specific conversation and then build your analysis accordingly.  Sometimes you need to see data that is simply not in the highest level decode, such as sequence numbers.

Let's look at your questions in order:
>I'm seeing a lot of ACK's with no SYN's and multiple ACK's in a row. Is this a bad capture?
While you are correct that the SYN/SYN-ACK/ACK handshake ends with a bare ACK packet, it is ALSO true that ordinary data transfer generates bare ACK packets.  Essentially, whenever the TCP stack receives a data segment, the receive process posts an ACK into the SEND buffer, but it does NOT send it immediately.  Since no outgoing packet is ready, the send process starts a timer and waits.  Subsequent received segments may update the ACK counter (without resetting the timer).  If no data is leaving the receiver within the timeout period, the bare ACK is sent by itself.  This must be done to tell the sender that the data has been received, so that it doesn't start retransmitting.  Only when a data packet happens to be going back to the sender does the ACK piggyback on that packet instead.
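That delayed-ACK behavior can be sketched in a few lines. This is a deliberately simplified model, not any real stack's implementation: the timeout value and the class interface are assumptions, and real stacks additionally send an ACK for at least every second full-sized segment.

```python
# Much-simplified model of TCP delayed ACK. RFC 1122 allows up to 500 ms of
# delay; the 200 ms default below is an assumption for illustration.

DELAYED_ACK_TIMEOUT = 0.2  # seconds (assumed typical stack default)

class Receiver:
    def __init__(self):
        self.next_expected = 0          # next byte we want from the sender
        self.ack_pending_since = None   # when the oldest unsent ACK was queued

    def on_segment(self, seq, length, now):
        """Receive a data segment; queue an ACK instead of sending at once."""
        if seq == self.next_expected:   # in-order data advances the ACK point
            self.next_expected += length
        if self.ack_pending_since is None:
            self.ack_pending_since = now   # start the delayed-ACK timer

    def on_send_data(self):
        """Outgoing data piggybacks the ACK, so no bare ACK is needed."""
        self.ack_pending_since = None
        return self.next_expected       # ACK number carried on the data segment

    def tick(self, now):
        """Timer expiry: emit a bare ACK if one has been pending too long."""
        if (self.ack_pending_since is not None
                and now - self.ack_pending_since >= DELAYED_ACK_TIMEOUT):
            self.ack_pending_since = None
            return self.next_expected   # bare ACK, no payload
        return None

rx = Receiver()
rx.on_segment(seq=0, length=1460, now=0.00)
rx.on_segment(seq=1460, length=1460, now=0.01)  # updates ACK, timer keeps running
assert rx.tick(now=0.05) is None    # too early: ACK still waiting to piggyback
assert rx.tick(now=0.25) == 2920    # timeout expired: bare ACK goes out alone
```

This is exactly the pattern in the capture: data streams one way, no reply data is available to carry the ACKs, so the timers expire and the bare ACKs appear.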

Having said that, let's look at packets 26-47.  Here, we see three packets streaming from the SQL server back to the client (presumably transmitting a large data block around 24K in size, which is then segmented by the network).  You can tell they are streaming due to the low delay between packets.  Packet 32 is an ACK that tells the SQL server that the client machine wants byte 7301 next.  (Again, since the entire TCP details are not shown, I cannot tell you which packet has bytes through 7300 in it.  You'll want to look at the Sequence numbers of each packet to see what packet is being ACKed here.)  Then, two more packets are sent, followed by another ACK, and so on.  Essentially, the server is sending a large block of data to the client, with no data coming in the other direction.  Thus, the bare ACK packets are seen.
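To make the ACK arithmetic concrete: the ACK number is simply the last in-order sequence number plus that segment's length. The segment sizes below are hypothetical (the real ones would come from the trace's sequence numbers), chosen so they add up to the byte 7301 seen in packet 32:

```python
# Hypothetical (seq, length) pairs for the segments preceding packet 32.
# Numbering is 1-based relative, as Wireshark displays it by default.
segments = [(1, 2920), (2921, 2920), (5841, 1460)]   # assumed sizes

next_expected = 1
for seq, length in segments:
    if seq == next_expected:            # in-order segment received
        next_expected = seq + length    # ACK acknowledges everything up to here

assert next_expected == 7301  # matches "wants byte 7301 next" in the capture
```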

>Also, it's saturated with reassembled PDUs.
As per above, the SQL engine is sending out a very large buffer (around 24K).  The analyzer has no higher-level packet data to interpret, so it assumes that each packet is part of the same large PDU.  This is actually very good for a WAN link.  Round-trip network latency will kill the performance of an application, so requesting lots of data and then streaming it in one direction is vastly superior for performance reasons.

You can see this a lot more clearly in packets 146 through 164.  Here, we see the client sending a request to the server.  The TCP packet gets to the server, and the ACK is posted in the buffer.  However, the buffer time expires before the response comes through, so it goes by itself.  The next packet from the server comes some 116ms later.  As such, I expect that your network latency cross-country is around 116ms.  
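A rough way to turn request/response gaps like that into a latency estimate is to collect several of them and take the median, which is more robust than the mean when a few responses also include server-side query time. The timestamps below are made up for illustration; only the ~116 ms gap is from the capture:

```python
import statistics

# Hypothetical (request_ts, response_ts) pairs in seconds, pulled from a
# capture the way the packet 146 -> 164 gap was read above.
pairs = [(1.000, 1.116), (2.500, 2.612), (4.020, 4.141)]

rtts_ms = [(resp - req) * 1000 for req, resp in pairs]
latency_estimate = statistics.median(rtts_ms)

assert 110 < latency_estimate < 125   # ~116 ms, consistent with the capture
```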

Further, I might venture a guess that this trace was taken from the client side (not the server side), as indicated by packets 165/166, which are large 8K packets that the analyzer saw before they were disassembled.  (This disassembly is what causes the ACK storm from 168-182.)  Also, it looks like you might have some sort of WAN accelerator or VLAN connection involved, as your ACKs were coming back from the server MUCH faster than one would expect for a complete round-trip delay.  These devices can also make analyzing data streams a bit more difficult, and sometimes you need to be capturing at the server side at the SAME time to filter out the extra traffic generated by the link itself.


So -- to summarize -- your app is indeed suffering from the cross-country connection.  This is due mostly to the sheer number of requests being sent.  Remember that each one will take 100ms to complete, or only 10 requests per second.  Redesigning the application to use fewer requests (i.e. use stored procedures, use table joins to retrieve related data together instead of several smaller queries, etc.) will improve performance.  Beyond that, though, you may be better served by setting up a terminal server at the server end and letting users run their app on that termserver instead.  The screen updates will go across the wire, but this data will be streaming (i.e. much less delay) and the app won't see any major delays in processing.
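The "only 10 requests per second" arithmetic can be written out explicitly. This is a back-of-envelope model that ignores server processing and transfer time; it only counts the unavoidable round trips of a serial request/response pattern:

```python
RTT_S = 0.100   # ~100 ms cross-country round trip, as measured above

def wall_time(n_requests, rtt_s=RTT_S):
    """Serial request/response pattern: one full round trip per query."""
    return n_requests * rtt_s

# 10 serial requests consume a full second of pure latency.
assert abs(wall_time(10) - 1.0) < 1e-9

# Collapsing 50 small queries into 5 batched calls (stored procedure / join)
# cuts latency cost by 10x, regardless of how much data each call returns.
assert abs(wall_time(50) - 5.0) < 1e-9    # chatty design
assert abs(wall_time(5) - 0.5) < 1e-9     # batched design
```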
robocatCommented:
@BillBach: the OP indicates this is a web application. Makes more sense to move the web server next to the SQL server and let HTTP requests go over the WAN (or to the internet?).

HTTP is designed for high latency and will tolerate the latency much better.
Bill BachPresident and Btrieve GuruCommented:
Yeah, by the end of my tome, I had somehow dropped that fact, as I was instead looking at the CSV data only.  In this case, the "client" is the web server.  This nullifies the suggestion about a Term server as a solution, but the rest of the analysis is still accurate -- and moving the client (in this case the web server) closer to the database will help.
Vontech615Author Commented:
Thanks BillBach and robocat, this is very helpful information!  The chain begins with end users and their web browser of choice (HTTP); the web server is next in line, requesting data from the DB server cross-country (SQL).  I met with the developers and PMs this morning and presented some evidence of the 100-120 ms round trips that appear to be killing the response time of the web application.

As BillBach stated...
Remember that each one will take 100ms to complete, or only 10 requests per second

There are actually 2 servers on the east coast requesting data from the 1 on the west coast. For the most part everyone agrees that moving these boxes to the same LAN will at least eliminate the delay.

Does anyone know of a best practice guide to SQL client/server design with regard to network considerations (location, delay, etc.)?  I can find a lot of forum posts supporting what we're discussing here but nothing official I could actually use in my meetings with these people.
Bill BachPresident and Btrieve GuruCommented:
I don't have any "formal" papers, but I suppose we could write one.

As a child, did you ever do one of those races where you need to fill a glass with water from across the room, one spoonful at a time?  This is a great analogy to explain this concept to people, since most people have at least seen one of these, if not actually participated.

The network latency is the time it takes you to cross the room.  Each spoonful of water is one packet.  Spoonfuls are pretty small, so you have to go back & forth many times to get the job done.  Imagine if you could instead carry a 1/2-cup scoop every time?  It might take more time to fill the 1/2-cup scoop, and the latency is no different, but it requires far fewer trips in total, and you'll be done many times faster.  Now, imagine a 1-quart container doing the same!

In short, it's not the volume of data, but the number of times you have to go back & forth that makes the difference here.  Designing an application which reduces the number of actual database calls to the bare minimum will reduce this round trip time delay, and thus run faster.
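The analogy translates directly into numbers. A hypothetical sketch (row counts, fetch size, and latency are assumptions) showing that the trip count, not the data volume, dominates total time:

```python
LATENCY_S = 0.100      # one round trip across the room/WAN, assumed ~100 ms
TOTAL_ROWS = 10_000    # total result set to move, same "water" either way

def fetch_time(rows_per_trip):
    """Latency cost of fetching TOTAL_ROWS in fixed-size batches."""
    trips = -(-TOTAL_ROWS // rows_per_trip)   # ceiling division
    return trips * LATENCY_S

spoon = fetch_time(10)    # 1,000 round trips -> 100 s of latency alone
scoop = fetch_time(500)   # 20 round trips    -> 2 s

assert abs(spoon - 100.0) < 1e-6
assert abs(scoop - 2.0) < 1e-9
assert abs(spoon / scoop - 50) < 1e-9   # bigger containers, 50x fewer trips
```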
robocatCommented:
Because light in a fiber travels at a finite speed, latency increases with distance, and this cannot be avoided. This is why traders set up shop close to the location of the stock exchange to get their orders in as fast as possible.
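That physical floor is easy to estimate: light in fiber travels at roughly two-thirds the speed of light in vacuum, or about 200 km per millisecond. The distance below is a rough assumption for a coast-to-coast fiber route:

```python
C_FIBER_KM_PER_MS = 200   # ~200,000 km/s in glass = 200 km per millisecond
COAST_TO_COAST_KM = 4_700 # rough assumed fiber-route distance, NYC to SF

one_way_ms = COAST_TO_COAST_KM / C_FIBER_KM_PER_MS
round_trip_ms = 2 * one_way_ms

# ~47 ms is the hard physical floor; the observed 100-120 ms additionally
# includes routing, queuing, and equipment delays along the path.
assert 40 < round_trip_ms < 60
```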

You won't find any hard numbers on maximum latency because this is very much application dependent.  An application that needs 10 requests to serve a user at 100 ms might perform as badly as an application that requires 100 requests at 10 ms.

In general I would recommend <1ms latency, implying 1Gbit or 10Gbit LAN speed between database and application. 10Gbit can increase performance even if the bandwidth is not needed, because of the lower latency.

Anything above 10 ms should be considered poor. And 100 ms is really bad.