App servers periodically fail to login to the DB server

zc2
zc2 used Ask the Experts™
on
A client of ours has two windows web application servers (virtual on VMware) and a DB server (physical) with MSSQL 13.0.5366.0, all three located at a hosting provider. Few weeks back the app servers started periodically having trouble to login to the DB server (They use OLEDB "SQL Server native Client 11.0"). That happens not always but periodically, 3-4 times a day.
Running "ping -t db" does not show any interruptions.

I wrote a simple script which tries to execute a simple SQL query every 10 seconds, and if it fails record the error the OLEDB produces to a log file plus it runs the command "ping db" redirecting the output to the same log file. The script is  running on all three servers (2 apps and 1 db). Only the app servers have records of lost connectivity, not the db server!

Here is a record stored recently:
1/2/2020 9:47:44 AM Error: Login timeout expired (80004005) Microsoft SQL Server Native Client 11.0

Pinging db [172.16.30.80] with 32 bytes of data:
Reply from 172.16.30.80: bytes=32 time<1ms TTL=128
Reply from 172.16.30.80: bytes=32 time<1ms TTL=128
Reply from 172.16.30.80: bytes=32 time<1ms TTL=128
Reply from 172.16.30.80: bytes=32 time<1ms TTL=128

Ping statistics for 172.16.30.80:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

Open in new window

So, it shows that ping is just fine right after the login error occurs. Also, I tried manually do the command "telnet db 1433", a connection was established just fine.

Again, no errors from the same script running on the db server.

Connection string:
Provider=SQLNCLI11.1;Password=<password>;Persist Security Info=True;User ID=<user id>;Data Source=db;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=STG-WEB;Use Encryption for Data=False;Tag with column collation when possible=False;MARS Connection=False;DataTypeCompatibility=0;Trust Server Certificate=False;Application Intent=READWRITE

Open in new window


Please help!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
ste5anSenior Developer

Commented:
Well, check the SQL Server error logs. What to they say? Also check the SQL Server guest, is it running at its capacity limits?

Author

Commented:
I'm sorry, what exactly you mean by " SQL Server error logs"?
No, it's not a capacity issue. The login timeout happen also at night, when nobody is using the app.
ste5anSenior Developer

Commented:
Ask your DBA for it.
How to Generate Services Revenue the Easiest Way

This Tuesday! Learn key insights about modern cyber protection services & gain practical strategies to skyrocket business:

- What it takes to build a cloud service portfolio
- How to determine which services will help your unique business grow
- Various use-cases and examples

Author

Commented:
We asked the SQL expert from the hosting company to investigate the server. He did that thoroughly and did not find any issues.
ste5anSenior Developer

Commented:
When the SQL Server has no problems using SQL Server authentication, then the login timeout must be a network related (*).

Thus check the network. This includes that you disable TCP chimney (TCP Offload, SNP) on your SQL Server guest.

(*) The databases is not set to auto close by mistake?

Author

Commented:
Thus check the network
I checked ("netsh interface tcp>show global") on both app and db server   "Chimney Offload State : disabled"
Do you recommend to enable it or what?

(*) The databases is not set to auto close by mistake?
Checked, the "create database" script includes this line:
ALTER DATABASE [<db name>] SET AUTO_CLOSE OFF 
GO

Open in new window

Author

Commented:
TCP Chimney offload was already disabled. Also tried to disable the TCP checksum offload (in the NIC advanced settings), did not help.
Installed wireshark to monitor the traffic, it highlights some TCP records red on black (I guess that means an error? a screenshot attached),
but I don't know how make any conclusion from that.
e5a58020dbdac8100c9b876f689619e9.iix.pdf
It turned out the problem was caused by the hosting provider's network infrastructure.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial