Link to home
Start Free TrialLog in
Avatar of Lou Pereira
Lou PereiraFlag for United States of America

asked on

SQL 2005 64 Bit job failure

Hello everyone and thank you for your help:
Background:  Have a SQL 2005 64bit with one instance with three databases (A,B,C).  This is connected to a MS cluster using Windows 2003R2 64 bit SP3.  The cluster was at SP and we were getting timeout errors running jobs from database A to B.  It was recommended to upgrade to SP3 which we did.  Now we get timeout errors from A to B and B to C when running jobs.  
Error meassage:
Date  1/8/2010 10:13:04 AM
Log  Job History (BPW_DAILY_00 - stream 00 hard delete extracts [00:01])

Step ID  3
Server  servername\PWp
Job Name  BPW_DAILY_00 - stream 00 hard delete extracts [00:01]
Step Name  Delete Records in Staging Files
Duration  00:12:05
Sql Severity  16
Sql Message ID  16389
Operator Emailed  
Operator Net sent  
Operator Paged  
Retries Attempted  0

Message
Executed as user: server01\APP-PWp. TCP Provider: The semaphore timeout period has expired. [SQLSTATE 08S01] (Error 121)  Communication link failure [SQLSTATE 08S01] (Error 121).  The step failed.

Avatar of Raja Jegan R
Raja Jegan R
Flag of India image

>> Now we get timeout errors from A to B and B to C when running jobs.  
>> Step Name  Delete Records in Staging Files

Ok.. Kindly let me know what exactly( logic) is done in the job and how..
Also it fails while Deleting records ( Guessing as per your job step description).
If you have huge no. of records to delete, then it might create Transactional logs more and spanning for a long time requires more server resources thus causing these kind of issues..

Timeouts can be reduced by tuning the existing logic or approach and by placing appropriate indexing if not present in the existing one..

Some more inputs on the aspects mentioned should help me to guide you in the correct direction..
Avatar of Lou Pereira

ASKER

The jobs are copying records from one DB to another in the same instance.  The DB is very large I believe 8TB or >.    I also saw posts regarding implementing registry keys as the following:
HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters
"MaxFreeConnections"=dword: 00000064
"MinFreeConnections"=dword: 00000020
"MaxRawWorkItems"=dword: 00000200
"MaxWorkItems"=dword: 00002000

1. Set TcpMaxDataRetransmissions to 30 (decimal);
2. Set KeepAliveInterval to 25000 (decimal).


This does not seem to work!! Thank you for your help and try to be detailed as I'm not a DBA.
Haven't tried out the above Registry key approached on my own and hence not able to comment on it..

Kindly post out the queries which causes timeouts so that we can try optimizing it or tuning it out along with Execution plans of them..

Hope you are aware of how to check execution plans in SSMS...
I'm confused with your statements.  The error I posted is all over the SQL community with various fixes.   Have you ever seem this error message before, try googling it, this will take you to the SQL msg boards.  I think I fixed with one of the fixes I got from the SQL msg board.  
Ok.. If that is the case, then request you to post the link / solution which worked for your case and close this question accordingly..
ASKER CERTIFIED SOLUTION
Avatar of Lou Pereira
Lou Pereira
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial