Lou Pereira
asked on
SQL 2005 64 Bit job failure
Hello everyone and thank you for your help:
Background: Have a SQL 2005 64bit with one instance with three databases (A,B,C). This is connected to a MS cluster using Windows 2003R2 64 bit SP3. The cluster was at SP and we were getting timeout errors running jobs from database A to B. It was recommended to upgrade to SP3 which we did. Now we get timeout errors from A to B and B to C when running jobs.
Error meassage:
Date 1/8/2010 10:13:04 AM
Log Job History (BPW_DAILY_00 - stream 00 hard delete extracts [00:01])
Step ID 3
Server servername\PWp
Job Name BPW_DAILY_00 - stream 00 hard delete extracts [00:01]
Step Name Delete Records in Staging Files
Duration 00:12:05
Sql Severity 16
Sql Message ID 16389
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0
Message
Executed as user: server01\APP-PWp. TCP Provider: The semaphore timeout period has expired. [SQLSTATE 08S01] (Error 121) Communication link failure [SQLSTATE 08S01] (Error 121). The step failed.
Background: Have a SQL 2005 64bit with one instance with three databases (A,B,C). This is connected to a MS cluster using Windows 2003R2 64 bit SP3. The cluster was at SP and we were getting timeout errors running jobs from database A to B. It was recommended to upgrade to SP3 which we did. Now we get timeout errors from A to B and B to C when running jobs.
Error meassage:
Date 1/8/2010 10:13:04 AM
Log Job History (BPW_DAILY_00 - stream 00 hard delete extracts [00:01])
Step ID 3
Server servername\PWp
Job Name BPW_DAILY_00 - stream 00 hard delete extracts [00:01]
Step Name Delete Records in Staging Files
Duration 00:12:05
Sql Severity 16
Sql Message ID 16389
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0
Message
Executed as user: server01\APP-PWp. TCP Provider: The semaphore timeout period has expired. [SQLSTATE 08S01] (Error 121) Communication link failure [SQLSTATE 08S01] (Error 121). The step failed.
ASKER
The jobs are copying records from one DB to another in the same instance. The DB is very large I believe 8TB or >. I also saw posts regarding implementing registry keys as the following:
HKLM\SYSTEM\CurrentControl Set\Servic es\lanmans erver\para meters
"MaxFreeConnections"=dword : 00000064
"MinFreeConnections"=dword : 00000020
"MaxRawWorkItems"=dword: 00000200
"MaxWorkItems"=dword: 00002000
1. Set TcpMaxDataRetransmissions to 30 (decimal);
2. Set KeepAliveInterval to 25000 (decimal).
This does not seem to work!! Thank you for your help and try to be detailed as I'm not a DBA.
HKLM\SYSTEM\CurrentControl
"MaxFreeConnections"=dword
"MinFreeConnections"=dword
"MaxRawWorkItems"=dword: 00000200
"MaxWorkItems"=dword: 00002000
1. Set TcpMaxDataRetransmissions to 30 (decimal);
2. Set KeepAliveInterval to 25000 (decimal).
This does not seem to work!! Thank you for your help and try to be detailed as I'm not a DBA.
Haven't tried out the above Registry key approached on my own and hence not able to comment on it..
Kindly post out the queries which causes timeouts so that we can try optimizing it or tuning it out along with Execution plans of them..
Hope you are aware of how to check execution plans in SSMS...
Kindly post out the queries which causes timeouts so that we can try optimizing it or tuning it out along with Execution plans of them..
Hope you are aware of how to check execution plans in SSMS...
ASKER
I'm confused with your statements. The error I posted is all over the SQL community with various fixes. Have you ever seem this error message before, try googling it, this will take you to the SQL msg boards. I think I fixed with one of the fixes I got from the SQL msg board.
Ok.. If that is the case, then request you to post the link / solution which worked for your case and close this question accordingly..
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
>> Step Name Delete Records in Staging Files
Ok.. Kindly let me know what exactly( logic) is done in the job and how..
Also it fails while Deleting records ( Guessing as per your job step description).
If you have huge no. of records to delete, then it might create Transactional logs more and spanning for a long time requires more server resources thus causing these kind of issues..
Timeouts can be reduced by tuning the existing logic or approach and by placing appropriate indexing if not present in the existing one..
Some more inputs on the aspects mentioned should help me to guide you in the correct direction..