sedostal
asked on
What could a Database reboot be doing to fix a hanging query
If you have a query that is stuck in processing, what would a database reboot be doing that would fix the query after restarting Statistics don't appear to be involved here as they are updated and are not changed before, and after, the reboot. This has happned 3 times where a reboot of the database fixed the problem.
We are using MSSQL 2008 ( only version supported by the application) on WIndows server 2008
I have query that that can hang will , only at times, but not all the time ( This is inside a cobol program) When the database is rebooted, the query completes in 10 minutes. What would the reboot be doing to fix the problem with the query. Here are the symptoms of of the query duing the hanging process and what was done up to this point.
Symtoms: Numbered 1-6
When running this statement
1. SELECT session_id,status,row_coun t,reads,wr ites,logic al_reads,c pu_time FROM sys.dm_exec_sessions for that SPID, (all numbers run to a certain point and then freeze.
2. When the TEMPDB is examined , the numbers freeze and CPU time is stopped
FROM sys.dm_db_file_space_usage ;
SELECT
sys.dm_exec_sessions.sessi on_id AS [SESSION ID],
DB_NAME(database_id) AS [DATABASE Name],
HOST_NAME AS [System Name],
program_name AS [Program Name],
login_name AS [USER Name],
status,
cpu_time AS [CPU TIME (in milisec)],
total_scheduled_time AS [Total Scheduled TIME (in milisec)],
total_elapsed_time AS [Elapsed TIME (in milisec)],
(memory_usage * 8) AS [Memory USAGE (in KB)],
(user_objects_alloc_page_c ount * 8) AS [SPACE Allocated FOR USER Objects (in KB)],
(user_objects_dealloc_page _count * 8) AS [SPACE Deallocated FOR USER Objects (in KB)],
(internal_objects_alloc_pa ge_count * 8) AS [SPACE Allocated FOR Internal Objects (in KB)],
(internal_objects_dealloc_ page_count * 8) AS [SPACE Deallocated FOR Internal Objects (in KB)],
CASE is_user_process
WHEN 1 THEN 'user session'
WHEN 0 THEN 'system session'
END AS [SESSION Type], row_count AS [ROW COUNT]
FROM sys.dm_db_session_space_us age INNER join sys.dm_exec_sessions
ON sys.dm_db_session_space_us age.sessio n_id = sys.dm_exec_sessions.sessi on_id
3. dbcc opentran shows the LSN has stopped moving for the SPID and but continues to show the transaction running
4. The query does show locks on the tables (sp_lock2) but the locks are exlusive to that SPID
5. When the database is refreshed to another instance, the process runs fine.
6. Several times, when this happened, the query was trapped in the DBCC inputbuffer( spid ) and analyzed. statistics were updated after the hang, but it was only the reboot still fixed the problem.
Other things tried at this point.
Statistics were updated on all tables that are involved in the query trapped by the DBCC Input buffer.
Before the reboot, the DBCC FREEPROCCACHE was executed and the attempt was made again, still the query hangs in a "running" state.
We are using MSSQL 2008 ( only version supported by the application) on WIndows server 2008
I have query that that can hang will , only at times, but not all the time ( This is inside a cobol program) When the database is rebooted, the query completes in 10 minutes. What would the reboot be doing to fix the problem with the query. Here are the symptoms of of the query duing the hanging process and what was done up to this point.
Symtoms: Numbered 1-6
When running this statement
1. SELECT session_id,status,row_coun
2. When the TEMPDB is examined , the numbers freeze and CPU time is stopped
FROM sys.dm_db_file_space_usage
SELECT
sys.dm_exec_sessions.sessi
DB_NAME(database_id) AS [DATABASE Name],
HOST_NAME AS [System Name],
program_name AS [Program Name],
login_name AS [USER Name],
status,
cpu_time AS [CPU TIME (in milisec)],
total_scheduled_time AS [Total Scheduled TIME (in milisec)],
total_elapsed_time AS [Elapsed TIME (in milisec)],
(memory_usage * 8) AS [Memory USAGE (in KB)],
(user_objects_alloc_page_c
(user_objects_dealloc_page
(internal_objects_alloc_pa
(internal_objects_dealloc_
CASE is_user_process
WHEN 1 THEN 'user session'
WHEN 0 THEN 'system session'
END AS [SESSION Type], row_count AS [ROW COUNT]
FROM sys.dm_db_session_space_us
ON sys.dm_db_session_space_us
3. dbcc opentran shows the LSN has stopped moving for the SPID and but continues to show the transaction running
4. The query does show locks on the tables (sp_lock2) but the locks are exlusive to that SPID
5. When the database is refreshed to another instance, the process runs fine.
6. Several times, when this happened, the query was trapped in the DBCC inputbuffer( spid ) and analyzed. statistics were updated after the hang, but it was only the reboot still fixed the problem.
Other things tried at this point.
Statistics were updated on all tables that are involved in the query trapped by the DBCC Input buffer.
Before the reboot, the DBCC FREEPROCCACHE was executed and the attempt was made again, still the query hangs in a "running" state.
ASKER
Actually, the query is aborted(stopped thru the application ( Cobol that's running it) . The stopping and starting of the query has been tried with each time the hanging symptom re-surfacing. Its only the database reboot that fixes the problem. Is it possible that one of two things are happening:
There is one table with 1.2 million rows, when the database it rebooted, there is room in the memory for that table but later, with other queries, its removed and has to be retrieved from disk. This table is also used several times in the same query. is it possible, that with the delay, the query is locking within itself causing these symptoms?
There is one table with 1.2 million rows, when the database it rebooted, there is room in the memory for that table but later, with other queries, its removed and has to be retrieved from disk. This table is also used several times in the same query. is it possible, that with the delay, the query is locking within itself causing these symptoms?
Run EXPLAIN + likely the answer will come up.
For example, if your query is reading 1.2 million rows without using an index, likely adding an index will help.
EXPLAIN is your friend.
For example, if your query is reading 1.2 million rows without using an index, likely adding an index will help.
EXPLAIN is your friend.
ASKER
I follow your concept of the index. :)
but I would think that the problem would resurface after a reboot. It actual runs fine after a reboot. It goes from never finishing to finishing in 3 minutes. The system as a whole, is not stressed, Cache hit ratio is fine, CPU's are on idle and this job is executed on off hours and even than, will only complete after a reboot.
I was just curious if someone had thoughts on what the reboot was doing to make this work vs the normal performance/indexing/stats issue.
but I would think that the problem would resurface after a reboot. It actual runs fine after a reboot. It goes from never finishing to finishing in 3 minutes. The system as a whole, is not stressed, Cache hit ratio is fine, CPU's are on idle and this job is executed on off hours and even than, will only complete after a reboot.
I was just curious if someone had thoughts on what the reboot was doing to make this work vs the normal performance/indexing/stats
ASKER
We were able to re-create the hanging problem in a test environment. First, it ran fine, then we selected a series of very large tables and ran processes, re-ran the original process and it hung.
After experimenting with removing rows from the biggest tables used in the query and not having any luck and having the process continue to hang, the execution of the
DBCC FREESYSTEMCACHE('ALL');the n runs to success.
This is Microfocus cobol program running on
Microsoft SQL Server 2008 (SP2) - 10.0.4000.0 (X64) Sep 16 2010 19:43:16 Copyright (c) 1988-2008 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2)
After experimenting with removing rows from the biggest tables used in the query and not having any luck and having the process continue to hang, the execution of the
DBCC FREESYSTEMCACHE('ALL');the
This is Microfocus cobol program running on
Microsoft SQL Server 2008 (SP2) - 10.0.4000.0 (X64) Sep 16 2010 19:43:16 Copyright (c) 1988-2008 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2)
ASKER
Any idea what still could be causing this?
Post the output from EXPLAIN + likely someone can assist you.
ASKER
How would everyone like the output, xml format ... screen shots of the explain plan graphics?
Output EXPLAIN as text + highlight + click the CODE menu item, before you hit submit.
ASKER
I have attached an explain plan. I'm running 2008 that probably doesn't have the code snip it feature.
What we are seeing here in the explain plan are work tables that populated at the time of the run, this may not be a good reflection of what is happening. I might state here that I'm still getting 2 minute run completion run when the database is rebooted and prior to the reboot, it doesn't finish even after 4+ hours of running indicating to me that maybe a poor part of the query is consuming memory.
This Peoplesoft cobol program has complaints everywhere regarding this problem and more proposed "fixes" then I think I've ever seen.
ExplainPlan0719.txt
What we are seeing here in the explain plan are work tables that populated at the time of the run, this may not be a good reflection of what is happening. I might state here that I'm still getting 2 minute run completion run when the database is rebooted and prior to the reboot, it doesn't finish even after 4+ hours of running indicating to me that maybe a poor part of the query is consuming memory.
This Peoplesoft cobol program has complaints everywhere regarding this problem and more proposed "fixes" then I think I've ever seen.
ExplainPlan0719.txt
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I agree with ya. This is very odd. Out of about 5,000 programs and 14,000 data tables and this is the only one that I'm having issues with, its really looking like something off the wall. I have some ideas and will continue to work the problem
Good luck!
ASKER
Not sure what the reboot is doing to fix the problem yet, but at least I'm not missing something obvious.
Thank you David.
Thank you David.
A better way to think about this is all queries are stopped when a reboot occurs.
You can accomplish the same thing by bouncing your database server (bounce == stop/restart), which will also abort all current queries.
Hopefully your SQL version supports the EXPLAIN statement, so just insert EXPLAIN in front of your entire query, on the command line using whatever command line process your product provides.
More than likely this will turn up some sort of lock contention or searching a table, returning every row, which means adding an index may speed up your query, so it finishes.