This article discusses various scenarios that can end up in a DBCC CHECKDB process getting stuck in a Killed/Rollback state and how to resolve the problem.
One of the primary responsibilities of a DBA is to ensure Database integrity. Every DBA would have scheduled a DBCC CHECKDB in their Production environments at some time and would have executed DBCC CHECKDB commands on Production databases whenever there is any suspicion of corruption.
However, if the DBCC CHECKDB executes for a long time, many DBA's get panicked, and you have decided to cancel the DBCC CHECKDB execution, then you can end up with a DBCC CHECKDB command stuck in a Killed/Rollback state. In this article, we dive into detail on how the DBCC CHECKDB command works and how to resolve the DBCC CHECKDB process stuck in Killed/Rollback.
How DBCC CHECKDB works?
The DBCC CHECKDB is a Database Console Command (DBCC) to verify the Logical and Physical Integrity of all objects inside the specified database. Starting from SQL Server 2005 onwards, executing DBCC CHECKDB on a database uses an Internal Database Snapshot to run a crash recovery of the database. (SQL Server 2000 and earlier use a different logic to perform DBCC CHECKDB) For crash recovery, uncommitted transactions in the target database are rolled back in the internal database snapshot to give a transactionally consistent database view.
Performing Logical and Physical Integrity Checks on Internal Database Snapshots helps avoid the Blocking and Concurrency issues on the database. Once the tests are completed, Internal Database Snapshots will be dropped out.
How DBCC CHECKDB ends up in a Killed/Rollback state?
Let’s say a DBCC CHECKDB was executing on a database for a long time, and it meets any of the below scenarios:
- Scheduled DBCC CHECKDB execution running for longer than the average execution time.
- Ad hoc execution of DBCC CHECKDB on a Production database running for a longer time and DBCC CHECKDB was never executed on the database before.
- DBCC CHECKDB execution waiting on some other process or long-running queries to complete for a long time to start crash recovery.
- DBCC CHECKDB was causing several resource crunches on the server, and you prefer to stop it, impacting the server's performance.
Landing in any one of the above scenarios without knowing when the DBCC CHECKDB process might complete will land any DBA into a miserable state. Don't panic and try killing the DBCC CHECKDB command, or it will result in the DBCC CHECKDB command entering a Killed/Rollback state. That only further complicates the scenario since killing the DBCC CHECKDB process when the Database Internal Snapshot is performing Crash Recovery won't do anything and will still wait for the Crash Recovery process to complete.
To be clearer, this is how the internals of a Database Snapshot work; hence it has nothing to do with the DBCC CHECKDB operation. Now that you are reading this article either for learning purposes or already having a DBCC CHECKDB in a Killed/Rollback state, let’s see how we can resolve the problem.
How to Fix DBCC CHECKDB Stuck in Killed/Rollback State?
We have a couple of options available to recover from the Killed/Rollback state that has been sorted based on the Recommended approach.
Option 1 – Do Nothing & Wait for the Transactions to Rollback
Don't Panic!! As SQL Server has already mentioned, the Database engine is trying to Rollback the DBCC CHECKDB process and return it to a transactionally consistent state.
We can identify the SPID of the DBCC CHECKDB process that was stuck in Killed/Rollback status by running the query below:
SELECT spid, cmd, status, loginame, login_time, last_batch, cpu, memusage, physical_io
WHERE cmd = 'KILLED/ROLLBACK';
While waiting for Rollback to happen, we can execute the following command to check how much estimated time is remaining for the rollback to complete:
KILL <spid> WITH STATUSONLY;
Replace <spid> with the SPID of the DBCC CHECKDB session. In this example we would replace <spid> with 60.
Once you know the estimated time for the rollback completion, wait for it to complete. Please note that the times provided are estimated and can increase further. If it seems stuck or the estimated time keeps increasing without any progress, try the next option.
Option 2 – Restart SQL Server Services
Try to restart your SQL Server services to check if that resolves the DBCC CHECKDB stuck issue. You can do that by using the SQL Server Configuration Manager, SQL Server Management Studio (SSMS), or via the command line.
Using Configuration Manager
- Open SQL Server Configuration Manager, and click SQL Server Services from the left pane.
- In the right-side pane, right-click on SQL Services (MSSQLSERVER) and click on Restart.
Using the Command Line
- In SSMS, under Object Explorer, right-click on the SQL Server instance and then click Restart.
Enter the following command in the command prompt to start your SQL Server Service:
net stop MSSQL$ instancename
net start MSSQL$ instancename
Replace ‘instancename’ with the name of your SQL Server instance that you want to restart.
Option 3 – Run Cumulative Update for SQL Server
Note: This solution applies to SQL Server 2016 users.
In SQL Server 2016, there’s a known issue wherein you may experience SQL Server crashing when the DBCC CHECKDB process is cancelled against a large database. Essentially, when you run the DBCC CHECKDB command on a database and cancel it, the session gets stuck in a killed/rollback state. This issue with SQL Server 2016 is fixed as part of CU5 for SQL Server 2016 SP2.
Cumulative Update 5 for SQL Server 2016 SP2
Latest cumulative update for SQL Server 2016
Learn more about this fix from here
What Else Can You Do?
Below are a few precautionary measures you can take before executing DBCC CHECKDB commands in your production environment.
- For Scheduled or more frequent execution of DBCC CHECKDB on Production databases, use the PHYSICAL_ONLY option and run without the PHYSICAL_ONLY option less frequently. The PHYSICAL_ONLY option performs limited checks to reduce the overhead load on server resources.
- Execute DBCC CHECKDB on Production Database only during Off-peak hours when the Server resources are less used out. Running DBCC CHECKDB during peak hours will increase the time required for DBCC CHECKDB to complete along with impacting other queries executing on the Server.
- Estimate the time required for DBCC CHECKDB to complete and baseline it to answer the Business or Users regarding the time estimates. To estimate the time required for DBCC CHECKDB execution, kindly check my article here.
- If the database crashed while killing the DBCC CHECKDB command and you don’t have a backup to restore the database, you might need to resort to using a professional SQL Repair Tool to restore the database to its original form without any data loss.
Killing a DBCC CHECKDB command doesn’t affect; hence, it won’t kill a DBCC CHECKDB command after it is started. When you kill a DBCC CHECKDB command, it may get stuck in a killed/rollback state until the crash recovery process is complete.
If the crash recovery process seems stuck, try restarting your SQL server instance using one of the methods discussed in this article. If you have SQL Server 2016, make sure the latest CU patches are applied to avoid the known issues. If that doesn’t help, try downloading and using a professional SQL Repair Tool instead.
I hope you found this helpful. If you have any questions, please leave a comment below.