Link to home
Create AccountLog in
Avatar of marrowyung
marrowyung

asked on

can't kill a SQL spid

Dear all,

Right now I ran the SQL monitor.exe from :

http://sqltouch.blogspot.ca/2013/05/sql-performance-monitoring-tool-simple.html

but when it can't connect to a SQL instance it will left some code in the tempDB of the SQL server it failed to connect:

EXEC shb_generate_waitstats    /************************************************************  ** category wait stats  ************************************************************/  USE tempdb

Open in new window


and when i kill it, it seems ok, and if I kill it again, it shows:

"SPID xx: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds."

it just keep hanging around there and we have 4 x connections like this now..

how can I force to kill it without restarting the SQL service?
Avatar of Saurabh Bhadauria
Saurabh Bhadauria
Flag of India image

Check the locking in Database...

use sp_lock  or  sys.dm_tran_locks   to see information



Thanks,
saurabh
SOLUTION
Avatar of edtechdba
edtechdba
Flag of United States of America image

Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
See answer
Avatar of Anthony Perkins
You have KILLed it and that is why you cannot kill it again.  What it is doing now is rolling back the changes and they can take as long as it took them to be made in the first place.

If you restart your sever in the middle of a rollback, you risk losing data or at the very least it will simply continue to rollback.

Now, I realize this thread will now continue endlessly, so if you don't mind I am not going to contribute any further.
Avatar of marrowyung
marrowyung

ASKER

"If you restart your sever in the middle of a rollback, you risk losing data or at the very least it will simply continue to rollback.

Now, I realize this thread will now continue endlessly, so if you don't mind I am not going to contribute any further. "

funny !
I strongly blieve that this thread will not keep continue endlessly as someone in the forum already say restart it does help.


Saurv:

"use sp_lock  or  sys.dm_tran_locks   to see information "

then what???
ASKER CERTIFIED SOLUTION
Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
Saurv,

"if you found any other locking  like tab lock or something else.... then kill that SP ID first... "

that spid of that connection is already TAB, as I said after I killed that and it will say:

"SPID 62: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds."

today still 0%..... don't tell me it needs to rollback serveral days.

"TAB = Lock on an entire table, including all data and indexes."

You mean this is this kind of lock kill the whole thing and the rest of the process can then, be kill?

This is the first time I found a spid can't be kill, the troublesome Oracle or MS SQL process that kills our tempdB all the time can be kill however.
What I am suspecting , there is something which is preventing Rollback...

so if there is any other lock ...then you should kill that..connection...


chk below links... it may be helpful..

http://www.mssqltips.com/sqlservertip/1473/killing-a-spid-and-checking-rollback-progress-in-sql-server/

http://blogs.msdn.com/b/psssql/archive/2008/09/12/sql-server-2000-2005-2008-recovery-rollback-taking-longer-than-expected.aspx

Thanks,
Saurabh
right the result from running:

exec sp_who2
go

is those spid is in SUSPENDED status, it is not in rollback status, a bit different from what I expected and what is list from the link:

but command is KILLED/ROLLBACK and DBNAME is tempdb.

http://www.mssqltips.com/sqlservertip/1473/killing-a-spid-and-checking-rollback-progress-in-sql-server/

what can I do ?
Does sp_who2 shows any blocking session under Blkby column.. ??
oh,, forget you are at works now.. you are in Europe ?

all Blkby column shows '.' and it means. ...................
India...   :)


Blkby  is column returned by SP_who2 , which tells us if a SPID is blocked by some other SPID ....

look at below link..

http://www.dbadiaries.com/using-sp_who2-to-help-with-sql-server-troubleshooting


Thanks,
Saurabh
last steps is restart the service, but no......

this time is funny that.  the hardest SPID can be kill but not this !! what the XXXX !!!!

last time we have a query from Oracle side that kill the tempdB log and I can simply use sys.sysprocess to find it out bsaed on the userid, then just kill it and shrink.
great, so what do you think about my case ? any more comment? can't see how.

india very smart. I agree !!
I read both:

http://www.mssqltips.com/sqlservertip/1473/killing-a-spid-and-checking-rollback-progress-in-sql-server/
http://msdn.microsoft.com/en-AU/library/ms173730.aspx


it seems that we can find the UOW and just kill it, how can we find out the UOW of that SPID (We knew all SPID in trouble now)?
Can you attach  here  the result of SP_Who2  here...


-Saurabh
User generated image
here you go, note the highlighted one.. this is one of them..
SOLUTION
Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
mark_wills:

I ran the SQL monitor.exe from :

http://sqltouch.blogspot.ca/2013/05/sql-performance-monitoring-tool-simple.html

but when it can't connect to a SQL instance it will left some code in the tempDB of the SQL server it failed to connect:

"saw an oracle comment above), "

Which one?

this software has connection problem in only one server at this moment. if what I scare that about is that one is a core server and I can't real time monitor that. we don't have any proactive monitoring tools for DB yet.
by running select * from sys.dm_tran_locks;, it don't show the spid but GUID, I don't know how is it related to the SPID I am concerning.

how to know the GUID of the process having problem? I plan to kill using GUID, is it the so called UOW?

Ref:http://msdn.microsoft.com/en-AU/library/ms173730.aspx

"UOW
Identifies the Unit of Work ID (UOW) of distributed transactions. UOW is a GUID that may be obtained from the request_owner_guid column of the sys.dm_tran_locks dynamic management view. UOW also can be obtained from the error log or through the MS DTC monitor. For more information about monitoring distributed transactions, see the MS DTC documentation.

Use KILL UOW to terminate orphaned distributed transactions. These transactions are not associated with any real session ID, but instead are associated artificially with session ID = '-2'. This session ID makes it easier to identify orphaned transactions by querying the session ID column in sys.dm_tran_locks, sys.dm_exec_sessions, or sys.dm_exec_requests dynamic management views.
"
Mark_will,

Please try to answer this if you have time:

https://www.experts-exchange.com/questions/28168394/DB2-DR-procedure.html
Well,

I am not familiar with that particular Monitoring tool, but if it cannot connect, then what the heck is being left in TempDB ? You say code - what type of code ?

It does say that it uses its own "SQLite" database and guessing that it is utilising all the DMV's it can, simply storing and snapshotting into sqlmonitor.db3.

It would be worthwhile to see if there are any linked servers created on the SQL Server in case they are doing "select into linked_server .... " type of transactions.

In which case there might be remnants of trying to connect to the linked server and a few processes set up to store results and maybe that is the connection error (a security or permissions problem from that particular SQL server to the sqlmonitor db).

It might help if you can post some of the error messages and describe some of the remnants in TempDB.

That Oracle comment was your post above in https://www.experts-exchange.com/questions/28175841/can't-kill-a-SQL-spid.html?anchorAnswerId=39301629#a39301629 and re-reading it might not be relevant.
"Well,

I am not familiar with that particular Monitoring tool, but if it cannot connect, then what the heck is being left in TempDB ? You say code - what type of code ?"

yeah, you are right and this is one of the reason I don't understand and it can't connect, why it left some code in the tempdB:

EXEC shb_generate_waitstats    /************************************************************  ** category wait stats  ************************************************************/  USE tempdb

Open in new window


I contact the author and he don't konw why too and said the hanging process will not kill tempdb. yes, it didn't for the last few days.

"It does say that it uses its own "SQLite" database and guessing that it is utilising all the DMV's it can, simply storing and snapshotting into sqlmonitor.db3."

yes.

the .db3 will be create during the first time and will left there.

"It would be worthwhile to see if there are any linked servers created on the SQL Server in case they are doing "select into linked_server .... " type of transactions."

I just connect this from other DB server and that DB server has link server in between, however,  it still does make any sense to be so call "can't connect" but still have object left in tempdB.

"It might help if you can post some of the error messages and describe some of the remnants in TempDB."

When I tried to kill it by spid, it said it is ok but it still there from selecting sys.sysprocesses. if you kill it again it will say:

"SPID 62: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds."

this has already been serveral days.

see the answer above you will see I did the sp_who2 already and another contributor also don't know what process block me from killing it.

sPID 62 is doing TAB IX lock.
OK, with a linked server in between, and/or the application using DTC, then would first try restarting DTC. You might need to choose a quieter time depending on your environment because any other apps relying on DTC might generate error messages for your active users, but would only be momentarily, they could retry and continue on.

That message is normally due to waiting on a resource that no longer exists. While it should simply drop off, it doesn't because it cannot access that resource anymore, so hangs there.

Now, you can also run
sp_lock 62

Open in new window


and get a list of object ID's to look at in sys.objects (make sure you look at sys.objects in the correct database).

We went through some of this a while ago with locking (hope you recall)

The idea is to find the root cause of the lock and if a table, then we might be able to trick the table to become unlocked. But is a long shot - the DTC sounds a better starting point.
Do  I have to take ths one offline and online again:?

User generated image
so restarting this DO NOT bring down the SQL cluster and SQL server service ?

When I do  sp_lock 62, I get this:

User generated image.

is it normal ?
can see why it related to the DTC as I run the software from other nodes but I don't connect it via that node.

for example, SQL A and SQL B, I run that SQLmonitor.exe from SQL A and access SQL B, any only SQL B has this problem..


not reatled to DTC even there are linked server setup between A and B?
The linked server I was referring to was if the routines were trying to update sqlmonitor.db

If you are talking to the Author of that routine, ask if it uses DTC in any way (or linked servers).

Are you running an active SQL cluster for failover purposes ?

It doesn't affect SQL Server or SQL Agent per se. Really DTC is a windows op-sys function to manage distributed transactions. You might want to read through : http://technet.microsoft.com/en-us/library/cc754134(v=ws.10).aspx and see if you can list the transactions before being too heavy handed.

So in the meantime, lets see what that object_id is pointing to...

select * from tempdb.sys.objects o where o.OBJECT_ID = 733259272

Also, worth noting that external processes / applications you do need to kill that externally first before killing the spid, and that will likely remove the spid as well.
"Are you running an active SQL cluster for failover purposes ?"

yes.


"Also, worth noting that external processes / applications you do need to kill that externally first before killing the spid, and that will likely remove the spid as well. "

I have killed that last week, it doesn't works...

right now, still:

SPID 62: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.

0% !!

after running :

select * from tempdb.sys.objects o where o.OBJECT_ID = 733259272


I got the name:

#am_wait_stats_snapshots____________________________________________________________________________________________0000003577CC

Open in new window


it should be the same statement if I run the sys.sysprocesses.
The author said they don't use any DTC in the application, but we should sure about that !! as  why use ? just simple real time monitoring tools .
Just "simple" real time monitoring ? But, that simple monitoring tool has created problems for you, and not even the Author of the tools seems to be able to resolve.

I can understand where, why and how they might want to use DTC, but that seems quite redundant now.

The comment before about aborting the external application first was more for future reference. Killing the SPID first does not stop the app, and the app might have various wait states or transaction starts that never complete.

The temp table shown seems to be the problem. You might like to attempt to drop that table, but not sure if you will be successful (exclusive locks and all that)...

I would still be inclined to look at that DTC link above and check for uncommitted transactions.

If you are using clustering / mirrors then it will be using DTC. You can still stop and start the DTC process but will need to take into account what will happen to your cluster. Having said that, you will need to consider those impacts if you end up restarting SQL too.
"Just "simple" real time monitoring ? But, that simple monitoring tool has created problems for you, and not even the Author of the tools seems to be able to resolve."

yes, you are rihgt!! but I just test all server, only one server has problem, can't see why

"I can understand where, why and how they might want to use DTC, but that seems quite redundant now."

They don't use it, we can forget about that.

"The comment before about aborting the external application first was more for future reference. Killing the SPID first does not stop the app, and the app might have various wait states or transaction starts that never complete."

The applicatoin has been stop as I see it is Hanging and i just kill it when I try to connect to the DB server we are concerning.

So this one is not going to happen any more.

"I would still be inclined to look at that DTC link above and check for uncommitted transactions. "

uncommited transaction can be the sys.sysprocess with open_trans > 0.. but it might not helping me at all.

"If you are using clustering / mirrors then it will be using DTC. You can still stop and start the DTC process but will need to take into account what will happen to your cluster. Having said that, you will need to consider those impacts if you end up restarting SQL too."

yeah, as I will turn on the C2 audit later on and I need restarting SQL service anyway, but just causion about the if the left behind process will kill the tempdb and now... at this moment, it will not.... so I am ok with this. The authoer also say this hanging thing will not kill the  tempdb.
thanks.

again, no mark for acperkins, who come and B.S. all time, he has too much time left.
DTC really is external to SQL Server, so there might be other transactions that aren't in sysprocesses - you cannot assume that SQL has every answer every time for hardware / network / other environmental aspects. A lot of people only experience DTC because of SQL and make the mistake that it is all part of SQL Server - it isn't.

That error message is not new. It is often a mystery requiring a restart. It is normally as a result of a broken transaction leaving exclusive locks hanging around waiting for a resource that no longer exists, or it is impossible for it to commit (because it was aborted). Would be brilliant if it did time out (like a deadlock) but it hangs there trying to rollback.

There have been a few "connect" items logged with MS but seem to get closed out without a workaround, without a fix (other than grab a dump and call MS). see http://connect.microsoft.com/SQLServer/feedback/details/433703/killed-rollback as an example.

It is often associated with external programs - like backups, OSQL, xp_cmdshell, Linked servers (and replication), DTC and third party apps. Especially susceptible are external apps creating a SP that starts a transaction but then the app abends (or is aborted). It seems that the SP almost owns that transaction and not the spid. But that is just conjecture on my part.

I do wish that apps that need to create an SP for their own use did so as a temp stored procedure (same as a temp table - prefix the SP with a #). When the connection is closed (or aborted) then so is the (temp) procedure (and hopefully prevents this type of problem). or, avoid having an app (for the sake of that session) creating SP's altogether and use sp_executesql to run a "batch" of sql statements.

There are a few other strategies that can be tried, but when it comes to objects in tempdb it is almost a law unto itself. Some of those strategies are to run a checkpoint, login using DAC, if there are linked servers or replication / mirrors, then the DTC becomes interesting once again. If it has created a SP, then would be trying to drop that procedure as well.
And, on a personal note, I do have a lot of time for acperkins. Sometimes comes across as a bit gruff, but generally speaking, has a lot to offer and taught me a couple of things along the way...
Thanks Mark.
mark_wills,

"And, on a personal note, I do have a lot of time for acperkins. Sometimes comes across as a bit gruff, but generally speaking, has a lot to offer and taught me a couple of things along the way..."

very professional on this, as I always said you are helpful.

but not this:

"Now, I realize this thread will now continue endlessly, so if you don't mind I am not going to contribute any further. "

he might need to learn something else other than pure IT tech.. politeness is one of this.

acperkin: please learn from Mark_wills too, he is a kind man..

"DTC really is external to SQL Server, so there might be other transactions that aren't in sysprocesses - you cannot assume that SQL has every answer every time for hardware / network / other environmental aspects. A lot of people only experience DTC because of SQL and make the mistake that it is all part of SQL Server - it isn't."

yeah, I belive that but I think this is a exceptional case and I just don't focus on this any more and I don't think anyone else can help me further. and therefore, I close this ticket .
(acperkin will be happy to see it).

I have some oracle ticket that I didnio't close as IT incident keep coming, for example, our vendor just release some patch since 2 days ago and it keeps one of the MS SQL serve's agent can't show all jobs, the active montior can't shows up correctly.

I will start doing the check the rest of ticket today.

The vendor just don't know what to do and enquire us (and therefore me...!!!) , it is very funny that one spid block another every 6 hours. it try to create the same SP at a very close time and therefore, msdb in trouble now.

but I just found out why sys.sysprocess, sp_who2 and sp_locks as a quick check..

this time very NORMAL that that SPID can be kill..... AHAA it shoud be.. the blocking one killed

"). Would be brilliant if it did time out (like a deadlock) but it hangs there trying to rollback."

In my case I don't see it will, let see as I plan for a restart.

"There have been a few "connect" items logged with MS but seem to get closed out without a workaround, without a fix (other than grab a dump and call MS). see http://connect.microsoft.com/SQLServer/feedback/details/433703/killed-rollback as an example."

yeah, from time to time we see this.


"It is often associated with external programs - like backups, OSQL, xp_cmdshell, Linked servers (and replication), DTC and third party apps. Especially susceptible are external apps creating a SP that starts a transaction but then the app abends (or is aborted). It seems that the SP almost owns that transaction and not the spid. But that is just conjecture on my part."

reasonable, this is exactly what it is to me...

"There are a few other strategies that can be tried, but when it comes to objects in tempdb it is almost a law unto itself. Some of those strategies are to run a checkpoint, login using DAC, if there are linked servers or replication / mirrors, then the DTC becomes interesting once again. If it has created a SP, then would be trying to drop that procedure as well. "

Any reading for this, I would iike to read. This is some kind of appliation archtecture, right?
Mark_wills,  Saurv and edtechdba:

please help to answer this :

https://www.experts-exchange.com/questions/28161454/SQL-server-2008-R2-Audit.html
Looks like that thread has been answered - or at least an Accepted solution.

If you have more questions about C2 Audit then maybe you need to raise a new question ?
that one is reopened, I can't see I can reopen it again!