Link to home
Start Free TrialLog in
Avatar of shootbox
shootboxFlag for Italy

asked on

Analyzing a SQL trace file containing a Deadlock

Hi,

I have a job scheduled to run every 3 hours for some data copying between databases, and every 02:00am run it is also running some extra calculations.

The problem is that every couple weeks, the 2am run will fail for no apparent reason citing a deadlock:

Executed as user: SQLserver\Administrator. Generated flat file - OK [SQLSTATE 01000] (Message 0) COPY SUCCEEDED [SQLSTATE 01000] (Message 0) Transaction (Process ID 77) was deadlocked on thread | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [SQLSTATE 40001] (Error 1205). The step failed.


So I went ahead and created a profiler job that will run every night at 01:55am for 15 minutes and hopefully catch the deadlock.
I now have a trace with EventClass=Deadlock Graph and I have exported that event to a XDL file and attached it here as a .txt file.

The problem is that I have no idea where to go from here with analyzing the trace file and understanding why the deadlock is happening. any help would be great.

Thanks 2.xdl.txt
Avatar of shootbox
shootbox
Flag of Italy image

ASKER

Umm, am I totally off track here? I never messed with debugging Deadlocks before, so I might be doing something totally wrong. any help or advice would be greatly appreciated.

Thanks,
Avatar of Mark Geerlings
I'll start by saying I'm not a SQL Server expert, but I am very experienced with Oracle so some of what I know may apply.  In Oracle, deadlocks happen when two different applications (or different parts of the same application) lock two (or more) different tables, but in the opposite order.  For example, one screen (or program) may lock the order_headers table first, then attempt to lock the order_lines table.  Another screen (or program) locks the order_lines table first, then attempts to lock the order_headers table.  This causes a deadlock.  Oracle will give one of these screens (or programs) a deadlock error, then kill it allowing the other screen (or program) to finish without error.

Is it possible that occasionally when your data copy program runs, some other program (or user?) is active in the system and has locked one (or more) tables in a different order than your program processes them?
Hi, thanks for you reply. well, I think it would be pretty obvious just from the name of this issue and from the actual events and errors, that a Deadlock happens when two parties try to access the same resource and one of them locked it for its own use before that.
The thing is, this is happening in a 2AM run, so there are no active users at that time.
Moreover, the procedures and jobs are constant, they always happen exactly at the same time and every night, but only about once every 10 days it actually fails with a deadlock.

I have already went through all the jobs that might be scheduled for this time, but I can't see anything that works alongside the 2AM run, and if it did, I would probably get more locks.
This is why after checking the actual configuration, I went into debugging it and collected the trace. if the trace is useless now, then I really have no idea how to continue. but I don't see a reason why a trace would be useless...
What's the average duration time for that job that runs every 3 hours?
It's also runs at 2:00AM? If so, try to start the other task a little bit after 2:00AM so you may avoid concurrent transactions.
If you can't change schedules then you can use Locking Hints for avoiding deadlocks: http://msdn.microsoft.com/en-us/library/aa213026(SQL.80).aspx

Good luck
The average runtime is about an hour. when it fails, it fails within 3-4 minutes.
It's the same job all around man... it's just scheduled to run every 3 hours. there are no different jobs...

Can I get some explanation regarding that link and how it can help me, I can't really make anything out of it.
Oooh, it's the same job. Sorry I missunderstood that.
Means that the problem only occurs in 2:00AM execution. You have any other job that runs at same hour? Maybe some import/export or maintenance plan (Reindex, Backup, ...)?

I can help you better if you tells me what the job is doing.
I checked all possible options that are running on the database at that time. there was one job that runs every 5 minutes 24 hours a day, I added code to that job's procedure to not run at 02:00am, but the problem still persists.
Then you better run a SQL Profiler at that time to see what's happening.

Good luck
VMontalvao, wow, did you even read my message?
I did run it, and the output is attached in this thread.
You're right shootbox.
It was passed more than 1 month since your last answer that I didn't read again your question. My fault.
But you still didn't answer my question of what's job doing? You can also post here the job code so we can help you better.

Cheers
I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524

Can anyone approve / deny or has ever dealt with this issue?
I'm scared that running SQL with these parameters will effect performance.
You can do it. It's very pacific option. The only problem is that your SQL Server log file will grow more than usual. But after you have some deadlocks information you can remove the trace flag.
Shootbox, what was the solution for this problem?
ASKER CERTIFIED SOLUTION
Avatar of shootbox
shootbox
Flag of Italy image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>>The one I marked as the solution obviously:<<
I realize you are new here, just try and be polite and you will be amazed at the help you get.

Also, VMontalvao has spent a considerable time on this thread and he/she deserves a little courtesy to the point that you may want to re-read the EE Guidelines regarding closing questions here:
https://www.experts-exchange.com/help.jsp#hs=29&hi=407
And specifically this section:
Accept your own comment, and award points to Experts for their assistance
If you answered your own question, but wish to award points to Experts for their attempts to help, use this option.

Thanks,
Anthony
I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524