Solved

Analyzing a SQL trace file containing a Deadlock

Posted on 2010-08-23
16
1,007 Views
Last Modified: 2012-05-10
Hi,

I have a job scheduled to run every 3 hours for some data copying between databases, and every 02:00am run it is also running some extra calculations.

The problem is that every couple weeks, the 2am run will fail for no apparent reason citing a deadlock:

Executed as user: SQLserver\Administrator. Generated flat file - OK [SQLSTATE 01000] (Message 0) COPY SUCCEEDED [SQLSTATE 01000] (Message 0) Transaction (Process ID 77) was deadlocked on thread | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [SQLSTATE 40001] (Error 1205). The step failed.


So I went ahead and created a profiler job that will run every night at 01:55am for 15 minutes and hopefully catch the deadlock.
I now have a trace with EventClass=Deadlock Graph and I have exported that event to a XDL file and attached it here as a .txt file.

The problem is that I have no idea where to go from here with analyzing the trace file and understanding why the deadlock is happening. any help would be great.

Thanks 2.xdl.txt
0
Comment
Question by:shootbox
16 Comments
 

Author Comment

by:shootbox
Comment Utility
Umm, am I totally off track here? I never messed with debugging Deadlocks before, so I might be doing something totally wrong. any help or advice would be greatly appreciated.

Thanks,
0
 
LVL 34

Expert Comment

by:Mark Geerlings
Comment Utility
I'll start by saying I'm not a SQL Server expert, but I am very experienced with Oracle so some of what I know may apply.  In Oracle, deadlocks happen when two different applications (or different parts of the same application) lock two (or more) different tables, but in the opposite order.  For example, one screen (or program) may lock the order_headers table first, then attempt to lock the order_lines table.  Another screen (or program) locks the order_lines table first, then attempts to lock the order_headers table.  This causes a deadlock.  Oracle will give one of these screens (or programs) a deadlock error, then kill it allowing the other screen (or program) to finish without error.

Is it possible that occasionally when your data copy program runs, some other program (or user?) is active in the system and has locked one (or more) tables in a different order than your program processes them?
0
 

Author Comment

by:shootbox
Comment Utility
Hi, thanks for you reply. well, I think it would be pretty obvious just from the name of this issue and from the actual events and errors, that a Deadlock happens when two parties try to access the same resource and one of them locked it for its own use before that.
The thing is, this is happening in a 2AM run, so there are no active users at that time.
Moreover, the procedures and jobs are constant, they always happen exactly at the same time and every night, but only about once every 10 days it actually fails with a deadlock.

I have already went through all the jobs that might be scheduled for this time, but I can't see anything that works alongside the 2AM run, and if it did, I would probably get more locks.
This is why after checking the actual configuration, I went into debugging it and collected the trace. if the trace is useless now, then I really have no idea how to continue. but I don't see a reason why a trace would be useless...
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
Comment Utility
What's the average duration time for that job that runs every 3 hours?
It's also runs at 2:00AM? If so, try to start the other task a little bit after 2:00AM so you may avoid concurrent transactions.
If you can't change schedules then you can use Locking Hints for avoiding deadlocks: http://msdn.microsoft.com/en-us/library/aa213026(SQL.80).aspx

Good luck
0
 

Author Comment

by:shootbox
Comment Utility
The average runtime is about an hour. when it fails, it fails within 3-4 minutes.
It's the same job all around man... it's just scheduled to run every 3 hours. there are no different jobs...

Can I get some explanation regarding that link and how it can help me, I can't really make anything out of it.
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
Comment Utility
Oooh, it's the same job. Sorry I missunderstood that.
Means that the problem only occurs in 2:00AM execution. You have any other job that runs at same hour? Maybe some import/export or maintenance plan (Reindex, Backup, ...)?

I can help you better if you tells me what the job is doing.
0
 

Author Comment

by:shootbox
Comment Utility
I checked all possible options that are running on the database at that time. there was one job that runs every 5 minutes 24 hours a day, I added code to that job's procedure to not run at 02:00am, but the problem still persists.
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
Comment Utility
Then you better run a SQL Profiler at that time to see what's happening.

Good luck
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 

Author Comment

by:shootbox
Comment Utility
VMontalvao, wow, did you even read my message?
I did run it, and the output is attached in this thread.
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
Comment Utility
You're right shootbox.
It was passed more than 1 month since your last answer that I didn't read again your question. My fault.
But you still didn't answer my question of what's job doing? You can also post here the job code so we can help you better.

Cheers
0
 

Author Comment

by:shootbox
Comment Utility
I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524

Can anyone approve / deny or has ever dealt with this issue?
I'm scared that running SQL with these parameters will effect performance.
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
Comment Utility
You can do it. It's very pacific option. The only problem is that your SQL Server log file will grow more than usual. But after you have some deadlocks information you can remove the trace flag.
0
 
LVL 45

Expert Comment

by:Vitor Montalvão
Comment Utility
Shootbox, what was the solution for this problem?
0
 

Accepted Solution

by:
shootbox earned 0 total points
Comment Utility
The one I marked as the solution obviously:

I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524
0
 
LVL 75

Expert Comment

by:Anthony Perkins
Comment Utility
>>The one I marked as the solution obviously:<<
I realize you are new here, just try and be polite and you will be amazed at the help you get.

Also, VMontalvao has spent a considerable time on this thread and he/she deserves a little courtesy to the point that you may want to re-read the EE Guidelines regarding closing questions here:
http://www.experts-exchange.com/help.jsp#hs=29&hi=407
And specifically this section:
Accept your own comment, and award points to Experts for their assistance
If you answered your own question, but wish to award points to Experts for their attempts to help, use this option.

Thanks,
Anthony
0
 

Author Closing Comment

by:shootbox
Comment Utility
I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to shrink a transaction log file down to a reasonable size.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now