Solved

Analyzing a SQL trace file containing a Deadlock

Posted on 2010-08-23
16
1,105 Views
Last Modified: 2012-05-10
Hi,

I have a job scheduled to run every 3 hours for some data copying between databases, and every 02:00am run it is also running some extra calculations.

The problem is that every couple weeks, the 2am run will fail for no apparent reason citing a deadlock:

Executed as user: SQLserver\Administrator. Generated flat file - OK [SQLSTATE 01000] (Message 0) COPY SUCCEEDED [SQLSTATE 01000] (Message 0) Transaction (Process ID 77) was deadlocked on thread | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [SQLSTATE 40001] (Error 1205). The step failed.


So I went ahead and created a profiler job that will run every night at 01:55am for 15 minutes and hopefully catch the deadlock.
I now have a trace with EventClass=Deadlock Graph and I have exported that event to a XDL file and attached it here as a .txt file.

The problem is that I have no idea where to go from here with analyzing the trace file and understanding why the deadlock is happening. any help would be great.

Thanks 2.xdl.txt
0
Comment
Question by:shootbox
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
16 Comments
 

Author Comment

by:shootbox
ID: 33511007
Umm, am I totally off track here? I never messed with debugging Deadlocks before, so I might be doing something totally wrong. any help or advice would be greatly appreciated.

Thanks,
0
 
LVL 35

Expert Comment

by:Mark Geerlings
ID: 33511320
I'll start by saying I'm not a SQL Server expert, but I am very experienced with Oracle so some of what I know may apply.  In Oracle, deadlocks happen when two different applications (or different parts of the same application) lock two (or more) different tables, but in the opposite order.  For example, one screen (or program) may lock the order_headers table first, then attempt to lock the order_lines table.  Another screen (or program) locks the order_lines table first, then attempts to lock the order_headers table.  This causes a deadlock.  Oracle will give one of these screens (or programs) a deadlock error, then kill it allowing the other screen (or program) to finish without error.

Is it possible that occasionally when your data copy program runs, some other program (or user?) is active in the system and has locked one (or more) tables in a different order than your program processes them?
0
 

Author Comment

by:shootbox
ID: 33518850
Hi, thanks for you reply. well, I think it would be pretty obvious just from the name of this issue and from the actual events and errors, that a Deadlock happens when two parties try to access the same resource and one of them locked it for its own use before that.
The thing is, this is happening in a 2AM run, so there are no active users at that time.
Moreover, the procedures and jobs are constant, they always happen exactly at the same time and every night, but only about once every 10 days it actually fails with a deadlock.

I have already went through all the jobs that might be scheduled for this time, but I can't see anything that works alongside the 2AM run, and if it did, I would probably get more locks.
This is why after checking the actual configuration, I went into debugging it and collected the trace. if the trace is useless now, then I really have no idea how to continue. but I don't see a reason why a trace would be useless...
0
Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

 
LVL 50

Expert Comment

by:Vitor Montalvão
ID: 33523145
What's the average duration time for that job that runs every 3 hours?
It's also runs at 2:00AM? If so, try to start the other task a little bit after 2:00AM so you may avoid concurrent transactions.
If you can't change schedules then you can use Locking Hints for avoiding deadlocks: http://msdn.microsoft.com/en-us/library/aa213026(SQL.80).aspx

Good luck
0
 

Author Comment

by:shootbox
ID: 33529742
The average runtime is about an hour. when it fails, it fails within 3-4 minutes.
It's the same job all around man... it's just scheduled to run every 3 hours. there are no different jobs...

Can I get some explanation regarding that link and how it can help me, I can't really make anything out of it.
0
 
LVL 50

Expert Comment

by:Vitor Montalvão
ID: 33529802
Oooh, it's the same job. Sorry I missunderstood that.
Means that the problem only occurs in 2:00AM execution. You have any other job that runs at same hour? Maybe some import/export or maintenance plan (Reindex, Backup, ...)?

I can help you better if you tells me what the job is doing.
0
 

Author Comment

by:shootbox
ID: 33820901
I checked all possible options that are running on the database at that time. there was one job that runs every 5 minutes 24 hours a day, I added code to that job's procedure to not run at 02:00am, but the problem still persists.
0
 
LVL 50

Expert Comment

by:Vitor Montalvão
ID: 33820926
Then you better run a SQL Profiler at that time to see what's happening.

Good luck
0
 

Author Comment

by:shootbox
ID: 33828718
VMontalvao, wow, did you even read my message?
I did run it, and the output is attached in this thread.
0
 
LVL 50

Expert Comment

by:Vitor Montalvão
ID: 33838587
You're right shootbox.
It was passed more than 1 month since your last answer that I didn't read again your question. My fault.
But you still didn't answer my question of what's job doing? You can also post here the job code so we can help you better.

Cheers
0
 

Author Comment

by:shootbox
ID: 33901859
I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524

Can anyone approve / deny or has ever dealt with this issue?
I'm scared that running SQL with these parameters will effect performance.
0
 
LVL 50

Expert Comment

by:Vitor Montalvão
ID: 33902317
You can do it. It's very pacific option. The only problem is that your SQL Server log file will grow more than usual. But after you have some deadlocks information you can remove the trace flag.
0
 
LVL 50

Expert Comment

by:Vitor Montalvão
ID: 34051466
Shootbox, what was the solution for this problem?
0
 

Accepted Solution

by:
shootbox earned 0 total points
ID: 34051472
The one I marked as the solution obviously:

I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 34051530
>>The one I marked as the solution obviously:<<
I realize you are new here, just try and be polite and you will be amazed at the help you get.

Also, VMontalvao has spent a considerable time on this thread and he/she deserves a little courtesy to the point that you may want to re-read the EE Guidelines regarding closing questions here:
http://www.experts-exchange.com/help.jsp#hs=29&hi=407
And specifically this section:
Accept your own comment, and award points to Experts for their assistance
If you answered your own question, but wish to award points to Experts for their attempts to help, use this option.

Thanks,
Anthony
0
 

Author Closing Comment

by:shootbox
ID: 34082683
I think I found the missing link as to why I can't see which procedure actually made the deadlock, as described in the following kb article:
http://support.microsoft.com/kb/832524
0

Featured Post

Increase Agility with Enabled Toolchains

Connect your existing build, deployment, management, monitoring, and collaboration platforms. From Puppet to Chef, HipChat to Slack, ServiceNow to JIRA, Splunk to New Relic and beyond, hand off data between systems to engage the right people.

Connect with xMatters.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to shrink a transaction log file down to a reasonable size.

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question