1intelsys
asked on
All schedulers on Node 0 appear deadlocked due to a large number of worker threads waiting on LCK_M_S.
Hey everyone,
We've been having an issue with our database for the past week that has been making our sharepoint site virtually unavailable around the same time every night. I've exhausted everything I can think of that would be causing this, so I turn to you for help.
We're running a small sharepoint server farm, 3 servers. One MOSS 2007 Front-end, one SQL Server (2005 enterprise), and a third server to handle crawls and service searches. These are all VM's run through VMWare. All three servers are configured pretty much the same, with a few minor changes tailored to each servers role. The basics are this: 4 vCPU's, 16GB of RAM (except the crawl/search server which has 6), and Windows Server 2003 sp2 enterprise.
First, the we receive the error:
09/11/2010 05:40:14,Server,Unknown,Al l schedulers on Node 0 appear deadlocked due to a large number of worker threads waiting on LCK_M_S. Process Utilization 1%.
This goes on for anywhere between 3-7 hours, until we get this error:
09/11/2010 06:13:51,spid116,Unknown,T he instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance<c/> or to check for long-running transactions.
at which time the servers are restarted by our night crew. When the server comes back up it goes through the CHECKDB for the databases and begins a recovery of our content database.
Attached is a trace log from Sql Server Profiler. I'm baffled and don't know where to go from here. We've completely turned off the crawl/search functions, ran Database Engine Tuning Analyzer, set a memory limit (10GB) for the SQL engine, and anything else we could think of or could find online to no avail. Any ideas?
Trace.xml
We've been having an issue with our database for the past week that has been making our sharepoint site virtually unavailable around the same time every night. I've exhausted everything I can think of that would be causing this, so I turn to you for help.
We're running a small sharepoint server farm, 3 servers. One MOSS 2007 Front-end, one SQL Server (2005 enterprise), and a third server to handle crawls and service searches. These are all VM's run through VMWare. All three servers are configured pretty much the same, with a few minor changes tailored to each servers role. The basics are this: 4 vCPU's, 16GB of RAM (except the crawl/search server which has 6), and Windows Server 2003 sp2 enterprise.
First, the we receive the error:
09/11/2010 05:40:14,Server,Unknown,Al
This goes on for anywhere between 3-7 hours, until we get this error:
09/11/2010 06:13:51,spid116,Unknown,T
at which time the servers are restarted by our night crew. When the server comes back up it goes through the CHECKDB for the databases and begins a recovery of our content database.
Attached is a trace log from Sql Server Profiler. I'm baffled and don't know where to go from here. We've completely turned off the crawl/search functions, ran Database Engine Tuning Analyzer, set a memory limit (10GB) for the SQL engine, and anything else we could think of or could find online to no avail. Any ideas?
Trace.xml
ASKER CERTIFIED SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
Hmm, i though SQL 2005 default setting is 0 and only on specific configurations increasing it is helpful
http://technet.microsoft.com/en-us/library/ms187024(SQL.90).aspx
(I am only commenting to this, because I encounter SQL deadlocks on various operations and I would be very courious to know if doing this change would help)
http://technet.microsoft.com/en-us/library/ms187024(SQL.90).aspx
(I am only commenting to this, because I encounter SQL deadlocks on various operations and I would be very courious to know if doing this change would help)
ASKER
My knowledge of SQL Server 2005 is pretty basic, but as soon as I changed that setting it resolved my problem of the server going down each night, and I stopped seeing the error I mentioned in the subject line. The article you referenced stated that a value of 0 would automatically configure the max worker thread value of 256, however when I changed it manually to 255 it fixed my problem. All I can gather from that is that for some reason the value of 0 was no longer defaulting to 256, giving me a much lower value for that setting.
Thank you; this is good to know; if I may ask only one other question: what is the moss version you are running?
try run profiler for deadlock.
log following event
(under Locks group)
- Deadlock graph
- Lock: Deadlock
- Lock: Deadlock Chain
pick following item
SPID, ObjectID, StartTime, TextData, DatabaseName, ApplicationName, ServerName, BinaryData, LoginName
if deadlock happen, profiler should got something appear.
then you can right click at "Deadlock Graph" and choose to "Extract Event Data..."
which is XML file, u can try to read it to see what is the command cause deadlock