Start Free Trial

asked on

What's the cause of my frequent 50% CPU usage in 1 of 3 instances of SQL 2000

Hi,
Every so often one of my instances of SQL Server 2000 frequently keeps using 45-50% CPU (Every 2-3 secs.)
After a reboot things go back to normal for a period of weeks of months, then starts again.

Windows 2000 Server SP4 (DHCP,DNS, SQL,File, Print)
SQL Server 2000 SP4 x 2 instance , MSDE x 1
XEON 2.4
4 GB RAM (Avg MB available = 775MB)
6 SCSI HDDs (RAID 5)

As far as I can tell from Performance Monitor, the machine is handling the workload easily.
SQL Profiler also doesn't report any reason for the frequent peaks. (That I can tell)
SEM doesn't show any significant locking. (As far as I know)

In the past I have been able to link this problem to NTDBSMGE.EXE from Pervasive.
Pervasive is still installed on the machine, but has no activity and no CPU usage.

Has any one got any idea's about this?
How can I tell which instance is the problem instance and how can I find and resolve the issue?

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Hi MrRobot,

Thanks for your comment.

I know the theory behind Performace monitor, just have little experience with the practical.
For starters, how do I know which sqlserver.exe goes with which instance of sql server.
If I look at the counters under Process I see sqlserver#1, sqlserver#2, sqlserver#3

Once I have that I could look at Process / %CPU for the instance, but that will only confirm what I can see in task manager i.e. it uses CPU in fits and bouts.

In short, I need instructions on how to find out which instance of SQL is giving me the trouble and how
I can find out what is causing the issue. Then I might need help resolving the issue.

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Hi,

Is tasklist /svc also fpr Windows 2000 Server?
It isn't recognised.
How else can I find out which one is the naughty instance?

Mastoo,
I ran the script against each the Master db of the instances. One (MSDE) didn't return any results.
The other 2 returned one line with CPUPercent = 0.
I ran your script when CPU% was >50.

I also tried the old version, again it is only seeing the execution in QA and nothing more. Maybe I am in the wrong instance???

Cheers

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Hi, just got back in the office.....

Mastoo,
I did run the script against each of the instances.
sc queryex is not recognized.

Dreamguy,
I downloaded Process Explorer and now Microsoft Debugging Tools For Windows for the DbgHelp.dll.
It tells me it is needed when I click on the threads tab.
On the services tab I can see which instance I am looking at.
Looks like I might try some of those other tools as well soon. Thanks for the tip!

I have also setup the performance couter logs as per your specification. Alas, I do not have access to any ftp site. Could you maybe recommend any?

BUT....... Automatic Updates updated my server last night and rebooted. Now the problem has gone and I have no strange issues to monitor anymore. Users happy, me disgruntled.
I'm sure this problem will popup again in future as it seems to do periodically.
If I close this question now and add comments to it later (When the problem starts again) will you
guys get an email about the new post?

damn i'm disappointed. all these inputs from all us experts and the problem goes away after automatic updates and a server reboot? :(

ASKER

btw
tasklist /svc = tlist -s

Windows 2000 support tools

ASKER

It's happening again guys. :-( :-)
Users are breating down my neck again so time is short before i HAVE to restart to temporarly resolve this.

tlist - s revealed that it's the local instance that is causing the issue.

Task manage stats:
CPU Time : 10:35
Mem. usage : 798656K
Page Faults : 703252
VM Size : 878104K
Handles : 715
Threads : 48
I/O Reads : 17341100952
I/O Writes : 6597142203

I'll run my troubleshooting Profiler template for an hour and get back to you.

ASKER

mastoo:
I ran both your scripts multiple times against each database.
The most lines returned were 3 and the highest CPU% was 4.7 all the rest of the CPU% were under 4.
These results seem very acceptable to me, right?

ASKER

Performance monitor stats. (Only for the troublesome instance.)

%CPU Time : Avg. 88
%Priveledge : Avg. 82
%User time : Avg 2.5

Instance is using both Xeon cores.
Both boost SQL priority on windows & use windows NT fibres are NOT selected.

Could this issue be related to me using remote desktop and running queries?

Re-reading everything I need to ask something basic: When you're looking at high cpu in task manager have you looked at the process list and verified it is one of your instances of sqlservr.exe that is causing the high cpu? I'm not clear whether your cpu numbers posted are total usage or process usage. I was pretty confident if it is sqlservr.exe with the high usage my script would show you the sql causing it.

ASKER

>>When you're looking at high cpu in task manager have you looked at the process list and verified it is one of your instances of sqlservr.exe that is causing the high cpu?
Yes that's right.

My previous post was only for the instance of SQL that is giving me troubles.

Performance monitor has been running all day. Here are the latest averages.
% CPU = 130
% User Time = 99
% Priviledge Time = 42

I have also been running Profiler for most of the day and only just stopped and save the results to a table on the same instance as the one that's giving me hassles. (Local Instance)
... oh and I'am also running the Index Tuning wizard at the moment. (Both your scripts show Index Tuning Wizard running > 100% CPU.

ASKER

I look at this article ( http://support.microsoft.com/kb/314530 ) to help troubleshoot at SP time error
i'm getting.
No additional information was provided with DBCC TRACEON (7300, 3604).

DBCC SQLPERF(WAITSTATS) returns: (Shortend version, removed all 0 rows )
Wait Type Requests Wait Time Signal Wait Time
MISCELLANEOUS      1.0      0.0      0.0
LCK_M_S        537.0      12762.0      230.0
LCK_M_U        23.0      9796.0      15.0
LCK_M_X        490.0      2374.0      141.0
LCK_M_IS        2.0      31.0      0.0
LCK_M_IX        4.0      79.0      0.0
SLEEP        769417.0      7.6852006E+8      7.610743E+8
IO_COMPLETION      7662.0      52707.0      108.0
ASYNC_IO_COMPLETION      1272.0      126243.0      0.0
OLEDB        8681987.0      7.9850118E+8      3.9679987E+9
RESOURCE_QUEUE      536188.0      2.2438339E+9      7.4988486E+8
ASYNC_DISKPOOL_LOCK17505.0      218.0      0.0
PIPELINE_INDEX_STAT      1.0      0.0      0.0
WRITELOG        194037.0      2950177.0      28557.0
LOGBUFFER        73.0      892.0      0.0
CMEMTHREAD      429.0      125.0      0.0
CXPACKET        33668.0      141220.0      11338.0
PAGESUPP        10.0      0.0      0.0
WAITFOR        44.0      44514.0      44514.0
LATCH_SH        3.0      63.0      0.0
LATCH_EX        850.0      404.0      309.0
PAGELATCH_SH      38043.0      1186.0      1092.0
PAGELATCH_UP      15556.0      2393.0      2192.0
PAGELATCH_EX      18881.0      1531.0      1422.0
PAGEIOLATCH_SH      6896.0      59743.0      127.0
PAGEIOLATCH_UP      7452.0      124414.0      156.0
PAGEIOLATCH_EX      3793.0      70841.0      15.0
NETWORKIO        717393.0      300259.0      0.0
Total        1.1052217E+7      3.8147571E+9      5.4790482E+9

SPID 69 Has 2 TAB locks:
master.dbo.spt_values TAB Sch-S GRANT Xact
tempdb.dbo.##lockinfo69 TAB X GRANT Xact
Last TSQL command batch run : dbcc inputbuffer(69)

In SEM Process info I see:
PID 69
User Me !!
DB Master
Status Runnable
Open Transactions 2
Command SELECT INTO
Application MS SQLEM
Waiting type Not waiting
CPU 935

AND

PID 52
User NT System
DB msdb
Status Sleeping
Open Transactions 0
Command AWAITING COMMAND
Application SQLAgent - Alert Engine
Waiting type Not waiting
CPU 567719 (565 000 higher than any other)

ASKER

btw,

We only have 10-15 users working on this server at any one time.

My own (Amature) VB applicaton also runs on this instance. ADO connections are made and closed 1000+ times a day by each user.

SOLUTION

Guy Hengel [angelIII / a3]

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

>> Is parallel query execution disabled on the (3) instances? if not, disable that.
Not sure, where can I find this setting?

The combined max memory of all instances did indeed exceed 3.7GB. Task manager reported 450MB available. I lowered the max on each instance and now have 950MB available.

I have multiple scheduled backup tasks the run troughout the day. The CPU hogging also occurs between these jobs.

SOLUTION

Guy Hengel [angelIII / a3]

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Okay. I have changed the settings on all 2 instances to only use CPU 1 and not CPU 0.
Only time will tell if this helps resolve this is permanently as it only occurs some weeks after rebooting.

Is there any possiblity that this is linked to me using RDP & SEM remotely?
Do I need to worry about those TAB locks or the 2 open transactions I mentioned earlier?

btw, thanks for taking a look at this question. Very much appreciated !! :-)

Guy Hengel [angelIII / a3]

>Okay. I have changed the settings on all 2 instances to only use CPU 1 and not CPU 0.
that's actually the wrong setting :-(
note also that these settings only get effective after restarting the instance(s).

ASKER

Oops.
I looked again, this time with my eyes open. Found it!

I'll restart now. This would temporaraliy resolve the problem anyway, lets hope it resolves it permanently this time.

ASKER

Things are looking good, but I'll hold off awarding points for a while longer.

ASKER

Problem's back :-(

I have no new information. Everything looks the same as last time. Grrrr

I think go back to where we left off (at the time the script accurately told you it was the index tuning wizard using the cpu). So, start with looking at task manager and verify that it is one of your sql server instances using most of the cpu. Run that script on that instance and see what it says.

ASKER

Hi,

I just ran both your scripts 20 times each. the highest cpu % was 3.2.
I ran the script mostly against a user database that is used more than any other database.

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Hi MrRobot,

How are ya?

Yes I am watching the sql server counters on prefmon. I can see that the CPU Priveledge % grows with each spike. I just don't know why and how I can find out.

I would like to try taking some of the other User DBs off line. How can I kick everyone off a DB so that I can take it off line?

I am already doing a lot of overtime to get other things done, so I would really, really perfer not to re-install SQL Server.

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Hi,

Ok I'm running System Monitor at the moment. Looking at locking.
Lock request / sec are related to the cpu spikes but
Lock Timeout / sec, Lock Wait time, Lock Waits /sec & number of deadlocks are all 0.

As far as full scans are concerned, no correlation to the CPU spikes (Almost no scans happening at all). Index Searches / sec on the other hand are happening on each CPU spike.
Probe & range scans are at a bare minimum.

If I am interpreting this data correctly, the I either have too many indexes, bad indexes or fragemented indexes. All of these I find strange as I have run the Index tuning wizard 2-3 times this year.

I just run DBCC SHOWCONTIG example E. from BOL
E. Use DBCC SHOWCONTIG and DBCC INDEXDEFRAG to defragment the indexes in a database
This example shows a simple way to defragment all indexes in a database that is fragmented above a declared threshold. (30.0)

Now Index Seaches / sec are continuously high. Avg. 265 / sec, Max 4025 /sec. :-(

ASKER

Hi,

I had to restart the service to resolve this temporarily again.
I left perfmon running with the same tokens.
Index Searches still peak regulary, but no loss of performance.
I no longer think Index Searches have anything to do with this issue.

Any idea's?

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Hi,

Profiler hasn't revealed much. But I addmit, I'm only only a novice in using it. I have created some traces based on what I have found on the net. Interpreting is a another skill all together. From what I have seen there are some sp's that are taking a long time (High reads & duration) but only when things are running slow, otherwise things look okay. Maybe you have some tips for me?

As far as I know activity monitor is a tool in SQL 2005 and doesn't exist in SQL 200. Please corect me if I'm wrong.

I will open a support case with McAfee. Who knows, maybe that's causing the trouble. ProtectionPilot does run a lot of auditing & logging sps. These are the ones I was refering to above.

ASKER

These MS articles explain situations that are similar to what I'm experiencing.
http://support.microsoft.com/kb/323213
http://support.microsoft.com/kb/835864

Unforunately their resolution is to install the latest service pack, which already is.

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

also try sql batch completed events.

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

I'm running profiler at the moment. I'll use the trace as a baseline.
I had to appease the natives by restarting the service recently, or else they would have linched me.

Everything is running fine at the moment. When things slow down again i'll run the traces as you suggested.
In the meantime I'll download activity monitor and experiment a bit.

Thanks for the help!

ASKER

about that Acivity Monitor.
Googling gave me a lot of "Big Brother" programs.

From this MS site:
http://www.microsoft.com/downloads/details.aspx?FamilyID=d09c1d60-a13c-4479-9b91-9e8b9d835cdc&displaylang=en

I downloaded this:
http://www.microsoft.com/downloads/details.aspx?FamilyId=79F151C7-4D98-4C2B-BF72-EC2B4AE69191&displaylang=en

Please let me know if this is the one you were refering to.

SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER

Hi,

This issue has come back a few times since I last posted.
Just now I tried to see what was happening in ProcessMonitor.
I got the dreaded Blue Screen of Death upon doubleclicking the processmonitor exe. FLTMGR.SYS being the culprit.

I will contiue to monitor all events from PID 1388 (Local SQL instance) with a duration > 1.
Because I just had to restart, I'll have to wait for the issue to raise its head again.

My applogies from the consistantly slow replies. Even though this is becomming a major problem, it is still not in my top 5 priorities. My boss keeps me occupied with all sorts of other things he deems to be more important......
Please bare with me, I still need your wisdom, guidance and especially your patience.

ASKER

It didn't take longer to raise its head. :-(

SEM Locks revealed:
Object LockType Mode Status Owner Index
TestDB DB S Granted Sess
MutDet TAB IX Granted Sess
MutDet Page IX Granted Sess PK_MutDet_7e6cc920
MutDet Key X Granted Sess PK_MutDet_7e6cc920

The testdb is a restored (renamed) backup from the actual DB on the same instance.
Even though, I killed the application on my client machine, these still show up in SEM on the server, but not in SEM on the client.

........ Just got interupte by the big boss himself........ I now have other priorities than server crashes and poor performance...... :-( grrrrr.
I'll still be listening though.

ASKER

How big an impact on performace should sp_setapprole have?

I am monitoring profiler duration > 50. 99% displayed is sp_setapprole, duration 150 - 210, cpu = 0.
Is this normal? Could it have something to do with the cpu usage?

My application ADO connection uses an application role. the ADO Connection is closed after every tsql.
This means that sp_setapprole is used tens of thousand times a day.
Is this normal practice, or far from best practice?

ASKER CERTIFIED SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial