Link to home
Start Free TrialLog in
Avatar of DennisPost
DennisPostFlag for Netherlands

asked on

What's the cause of my frequent 50% CPU usage in 1 of 3 instances of SQL 2000

Hi,
Every so often one of my instances of SQL Server 2000 frequently keeps using 45-50% CPU (Every 2-3 secs.)
After a reboot things go back to normal for a period of weeks of months, then starts again.

Windows 2000 Server SP4 (DHCP,DNS, SQL,File, Print)
SQL Server 2000 SP4 x 2 instance , MSDE x 1
XEON 2.4
4 GB RAM (Avg MB available = 775MB)
6 SCSI HDDs (RAID 5)

As far as I can tell from Performance Monitor, the machine is handling the workload easily.
SQL Profiler also doesn't report any reason for the frequent peaks. (That I can tell)
SEM doesn't show any significant locking. (As far as I know)

In the past I have been able to link this problem to NTDBSMGE.EXE from Pervasive.
Pervasive is still installed on the machine, but has no activity and no CPU usage.

Has any one got any idea's about this?
How can I tell which instance is the problem instance and how can I find and resolve the issue?
SOLUTION
Avatar of MrRobot
MrRobot
Flag of Türkiye image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of DennisPost

ASKER

Hi MrRobot,

Thanks for your comment.

I know the theory behind Performace monitor, just have little experience with the practical.
For starters, how do I know which sqlserver.exe goes with which instance of sql server.
If I look at the counters under Process I see sqlserver#1, sqlserver#2, sqlserver#3

Once I have that I could look at Process / %CPU for the instance, but that will only confirm what I can see in task manager i.e. it uses CPU in fits and bouts.

In short, I need instructions on how to find out which instance of SQL is giving me the trouble and how
I can find out what is causing the issue. Then I might need help resolving the issue.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi,

Is tasklist /svc also fpr Windows 2000 Server?
It isn't recognised.
How else can I find out which one is the naughty instance?

Mastoo,
I ran the script against each the Master db of the instances. One (MSDE) didn't return any results.
The other 2 returned  one line with CPUPercent = 0.
I ran your script when CPU% was >50.

I also tried the old version, again it is only seeing the execution in QA and nothing more. Maybe I am in the wrong instance???

Cheers

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi, just got back in the office.....

Mastoo,
I did run the script against each of the instances.
sc queryex is not recognized.

Dreamguy,
I downloaded Process Explorer and now Microsoft Debugging Tools For Windows for the DbgHelp.dll.
It tells me it is needed when I click on the threads tab.
On the services tab I can see which instance I am looking at.
Looks like I might try some of those other tools as well soon. Thanks for the tip!

I have also setup the performance couter logs as per your specification. Alas, I do not have access to any ftp site. Could you maybe recommend any?


BUT....... Automatic Updates updated my server last night and rebooted. Now the problem has gone and I have no strange issues to monitor anymore. Users happy, me disgruntled.
I'm sure this problem will popup again in future as it seems to do periodically.
If I close this question now and add comments to it later (When the problem starts again) will you
guys get an email about the new post?


Avatar of dreamyguy
dreamyguy

damn i'm disappointed. all these inputs from all us experts and the problem goes away after automatic updates and a server reboot? :(
btw
tasklist /svc = tlist -s

Windows 2000 support tools
It's happening again guys. :-(    :-)
Users are breating down my neck again so time is short before i HAVE to restart to temporarly resolve this.

tlist - s revealed that it's the local instance that is causing the issue.

Task manage stats:
CPU Time : 10:35
Mem. usage : 798656K
Page Faults : 703252
VM Size : 878104K
Handles : 715
Threads : 48
I/O Reads : 17341100952
I/O Writes : 6597142203

I'll run my troubleshooting Profiler template for an hour and get back to you.
mastoo:
I ran both your scripts multiple times against each database.
The most lines returned were 3 and the highest CPU% was 4.7 all the rest of the CPU% were under 4.
These results seem very acceptable to me, right?
Performance monitor stats. (Only for the troublesome instance.)

%CPU Time : Avg. 88
%Priveledge : Avg. 82
%User time : Avg 2.5

Instance is using both Xeon cores.
Both boost SQL priority on windows & use windows NT fibres are NOT selected.

Could this issue be related to me using remote desktop and running queries?
Re-reading everything I need to ask something basic: When you're looking at high cpu in task manager have you looked at the process list and verified it is one of your instances of sqlservr.exe that is causing the high cpu?  I'm not clear whether your cpu numbers posted are total usage or process usage.  I was pretty confident if it is sqlservr.exe with the high usage my script would show you the sql causing it.
>>When you're looking at high cpu in task manager have you looked at the process list and verified it is one of your instances of sqlservr.exe that is causing the high cpu?
Yes that's right.

My previous post was only for the instance of SQL that is giving me troubles.

Performance monitor has been running all day. Here are the latest averages.
% CPU = 130
% User Time = 99
% Priviledge Time = 42

I have also been running Profiler for most of the day and only just stopped and save the results to a table on the same instance as the one that's giving me hassles. (Local Instance)
... oh and I'am also running the Index Tuning wizard at the moment. (Both your scripts show Index Tuning Wizard running  > 100% CPU.

I look at this article ( http://support.microsoft.com/kb/314530 ) to help troubleshoot at SP time error
i'm getting.
No additional information was provided with DBCC TRACEON (7300, 3604).

DBCC SQLPERF(WAITSTATS) returns: (Shortend version, removed all 0 rows )
Wait Type                          Requests      Wait Time      Signal Wait Time
MISCELLANEOUS      1.0      0.0      0.0
LCK_M_S                            537.0      12762.0      230.0
LCK_M_U                            23.0      9796.0      15.0
LCK_M_X                            490.0      2374.0      141.0
LCK_M_IS                            2.0      31.0      0.0
LCK_M_IX                            4.0      79.0      0.0
SLEEP                           769417.0      7.6852006E+8      7.610743E+8
IO_COMPLETION      7662.0      52707.0      108.0
ASYNC_IO_COMPLETION      1272.0      126243.0      0.0
OLEDB                           8681987.0      7.9850118E+8      3.9679987E+9
RESOURCE_QUEUE      536188.0      2.2438339E+9      7.4988486E+8
ASYNC_DISKPOOL_LOCK17505.0      218.0      0.0
PIPELINE_INDEX_STAT      1.0      0.0      0.0
WRITELOG                           194037.0      2950177.0      28557.0
LOGBUFFER                           73.0      892.0      0.0
CMEMTHREAD      429.0      125.0      0.0
CXPACKET                           33668.0      141220.0      11338.0
PAGESUPP                           10.0      0.0      0.0
WAITFOR                           44.0      44514.0      44514.0
LATCH_SH                           3.0      63.0      0.0
LATCH_EX                           850.0      404.0      309.0
PAGELATCH_SH      38043.0      1186.0      1092.0
PAGELATCH_UP      15556.0      2393.0      2192.0
PAGELATCH_EX      18881.0      1531.0      1422.0
PAGEIOLATCH_SH      6896.0      59743.0      127.0
PAGEIOLATCH_UP      7452.0      124414.0      156.0
PAGEIOLATCH_EX      3793.0      70841.0      15.0
NETWORKIO                           717393.0      300259.0      0.0
Total                           1.1052217E+7      3.8147571E+9      5.4790482E+9

SPID 69 Has 2 TAB locks:
master.dbo.spt_values          TAB            Sch-S              GRANT                           Xact
tempdb.dbo.##lockinfo69       TAB            X                      GRANT                           Xact
Last TSQL command batch run : dbcc inputbuffer(69)

In SEM Process info I see:
PID                           69
User                         Me !!
DB                            Master
Status                      Runnable
Open Transactions  2
Command                 SELECT INTO
Application               MS SQLEM
Waiting type             Not waiting
CPU                          935

AND

PID                           52
User                         NT System
DB                            msdb
Status                      Sleeping
Open Transactions  0
Command                 AWAITING COMMAND
Application               SQLAgent - Alert Engine
Waiting type             Not waiting
CPU                          567719 (565 000 higher than any other)
btw,

We only have 10-15 users working on this server at any one time.

My own (Amature) VB applicaton also runs on this instance. ADO connections are made and closed 1000+ times a day by each user.
SOLUTION
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>> Is parallel query execution disabled on the (3) instances? if not, disable that.
Not sure, where can I find this setting?

The combined max memory of all instances did indeed exceed 3.7GB. Task manager reported 450MB available. I lowered the max on each instance and now have 950MB available.

I have multiple scheduled backup tasks the run troughout the day. The CPU hogging also occurs between these jobs.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Okay. I have changed the settings on all 2 instances to only use CPU 1 and not CPU 0.
Only time will tell if this helps resolve this is permanently as it only occurs some weeks after rebooting.

Is there any possiblity that this is linked to me using RDP & SEM remotely?
Do I need to worry about those TAB locks or the 2 open transactions I mentioned earlier?

btw, thanks for taking a look at this question. Very much appreciated !! :-)
>Okay. I have changed the settings on all 2 instances to only use CPU 1 and not CPU 0.
that's actually the wrong setting :-(
note also that these settings only get effective after restarting the instance(s).
Oops.
I looked again, this time with my eyes open. Found it!

I'll restart now. This would temporaraliy resolve the problem anyway, lets hope it resolves it permanently this time.

Things are looking good, but I'll hold off awarding points for a while longer.
Problem's back :-(

I have no new information. Everything looks the same as last time. Grrrr
I think go back to where we left off (at the time the script accurately told you it was the index tuning wizard using the cpu).  So, start with looking at task manager and verify that it is one of your sql server instances using most of the cpu.  Run that script on that instance and see what it says.
Hi,

I just ran both your scripts 20 times each. the highest cpu % was 3.2.
I ran the script mostly against a user database that is used more than any other database.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi MrRobot,

How are ya?

Yes I am watching the sql server counters on prefmon. I can see that the CPU Priveledge % grows with each spike. I just don't know why and how I can find out.

I would like to try taking some of the other User DBs off line. How can I kick everyone off a DB so that I can take it off line?

I am already doing a lot of overtime to get other things done, so I would really, really perfer not to re-install SQL Server.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi,

Ok I'm running System Monitor at the moment. Looking at locking.
Lock request / sec are related to the cpu spikes but
Lock Timeout / sec,  Lock Wait time, Lock Waits /sec & number of deadlocks are all 0.

As far as full scans are concerned, no correlation to the CPU spikes (Almost no scans happening at all). Index Searches / sec  on the other hand are happening on each CPU spike.
Probe & range scans are at a bare minimum.

If I am interpreting this data correctly, the I either have too many indexes, bad indexes or fragemented indexes. All of these I find strange as I have run the Index tuning wizard 2-3 times this year.

I just run DBCC SHOWCONTIG example E. from BOL
E. Use DBCC SHOWCONTIG and DBCC INDEXDEFRAG to defragment the indexes in a database
This example shows a simple way to defragment all indexes in a database that is fragmented above a declared threshold. (30.0)

Now Index Seaches / sec are continuously high. Avg. 265 / sec, Max 4025 /sec. :-(



Hi,

I had to restart the service to resolve this temporarily again.
I left perfmon running with the same tokens.
Index Searches still peak regulary, but no loss of performance.
I no longer think Index Searches have anything to do with this issue.

Any idea's?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi,

Profiler hasn't revealed much. But I addmit, I'm only only a novice in using it. I have created some traces based on what I have found on the net. Interpreting is a another skill all together.  From what I have seen there are some sp's that are taking a long time (High reads & duration) but only when things are running slow, otherwise things look okay. Maybe you have some tips for me?

As far as I know activity monitor is a tool in SQL 2005 and doesn't exist in SQL 200. Please corect me if I'm wrong.

I will open a support case with McAfee. Who knows, maybe that's causing the trouble. ProtectionPilot does run a lot of auditing & logging sps. These are the ones I was refering to above.
These MS articles explain situations that are similar to what I'm experiencing.
http://support.microsoft.com/kb/323213
http://support.microsoft.com/kb/835864

Unforunately their resolution is to install the latest service pack, which already is.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
also try sql batch completed events.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I'm running profiler at the moment. I'll use the trace as a baseline.
I had to appease the natives by restarting the service recently, or else they would have linched me.

Everything is running fine at the moment. When things slow down again i'll run the traces as you suggested.
In the meantime I'll download activity monitor and experiment a bit.

Thanks for the help!
about that Acivity Monitor.
Googling gave me a lot of "Big Brother" programs.

From this MS site:
http://www.microsoft.com/downloads/details.aspx?FamilyID=d09c1d60-a13c-4479-9b91-9e8b9d835cdc&displaylang=en

I downloaded this:
http://www.microsoft.com/downloads/details.aspx?FamilyId=79F151C7-4D98-4C2B-BF72-EC2B4AE69191&displaylang=en

Please let me know if this is the one you were refering to.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi,

This issue has come back a few times since I last posted.
Just now I tried to see what was happening in ProcessMonitor.
I got the dreaded Blue Screen of Death upon doubleclicking the processmonitor exe. FLTMGR.SYS being the culprit.

I will contiue to monitor all events from PID 1388 (Local SQL instance) with a duration > 1.
Because I just had to restart, I'll have to wait for the issue to raise its head again.

My applogies from the consistantly slow replies. Even though this is becomming a major problem, it is still not in my top 5 priorities. My boss keeps me occupied with all sorts of other things he deems to be more important......
Please bare with me, I still need your wisdom, guidance and especially your patience.
It didn't take longer to raise its head. :-(

SEM Locks revealed:
Object     LockType     Mode      Status      Owner     Index
TestDB    DB                S            Granted   Sess        
MutDet     TAB              IX           Granted   Sess
MutDet     Page             IX           Granted   Sess       PK_MutDet_7e6cc920
MutDet     Key               X           Granted   Sess       PK_MutDet_7e6cc920

The testdb is a restored (renamed) backup from the actual DB on the same instance.
Even though, I killed the application on my client machine, these still show up in SEM on the server, but not in SEM on the client.

........ Just got interupte by the big boss himself........ I now have other priorities than server crashes and poor performance...... :-( grrrrr.
I'll still be listening though.
How big an impact on performace should sp_setapprole have?

I am monitoring profiler duration > 50. 99% displayed is sp_setapprole, duration 150 - 210, cpu = 0.
Is this normal? Could it have something to do with the cpu usage?

My application ADO connection uses an application role.  the ADO Connection is closed after every tsql.
This means that sp_setapprole is used tens of thousand times a day.
Is this normal practice, or far from best practice?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial