TomMonkey

Strange CPU spikes

Hello,

One of our Win 2003 Ent, SQL 2005 servers is behaving strangely.  Every 6th ping seems to take a long time to return, and this corresponds with a spike of activity on one of the processor cores.  The server has 2 NICs.  I have updated to the latest drivers and tried using just the first, just the second and both, but we still get the slow ping.  I have attached screenshots of the pings and of Task Manager.

How should I start to investigate this please?  I have looked at Performance Monitor, which tells me the same as Task Manager, but I don't know how to pinpoint the cause.

Cheers
Tom
Taskmanager.JPG
ping.JPG
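To confirm that the slow replies really are periodic, rather than judging it by eye from a screenshot, the ping output can be captured to a file and checked. This is only a minimal sketch: it assumes the standard English-language reply lines printed by the Windows ping command, and the 50 ms threshold and the function names are illustrative choices, not anything from the original post.

```python
import re

# Matches Windows ping reply lines such as
# "Reply from 192.0.2.10: bytes=32 time=187ms TTL=128" or "... time<1ms ...".
REPLY_TIME = re.compile(r"time[=<](\d+)ms")


def slow_reply_positions(ping_output, threshold_ms=50):
    """Return the 1-based positions of replies slower than threshold_ms.

    Lines without a reply time (for example "Request timed out.") are ignored.
    """
    matches = (REPLY_TIME.search(line) for line in ping_output.splitlines())
    times = [int(m.group(1)) for m in matches if m]
    return [i for i, t in enumerate(times, start=1) if t > threshold_ms]


def gaps(positions):
    """Return the distance between consecutive slow replies."""
    return [b - a for a, b in zip(positions, positions[1:])]
```

Something like `ping -n 60 servername > ping.txt` produces a file to feed in. If `gaps(...)` returns the same number over and over, the delay is on a fixed timer, which points towards something scheduled (a driver, a polling service, a retry loop) rather than random load.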
SOLUTION
chuckyh
This solution is only available to members.
TomMonkey

ASKER

Hi chuckyh,

Thanks for that.  I have run it and can't see any processes that correspond to the spikes.  It just seems to be hardware interrupts.  I have attached a screenshot.  Could this mean that the processor is on its way out?

Cheers
Tom
Process-Monitor.JPG
I don't see the spikes you are talking about on the Process Explorer performance graph. You can sort the processes by CPU usage; if there's a spike, you should be able to see it. Maybe increase the refresh rate in Process Explorer temporarily.
SOLUTION
This solution is only available to members.
Yeah, they are kind of small.  It's just 17% of 1 processor of 4.  You have to zoom in a bit.  That's why I showed the interrupts properties.  When I sort by CPU usage the only process that jumps up the queue is sqlservr.exe.  I've attached the screenshot of its properties.  There is a very small spike at about the same time as the hardware interrupts...
SQL.JPG
TechnoButt,

For some reason there is no advanced tab on the NIC's properties so I can't get to those settings.  These are fairly old cards.  Maybe replacing them would help...
You may have to disable TOE in the registry if that's the issue. A while back I had a lot of problems with an Exchange cluster because the heartbeat was shared using Broadcom NICs affected by the TOE issues (before Microsoft would own up).

Basic info (with a grain of wikisalt):
http://en.wikipedia.org/wiki/TCP_Offload_Engine

Microsoft info:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;948496
BTW, TOE is only on Broadcom NICs (as far as I know) and only recent models (recent being the last several years).
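For anyone who wants to try the registry route, here is a rough sketch that builds a .reg file turning the offload features off. The key path and the three value names (EnableTCPChimney, EnableRSS, EnableTCPA) are what I recall from the Microsoft KB article linked above, so treat them as assumptions and check them against the article before importing anything. The function name is made up for illustration.

```python
# Key and value names as recalled from KB 948496; verify before use.
SNP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
SNP_VALUES = ("EnableTCPChimney", "EnableRSS", "EnableTCPA")


def build_reg_file(key=SNP_KEY, names=SNP_VALUES):
    """Return .reg file text that sets each named DWORD value to 0."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    lines += ['"{}"=dword:00000000'.format(name) for name in names]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines) + "\r\n"
```

The text from `build_reg_file()` can be saved as a .reg file, reviewed and then imported on the server. Back up the key first, and expect to need a restart before the change takes effect.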
Thanks for the links.  I checked and I already have that update installed.

These are fairly old Intel NICs.

I have spoken to one of our SQL DBAs and he is looking into some encrypted packets received from a SharePoint server that could be causing the problem.  Just a theory at this stage.

I've also found an update for the chipset (clutching at straws now!)

I'll keep you guys posted...

Cheers
Tom
ASKER CERTIFIED SOLUTION
Dave Lloyd
This solution is only available to members.
Hi Disorganise,

I've just checked the event logs and there are some disk errors in there:

The device, \Device\Harddisk0, has a bad block.

I don't know why I didn't check the event log sooner!  I guess when you think you already know what might be wrong...

These are new disks in this machine.  I'll run chkdsk tonight and see what it comes up with.  

Thanks for the advice.

Tom
No worries - it's easy to get stuck in a certain train of thought.

My philosophy, and one I try to instill in my staff and friends, is to 'make event log your friend'.

Good luck!
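As a quick way to make the event log your friend, the System log can be exported to a text or CSV file and scanned for the bad block message quoted above. This is a minimal sketch that only assumes the message text appears in the export exactly as it does in the event description; the function name and the export format are assumptions.

```python
import re
from collections import Counter

# Matches the disk driver message, e.g.
# "The device, \Device\Harddisk0, has a bad block."
# The device name may carry a suffix such as \DR0.
BAD_BLOCK = re.compile(
    r"The device, (\\Device\\Harddisk[^,\s]+), has a bad block",
    re.IGNORECASE,
)


def bad_block_counts(log_text):
    """Count bad block messages per device in exported event log text."""
    return Counter(m.group(1) for m in BAD_BLOCK.finditer(log_text))
```

A count that keeps climbing for the same device after chkdsk has run suggests the disk or controller is still failing, not just one damaged cluster.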
Right, Chkdsk found a bad cluster and sorted it out.  It was where a backup file was being written:

\Backup\SQL2005\Backup\SUSDB\SUSDB_~4.BAK

After rebooting, the same problem still existed.  I tried stopping SQL Server and that didn't help, so I guess SQL is a symptom and not a cause.

There are no errors in the DNS logs on any of our DNS servers and they seem to all be running sweetly...
Thanks for the help guys.

I spent some more time on the machine over the weekend and I'm pretty convinced it's the RAID controller that's screwed.  Unfortunately it's an embedded ICH5R, which I have been reading up on and which turns out to be a 'fake' hardware RAID.  It is telling me that the RAID is degraded but that both disks are OK.  I cannot run SeaTools on the disks because the program does not recognise the RAID controller.

Every time Windows starts it kicks off chkdsk, which says it has fixed the errors but then kicks off again after it has finished.

I am moving all databases to another server as quickly as possible...