Exchange 2013 CPU Spikes

We have been having frequent resource issues on our Exchange 2013 CU6 servers recently. We have always had them, but they seem to be happening more often now.

I remember that back with Exchange 2010 there was the issue with iOS 6.1 devices syncing. This feels similar, as our server is hitting 100% CPU usage. I found that we still have some of those devices, but I can't find any documentation stating that the issue carried over to Exchange 2013. Does anyone know?

It seems like every time I look at Sysinternals Process Explorer, the following are pretty much all at 90-100% CPU:
-  w3wp.exe
-  noderunner.exe
-  Microsoft.Exchange.Store.Worker.exe

When the CPU spikes, we generally get calls from users saying Outlook disconnected, and after 2-5 minutes the Outlook clients connect again. We see this on all the servers in our DAG.

I've been checking the "\MSExchange RpcClientAccess\User Count" counter, thinking it could help find the issue, but have had no luck. I've been running ExPerfWiz all afternoon, and the counter is averaging 8000-9000.

Looking for ideas or suggestions?

Thanks
Asked by: Christian Hans

Scott C (Senior Engineer) commented:
Use ExPerfWiz and collect 2-4 hours of data.

https://experfwiz.codeplex.com/

Run it through PAL.

https://pal.codeplex.com/

Look at the report and see what it says.  PAL is a great tool for getting a quick overview.


If you see high LDAP read or search times, do the following on the domain controller. Open this registry key:

HKLM\SYSTEM\CurrentControlSet\services\NTDS\Diagnostics

Change "15 Field Engineering" to a value of 5.
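For anyone who prefers to script that registry change, here is a minimal sketch that builds the equivalent `reg add` command line. It assumes it will be run from an elevated prompt on the domain controller. The helper name `field_engineering_cmd` is made up for illustration, and the function only builds the command; it does not execute anything.

```python
import subprocess

# Key path as quoted in the post above (registry paths are case-insensitive).
NTDS_DIAG_KEY = r"HKLM\SYSTEM\CurrentControlSet\services\NTDS\Diagnostics"


def field_engineering_cmd(level=5):
    """Build the reg.exe argument list that sets '15 Field Engineering'.

    Hypothetical helper: it returns the arguments and leaves execution
    to the caller.
    """
    return [
        "reg", "add", NTDS_DIAG_KEY,
        "/v", "15 Field Engineering",
        "/t", "REG_DWORD",
        "/d", str(level),
        "/f",  # overwrite the existing value without prompting
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(subprocess.list2cmdline(field_engineering_cmd()))
```

Calling `field_engineering_cmd(0)` gives the command to switch the extra logging back off once the data has been collected.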

Once this is set, look in the Directory Services log for event ID 1644.

If this event ID shows up, install the hotfix: http://support.microsoft.com/kb/2862304

LDAP Read and Search times should be under 15 with spikes not higher than 50.
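A rough way to apply that rule of thumb to exported counter samples is sketched below. It assumes the values are in milliseconds and that "under 15" refers to the average; neither is stated in the post. The function name `ldap_latency_ok` is hypothetical.

```python
def ldap_latency_ok(samples_ms, avg_limit=15.0, spike_limit=50.0):
    """Return True when the average is under avg_limit and no single
    sample exceeds spike_limit.

    Assumption: samples are LDAP Read Time or LDAP Search Time values
    in milliseconds, already parsed from the collected counter data.
    """
    if not samples_ms:
        raise ValueError("no samples to evaluate")
    average = sum(samples_ms) / len(samples_ms)
    return average < avg_limit and max(samples_ms) <= spike_limit
```

For example, `ldap_latency_ok([5, 10, 12])` passes, while a run containing a single 55 ms sample fails even if its average stays below 15.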

Scott C (Senior Engineer) commented:
Post back.  I'm heading home but will follow up tomorrow.

Scott C (Senior Engineer) commented:
You also need to be running ExPerfWiz on your CAS servers and looking at the RPC Client Access counters on those. At 300 RPC requests your users will be affected, and at 500 users will get disconnected.

I have a OneNote on RPC.  I'll post that tomorrow.

Christian Hans (Author) commented:
ScottCha, I really appreciate your help. I'm running a new ExPerfWiz capture this morning for 4 hours, so I should have it by lunchtime. PAL usually takes a while to finish, but I'm confident I'll have results right after lunch and will update here. Thank you, Sir.

BTW, setting up the Active Directory diagnostic logging under "HKLM\SYSTEM\CurrentControlSet\services\NTDS\Diagnostics": this needs to be done on the Active Directory server(s), correct?

Scott C (Senior Engineer) commented:
Sounds good. Here are the RPC notes I promised.

User impact starts when RPC Requests reaches around 300; users begin to see slowness.

At 500 the server is done: no more connections are accepted and users start getting disconnected.

Counters to look at:

\Process(Microsoft.Exchange.RpcClientAccess.Service)\Thread Count

MSExchange RpcClientAccess, with the counters RPC Averaged Latency and RPC Requests.
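The two RPC Requests thresholds quoted above (around 300 and 500) can be turned into a quick check over collected samples. This is only a sketch of that rule of thumb: the state labels and function names are made up for illustration, and the samples are assumed to be numeric values already extracted from the counter log.

```python
def classify_rpc_requests(value, impact_at=300, saturated_at=500):
    """Map one RPC Requests sample to a rough health state."""
    if value >= saturated_at:
        return "saturated"    # users start getting disconnected
    if value >= impact_at:
        return "user impact"  # users start seeing slowness
    return "ok"


def worst_state(samples):
    """Classify a whole capture by its highest RPC Requests sample."""
    if not samples:
        raise ValueError("no samples to evaluate")
    return classify_rpc_requests(max(samples))
```

Judging the capture by its peak rather than its average is a deliberate choice here, since a short burst past 500 is what would line up with the disconnects users report.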

Look in the RPC Client Access logs on the CAS for backoffs from the MBX servers.


The error shown in the logs will be "0x6bb".
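To get a quick count of those backoffs, the log files can be scanned for the error code. The sketch below makes no assumption about the log layout beyond the code appearing somewhere in the line as text. The function name `count_backoffs` and the file path in the usage comment are placeholders.

```python
def count_backoffs(lines, marker="0x6bb"):
    """Count lines that mention the backoff error code (case-insensitive)."""
    needle = marker.lower()
    return sum(1 for line in lines if needle in line.lower())


# Usage sketch; replace the placeholder path with a real RPC Client
# Access log file on the CAS:
#
#     with open("path/to/rca.log", encoding="utf-8", errors="replace") as fh:
#         print(count_backoffs(fh))
```

A count that rises during the CPU spikes would suggest the Mailbox servers are pushing back on the CAS at those times.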

Christian Hans (Author) commented:
So... that was torture waiting for the PAL report to build. :-)

I changed "15 Field Engineering" to a value of 5 on the domain controller and saw only one 1644 event in the Directory Services log, right when I turned it on.

The PAL results show this output throughout the day. I don't know if you see anything that could help troubleshoot the issue:

2015-08-04-17-11-44.jpg
2015-08-04-17-06-25-01.jpg