High CPU on hub/mailbox server (Exchange 2010 SP3 RU10)

I have one out of 8 hub/mailbox servers with high CPU, split between the Store.exe and msexchangerepl.exe processes.

My topology:
2 AD sites, with 2 CAS servers and 4 hub/mailbox servers in each site. The DAG spans both sites, with each mailbox database having 2 copies in each site. (The FSW is in a third site.)

All servers are Windows Server 2008 R2 with the latest patches.

These are all virtual machines on VMware.

The server with the high CPU has no active databases on it, just passive copies (4 of them).

It is not showing any consistent errors or warnings, and it has been doing this for over a week now. It will spike to 100% for a while, then run between 50% and 80% for a while, then in the evenings drop to almost normal (5% to 15%), like the other hub/mailbox servers.

Restarting services or rebooting has little to no effect.

This server also houses the public folders for this site. I have dismounted them and it makes no difference. I also dismounted the non-DAG-replicated database (which has no mailboxes on it), with no change.

And I'm sure I'm forgetting something else I tried...

Thank you in advance for your help!
Asked by: cdshreve

Scott C (Senior Engineer) commented:
Run ExPerfWiz on your CAS and MBX servers.

https://experfwiz.codeplex.com/

Let it run for around 2 hours. Use an interval of 5 and the -threads switch.

Take this data and run it through PAL.

https://pal.codeplex.com/

Also look at the LDAP Read and Search times. These counters are under MSExchange ADAccess Domain Controllers. Times should average under 15, with spikes no higher than 50.

If they are higher than that, apply the hotfix at http://support.microsoft.com/kb/2862304
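
As a rough illustration of the LDAP thresholds above, here is a minimal Python sketch that checks a series of counter samples against them. The function name and the sample lists are hypothetical, and it assumes you have already exported the LDAP Read Time or LDAP Search Time samples from your performance log as plain numbers in the unit the counter reports. The limits of 15 and 50 are simply the figures quoted in this thread.

```python
from statistics import mean

# Thresholds as quoted in this thread (assumed to be in the counter's own unit).
AVG_LIMIT = 15    # samples should average under this value
SPIKE_LIMIT = 50  # no single sample should exceed this value


def check_ldap_times(samples):
    """Return (average, peak, ok) for a list of LDAP time samples.

    'ok' is True only when the average is under AVG_LIMIT and the
    highest sample does not exceed SPIKE_LIMIT.
    """
    if not samples:
        raise ValueError("no samples supplied")
    avg = mean(samples)
    peak = max(samples)
    ok = avg < AVG_LIMIT and peak <= SPIKE_LIMIT
    return avg, peak, ok


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    avg, peak, ok = check_ldap_times([8, 11, 9, 14, 62, 10])
    print(f"average={avg:.1f} peak={peak} within_limits={ok}")
```

A single spike above 50 is enough to flag the series here, even when the average stays under 15, which matches how the guidance above is worded.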


Also look at RPC Averaged Latency and RPC Requests. If requests are over 300, performance will be affected, and at 500 users will start losing their connections.
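
The RPC Requests guidance above can be sketched the same way. This is only an illustration: the function name and the labels "ok", "degraded" and "critical" are made up for the example, and the 300 and 500 cut-offs are the figures given in this thread, not values read from any tool.

```python
def classify_rpc_requests(value):
    """Classify an RPC Requests reading using the figures from this thread.

    Over 300: performance is expected to suffer ("degraded").
    At 500 or more: users are expected to start losing connections ("critical").
    """
    if value < 0:
        raise ValueError("RPC Requests cannot be negative")
    if value >= 500:
        return "critical"
    if value > 300:
        return "degraded"
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    for reading in (40, 300, 301, 500):
        print(reading, classify_rpc_requests(reading))
```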

You might also want to look at sizing.

It's possible the MBX server is being flooded with connection requests and is getting overloaded.

Amit (IT Architect) commented:
Are the VMware Tools up to date on the one server where you are seeing the issue? Also, how is the load on the ESX host itself? Does it have enough resources? Finally, I assume CPU and RAM are not allocated dynamically.

Scott C (Senior Engineer) commented:
Here is the VMware best practices guide for Exchange 2010:

http://www.vmware.com/files/pdf/exchange-2010-on-vmware-best-practices-guide.pdf

Hyper-threading needs to be turned off, and don't oversubscribe resources.

Adam Farage (Sr. Enterprise Architect) commented:
I would make sure the VMware Tools are 100% up to date, although I doubt this is the issue. Furthermore, you want to turn off HT and not oversubscribe. Hyper-threading is not a big issue for the store processes (e.g. the database, which is what is consuming your CPU cycles), but it will cause .NET garbage collection problems and thus, eventually, issues for the transport processes.

What you are describing sounds like a sizing issue, either with the number of available cores or with disk I/O. If you don't have enough IOPS for the database operations, you will see a spike in CPU usage (from what I have seen). I would recommend running some sizing tools and then placing those numbers into the Exchange Calculator for Exchange 2010. Confirm that the IOPS and CPU core calculations are correct for what you have deployed.

cdshreve (Author) commented:
Finally got to run this on my mailbox server. Running the data through PAL right now.

VMware tools are up to date and the other 7 mailbox servers are running fine in the environment.

Still figuring out how to read the LDAP read and search times and the RPC averaged latency.

cdshreve (Author) commented:
After looking through the data, it turned out to be another virtual machine on the same SAN running an engineering calculation (very long runs) and taking up all of the disk I/O! I moved those off of that SAN and, like magic, the server returned to normal. Thank you all for your patience and advice!