pmclean739


Exchange 2013 Keeps Crashing with Maxed Out CPU

We have two Exchange 2013 servers, fully patched. Both are Server 2012 R2 virtual machines running off of different VM hosts. Both hosts are ESXi 5.5. They both back up to the same BDR.

Over the last week each server has crashed about once a day. Each time it is very sudden. Memory and CPU utilization are normal, and then within a minute or two the CPU is pegged and the server is unresponsive. We generally have to reset the VM to get access to the server. Because it happens so suddenly, we don't have time to see which process is maxing out the CPU.
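One way to catch the culprit despite the sudden onset is to sample per-process CPU time every few seconds and write the top consumers to a local file, so the record survives the VM reset. Below is a minimal sketch of the calculation only, assuming you already have two snapshots of cumulative CPU seconds per process (for example gathered with psutil's `process_iter()` or exported from Perfmon). The snapshot format, process names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: {pid: (process_name, cumulative_cpu_seconds)}
Snapshot = Dict[int, Tuple[str, float]]


def top_cpu_consumers(before: Snapshot, after: Snapshot, interval_s: float,
                      logical_cpus: int, n: int = 5) -> List[Tuple[str, int, float]]:
    """Return up to n (name, pid, percent) tuples, busiest first.

    Percent is relative to the whole machine:
    delta_cpu_seconds / (interval_s * logical_cpus) * 100.
    """
    if interval_s <= 0 or logical_cpus <= 0:
        raise ValueError("interval_s and logical_cpus must be positive")
    capacity = interval_s * logical_cpus
    usage = []
    for pid, (name, cpu_after) in after.items():
        if pid not in before:
            continue  # started during the interval; no baseline to compare
        name_before, cpu_before = before[pid]
        if name_before != name:
            continue  # PID was reused by a different process
        delta = max(0.0, cpu_after - cpu_before)
        usage.append((name, pid, 100.0 * delta / capacity))
    usage.sort(key=lambda item: item[2], reverse=True)
    return usage[:n]


# Made-up sample: two snapshots 10 s apart on a 4-vCPU VM.
before = {4: ("System", 10.0), 5120: ("app1.exe", 100.0), 6000: ("app2.exe", 50.0)}
after = {4: ("System", 10.5), 5120: ("app1.exe", 130.0), 6000: ("app2.exe", 58.0)}
print(top_cpu_consumers(before, after, interval_s=10.0, logical_cpus=4, n=2))
# prints [('app1.exe', 5120, 75.0), ('app2.exe', 6000, 20.0)]
```

Scaling by the number of logical CPUs keeps the figures comparable to Task Manager's overall CPU percentage; run it from a scheduled task and append each result to disk so the last entries before a hang are still there after the reset.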

We have several other servers in the same environment, including an Exchange 2010 server, which all work fine.

The event logs don't give a clear indication of what is causing the sudden spikes in CPU use. These are the only two Exchange 2013 servers we have running at this colocation, and they are the only two having this problem. It may just be a coincidence, but it's all we really have to go on at this point.

Appreciate any help you can give.

Thanks!
Matt
Zacharia Kurian

Check the vCenter event logs for a clue. Make sure that you have installed the latest VMware Tools and that your VM's NIC is VMXNET3. Also make sure that your VM's hardware version is set to the latest.

Are you using any 3rd party backup solution for your exchange servers? If so check the time of crash against the backup schedule.
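Checking crash times against the backup schedule is easy to do by hand for a few events, but a small script helps once there are a week's worth. A minimal sketch, assuming you have the crash times (for example from the unexpected-shutdown entries in the System log) and the backup windows as start/end pairs; the 15-minute slack and all the times below are made up.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Window = Tuple[datetime, datetime]


def backup_window_for(crash: datetime, windows: List[Window],
                      slack: timedelta = timedelta(minutes=15)) -> Optional[Window]:
    """Return the first backup window containing the crash (+/- slack), else None."""
    for start, end in windows:
        if start - slack <= crash <= end + slack:
            return (start, end)
    return None


def correlate(crashes: List[datetime], windows: List[Window],
              slack: timedelta = timedelta(minutes=15)) -> List[Tuple[datetime, Optional[Window]]]:
    """Pair each crash time (in chronological order) with its backup window, if any."""
    return [(c, backup_window_for(c, windows, slack)) for c in sorted(crashes)]


# Made-up example: one nightly backup window and three crash times.
windows = [(datetime(2015, 3, 2, 1, 0), datetime(2015, 3, 2, 3, 0))]
crashes = [datetime(2015, 3, 2, 14, 0), datetime(2015, 3, 2, 2, 30), datetime(2015, 3, 2, 3, 10)]
for crash, window in correlate(crashes, windows):
    print(crash, "during backup" if window else "no backup running")
```

The slack allows for a crash that happens just after a job reports completion, such as during a merge or snapshot cleanup.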

Zac.
You can use the UserMonitor tool from Microsoft to check for any unusual or overly intense access.
Incorrectly configured mobile devices can cause excessive traffic.
Is the Exchange server behind a firewall? Check for any denial-of-service attacks on it.
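If a misbehaving mobile device is the suspect, the IIS logs on the Exchange server already record every ActiveSync request. A minimal sketch that counts ActiveSync requests per device in a W3C-format IIS log follows. It assumes the `#Fields:` directive includes `cs-uri-stem` and `cs-uri-query`, and that the query string carries a `DeviceId=` parameter, which is typical for Exchange ActiveSync but worth confirming against your own logs. The log lines are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs


def activesync_requests_by_device(lines: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Count ActiveSync requests per DeviceId in W3C-format IIS log lines."""
    fields: List[str] = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()  # column names for following lines
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blank lines, or data before any #Fields
        values = line.split(" ")
        if len(values) != len(fields):
            continue  # malformed or truncated line
        record = dict(zip(fields, values))
        if not record.get("cs-uri-stem", "").lower().startswith("/microsoft-server-activesync"):
            continue
        query = parse_qs(record.get("cs-uri-query", ""))
        counts[query.get("DeviceId", ["(unknown)"])[0]] += 1
    return counts.most_common(top)


sample_log = [
    "#Software: Microsoft Internet Information Services 8.5",
    "#Fields: date time cs-method cs-uri-stem cs-uri-query c-ip",
    "2015-03-02 10:00:00 POST /Microsoft-Server-ActiveSync/default.eas User=alice&DeviceId=AAA111&Cmd=Sync 10.0.0.5",
    "2015-03-02 10:00:01 POST /Microsoft-Server-ActiveSync/default.eas User=alice&DeviceId=AAA111&Cmd=Sync 10.0.0.5",
    "2015-03-02 10:00:02 POST /Microsoft-Server-ActiveSync/default.eas User=alice&DeviceId=AAA111&Cmd=Ping 10.0.0.5",
    "2015-03-02 10:00:03 POST /Microsoft-Server-ActiveSync/default.eas User=bob&DeviceId=BBB222&Cmd=Sync 10.0.0.6",
    "2015-03-02 10:00:04 GET /owa/ - 10.0.0.7",
]
print(activesync_requests_by_device(sample_log))
```

Re-reading `#Fields:` on every occurrence matters because IIS writes a new directive block whenever the log file is restarted or the field selection changes.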
pmclean739

ASKER

Thanks, Zac.

The vCenter logs unfortunately didn't have much to say. One of the servers was using a different NIC type, so I changed it to VMXNET3 when it went down this morning.

We are using a 3rd party backup solution that leverages ShadowProtect for backups. Something interrupted backups on the BDR and left all the servers doing a DiffMerge (essentially we had 8 backups running at once), so I disabled backups on most of the servers to make sure that wasn't the issue. Unfortunately it didn't seem to make any difference.
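To confirm how many backup jobs actually overlapped, a sweep over job start and end times gives the peak concurrency. A sketch, assuming you can export each job's start and end time from the BDR's job history; the job list below is made up.

```python
from datetime import datetime
from typing import List, Tuple


def max_concurrent_jobs(jobs: List[Tuple[datetime, datetime]]) -> int:
    """Return the largest number of jobs running at the same moment."""
    events = []
    for start, end in jobs:
        if end < start:
            raise ValueError("job ends before it starts")
        events.append((start, 1))   # job begins
        events.append((end, -1))    # job finishes
    # Sort by time; at equal times the -1 (end) sorts before the +1 (start),
    # so a job finishing exactly when another starts is not counted as overlap.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up job history for one night.
jobs = [
    (datetime(2015, 3, 2, 1, 0), datetime(2015, 3, 2, 3, 0)),
    (datetime(2015, 3, 2, 2, 0), datetime(2015, 3, 2, 4, 0)),
    (datetime(2015, 3, 2, 3, 0), datetime(2015, 3, 2, 5, 0)),
]
print(max_concurrent_jobs(jobs))
```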

I will check whether VMware Tools and the VM hardware version are up to date and see if anything needs to be done there.

Will keep you posted if I find anything new.

Thanks,
Matt
Hi Peter,

Thanks for your help. I have searched around for the UserMonitor tool, but everything I find says it is not compatible with Exchange 2013. Is there another tool I can use for this?

The Exchange server is behind a firewall. I can check the log if this happens again, but this only seems to affect the Exchange server. All other servers behind this firewall are not affected.

Thanks again for your help!

Matt
Sometimes these tools still work on newer versions of Exchange even though they don't specifically list that version.
Hi Peter,

I have tried to install the UserMonitor tool on this Exchange server, but it won't install. In the course of my research I found many people saying that the tool will not work on Exchange 2013.

Is there a tool you have used with Exchange 2013?

Thanks,
Matt
Sorry, I am still on Exchange 2010 at present...

I have not seen that many new tools for Exchange 2013 or later.
Other tools you could try are Process Monitor, Process Explorer, and Perfmon.
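Perfmon is most useful here as a data collector set that logs continuously, so the samples from just before a crash are still on disk after the reset. A binary `.blg` log can be converted with `relog <file>.blg -f csv -o <file>.csv`; the sketch below then lists the samples where total CPU crossed a threshold. It assumes the usual PDH-CSV layout (timestamp in the first column, counter paths such as `\\SERVER\Processor(_Total)\% Processor Time` as headers); the server name and sample data are made up.

```python
import csv
from typing import Iterable, List, Tuple

# Counter path suffix for total CPU; the \\MACHINE prefix varies per server.
CPU_SUFFIX = r"\Processor(_Total)\% Processor Time"


def cpu_spikes(lines: Iterable[str], threshold: float = 90.0) -> List[Tuple[str, float]]:
    """Return (timestamp, percent) for every sample at or above threshold."""
    reader = csv.reader(lines)
    header = next(reader)
    matches = [i for i, name in enumerate(header) if name.endswith(CPU_SUFFIX)]
    if not matches:
        raise ValueError("no Processor(_Total) % Processor Time column found")
    col = matches[0]
    spikes = []
    for row in reader:
        if len(row) <= col:
            continue
        raw = row[col].strip()
        if not raw:
            continue  # Perfmon often leaves the first sample blank
        value = float(raw)
        if value >= threshold:
            spikes.append((row[0], value))
    return spikes


sample = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\EX01\Processor(_Total)\% Processor Time"
"03/02/2015 10:00:00.000"," "
"03/02/2015 10:00:15.000","12.5"
"03/02/2015 10:00:30.000","97.3"
"03/02/2015 10:00:45.000","99.9"'''
print(cpu_spikes(sample.splitlines()))
```

Adding per-process counters (`\Process(*)\% Processor Time`) to the same collector set would let the same approach point at the process responsible.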

Also, raise the diagnostic logging levels for the Exchange server (remember to increase the Application event log size as well).
Finally, for IIS, you can use the Trace logging feature to capture more low-level diagnostic info.
ASKER CERTIFIED SOLUTION
pmclean739

Not confident this is the only cause of the issue, but after getting all the server backups on the network situated, this issue has not recurred.