
Why does security logging on the DC eat all the CPU?

Daniel asked:
Hi guys,

Our primary domain controller was having most, if not all, of its CPU power used by the svchost.exe process hosting Eventvwr, DHCP client and lmhosts.  The security log was almost 2GB and was logging at a rate of about 15 events per second, though I've seen up to 25 per second.  In the last 3 1/2 hours it has recorded close to 190,000 security events.
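
As a quick sanity check on those numbers, here is a minimal Python sketch. It uses only the figures quoted above and assumes the rate stayed steady over the whole 3 1/2 hours, which the "up to 25 per second" bursts suggest it did not.

```python
# Figures quoted in the question; nothing here is measured independently.
events = 190_000   # "close to 190,000 security events"
hours = 3.5        # "in the last 3 1/2 hours"

# Average rate over the window, and what that rate adds up to over a day.
rate_per_second = events / (hours * 3600)
events_per_day = rate_per_second * 86_400

print(f"{rate_per_second:.1f} events/s on average")
print(f"{events_per_day:,.0f} events/day if that rate holds")
```

The average works out to roughly 15 events per second, which matches the observed logging rate, so the 190,000 figure and the per-second figure describe the same steady load rather than a single burst.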

The categories are usually "Detailed File Share", "File Share" or "Filtering Platform Connection" (stating that the Windows Filtering Platform has permitted a connection), plus the obvious logon/logoff ones.

The Default Domain Policy has every Local Policies/Audit Policy setting configured as Success, Failure, which I figured would be contributing to the problem (the policy is listed at the bottom), but the other two DCs are governed by the same policy and have no problems.  Log retention is 30 days.  I'm wondering whether patches pushed out to the server have caused something to misbehave.  Rebooting has done nothing; the only thing that stops it is killing the svchost process, which is obviously no good.

The max log size was set to 1310720 KB, but I think that was a typo, as the other DCs had it set to 131072, so I've changed that and redirected the logs to a separate spare drive while I sort this out.
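
For reference, a small Python sketch converting the two max log size values into friendlier units. It assumes 1 MB = 1024 KB, and the variable names are just for illustration.

```python
def kb_to_mb(kilobytes):
    """Convert kilobytes to megabytes, assuming 1 MB = 1024 KB."""
    return kilobytes / 1024

old_kb = 1_310_720  # value found on the problem DC
new_kb = 131_072    # value used on the other two DCs

print(f"old: {kb_to_mb(old_kb):.0f} MB ({kb_to_mb(old_kb) / 1024:.2f} GB)")
print(f"new: {kb_to_mb(new_kb):.0f} MB")
print(f"ratio: {old_kb // new_kb}x")
```

The old value is exactly ten times the new one (1280 MB against 128 MB), which fits the theory of a stray trailing zero.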

Any ideas where I should start?

Thank you all in advance,

Dan

Policies:

Audit account logon events Success, Failure
Audit account management Success, Failure
Audit directory service access Success, Failure
Audit logon events Success, Failure
Audit object access Success, Failure
Audit policy change Success, Failure
Audit privilege use Success, Failure
Audit process tracking Success, Failure
Audit system events Success, Failure

CERTIFIED EXPERT

Commented:
As this is a shared svchost.exe, I would suggest, as a first step, splitting each of the services it hosts off into its own dedicated svchost.exe process.

Start by identifying the services: run tasklist /svc in a command prompt and note the services associated with the particular svchost.exe process, or use the Task Manager in Windows 2008+.

Once you have the services identified, run sc config ServiceName type= own for each of the service names.
E.g., sc config DNS type= own

Note that there is a space between the = and own.

Restart the services or the server. Now when the CPU begins to spike, you will know the exact process which is causing the issue.
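
If there are a lot of services in the shared process, the steps above can be sketched in Python as below. This is only a rough sketch: it assumes you have saved the output of tasklist /svc /fo csv as text, the sample rows and PIDs are made up for illustration, and the function name is my own. It only prints the sc config commands; it does not run them.

```python
import csv
import io

# Illustrative stand-in for `tasklist /svc /fo csv` output; PIDs are made up.
SAMPLE = '''"Image Name","PID","Services"
"svchost.exe","812","Dhcp,eventlog,lmhosts"
"svchost.exe","1044","Dnscache"
"lsass.exe","620","N/A"
'''

def sc_commands_for_pid(tasklist_csv, pid):
    """Build one `sc config <service> type= own` command per service hosted by pid."""
    commands = []
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        if int(row["PID"]) != pid:
            continue
        for service in row["Services"].split(","):
            service = service.strip()
            # tasklist reports "N/A" for processes that host no services.
            if service and service != "N/A":
                commands.append(f"sc config {service} type= own")
    return commands

for command in sc_commands_for_pid(SAMPLE, 812):
    print(command)
```

Review the printed commands before running them by hand. To put a service back into the shared process afterwards, the same command with type= share should reverse the change.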

I've recently used this method for the same high CPU issue on a SharePoint server and identified the offending process. A quick search revealed a hotfix for that process that resolved this issue for us.
CERTIFIED EXPERT

Commented:
Besides what I have given above, you can also follow the troubleshooting guide for high CPU usage on DCs at http://technet.microsoft.com/en-us/library/bb727054.aspx

Author

Commented:
Hi,

Thanks for the reply.  I isolated the service as you suggested and it is indeed the Windows Event Log.  I'm not surprised, given how many events per second it is pumping out.  What I need to find out is why this DC is recording so many events that the CPU can't keep up while the other two are fine, and then what I can do to get it back to normal.

Thanks again
CERTIFIED EXPERT
Commented:
As you now have the Windows Event Log running in its own svchost instance, I would suggest doing the following:
* Download Procmon from http://technet.microsoft.com/en-au/sysinternals/bb896645.aspx to your server
* Kill the svchost.exe process that is running Windows Event Log
* Run procmon.exe. It starts up with full monitoring enabled, so I suggest pressing Ctrl-E as soon as the program opens after you accept the EULA. This will stop all monitoring.
* Start the Windows Event Log service and note its PID in Task Manager
* In Procmon, open the Filter menu and select Filter. In the Filter window, select "PID" "is", enter the PID of the relevant svchost.exe and click Add
* Start monitoring by going to File -> Capture Events (or Ctrl-E)

You will now see everything that is hitting that particular PID. You do not need to run this for very long and can stop it with another Ctrl-E.

The output you get will need to be investigated to work out what is connecting to and hammering away at that service.

For example, something may be doing a continual scan and triggering your audit policies as it authenticates and accesses objects that are being audited.
You may also find that the registry is being hammered. One reported cause of this type of issue was thousands of entries under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\eventlog\Application\ registry key (ref: http://social.technet.microsoft.com/Forums/windowsserver/en-US/5b4e4435-199c-436f-9ca9-85ff69fa7a3a/evenlog-viewing-causes-cpu-usage-100-on-windows-server-2008-sp2?forum=winservergen)

You are auditing a lot of stuff though, so if nothing has changed in your environment, it may simply be that a threshold has been reached with regard to how much the server can process.
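
If the Procmon trace is too big to eyeball, one option is to save it as CSV (File -> Save) and count operations per second. The Python below is a rough sketch only: it assumes the default Procmon column names ("Time of Day", "Operation", "Path"), the sample rows, PID and D:\Logs path are made up for illustration, and the function name is my own.

```python
import csv
import io
import re
from collections import Counter

# Made-up rows in the shape of a Procmon CSV export with default columns.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"2:45:01.0000123 PM","svchost.exe","812","ReadFile","D:\Logs\Security.evtx","SUCCESS","Offset: 0"
"2:45:01.0031200 PM","svchost.exe","812","ReadFile","D:\Logs\Security.evtx","SUCCESS","Offset: 4096"
"2:45:01.0090456 PM","svchost.exe","812","RegOpenKey","HKLM\SYSTEM\CurrentControlSet\services\eventlog","SUCCESS",""
"2:45:01.0200789 PM","svchost.exe","812","ReadFile","D:\Logs\Security.evtx","SUCCESS","Offset: 8192"
"2:45:02.0000001 PM","svchost.exe","812","ReadFile","D:\Logs\Security.evtx","SUCCESS","Offset: 0"
'''

def busiest_seconds(procmon_csv, top=3):
    """Count rows per (second, operation, path) and return the most frequent ones."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(procmon_csv)):
        # Drop the fractional part so all rows within one second share a key.
        second = re.sub(r"\.\d+", "", row["Time of Day"])
        counts[(second, row["Operation"], row["Path"])] += 1
    return counts.most_common(top)

for (second, operation, path), hits in busiest_seconds(SAMPLE):
    print(f"{second}  {hits:>4}x  {operation}  {path}")
```

For a real export you would read the saved file instead of SAMPLE; it may start with a byte-order mark, in which case opening it with encoding="utf-8-sig" should keep the first column name intact.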

Author

Commented:
I ran Procmon and isolated the PID to see what was happening.  It appears to be reading the security.evtx file sometimes hundreds of times per second (in one particular second it did so 350 times).  That explains what's hogging my CPU (see attached).  I'm now told the network guy is looking at replacing this with a 2012 box, so I think we're just going to push forward with that rather than spend more time fixing this one.

Thank you for your help, ckumar42.

Author

Commented:
Oops, forgot to attach
Capture.JPG
harry-you-da-man, Co-Owner (with son Andrew M. Stein)

Commented:
I'd like to ask you some very important questions, as I am experiencing the same issue.  (1) Are you running in a virtual machine?  (2) Are you using McAfee endpoint protection?

Author

Commented:
Hey,

It's been a while, so I don't remember many specifics.  But the box was a physical 2008 R2 server and was not running McAfee - we were using Sophos.
harry-you-da-man, Co-Owner (with son Andrew M. Stein)

Commented:
Ok.  That's helpful.  Do you recall solving this or did you just move to the 2012 server?

Author

Commented:
We replaced the server in the end, so sorry, I won't be much help, which is a shame because I remember it being a painful issue to deal with.
harry-you-da-man, Co-Owner (with son Andrew M. Stein)

Commented:
Okay, thanks.
Gary White, Network Applications Manager

Commented:
@Harry-you-da-man, did you find a resolution to this issue in your environment? We are experiencing the same issue and I have yet to find a fix.
This resolved our CPU spiking issue, but it isn't best practice.

Default Domain Policy or Default Domain Controllers Policy
Computer Configuration
Policies
Security Settings
Local Policies/Audit Policy
Policy Setting
Audit account logon events Failure
Audit logon events Failure