Reinhard Rensburg (South Africa)

asked on

AD DS service stops occasionally on Windows Server 2008 DCs - please help

We have 8 domain controllers country-wide, situated at 7 geographical locations, and all of them have the Global Catalog role enabled. Five of them run Windows Server 2008 R2 64-bit Standard and three run Windows Server 2008 (R1) 32-bit Standard.

If they are "left alone" for long periods (roughly a month), the AD DS service seems to stop. One then cannot log in to the server at all (via Remote Desktop or at the console); it reports an incorrect username/password, possibly because AD DS is not running and cannot authenticate the logon.

I have also tried to log in with the recovery account (username .\Administrator with our recovery password), and this is also denied. I tried it via RDP; I am not sure whether the recovery account can be used over RDP or only at the server console.

Bottom line: both versions of our domain controllers (2008 R2 and 2008 R1) do this from time to time with no specific pattern. The only common denominator I can find is that it seems to happen to DCs that we "leave alone" for long periods without rebooting them.

When this happens it affects users logging on to AD at that division, as well as DHCP (which runs on our DCs), so we then have to force a reboot by pressing the power button on the DC.

Any idea why this could happen, where one could look to confirm that the AD DS service is indeed the culprit, and, if it is a common problem, how one resolves it?
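
One way to confirm whether AD DS has actually stopped (rather than guessing from a failed logon) is to check the state of the AD DS service, whose service name is NTDS on Windows Server 2008 and later, with `sc query NTDS`, either locally or as `sc \\ServerName query NTDS` from another machine. The Python below is only a minimal sketch of that check: it assumes you can run `sc` from a Windows host that can still reach the DC, and the helper names are hypothetical. If the DC is completely hung, the remote query may simply time out, which is itself a useful data point.

```python
import subprocess
import sys
from typing import Optional

# "NTDS" is the service name of Active Directory Domain Services (2008 and later).
AD_DS_SERVICE = "NTDS"


def parse_sc_state(sc_output: str) -> str:
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    for line in sc_output.splitlines():
        if line.strip().startswith("STATE"):
            # Typical line: "        STATE              : 4  RUNNING"
            return line.split(":", 1)[1].split()[-1]
    return "UNKNOWN"


def query_service_state(name: str = AD_DS_SERVICE,
                        server: Optional[str] = None) -> str:
    """Run `sc [\\\\server] query <name>` (Windows only) and parse the state."""
    cmd = ["sc"]
    if server:
        cmd.append("\\\\" + server)  # sc expects the form \\ServerName
    cmd += ["query", name]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_sc_state(result.stdout)


if __name__ == "__main__" and sys.platform == "win32":
    # Local check; pass server="SERVER_NAME" (hypothetical) for a remote DC.
    print(AD_DS_SERVICE, query_service_state())
```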

wantabe2

First, I would make sure all domain controllers are up to date on service packs as well as any security or OS patches. Then I would reboot all of them and see whether the problem is still there.
Reinhard Rensburg (asker)

Hi wantabe2,

Thank you for your suggestions.

We use WSUS; the servers are updated regularly and are all on the latest service pack for their respective operating system.

We do reboot them from time to time, though not all at once. The ones giving problems had been rebooted roughly a month or more earlier; all goes well for a while, and then the same problem recurs.

Thanks,
Reinhard
Darius Ghassem
I have never had this issue, but it is a good idea to reboot the servers on a schedule to clear any memory leaks that are present or that may crop up.
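
To make a reboot schedule like this concrete, here is a small hedged sketch that flags DCs whose uptime has passed a threshold. It assumes you already have each DC's last boot time from somewhere (how you collect it is left open); the DC names, dates and the 25-day threshold are hypothetical, the threshold being picked only because the hangs in this thread appear after roughly a month.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed threshold: reboot before the ~1 month of uptime after which hangs were seen.
MAX_UPTIME = timedelta(days=25)


def overdue_for_reboot(boot_times: Dict[str, datetime], now: datetime,
                       max_uptime: timedelta = MAX_UPTIME) -> List[str]:
    """Return names of DCs whose uptime exceeds max_uptime, longest uptime first."""
    uptimes = [(now - booted, name) for name, booted in boot_times.items()]
    overdue = [item for item in uptimes if item[0] > max_uptime]
    overdue.sort(reverse=True)  # largest uptime first
    return [name for _, name in overdue]


if __name__ == "__main__":
    # Hypothetical DC names and boot times.
    boots = {
        "DC-A": datetime(2011, 1, 1),
        "DC-B": datetime(2011, 1, 20),
        "DC-C": datetime(2010, 12, 1),
    }
    print(overdue_for_reboot(boots, datetime(2011, 2, 1)))  # ['DC-C', 'DC-A']
```

A scheduled reboot only masks a leak, so this is a stopgap while the cause is being tracked down.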

Run dcdiag on the network and let's see whether anything shows up in the report.
Hi dariusg,

Thank you for the post.

Should DCDIAG be run with any switches to gather specific information, or just as is so that it runs all its tests?

Thanks,
Reinhard
No switches for now; I just want a basic DC report to look over first.
Hi dariusg,

Apologies for the long delay, but here finally are the results of the DCDIAG I ran on one of our main domain controllers (it holds three of the FSMO roles):

** I have replaced only the server name and our domain name in the DCDIAG output **



Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests
   
   Testing server: DMZ\SERVER_NAME
      Starting test: Connectivity
         ......................... SERVER_NAME passed test Connectivity

Doing primary tests
   
   Testing server: DMZ\SERVER_NAME
      Starting test: Replications
         ......................... SERVER_NAME passed test Replications
      Starting test: NCSecDesc
         ......................... SERVER_NAME passed test NCSecDesc
      Starting test: NetLogons
         ......................... SERVER_NAME passed test NetLogons
      Starting test: Advertising
         ......................... SERVER_NAME passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... SERVER_NAME passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... SERVER_NAME passed test RidManager
      Starting test: MachineAccount
         ......................... SERVER_NAME passed test MachineAccount
      Starting test: Services
         ......................... SERVER_NAME passed test Services
      Starting test: ObjectsReplicated
         ......................... SERVER_NAME passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... SERVER_NAME passed test frssysvol
      Starting test: frsevent
         ......................... SERVER_NAME passed test frsevent
      Starting test: kccevent
         ......................... SERVER_NAME passed test kccevent
      Starting test: systemlog
         ......................... SERVER_NAME passed test systemlog
      Starting test: VerifyReferences
         ......................... SERVER_NAME passed test VerifyReferences
   
   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
   
   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : dcd
      Starting test: CrossRefValidation
         ......................... dcd passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... dcd passed test CheckSDRefDom
   
   Running enterprise tests on : our.local.domain
      Starting test: Intersite
         ......................... our.local.domain passed test Intersite
      Starting test: FsmoCheck
         ......................... our.local.domain passed test FsmoCheck
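
With several DCs to check, it can help to scan dcdiag output for anything other than a pass instead of reading every line. The sketch below relies only on the result-line shape visible in the report above ("......................... TARGET passed test NAME"), and assumes failures use the same shape with "failed"; the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches result lines such as
# "......................... SERVER_NAME passed test Connectivity"
RESULT_LINE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def dcdiag_results(output: str) -> List[Tuple[str, str, str]]:
    """Return (target, outcome, test) tuples for every result line."""
    return [m.groups() for m in RESULT_LINE.finditer(output)]


def failed_tests(output: str) -> List[Tuple[str, str]]:
    """Return (target, test) pairs for tests that did not pass."""
    return [(target, test) for target, outcome, test in dcdiag_results(output)
            if outcome == "failed"]
```

For example, saving each DC's report with `dcdiag > dcdiag.txt` and running `failed_tests(open("dcdiag.txt").read())` would, for the report above, return an empty list.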


Just as a note: two of our domain controllers "hung" again in late December / early January because they had not been rebooted for about a month. So again, this seems to happen when the domain controllers are left untouched. At that point we cannot Remote Desktop to the server because it cannot authenticate us (we suspect because AD DS has hung or stopped), and the power button has to be pressed to restart it, after which everything is fine again for a month or so.

Thanks,
Reinhard
The report looks good.

Are you getting any errors in the Event logs?

Can you post ipconfig /all?
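
One thing `ipconfig /all` output is commonly checked for on a DC is which DNS servers each adapter points at. This hedged sketch pulls those addresses out of captured output; it assumes the usual English layout, where the first address follows "DNS Servers . . . :" and further addresses sit alone on indented continuation lines. The function name is hypothetical.

```python
from typing import List


def dns_servers(ipconfig_output: str) -> List[str]:
    """Collect DNS server addresses listed in `ipconfig /all` output."""
    servers: List[str] = []
    in_block = False
    for line in ipconfig_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("DNS Servers"):
            # Label line: the first address follows the " : " separator.
            _, sep, value = stripped.partition(" : ")
            if sep and value.strip():
                servers.append(value.strip())
            in_block = True
        elif in_block and stripped and " " not in stripped:
            # Indented continuation line holding one more address.
            servers.append(stripped)
        else:
            in_block = False
    return servers
```

The separator is split on " : " with spaces, so IPv6 addresses (which contain bare colons) are kept intact.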
Hi dariusg,

I can check the event logs; I am just not sure which log to look at and what to look for, as there are quite a lot of events logged.

Do you want the ipconfig /all output from one of the domain controllers giving this problem?

Thanks,
Reinhard
Yes, from any one that is having the issue.

Really, any error that jumps out at you in the event logs, especially around the time things hang.
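
Narrowing the logs to a time window before the hang is an easy way to cut down the noise. The sketch below assumes the events have already been exported and loaded into a simple record type (the `Event` type, field names and two-hour window are hypothetical, and loading is left open); it just keeps Error and Warning entries in the window leading up to the hang, oldest first.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Event(NamedTuple):
    time: datetime
    level: str      # e.g. "Error", "Warning", "Information"
    source: str
    event_id: int


def errors_before(events: List[Event], hang_time: datetime,
                  window: timedelta = timedelta(hours=2)) -> List[Event]:
    """Return Error/Warning events in the window ending at hang_time, oldest first."""
    start = hang_time - window
    hits = [e for e in events
            if start <= e.time <= hang_time and e.level in ("Error", "Warning")]
    return sorted(hits, key=lambda e: e.time)
```

The earliest entries in the result are usually the most interesting, since later errors are often consequences of the first failure.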
Hi there DariusG,

This morning we again had a problem with a domain controller that "stopped responding". We reset it as usual to get it working, but afterwards I went through the event logs as you suggested and found that the server runs out of memory and cannot allocate memory to AD DS for replication and normal operation, which causes all the issues.

I now have the cause of the problem (a memory leak), but I am not sure how to find out what is eating up the memory on the server. It must be AD-related, as only our domain controllers do this, and we have had various DCs doing it all over the country, even on different platforms (some 2008 R1 32-bit, others 2008 R2 64-bit).
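
A simple way to look for the process that is growing is to capture `tasklist /fo csv /nh` output on the DC twice, some days apart, and compare the memory column per process. The sketch below is a hedged illustration of that comparison, not a diagnosis: it only sees per-process memory as tasklist reports it, so a leak outside any single process would not show up, and the function names are hypothetical. Processes are matched on image name plus PID so that a restarted service is not mistaken for growth. On a DC, growth in lsass.exe (which hosts AD DS) would be worth a closer look.

```python
import csv
import io
import re
from typing import Dict, List, Tuple


def parse_tasklist_csv(text: str) -> Dict[Tuple[str, str], int]:
    """Map (image name, PID) -> memory in KB from `tasklist /fo csv /nh` output."""
    usage: Dict[Tuple[str, str], int] = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        image, pid, mem = row[0], row[1], row[4]
        # Keep only digits so "12,345 K" and locale variants such as "12 345 K" parse.
        usage[(image, pid)] = int(re.sub(r"\D", "", mem) or 0)
    return usage


def top_growth(before: str, after: str,
               limit: int = 5) -> List[Tuple[str, str, int]]:
    """Return (image, PID, growth in KB) for processes present in both snapshots."""
    old, new = parse_tasklist_csv(before), parse_tasklist_csv(after)
    growth = [(image, pid, new[(image, pid)] - kb)
              for (image, pid), kb in old.items() if (image, pid) in new]
    growth.sort(key=lambda item: item[2], reverse=True)  # biggest growth first
    return growth[:limit]
```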

Attached is a PDF with three screenshots of the events I found. Any help with finding the application that eats up the memory and causes AD DS to stop working would be appreciated.

Thanks,
Reinhard
VENCO-AD-Server-hung-when-runnin.pdf
ASKER CERTIFIED SOLUTION
Darius Ghassem

Hi DariusG,

Thanks for the pointer. I read through it and also found the page that describes how to change MaxPoolThreads per processor. I will make this change, reboot the DCs, and report back on the outcome.

Regards,
Reinhard