Server (Domain Controller) Unresponsive - Have to HARD REBOOT Server to Fix Lockups

Posted on 2012-03-09
Last Modified: 2012-08-30

Problem: COMPLETE NETWORK LOCKUP
    All workstations freeze.
    Nobody can authenticate to any network device until the server is reset.
    The server (DC) has to be hard powered down to restart.
    A hard reset of the server fixes the problem.

Notes:

1. The domain controller is Windows Server 2003 R2 x64 SP2 with all critical and security patches applied.

2. No events are written to the log files when lockups occur.

3. The server's IP (and other devices) can be pinged during a lockup.

4. The login screen can be reached on all other servers, but once a user name and password are entered, the system "locks" up (authentication fails).

5. The disk arrays are healthy.

6. Network traffic is normal during a lockup.

HELP!
Question by:sfjcpu

Expert Comment

by:Mike Kline
ID: 37703021
Is this your only DC?

Thanks

Mike

Author Comment

by:sfjcpu
ID: 37703050
No.  There are 2 others.

Expert Comment

by:Mike Kline
ID: 37703122
Odd. Do the clients have the IPs of the other DCs in their DNS settings? Does the box that freezes hold all the FSMO roles?

Are the other DCs also global catalog servers?

Author Comment

by:sfjcpu
ID: 37703399
Yes, the clients use one of the other DCs as their secondary DNS server. The server holds all FSMO roles except the infrastructure master. The server that holds the infrastructure role is not a global catalog server.

Accepted Solution

by:StuWhitby
ID: 37703555
Something on the DC is still responding to a heartbeat, so workstations keep trying to use that one. When you say you have to hard reboot, is that because a soft shutdown hangs, or because you can't even start one because the entire UI is hung? I ask because running Process Monitor and leaving it on the DC may show you what's going on at the time of the hang.

How many CPU cores does the DC have? If there's only one, it may simply be that one thread has gone crazy, leaving the system unresponsive. If so, run Process Explorer and raise the priority of the window manager and of Process Explorer itself to Realtime. They will then take precedence over a single runaway thread, at the cost of a slight performance impact on the system. Once the system hangs, you should be able to see which process is eating CPU, examine the threads in that process and see what they're doing.

Both tools are available from http://technet.microsoft.com/sysinternals. Set up Process Explorer with the symbol path "srv*c:\symbols*http://msdl.microsoft.com/download/symbols" under Options/Configure Symbols.

Assisted Solution

by:Leon Fester
ID: 37706830
If you're not seeing anything written to the Windows logs, that generally points to a hardware issue.
That said, I've never come across a scenario where one DC out of three fails and the other two can't still authenticate users.

Have you run an AD health check? Are all the DCs healthy?
Run DCDIAG to test.

To me, "lockup" means that the keyboard/mouse/screen is non-responsive.
Is that what you mean when you say "lockup", or what exactly does it mean to you?
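A minimal sketch of that DCDIAG check, assuming Python on a domain-joined admin workstation with dcdiag.exe (shipped with the Windows Server 2003 Support Tools) on PATH. The DC names are hypothetical placeholders. It runs dcdiag once per DC, using /s: to target the server and /q so that only errors are printed, and flags any DC that produced output.

```python
"""Run dcdiag in quiet mode against each DC and flag the ones that print errors."""
import subprocess
from typing import List

# Hypothetical DC names; replace them with the real ones.
DOMAIN_CONTROLLERS = ["DC1", "DC2", "DC3"]


def dcdiag_command(server: str) -> List[str]:
    # /s: targets a specific DC; /q prints only error messages.
    return ["dcdiag", "/s:" + server, "/q"]


def has_errors(output: str) -> bool:
    # With /q a healthy DC should print little or nothing, so any
    # non-blank line is treated as something worth reading.
    return any(line.strip() for line in output.splitlines())


def main() -> None:
    for dc in DOMAIN_CONTROLLERS:
        result = subprocess.run(dcdiag_command(dc), capture_output=True, text=True)
        output = result.stdout + result.stderr
        if has_errors(output):
            print(dc + ": CHECK OUTPUT")
            print(output)
        else:
            print(dc + ": no errors reported")


if __name__ == "__main__":
    main()
```

Running it from a machine other than the suspect DC means it still works while that DC is hung.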

Expert Comment

by:StuWhitby
ID: 37707886
I disagree that "nothing written to the Windows logs [generally] equals a hardware issue". More often the system is so busy doing something that requires no logging that nothing gets written. It's rare for anything to be logged during a hang, since all the system is actually doing is waiting for something. It is entirely possible, though, that this is an intermittent disk issue, where networking is still working fine but the disk can't be accessed to write event logs or to read the data needed to authenticate users.

The other DCs don't authenticate users because the primary is still responding on the network, so as far as the clients are concerned this system is still working.

To check the disk, try setting up FTP on the system disk and see if you can access data that way. rpcinfo -a should also give you further information about which services this system sees as available behind the networking (not exhaustive, but it proves that the OS itself is answering basic commands).
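Along the same lines as proving that the OS still answers basic requests, a sketch like the following could run on a workstation and record, once a minute, whether the DC still accepts TCP connections on the ports clients use for DNS (53), Kerberos (88), LDAP (389) and SMB (445). The host name and the 60-second interval are hypothetical. A completed TCP connection only shows that the network stack is answering, not that the service behind it is healthy, but timestamps of any failures would help pin down when a hang started.

```python
"""Poll a DC's authentication-related TCP ports and log when they stop answering."""
import socket
import time
from datetime import datetime

DC_HOST = "dc1.example.local"  # hypothetical DC name
PORTS = {53: "DNS", 88: "Kerberos", 389: "LDAP", 445: "SMB"}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    # create_connection raises OSError (refused, unreachable, timed out,
    # name not resolved) when the connection cannot be made.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host: str, ports: dict) -> dict:
    # Map each service name to whether its port accepted a connection.
    return {name: port_open(host, port) for port, name in ports.items()}


def main(interval: float = 60.0) -> None:
    while True:
        results = probe(DC_HOST, PORTS)
        stamp = datetime.now().isoformat(timespec="seconds")
        down = [name for name, ok in results.items() if not ok]
        if down:
            print(f"{stamp} NOT answering: {', '.join(down)}", flush=True)
        else:
            print(f"{stamp} all ports answering", flush=True)
        time.sleep(interval)


if __name__ == "__main__":
    main()
```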

Author Comment

by:sfjcpu
ID: 37710112
The DCs are healthy. There is only one DNS entry on the workstations. We will add another DNS entry; of course that will not fix the problem, but it needs to be done. Thanks for that advice. I downloaded Sysinternals (nice set of tools). Process Monitor and Process Explorer are running on the DC this morning. I'm logged on to the server and am keeping the session open so I can see the screen. The client will contact me immediately if the DC becomes unresponsive. I'll post the results.

Author Comment

by:sfjcpu
ID: 37782094
Still having lockups. We suspect a profile issue coming from a Windows 2000 Terminal Server login: the DC usually hangs at about the same time some users are logging in to the old terminal server. The Process Monitor log files grow too quickly and too large to be practical for this diagnosis. Any ideas on whether a profile could hang the DC?

Expert Comment

by:StuWhitby
ID: 37782307
If the Process Monitor log is too large to be usable and you notice the issue as soon as it occurs, set up a scheduled job (with "interact with desktop" enabled) that stops and restarts Process Monitor every hour. When it hangs, get to the DC, wait until it's responsive again, stop the capture, save the log (PML format) and disable the scheduled job. Then go back through the log to a point at which the system was unresponsive and try to figure out what led up to the hang.

Author Comment

by:sfjcpu
ID: 37789177
Stu, thanks for the post. The hangs are so random, with days in between, that it would be difficult to do that. When it hangs, my client restarts the server right away, since they have all their profiles and shares set up on this server.

What if I wrote a script that captures to a log file for 10 minutes, then stops ProcMon, saves the log under another file name and starts it again? It could keep 10 of these files and cycle through the names, such as ProcMonLog1, ProcMonLog2, ProcMonLog3 ... ProcMonLog10.

Would this work, and if so, do I need to post a separate question on how to write the script?

Thanks!
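A minimal sketch of that rotating capture, assuming it runs elevated on the DC with Procmon.exe on PATH. It relies on Process Monitor's /AcceptEula, /Quiet, /Minimized, /BackingFile and /Terminate command-line switches; the folder name is hypothetical, while the 10-minute slice and the ten-file limit come from the post. Because each slice is written straight to its own backing file, there is no separate save step. Current Python releases do not run on Windows Server 2003, so the code avoids f-strings and subprocess.run in the hope that an older Python 3 interpreter can run it.

```python
"""Rotate Process Monitor captures through a fixed set of backing files."""
import os
import subprocess
import time

LOG_DIR = r"C:\ProcMonLogs"   # hypothetical folder; pick a disk with free space
SLICE_SECONDS = 10 * 60       # capture length per file (10 minutes)
KEEP = 10                     # ProcMonLog1.pml .. ProcMonLog10.pml


def log_path(index, log_dir=LOG_DIR):
    # index counts up forever; wrap it onto 1..KEEP so the oldest name is reused.
    return os.path.join(log_dir, "ProcMonLog%d.pml" % (index % KEEP + 1))


def start_command(path):
    # /BackingFile writes the capture straight to the .pml file,
    # /Quiet skips the filter confirmation, /Minimized starts minimized.
    return ["Procmon.exe", "/AcceptEula", "/Quiet", "/Minimized", "/BackingFile", path]


# Stops the running Process Monitor instance so its backing file is closed.
STOP_COMMAND = ["Procmon.exe", "/Terminate"]


def main():
    if not os.path.isdir(LOG_DIR):
        os.makedirs(LOG_DIR)
    index = 0
    while True:
        path = log_path(index)
        if os.path.exists(path):
            os.remove(path)                    # discard the oldest slice
        subprocess.Popen(start_command(path))  # Procmon keeps running; don't wait
        time.sleep(SLICE_SECONDS)
        subprocess.call(STOP_COMMAND)          # wait for /Terminate to return
        time.sleep(5)                          # assumed grace period to release the file
        index += 1


if __name__ == "__main__":
    main()
```

Starting it from a scheduled task at system startup would let it resume after a hard reset. The slice being written when the server is powered off may be truncated or unreadable, but the earlier slices should still cover the minutes leading up to the hang.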