Solved

Windows 2008 Primary domain controller behaving badly

Posted on 2010-09-16
Last Modified: 2012-05-10
My primary domain controller (a Windows Server 2008 machine) is exhibiting some EXTREMELY strange behavior.  The server is only about a year old and does almost nothing other than its DC roles.  It is also a WSUS server and a CA server, and it serves as the main CA for most of the self-signed certificates in the company.  We have one other 2008 DC, which also runs several application-server functions.  We've had a couple of weird AD and Group Policy issues that were usually solved with a reboot.  We also had some WSUS problems, but now the server has pretty much completely stopped working.  It's hard for me to define what the problem is, because there are so many.  Here is a list of some of the things that are happening.  For reference, my DC's name is "WIGGUM":

Symptoms:
Every time the server reboots, it hangs on the Advanced Boot Options/Safe Mode screen.  You have to physically go to the server and choose "Start Windows Normally"
All desktop icons, toolbars, etc. have disappeared, with a message about not being able to connect to the User Profile Services Service
I cannot browse to the server's files.  I get the following error: "Windows can't find '\\Wiggum'. Check the spelling and try again."
Can't connect to AD Users and Computers or Group Policy via the RSAT.  Error: "The following domain controller could not be contacted: wiggum.domain.com.  The RPC server is unavailable"
I use this DC to authenticate my Cisco VPN (similar to RADIUS), but it no longer authenticates and returns an error on the VPN client
WSUS not working: WSUSService is stopped and will not start.  There is a warning event 7042 that says "The WSUS administration console was unable to connect to the WSUS Server Database.  Verify that SQL server is running on the WSUS Server.  If the problem persists, try restarting SQL.  System.Data.SqlClient.SqlException -- Cannot open database "SUSDB" requested by the login.  The login failed.  Login failed for user 'NT AUTHORITY\NETWORK SERVICE'."
The Network Policy and Access Services Role is not working:  The "RemoteAccess" (Routing and Remote Access) service is stopped, and will not start.
The Event log is also flooded with errors 25 and 20103
Error 20103 - RemoteAccess: "Unable to load C:\Windows\System32\iprtrmgr.dll."
Error 25 - NPS: "The address of remote RADIUS server 10.1.2.6 in remote RADIUS server group Wiggum Radius resolves to local address 10.1.2.6. The address will be ignored."
Lots of Event 18456 occurrences: "The description for Event ID 18456 from source MSSQL$MICROSOFT##SSEE cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

      If the event originated on another computer, the display information had to be saved with the event.

      The following information was included with the event:

      NT AUTHORITY\NETWORK SERVICE
       [CLIENT: <named pipe>]

      The specified resource type cannot be found in the image file"
After booting, Windows Error Reporting usually comes up with a window that shows this error message:

Host Process for Windows Services stopped working and was closed. A problem caused the application to stop working correctly. Windows will notify you if a solution is available.
When I run many of the "netdom query" commands, for example with the WORKSTATION, SERVER, DC, OU, TRUST, etc. options, I get one of the following two errors:
The specified network name is no longer available.
The requested API is not supported on the remote server.

Testing I have tried:
When I run "netdom query fsmo" on Wiggum, it shows the following roles:
Schema master            wiggum.domain.com
Domain naming master     wiggum.domain.com
PDC                      wiggum.domain.com
RID pool manager         wiggum.domain.com
Infrastructure master    wiggum.domain.com
When I run "dcdiag", it passes most tests, but it was failing the system event log test and showing TONS of events.  I went to copy them down, but they no longer show up when I run the test.  Here is what the test shows now:
Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = wiggum
   * Identified AD Forest.
   Done gathering initial info.

Doing initial required tests

   Testing server: Axiom\WIGGUM
      Starting test: Connectivity
         ......................... WIGGUM passed test Connectivity

Doing primary tests

   Testing server: Axiom\WIGGUM
      Starting test: Advertising
         ......................... WIGGUM passed test Advertising
      Starting test: FrsEvent
         ......................... WIGGUM passed test FrsEvent
      Starting test: DFSREvent
         ......................... WIGGUM passed test DFSREvent
      Starting test: SysVolCheck
         [WIGGUM] An net use or LsaPolicy operation failed with error 64,
         The specified network name is no longer available..
         ......................... WIGGUM failed test SysVolCheck
      Starting test: KccEvent
         ......................... WIGGUM passed test KccEvent
      Starting test: KnowsOfRoleHolders
         ......................... WIGGUM passed test KnowsOfRoleHolders
      Starting test: MachineAccount
         Could not open pipe with [WIGGUM]:failed with 64:
         The specified network name is no longer available.
         Could not get NetBIOSDomainName
         Failed can not test for HOST SPN
         Failed can not test for HOST SPN
         ......................... WIGGUM passed test MachineAccount
      Starting test: NCSecDesc
         ......................... WIGGUM passed test NCSecDesc
      Starting test: NetLogons
         [WIGGUM] An net use or LsaPolicy operation failed with error 64,
         The specified network name is no longer available..
         ......................... WIGGUM failed test NetLogons
      Starting test: ObjectsReplicated
         ......................... WIGGUM passed test ObjectsReplicated
      Starting test: Replications
         ......................... WIGGUM passed test Replications
      Starting test: RidManager
         ......................... WIGGUM passed test RidManager
      Starting test: Services
         Could not open Remote ipc to [wiggum.axmfin.com]: error 0x40
         "The specified network name is no longer available."
         ......................... WIGGUM failed test Services
      Starting test: SystemLog
         An Warning Event occurred.  EventID: 0x825A0024
            Time Generated: 09/16/2010   13:26:35
            Event String:
            The time service has not synchronized the system time for 86400 seconds because none of the time service providers provided a usable time stamp. The time service will not update the local system time until it is able to synchronize with a time source. If the local system is configured to act as a time server for clients, it will stop advertising as a time source to clients. The time service will continue to retry and sync time with its time sources. Check system event log for other W32time events for more details. Run 'w32tm /resync' to force an instant time synchronization.
         ......................... WIGGUM passed test SystemLog
      Starting test: VerifyReferences
         ......................... WIGGUM passed test VerifyReferences

   Running partition tests on : ForestDnsZones
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation

   Running partition tests on : DomainDnsZones
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation

   Running partition tests on : Schema
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation

   Running partition tests on : Configuration
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation

   Running partition tests on : axmfin
      Starting test: CheckSDRefDom
         ......................... axmfin passed test CheckSDRefDom
      Starting test: CrossRefValidation
         ......................... axmfin passed test CrossRefValidation

   Running enterprise tests on : axmfin.com
      Starting test: LocatorCheck
         ......................... axmfin.com passed test LocatorCheck
      Starting test: Intersite
         ......................... axmfin.com passed test Intersite

I have manually started the User Profile Services service, and then if I log out and log back in, the desktop and user profile looks normal again. But it shuts down again next time I reboot the server
I have scanned for any type of Malware, and everything has come up clean
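
For anyone triaging similar output, the pass/fail lines of a dcdiag run like the one above can be pulled out with a short script. This is only a sketch in Python, not part of dcdiag itself: the function name summarize_dcdiag and the regular expression are assumptions based on the "<target> passed test <name>" and "<target> failed test <name>" wording shown in this post, and the sample string is trimmed from that output.

```python
import re

# Matches dcdiag result lines such as "WIGGUM failed test SysVolCheck".
# The pattern is an assumption drawn from the output quoted in this post.
RESULT_RE = re.compile(r"(\S+) (passed|failed) test (\w+)")


def summarize_dcdiag(text):
    """Group (target, test) pairs from dcdiag output by outcome."""
    summary = {"passed": [], "failed": []}
    for target, outcome, test in RESULT_RE.findall(text):
        summary[outcome].append((target, test))
    return summary


if __name__ == "__main__":
    # Trimmed sample; works on wrapped or unwrapped output alike.
    sample = (
        "Starting test: Advertising ......................... "
        "WIGGUM passed test Advertising "
        "Starting test: SysVolCheck ......................... "
        "WIGGUM failed test SysVolCheck"
    )
    for target, test in summarize_dcdiag(sample)["failed"]:
        print(f"{target}: {test}")
```

Run against the full output above, the failed list would contain SysVolCheck, NetLogons, and Services, all of which report error 64 (0x40), "The specified network name is no longer available".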

Next Actions:
I realize this is such a dynamic problem, and is kind of hard to quantify.  This server is VERY messed up.  So I don't know whether or not to try and fix this thing, or just scrap it and rebuild.  It doesn't do a whole lot, so I'm not too opposed to rebuilding it.  But I would want to be very careful to make sure I don't miss any steps.  Here are some of the things I would want to make sure got done right during a rebuild:
I would want to make sure all DC roles were transferred to my other DC before demoting this server.
Since this is our main certificate authority for all self-signed certificates, I would want to make sure we had the certificate thing figured out before rebuilding.
Make sure we're not going to lose any WSUS stuff during the rebuild.  It's pretty much defunct right now, so that's not a HUGE concern
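
The first item on that list can be checked mechanically before demoting anything. A minimal sketch in Python, assuming the "role name, then holder FQDN" layout of the netdom FSMO output quoted earlier in this post; fsmo_holders and roles_not_on are hypothetical helper names, and otherdc.domain.com is a placeholder for the second DC, whose real name is not given here.

```python
def fsmo_holders(netdom_output):
    """Map each FSMO role name to the server listed on the same line."""
    holders = {}
    for line in netdom_output.splitlines():
        parts = line.split()
        # Role lines end with an FQDN (dots inside the last token);
        # blank lines and status lines such as "... successfully." are skipped.
        if len(parts) >= 2 and "." in parts[-1].strip("."):
            holders[" ".join(parts[:-1])] = parts[-1].lower()
    return holders


def roles_not_on(netdom_output, target):
    """List the roles whose holder is not the target DC."""
    held = fsmo_holders(netdom_output)
    return sorted(role for role, holder in held.items()
                  if holder != target.lower())


if __name__ == "__main__":
    # Output as pasted in this post; the target name is a placeholder.
    pasted = """\
Schema master             wiggum.domain.com
Domain naming master  wiggum.domain.com
PDC                               wiggum.domain.com
RID pool manager          wiggum.domain.com
Infrastructure master    wiggum.domain.com
"""
    for role in roles_not_on(pasted, "otherdc.domain.com"):
        print("still to move:", role)
```

An empty list means every role is already on the target DC; anything else still needs to be transferred (or seized, if the old holder cannot be reached).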

So, does anyone have any thoughts on this very strange case?
Question by:Jake Pratt
7 Comments
 
LVL 3

Expert Comment

by:novaspoonman
I would start with:

dcdiag /test:dns
netdiag (support tools)
Try to figure out the time discrepancy (NTP issue?). At least get the clocks on both DCs reasonably close by changing them manually.

I suspect AD is not replicating correctly either. Repadmin may help later.
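
To put a number on "reasonably close": Kerberos in an AD domain tolerates five minutes of clock difference by default (the "Maximum tolerance for computer clock synchronization" policy). A rough sketch in Python of that comparison; the function name skew_ok and the sample timestamps are made up for illustration, and the five-minute value assumes the default policy has not been changed.

```python
from datetime import datetime, timedelta

# Default Kerberos clock-skew tolerance; assumes the policy is unchanged.
MAX_SKEW = timedelta(minutes=5)


def skew_ok(clock_a, clock_b, tolerance=MAX_SKEW):
    """True when two clock readings differ by no more than the tolerance."""
    return abs(clock_a - clock_b) <= tolerance


if __name__ == "__main__":
    # Hypothetical readings taken from the two DCs at the same moment.
    dc1 = datetime(2010, 9, 16, 13, 26, 35)
    dc2 = datetime(2010, 9, 16, 13, 40, 0)
    print("within tolerance" if skew_ok(dc1, dc2) else "clocks too far apart")
```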
 
LVL 7

Accepted Solution

by:
grantsewell earned 500 total points
Well... first of all, ouch.

Have you looked to see which services are and aren't running? You might want to troubleshoot, but it probably would be best to rebuild.

My initial thought after reading the fourth or fifth problem was rebuild. Love the server name though :) Maybe SCORPIO can be the new one...

That being said, here are some ideas:

FSMO Roles: Here's the Microsoft KB article on FSMO role placement. Migration is pretty easy, just make sure the roles end up in the right spot. If you're gonna have only one domain controller, they all go in one place.
http://support.microsoft.com/kb/223346

Moving the CA: Very important, follow the Microsoft KB article to the letter to make sure the old one is decommissioned properly and the new one comes up like it should.
http://support.microsoft.com/kb/298138

WSUS: WSUS relies primarily on Group Policy (if you configured it that way) and doesn't contain a lot of information. The big thing is re-downloading all the updates. Personally, I would put WSUS on a different server (and maybe the CA too), that way you don't have to run web services on your domain controller! Anyways, I've moved WSUS before, this article is a really good reference:
http://exchangeserverpro.com/how-to-move-wsus-30-to-a-new-server

Make sure when you're done with the DC that you demote it - don't just shut it down and delete! Ghost DCs floating around are never fun for Active Directory. The new DC should have a different name as well, to prevent conflicts. Always follow the Microsoft articles and read everything twice. Have someone else review your steps to make sure you didn't interpret something incorrectly.

AD is your friend when it's working right :)

Good luck!
 

Author Comment

by:Jake Pratt
Thanks for the info, both.  I ran the dns tests, and they pass on my good DC, but fail on my bad one with the error: Authentication failed with specific credentials.

I think I am going to rebuild.  I kind of feel like trying to fix it would be like putting a band-aid on an amputated limb.  Thanks for the tech articles.  I may actually try to move WSUS somewhere else.  When everything stopped working, and no one could get updates, I removed the group policies and let everyone start getting their own updates.

 
LVL 3

Expert Comment

by:novaspoonman
First step would be to transfer FSMO roles.

After that, you could dcpromo down to a member server, reboot, and try the dcpromo back to a domain controller. That may fix several issues without having to rebuild the DC from scratch.
 

Author Comment

by:Jake Pratt
I thought about demoting it and re-promoting it again.  The thing that still makes me lean towards a total rebuild is the fact that it always boots up to the advanced boot options screen.  That kind of sounds like OS install problems to me, but I haven't looked into it too much.  I might try that first to see if it works, before a total rebuild.
 
LVL 59

Expert Comment

by:Darius Ghassem
Have you disabled the Firewall service by chance? You should not disable this service.
 

Author Comment

by:Jake Pratt
After demoting the server, there were still TONS of problems, so I just stayed late and rebuilt him.  Thanks to grantsewell for the articles.  I was able to manually transfer/seize all the roles, and back up/restore my CA as well.  Things seem to be working quite well today.  I ended up naming the new server "Jemaine"... any FOTC fans out there?!!  Thanks for all the help, guys.