Domain Controller that contained the FSMO roles has gone down

I have a domain spanning a handful of schools that runs the student labs. The Domain Controller that was first used in building this domain, and that houses the FSMO roles, went down as a result of a bad RAID card. A new RAID card was installed about 36 hours later and the DC was fired back up. However, there are now issues with this domain: no DNS communication, Group Policy issues, etc., ever since the DC that holds the FSMO roles went down.

What approach should I take now?

I am thinking about seizing the FSMO roles and moving them to another DC within the domain. Then I would have to clean up the metadata for the DC that originally housed the FSMO roles and re-add that DC to the domain under a different server name.

Would this be the correct approach or am I missing something here?

Thanks for your assistance.

Sean
skenny10, IT Manager, asked:

JonathanSpitfire, Senior Solutions Engineer, commented:
Sounds good to me.

I would:

Seize the roles.
Demote the bad DC.
Make sure AD is clean of the bad DC (ADSI Edit, etc.).
Format and reinstall the OS on the bad DC if you can, not just change the name. There are few things worse than a questionable DC that you're not sure you can trust.
Then promote it back after the install (use a different name to be sure there are no discrepancies).

Hope this helps!

Jonathan
arnold commented:
Was the RAID controller replaced and the system brought up as it was, or was the RAID data unrecoverable such that the system was restored from backup?
If it was restored from backup, are you getting RID-related errors?
Restoring the DC from backup might have been a mistake.
One should never restore a DC from an image when there are other DCs in the environment.
Where was/is the DHCP server?
If everything was functional while this server (the failed DC with the FSMO roles) was down, the best approach is to shut it down again.

You probably need to go through Active Directory Sites and Services, open the NTDS Settings of each server, and make sure every remaining DC has the Global Catalog option checked.


In a multi-DC environment, one can seize the roles using ntdsutil. The repaired system can then be re-added after an OS reinstall.

.....

Usually, restoring AD on the master DC has to go through a non-authoritative restore, but that is not an option when the system is restored from an image/backup.
Mike, IT Manager, commented:
Seize the roles, demote the bad DC, unjoin the bad DC from the domain, do a metadata cleanup, reinstall the OS on the bad DC, join back to domain, promote back to Domain Controller, move FSMO roles back.
skenny10, IT Manager (author), commented:
After the RAID controller was replaced, the system was brought up as it was previously. No restore process was performed. The DHCP server resides on another box at this same school. All other DCs were set up as GC servers as well.
Mike, IT Manager, commented:
What happens if you run repadmin /syncall and repadmin /showrepl from the "bad DC"?
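If it helps to capture the output of both commands in one go (for example to paste it back into this thread), here is a minimal Python sketch. It assumes Python is installed on the DC and that repadmin is on the PATH; the function name `run_replication_checks` and the `runner` parameter are made up for this illustration and are not part of any Microsoft tooling.

```python
import subprocess

# The two commands suggested above; both are standard repadmin switches.
REPADMIN_COMMANDS = [
    ["repadmin", "/showrepl"],
    ["repadmin", "/syncall"],
]


def run_replication_checks(runner=subprocess.run):
    """Run each repadmin command and collect (command, exit code, output).

    `runner` defaults to subprocess.run; it is only a parameter so the
    function can be exercised without a real domain controller.
    """
    results = []
    for cmd in REPADMIN_COMMANDS:
        done = runner(cmd, capture_output=True, text=True)
        # Keep stdout and stderr together so error text is not lost.
        results.append((" ".join(cmd), done.returncode, done.stdout + done.stderr))
    return results
```

Calling `run_replication_checks()` from an elevated prompt on the "bad DC" and printing each tuple should show the same text you would see when running the commands by hand.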
Will Szymkowski, Senior Solution Architect, commented:
"Would this be the correct approach or am I missing something here?"
If the DC is back up and you have the opportunity to transfer the roles gracefully, I would try that method first. If you cannot transfer the roles gracefully, then seize them.

If this FSMO role holder is online, power it off, then perform the seize operation. When you move the PDC role to another DC, you will also need to make sure the external time source is configured on it.

https://support.microsoft.com/en-us/kb/816042
http://blogs.technet.com/b/nepapfe/archive/2013/03/01/it-s-simple-time-configuration-in-active-directory.aspx

Make sure that you also transfer any other roles that this DC may hold (DHCP, etc.) before seizing the FSMO roles and powering it off.

Once you seize the roles, the DC that used to hold them can never come back online with that name/SID.

Will.
Lee W, MVP, Technology and Business Process Advisor, commented:
So AD is having issues... what kind of diagnostics have you done? Any? Checked the Event Logs? DCDIAG? Did you check the status of services on the failed DC? You can seize the roles if you want, but in theory the DC's failure should have been equivalent to an unexpected power hit followed by a restart. If the system booted, it should be fine. Now, hard power-offs can cause problems, but before seizing roles I would be looking at what's going on... and what happens if you turn off the repaired server? Do things work normally? It could be as simple as the DNS server not starting up (given the amount of information provided).
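As a quick way to test the "DNS server didn't start" theory from another machine, here is a rough Python sketch that simply tries to open a TCP connection to port 53 on the DC. The helper name `tcp_port_open` is made up for this example. A successful connect only shows that something is listening on that port, not that it answers queries correctly, so treat it as a first check and not a replacement for DCDIAG or the event logs.

```python
import socket


def tcp_port_open(host, port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS servers normally listen on TCP 53 as well as UDP 53, so a refused
    or timed-out connection hints that the DNS service is not running or
    is being blocked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `tcp_port_open("192.168.0.2")` would probe the address arnold uses as an illustration further down; substitute the real address of the repaired DC.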

arnold commented:
There is something missing. A system that was down for 36 hours, had its failed RAID controller replaced, had the RAID config read from the drives, and then booted should not be running into this issue. This is equivalent to a power loss on the system that no one noticed, with the system being turned back on 36 hours later.

Something is not making sense. Was the RAID controller replaced, or were the drives moved to an equivalent system, such that the DHCP allocation based on the MAC address changed the IP assigned to this system? I.e., the DC was on 192.168.0.2 and now it got 192.168.0.5, while the DHCP scope points to 192.168.0.2 as the only DNS server instead of pointing to all the DNS servers that exist in the environment?

If the issues only exist when this DC is up and connected, I would shut it down, or disconnect it from the network in the event you want to get data off of it using USB/external drives.

dcdiag/netdiag/repadmin should only be run with the earlier failed DC offline.

The only thing I can think of, as Lee pointed out regarding the power-off, deals with journaling. Is there an error saying that NETLOGON cannot be brought up because of a journaling error, where the event detail includes the fix, i.e. edit the registry, add a key with a value of 1, then restart a service, and within 5 minutes NETLOGON is restored/rebuilt?
skenny10, IT Manager (author), commented:
Thanks for everyone's input. Upon closer analysis, there was a DNS issue here. Changing the DC that had gone down to point to another one of the Domain Controllers for DNS got things communicating again, saving me the hassle of taking my original approach.

Many thanks.
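For anyone who lands here with the same symptoms, the fix described above can be turned into a small sanity check. The Python sketch below is only an illustration: the function name `dns_client_warnings` is hypothetical, and the rule it applies (a DC should list at least one other DC as a DNS server) is drawn from how this thread was resolved, not from any official tool.

```python
def dns_client_warnings(dns_servers, other_dc_ips):
    """Return a list of warnings about a DC's DNS client settings.

    dns_servers:  DNS server addresses configured on the DC's network adapter
    other_dc_ips: addresses of the other domain controllers running DNS
    """
    warnings = []
    if not dns_servers:
        warnings.append("no DNS servers configured")
        return warnings
    # The thread was resolved by pointing the repaired DC at another DC.
    if not any(server in other_dc_ips for server in dns_servers):
        warnings.append("no other domain controller is listed as a DNS server")
    return warnings
```

With made-up addresses, `dns_client_warnings(["192.168.0.2"], ["192.168.0.3"])` flags a DC that only points at itself, while `dns_client_warnings(["192.168.0.3", "127.0.0.1"], ["192.168.0.3"])` returns an empty list.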
Windows Server 2012