Possible AD database corruption on Server 2012 R2 Standard

2 out of 4 PCs lose their trust relationship with the domain after rebooting, and I cannot rejoin them.  The DC will not reset, delete, or create computer or user accounts; every attempt fails with 'internal error'.

I have inherited a 3 DC environment, all hosted on VMs and all running Server 2012 R2.  
Site 1: PDC holding FSMO role
2 separate remote sites with DCs.
All of them are GCs.
There is a VPN connection between the PDC and each remote DC, but no connection between the remote DCs.  All 3 DCs are in the single DEFAULTIPSITELINK.

I need to break out the single link into two:  PDC to remote DC1 and PDC to remote DC2.  However, the PDC and one of the remote DCs will no longer allow me to create or delete AD objects because of an 'internal error', and none of the DCs are replicating, so I'd appreciate any advice on the order in which to do things.  Some of the errors point to a D4/D2 FRS restore, and some suggest an AD database restore.
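
For illustration, the reason the single link matters: a site link that contains more than two sites tells the KCC that every site in it can reach every other one directly. The following is a minimal Python sketch, not a model of the real KCC. The site names (`HQ` for the PDC's site, `Site1`, `Site2`), the proposed link names, and the `vpn_paths` set are placeholders based on the layout described above.

```python
from itertools import combinations

def implied_pairs(site_links):
    """Return every site pair that some site link declares as directly connected."""
    pairs = set()
    for sites in site_links.values():
        for a, b in combinations(sorted(sites), 2):
            pairs.add((a, b))
    return pairs

def unroutable_pairs(site_links, network_paths):
    """Site pairs implied by the links that have no real network path."""
    routable = {tuple(sorted(p)) for p in network_paths}
    return sorted(implied_pairs(site_links) - routable)

# Placeholder layout: VPN from the PDC's site to each remote site only.
vpn_paths = {("HQ", "Site1"), ("HQ", "Site2")}

# Current configuration: all three sites in one link.
current = {"DEFAULTIPSITELINK": {"HQ", "Site1", "Site2"}}

# Proposed configuration: one link per VPN tunnel (link names are hypothetical).
proposed = {"HQ-Site1": {"HQ", "Site1"}, "HQ-Site2": {"HQ", "Site2"}}

print(unroutable_pairs(current, vpn_paths))   # [('Site1', 'Site2')]
print(unroutable_pairs(proposed, vpn_paths))  # []
```

The sketch only looks at direct site-link membership. It ignores site link bridging, which is enabled by default and can also lead the KCC to treat the two remote sites as reachable from each other.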

On the PDC, Directory Services is throwing these event IDs:
1699 when trying to connect to the DC at Site 2, saying "1127 while accessing the hard disk, a disk operation failed even after retries"
1435 KCC unexpected error on Default-First-Site-Name, problem 14 (Bad address), data -510, and the same for remote Site 1
DNS is constantly logging Event ID 4015 with problem 14 (Bad address), data -510, for the remote DCs
Dcdiag on this server reports that the KCC encountered an unexpected error while performing an Active Directory Domain Services operation, shows no replication with Site 1 since 4-29 along with a latency warning, and says nothing about Site 2.
Within DNS, the PDC is showing SRV records under the _msdcs.domain.local>gc>sites>Site 2 and Default-First-Site
Cannot create, delete or reset anything in AD Users & Computers

On the Site 2 DC, which was recently installed and promoted to replace a failed DC:
Directory Service is throwing event IDs 1925, 1865, 1311, all complaining about Site 1, to which I know there is no direct link.
FRS is throwing event ID 13508, saying it cannot resolve the DNS name of the PDC or FRS is not running on the PDC.
Dcdiag on this server fails Advertising: DsGetDcName returned information for \\PDC.domain.local when we were trying to reach Site2DC. Server is not responding or is not considered suitable. FrsEvent reports warning or error events within the last 24 hours after the SysVol has been shared. KCC says all servers in the following site that can replicate the directory partition over this transport are currently unavailable, and it is unable to form a complete spanning tree. NetLogons failed with error 67 (network name cannot be found). There is a replication latency warning stating the replication path was preempted by higher priority work from PDC to Site2DC while accessing the hard disk, a disk operation failed even after retries. Remote bridgehead Site1DC is not eligible as a bridgehead due to too many failures.
Cannot create, delete or reset anything in AD Users & Computers

On Site 1 DC:
Directory Service is throwing event IDs 1925, 1865, 1311, all complaining about Site 2, to which I know there is no direct link.
Dcdiag on this server: KCC says all servers in the following site that can replicate the directory partition over this transport are currently unavailable, and it is unable to form a complete spanning tree.  There is a replication latency warning stating the replication path was preempted by higher priority work from PDC to Site1DC while accessing the hard disk, a disk operation failed even after retries.  Dynamic registration or deletion of one or more DNS records associated with DNS domain domain.local failed.  Remote bridgehead Site2DC is not eligible as a bridgehead due to too many failures.
I am able to create users and work with computer objects on this server.

I can ping each of the servers from the other.  Any advice or help is appreciated.
QITGH Asked:
 
Mahesh (Architect) Commented:
Since when have you been facing this issue?

It looks like there is a hardware or disk problem on the PDC server.
Are you able to transfer the FSMO roles to the Site 1 or Site 2 DC?
Try the steps below:
First, open all network ports (an any/any rule) between the Site 1 and Site 2 DCs and see whether they can communicate and replicate with each other.
If yes, then transfer the FSMO roles to Site 1 or Site 2 and see whether you can create new objects.
My guess is that the remote DCs are unable to contact the current PDC.
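
One quick way to see whether the Site 1 and Site 2 DCs can reach each other is to probe the TCP ports that AD commonly relies on. Below is a minimal Python sketch under a few assumptions: `site2dc.domain.local` is a placeholder host name, the port list is the commonly documented set rather than an exhaustive one, and only TCP is tested (DNS, Kerberos, and LDAP also use UDP, and RPC uses a dynamic high-port range that is not probed here).

```python
import socket

# Commonly documented TCP ports used between domain controllers.
# RPC also uses a dynamic high-port range that this sketch does not probe.
AD_TCP_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB (SYSVOL)",
    636: "LDAPS",
    3268: "Global Catalog",
}

def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def check_dc(host, ports=AD_TCP_PORTS, timeout=3.0):
    """Map each port number to its True/False reachability on the given host."""
    return {port: check_port(host, port, timeout) for port in ports}

if __name__ == "__main__":
    target = "site2dc.domain.local"  # placeholder: replace with the remote DC's name
    for port, ok in check_dc(target).items():
        state = "open" if ok else "blocked or unreachable"
        print(f"{target}:{port} ({AD_TCP_PORTS[port]}): {state}")
```

Run it from the Site 1 DC against the Site 2 DC and then in the other direction. A port that shows as open only proves the TCP path exists, not that replication itself is healthy.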

If the FSMO transfer, replication, and new object creation all work as expected, you can demote the primary site DC and promote it again on clean hardware.
If AD replication is working between Site 1 and Site 2 but the FSMO transfer fails, you can seize the FSMO roles on either the Site 1 or Site 2 DC and then demote the primary site DC.

If none of the above works, you are in serious trouble: you will need to look at forest recovery procedures, which involve restoring AD from the last good system state backup.
 
QITGH (Author) Commented:
Thanks Mahesh.  We weren't able to open a network path between Sites 1 & 2, but the Site 1 DC was able to seize the FSMO roles, and a successful rebuild of the PDC seems to be doing the trick.
Question has a verified solution.
