SW111

Promoting secondary AD domain controller

We have 2 domain controllers (W2K3, AD) in our network.
1. Server1 is the primary dc
2. Server2 is the secondary dc

About 6 months ago, the HDD on server1 failed and it took us about 2 months to recover the system. But when we put the server back on, it never worked quite as well as it should.

The symptoms are:
1. Sometimes the admin has to re-enter credentials in order to access a network shared folder, and they have to be in the form user@mydomain, not just the username.
2. We had to change shortcuts to point to the IP address instead of the URL.

BUT the immediate and urgent problem now is:
1. We need to join a computer (W2K3) to the domain. But after entering the admin credentials, the join is rejected because the target name is reported as problematic.

The computer I'm trying to join is on a different subnet from the DC (DC = 50.0.0.11); this computer is to be the local server in a new office (50.0.3.11).
I've created the site and subnet in the "Active Directory Sites and Services" window, but it still cannot join the domain.

Replication of user names and credentials seems to work, because when I added a user on server2, it was automatically added to server1 too.

Please help me with:
1. To fix this problem, can I simply disable/shut down server1 and use server2 as the DC instead?
2. How do I do #1?
3. How can we identify what is wrong with the system?

Thank you so much in advance.
SOLUTION
MidnightOne
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
My guess is that Server 1 is in USN rollback, which would explain the issues you are facing. Please check whether Server 1 has the "Dsa Not Writable" registry value under HKLM\System\CurrentControlSet\Services\NTDS\Parameters.
It also looks as though Kerberos authentication is failing, which is why you have to use the IP address instead of the FQDN: connections by IP address use NTLM.
SOLUTION
Shabarinath TR
This solution is only available to members.
SW111

ASKER

Hello Experts, thanks for the reply. I do suspect that the problem has something to do with the tombstone thing.

MidnightOne: Yes, I suspected as much. Only that I wanted to be sure before doing anything as drastic (it sounded risky). And I'm not 100% sure how to do this yet.

Certexpert: I will try this when I get back to the office on Monday. From what I gather, USN rollback is somewhat similar to tombstoning? (i.e., we didn't fix server1 fast enough at the time?)
Honestly though, I don't remember doing any role switching, promoting, etc. So at the time, server2 was not promoted to primary. I didn't know it was necessary.

Shabarinath:
Thanks. I will look at the link and run repadmin and replmon on Monday. In the meantime, here is the result of dcdiag:

 
C:\>dcdiag

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: HQ\SERVER1
      Starting test: Connectivity
         ......................... SERVER1 passed test Connectivity

Doing primary tests

   Testing server: HQ\SERVER1
      Starting test: Replications
         [Replications Check,SERVER1] Inbound replication is disabled.
         To correct, run "repadmin /options SERVER1 -DISABLE_INBOUND_REPL"
         [Replications Check,SERVER1] Outbound replication is disabled.
         To correct, run "repadmin /options SERVER1 -DISABLE_OUTBOUND_REPL"
         ......................... SERVER1 failed test Replications
      Starting test: NCSecDesc
         ......................... SERVER1 passed test NCSecDesc
      Starting test: NetLogons
         ......................... SERVER1 passed test NetLogons
      Starting test: Advertising
         Warning: DsGetDcName returned information for \\server02.backbone.sakajkt.com, when we were trying to reach SERVER1.
         Server is not responding or is not considered suitable.
         ......................... SERVER1 failed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... SERVER1 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... SERVER1 passed test RidManager
      Starting test: MachineAccount
         ......................... SERVER1 passed test MachineAccount
      Starting test: Services
            NETLOGON Service is paused on [SERVER1]
         ......................... SERVER1 failed test Services
      Starting test: ObjectsReplicated
         ......................... SERVER1 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... SERVER1 passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... SERVER1 failed test frsevent
      Starting test: kccevent
         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 05/06/2011   17:12:17
            Event String: All domain controllers in the following site that
         An Error Event occured.  EventID: 0xC000051F
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) has
         An Warning Event occured.  EventID: 0x80000749
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) was
         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 05/06/2011   17:12:17
            Event String: All domain controllers in the following site that
         An Error Event occured.  EventID: 0xC000051F
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) has
         An Warning Event occured.  EventID: 0x80000749
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) was
         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 05/06/2011   17:12:17
            Event String: All domain controllers in the following site that
         An Error Event occured.  EventID: 0xC000051F
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) has
         An Warning Event occured.  EventID: 0x80000749
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) was
         An Warning Event occured.  EventID: 0x8000061E
            Time Generated: 05/06/2011   17:12:17
            Event String: All domain controllers in the following site that
         An Error Event occured.  EventID: 0xC000051F
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) has
         An Warning Event occured.  EventID: 0x80000749
            Time Generated: 05/06/2011   17:12:17
            Event String: The Knowledge Consistency Checker (KCC) was
         An Warning Event occured.  EventID: 0x80000785
            Time Generated: 05/06/2011   17:12:17
            Event String: The attempt to establish a replication link for
         An Warning Event occured.  EventID: 0x80000785
            Time Generated: 05/06/2011   17:12:17
            Event String: The attempt to establish a replication link for
         An Warning Event occured.  EventID: 0x80000785
            Time Generated: 05/06/2011   17:12:17
            Event String: The attempt to establish a replication link for
         An Warning Event occured.  EventID: 0x80000785
            Time Generated: 05/06/2011   17:12:17
            Event String: The attempt to establish a replication link for
         An Warning Event occured.  EventID: 0x80000785
            Time Generated: 05/06/2011   17:12:17
            Event String: The attempt to establish a replication link for
         ......................... SERVER1 failed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0x40000004
            Time Generated: 05/06/2011   16:20:47
            Event String: The kerberos client received a
         An Error Event occured.  EventID: 0x40000004
            Time Generated: 05/06/2011   16:33:54
            Event String: The kerberos client received a
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 05/06/2011   16:59:40
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 05/06/2011   16:59:40
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x00000457
            Time Generated: 05/06/2011   16:59:40
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0x40000004
            Time Generated: 05/06/2011   17:12:17
            Event String: The kerberos client received a
         ......................... SERVER1 failed test systemlog
      Starting test: VerifyReferences
         ......................... SERVER1 passed test VerifyReferences

   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation

      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom

   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation

      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : backbone
      Starting test: CrossRefValidation
         ......................... backbone passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... backbone passed test CheckSDRefDom

   Running enterprise tests on : backbone.mydomain.com
      Starting test: Intersite
         ......................... backbone.mydomain.com passed test Intersite
      Starting test: FsmoCheck
         ......................... backbone.mydomain.com passed test FsmoCheck
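
The EventID values that dcdiag prints are full 32-bit event identifiers, so they do not match the numbers shown in Event Viewer directly. As a rough lookup aid, here is a small Python sketch (not part of the original thread). It assumes the usual Windows identifier layout, where the top two bits carry a severity and the low 16 bits carry the event code; the function name `decode_event_id` is made up for this example.

```python
# Sketch: split dcdiag's 32-bit EventID values into severity and event code.
# Assumption: standard Windows event identifier layout, where
#   bits 31-30 = severity and bits 15-0 = event code shown in Event Viewer.
SEVERITY = {0: "Success", 1: "Informational", 2: "Warning", 3: "Error"}

def decode_event_id(raw):
    """Return (severity_name, event_code) for a 32-bit event identifier."""
    return SEVERITY[(raw >> 30) & 0x3], raw & 0xFFFF

# Values copied from the dcdiag output above.
for raw in (0x8000061E, 0xC000051F, 0x80000749,
            0x80000785, 0x40000004, 0x00000457):
    severity, code = decode_event_id(raw)
    print(f"0x{raw:08X} -> event {code} ({severity})")
```

For the values above this gives event codes 1566, 1311, 1865, 1925, 4 and 1111, which are the numbers to search for in Event Viewer. Note that the severity bits need not agree with the entry type dcdiag reports: 0x40000004 decodes as Informational but is listed as an Error event.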

Thank you
That's a classic set of tombstoned DC errors.

The demotion/promotion isn't more than running DCPROMO from the command prompt unless you have other domains (either in the same forest as a disjointed namespace or as a child domain). Just make sure SERVER2 - the server that didn't get tombstoned - has the FSMOs and is a global catalog.
SW111

ASKER

Midnightone, server2 is a global catalog. I remember doing this, but how do I check whether it holds the FSMO roles?

What we do have is another server, but it is on a different site (server3, 50.0.2.11), and we're trying to add a new server4 for a new site.

Once I confirm that it holds the FSMO roles, can I go ahead and unplug server1, then promote server2?
ASKER CERTIFIED SOLUTION
This solution is only available to members.
SW111

ASKER

Hi CERTExpert,

I'm trying to digest the information on the Microsoft link you provided. Can you please help me with some items:

1. To seize the FSMO roles, should SERVER1 be unplugged from the network? (So it will not interfere with the SERVER2 promotion?)

2. And then, which steps should I do? (There's a lot of information on that link). Should I do the "Transfer FSMO roles" and then "Seize FSMO roles"?

2b. On which server should I do these steps?
"Transfer FSMO roles" on SERVER1 and
"Seize FSMO roles" on SERVER2 ?
Which means step #1 above is wrong and SERVER1 should not be unplugged during the process?

Thank You





SW111

ASKER

The result of "Netdom Query FSMO" is SERVER1 for all 5 roles.

So I suppose we need to change this to SERVER2?

On that matter, hardware-wise, SERVER1 seems to be running fine. Is it at all possible to fix SERVER1 instead of having to move the roles to SERVER2?

Thanks
SW111

ASKER

OK. After reading CERTExpert's link on USN (http://support.microsoft.com/kb/875495), I ran the "repadmin /showutdvec" command. Here is the result:

 
C:\BACKBONE>repadmin /showutdvec server1 dc=backbone,dc=mydomain,dc=com
Caching GUIDs.
..
Branch1\BranchSERVER01               @ USN     87411 @ Time 2010-07-08 19:57:45
HQ\SERVER1                           @ USN   8140909 @ Time 2011-05-09 10:09:44
HQ\SERVER02                          @ USN   2625728 @ Time 2010-07-08 20:57:45


C:\BACKBONE>repadmin /showutdvec server02 dc=backbone,dc=mydomain,dc=com
Caching GUIDs.
..
Branch1\BranchSERVER01               @ USN     89191 @ Time 2010-07-18 16:49:27
HQ\SERVER1                           @ USN   7874213 @ Time 2010-08-08 05:49:07
HQ\SERVER02                          @ USN   3052066 @ Time 2011-05-09 10:10:39
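
One way to read the two vectors side by side is to compare what each DC reports about itself with what its partner has recorded for it. The Python sketch below (not from the original thread) does that with the output pasted above; `parse_utdvec` and `lag` are hypothetical helper names, and it assumes the timestamp on a partner's row is the last time changes from that partner were successfully applied.

```python
import re
from datetime import datetime

# Output copied from the two "repadmin /showutdvec" runs above.
SERVER1_OUTPUT = r"""
Branch1\BranchSERVER01               @ USN     87411 @ Time 2010-07-08 19:57:45
HQ\SERVER1                           @ USN   8140909 @ Time 2011-05-09 10:09:44
HQ\SERVER02                          @ USN   2625728 @ Time 2010-07-08 20:57:45
"""
SERVER02_OUTPUT = r"""
Branch1\BranchSERVER01               @ USN     89191 @ Time 2010-07-18 16:49:27
HQ\SERVER1                           @ USN   7874213 @ Time 2010-08-08 05:49:07
HQ\SERVER02                          @ USN   3052066 @ Time 2011-05-09 10:10:39
"""

ROW = re.compile(
    r"^(\S+)\s+@ USN\s+(\d+)\s+@ Time\s+(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\s*$"
)

def parse_utdvec(text):
    """Map 'Site\\Server' -> (usn, timestamp) for each row of the vector."""
    vector = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(3), "%Y-%m-%d %H:%M:%S")
            vector[match.group(1)] = (int(match.group(2)), stamp)
    return vector

def lag(own_view, partner_view, dc):
    """Return (USN gap, whole days) by which the partner trails `dc` itself."""
    own_usn, own_time = own_view[dc]
    seen_usn, seen_time = partner_view[dc]
    return own_usn - seen_usn, (own_time - seen_time).days

server1_view = parse_utdvec(SERVER1_OUTPUT)
server02_view = parse_utdvec(SERVER02_OUTPUT)

# How far SERVER02 is behind on SERVER1's changes, and the reverse.
print(lag(server1_view, server02_view, r"HQ\SERVER1"))
print(lag(server02_view, server1_view, r"HQ\SERVER02"))
```

Read this way, each DC has a large backlog of the other's changes and the last recorded sync in either direction is roughly nine to ten months old. That is consistent with the disabled replication reported by dcdiag. It would also exceed a typical tombstone lifetime (often 60 or 180 days; that figure is an assumption here, so check the value configured in your forest).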

But the guide is saying:
"The output from DC1 shows a local USN of 10. DC2 has inbound-replicated USN 50 and will ignore the Active Directory updates that correspond to the next 40 USN numbers from the originating DC1."

However, in our case, the number on SERVER1 (=DC1?) is NOT lower than SERVER2 (=DC2?) and so the case might not be the same as the example?
SW111

ASKER

Btw, I'm running W2K3 SP2. Further down, the guide suggests error messages that I should look for in Event Viewer. I don't find an exact match, but these are the ones I found in Event Viewer on SERVER1:

 
ON FILE REPLICATION SERVICE:

The File Replication Service is having trouble enabling replication from SERVER02 to SERVER1 for c:\windows\sysvol\domain using the DNS name server02.backbone.mydomain.com. FRS will keep retrying. 
 Following are some of the reasons you would see this warning. 
 
 [1] FRS can not correctly resolve the DNS name server02.backbone.sakajkt.com from this computer. 
 [2] FRS is not running on server02.backbone.mydomain.com. 
 [3] The topology information in the Active Directory for this replica has not yet replicated to all the Domain Controllers. 
 
 This event log message will appear once per connection, After the problem is fixed you will see another event log message indicating that the connection has been established.

 
ON DNS SERVER:

The DNS server timed out attempting an Active Directory service operation on ---.  Check Active Directory to see that it is functioning properly. The event data contains the error.

 
ON DIRECTORY SERVICES (1):

Directory partition: 
CN=Configuration,DC=backbone,DC=mydomain,DC=com 
Source domain controller: 
CN=NTDS Settings,CN=BranchSERVER01,CN=Servers,CN=Branch1,CN=Sites,CN=Configuration,DC=backbone,DC=mydomain,DC=com 
Source domain controller address: 
14ef34f8-f975-4428-972c-a21cb27b633f._msdcs.backbone.mydomain.com 
Intersite transport (if any): 
CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=backbone,DC=mydomain,DC=com 
 
This domain controller will be unable to replicate with the source domain controller until this problem is corrected.  
 
User Action 
Verify if the source domain controller is accessible or network connectivity is available. 
 
Additional Data 
Error value: 
8457 The destination server is currently rejecting replication requests.

 
On Directory Service (2):

The Knowledge Consistency Checker (KCC) was unable to form a complete spanning tree network topology. As a result, the following list of sites cannot be reached from the local site. 
 
Sites: 
CN=Branch1,CN=Sites,CN=Configuration,DC=backbone,DC=mydomain,DC=com

 
The Knowledge Consistency Checker (KCC) has detected problems with the following directory partition. 
 
Directory partition:
CN=Configuration,DC=backbone,DC=mydomain,DC=com 
 
There is insufficient site connectivity information in Active Directory Sites and Services for the KCC to create a spanning tree replication topology. Or, one or more domain controllers with this directory partition are unable to replicate the directory partition information. This is probably due to inaccessible domain controllers. 
 
User Action 
Use Active Directory Sites and Services to perform one of the following actions: 
- Publish sufficient site connectivity information so that the KCC can determine a route by which this directory partition can reach this site. This is the preferred option. 
- Add a Connection object to a domain controller that contains the directory partition in this site from a domain controller that contains the same directory partition in another site. 
 
If neither of the Active Directory Sites and Services tasks correct this condition, see previous events logged by the KCC that identify the inaccessible domain controllers.

Thank you
SW111

ASKER

more event viewer details:

 
The kerberos client received a KRB_AP_ERR_MODIFIED error from the server host/branchserver01.backbone.mydomain.com.  The target name used was LDAP/14ef34f8-f975-4428-972c-a21cb27b633f._msdcs.backbone.mydomain.com. This indicates that the password used to encrypt the kerberos service ticket is different than that on the target server. Commonly, this is due to identically named  machine accounts in the target realm (BACKBONE.SAKAJKT.COM), and the client realm.   Please contact your system administrator.


The tombstone period has expired, so bringing Server1 back online on the network is useless; instead, you can seize the roles.
You can follow this article:
http://support.microsoft.com/kb/255504
SW111

ASKER

Vikastyagi, that's pretty much what CERTExpert posted above.
The posts that I made afterwards are to make sure that we've identified the right problem, so that we will be doing good instead of further damage.

Also, I need to check on how to do what on which computer.

Thanks
SW111

ASKER

@CERTExpert,

The value in
HKLM\System\CurrentControlSet\Services\NTDS\Parameters

is: 0x00000004 (4)

What does this mean?

Thanks
SW111

ASKER

URGENT-URGENT-URGENT
PLEASE HELP URGENTLY

So I decided to go ahead and do a "SEIZE FSMO role" only.
These are the steps I've taken:

1. Shut down SERVER1
The idea is that if we assume server1 has failed, server2 should be able to be promoted by itself. I chose this scenario in case SERVER1 would somehow affect the promotion process.

2. I did steps 1-6 of the "Seize FSMO role" procedure on http://support.microsoft.com/kb/255504

3. When doing step #7, on the first role I'm trying to seize (domain naming master), I get this error:

 
fsmo maintenance: seize domain naming master
Attempting safe transfer of domain naming FSMO before seizure.
ldap_modify_sW error 0x34(52 (Unavailable).
Ldap extended error message is 000020AF: SvcErr: DSID-0321032A, problem 5002 (UN
AVAILABLE), data 8524

Win32 error returned is 0x20af(The requested FSMO operation failed. The current
FSMO holder could not be contacted.)
)
Depending on the error code this may indicate a connection,
ldap, or role transfer error.
Transfer of domain naming FSMO failed, proceeding with seizure ...
Server "SERVER02" knows about 5 roles
Schema - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Configuration,
DC=backbone,DC=mydomain,DC=com
Domain - CN=NTDS Settings,CN=SERVER02,CN=Servers,CN=HQ,CN=Sites,CN=Configuration
,DC=backbone,DC=mydomain,DC=com
PDC - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Configuration,DC=
backbone,DC=mydomain,DC=com
RID - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Configuration,DC=
backbone,DC=mydomain,DC=com
Infrastructure - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Config
uration,DC=backbone,DC=mydomain,DC=com
fsmo maintenance:
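
The role listing at the end of that output is hard to read because the console wrapped each entry at 80 columns. The Python sketch below (not from the original thread; `role_holders` is a hypothetical helper name) re-joins the wrapped lines and pulls out which server each role points to.

```python
import re

# Role listing copied from the ntdsutil output above, including the
# 80-column console wrapping.
LISTING = """\
Schema - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Configuration,
DC=backbone,DC=mydomain,DC=com
Domain - CN=NTDS Settings,CN=SERVER02,CN=Servers,CN=HQ,CN=Sites,CN=Configuration
,DC=backbone,DC=mydomain,DC=com
PDC - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Configuration,DC=
backbone,DC=mydomain,DC=com
RID - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Configuration,DC=
backbone,DC=mydomain,DC=com
Infrastructure - CN=NTDS Settings,CN=SERVER1,CN=Servers,CN=HQ,CN=Sites,CN=Config
uration,DC=backbone,DC=mydomain,DC=com
"""

def role_holders(listing):
    """Return {role: server_name}, re-joining lines wrapped by the console."""
    records = []
    for line in listing.splitlines():
        if re.match(r"^\w+ - CN=", line):
            records.append(line)       # start of a new role entry
        elif records:
            records[-1] += line        # continuation of a wrapped entry
    holders = {}
    for record in records:
        role, dn = record.split(" - ", 1)
        # The server name is the CN right after "CN=NTDS Settings".
        holders[role] = re.search(r"CN=NTDS Settings,CN=([^,]+)", dn).group(1)
    return holders

for role, server in role_holders(LISTING).items():
    print(f"{role:15} {server}")
```

On the listing above, the Domain entry already points to SERVER02 while the other four roles still point to SERVER1. This suggests the seizure of the domain naming master went through even though the preceding safe transfer attempt failed. That failure is what you would expect with SERVER1 shut down: "The current FSMO holder could not be contacted."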

What went wrong? Did I mess up the system?
Please help.
SW111

ASKER

Sorry for the many posts. As you can probably tell, this whole thing is giving me an anxiety attack...

Anyhow, I proceeded with seizing the remaining roles, because I saw this website: http://www.petri.co.il/seizing_fsmo_roles.htm
which shows the exact same error and seems to ignore it.

Using Netdom, I can see that the 5 roles now belong to SERVER2.

My question now is:

1. What now? Is it fixed?

2. At the bottom of http://www.petri.co.il/seizing_fsmo_roles.htm
It warns that:
 
Note: Do not put the Infrastructure Master (IM) role on the same domain controller as the Global Catalog server. If the Infrastructure Master runs on a GC server it will stop updating object information because it does not contain any references to objects that it does not hold. This is because a GC server holds a partial replica of every object in the forest.

But SERVER02 is in fact a GC. Should I disable GC? (I thought GC was supposed to be on?)

3a. Should I format and reinstall SERVER1? Will it then be the secondary server?
3b. Should the name be changed, or can I still use SERVER1 as the name (although it is a new install)?
3c. Should the IP address be changed? Or can I keep using the old SERVER1 IP Address?

Thank You
SW111

ASKER

?????
URGENT-URGENT-URGENT

After performing the above "Seize FSMO role" and rebooting SERVER02,
When logging back in, I'm given the following error:

"The system could not log you on due to the following error:
The specified domain either does not exist or could not be contacted."

?????? PLEASE HELP URGENTLY...
SW111

ASKER

sigh...
When I tried entering adminname@domain.com as the login credential, I could log in.
It took >1 hour to do this first login, and about 10 minutes to log out.

Subsequent logins seem to work fine. I only need to enter my username (not the full user@domain form).

PROBLEMS:

1. I still cannot join computers to the domain
2. If I try to ping server2.mydomain.com from client computers, I will get the failure message:
"Ping request could not find the host server2.mydomain.com"
Although if I do this from server2 itself, I will get a reply.
Pinging IP address of server2 from clients will get reply.

3. on server2>DNS>Event properties, I now have this error:

 
This DNS server was unable to open Active Directory. This DNS server is configured to obtain and use information from the directory for this zone and is unable to load the zone without it. Check that active directory is functioning properly and reload the zone.

4. So I guess the question becomes: how do I check if AD is functioning properly?
SW111

ASKER

Update:

I tried rebooting the system, and now I am not able to log in to the system at all.
It says my domain does not exist.
SW111

ASKER

anyone?
This is a long thread and I have not read it all in detail, so to save me (and others) the time, can you please confirm that my summary is correct?
- Server1 (FSMO Role Holder) and Server2
- Server1 failed and was offline for two months and was brought back online
- You have since taken server1 back offline
- You have seized the FSMO roles to Server2
- Server1 is still off the network
- Since doing this you are unable to log on to server2, and it appears that Active Directory is not starting?
SW111

ASKER

We solved it!!
It turns out that I was having the exact same problem as this person:
http://www.petri.co.il/forums/showthread.php?t=26637&page=6

And the solution is to run (see post by Kennhon #59):
netdom resetpwd /server:Server2 /userd:mydomain\Administrator /passwordd:*
Only that in my case I needed to replace "Server2" with its IP address (probably because DNS was not working yet).

After that I rebooted, and when I logged in again, DNS had repopulated itself and was back to its normal state.

My other EE post also describes the process more clearly:
https://www.experts-exchange.com/questions/27025826/windows-2003-active-directory-dns-post-fsmo-problem.html
SW111

ASKER

Was supposed to close and award point. Apparently I pressed the wrong button?
Hey, sorry SW111, I couldn't check the site; I didn't get the time at all. :(
Please feel free to ask if you have any questions.
SW111

ASKER

Thanks Certexpert. No worries, problem was solved. Thanks for your help