Solved

DC Failover

Posted on 2005-04-16
Last Modified: 2008-01-09
Got a quick question concerning DC failover. Below is my configuration:

DC1: PDC, RID, Global Catalog
DC2: Schema, Domain Naming, Infrastructure, Global Catalog

(Having the Infrastructure Master on the same machine as a GC should be acceptable since BOTH DCs are GCs.)

The question is this. I've noticed that if I have DC2 turned off, or down for maintenance, the domain login scripts don't seem to run on the user machines. Clients ARE properly authenticating to the domain and successfully logging in by utilizing DC1, but the script doesn't seem to run because none of the drives get mapped, etc.

I'm not experiencing any problems with the machines: no replication errors, no errors in the event log on either of the machines, and the sysvol share with the scripts is present on both DCs.

Am I missing something? I thought that the DCs should fail over in a way that would be completely transparent, especially since both of my DCs are global catalog servers.
Question by:jschweg
21 Comments
 
LVL 97

Expert Comment

by:Lee W, MVP
ID: 13798424
What Drives are getting/not getting mapped?  How do you map them?
 
LVL 4

Author Comment

by:jschweg
ID: 13798789
I map my drives via login script, and the login script doesn't even appear to be running.
 
LVL 4

Author Comment

by:jschweg
ID: 13799307
I may have jumped the gun with this one.

After shutting down DC2, there seemed to be a short lapse of around 5 minutes where the login scripts didn't run; however, after that short period, everything just started working. Weird. Is a short failover period like that expected?
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13802132
I am aware that user logon scripts are not required to complete before the user can use the computer.

I am aware that computer Startup scripts are required to complete before a user can logon to that computer.

Maybe it's taking a few minutes to apply properly. Are the mapped drives on DC2 or DC1, or on another server or something?
 
LVL 4

Author Comment

by:jschweg
ID: 13802216
All of my mapped drives are from my fileserver, a completely different machine from DC1 or DC2.
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13802346
OK, so you state that after 5 minutes the mapped drives are properly mapped. Then I would say this has something to do with the clients waiting for the scripts from DC2, and once that times out, DC1 is contacted and the scripts are applied.

Now since user logon scripts are not required to complete for a user to begin using the computer, and since authentication is successful, you can start using the computer and bam, 5 minutes later the drives are mapped correctly.

I would suggest you try applying the scripts at Computer Startup instead of User Logon, just to test whether what I am saying is what is happening.

I can't find any documentation on Microsoft TechNet on this though. I do have some Microsoft MCSE white papers on this matter.

I know, for example, that you can change the way user logon scripts act so that the user must wait for the scripts to complete before being able to use the computer, instead of assigning the scripts to the computer, which would cause problems if a user you don't want to have access to the networked drives logs on to the computer.

This type of "cached" DC or something sounds strange. I really don't even believe myself, but I personally cannot find any other explanation for it.

Perhaps testing out what I said will tell us what's happening.

As soon as I get some info on this from Microsoft's MCSE white paper I'll post it up ...

Good luck.
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13802360
The PDC is the primary Group Policy operation master and holds the primary copy of GP for the domain. So I don't understand why DC1 wouldn't automatically apply the scripts, as it is the master for the GP of your domain.

Oh well ...
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13802361
You used a UNC path to the scripts correct?
 
LVL 4

Author Comment

by:jschweg
ID: 13803025
I appreciate the help.

It's actually not a 5 minute delay, what I meant to say was after about 5 minutes, if I re-login, the script runs normally. Before that time, I can log in, but the script doesn't run at all.
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13803181
OK, now that sounds a bit more interesting.

What happens when clients authenticate themselves to the domain is that they contact the closest DC to be able to authenticate.

If they cannot find this DC, the PC uses a cached "Access Token/Ticket" and the user is authenticated anyway, because Windows 2000 by default considers any previous logon credentials as valid enough to log on to the domain even if it cannot do a realtime authentication against a DC.

However, scripts, for example, may not be applied because the DC isn't available.

This shouldn't apply, however, if you have more than 1 DC in the same domain in the same site.

Perhaps this is something related to your site configuration or domain configuration.

How many domains, sites, etc. do you have configured? Any specific details you could give me would help me sort out what the DCs are doing ...
 
LVL 4

Author Comment

by:jschweg
ID: 13803484
There really shouldn't be anything abnormal about the setup, at least I don't think so.

I have a single domain with 2 DCs (the AD roles are in my original post). All of the PCs/servers are on the same subnet, so everything should be a single site.

It's very strange. After the first DC goes down, it's almost like the second one isn't available for a short period of time. Here is something I noticed that might explain the scripts not running:

During the period that the scripts aren't running, I noticed that if I type in the UNC to the share on my fileserver, the machine just sits there unresponsive. I then typed in the UNC to the netlogon share on my remaining DC; same thing, it just sat there. After a few minutes both shares popped into view and everything started working. I then logged into another one of my user machines and the login scripts were now running.

What I'm thinking is that the login scripts are not running because the user machines cannot resolve the UNC of the DC to access the netlogon share where the scripts are. After that magical 3-5 or so minutes, everything works.

Really makes no sense. If there was really a problem, I would expect to see errors/replication issues, and I see none.
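One way to narrow down the unresponsive window described above is to time name resolution and the TCP connection separately, from a client, while DC2 is off. The following is only a rough sketch: the host names `DC1` and `FILESERVER` are placeholders, port 445 (SMB) is assumed to be the port the shares answer on, and `socket.getaddrinfo` goes through whatever resolver the operating system provides, so it shows that a lookup was slow but not which name service caused it.

```python
import socket
import time


def probe(host: str, port: int = 445, timeout: float = 3.0) -> dict:
    """Time name resolution and a TCP connect separately for one host."""
    result = {"host": host, "port": port,
              "resolve_s": None, "connect_s": None, "error": None}

    # Step 1: resolve the name through the operating system's resolver.
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"name resolution failed: {exc}"
        return result
    result["resolve_s"] = time.monotonic() - start

    # Step 2: open a TCP connection to the first address returned.
    family, _, _, _, address = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(address)
    except OSError as exc:
        result["error"] = f"connect failed: {exc}"
        return result
    result["connect_s"] = time.monotonic() - start
    return result


if __name__ == "__main__":
    # Placeholder names: replace with the remaining DC and the file server.
    for _ in range(20):
        for name in ("DC1", "FILESERVER"):
            print(time.strftime("%H:%M:%S"), probe(name))
        time.sleep(15)
```

If `resolve_s` is large or resolution fails during the bad window while the connect step is fast once an address is known, the delay is on the name resolution side rather than on the servers themselves.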

 
LVL 4

Author Comment

by:jschweg
ID: 13803502
Both DCs are DNS servers (AD Integrated), which are pointed to each other as primary and themselves as secondary.
DC1 (the one I am shutting down) is also a WINS server; all client machines are configured to use it.

I thought maybe it had to do something with the WINS server being down, but it should just use DNS if it isn't available, so I doubt it's that.
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13811312
That's a big difference mate ...

You stated you were shutting down DC2, not DC1. That changes everything ...

I'll be right back and explain ...
 
LVL 4

Author Comment

by:jschweg
ID: 13811384
I apologize, that is a typo. DC2 is the one getting shut down. DC2 is the WINS server.
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13811679
I read your post a bit more thoroughly and came to the following conclusions ... as I see it, this is not necessarily true: "I thought maybe it had to do something with the WINS server being down, but it should just use DNS if it isn't available, so I doubt it's that."

Really depends mate. If you are using a fully qualified name to the resource, then DNS is used. If you are using a NetBIOS name, WINS or broadcast is used.

Whether one name or the other is used depends on whether you are using a DNS suffix, correct?
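To check which kind of name a logon script actually depends on, the server part of each UNC path can be pulled out and classified. This is a minimal sketch, and the paths at the bottom are made-up examples, not taken from the thread. It only separates fully qualified names from single-label ones; how Windows then resolves a single-label name (DNS suffix, WINS, broadcast) depends on the client configuration and is not modelled here.

```python
import ipaddress


def unc_server(path: str) -> str:
    r"""Return the server part of a UNC path such as \\server\share\file."""
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    server = path[2:].split("\\")[0]
    if not server:
        raise ValueError(f"UNC path has no server name: {path!r}")
    return server


def classify(server: str) -> str:
    """Label a server name as 'ip-address', 'fqdn', or 'single-label'."""
    try:
        ipaddress.ip_address(server)
        return "ip-address"
    except ValueError:
        pass
    # A dot inside the name (ignoring a trailing root dot) marks an FQDN.
    return "fqdn" if "." in server.rstrip(".") else "single-label"


if __name__ == "__main__":
    # Hypothetical paths of the kind a drive-mapping script might contain.
    for path in (r"\\FILESERVER\data", r"\\fileserver.corp.example.com\data"):
        server = unc_server(path)
        print(f"{path}: {server} is {classify(server)}")
```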

I said that there is a difference between shutting down DC1 and DC2 because DC1 is the PDC and holds the primary copies of all the GPOs of the domain. When you modify a GPO, the PDC is contacted and the GPO is modified there. It then replicates to the other DCs (this happens by default; you can change it).

However, it really shouldn't even make a difference whether it's the primary GPO holder or not. You would have problems modifying your GPOs, not problems with them being applied. I hesitated, and what I said is not entirely true.

You're going to have to troubleshoot at a very low level, maybe get some active logging and stuff, and try to see what is taking the computers so long to apply the scripts. What I do know is that user scripts don't have to complete for you to be able to start using the computer.

What I am imagining is that one user script is taking too long to apply and times out after the 10 minute default. After that, the rest of the scripts are applied successfully.

Another script apart from the mapped drives has to be failing.

You said that after 5 minutes the drives are mapped, correct?

Are all other GPs applied instantly and correctly for the user, apart from the scripts? If so, then it's a specific problem with the scripts, but not necessarily related to their location.

To enable logging, check out these links. I also posted additional resources on other sites. I know they are a lot, but reading a few things from here and from there made me realise how many things could influence this. And there really is no point in me explaining when Microsoft explains it so much better than me ...

Sorry, it seems like I am telling you to read and do all the stuff and research it on your own, but I'd rather give you proper links and proper white papers and not "try" to explain what you should do.

You're the one at the site, and you're the best person to answer all the troubleshooting questions Microsoft asks you in order to get the problem fixed...

>> This is what I was talking about in one of my first replies:

Group Policy is not applied due to cached credentials. But this is apparently not happening, since GPs are applied. What isn't applying are the scripts:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/0bead5a1-afba-4c58-984b-11881be5348e.mspx

>> These are some possible resources to find out what's going on.

Fixing Scripts policy settings problems:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/9d563f94-99e5-402b-bab1-abb597f6a974.mspx

Fixing Group Policy problems by using log files:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/0907105e-7856-4c93-b97f-a9a306623af5.mspx

Logon script replication:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/ServerHelp/12cedcb6-a076-461d-bc73-0cc4048eff0d.mspx

Order of events when starting up and logging on:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/ServerHelp/b74be6d3-ea6c-432f-9240-61e73168021d.mspx

Troubleshooting Group Policy Problems:
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/Operations/dfe7b84d-8727-4561-9767-ccb47a5bf9ba.mspx

Good luck mate. Please post any results, whether failed or successful.

UICE
 
LVL 4

Author Comment

by:jschweg
ID: 13825852
Thank you for all of the information, I've gotten a little further with the problem. I don't know why I didn't check this before, but upon logging in during the period where the scripts are not running, I am receiving the following NETLOGON error in the client machine Event Log:


Event Type:      Error
Event Source:      NETLOGON
Event Category:      None
Event ID:      5783
Date:            4/16/2005
Time:            5:41:09 PM
User:            N/A
Computer:      <My Machine>
Description:
The session setup to the Windows NT or Windows 2000 Domain Controller <FQDN of DC1>  for the domain <My Domain> is not responsive.  The current RPC call from Netlogon on <My Machine> to <FQDN of DC1> has been cancelled.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


It looks as though the machine is actually logging in using cached credentials because it cannot contact the DC that is still powered up. Then, magically, after a few minutes the DC becomes available.
 
LVL 4

Author Comment

by:jschweg
ID: 13825911
Also, during this time, if I attempt to access ANY share on ANY server on my network by referencing the UNC in the Start/Run line, it is unresponsive.

A few minutes later... bam, everything works.

I'm thinking that there isn't anything wrong with the DC, but for some reason the client machines cannot resolve any NetBIOS names during this period. Therefore, when the machine attempts to log in and connect to the DC via NetBIOS, it fails to contact the server.

What the heck would cause this to occur? Maybe it does have something to do with the WINS server.
 
LVL 4

Author Comment

by:jschweg
ID: 13825935
Therefore the scripts not running is NOT actually the root of the problem, but rather just the portion of the problem I am seeing with my own two eyes.
 
LVL 8

Accepted Solution

by:
Leandro Iacono earned 2000 total points
ID: 13828959
This is probably a DNS problem ... nothing to do with WINS, although it seems like it. AD relies on DNS, not on WINS. WINS is only used for communication between clients with \\UNC paths ... that's pretty much it. When it comes to AD authentication and stuff, it's all DNS ...

"Netlogon Event ID 5783

Problem:
The source server listed in the error message was unable to complete a remote procedure call (RPC) call to the destination server. Most commonly, this means that either the source server could not locate the server in DNS or the RPC interface on the destination server is not working.

Solution:
If the source server could not locate the server in DNS, troubleshoot Active Directory replication failure due to incorrect DNS configuration.

If this is not a DNS problem, troubleshoot RPC problems."

Troubleshooting Active Directory—Related DNS Problems:
http://www.microsoft.com/technet/prodtechnol/windows2000serv/technologies/activedirectory/maintain/opsguide/part1/adogd10.mspx
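Since clients locate domain controllers through DNS SRV records, one quick check on the DNS side is to confirm those records exist and list both DCs. The sketch below only builds the record names to look up; the domain `corp.example.com` is a placeholder, and the printed `nslookup` lines are meant to be run by hand on a client.

```python
from typing import List, Optional


def dc_locator_srv_names(domain: str, site: Optional[str] = None) -> List[str]:
    """List SRV record names that identify domain controllers for `domain`."""
    domain = domain.strip(".").lower()
    names = []
    if site:
        # Site-specific record, listing DCs that cover the given AD site.
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    names += [
        f"_ldap._tcp.dc._msdcs.{domain}",      # any domain controller
        f"_kerberos._tcp.dc._msdcs.{domain}",  # any KDC running on a DC
        f"_ldap._tcp.pdc._msdcs.{domain}",     # the PDC emulator
    ]
    return names


if __name__ == "__main__":
    # Placeholder domain; Default-First-Site-Name is the default AD site name.
    for name in dc_locator_srv_names("corp.example.com", "Default-First-Site-Name"):
        print(f"nslookup -type=SRV {name}")
```

With two DCs in one site, the first two generic records would be expected to return both servers. If a lookup only answers while DC2 is running, that points at DNS rather than RPC.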

For RPC troubleshooting, download the Windows Server 2003 Resource Kit and use the following utilities:

Rpccfg.exe: RPC Configuration Tool
Rpcdump.exe
Rpcping.exe
RPing: RPC Connectivity Verification Tool

http://www.microsoft.com/downloads/details.aspx?FamilyID=9d467a69-57ff-4ae7-96ee-b18c4790cffd&displaylang=en

I couldn't find any info on how to use the RPC tools, not even in the Microsoft TechNet database. Maybe you could search it out ... sorry mate, but nothing popped up ...

Good luck mate. It all depends on those 2 factors. Now we know it has something to do with one of those 2 things. I personally think it's due to RPC, because you aren't having trouble with anything else, between the DCs for example.

Run the RPC diag tools and see what comes up ...

I'll do a little more research later on. Sorry for leaving you hanging again ...

Cheers.
 
LVL 4

Author Comment

by:jschweg
ID: 13869916
All the dumps from netdiag and the RPC testers passed with flying colors on both DCs. Figures.

The only thing I can think of is the way my DNS is configured. Both DCs have AD Integrated DNS installed, and point to each other as primary and themselves as secondary. An MS tech told me that this is the way it is supposed to be configured, and it makes sense to me. I would think that it shouldn't cause a problem, since each DC should just use itself as a backup, but I don't know.

It MUST be some type of DNS or RPC problem, so you deserve the points for narrowing this down to one of these two services.
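To test the DNS suspicion directly, each DNS server can be queried on its own and timed, which shows how long a lookup takes when the first server in the list is down. This is a rough sketch using a hand-built query over UDP. The server addresses and the host name at the bottom are placeholders, and the reply is only checked for a matching query ID, not parsed.

```python
import socket
import struct
import time


def build_a_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


def time_dns_server(server_ip: str, hostname: str,
                    timeout: float = 2.0, port: int = 53):
    """Return the seconds server_ip took to reply, or None on timeout or error."""
    query = build_a_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server_ip, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        # Accept only a reply whose ID matches the query that was sent.
        if reply[:2] != query[:2]:
            return None
        return time.monotonic() - start


if __name__ == "__main__":
    # Placeholder addresses for DC1 and DC2, and a placeholder host name.
    for server in ("192.168.1.10", "192.168.1.11"):
        elapsed = time_dns_server(server, "dc1.corp.example.com")
        print(server, "no answer" if elapsed is None else f"{elapsed:.3f}s")
```

Running this from a client with DC2 powered off shows whether DC1 still answers promptly by itself. If it does, the delay is more likely in how the client walks its DNS server list than in DC1's DNS service.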
 
LVL 8

Expert Comment

by:Leandro Iacono
ID: 13871964
I thank you for the points, and I am terribly sorry we couldn't actually figure out what was going wrong ...
It's strange that all the tests passed and yet you're getting errors in the event log...

I am terribly sorry, and please be sure to post any updates or any questions on this matter. I really don't like to give up, but I really don't know what to do anymore ...

If you are getting errors in the event log, but all diagnostic utilities pass, there must be a more serious problem. What did the MS tech suggest doing?