• Status: Solved
  • Priority: Medium
  • Security: Public

Member Server and Active Directory Site association

Hi,

I have three Active Directory sites (A, B & C). Sites A & B have been in operation for many years and have no issues. Site C is new.

Site C consists of the following subnets
- 10.0.1.0/24 - web servers
- 10.0.2.0/24 - app servers
- 10.0.3.0/24 - domain controllers

All Subnets, Sites and Site Links etc have been created in Sites and Services.  
Site C has 2 domain controllers up and running fine (in 10.0.3.0/24 subnet)
Replication between all domain controllers in all sites works without error. I've run the following to confirm replication is working:

Repadmin /REPLSUM
Repadmin /showrepl
Repadmin /showreps
dcdiag /q
dcdiag /test:checksecurityerror
dcdiag /test:checksecurityerror /replsource:<Site-B domain controller>

Problem
I've used the nltest /dsgetsite command on member servers in Site C to check which site each server is associated with.  Each Site C member server shows it's a member of either Site A or B, but not C.  Also, the system variable LOGONSERVER= on all Site C member servers shows a domain controller in Site A or B. When I run nltest /dsgetsite on the Site C domain controllers, they both show correctly that they belong to Site C. There have been rare cases where, very briefly, I've seen the nltest /dsgetsite command return the correct site for Site C member servers, but shortly after the value changes to either Site A or B.

The problem scenario above has been reproduced in all of the Site C subnets, even 10.0.3.0/24, where the domain controllers themselves are ok. Test servers have been created in each Site C subnet to check whether correct site association occurs - it doesn't.

I've confirmed that the firewall config between all subnets in Site C is correct - I've basically opened a bidirectional any/any rule to assist with troubleshooting. A similar rule has been created for the bridgehead servers in Sites A & B to the bridgehead server in Site C. Every port in every direction is open.

I've been working on this issue for a while now but haven't been able to crack it. Any and all suggestions are welcome.

Thanks!

Joel
Asked by: AVIVOL

Dan McFadden (Systems Engineer) commented:
It could be an issue with your DNS SRV records.  What is the output from the following?

1. Open a command prompt.
2. Run nslookup.
3. At the nslookup prompt, run: set q=srv
4. At the nslookup prompt, run: _ldap._tcp.<site name>._sites.dc._msdcs.<domain name>
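
The record name in step 4 is easy to mistype, so here is a minimal Python sketch that assembles it from a site name and a domain name. The function name site_srv_name and the sample values "site-c" and "domain-name" are placeholders for illustration, not anything from the poster's environment.

```python
def site_srv_name(site: str, domain: str, service: str = "ldap") -> str:
    """Build the site-specific SRV record name used in step 4 above."""
    site = site.strip().strip(".")
    domain = domain.strip().strip(".")
    if not site or not domain:
        raise ValueError("site and domain must both be non-empty")
    # Pattern taken from the thread: _ldap._tcp.<site>._sites.dc._msdcs.<domain>
    return f"_{service}._tcp.{site}._sites.dc._msdcs.{domain}"


# Placeholder values only; substitute the real site and domain names.
print(site_srv_name("site-c", "domain-name"))
```

The printed name can be pasted at the nslookup prompt once the query type has been set to SRV.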

Also, what subnet are the client computers on?  Is that subnet defined in Sites & Services?  How are the 3 sites connected?

Is there a catch all subnet defined in any other sites?  For example, is there a subnet defined in AD as 10.0.0.0 and assigned to a site?
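
To illustrate why a catch-all subnet matters, here is a rough Python sketch of subnet-to-site matching. It assumes the most specific (longest prefix) matching subnet wins, which is my understanding of how AD resolves overlapping subnet objects. The table is hypothetical: the three /24 entries mirror the Site C subnets from the question, and the 10.0.0.0/16 entry is an invented catch-all assigned to Site A.

```python
import ipaddress

# Hypothetical subnet objects. Only the three /24 entries come from the thread;
# the /16 catch-all is invented to show the effect Dan is asking about.
SUBNETS = {
    "10.0.1.0/24": "Site C",
    "10.0.2.0/24": "Site C",
    "10.0.3.0/24": "Site C",
    "10.0.0.0/16": "Site A",
}


def site_for(ip, subnets):
    """Return the site of the most specific subnet containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for cidr, site in subnets.items():
        network = ipaddress.ip_network(cidr)
        if addr in network:
            matches.append((network.prefixlen, site))
    if not matches:
        return None
    # Longest prefix wins (assumption, see the paragraph above).
    return max(matches)[1]


print(site_for("10.0.2.15", SUBNETS))   # covered by a /24 and by the catch-all
print(site_for("10.0.9.9", SUBNETS))    # covered by the catch-all only
print(site_for("172.16.0.1", SUBNETS))  # covered by nothing
```

Under that assumption, a client in a properly defined /24 still lands in Site C, while a client whose subnet was never defined falls through to whichever site owns the catch-all.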

AVIVOL (Author) commented:
Hi Dan,

Thanks for your reply. Here are the answers.

nslookup result, run from a Site C domain controller. The "DNS request timed out" message is confusing.

> _ldap._tcp.site-c._sites.dc._msdcs.domain-name
Server:  dc02.domain-name
Address:  10.0.3.51

DNS request timed out.
    timeout was 2 seconds.
_ldap._tcp.site-c._sites.dc._msdcs.domain-name     SRV service location:
          priority       = 0
          weight         = 100
          port           = 389
          svr hostname   = dc02.domain-name
_ldap._tcp.site-c._sites.dc._msdcs.domain-name      SRV service location:
          priority       = 0
          weight         = 100
          port           = 389
          svr hostname   = dc03.domain-name
dc02.domain-name      internet address = 10.0.3.51
dc03.domain-name       internet address = 10.0.3.52

Client computer subnet?
Client computers (servers) have been added to all three Site C subnets to test the problem. So that's 10.0.1.0/24, 10.0.2.0/24 and 10.0.3.0/24. All are in Sites & Services and all are associated with the Site C site

How are sites connected
Sites A & B are connected via private 20mb fibre link. Been in place for years - no problems.
Site A -> C and Site B -> C connect via OpenVPN. This is the setup:
Site A has an OpenVPN server running. An OpenVPN client running in Site C connects to the Site A OpenVPN server.
Site B has an OpenVPN server running. An OpenVPN client running in Site C connects to the Site B OpenVPN server.
Site C has an OpenVPN server running. OpenVPN clients running in Sites A & B both connect to the Site C OpenVPN server.
Firewall routes have been configured to route all relevant traffic to appropriate client in each site. This is working as AD replication and all other tcp/udp connectivity is fine.

Catch all subnet?
No. Good question though.

Dan McFadden (Systems Engineer) commented:
What do the DC ipconfigs look like... specifically the DNS Server config?
 
AVIVOL (Author) commented:
Each server has two domain controllers. Each domain controller has itself as the primary DNS server and the domain controller in the same site as the secondary DNS server.

Dan McFadden (Systems Engineer) commented:
I would modify your DC DNS config a bit.  This may not address your issue, but it is a best practice when managing a multi-site AD infrastructure.  There are 2 positions on the DNS cfg item...

#1 as you have them configured.  1st DNS self, 2nd DNS another DNS server.
#2 1st DNS as another AD DNS server, 2nd DNS self.

In both configs, the 3rd DNS could be 127.0.0.1.  But this depends on how DNS is configured.  If you allow DNS on all available IPs, then fine.  If you stick DNS operation to a specific IP, then the 3rd config'ed server will only satisfy a best practice audit.

Best Practice Article:  http://abhijitw.wordpress.com/2012/03/03/best-practices-for-dns-client-settings-on-domain-controller/

I would use config #2.  There are known race conditions where certain networking dependent services start before DNS is locally available, this can (and does) throw errors which are just annoying.

Are you using GPOs to manage settings on the DCs?  Are you forcing the firewall on for the domain policy?  If not, I would test with forcibly turning off the firewall on the site C DCs.

Are you managing your IP address range with DHCP?  Is there some odd (legacy) local or global scoped configuration for subnets?

Are there any error or warning events on the clients when logging in?

Sorry for the "goose chasing" but it's not so easy when you can't just look at it all.

Dan

AVIVOL (Author) commented:
Thanks again for your response, Dan. No issues with the goose chase - I've been on the trail quite a while already.

I've updated the DNS on the Site C DCs as per your #2 scenario. I also added the 127.0.0.1 - we allow 'all available ips'.

Using GPOs: Yes
Forcing Firewall: No. Firewalls have been disabled on all DCs
Managing IPs with DHCP: DHCP is operating in Site C (should mention it’s located in an AWS AZ), but all member servers are configured with static IPs/DNS. It’s a new site so

Event log (error or warnings): After updating DNS to scenario #2, I rebooted each Site C DC, ensuring DC02 was fully available before rebooting DC03.

DC02 results:
System Log: no errors/warnings

Application Log: no errors/warnings

Active Directory Web Services Log: had the following error, but from yesterday (not after the reboot)

Warning Event: Active Directory Web Services could not find a server certificate with the specified certificate name. A certificate is required to use SSL/TLS connections. To use SSL/TLS connections, verify that a valid server authentication certificate from a trusted Certificate Authority (CA) is installed on the machine.
 
 Certificate name: DC02.domain-name

This is curious. I've looked at the certs on DC02. There's only one with the cert name mentioned in the event log. It sits in the Remote Desktop folder. I tried auto-enrolment, but it's disabled. It's probably the right time to confess certs are not my strength. Also, I can't be positive, but I don't think Site C DCs would have had connectivity to Site A DCs (and the CA) when they were made DCs. There was a connection between Site C and Site B, which is how the new Site C DCs authenticated to the domain, but I'm pretty sure Site A wasn't connected to Site C at that stage.

Once again, thanks for your help.

Joel

AVIVOL (Author) commented:
...and to add to the confusion: I have a single server that sits in the 10.0.2.0/24 subnet and always shows the correct association (Site C). I created a script a couple of weeks ago that runs "nltest /dsgetsite" every 5 minutes and writes the results to a log file. Every time the server is started, it shows Site A for a few minutes and then, correctly, changes to Site C.
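
The original script was not posted. A minimal Python sketch of that kind of poller might look like the following. It assumes nltest /dsgetsite prints the site name on its first non-empty line followed by a "The command completed successfully" line; the function names, the log file name dsgetsite.log and the 300 second interval are illustrative choices, not the poster's.

```python
import subprocess
import time
from datetime import datetime


def parse_dsgetsite(output):
    """Return the site name from `nltest /dsgetsite` output, or None.

    Assumes the site name is the first non-empty line and that failures
    mention "failed" or a "Status =" code instead of a site name.
    """
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        lowered = line.lower()
        if "failed" in lowered or "status =" in lowered:
            return None
        if lowered.startswith("the command completed"):
            return None
        return line
    return None


def current_site(run=subprocess.run):
    """Run nltest once and return the parsed site name (None on failure)."""
    result = run(["nltest", "/dsgetsite"], capture_output=True, text=True)
    return parse_dsgetsite(result.stdout)


def log_site_forever(path="dsgetsite.log", interval_s=300):
    """Append a timestamped site name to the log every interval_s seconds."""
    while True:
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(f"{stamp} {current_site()}\n")
        time.sleep(interval_s)
```

Calling log_site_forever() on a member server would produce one line per poll, which makes the "Site A for a few minutes, then Site C" pattern easy to see after a reboot.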

Dan McFadden (Systems Engineer) commented:
OK, so more questions:  Are all your DCs GCs?

On the clients hitting the wrong site, what is the value of the following reg key:

HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\DynamicSiteName

Can you look into the netlogon.log file and post the contents?  Or as much as you're comfortable with, from a problem child server/client and the server on the 10.0.2.x subnet.

Link below describes some troubleshooting ideas & best practice references.

http://blogs.dirteam.com/blogs/paulbergson/archive/2010/04/19/ad-clients-not-authenticating-to-its-local-site.aspx

AVIVOL (Author) commented:
All DCs GCs: Yes

Reg Key Value on clients hitting wrong site: client in 10.0.1.0/24 = Site B // client in 10.0.2.0/24 = was Site A, then after about 5-10 min dynamically changed to Site B (both should be Site C)

Netlogon.log was empty on both clients (10.0.1.0/24 and 10.0.2.0/24). The NetSetup.LOG had some interesting info but I'll need to sanitise before posting. Hopefully soon.

Will review the link you provided.

Thanks again,

Joel

AVIVOL (Author) commented:
Still working through the netsetup.log file.

Thought I'd update you with the DynamicSiteName reg value. I'd read about this key a couple of weeks ago in a forum - one of many, many things to look at. I changed the value to the correct site (Site C) on one server in 10.0.2.0. Sure enough, the nltest /dsgetsite command showed the correct site - no surprises. NLTEST /DSGetDC:ozvol.org.au, however, still showed a connection to a DC in Site B. I rebooted. On reboot, the nltest /dsgetsite command showed Site B (sigh...), as did NLTEST /DSGetDC:ozvol.org.au. I kept checking. Approx 5 min later, /dsgetsite showed Site C and /DSGetDC:ozvol.org.au correctly showed a connection to a DC in Site C.

I rebooted the server approx 5 times and the same thing happened, although the correct site association seemed to happen a lot faster (<2 min). I also reproduced it on a server in one of the other subnets and it worked identically.

So, it looks like the manual change of the DynamicSiteName key has reminded, coerced or forced the server into identifying and staying in the correct site.

But... I don't want to have to do this for every server/client.

Any thoughts?

Dan McFadden (Systems Engineer) commented:
Can you run the following commands and post?

1.  nltest /dclist:<YOUR-SITE.COM>
2.  nltest /dnsgetdc:<YOUR-SITE.COM>

The process of updating the DynamicSiteName key goes thru several DNS queries in order to update (if needed).  Here's an explanation of the process:  http://windowsitpro.com/windows-server/q-how-can-client-computer-determine-which-site-it-belongs

Also, what happens if you run the command:  nltest /sc_reset:<YOUR-SITE.COM>, from a server/client suffering the wrong site issue?

Dan

Dan McFadden (Systems Engineer) commented:
A few more questions:

1. How are you sync'ing time in your domain?
2. Is there a local clock time difference on your domain controllers?  If so, is it more than 5 minutes?
3. Have you configured your PDC to be a trusted time source?
-- Link:  http://technet.microsoft.com/en-us/library/cc786897(v=ws.10).aspx

AVIVOL (Author) commented:
I ran NLTEST /DSGetDC:domain-name from a member server in 10.0.1.x (Site C). It has never shown the correct site association. Here's the output (the 192.168.1.x subnet is associated with Site A):

      Address: \\192.168.1.x
     Dom Guid: 81fa0a1c-ef87-40cd-90c7-64f3c3e4a3ab
     Dom Name: domain-name
  Forest Name: domain-name
 Dc Site Name: Site A
Our Site Name: Site A
        Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_DC DNS_DOMAIN DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

Result from nltest /dnsgetdc:domain-name. Does the "non-site specific" refer to the query results being shown in non-site-specific order or that the DCs aren't considered to be associated with a site? Surely not the latter.

List of DCs in pseudo-random order taking into account SRV priorities and weights:
Non-Site specific:
   dc03.domain-name  10.0.3.xx  (Site C)
   dc02.domain-name  10.0.3.xx  (Site C)
   dcx1.domain-name  10.1.1.xx  (Site B)
   dcx2.domain-name  192.168.1.xx  (Site A)
   dcx3.domain-name  10.1.1.xx  (Site B)
   dcx4.domain-name  192.168.1.xx  (Site A)
   dcx5.domain-name  10.1.1.xx  (Site B)
The command completed successfully

Here’s the result of nltest /sc_reset: (nothing changed. The server remained incorrectly associated with Site A and the same DC in Site A)
Flags: 30 HAS_IP  HAS_TIMESERV
Trusted DC Name \\dcx2.domain-name  (Site A)
Trusted DC Connection Status Status = 0 0x0 NERR_Success
The command completed successfully



Time sync details
Apart from the DC that hosts PDC operation master role (Site A), we rely on Windows hierarchical relationship to manage time (as per here: http://support.microsoft.com/kb/816042). The DC hosting PDC op master is configured to use an external time server from the NTP Pool Project (http://www.pool.ntp.org/)

Local clock time difference on DCs?: No. All are identical.
PDC trusted time source?: Yes

AVIVOL (Author) commented:
I wanted to add something else to this - more of gut feeling than anything. I'm fairly sure that a couple of the subnets in Site C weren't set up in Sites and Services when member servers were added to the domain. As there was no subnet or site associated, I'm guessing AD just assigned one (perhaps randomly??). The subnet and site association was then created, but it's as if somewhere in AD the initial association is holding so when new servers are added, they are associated with the initial "random" site. Changing the DynamicSiteName reg key value seems to override this hold. Thought it might be relevant.

Do you know where, or if, site associations are stored? Perhaps there's a static entry for these subnets. If so, could adsiedit be used to correct?

AVIVOL (Author) commented:
Update: I tried deleting the affected subnets from sites and services, initiating dc replication, confirming all dcs picked up the change, then recreated the subnets and associated with Site C.

Received mixed results (remembering all Site C subnets were not associating with Site C):
10.0.1.x = Site C (great!)
10.0.2.x = Site A, then Site B (not great)
10.0.3.x = Site A (not great)
10.0.4.x = Site C (great!)

It makes me think that somewhere in AD there's some sort of static record.

So, 2 out of 4 subnets are now correctly associated. Summary:
New servers added to 10.0.1.x and 10.0.4.x are correctly associated with Site C.
New servers added to 10.0.2.x and 10.0.3.x are not correctly associated with Site C. When I change the DynamicSiteName reg key on servers in subnets 10.0.2.x and 10.0.3.x, then reboot, the Site C association works and remains. But again, I don't want to manually edit the registry of every new server.

Dan McFadden (Systems Engineer) commented:
The "non-site specific" just refers to the list being a global query for the data.  It's saying this is not the view of the world from a specific AD Site.

I would double check your Inter-Site Transports next.  You probably have 2 IP transports defined like so, assuming SiteA is your main location:

Name:        Type:      Cost:  Repl Interval:
SiteA-SiteB  Site-Link  100    180
SiteA-SiteC  Site-Link  100    180

Is this how they are setup?  Is there a SiteB-SiteC link defined?  If so, how is it defined?

Can you verify the properties of the site links?  I would look at the "Sites in this site link" area.

AVIVOL (Author) commented:
I have three Inter-Site Transports. All IP Site Links.

Name:        Type:      Cost:  Repl Interval:  Sites in this link:
SiteA-SiteB  Site-Link  100    15              SiteA, SiteB
SiteA-SiteC  Site-Link  100    15              SiteA, SiteC
SiteB-SiteC  Site-Link  100    15              SiteB, SiteC

And just to confuse things, I started up the test servers I created for each subnet in Site C to see if the site associations were still the same. You may recall they looked like this:

Received mixed results (remembering all Site C subnets were not associating with Site C):
10.0.1.x = Site C (great!)
10.0.2.x = Site A, then Site B (not great)
10.0.3.x = Site A (not great)
10.0.4.x = Site C (great!)

They're now this:
10.0.1.x = Site C (great!)
10.0.2.x = Site C (great!)
10.0.3.x = Site A (not great)
10.0.4.x = Site B (not great)

What the hell is going on! I just can't understand why the servers 1) don't associate correctly, and 2) why the association jumps between SiteA & B so often.

Once again, thanks for your perseverance.

Joel

Dan McFadden (Systems Engineer) commented:
And the properties of your site links contain only the sites referenced in their names?

What is the connectivity configuration between sites?  Do all 3 sites have direct connects to each other?  What is the bandwidth between sites?  

Is there a physical firewall between these subnets somewhere?  Or a router with ACLs on?

It appears that clients on the subnet that aren't hitting the correct site, can't find a local-site DC.  It seems as though DNS SRV records are correct meaning that after being told which DC to talk to, the client can't contact the local DC, so it asks for another DC which is out of the site.

If AD and DNS appear to be correctly config'ed, it's time to look at the network infrastructure supporting the services.

Dan McFadden (Systems Engineer) commented:
OK, I just realized I asked the connectivity questions again.

So, I suggest a test.  Raise the site link cost of the SiteB to SiteC, to 300.  Essentially meaning that the preferred path for AD traffic will be thru Site A.  Make the change on the PDC Master, then force replication and wait a few minutes.

Then verify replication and verify (again) your site SRV records in DNS, from the PDC master.
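
The arithmetic behind "the preferred path will be thru Site A" can be worked through with a small Python sketch. It assumes site links are bridged (transitive), so a multi-hop path costs the sum of its link costs. The three starting costs of 100 are the ones reported earlier in the thread; the function name cheapest_route and the dictionary layout are made up for illustration. This models replication path preference only, not how a member server looks up its own site.

```python
import heapq


def cheapest_route(links, src, dst):
    """Return (total_cost, path) for the lowest-cost route, or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, site, path = heapq.heappop(queue)
        if site == dst:
            return cost, path
        if site in visited:
            continue
        visited.add(site)
        for neighbour, link_cost in graph.get(site, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None


# Site link costs as reported in the thread before any change.
links = {
    ("SiteA", "SiteB"): 100,
    ("SiteA", "SiteC"): 100,
    ("SiteB", "SiteC"): 100,
}
before = cheapest_route(links, "SiteB", "SiteC")

# The suggested test: raise the SiteB-SiteC link cost to 300.
links[("SiteB", "SiteC")] = 300
after = cheapest_route(links, "SiteB", "SiteC")

print(before)  # direct link, cost 100
print(after)   # via SiteA, cost 100 + 100 = 200, cheaper than 300
```

With all links at 100 the direct B to C link wins. Once it costs 300, the two-hop route through Site A (200) becomes the cheaper one.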

As for the DynamicSiteName key: the article I referenced above mentions another key that can be used to override it.

There are only 3 sites in AD, right?  There isn't another site that is causing a conflict?

Dan McFadden (Systems Engineer) commented:
Joel, is there any update on the site issue?

AVIVOL (Author) commented:
Hi Dan. Weird. I've been checking EE for the last couple of days but there have been no updates to the thread. I received an email earlier saying there's been a post and saw your question re "any update". I looked back and saw you'd made two posts on the 28th - they weren't listed for the last couple of days. Anyway....


Here are the answers to last set of questions:

Q: sites referenced in their name?
A: Not sure exactly what you mean here. If you mean name versus, for example, a rogue GUID, then yes, only their names are being displayed.

Q: connectivity/bandwidth between sites?
A: Site A <-> B = 20mb dark fibre (via vpn). Site A <-> C = 12mb adsl (via vpn). Site B <-> Site C = 20mb pub internet (via vpn)

Q: physical f/w or router?
A: for site A & B = y. Site C (AWS) = n. All f/w rules between all sites = any/any (to assist with t/shooting)

Q: It appears that clients on the subnet that aren't hitting the correct site, can't find a local-site DC.  It seems as though DNS SRV records are correct meaning that after being told which DC to talk to, the client can't contact the local DC, so it asks for another DC which is out of the site
A: The weird thing is that a server in Site C will be rebooted and correctly associate with Site C. On the next reboot, it'll associate with the wrong site (A or B). Even when changing the DynamicSiteName key, this random assignment of site association still occurs.

Q: There are only 3 sites in AD, right?  There isn't another site that is causing a conflict?
A: Yes, only 3 sites. No other site causing a conflict

Re the test: Can you explain why changing the cost to 300 would impact the SRV record in DNS (from the PDC master)?

I'm happy to make the cost change, but would opt for a slightly different option: the connection between Sites B & C is better than the one between Sites A & C. So the change would be to raise the site link cost of SiteA to SiteC to 300, thereby prioritising the connection between Sites B & C.

Once I hear back from you re the cost/SRV association I'll make the change.

Cheers,

Joel

Dan McFadden (Systems Engineer) commented:
My first question relates to AD Sites & Services > Sites > Inter-Site Transports > IP > YourSiteLinkName.  If you have a site link named SiteA-SiteB, then only "SiteA" and "SiteB" should be in the "Properties > Sites in this site link" field.  SiteX should not be in a site link unless that link connects SiteX to another site.

See attached example.  Note the yellow highlighted areas.

I agree with you on which site link cost to raise.  The links with the least bandwidth should be the higher cost unless there is a well defined technology or business reason.

As for the site link cost and DNS records: there is no connection.  It is just part of the process of troubleshooting and controlling the replication traffic preferences of AD.

Oh ha... AWS for Site C.  Didn't know that.  Thought we were just dealing with a remote site.  I would check out this article that specifically speaks about deploying AD thru AWS:  http://runyourweb.blogspot.de/2011/05/running-microsoft-active-directory-on.html

Also, a link to a PDF that addresses extending an AD infrastructure into AWS:  https://d0.awsstatic.com/whitepapers/Implementing_Active_Directory_Domain_Services_in_the_AWS_Cloud.pdf

Maybe there is something in these 2 links.

Dan
AD-Sites-n-Services.png

AVIVOL (Author) commented:
Thanks.

Re Site membership of site links: yes, only sites involved in the SL are members. You can see the setup in my post at 2014-08-28 at 03:11:17 for more details

Changing Site Link cost: I've raised the site link cost of SiteA to SiteC to 300, thereby prioritising the connection between Sites B & C. The thing is though, this may (?) resolve the Site C servers looking to Site A for DCs but won't impact Site C servers going to Site B - which is happening. Will see how it goes.

Re AWS: I mentioned AWS in my 2014-08-25 at 15:55:19 post. Maybe missed in all the detail of other posts.

Re the ref docs: I reviewed the official AWS pdf prior to launching the new site and again last week. Nothing is jumping out at me. I’ll take a look at the second ref you sent through and let you know.

So to recap, the only new change made is the Site Link cost. Will let you know how it goes.

Update: made the site link cost change. Left it until replication occurred across sites. Rebooted a single Site C server in each of the following Site C subnets: 10.0.2.x, 10.0.3.x, 10.0.4.x. All servers came up showing association with Site A. So, the cost for the SiteA<->SiteC link was increased to 300 to reduce the chances of Site C servers associating with Site A DCs, yet every Site C server did just that: they all associated with DCs in Site A. Go figure.

I'll reboot all 3 Site C servers a few times and check associations each time, just to see if there's a pattern.

Cheers,

Joel

PS: that last URL you recommended: it was actually on my "read" list from when I set up Site C. Everything checks out.

AVIVOL (Author) commented:
So I ran a few restart/check site association tests. Each test involved rebooting a server in each of the Site C subnets: 10.0.2.x, 10.0.3.x, 10.0.4.x. I waited 15-20 min between each test - results below. Curious that there were only two Site B DC associations.

Test 1:
10.0.2.x server = Site A
10.0.3.x server = Site A
10.0.4.x server = Site A

Test 2:
10.0.2.x server = Site C
10.0.3.x server = Site C
10.0.4.x server = Site A

Test 3:
10.0.2.x server = Site A
10.0.3.x server = Site A
10.0.4.x server = Site A

Test 4:
10.0.2.x server = Site C
10.0.3.x server = Site B
10.0.4.x server = Site A

Test 5:
10.0.2.x server = Site B
10.0.3.x server = Site A
10.0.4.x server = Site A

So...once again, it seems very random or, at least, not favouring Site C.

Sites    # assoc   %age assoc
Site A   10        66.67
Site B   2         13.33
Site C   3         20.00

Total    15        100
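
For anyone who wants to re-check the summary, this short Python sketch recomputes the counts and percentages from the five tests listed above. The data is copied from the post; the variable names are arbitrary.

```python
from collections import Counter

# Site reported by the 10.0.2.x, 10.0.3.x and 10.0.4.x servers in each test.
results = {
    "Test 1": ["Site A", "Site A", "Site A"],
    "Test 2": ["Site C", "Site C", "Site A"],
    "Test 3": ["Site A", "Site A", "Site A"],
    "Test 4": ["Site C", "Site B", "Site A"],
    "Test 5": ["Site B", "Site A", "Site A"],
}

counts = Counter(site for sites in results.values() for site in sites)
total = sum(counts.values())

for site in sorted(counts):
    share = 100 * counts[site] / total
    print(f"{site}  {counts[site]:>2}  {share:.2f}%")
print(f"Total   {total}")
```

Only 3 of the 15 reboots ended up in Site C, which fits the "random, or at least not favouring Site C" reading.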

Dan McFadden (Systems Engineer) commented:
After reading thru more AWS docs, I am beginning to get a feeling that the issue may lie in what clients are getting back from DNS queries.

This article speaks of internal AWS name and real or expected names of servers.   http://www.yobyot.com/cloud/how-to-use-aws-dns-in-a-vpc/2014/05/01/

Also, this is an interesting announcement from Amazon about DNS issues with 2008 servers.  https://forums.aws.amazon.com/ann.jspa?annID=1039

Can you show the DNS Interfaces tab properties from the DCs in AWS?

Also, there could be something that needs to be reviewed at the AWS networking level.

AVIVOL (Author) commented:
Ok. We might be onto something....maybe.

The first link you provided got me thinking. AWS adds a stack of domains into the suffix section of DNS properties. I had previously tried reordering the list so my domain came first. It had made no difference. I thought I'd try a few things out to see what was being returned. I ran the following command in each test: nslookup -type=ns -debug sitec-dc-name

Test1:
Scenario:
Ran the command on two servers - 1 with correct site association and 1 with incorrect. Both servers had my domain suffix at the BOTTOM of the suffix list (default position after adding server to the domain).
Results of the nslookup query were identical for both servers - see nslookup-where-domainname-is-listed-last.txt (attached)

Test2:
Ran the command on two servers - 1 with correct site association and 1 with incorrect. Both servers had my domain suffix at the TOP of the suffix list (default position after adding server to the domain).
Results of the nslookup query were identical for both servers - see nslookup-where-domainname-is-listed-first.txt (attached)

I'd really thought I was onto something. The results from Test2 gave me hope. I created a new server. I tried adding my domain suffix to the suffix list before adding the server to the domain. I rebooted. Things looked promising. The new server associated correctly with Site C. Rebooted. Again, correct association. I repeated the "new server" test just described for 2 new servers in the other two Site C subnets. No joy. All three servers (yes, including the one that worked for 2 reboots) were being randomly associated with any site.

Like you, and despite the results from Test1, I'm still suspicious that the original suffix order and the resulting query results, may be in play....somehow.

I've also attached a copy of the DNS interface tab, as per your request. Note the "org.au" entry at the bottom. That was created, for some reason, when a server is added to my domain, which is domain-name.org.au.

Re AWS networking level. As described previously, the f/w (SG) rules have been opened up to allow any<->any from/to all subnets. It can't be much more open than that.

Once again, thanks for your help. I really appreciate your persistence.

Cheers,

Joel
nslookup-where-domainname-is-listed-firs
nslookup-where-domainname-is-listed-last
DNS-interface-tab.jpg

AVIVOL (Author) commented:
see last post for answers to your questions.

Based on the assumption that (perhaps) the member servers weren't able to contact the Site C DCs due to some wacky AWS DNS issue, I tried a couple of other things, just to see what would happen.

1) I removed all suffixes from DNS except my domain and flushed DNS. After reboot, AWS reinserted the ap-southeast-2.ec2-utilities.amazonaws.com and org.au entries. Servers still exhibited random site association.

2) I added the name/IP of the two Site C DCs to the hosts file of each of the test servers (one in each Site C subnet). The theory was that if AWS DNS servers were in the mix, they wouldn't be queried if there was a record in the hosts file. I created two entries for each DC: 1) the NetBIOS name and 2) the FQDN. Servers still exhibited random site association.

Sigh....

AVIVOL (Author) commented:
..note there are two previous posts before this one.

I've attached the contents of the netlogon.log file of a member server in Site C. The contents of the file are from when the server was added to the domain and restarted. The "*****" entries are lines that got my attention.

Specifically these:

09/03 15:59:07 [SITE] DsrGetSiteName: Returning site name 'SITEA' from local cache.
I'm wondering if no matter what we change, the local cache is being used. Using cached records is mentioned throughout the file.

And this: 09/03 15:59:11 [CRITICAL] NetpDcGetNameIp: SITEA-DC(ALSO PDC).my-domain.org.au: No data returned from DnsQuery.

And this: 09/03 16:00:04 [CRITICAL] NetpDcGetName: SITEA-DC(ALSO PDC).my-domain.org.au: IP and Netbios are both done.
netlogon-log-file.txt
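
Picking lines like these out of a long netlogon.log by eye gets tedious. A rough Python sketch for filtering them is below. The line pattern is inferred only from the three entries quoted above (date, time, a bracketed tag, then the message), so real logs may contain lines it does not match; the names LINE_RE and flagged_lines are illustrative.

```python
import re

# Pattern inferred from the quoted lines: "MM/DD HH:MM:SS [TAG] message".
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<tag>[A-Z_]+)\] (?P<msg>.*)$"
)

# The three sanitised entries quoted in the post above.
SAMPLE = """\
09/03 15:59:07 [SITE] DsrGetSiteName: Returning site name 'SITEA' from local cache.
09/03 15:59:11 [CRITICAL] NetpDcGetNameIp: SITEA-DC(ALSO PDC).my-domain.org.au: No data returned from DnsQuery.
09/03 16:00:04 [CRITICAL] NetpDcGetName: SITEA-DC(ALSO PDC).my-domain.org.au: IP and Netbios are both done.
"""


def flagged_lines(text, tags=("CRITICAL", "SITE")):
    """Return (timestamp, tag, message) for every line whose tag is in tags."""
    hits = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("tag") in tags:
            hits.append((match.group("stamp"), match.group("tag"), match.group("msg")))
    return hits


for stamp, tag, msg in flagged_lines(SAMPLE):
    print(stamp, tag, msg)
```

Filtering on the SITE tag alone would show how often the site name is being served "from local cache", which is the question raised in the post.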

Dan McFadden (Systems Engineer) commented:
Well, found a support issue on Microsoft, indicating this is a known issue and there is a hotfix available for systems that exhibit this symptom.  Reference:  http://support.microsoft.com/kb/2666938

Hotfixes from MS can be tricky; I've deployed 5 hotfixes in the last 10 years and have only had success with 2.  So, though it's an option, I would first recommend testing whether this works:

Reference: http://technet.microsoft.com/en-us/library/cc978016.aspx

<quote>
Never change dynamically determined values. To override the dynamic site name, add the SiteName entry with the REG_SZ data type in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters. When a value is present for the SiteName entry, the DynamicSiteName entry is not used.
</quote>

I suggest using the SiteName reg entry stated above and testing if this resolves the issue.  If not, then I would try for the hotfix, but I would see if you can get support from Amazon first, maybe they have experienced the issue and have a fix.
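
As one way to test the SiteName override on a single machine, the Python sketch below generates the text of a .reg file for it. The key path and the REG_SZ SiteName entry come from the TechNet quote above; the function name sitename_reg and the value "SiteC" are placeholders, and importing a .reg file is just one possible way to set the entry.

```python
# Key path as given in the TechNet quote above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"


def sitename_reg(site):
    """Return .reg file text that sets the SiteName entry (REG_SZ) to site."""
    if not site or '"' in site or "\\" in site:
        raise ValueError("site name must be non-empty without quotes or backslashes")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"SiteName"="{site}"\n'
    )


# "SiteC" is a placeholder; use the exact site name shown in Sites and Services.
print(sitename_reg("SiteC"))
```

The output can be saved to a file and imported on a test server before deciding whether to roll the setting out more widely.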

If the reg hack works, you could deploy it with GPO so you won't have to constantly worry about manually putting it in place.

Dan

PS:  I feel your pain in this issue... but solving issues like this is what it's all about.  Fighting the monster, succeeding and moving onto the next thing hiding around the corner in the dark.

AVIVOL (Author) commented:
Hi Dan,

Thanks for your last tip. I've been away from the office so haven't had a chance to try it out. Will be in touch later this week with an update.

Cheers,

Joel

Dan McFadden (Systems Engineer) commented:
ok

AVIVOL (Author) commented:
Hi Dan,

Well, after being worn down by this issue I've opted for the "easy" option. I've implemented a GPO that applies the sitename reg key for all servers in SiteC. It's not ideal as it'll require the IT dept to ensure Site C servers/clients are within the correct OU from the outset (something that doesn't always happen).

Once again, thanks for your perseverance.  It's been epic. If you're ever in Melbourne, Australia, I'll buy you a beer.

Cheers,

Joel

AVIVOL (Author) commented:
Final solution was more a work around, than a fix. Dan did a great job. Knows his stuff. Thanks