TomPro

Slow Login with Domain Account from Separate Forest on Windows 2008 R2 Server

Architecture is this:
We have a domain (PCS.DOMAIN.COM), which has a one-way trust with a different domain (NORTHAMERICA.DOMAIN.COM) in a different forest (yes, different forests, even though the root domain names are identical).

I have two AD servers in PCS.DOMAIN.COM, both W2008 R2 systems.
I have two Terminal Servers in PCS.DOMAIN.COM, both W2008 R2 systems.
TS1 was built from the ground up, and TS2 is a clone of TS1 (that has been sysprepped).

TS1:
- I can log into server with a PCS.DOMAIN.COM domain (local) account without incident.
- I can log in with a NORTHAMERICA.DOMAIN.COM domain (through the one-way trust) account without incident.

TS2:
- I can log into server with a PCS.DOMAIN.COM domain (local) account without incident.
- I can log in with a NORTHAMERICA.DOMAIN.COM domain (through the one-way trust) account, but it takes upwards of 10 minutes to log in (it hangs at applying settings for about 8-10 minutes, then quickly logs in after that).

Attached are listings of the GPSVC Log files
Note that the TS2 - Bad file shows that there are 8 minutes of time that elapse between the first line of the log and the 2nd.
Everything else happens at the same speed as the other two good files.
But there is nothing that indicates why it is taking 8 minutes for only that one machine (TS2)

Any thoughts?
TS1---Good-NORTHAMERICA-Account-.txt
TS2---Bad-NORTHAMERICA-Account-L.txt
TS2---Good-PCS-Account-Login.txt
Korbus

No solution here, just a few thoughts:

I assume TS1 and TS2 are configured with exactly the same network info except host name & IP address?

No good reason except past experience makes me want to blame DNS.
  Does the trusted network have DNS entries for the local network's servers?
  There is no machine with same name as TS2 on the trusted network, right?
  Can you confirm the TS server can resolve the trusted authentication server's domain name quickly (and vice versa)?
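
For that last check, a rough way to put a number on "quickly" is to time a forward lookup from each terminal server. This is only a sketch in Python, not something from the environment described here: `time_lookup` is a made-up helper name, and the host names are whatever you pass on the command line. It goes through the OS resolver via `socket.getaddrinfo`, so it reflects what the machine itself sees, including any local caching.

```python
import socket
import sys
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, addresses) for one forward lookup of `name`.

    `addresses` is an empty list when the lookup fails.
    `resolver` is injectable so the function can be exercised offline.
    """
    start = time.perf_counter()
    try:
        results = resolver(name, None)
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address as a string.
        addresses = sorted({entry[4][0] for entry in results})
    except OSError:  # socket.gaierror is a subclass of OSError
        addresses = []
    return time.perf_counter() - start, addresses


if __name__ == "__main__":
    # Host names come from the command line; nothing is looked up by default.
    for host in sys.argv[1:]:
        elapsed, addresses = time_lookup(host)
        print(f"{host}: {elapsed * 1000:.0f} ms -> {addresses or 'FAILED'}")
```

Run it on both TS1 and TS2 with the same names, for example `python dnstime.py NORTHAMERICA.DOMAIN.COM` (`dnstime.py` is a hypothetical file name), and compare the timings between the two machines.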

I also notice a couple of entries in the slower logon that did not occur in the normal speed logon, not sure why, but perhaps this is a clue:

GetWbemServices: CoCreateInstance succeeded
ConnectToNameSpace: ConnectServer returned 0x0
CSessionLogger::Log: restoring old security grps
CSessionLogger::Log: new Site is NULL
LogRsopData: Successfully logged Rsop data
ProcessGPOs: Logged Rsop Data successfully.

Hmm, does this indicate the two TS machines are not using the same GPO?  RSOP stands for "resultant set of policies", which is the set of group policy elements to be applied for a particular user/computer combination, based on your group policy rules.
>>No good reason except past experience makes me want to blame DNS.
This was my very first thought.

>>Does the trusted network have DNS entries for the local network's servers?
This was also one of my thoughts.  I would elaborate further and say that each side of the trust, regardless of whether it's one-way or not, should have forwarders set up pointing to the other domain's DNS servers.

>>There is no machine with same name as TS2 on the trusted network, right?
Again, another one of my thoughts.  Taking it further again, I'd make sure of this, and also that no two IPs are the same; this is obvious, as you would have gotten a conflict message and would probably know it, but in the land of trusts and VPNs I've seen more problems caused by the same subnets on two different clients.

Are these TS machines in the same OU in AD?
TomPro

ASKER

Korbus -

Thank you for your responses.

Answers with respect to your questions:
1) Yes, trusted network has DNS for local network's servers
Yes, I have confirmed that trusted network can resolve correct IP for TS2 (and TS1)
Yes, I have confirmed that computers in trusted network can properly ping TS2 (and TS1)

2) No, there are no machines with the same name as TS2 in the trusted network

3) Yes, TS1 and TS2 can both resolve trusted network's domain name quickly, and can ping the name

4) I think that TS1 and TS2 should use the same RSOP.
Both machines are in the same OU, which inherits the same set of GPOs.
Default Domain Policy
Terminal Server Policy
And if your reading of those alarms is accurate, then TS1 has no problem parsing the GPOs and/or applying them to that same user from the Trusted domain.
TomPro

ASKER

Brad -

Answers to your questions:

>>I would elaborate further and say that each side of the trust, regardless of whether it's one-way or not, should have forwarders set up pointing to the other domain's DNS servers.
Yes - both domains have forwarders working in both directions, and TS1 (as well as other machines in the same local domain) has no problem processing the login from the trusted domain's user.

>>Taking it further again, I'd make sure of this, and also that no two IPs are the same; this is obvious, as you would have gotten a conflict message and would probably know it, but in the land of trusts and VPNs I've seen more problems caused by the same subnets on two different clients.
We have strict policies about system naming, and TS1 and TS2 are the only machines in any network (the local domain and all other corporate domains, including the trusted domain) with those names (obviously TS1 and TS2 are not the actual system names).
I've also done domain, DNS, and WINS searches in the trusted domain and find no other machines with the same name.
Private IPs are allocated from corporate IT, and as such no one other than our systems can use that subnet.
I've double checked with corporate IT that no other locations use that same subnet.
In "Trusted Domain," I have connected to the PDC itself, and the secondary DC (the one that our local domain actually authenticates with) and both servers resolve the name of TS2 properly, and route properly to the TS2 machine itself.

And as stated before, yes, both TS1 and TS2 are in the same OU/share the same GPOs
Some more questions (bear with me):

1.  What sort of distance is in between these domains/sites?
2.  What sort of Internet connection does each site have?
3.  Are there firewalls at each site?
4.  Are there any virus/security scanners on either end?
SOLUTION
Brad Bouchard

Ok, good answers.   Hmm, I'm just fishing now:

Is TS2 a newly setup machine, or has it been in production?  In particular, I'm wondering if TS2 logons were working without delay in the past?

Can we do a sanity check on the physical connections: maybe switch the network cables between TS1 <-> TS2, then test to see if the problem follows the cable.  Or even just a ping /t from TS2 to the trusted network's DC (for about 10 minutes), to confirm no loss of connection.

The logs you initially posted are from the DC on the trusted domain, is this right?
  If so, can we take a look at the TS2 logs too?
  If NOT, what are they from, and can we look at the logs from the trusted DC?

Not sure how you did the clone; are TS1 and TS2 virtual machines?  If so, make sure the virtual NICs have different MAC addresses.

Possible down & dirty resolution: unjoin & rejoin TS2 from/to the domain.
TomPro

ASKER

Sorry about the delay, had to move to a different project for a few days.  Back to this now....

>>Some more questions (bear with me):
>>
>>1.  What sort of distance is in between these domains/sites?
>>2.  What sort of Internet connection does each site have?
>>3.  Are there firewalls at each site?
>>4.  Are there any virus/security scanners on either end?


1. It's only the one machine (TS2) that has a problem.  TS1 and TS2 are both in the same building, and the remote domain is at the same physical location as the local domain.

2. It's a LAN connection between domains

3. Yes there are firewalls, but all firewall rules are identical for TS1 and TS2, so no justification why one works and the other doesn't

4. Yes, SEP corporate, but again, all applications are identical between all machines in local domain.
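
To verify the firewall point rather than rely on the rule listing, a quick TCP probe run from both TS1 and TS2 would show whether the same ports actually answer from each machine. A minimal sketch in Python, with assumptions stated up front: the port list is just a commonly cited set for AD logon traffic and is not taken from the actual firewall rules, `probe` is a hypothetical helper name, and only TCP is tested, so UDP paths for DNS or Kerberos are not covered.

```python
import socket
import sys
import time

# Ports commonly involved in AD logon traffic. This list is an assumption;
# adjust it to whatever the firewall rules are supposed to allow.
DEFAULT_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
}


def probe(host, port, timeout=3.0):
    """Try one TCP connection; return (is_open, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            is_open = True
    except OSError:  # covers refused connections and timeouts
        is_open = False
    return is_open, time.perf_counter() - start


if __name__ == "__main__":
    # Targets (e.g. the trusted DC or a file server) come from the command line.
    for target in sys.argv[1:]:
        for port, label in DEFAULT_PORTS.items():
            is_open, elapsed = probe(target, port)
            state = "open" if is_open else "blocked/closed"
            print(f"{target}:{port} ({label}): {state} in {elapsed:.2f}s")
```

A port that answers immediately from TS1 but only fails after the full timeout from TS2 would point at a silently dropped connection rather than an identical rule set.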
TomPro

ASKER

Brad -

>>This is for 2003, but have a look; specifically the reply that talks about Loopback.  http://social.technet.microsoft.com/Forums/en-US/0914bf38-2980-414c-aa28-419f367c9fcd/cross-forest-domain-user-taking-delay-login-time-into-my-domain?forum=winserverDS
>>
>>Also, I know you already covered this, but make sure of these:
>>http://social.technet.microsoft.com/Forums/windowsserver/en-US/f21e14a3-4544-4c76-a00b-0b3080c3235b/slow-ad-authentication-across-domains
>>
>>http://blogs.technet.com/b/grouppolicy/archive/2013/05/23/group-policy-and-logon-impact.aspx


I'm using the same user, so it can't be that one user has directory redirect and the other doesn't.

DNS tests were completed, and DNS is good on both sides.  Again, if the DNS was the issue, then I'd think that it would fail on all systems, not just the one.
TomPro

ASKER

Korbus:

>>Ok, good answers.   Hmm, I'm just fishing now:
>>
>>Is TS2 a newly setup machine, or has it been in production?  In particular, I'm wondering if TS2 logons were working without delay in the past?
>>
>>Can we do a sanity check on the physical connections: maybe switch the network cables between TS1 <-> TS2, then test to see if the problem follows the cable.  Or even just a ping /t from TS2 to the trusted network's DC (for about 10 minutes), to confirm no loss of connection.
>>
>>The logs you initially posted are from the DC on the trusted domain, is this right?
>>  If so, can we take a look at the TS2 logs too?
>>  If NOT, what are they from, and can we look at the logs from the trusted DC?
>>
>>Not sure how you did the clone; are TS1 and TS2 virtual machines?  If so, make sure the virtual NICs have different MAC addresses.
>>
>>Possible down & dirty resolution: unjoin & rejoin TS2 from/to the domain.

System Age:  TS2 is a clone of TS1, so technically TS1 is older than TS2, but TS2 has never functioned properly.

Sanity Check:
Both systems are VM guests, so they don't have physical network connections, but there are other systems on the same VM host as TS2 that don't have the same problem as TS2.  TS2 is the only system with the problem among maybe a dozen guests in the local domain; the rest all function properly.
Continuous pings from TS1, TS2 and SQL2 (both guests on the same VM host, SQL2 functions properly) all have identical connectivity response times/success

GPSVC Logs:
Logs were from TS1 and TS2 themselves, not the domain controller.  Names of the files indicate which machine they are from, and whether they are a successful (short) login or an unsuccessful (long duration) login.
Please note the first two lines of the file that is marked as bad:
GPSVC(34c.e68) 16:42:51:418 Setting GPsession state = 1
GPSVC(34c.428) 16:50:07:309 SID = S-1-5
GPSVC(34c.428) 16:50:07:309 bMachine = 0 
GPSVC(34c.428) 16:50:07:309 Setting GPsession state = 1
GPSVC(34c.428) 16:50:07:309 Message Status = <Applying user settings...>
GPSVC(34c.ffc) 16:50:07:309 StartTime For network wait: 29843ms

The time between the first log entry and the second is over 7 minutes (16:42:51 to 16:50:07).
Can one of you explain what is happening during these two steps?
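
The gap can also be pulled out mechanically rather than by eye. Below is a small Python sketch that parses GPSVC lines in the format shown above and reports the largest gaps between consecutive entries. The regular expression is inferred only from the lines quoted here, so treat it as an assumption about the log format; `parse` and `largest_gaps` are hypothetical helper names. Note that in the excerpt the first entry comes from thread e68 and the next from thread 428, so the gap is between entries written by different threads.

```python
import re

# Matches lines such as:
#   GPSVC(34c.e68) 16:42:51:418 Setting GPsession state = 1
# i.e. GPSVC(<process id>.<thread id>) HH:MM:SS:mmm <message>
LINE_RE = re.compile(
    r"^GPSVC\((?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)\)\s+"
    r"(?P<h>\d{1,2}):(?P<m>\d{2}):(?P<s>\d{2}):(?P<ms>\d{1,3})\s+(?P<msg>.*)$"
)


def parse(lines):
    """Yield (milliseconds_since_midnight, thread_id, message) per log line.

    Lines that do not match the expected format are skipped.
    """
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            h, m, s, ms = (int(match[k]) for k in ("h", "m", "s", "ms"))
            yield ((h * 60 + m) * 60 + s) * 1000 + ms, match["tid"], match["msg"]


def largest_gaps(lines, top=3):
    """Return the `top` biggest gaps between consecutive entries.

    Each result is (gap_ms, message_before, message_after). Assumes the
    whole log falls within one day (no wrap past midnight).
    """
    entries = list(parse(lines))
    gaps = [
        (later[0] - earlier[0], earlier[2], later[2])
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

On the lines quoted above this reports a gap of 435891 ms (about 7 minutes 16 seconds) between `Setting GPsession state = 1` and `SID = S-1-5`. To run it over a whole log, pass the opened file's lines to `largest_gaps`, using whatever encoding the file was saved in.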

Unfortunately, I can look at the DNS and the configuration of the trusted domain controller, but I don't have the rights to look at the GPSVC logs for it, so I can't get you that full listing, but I did talk to the admin, and he said that their logs show no delays in processing the request.

Cloning:
TS1 and TS2 are both VMware guests, and one was a clone of the other.
Sysprep was run on both machines.
I need to check on MAC Addresses, to confirm, but I think we'd be having a lot of other communication issues if/when we tried to communicate with the machine if we had a duplicate MAC, but I will check now anyway.

Unjoin the Domain:
We already pulled TS2 out of the domain previously, and it did not fix the issue.
TomPro

ASKER

MAC Addresses:   Confirmed, MACs are different.


I'm starting to ask myself whether we should just be nuking the TS2 system, and just cloning it again from TS1 to see if we can fix the problem.
Thoughts?
>>TS2 system, and just cloning it again from TS1 to see if we can fix the problem.
At this point that is probably wise.
Unless you have something else to try, might as well try the re-clone.  
I'm out of ideas too.  

Not sure WHAT is happening behind the scenes between those two log entries.   The first line mentions GP session state; googling that led me to some "slow logon" posts that recommend http://support.microsoft.com/kb/2561285
TomPro

ASKER

Because of the holiday weekend, we couldn't make any changes Thursday or Friday.
We'll look into trying to install the hotfix on probably Monday or so.
If it doesn't work, then we'll probably resort to old-reliable (nuke and re-clone).
Heya, just had a thought on a test:
If you generate an RSOP (resultant set of policies) in the group policy applet, specifying the TS2 computer and a random user:  does THAT take a long time to process?  When processing is complete, do you notice anything odd in the applied settings?

If so, we can narrow the issue search down to group policy processing.
TomPro

ASKER

I've run the RSOP test, and it didn't delay, so I'm feeling like it's still somewhere in the profile for the specific users... like a home drive redirect that is getting blocked by a firewall.
We have GPO settings that deny that redirect and they are being applied properly on the TS1, so it looks like for some unknown reason, TS2 isn't processing the GPO the same way??

Either way, I think we agree with both of your suggestions above: it's time to look into a re-clone of the TS1 to replace TS2.

I'm going to keep this question open until the clone is complete so that I can document the success or failure of that as an option.  I'll let you know how it goes.
>>so it looks like for some unknown reason, TS2 isn't processing the GPO the same way??

That's my favorite thing about using RSOP.  Generate an RSOP for a problem user twice, once for TS1, once for TS2.  Then compare the final applied settings (look for a "settings" tab) generated by each RSOP to confirm they match exactly.  (Of course, if it's not processing some issued command in the same way, that's a different story.)
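
If the two reports are saved to files (for example as text or HTML exports from the RSoP console or from `gpresult`), the comparison can be done with a diff instead of by eye. A minimal Python sketch using the standard `difflib` module; `diff_reports` is a hypothetical helper, the labels are only for display, and the `ignore` substrings are an assumption about which lines will always differ between two machines (computer name, timestamps and the like).

```python
import difflib


def diff_reports(text_a, text_b, label_a="TS1", label_b="TS2", ignore=()):
    """Return unified-diff lines between two report texts.

    Blank lines are dropped, as are lines containing any substring in
    `ignore`, which helps with fields that always differ between machines.
    An empty result means the remaining lines are identical.
    """
    def keep(text):
        return [
            line.rstrip()
            for line in text.splitlines()
            if line.strip() and not any(token in line for token in ignore)
        ]

    return list(
        difflib.unified_diff(
            keep(text_a), keep(text_b),
            fromfile=label_a, tofile=label_b, lineterm="",
        )
    )
```

Read each saved report into a string, call `diff_reports(report_ts1, report_ts2, ignore=("Computer name",))`, and print the returned lines; the `ignore` value shown is illustrative and depends on how the report labels its fields.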
TomPro

ASKER

I did compare the RSOP's and they were identical.

That's the point I was trying to make:  The RSOP happened properly, but it looks like that RSOP isn't actually being applied properly within windows.  There are settings (disallowing home directory redirection, for instance) that are not actually set properly once that user waits the 8 minutes for logon and then I check the local security settings.
That's why I said that it looks like the Windows OS on TS2 isn't processing the GPO, even though it's getting an accurate listing of policies.  It's looking like some security rights or some part of the GPO processing (turning the RSOP into actual registry settings changes) is scrogged.

We've scheduled an outage for Thursday this week to take the TS1 out of production and clone it so we can see if that will fix the problem.
Oh, I understand now.  Thanks for the clarification, I have not seen that before.   So, why can't/won't it modify the registry on the workstation?

Makes me want to check some security settings on TS2. Let's confirm that the domain administrators group is indeed listed as a member of the machine's LOCAL administrators group, and that the domain users group is listed in the local users group.  I think there are a couple of other group memberships that need to be in place, but I can't look those up right now.

Another way to test this (for domain administrator at least) is by opening regedit on the TS1 server, then trying to connect it to TS2 (File -> Connect Network Registry).  Once connected, make and then undo a random change to confirm edit permissions.
TomPro

ASKER

Update:
We did a full system clone of TS1 and then sysprepped and renamed the clone to TS2.
You'd think this would fix all problems.

But new TS2 still has all the same problems as old one.
So now I'm stumped.
I thought it was internal, so replacing with a clone of a good functioning system should have fixed the problem, but not so.

Maybe there are settings in the registry on TS1 that are allowing it to work properly, and sysprepping the system zeroes out those settings?

On the infrastructure side: everything is identical on those two systems:
Same domain
Same OU within that domain
Same IP subnet
Same firewall rules
Different VMware hosts, but both hosts are connected to the same switch, and all other guests that are on the host with TS2 are functioning properly.

What am I doing wrong here?
TomPro

ASKER

We found that none of us understood why it was failing, so the solution to start from scratch is best, and starting from scratch was previously suggested by both Brad and Korbus.

I'm also selecting as secondary solutions both Brad's suggestion of where to look for failing GPO processing and Korbus's suggestion that the secondary problem might have been sysprep related, and I've given partial points to both of them.